Dysregulation of Alternative Splicing was Associated with the Pathogenesis of Ulcerative Colitis

Background: Although numerous risk loci of Ulcerative Colitis (UC) in the human genome have been identied, the pathogenesis of UC remains not fully understood. Recently, multiple transcriptomic analyses have shown that aberrant gene expression in UC patients’ colon tissues were associated with the disease progressing. A pioneer study also demonstrated that altered post-transcriptional regulation may be involved in the progression of UC. Methods: Herein, we provided a comprehensive analysis of alternative splicing (AS) signature on UC patients. We analyzed three datasets, which include 74 tissue samples from UC patients in total, and identied over 2,000 signicant AS events in these datasets. Results: Skipped Exon (SE) and Alternative rst exon (AFE) were found to be the top two signicantly altered AS events. Immune response related pathways were signicantly enriched. Genes that showed signicantly AS events were more likely to be dysregulated at the expression level. Conclusion: Overall, these results suggested that alteration of AS may play crucial roles in the pathogenesis of UC. We presented a landscape of AS events in UC based on a combination analysis of two cohorts. Our results indicated that the dysregulation of AS may play roles in determining the pathogenesis of UC.


Introduction
Ulcerative Colitis (UC), a subtype of in ammatory bowel disease (IBD), has become a global emergence disease [21]. The pathogenesis of UC is complicated, including interactions among microbial shifts, the host's genetic background, and the environmental cues, resulting in activating mucosal immune system in a chronic way [15,22]. Although multiple genome-wide association studies have identi ed hundreds of associated genetic risk loci in patients with UC [8], the pathogenesis of UC remains not fully understood.
Recently, several transcriptomics-focused studies have been carried out to determine the alteration of gene expression in patients with different subtypes of IBD. Planell et al. performed transcriptomic analysis of colonic biopsies from patients with UC and the healthy control using microarrays. They found that several genes and pathways were permanently dysregulated despite of the histological recovery of the patients [25]. Another microarray-based study demonstrated that expression of Interferon-γ and interleukin-17 were comparably elevated in both in amed and unaffected colon mucosa from patients, indicating that in ammation response was not limited to the endoscopic lesions in patients with IBD [33]. Smith et al. identi ed dysregulation of BRINP3 may play roles in the pathogenesis of UC [28]. A more recent study that focused on analyzing colonic mucosal transcriptome of patients with long-duration UC also identi ed that genes and pathways were dysregulated in long-duration UC patients compared to short-duration UC patients [20]. Overall, all these studies suggested that transcriptional regulation may play crucial roles in determining the etiology of UC.
Alternative splicing (AS) of the precursor mRNA works as one of the essential mechanisms for increasing the protein diversity as well as regulating the intricate protein-RNA interaction network [32,6]. Almost 95% of the total human genes with multi-exons, were discovered to involve AS events [4]. More and more evidence demonstrated that AS plays crucial roles in many biological events, such as oncogenic processes including cell proliferation, cell apoptosis, hypoxia, immune escape as well as metastasis [7] [24]. AS were also found to play essential roles in basic developmental process and tissue identity [2].
Recently, a pioneer study, which focused on pro ling the AS events in IBDs, showed 47 splicing factors and 33 intron retention events were dysregulated in the mucosal tissue of patients with IBD patients [13].
Another array-based pioneer study also found more than 392 genes were differentially expressed in longduration UC patients, which were associated with a dysregulated AS network [20]. However, the former study lacks a comprehensive analysis due to lacking next generation sequencing approach, the latter study drew special attention to the comparison between long-duration UC and short-duration UC.
In order to provide a landscape of how AS is involved in the pathogenesis of UC, we rst obtained a public mRNA-seq data from NCBI GEO dataset (GSE137344), which included 44 mRNA expression data from UC patients and 37 mRNA expression data from the controls. Using Miso and related AS event analysis software [16], we identi ed 8 types and 2,385 signi cant AS events in UC patients compared to the control. 110 biological pathways were signi cantly enriched for these AS events in the UC patients, of which some were highly involved in chronic in ammation/immune response pathways. We next performed another mRNA-seq experiment on colon tissues from our own cohort of UC patients and the control (4 UC vs 4 control). 57% genes that were identi ed to involve signi cant AS events in our experiment also had signi cant AS events in the GSE137344 study. Immune response related pathways were found to be the most signi cantly enriched terms in the validation experiment dataset as well. To further demonstrate the potential role of AS in the UC progression, we also performed a validation experiment on comparing the AS of the two clusters of UC patients with different disease progressions.
Overall, we provided a comprehensive analysis of AS events on UC patients combining multiple datasets. We believed our results could shed a light on understanding the mechanism of post-transcriptional regulation in the pathogenesis of UC.
Compared to the normal samples, we identi ed 2,358 signi cant AS events in eight different AS types in total (Table S1). Among these types, SE type has the most signi cant events, but the ratio over the total number of SE events is rather low ( Figure 1B). In fact, 8.1% of tandem 3'UTR events and 7.2% of ALE events are recognized as signi cant AS events, over numbering others. The distribution and intersection of gene symbols between 8 events types for related AS events are shown in Figure.1C. Since most of ALE and AFE AS events were identi ed across multiple genes, ALE events and AFE events were mapped to over 5,000 and 3,000 genes respectively, and hence having the most intersections.
Pathway Analysis of the GSE137344 public dataset The AS events related to UC contribute to the enrichment of 110 different biological pathways (Table S2). Most of pathways was associated with only one splicing type. Only antigen processing and presentation and ribosome are affected by three splicing type ( Figure 2A). "Systemic lupus erythematosus (SLE)" was the most signi cantly enriched term with a p-value less than 5 * 10-20 . Interestingly, a recent study showed that patients with SLE had a greater prevalence of IBD than matched controls [27]. The second most enriched term was "Alcoholism", with a p-value less than 3 * 10 -6 . We then combined related AS events pathway analysis results with RNA-seq expression results. And we found two in ammation related genes containing UC related AS events in these pathways. HDAC6 and LIPA were observed with an ALE event and a tandem 3'UTR event, respectively. According to the previous study, HDAC6 was found to be involved in alcoholism pathway which was associated with chronic in ammation [17]. LIPA was previously associated with steroid biosynthesis which was also considered to be involved in modulating in ammation [14]. Two genes in different sample groups showed signi cant differential expression on RNA level ( Figure 2B, 2C).
Splicing types in the 8-sample mRNA-seq experiment.
To validate AS events that indeed exists comprehensively in the UC, we performed another RNA-seq experiment on four UC patients and four normal samples from Shengjing Hospital of China Medical University. Same strategies were applied in the data analysis of validation RNA-seq experiments. Eventually, A set of 2,352 signi cant AS events were discovered in our dataset (Table S3). Interestingly, SE and AFE were still found to be the top two AS types, while MXE was the least one ( Figure 3A). But the ratio over the total number of all events type were relatively different comparing to the result from public dataset ( Figure 3B). AFE and ALE were identi ed across multiple genes, which exhibited the most intersections ( Figure 3C). Interestingly, we also noticed that 57% genes that were identi ed to exhibit signi cant AS events in our dataset also showed signi cant AS events in the public dataset ( Figure S1), though they are from different tissues. This result indicated that the AS regulation could be more related to the disease progression than the tissues.  Figure 4A). We also performed a PCA analysis on the AS events in our cohort in order to characterize the AS events between the disease and normal samples. We summarized 1,731 related AS events which occurred across all 8 samples with Percent Spliced In (PSI) values generated by MISO software. PC1-PC3 accounted for 60% of the variance ( Figure 4B). The biological difference between UC disease patients and normal samples is captured by the rst, the second and third principal component (PC). These results indicated that AS patterns and the expression pro les could both demonstrate the biological differences between the samples from the patients and the normal.
We next examined whether the expression level of the genes that showed signi cant AS events also had signi cant changes of expression. Venn diagram showed that 140 of 667 downregulated genes and 233 of 846 upregulated genes also had AS events in UC patients, indicating some of the gene expression changes may be due to the dysregulation of AS ( Figure 4C).

Biological Process Pathway analysis
We performed biological process pathway analysis on our 8-sample experiment based on the gene list of 2,352 signi cant AS events. Since only less than 200 related genes had MXE, RI and tandem 3' UTR event types, we failed to identify any signi cantly enriched biological process among these genes. Among the other ve AS type events, multiple pathways were highly enriched in terms of biological process ( Figure  5). Among those pathways, "immune response" was enriched most signi cantly. Besides, LIPA, which was identi ed to be differentially expressed as well as had signi cant AS events in the public dataset, was also identi ed to involve a tandem 3'UTR event and showed the signi cant differential expression on RNA level in the validation dataset (Table S4) ( Figure S3). These results suggested that the dysregulated AS, which was similar to the expression level itself, was strongly associated with altered immune response in UC patients.
We next performed gene ontologies (GO) analysis on the 8-sample dataset based on the enrichment of 201 unique AS events that were only discovered in the normal group or in the UC group. The top 10 enriched GO biological process terms ( Figure 6A, Table S5) re ect the immune system response and cell chemotaxis in the UC patients. GNLY is one of the genes that has unique AS events in term GO:0061844. The product of this gene is a member of the saposin-like protein (SAPLIP) family. It is an antimicrobial protein that present in the granules of human cytotoxic T lymphocytes, as well as in the natural killer (NK) cells, that can also activate antigen-presenting cells through TLR4 [29]. In Figure 6B, we presented different AS events of this gene between the normal and the UC patients using the read coverage track gure ( Figure 6B). Two 3' intron retention events were identi ed only in the normal tissues. As the recent studies showed intron retention may affect the transcription e ciency [23], we speculated that this unique AS event in the control may limit the expression of this gene, while the UC patients may abandon this AS event to increase the transcription. Finally, we also compared the AS events of two clusters of UC patients with different disease progression status from the GSE130038 study. We identi ed 111 signi cant AS events between the two clusters. Pathway enrichment analysis also identi ed certain GO terms that are related to the progression of the UC ( Figure 6C). These results suggested that AS may play vital roles in the UC pathogenesis, such as acting as an indicator of the disease progression.

Discussion
Dysregulation of AS is strongly associated with the pathogenesis of many human diseases. Emerging evidence have showed that targeting the cellular mRNA splicing may lead to development of novel therapeutics [34]. In fact, multiple studies have already demonstrated the potential roles of dysregulated AS in many human diseases, including cancer, immune and infectious diseases, and neurodegenerative disorders [31,26,18]. Indeed, two pioneer studies [20] [13] have already identi ed dysregulation of AS may be associated with the pathogenesis of IBD. However, neither these two studies did the enrichment analysis for these AS related genes, nor these studies focused on the comparison between disease samples and control samples using a non-array-based method. In fact, one of these two studies only paid attention to the intron retention events, while our study demonstrated all eight major types of AS events.
In addition, we also analyzed two different datasets and found many AS related genes were overlapped between two cohorts, indicating that dysregulation of AS was strongly associated with the pathogenesis of UC comprehensively.
In our study, we found SE and AFE are the top two most signi cantly enriched AS events in both datasets. This indicated that the UC patients' transcriptome may be much more complex and chaotic compared to the healthy controls. Both SE and AFE will result in aberrant mRNA isoform, which will possibly activate a feedback response to the transcription, for instance, degradation [3]. In fact, many genes which showed signi cant AS events, also were changed signi cantly in terms of expression level. Although we cannot conclude whether the expression change is due to the dysregulated AS events, we still could conclude a signi cant association. Among the 23,519 genes detected in our mRNA-seq dataset, 1,513 of them were signi cantly dysregulated (padj < 0.05); In contrast, among the 4,514 genes with AS events, 373 of them were signi cantly dysregulated at expression level. These results suggested a signi cantly association that genes who showed AS events were more likely to have altered gene expression (Fisher's exact test pvalue = 0.0001) as well.
Among the AS related signi cantly enriched pathways in our validation dataset, immune response and related pathways owned the largest network and most signi cantly enriched terms. We were particularly interested in two genes HDAC6 and LIPA, as both genes showed signi cant AS events (tandem 3'UTR and ALE) and were also involved in in ammation and immune response pathways. HDAC6 was one of the histone deacetylase family members. Studies have supported that targeting HDAC6 as an antiin ammatory strategy for treating colon in ammation, which possibly progresses to IBD [9,19]. LIPA encodes the lysosomal acid lipase, which functions in the lysosome for catalyzing the hydrolysis of triglycerides and cholesteryl esters. A recent study demonstrated lysosomal cholesterol hydrolysis works as a critical process for preventing metabolic in ammation. LIPA works as the key regulator, as inhibition of LIPA causes defective clearance of apoptotic lymphocytes [30]. In the analysis of two datasets, both expression level and AS pattern of LIPA were signi cantly dysregulated. Overall, our results suggested that targeting the dysregulated AS of HDAC6 and LIPA may be a good opportunity for developing novel therapies in the future.
Although we presented a comprehensive AS analysis on ileum and colon tissues of UC patients, our studies had certain limitations. First, we did not include a validation of AS events experimentally though some of the AS events were also identi ed in our validation experiments. Secondly, whether these AS events affect protein diversity will also need further experimental validation. Finally, a RIP-seq of CLIP-seq study should be performed to detecting the potential responsible splicing factors, for better illustrating the underlying mechanism that how AS events were altered in UC patients.

Conclusion
Nonetheless, our data suggested a strong association between dysregulation of AS and the pathogenesis of UC. Future drug screening study could focus on these AS related genes and AS related splicing factors.

Sample Information and Collection
A total of 4 patients with UC from Shengjing Hospital of China Medical University were collected between June 2020 and September 2020, and served as the experimental group. All eligible patients had an established diagnosis of UC according to endoscopic and histologic assessments. Colonic biopsy specimens were taken from the rectum, ulcer margin of sigmoid colon and in amed portions. 4 patients with normal distal colon con rmed by surgical pathology served as the control group.
The study was approved by the institutional review board of Shengjing Hospital of China Medical University, and informed consent was obtained from each patient.

Public Dataset Preparation
We obtained RNA-seq data from NCBI (GSE137344). This dataset included 44 mRNA expression data of of ileum tissues from Ulcerative Colitis patients and 37 mRNA expression data from the normal (Here we chose this dataset is because that though UC does not typically involve other areas of the gastrointestinal tract, further extension of the in ammatory process into the terminal ileum is common [1,12]). Fastq les were aligned on human Hg19 genome by STAR-2.7.1a. The indexed .bam le were generated by Samtools (1.10). We analyzed the GSE130038 [10], which presented the RNA-seq data on colon tissues, using the same pipeline.

Identi cation of Important AS Events
For GSE137344 RNA dataset, 8 types and 60,690 AS events were identi ed by MISO (0.5.4) [16] for 81 samples in total, using an in-house MISO pipeline. The level of alternative splicing events was de ned as Percentage Spliced In (Psi). To get differentially expressed alternative splicing events between UC disease patients and normal samples, we applied Wilcoxon test on all alternative splicing events with at least 3 patients and 1 normal sample. Signi cant events were de ned as ones with p-values < 0.05. Among them, 2,385 AS events have wilcox test p-values less than 0.05 between the 44 UC samples and 37 normal samples and thus were de ned as related/signi cant AS events. We removed AS events which occurred in less than 3 patients and 1 normal sample.
Same method was applied in validation experiment part. In our validation experiment RNA-seq dataset, 8 types and 94,815 AS events were identi ed for 8 samples in total. 2,352 AS events are signi cant with wilcox test p-values less than 0.05 between the 4 UC samples and 4 normal samples. AS events which occurred in less than 2 patients and 1 normal sample were also removed.

Pathway Analysis
Pathway analysis for the GSE137344 data was performed using Enrichr with Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway library. Only the pathways with p-value less than 0.05 were considered related. In validation experiments pathway analysis, WebGestalt (2019) and SUMER were applied. We selected Over-Representation Analysis (ORA) as enrichment method and geneontology biological process as functional database in WebGestalt analysis. The enriched category with the gene sizes less than 5 and false discovery rate (FDR) above 0.05 was removed in WebGestalt results.
The result of SUMER was input into Cytoscape (3.8.0) to modify color and text size of the network.

Gene Ontology Analysis
Gene ontology analysis in validation experiment was performed using Enrichr and the bar plot of top 10 enriched terms was created by Enrichr Appyter.
RNA-seq analysis pipeline RNA-seq library was prepared using the NEBNext Ultra RNA with Poly-A selection kit and was sequenced on an Illumina Hi-Seq 4000 (Genergy, Shanghai). Kallisto [5] software was used to quantify RNA-seq counts. Differential gene expression was determined with log2foldchange >1.5 and P < 0.05 genes with >1 count per million. Any genewith a P-value greater than False Discovery Rate (FDR), after Benjamini-Hochberg correction for multi-testing, was deemed signi cantly differentially expressed under the test conditions as compared to the controls.

Visualization
We used Plotly to generate the Sankey diagram. UpSet plots were generated by ComplexHeatmap [11] in RStudio (1.2.5042). In order to avoid random error, we randomly selected and merged RNA-seq alignment data of three Ulcerative Colitis disease samples and three normal samples respectively from GSE137344 RNA data to generate genome Sashimi plots by IGV (version 2.8.3). Other graphs were plotted in RStudio (1.2.5042). Venn diagram was plotted using an online software from http://bioinformatics.psb.ugent.be/webtools/Venn/.