Types of Alternative Splicing in the GSE137344 public dataset
Eight types of alternative splicing (AS) events were detected: alternative 3’ splice site (A3SS), alternative 5’ splice site (A5SS), alternative first exon (AFE), alternative last exon (ALE), mutually exclusive exons (MXE), retained intron (RI), skipped exon (SE), and tandem 3’ UTR (Figure 1A). Compared with the normal samples, we identified a total of 2,358 significant AS events across the eight AS types (Table S1). SE had the largest number of significant events, but these made up a rather low proportion of all SE events (Figure 1B). By contrast, 8.1% of tandem 3’ UTR events and 7.2% of ALE events were significant, higher proportions than for the other types. The distribution and intersection of gene symbols among the eight event types are shown in Figure 1C. Because most ALE and AFE events were identified across multiple genes, ALE and AFE events were mapped to over 5,000 and 3,000 genes, respectively, and hence had the most intersections.
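The proportions discussed above are, for each AS type, the number of significant events divided by the number of detected events of that type. A minimal Python sketch of that calculation follows. The input layout (a list of (type, is_significant) pairs), the function name significant_fraction, and the toy counts are illustrative assumptions, not the study's data or pipeline.

```python
from collections import Counter


def significant_fraction(events):
    """Return {as_type: fraction of events flagged significant}.

    `events` is an iterable of (as_type, is_significant) pairs.
    This layout is a hypothetical one chosen for illustration.
    """
    total = Counter()
    significant = Counter()
    for as_type, is_significant in events:
        total[as_type] += 1
        if is_significant:
            significant[as_type] += 1
    return {as_type: significant[as_type] / total[as_type] for as_type in total}


# Toy data (not from the study): 10 SE events with 1 significant,
# 4 ALE events with 2 significant.
toy_events = (
    [("SE", True)] + [("SE", False)] * 9
    + [("ALE", True)] * 2 + [("ALE", False)] * 2
)
fractions = significant_fraction(toy_events)
```

The toy numbers show how a type with many events can still have a low significant fraction, while a type with fewer events has a higher one.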
Pathway Analysis of the GSE137344 public dataset
The UC-related AS events contributed to the enrichment of 110 different biological pathways (Table S2). Most pathways were associated with only one splicing type; only “antigen processing and presentation” and “ribosome” were affected by three splicing types (Figure 2A). “Systemic lupus erythematosus (SLE)” was the most significantly enriched term, with a p-value less than 5 × 10^-20. Interestingly, a recent study showed that patients with SLE had a greater prevalence of IBD than matched controls [27]. The second most enriched term was “Alcoholism”, with a p-value less than 3 × 10^-6. We then combined the pathway analysis results for the UC-related AS events with the RNA-seq expression results and found two inflammation-related genes in these pathways that contained UC-related AS events: HDAC6, with an ALE event, and LIPA, with a tandem 3’ UTR event. A previous study found HDAC6 to be involved in the alcoholism pathway, which is associated with chronic inflammation [17]. LIPA has previously been associated with steroid biosynthesis, which is also thought to be involved in modulating inflammation [14]. Both genes showed significant differential expression at the RNA level between the sample groups (Figure 2B, 2C).
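This section does not state which statistical test produced the enrichment p-values. As an illustration only, the sketch below assumes a one-sided hypergeometric (over-representation) test, which is a common choice for pathway enrichment. The function name and the toy numbers are hypothetical and are not taken from the study.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_pathway, n_selected, n_overlap):
    """One-sided over-representation p-value, P(X >= n_overlap).

    X is the number of pathway genes among `n_selected` genes drawn
    without replacement from `n_background` genes, of which
    `n_pathway` belong to the pathway.
    """
    upper = min(n_pathway, n_selected)
    denominator = comb(n_background, n_selected)
    numerator = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return numerator / denominator


# Toy example: 10 background genes, 5 in the pathway, 5 selected,
# and all 5 selected genes fall in the pathway.
p_toy = hypergeom_enrichment_p(10, 5, 5, 5)
```

In practice the resulting p-values would normally be corrected for multiple testing across pathways; that step is omitted here.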
Splicing types in the 8-sample mRNA-seq experiment
To validate comprehensively that these AS events indeed exist in UC, we performed another RNA-seq experiment on four UC patients and four normal samples from Shengjing Hospital of China Medical University. The same strategies were applied to the analysis of the validation RNA-seq data. A set of 2,352 significant AS events was discovered in our dataset (Table S3). SE and AFE were still the two most frequent AS types, while MXE was the least frequent (Figure 3A). However, the proportion of significant events among all events of each type differed from that in the public dataset (Figure 3B). AFE and ALE events were identified across multiple genes and exhibited the most intersections (Figure 3C). Interestingly, 57% of the genes with significant AS events in our dataset also showed significant AS events in the public dataset (Figure S1), although the two datasets were derived from different tissues. This result indicates that AS regulation may be more closely related to disease progression than to tissue type.
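The overlap reported above is the share of genes with significant AS events in the validation dataset that also carry significant AS events in the public dataset. A minimal sketch of that calculation is given below. The function name shared_gene_fraction and the placeholder gene identifiers are assumptions made for illustration, not the study's gene lists.

```python
def shared_gene_fraction(validation_genes, public_genes):
    """Fraction of validation-set genes that also appear in the public set."""
    validation = set(validation_genes)
    if not validation:
        raise ValueError("validation gene list is empty")
    shared = validation & set(public_genes)
    return len(shared) / len(validation)


# Placeholder identifiers, not real results.
toy_validation = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
toy_public = {"GENE_A", "GENE_B", "GENE_E"}
fraction = shared_gene_fraction(toy_validation, toy_public)
```

Note that the denominator is the validation gene set, so the value is not symmetric: dividing by the size of the public set would give a different percentage.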
Combined analysis of expression and splicing in the 8-sample experiment
mRNA-seq of these eight samples identified over 1,500 differentially expressed genes (adjusted p-value < 0.05, |log2 fold change| > 0.5) in UC patients compared with controls (Figure S2). Principal component analysis (PCA) of mRNA expression was performed using the expression data of the top 2,000 genes (Figure 4A). We also performed PCA on the AS events in our cohort to characterize the differences in AS between disease and normal samples. For this analysis we used 1,731 AS events that were detected in all eight samples, with Percent Spliced In (PSI) values generated by the MISO software. PC1 to PC3 together accounted for 60% of the variance (Figure 4B), and the biological difference between UC patients and normal samples was captured by these first three principal components (PCs). These results indicate that both AS patterns and expression profiles reflect the biological differences between patient and normal samples.
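The differential expression cutoffs quoted above can be sketched as a simple classification rule. The sketch assumes the fold-change cutoff is applied to the absolute log2 fold change, which is what the presence of both up- and downregulated genes suggests. The function name classify_gene and the toy values are hypothetical.

```python
def classify_gene(padj, log2fc, padj_cutoff=0.05, lfc_cutoff=0.5):
    """Label a gene 'up', 'down', or 'not_significant'.

    Assumes a symmetric cutoff on log2 fold change (an assumption,
    not a documented detail of the study's pipeline).
    """
    if padj is None or padj >= padj_cutoff:
        return "not_significant"
    if log2fc > lfc_cutoff:
        return "up"
    if log2fc < -lfc_cutoff:
        return "down"
    return "not_significant"


# Toy results as {gene: (adjusted p-value, log2 fold change)}.
toy_results = {
    "GENE_A": (0.001, 1.2),   # significant, increased in UC
    "GENE_B": (0.02, -0.8),   # significant, decreased in UC
    "GENE_C": (0.20, 2.0),    # large change but fails the p-value cutoff
    "GENE_D": (0.01, 0.3),    # significant p-value but small change
}
calls = {gene: classify_gene(padj, lfc) for gene, (padj, lfc) in toy_results.items()}
```

Treating a missing adjusted p-value as not significant is a design choice of this sketch; some differential expression tools report no adjusted p-value for genes removed by their own filtering.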
We next examined whether genes with significant AS events also showed significant changes in expression. A Venn diagram showed that 140 of 667 downregulated genes and 233 of 846 upregulated genes also had AS events in UC patients, indicating that some of the gene expression changes may be due to dysregulation of AS (Figure 4C).
Biological Process Pathway analysis
We performed biological process pathway analysis on our 8-sample experiment using the genes associated with the 2,352 significant AS events. Because fewer than 200 genes had MXE, RI, or tandem 3’ UTR events, we did not identify any significantly enriched biological process among these genes. For the other five AS types, multiple pathways were highly enriched in terms of biological process (Figure 5), with “immune response” the most significantly enriched. In addition, LIPA, which was differentially expressed and had significant AS events in the public dataset, also harbored a tandem 3’ UTR event and showed significant differential expression at the RNA level in the validation dataset (Table S4, Figure S3). These results suggest that dysregulated AS, like altered expression, is strongly associated with an altered immune response in UC patients.
We next performed Gene Ontology (GO) enrichment analysis on the 8-sample dataset using 201 unique AS events that were discovered only in the normal group or only in the UC group. The top 10 enriched GO biological process terms (Figure 6A, Table S5) reflect immune system response and cell chemotaxis in UC patients. GNLY is one of the genes with unique AS events in the term GO:0061844. Its product is a member of the saposin-like protein (SAPLIP) family, an antimicrobial protein that is present in the granules of human cytotoxic T lymphocytes and natural killer (NK) cells and that can also activate antigen-presenting cells through TLR4 [29]. The AS events of this gene that differ between normal samples and UC patients are shown as read coverage tracks in Figure 6B. Two 3’ intron retention events were identified only in the normal tissues. Because recent studies have shown that intron retention may affect transcription efficiency [23], we speculate that this unique AS event limits the expression of GNLY in controls, whereas its absence in UC patients may increase transcription. Finally, we compared AS events between two clusters of UC patients with different disease progression status from the GSE130038 study and identified 111 significant AS events between the two clusters. Pathway enrichment analysis also identified GO terms related to UC progression (Figure 6C). These results suggest that AS may play vital roles in UC pathogenesis, for example by acting as an indicator of disease progression.
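The text does not spell out how group-specific ("unique") AS events were defined. One plausible reading, assumed here purely for illustration, is that an event is unique to a group when it is detected in at least one sample of that group and in no sample of the other group. The function name, sample labels, and event identifiers below are hypothetical.

```python
def group_exclusive_events(detected_by_sample, group_of_sample):
    """Return {group: events detected only in samples of that group}.

    `detected_by_sample` maps sample ID -> set of detected event IDs.
    `group_of_sample` maps sample ID -> group label.
    The 'any sample in one group, none in the other' rule is an assumption.
    """
    seen = {}
    for sample, events in detected_by_sample.items():
        seen.setdefault(group_of_sample[sample], set()).update(events)

    exclusive = {}
    for group, events in seen.items():
        # Pool every event seen in any other group, then subtract.
        others = set().union(*(seen[g] for g in seen if g != group))
        exclusive[group] = events - others
    return exclusive


# Toy input: two normal and two UC samples with made-up event IDs.
toy_detected = {
    "N1": {"ev1", "ev2"},
    "N2": {"ev2"},
    "UC1": {"ev2", "ev3"},
    "UC2": {"ev4"},
}
toy_groups = {"N1": "normal", "N2": "normal", "UC1": "UC", "UC2": "UC"}
unique_events = group_exclusive_events(toy_detected, toy_groups)
```

A stricter definition, such as requiring detection in every sample of a group, would yield a smaller event list, so the count obtained depends on which rule is used.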