Overall mapping rate after genome alignments
Reads from each clinical case were mapped to human genome sequences. The results are shown in Table 1, where the total alignment number of the sequenced fragments ranged from 15 to 39 million, and the comparison rate ranged between 83%-92%.
Table 1
Summary of TopHat2 genomic alignments
Number of Reads
|
Total reads
|
Mapped reads
|
Overall reads
|
Left
|
Right
|
Left
|
Right
|
Mapping_rate
|
Aligned_pairs
|
B782N
|
51906891
|
51906891
|
45391832
|
42285862
|
84.50%
|
39419636
|
B782T
|
20187921
|
20187921
|
18022068
|
16864424
|
86.40%
|
15851609
|
B783N
|
22772025
|
22772025
|
20876811
|
20292461
|
90.40%
|
19220668
|
B783T
|
22026613
|
22026613
|
20199656
|
19665894
|
90.50%
|
18669500
|
B785N
|
23881681
|
23881681
|
21879991
|
20946206
|
89.70%
|
19806614
|
B785T
|
27545958
|
27545958
|
25310820
|
24617731
|
90.60%
|
23314980
|
B786N
|
19526756
|
19526756
|
18043943
|
17643909
|
91.40%
|
16752687
|
B786T
|
20424033
|
20424033
|
18841925
|
18319670
|
91.00%
|
17453969
|
B788N
|
22166170
|
22166170
|
20319140
|
19346438
|
89.50%
|
18276699
|
B788T
|
19329401
|
19329401
|
17949865
|
17468161
|
91.60%
|
16651551
|
B791N
|
25487339
|
25487339
|
22803788
|
22091751
|
88.10%
|
20629163
|
B791T
|
19730469
|
19730469
|
17587161
|
16688030
|
86.90%
|
15713877
|
B794N
|
21773037
|
21773037
|
20089962
|
18134197
|
87.80%
|
17197455
|
B794T
|
27630784
|
27630784
|
24796070
|
22364260
|
85.30%
|
20946236
|
B797N
|
24393756
|
24393756
|
22517747
|
21882580
|
91.00%
|
20713525
|
B797T
|
20082035
|
20082035
|
18745753
|
18269290
|
92.20%
|
17451339
|
B798N
|
32153939
|
32153939
|
29786198
|
26930352
|
88.20%
|
25572896
|
B798T
|
32468547
|
32468547
|
30246172
|
27414170
|
88.80%
|
26154479
|
B799N
|
25194910
|
25194910
|
23448172
|
21151887
|
88.50%
|
20095525
|
B799T
|
20897751
|
20897751
|
19481307
|
17526890
|
88.50%
|
16701290
|
B800N
|
26328835
|
26328835
|
24474467
|
22048959
|
88.40%
|
20927261
|
B800T
|
24186510
|
24186510
|
22481444
|
20362666
|
88.60%
|
19317914
|
B801N
|
22799054
|
22799054
|
21390618
|
19363520
|
89.40%
|
18518368
|
B801T
|
26946581
|
26946581
|
25259555
|
22713737
|
89.00%
|
21717932
|
B804N
|
27121722
|
27121722
|
25391238
|
22801422
|
88.80%
|
21774130
|
B804T
|
24985226
|
24985226
|
23383899
|
21149641
|
89.10%
|
20212413
|
C199N
|
29776640
|
29776640
|
26261213
|
25254804
|
86.50%
|
23366546
|
C199T
|
35651673
|
35651673
|
30502787
|
29013594
|
83.50%
|
26708316
|
C200N
|
21494047
|
21494047
|
19784129
|
19242503
|
90.80%
|
18270154
|
C200T
|
19588861
|
19588861
|
17953109
|
17510470
|
90.50%
|
16566068
|
B782N
|
51906891
|
51906891
|
45391832
|
42285862
|
84.50%
|
39419636
|
B782T
|
20187921
|
20187921
|
18022068
|
16864424
|
86.40%
|
15851609
|
SHEE
|
55085029
|
55085029
|
47868214
|
44812830
|
84.1%
|
41691361
|
SHEEC
|
54582526
|
54582526
|
47769458
|
44796267
|
84.8%
|
42325127
|
Changes in AS patterns between ESCC tissue, matched normal tissue and cell lines
From the results of splicing analysis, a total of 45,439 AS events were detected using MISO. Of these, 6019 AS events were differently spliced with significance in paired ESCC tissues and matched normal tissues, with a cut-off of BF > 5, |ΔΨ|> 0.2, and number of inclusion and exclusion reads > 10, accounting for 13.25% of the total number of AS events (Fig. 1A). After filtering out the AS events with inconsistent ΔΨ trends in the paired samples, there were 5150 differential splicing events, including 2324 exon skipping (SE) events, 1014 intron retention (RI) events, and 693 mutually exclusive exon (MXE) events, 592 alternative 3’ splice site (A3SS) events, and 527 alternative 5’ splice site (A5SS) events. The AS results of ESCC tissue samples showed that the proportion of SE events that could be detected was the largest, being about 33.6%, RI events comprised 6.9%, MXE events comprised 6.8%, A3SS comprised 8.5%, and A5SS events comprised 6.4%. The percentage of SE events in differential splicing events was still the largest, comprising 4.8%, RI events comprising 2.5%, MXE events comprising 1.5%, A3SS events comprising 1.2%, and A5SS events comprising approximately 1.1%. Individual differences were detected during MISO analysis. These individualized differences were more obvious in differential splicing events (Fig. 1B). From the volcano plot, it could be seen that the ΔΨ values and log2(BF) values in detectable AS events and the differentially variably spliced events were symmetrically distributed (Fig. 1C). Also, it could be seen from the bar plot that the ΔΨ trend did not have a significant difference in each type of AS event (Fig. 1D).
A total of 32,891 AS events were detected, of which a total of 928 AS events were differently spliced with significance in the SHEE/SHEEC cell lines, accounting for approximately 2.8% of the total detectable AS events (Fig. 1E). The threshold for judging the differences in AS events in the SHEE/SHEEC cell lines was the same as used for ESCC AS event filtration. In detectable AS events, the proportion of SE was the largest, being about 55.9%, RI comprising 9.2%, MXE comprising 10.9%, A3SS accounting for 13.6%, and A5SS accounting for 10.3%. In differential splicing events, the number of SE, RI, MXE, A3SS, and A5SS events were 455, 131, 152, 96, and 94, accounting for 49%, 14.1%, 16.4%, 10.3%, and 10.1% of the total number of differential splicing events separately (Fig. 1F).
Re-annotation of AS events in paired ESCC tissues
To identify the genes and transcripts representing the differential AS events and filter out incorrect AS events, as described in the methods, we re-annotated all differentially splicing events found in at least 3 paired samples. For SE events, there were a total of 222 and 151 significant alternatively spliced events before and after re-annotation, reducing the number of AS events to approximately 68% of all events. For RI events, there were a total of 134 significant AS events before re-annotation, 115 after re-annotation, and a reduction of 14.2% of the total number of all events. For MXE events, there were a total of 62 significant alternative splicing events before re-annotation and 6 after re-annotation, reducing the number of AS events to 9.7% of the total. For the A3SS events, there were 57 significant AS events before re-annotation and 42 after re-annotation, reducing the number of AS events to approximately 73.7% of all events. For the A5SS events, there were a total of 52 significant AS events before re-annotation and 35 after re-annotation, reducing the number of AS events to approximately 67.3% of all events (Fig. 2A). The MXE events were reduced the most because the two exons that should have been mutually exclusive in the MISO official MXE event annotation were observed to exist in the same transcript. A total of 349 AS events were identified after re-annotation in 15 pairs of ESCC tissue samples, of which transcripts and genes representing 169 AS events could be found in the Ensembl and RefSeq database gene annotations, whereas 114 could only be identified having representing transcripts and genes in the RefSeq database gene annotations, and 66 AS events could only be identified having representing transcripts and genes in the Ensembl database gene annotations (Fig. 2B). We have also calculated the fractions of AS events representing mRNA and ncRNA (Fig. 2C). The top 10 splicing events with the most paired sample counts are shown in the following table (Table 2). Splicing modes of CALD1 (Fig. 3B), KIAA1217 (Fig. 3C), TPM1 (Fig. 3D), HNRNPC (Fig. 3A), VCL (Fig. 3E), and ZNF207 (Fig. 3F) were visualized with Sashimi plotting.
Table 2
Summary of alternative splicing events with top 10 sample count (paired ) in ESCC tissues
Event
|
Gene_Name
|
Count
(paired)
|
Sample
|
Type
|
chr1:150483352:150483674:+ @chr1:150483933:150484307: +@chr1:150484828:150485048:+
|
ECM1
|
14
|
B783, B798, C199, B788, B782, B799, B797, B791, B800, B785, B786, B804, C200, B801
|
SE
|
chr10:24833910–24834032:+ @chr10:24834756–24836772:+
|
KIAA1217
|
13
|
C199, B783, B785, B782, C200, B786, B799, B788, B797, B794, B804, B791, B800
|
RI
|
chr9:117835882:117836145:- @chr9:117808689:117808961:- @chr9:117804498:117804620:-
|
TNC
|
12
|
B804, B785, B783, B794, B786, B798, B800, B797, C199, B782, C200, B788
|
SE
|
chr15:63334838:63335142:+@chr15:63335905:63336030:+ @chr15:63336226:63336351:+@chr15:63349184:63349317:+
|
TPM1
|
11
|
C200, B788, B785, B800, B794, B801, C199, B782, B797, B804, B783
|
MXE
|
chr2:238289558:238290142:-@chr2:238287279:238287878:-@chr2:238285415:238285987:-
|
COL6A3
|
11
|
B797, B783, B786, C199, B782, B800, B788, C200, B785, B798, B794
|
SE
|
chr22:31495731:31495882:+@chr22:31496871:31497035:+ @chr22:31500302:31500610:+
|
SMTN
|
10
|
B785, B797, B782, C199, B788, C200, B804, B800, B794, B801
|
SE
|
chr16:15808766:15808938:-@chr16:15802660:15802698:- @chr16:15796992:15797980:-
|
MYH11
|
9
|
C200, B799, B794, B788, B800, B801, B785, C199,B782
|
SE
|
chr17:30692349:30692506:+@chr17:30693684:30693776:+ @chr17:30694791:30695033:+
|
ZNF207
|
9
|
B782, C200, B788, B794, B786, B785, C199, B797, B800
|
SE
|
chr7:134617739:134618141|134618828:+ @chr7:134620439:134620516:+
|
CALD1
|
9
|
B788, C200, B800, B785, B804, B782, C199, B797, B786
|
A5SS
|
chr9:117835882:117836145:-@chr9:117819432:117819704:-@chr9:117808689:117808961:-
|
TNC
|
9
|
B798, B782, B800, B794, C199, C200, B788, B786, B785
|
SE
|
Overlapping AS events in ESCC tissues and SHEE/SHEEC cell lines
A total of 37 AS events were identified to be differentially spliced in both ESCC tissues and cell lines, based on statistical significance, 15 of which had the same trend in ΔΨ values in ESCC tissues and cell lines. Among these 15 AS events, there were 4 SE events, 7 RI events, 2 MXE events, and 2 A5SS events (Supplementary Table 1).
Classification of AS events according to representative transcript types
Of the 349 AS events after re-annotation, 312 events represented at least one protein-coding transcript, which accounted for 89.4% of the total AS events. The remaining 37 AS events did not represent any protein-coding transcripts, including some processed protein coding transcripts, lncRNAs, and microRNAs (Supplementary Table 2). Among the 235 AS events re-annotated with Ensembl transcript annotations, 200 events represented at least one protein-coding transcript, which accounted for 85.1% of the total AS events, and the remaining 35 AS events represented non-protein-coding transcripts. Of the 283 variable splicing events re-annotated with RefSeq transcript annotations, 266 events represented at least one protein-coding transcript, which accounted for 94% of the total AS events, and the remaining 17 AS events represented non-protein-coding transcripts.
Functional enrichment revealed the potential role of AS transcripts
To identify the biological processes and pathways affected by splicing abnormalities in ESCC, we performed enrichment analysis. A total of 270 protein-coding genes were selected for functional and pathway enrichment analysis from the results of alternative splicing events after re-annotation of differential AS events with gene annotations from the Ensembl and RefSeq databases. A total of 234 genes were enriched in 543 biological processes. There were 382 biological processes satisfying the threshold of P < 0.05, and 39 genes were enriched in 12 pathways, of which 10 pathways satisfied P < 0.05. Furthermore, there were 4 KEGG pathways and 20 biological processes that related to invasion and metastasis (Fig. 4).
Splicing regulatory networks in ESCC
To find the regulators of the splicing alterations in ESCC, we constructed a splicing regulatory network with FPKM values of splicing factors and PSI values of AS events. Filtered with a cut-off of P < 0.05 after Spearman correlation analysis, 5,999 splice factor-splicing event pairs and 5,511 splice factor-target gene pairs were chosen to build a splicing regulatory network. This network involved 81 splicing factors and 223 differential AS events (Fig. 5). Splice factors that were differentially expressed in ESCC and matched normal tissues were measured by two public microarray datasets and the RNA-seq dataset generated from our laboratory. It turned out that SF3B4 was significantly differentially expressed in ESCC with a fold-change of about 1.5 (Table 3).
Table 3
Summary of SF3B4 expression using three ESCC datasets
Dataset
|
Num of Samples
|
FoldChange
|
Pvalue
|
Qvalue
|
R packages
|
RNA-seq
|
15 pairs
|
1.422555267
|
0.196605694
|
0.458244843
|
DESeq
|
GSE53624
|
119 pairs
|
1.642700989
|
0
|
0
|
siggenes
|
GSE23400
|
53 pairs
|
1.49618277
|
0
|
0
|
siggenes
|
The splicing regulatory network with SF3B4 and its target genes were also provided and contained 92 target genes and 102 abnormal AS events (Fig. 6A). SF3B4 was found to be a survival-related gene in ESCC. ESCC patients with a lower expression level of SF3B4 had a longer survival time (Fig. 6B). The PSI value of SRSF5’s RI event was found to have the smallest p-value with Spearman correlation analysis with the FPKM of SF3B4 (Fig. 6C). Functional enrichment analysis was also performed to identify the biological processes for SF3B4’s target genes, resulting in a dot plot of 9 biological processes with P < 0.05 (Fig. 6D).