Overview of AS events in EOGC cohort
The corresponding RNA-Seq data of 80 EOGC patients were used to establish the integrated AS event profiling. We identified 66,075 AS events from 11,282 genes, which accounted for 53.0% of non-redundant human protein-coding genes . Beside two special patterns of AS events: tandem transcription start site (TSS) and tandem alternative polyadenylation Site (TAPA), these AS events were classified into six canonical splicing patterns: core exon (CE), alternative acceptor splice site (AA), alternative donor splice site (AD), retained intron (RI), alternative first exon (AF), and alternative last exon (AL), as illustrated in Fig. 1a. Among these splicing patterns, TAPA occurred most frequently (71.0%) (Fig. 1b).
Some gene isoforms showed very low redundancy, thus, we screened the AS events with a series of filters (percentage of samples with PSI values ≥ 75, PSI value ≥ 0.05). Consequently, a total of 15,803 AS events from 8,359 genes were obtained. After filtering, AF events were excluded. The analysis indicated that TAPA was still the top AS pattern (54.0%) (Fig. 1c). After removal of the duplicated genes in one AS pattern, the Upset plots were created to quantitatively analyze interactive sets of the remaining seven patterns of AS events. As shown in Fig. 1c, most of the genes had more than one pattern of AS events, among which some genes had up to three different splicing patterns .
Identification of ESAS events
To identify the EOGC-specific AS events, we compared the PSI values between 80 paired tumor tissues and 79 adjacent normal tissues. A total of 267 EOGC-specific AS events (ESASs) from 228 genes were identified (Fig. 2a, Table S1). Although a large number of AS events were detected in the EOGC cohort, a relatively small proportion of AS events were identified as ESAS. TSS and TAPA events accounted for 36.3% and 24.0% of ESAS, respectively (Fig. 2b). The uneven distribution of AS patterns in tumor and adjacent normal tissues indicated that they might play important roles in early-onset gastric tumorigenesis.
For the single gene with multiple ESAS events, the regulatory directions of varied ESAS events between tumor and normal tissues can be either the same or opposite. Eight genes (COL5A1, CBWD1, SLC47A2, SSPO, AL020989.1, RPL5P1, AACSP1, and AC022905.1) exhibited the same regulatory direction of AS events; while the other 22 genes showed opposite regulatory direction (Table S2). Interestingly, the proportions of certain AS patterns between the ESAS and entire AS events were inconsistent. TAPA, a top pattern among all AS events (54.0%), only contributed to 24.0% of the ESAS events.
Profile of EOGC-specific genes and gene isoforms
Based on the above results, we further studied how the dysregulated AS events affected the expressions of genes and gene isoforms. We identified 4809 genes from a total of 16383 genes,and also identified 6152 gene isoforms from a total of 16283 gene isoforms (CPM > 0.5 and the percentage of samples with read count values > 95%).
To generate EOGC-specific genes and gene isoforms, we compared the read count value of genes or gene isoforms from the tumor and normal or adjacent non-tumor tissues by limma-voom . After refining, 373 genes and 469 gene isoforms were identified as EOGC-specific genes and gene isoforms, respectively (absolute fold change ≥ 1 and FDR < 0.05, Fig. 3a &3b, Table S3). In addition, we profiled the enriched TF binding motifs in promoters of the EOGC-specific genes and gene isoforms (Table S4). A total of four genes and three gene isoforms with ESASs were differentially expressed in the EOGC (Fig. 3c).
Pathway and protein-protein interaction (PPI) network of ESAS events and EOGC-specific genes and gene isoforms
To understand the biological roles of the ESAS and EOGC-specific genes and gene isoforms, the GO and KEGG pathway analyses were performed. The results showed that the ESAS parent genes were linked to the GO terms of carboxylic acid biosynthetic process (GO:0046394), organic acid biosynthetic process (GO:0016053), purine-containing compound metabolic process (GO:0072521), collagen-containing extracellular matrix (GO:0062023), etc.. These ESAS parent genes were also found to be enriched in KEGG signaling pathways, including Nicotine addiction (hsa05033) and Arginine and proline metabolism (hsa00330) (Fig. 4a and Table S5). The GO and KEGG pathway results of the EOGC-specific genes and gene isoforms were also included in the supplementary table (Table S5).
The pathway analysis results showed that the parent genes of ESAS might play a vital role in regulating the cancer-related biological processes. The parent genes of ESAS were further analyzed by the PPI network. Using the Zero-order network, we built the PPI network of the ESAS parent genes, which contained 15 nodes, 14 edges and 15 seeds (Fig. 4b, Table S6). We found that UBC was the hub gene in the network. Meanwhile, based on the first-order network, we also built 29 and 27 PPI networks using EOGC-specific genes and gene isoforms, respectively (Fig. 4c&4d, Table S6). UBC, NEK2, EPHB2, and DCTN1 genes were identified as the hub genes in two or more of these networks (DCTN1 was identified as hub gene in the section of protein modification analysis).
Relationship between ESAS and immune cell infiltration
To untangle the relationship between the hub genes and gastric tumorigenesis, we analyzed the hub genes in TCGA STAD dataset using TIMER2 database. We found that UBC, NEK2, EPHB2, and DCTN1 genes were dysregulated in TCGA STAD dataset (p-value < 0.001, Fig. 5a). The correlation between the expression of the hub genes and cell abundances in STAD dataset was analyzed. We found that the expression of the UBC gene was positively correlated with the infiltration level of common lymphoid progenitor cells but was negatively correlated with the infiltration level of neutrophil cells. The expression of NEK2 gene was positively correlated with the infiltration level of myeloid-derived suppressor cells (MDSC) but was negatively correlated with the infiltration level of hematopoietic stem cells. The expression of EPHB2 gene was positively correlated with the infiltration level of NK cells but was negatively correlated with the infiltration level of activated myeloid dendritic cells. The expression of DCTN1 gene was positively correlated with the infiltration level of endothelial cells but was negatively correlated with the infiltration level of common lymphoid progenitor cells (Fig. 5b, Table S7). The above findings supported our conclusion that the hub genes played vital roles in gastric tumorigenesis and indicated the potential relationships between the AS events and cell abundance.
Here, we postulated that the ESAS might be also involved in the immune cell infiltration. To test the hypothesis, the immune cell abundancy in EOGC was calculated ImmuCellAI database. Then, Spearman's rank correlation analyses were performed to indicate the relationship between the ESAS and immune cell infiltration in EOGC. In total, 24 types of immune cells (including the infiltration score) were correlated with 77 ESAS events (Spearman's rank correlation coefficients > 0.4 or < -0.4, BH adjusted p-value < 0.05, Fig. 5c, Table S8).
Network of ESAS and regulatory factors
The AS events are regulated by various factors, including splicing factors (SFs) and APA core factors. However, it is unknown how ESAS are regulated by these factors in EOGC. To gain insights into this, we built the correlation network between the expression of 71 experimentally validated SFs, 22 APA core factors, and the PSI values of ESASs [25, 26].
In the network of ESAS and SFs, 70 ESASs were associated with 25 SFs (Fig. 6a, Table S9). All SFs were significantly correlated with at least five AS events. Moreover, one AS event could be regulated by up to 28 different SFs. We also constructed a network of APA core factors and TAPAs, which included seven APA core factors and ten ESAS events, respectively (Fig. 6b, Table S9).
Clinically relevant and protein modification associated AS events
There are few analyses to identify clinically relevant and protein modification associated AS events in EOGC or GC. In this study, we identified 864 differentially expressed AS events from 577 genes between EBV positive and negative tumor tissues, 3335 AS events from 2056 genes associated with MSI-High (MSI-H) and MSI-Low (MSI-L) tumor tissues (absolute fold change ≥ 1 and FDR < 0.05) (Fig. 7a, Table S10).
The patients were subtyped according to the status of protein modifications, including phosphorylation and glycosylation, by a previous study . We found that 82 AS events from 60 genes were differentially expressed among phosphorylation subtypes one to three, while the other 85 AS events from 65 genes were differentially expressed among glycosylation subtypes one to three (absolute fold change ≥ 1 and FDR < 0.05) (Fig. 7b&7c, Table S10). In addition, a total of 27 AS events were dysregulated in both the phosphorylation and glycosylation subtypes (Fig. 7a). DCTN1 gene is a key hub gene in both phosphorylation and glycosylation networks.