Genome-Wide Proling and Clinical Signicance of Alternative Splicing Events in Glioblastoma

Aims: The purpose of this study was to depict alternative splicing (AS) proles in GBM and identify their clinical signicance in the progression of GBM. Materials and Methods: RNA sequence data and clinical information were downloaded from The Cancer Genome Atlas portal (https://portal.gdc.cancer.gov/projects). Genome-wide alternative splicing events were obtained using SpliceSeq tool. Analyses were performed using GraphPad Prism 7 and R software. Key ndings: Univariate Cox analysis identied 2406 AS events with prognostic signicance. We built an interaction network based on these survival-related AS events. Unsupervised clustering analysis showed that patients in cluster 2 had a better prognosis than those in other clusters. The prognostic splicing factors and AS events were used to generate a splicing network. Seven prognostic signatures, developed based on the top three survival-related AS events, predicted the survival risk and may serve as independent indicators of unfavorable prognosis. Among these risk signatures, only the alternate promoter (AP) signature was upregulated in the mesenchymal subtype, which is characterized by a complex immune microenvironment. A high AP risk score indicated an overloaded local immune response and enriched immune cell inltration, which may accelerate the progression of GBM. Signicance: AS-related signatures may serve as predictors of prognosis as well as provide treatment targets and guidance for GBM patients.


Introduction
Glioblastoma (GBM) is the most common and lethal type of tumor of the central nervous system [1].
Despite advances in treatment options, the median survival of patients with GBM is approximately 14 months. Recent studies highlighted immune disorders as key drivers promoting GBM malignancy.
Previous work from our group showed that an overloaded immune response and disorganized immune microenvironment contribute to poor outcomes and resistance to therapy in GBM [2][3][4].
Alternative splicing (AS) is a common post-transcriptional regulatory mechanism that promotes protein diversity through differential pre-mRNA splicing. More than 90% of multi-exon genes undergo AS [5].
Recent evidence supports a relationship between AS and the occurrence or progression of cancer [6]. In addition, unbalanced expression of splicing variants or failure to properly express the correct isoform is a hallmark of cancer [7,8]. However, the role of AS in GBM progression remains elusive Here, we collected splicing data of 157 GBM samples from the SpliceSeq database. We conducted a systematic analysis by merging the AS data with clinical information and the expression matrix. In addition, we developed seven prognostic signatures based on the top three survival-related AS events, and identi ed a mesenchymal-speci c and immune-related AS signature with prognostic signi cance.
This study is the rst to examine the relationship between AS and the malignant biological features of GBM in a large cohort. The results revealed the clinical value of different AS events, and may provide a new prognostic indicator and potential targets for immune therapy in GBM.

AS data curation process
RNA sequence data and clinical information were downloaded from The Cancer Genome Atlas (TCGA) portal (https://portal.gdc.cancer.gov/projects). The genome-wide AS events were obtained using SpliceSeq tool, a java application that was used to analyze the AS pro les and mRNA splicing patterns of TCGA samples [9]. The percent spliced in (PSI) value was used to quantify AS events in GBM and ranged from 0 to 1. Seven types of AS events were analyzed: alternate acceptor site (AA), alternate donor site (AD), alternate promoter (AP), alternate terminator (AT), exon skip (ES), mutually exclusive exons (ME), and retained intron (RI).

Survival analysis
Univariate Cox regression analysis was performed to identify the prognostic AS events and splicing factors. Multivariate Cox regression analysis was used to identify independent prognostic factors. The Kaplan-Meier survival curve was used to analyze the prognostic differences between groups. Receiver operating characteristic (ROC) curves were generated to assess the survival prediction accuracy.
Patients with a survival time of < 30 days were excluded.

Functional enrichment analysis
The Limma R package was used to screen out differentially expressed genes based on an absolute fold change value >1.5 and adjusted p value < 0.05. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses were performed using the ClueGo plugin of Cytoscape and the Database for Annotation, Visualization, and Integrated Discovery (DAVID) website [10][11][12]. Gene set enrichment analyses (GSEA) were performed to estimate the enrichment of speci c gene sets [13]. Principal components analysis (PCA) was used to identify the expression patterns of different groups under a speci c gene set background.

Tumor microenvironment analysis
Tumor purity was estimated as described previously and calculated according to the formula reported by Yoshihara [14]. The estimation of eight types of immune cells and two prominent stromal cells was performed using Microenvironment Cell Populations-counter (MCP-counter) [15].
2.5 AS-related network, cluster, and predictive signature construction The interaction network of survival-associated AS genes in GBM was generated by Reactome FI plugin of Cytoscape to explore the potential interactions among survival-related AS events and identify the hub genes. Splicing factor information was downloaded from the SpliceAid2 database (http://www.introni.it/splicing.html). The correlation network between the survival-related splicing factors and the corresponding AS events was visualized by Cytoscape. The cluster analyses based on the survival-related AS events were performed by consensus unsupervised analysis according to the ClusterPro ler R package [16]. Seven predictive signatures were established with different types of AS events according to the results of univariate Cox regression analysis. For each AS event, the top three survival-related events with a PSI value > 0.5 were identi ed to construct the predictive model.

Statistical analysis
UpSet plot was used to visualize the intersections between the seven types of prognostic AS events in GBM using UpSet R package [17]. The correlation between two parameters was determined by Spearman correlation analysis. All statistical analyses were performed using GraphPad Prism 7 and R software. A two sided P value <0.05 was considered statistically signi cant.

Overview of AS events pro ling in GBM
To pro le the AS events in GBM, seven types of AS events present in GBM patients were analyzed ( Figure. 1A). A total of 45610 AS events from 21134 genes were detected, which indicated that one gene was associated with approximately two AS events. We detected 3827 AA events in 2684 genes, 3269 AD events in 2269 genes, 8686 AP events in 3476 genes, 8456 AT events in 3696 genes, 18360 ES events in 6933 genes, 184 ME events in 180 genes, and 2828 RI events in 1896 genes. ES events were the most frequently detected, accounting for nearly half of the total AS events. The frequency of ME events was lower than that of other types of AS events (Figure. 1B).

Identi cation of prognostic AS events in GBM
The relationship between AS events and the prognosis of GBM patients was examined by univariate Cox regression analysis. Among the 45610 AS events, 1643 were identi ed as survival-associated AS events (P < 0.05). Because one gene might have more than one event, we visualized the intersecting survivalassociated AS events using UpSet PlotR. The results showed that one gene could have up to three types of AS events signi cantly associated with GBM survival ( Figure. 1C). Construction of a gene network by Reactome of Cytoscape including the 300 most signi cant survival-associated genes identi ed several genes, such as UBC, STAT3, and FYN, that could serve as the hub genes in the network. The pathways mediated by these genes may play critical roles in regulating the AS network ( Figure. 1D).

Cluster analysis based on prognostic AS events in GBM
To better predict the prognosis of GBM patients based on the scale of AS events, consensus unsupervised clustering was performed based on the 1643 prognostic AS events. We found that the Page 5/20 3.4 Regulatory network between prognostic splicing factors and prognostic AS events AS pro les are regulated by several splicing factors that affect exon selection and splicing site choice by binding to pre-mRNAs [18]. To explore the correlation between the prognostic AS events and known splicing factors, data on splicing factors was obtained from SpliceAid2 and TCGA databases. Univariate Cox regression analysis showed that only ve splicing factors (KHDRBS2, TIA1, TIAL1, ZRANB2, and PTBP2) were signi cantly associated with overall survival (Supplementary Table S1). The relationship between the expression of the ve splicing factors and the PSI value of prognostic AS events was determined to build a splicing regulatory network (Figure.  Table 1. The AS signatures were calculated as follows: risk score = (lnHR of the top AS event × PSI value of the top AS event + lnHR of the top two AS events × PSI value of the top two AS events + lnHR of the top three AS events × PSI value of the top three AS events). We further combined the seven signatures into one model and calculated the risk score as follows = (lnHR of AA × AA score + lnHR of AD × AD score + lnHR of AP × AP score + lnHR of AT × AT score + lnHR of ES × ES score + lnHR of ME × ME score + lnHR of RI × RI score). The median risk score was used as the cutoff to divide the patients into high or low risk groups. The eight predictive models indicated that the survival status of the high-risk groups was worse than that of the corresponding low risk groups ( Figure. 3A-G). optimal number of clusters was ve ( Figure. 2A). GBM patients were assigned into ve clusters as follows: cluster 1 (n = 100, 62.5%), cluster 2 (n = 24, 15%), cluster 3 (n = 26, 16.25%), cluster 4 (n = 7, 4.375%), and cluster 5 (n = 3, 1.875%). Because of the small number of patients in clusters 4 and 5, we performed survival analysis including clusters 1-3. Patients in cluster 2 showed signi cantly better survival than those in the other two clusters (Figure. 2B). The results of proportional analysis to characterize the molecular subtype of AS clusters suggested that cluster 2 was robustly correlated with the proneural subtype ( Figure. 2C, D), which may be related to the better prognosis of cluster 2. However, assessing the prognosis of the other four clusters was challenging.   Table S2). The results of multivariate Cox analyses indicated that the eight signatures were independently correlated with GBM survival after adjusting for age, IDH1 mutation status, MGMT methylation status, chemotherapy, and radiotherapy ( Table 2). These ndings suggest that AS-associated signatures can predict the survival of GBM patients in a stable and independent manner. The AP signature was identi ed as an immune relevant signature To assess the association between the seven predictive signatures and GBM molecular subtypes, the risk scores for four molecular subtypes were analyzed. The AP risk score was signi cantly increased in the mesenchymal subtype ( Figure. 5A), which is an immune relevant subtype [19]. PCA based on immune related genes showed that the distribution pattern of the mesenchymal samples was different from that of non-mesenchymal samples, suggesting that the immune status of the mesenchymal subtype differed from that the other three subtypes ( Figure. 5B). GO and KEGG analyses of signi cantly upregulated genes in the mesenchymal subtype were performed in the DAVID website, and the enrichment results showed that the most relevant biological processes were immune-related terms such as immune response and leukocyte chemotaxis (Figure. 5C). The close association between the AP risk score and the mesenchymal subtype indicated that the AP signature could be considered as an immune-related AS signature. To validate this hypothesis, differentially expressed genes between high AP risk and low AP risk patients were identi ed, and the ClueGo plugin of Cytoscape was used to analyze the functional enrichment. The results showed that the most relevant biological processes were immune response and immune cell chemotaxis (Figure. 6A). Moreover, we depicted the most common somatic mutation events in GBM strati ed by AP score (Figure.  6B). The results showed that the low AP score group occupied more IDH1 mutation, which was another indication of that AP score involved with immune response and microenvironment (Fig. 6C). These ndings were con rmed by GSEA ( Figure. 6D-F). Consistent with our previous study [4], tumor purity was a key factor affecting the prognosis of glioma patients. To explore the effect of the AP risk on the composition of the immune microenvironment, tumor purity was assessed for each patient, as well as the enrichment score of several hub non-tumor cells. Correlation analysis showed a signi cant negative correlation between the AP risk score and GBM purity (Figure. 6G). Analysis of the correlations between AP risk and the scores of eight hub non-tumor cells suggested that neutrophils were the most relevant components of the microenvironment involved in AP risk (Figure. 6H). These ndings indicated that the AP signature may predict the prognosis of GBM through its effect on the immune response and the composition of the immune microenvironment.

Discussion
Increasing evidence supports the contribution of aberrant AS events or splicing factors to the malignant progression of glioma. AS can modulate transcriptional activity in glioma [20], and AS of telomerase reverse transcriptase is modulated by CX-5461 in glioma [21]. Splicing factors are also involved in glioma progression; SRSF1 is a splicing factor that promotes gliomagenesis via oncogenic splice-switching of MYO1B [22]. Studies suggest that apoptotic cell-derived extracellular vesicles promote GBM malignancy by mediating the intracellular transfer of splicing factors [23]. AS contributes to the variability of isoforms, which may play different roles in the progression of GBM [24,25]. Therefore, it is of great importance to elucidate the ambiguous functions of AS events and to determine the levels of splicing factors in GBM. However, comprehensive genome-wide pro ling or analysis of the clinical signi cance of AS events in a large glioma cohort is limited, especially in GBM patients. In this study, pro ling of AS events was performed in a large cohort of GBM patients, and AS related signatures were established to predict survival outcomes.
The present AS related genome-wide analysis identi ed seven AS modes in GBM patients, and showed that one gene was generally associated with two AS events, which is consistent with data in other tumors [26,27]. ES events accounted for approximately 50% of all AS events, indicating that they may be the most important AS events contributing to the malignancy of GBM; however, further functional experiments are necessary to con rm these ndings. To identify AS events with prognostic value that may serve as promising targets, gene network analysis by Cytoscape was performed including the top 300 survival related events, and several AS-related genes such as UBC, STAT3, and FYN were identi ed as hub genes. Then, we attempted to perform unsupervised analysis, which is considered as a better method to predict survival based on AS events [28]. However, the results showed that only one cluster was associated with survival status, which accounted for a small proportion of GBM patients. The predictive ability of the cluster was limited, and the majority of GBM patients could not be distinguished. Because splicing factors play critical roles in malignant progression [29,30], we constructed a splicing factor related regulatory network. However, the predictive ability was low even for survival-associated spicing factors.
Considering stable traits and the applicability of risk signatures [26][27][28]31], we developed seven prognostic signatures based on the top three survival-related AS events. The results showed that all the AS-related signatures could signi cantly distinguish the survival status and serve as independent indicators of unfavorable prognosis. Studies indicate that the central nervous system is not immuneprivileged [32]. Moreover, previous studies from our group demonstrated that the overloaded immune response as well as the disorganized immune microenvironment contribute to the malignancy of GBM [2][3][4]. There is limited data on the in uence of AS events on immune regulation. In the present study, we showed that STAT3 and FYN, which are involved in immune regulation, occupy hub positions among the network of survival related AS events [33,34]. These results supported the existence of a robust correlation between AS and immune responses. Because the molecular subtype of GBM is associated with the immune status, we compared the risk scores for four molecular subtypes. The results showed that only the AP risk score was considerably increased in the mesenchymal subtype, which is the subtype with the most involved immune status and microenvironment [19]. Further biological enrichment analyses con rmed that the AP risk score was associated with an enhanced immune response. Analysis of the immune microenvironment showed that the AP score was negatively correlated with tumor purity, whereas it was positively correlated with several immune components, such as NK cells and neutrophils. In previous work, we identi ed neutrophils as components of the microenvironment associated with AP risk in glioma. These ndings indicate that interfering with AP related events may increase the e cacy of immunotherapy in GBM patients.

Conclusion
In this study, we performed genome-wide pro ling of AS events and identi ed AS events with signi cant prognostic value in GBM patients. We also established AS-related predictive signatures based on different AS modes, and found an association between the AP signature and immune responses. AS-related signatures may have predictive ability and provide useful treatment targets and guidance for GBM patients. One limitation of the study was the lack of validation cohorts. Future studies should focus on functional experiments and on exploring the molecular mechanisms underlying the effect of AP signature-related events on the immune response and microenvironment.    The AP signature was associated with the mesenchymal subtype of GBM. (A) Seven types of AS risk scores were analyzed among four GBM molecular subtypes, and only the AP risk score was signi cantly elevated in the mesenchymal subtype. (B) PCA analysis showed that mesenchymal samples were distributed with a different pattern than non-mesenchymal samples based on immune-related genes. (C) GO and KEGG analyses were performed based on signi cantly upregulated genes in the mesenchymal subtype, and the enrichment results showed that the most relevant biological processes were immunerelated terms. (*P < 0.05, ** P < 0.01, *** P < 0.001, **** P < 0.0001, NS, not signi cant)