Role of survival-associated alternative splicing events in the prognosis of ovarian cancer

Objective Alternative splicing (AS) events play a crucial role in the tumorigenesis and progression of various cancers. In the present study, we aimed to identify specic AS events, which might be prognostic markers and therapeutic targets for ovarian cancer (OV). Transcriptome data, clinical information, and Percent Spliced In (PSI) values were downloaded from TCGA database and TCGA SpliceSeq to explore the role of AS events in the prognosis of OV patients. Univariate and multivariate Cox regression analyses were performed to identify survival-associated AS events and develop multi-AS-based prognostic models. The K-M curves and ROC curves were conducted based on prognostic AS event models. Moreover, a splicing regulatory network was established according to the correlation between AS events and splicing factors (SFs). Finally, we performed functional enrichment analysis by GO terms and KEGG pathways. AS event1 β AS event2 × AS event2 + · · · + eventn eventn test


Introduction
Ovarian cancer (OV) is the leading cause of gynecologic cancer-related death in developed countries, and it ranks the sixth most prevalent form of cancer worldwide. Nearly 70% of OV patients are diagnosed at an advanced stage, and the 5-year survival rate is below 30% [1,2]. It is generally believed that dysregulation of gene expression plays an important role in the occurrence and development of tumors.
An increasing body of evidence suggests that through the analysis of gene expression patterns of various cancer types, diagnosis, prognostic markers, and even new therapeutic targets can be identi ed [3].
Although a large number of studies have focused on the alterations of gene expression to explore OV, the underlying molecular mechanism associated with tumorigenesis and progression remains largely unexplored.
Alternative splicing (AS) events constitute a prevalent mechanism in expanding the genetic diversity of eukaryotic cells. Spatiotemporal expression pro les of AS transcripts substantially contribute to cell differentiation, speci cation, and organogenesis [4]. In 1977, Phillip Sharp and Richard Roberts discovered the concept of "split genes" almost at the same time. Since then, AS events have been found to play important roles in many human diseases, including cancer. Several studies have identi ed that AS events were associated with the occurrence, development, and metastasis of multiple types of cancers [5][6][7][8]. These ndings suggest that AS events are valuable targets for cancer diagnosis, treatment, and prognosis prediction. With the development of next-generation sequencing technologies and the construction of The Cancer Genome Atlas (TCGA) database, the integrative analysis of RNA-sequencing data and prognosis information of patients makes it possible to systematically analyze the survivalrelated AS events in several types of cancers [9]. The prognostic values of AS events have been reported in patients with glioblastoma, breast cancer, lung cancer, and so on [10]. Furthermore, dysregulated splicing factors (SFs) have been reported to cause global alteration of AS events in cancer [3]. In OV, a few studies have reported AS events by comparing cancer tissues with normal tissues [1,11,12]. However, the comprehensive pattern of AS events in OV has not been elucidated.
In the present study, we conducted systematic pro ling of genome-wide AS events in OV patients, which included 544 OV cases that were shared in TCGA SpliceSeq database and RNA-seq expression spectra.
Moreover, we studied the relationship between aberrant AS events and the prognosis of OV patients. Univariate and Multivariate Cox regression analyses were used to identify survival-associated AS events.
A total of 1,472 AS events associated with the survival of OV patients were identi ed, and prognostic models of seven types of AS events and the whole cohort were constructed. The Kaplan-Meier (K-M) curves and receiver operating characteristic (ROC) curves were performed to demonstrate the prognostic value of each type of prognostic model. We found that all these eight prognostic models could discriminate the high-risk group from the low-risk group. Notably, the AUC value of AD, AP, AT, ES, ME, and the whole cohort was more than 0.70, indicating that these six models had valuable prediction strength.
Furthermore, we also used the clinical data from TCGA database to explore the relationship between AS events and clinical features. The risk score of prognostic models was identi ed as an independent prognostic factor. Besides, SFs associated with AS events were also identi ed, and the relationship network between AS events and SFs was established. The AS-SF correlation network revealed several hub SF genes, including DDX39B, PNN, LUC7L3, ZC3H4, and SRSF11. Collectively, our ndings provided valuable insights into the potential functions of AS events and identi ed reliable prognostic biomarkers for OV.

Data collection and processing
The RNA transcriptome pro les and clinical information of the OV cohorts were retrieved from TCGA database [13], while the Percent-spliced-in (PSI) data of AS events were obtained from TCGA SpliceSeq database. PSI values ranging from 0 to 1 were used to quantify the AS events [14]. Subsequently, the AS events were annotated by combining the splicing type, ID number in the SpliceSeq, and the corresponding parent gene symbol [15]. Seven types of AS events were included in the present study, such as exon skipping (ES), mutually exclusive exon (ME), retained intron (RI), alternate promoter (AP), alternate terminator (AT), alternative donor site (AD), and alternative acceptor site (AA).

Survival-associated AS events in OV patients
A total of 544OV patients were included in the present study. Patients with a follow-up of fewer than 90 days were excluded because these patients might die due to other factors, such as surgical complications. Univariate Cox regression was used to screen the survival-associated AS events with a Pvalue < 0.05 [16]. Upset plots were created by UpSet R to analyze the intersections of all seven types of survival-related AS events in OV. Besides, the bubble charts were used to summarize the top 20 AS events of each type, except for the ME events, which only had eight survival-related AS events.
Construction of the prognostic model for OV patients Lasso regression analysis was employed to select survival-associated AS events in each splicing type, which could avoid over-tting. Then the prognostic models based on each splicing type and total events were constructed by multivariate Cox analysis. In downloaded AS events, 412 OV patients met the condition, which is non-null >75%. Then they were combined with 544 OV patients who were followed for more than 90 days. A total of 384 people were nally eligible. The prognostic risk score of each prognostic model was calculated for the prediction of OV, and the formula used for calculating the risk score for each patient was as follows: Riskscore = β AS event1 × PSI AS event1 + β AS event2 × PSI AS event2 + · · · + β AS eventn × PSI AS eventn . The patients were divided into two subgroups (high-risk and low-risk) according to the median risk score. There were 192 cases in each subgroup. K-M test was performed to estimate the predictive accuracy of each prognostic model. ROC curve and AUC were calculated by the "survival ROC" R package to assess the predictive power of each prognostic model. The detailed information of AS events, including the distribution of risk score, the distribution of survival time, and the expression heatmap, was also visualized.

Estimation of independent prognostic value
The risk scores of each prognostic model and two important clinical features, including grade and age, were integrated into the univariate and multivariate Cox regression analyses to evaluate whether these features could be used as independent risk factors.
Potential correlation network of survival-associated AS and SFs SFs can regulate AS events by binding to pre-mRNAs, affecting exon selection and choice of splicing site [17]. To analyze the correlation between survival-associated AS events and SFs, a regulatory network was constructed between SF genes and AS events. The expression data of SFs were extracted from TCGA database. The correlation between the SFs and these survival-associated AS events was analyzed using Pearson's correlation test. The AS-SF correlation network was plotted and visualized using the Cytoscape (3.7.1) software.

Functional enrichment analysis
To further explore the underlying mechanisms of AS in OV, we identi ed corresponding SF genes of AS events. Functional enrichment analysis was carried out using the Database for Annotation, Visualization, and Integrated Discovery (DAVID) online functional annotation tool. GO terms and KEGG pathways with P< 0.05 were considered a statistically signi cant difference. The SFs related to survival-associated AS events were selected as candidates for GO and KEGG pathway enrichment analysis. Both GO analysis and KEGG analysis were conducted using R x64 3.6.1 software.

Details of AS events
A total of 544 OV patients were included in the present study, and 47,922 AS events in 21,794 gene symbols were identi ed (Supplementary Table 1 Figure 1A). The results showed that ES was the main splicing pattern, while ME was the least frequent event among the seven types of AS events in OV patients. It was important to note that the number of AS events far exceeded the number of genes. Figure  1B shows that one gene could undergo up to ve types of AS events ( Figure 1B).

Survival-associated AS events
We identi ed 1,472 AS events using the AS event pro les in the OV cohort, which were signi cantly associated with overall survival (OS) of OV patients by univariate Cox regression analysis (P< 0.05). Figure 2A lists the number of each type of AS event. To better visualize the intersection, an UpSet plot was created as shown in Figure 2B, and We found that up to three survival-associated AS events could occur in the same gene ( Figure 2B). Speci cally, ES, AP, AT, AA, AD, and RI were all signi cantly linked to the OS of patients. Figure 2C indicates the AS events that were associated with survival of patients (red dots) and not associated with survival of patients (blue dots), showing that most AS events were signi cantly associated with patients' survival. Figure 2D-J show the top 20 most signi cant survivalassociated AS events of each type. For ME events, only eight AS events were related to survival ( Figure  2I). The results were shown in Supplementary Table 1.

Prognostic model selection and survival analysis
Lasso regression analysis was performed to avoid over-tting and exclude the co-expressed AS events, which were selected by Univariate Cox analysis. Figure 3 presents the results of Lasso regression analysis including seven types of AS events and all AS events, and we selected the most highly correlated AS events. Multivariate Cox analysis was then used to construct predictive models based on particular AS events and calculate the risk scores. Table 1 lists the prognostic models of seven types of AS events and total AS events. Patients were then divided into high-risk and low-risk subgroups according to the median risk score of each model. There were 192 cases in each subgroup. The median risk scores of AA, AD, AP, AT, ES, ME, RI, and the whole cohort were 0.9788, 0.9437, 0.9235, 0.9892, 0.9083, 1.006, 0.9735, and 0.9137, respectively. According to K-M survival analysis, we found that eight types of prognostic models played signi cant roles in distinguishing good or poor outcomes of patients ( Figure 4A-H). We plotted the ROC curves and calculated the AUC to compare the e ciency of these predictors. The results revealed that the risk score of AD re ected the greatest prognostic power with an AUC of 0.768, followed by AP with an AUC of 0.758, ME with an AUC of 0.756, and AT with an AUC of 0.755 ( Figure 5A-H). Figure 6 illustrates the distribution of patients' survival status, risk score, and the expression heatmap of prognostic models. The risk curve showed the result of patient ranking based on the risk scores. There was a difference between the high-risk group and the low-risk group in risk score. The survival status of patients disclosed that there were higher mortality rates in the high-risk group (green dots represent survival, and red dots represent death). The color transition from green to red in the heat map indicated that the PSI score of the AS events was increased from low to high.

Estimation of independent prognostic value
We used Univariate and Multivariate Cox regression analyses to estimate the independent prognostic value of age, grade, and risk score of each prognostic model. Univariate Cox regression analysis indicated that both age and risk score could predict survival of OV patients (Table 2). However, multivariate Cox regression showed that the risk score of each prognostic model was the independent prognostic indicator of OV survival. The age of predictive models, such as AA, AD, AP, ES, ME, RI, and the whole cohort, was the independent prognostic indicator of OV patients, except for AT events (Figure 7).

Correlation network of SFs
To analyze the correlation between survival-associated AS events and SFs, an AS-SF network was constructed based on the result of Pearson's correlation test. Figure 8A showed that the network contained 56 SFs (blue triangles) and 104 survival-associated AS events, including 45 down-regulated AS events and 59 up-regulated AS events (red and green dots). The green lines represented AS events, which were positively correlated with the expressions of SFs, while red lines indicated negative correlations.
GO functional and KEGG pathway enrichment analyses of OV patients GO analysis demonstrated that "mRNA splicing via spliceosome", "regulation of RNA splicing", "mRNA processing", "RNA processing", and "regulation of alternative mRNA splicing via splicesome" were the most signi cant biological process terms. Moreover, "nucleoplasm", "membrane", "catalytic step 2 splicesome", and "nuclear speck" were the most three signi cant cellular component terms. Besides, "poly (A) RNA binding", "nucleotide binding", and "ATP binding" were the most three signi cant molecular function terms ( Figure 9A). KEGG analysis revealed four remarkably enriched pathways, including "spliceosome", "RNA transport", "mRNA surveillance pathway", and "RNA degradation". It also revealed that these genes were mainly involved in the "spliceosome" pathways ( Figure 9B).

Discussion
AS event is one of the main engines driving proteome diversity. It is estimated that up to 94% of genes are alternatively spliced in humans. Just as many other cellular processes are modi ed during cellular growth, differentiation, and tissue development, AS events are also affected [18]. AS allows cells to generate diverse mRNA by modifying mRNA isoforms. The plasticity of AS is often exploited by cancer cells to produce isoform switches that promote cancer cell survival, proliferation, metastasis, and drug resistance. In recent years, it has been proved that AS events play an important role in the occurrence and development of several types of tumors. For example, AS event of TCF-4 is found to inhibit the proliferation and metastasis of lung cancer cells [19].TP53, FAS, CASP9, and BCL2L1 are also associated with the apoptosis and survival of cancer cells [20]. Calabretta's study has revealed that modulation of the pyruvate kinase gene (PKM) splicing can promote gemcitabine resistance in pancreatic cancer cells [21].
Recently, with the development of bioinformatics technology, TCGA project contains a large amount of RNA-seq data, PSI value of AS events, and clinical information of patients, which provides a rich source for the exploration of the relationship between AS events and the prognosis of cancer patients [19][20][21]. Associations between AS events and prognosis of patients have been demonstrated in non-small cell lung cancer (NSCLC), adrenocortical carcinoma (AC), head and neck squamous cell carcinoma (HNSCC), and so on. Previous studies have demonstrated the role of several AS patterns in OV. Dutta et al. have reported that EVI1 is frequently aberrantly spliced in OV, and the dominant form of EVI1 (EVI1 Del190-515 ) plays oncogenic roles in the tumorigenesis of OV [22]. Sosulski et al. have reported that CD44 variants containing exons v8-10 (CD44 v8-10) are associated with metastasis and worse prognosis in OV [23]. Although these reports provide evidence for the involvement of AS events in OV, it is still urgently necessary to systematically analyze the characteristics of AS events, which may provide potential prognostic biomarkers and therapeutic targets.
To explore the prognostic signi cance of AS events, we identi ed survival-related AS events and constructed predictive models based on each type of AS event for OV. In the present study, we systematically described the AS pro les and explored the interaction network between AS events and SFs in OV. A total of 47,922 AS events of 21,794 genes were detected, indicating that AS was a common process in OV. ES events were the main type, accounting for 1/3 of the total AS events. AP events were the second most frequently occurring type, followed by AT events. Next, Lasso regression and Multivariate Cox regression analysis were performed to construct eight predictive models, including seven types of AS events and total AS events. These OV patients from TCGA database were then divided into low-risk and high-risk groups according to the risk score. K-M analysis demonstrated that the difference in OS between the low-risk patients and high-risk patients was signi cant. The results revealed that the risk score of the AD model consisting of 15 survival-related AS events had the greatest prognostic power with an AUC of 0.768, followed by AP, ME, AT, and the whole cohort. Previous studies have also identi ed survival-related mRNAs and evaluated the prognostic power of the constructed model. Pan et al. have established a six-mRNA signature relevant to the prognosis of OV, with an AUC of 0.711 [24]. Zhao et al.
have established a nine-gene-based signature for OS using LASSO Cox regression analysis of TCGA dataset [25]. Time-dependent ROC analysis shows that the AUC values for OS are 0.640, 0.663, 0.758, and 0.891 at 1, 3, 5, and 10 years, respectively. A single mRNA gene might have one to seven types of AS events. Therefore, it has an obvious advantage to serve as a prognosis marker for AS events.
An interaction network between AS events and SFs was also established. The network contained 56 SFs and 104 survival-associated AS events (including 59 up-regulated AS events and 45 down-regulated AS events). Previous studies have shown that a single gene could regulate multiple AS events of the same parental gene, even in opposite way. Interestingly, our results indicated that MSI1 could positively regulate TACC2-13336-AP and negatively regulate TACC2-13333-AP which was consistent with previous reports.
Besides, GO functional enrichment and KEGG pathway analysis for the SFs signi cantly related to AS events provided helpful clues to elucidate the underlying mechanism of AS events in OV patients.
According to the results, "mRNA splicing via spliceosome", "regulation of RNA splicing", "mRNA processing", "RNA processing", and "regulation of alternative mRNA splicing via splicesome" were the most signi cant biological process terms. Moreover, "nucleoplasm", "membrane", and "catalytic step 2 splicesome" were the most three signi cant cellular component terms. Besides, "poly(A) RNA binding", "nucleotide binding", and "ATP binding" were the most three signi cant molecular function terms. KEGG pathway analysis revealed that these genes were mainly involved in the "spliceosome" pathways.
Although our predictor performed well in prognosis prediction of OV, there were inevitably several limitations in the current study. First of all, the data we collected from public databases were limited. Therefore, the clinical information was not comprehensive and might cause potential bias and errors. Second, our exploration of the mechanisms was not deep enough, and further studies, such as molecular and clinical trials, are necessary to con rm these ndings. Our data revealed the prognostic value of survival-associated AS events and related SFs, which might play essential roles in tumor initiation and progression by regulating the corresponding AS events. Collectively, our ndings might provide valuable insights into effective therapies using AS events for OV.     ROC curves to evaluate the predictive power of each prognostic signature and the risk score of AD re ected the greatest prognostic power with an AUC of 0.768. AA cohort (A), AD cohort (B), AP cohort (C), AT cohort (D), ES cohort (E), ME cohort (F), RI cohort (G), and the whole cohort (H).

Figure 6
Patients were then divided into high-risk and low-risk subgroups according to the median risk score of each model. The upper part is the risk score of each individual. The middle part is the survival status of each individual (green dots represent survival and red dots represent death). The bottom part is the heatmap of AS events. The color transition from green to red indicates the increasing PSI score of corresponding AS events from low to high. AA cohort (A-C), AD cohort (D-F), AP cohort (G-I), AT cohort