Identification of Prognostic Alternative Splicing Signature in Triple-negative breast cancer

Background: Alternative splicing (AS) is a pervasive and vital mechanism involved in the progression of various cancer. Studies confirm the importance of prognostic value of AS events in tumor patients, but systematic analysis of AS in triple-negative breast cancer (TNBC) is still lacking. Methods: Information from 115 TNBC patients from the Cancer Genome Atlas (TCGA) database were extracted. And we performed a comprehensive analysis of whole-genome AS events with corresponding clinical information to evaluate the roles of seven AS patterns. Prognostic analyses were performed with predictive models and splicing network built for TNBC patients. Results: Among 28,744 mRNA AS events in 20,353 genes, we detected 1,428 AS in 138 important survival genes related to the overall survival of TNBC patients event. Through functional and pathway enrichment analysis, we found that these genes are involved in ubiquitin-mediated proteolytic pathways. At 1800 days overall survival, the area under the ROC curve for prognostic signatures was 0.8. It shows that this model is very effective in differentiating patient prognosis. The use of Spearman's test to establish a potential regulatory network between survival-related AS events and abnormal SF indicates a clear trend in the role of SF in TNBC. Conclusions: In summary, we established a reliable and powerful TNBC prognostic signature. A splicing network that could be its underlying mechanism was discovered.


Background
Breast cancer (BC) is the most commonly diagnosed cancer in women. By 2020, an estimated 276,480 women will develop breast cancer in the United States, accounting for 30 percent of the 912,930 new cancer cases diagnosed in women. BC is also the second leading reason of cancer deaths in women(15%) (Siegel, Miller, & Jemal, 2020). Clearly, breast cancer has been one of the greatest threats to women's health. Triple-negative breast cancer (TNBC) is a high-grade tumor with poor prognosis, accounting for 15-20% of BC, which characterized by lacking expression of the estrogen receptor (ER), progesterone receptor (PgR) and human epidermal growth factor receptor 2 (HER2) (Criscitiello et al., 2018). However, compared to other molecular subtypes of BC, TNBC lacks a clear molecular target. At present, chemotherapy is still one of the more commonly used treatments (de Nonneville et al., 2019). Accordingly, further screening of potential diagnostic biomarkers to treat TNBC is imminent.
Alternative splicing (AS) is a pervasive and vital mechanism involved in the progression of various cancer by expanding genomic encoding capacity and increasing protein complexity (Zhang, Duan, Wang, & Lin, 2019). AS was involved in a series of oncogenic processes, such as proliferation, metastasis, apoptosis and immune escape (David & Manley, 2010). There have been more evidences prove that AS is exploited by tumor cells, cancer cells depend on specific isoforms at the stage of carcinogenesis and disease development (Genov,  In conclusion, our study made the first attempt to integrate splicing profiles and TCGA clinical factors of TNBC patients to comprehensively research the prognostic value of AS and constructed prognostic signature for TNBC patients. Furthermore, in addition to valuable prognostic factors for patients, the present study uncovered interesting splicing networks in TNBC further revealing the potential mechanism of AS in TNBC tumorigenesis. Our results unravel the pattern of global aberrant AS and its clinical implications in TNBC and provide the evidence for further screening and identifying the potential biomarkers for the early diagnosis of TNBC.

Data collection and processing
Download the RNA sequencing data from the BRCA queue and the corresponding clinical information from the TCGA data portal. After removing patients with incomplete information, 122 TNBC samples were obtained.
Meanwhile, we collected data of AS events using a Percent Spliced In (PSI) value of exceeding 75% in TNBC samples from the TCGA SpliceSep database to develop the AS profiling for every TNBC patient.
The PSI value was an intuitive quantification used to quantify splicing event. The PSI values range from 0 to 100 (%), suggesting a shift in splicing events (Ryan, Cleland, Kim, Wong, & Weinstein, 2012). The AS events include seven types: Exon Skip (ES), Mutually Exclusive Exons (ME), Retained Intron (RI), Alternate Promoter (AP), Alternate Terminator (AT), Alternate Donor site (AD), and Alternate Acceptor site (AA). These seven types of schematic diagrams were shown in Figure.1. In the end, 115 patients were enrolled in our cohort.

Survival analysis
We performed Univariate Cox regression analysis to assess the relationship between AS events and overall survival (OS), then applied Multivariate COX regression to analyze independent prognostic predictors. In order to compare the efficiencies of prognostic factors, the survival Receiver Operating Characteristic Curve (ROC) package in R was used to estimate the time-dependent ROC curve with review data, and the Area Under Curve (AUC) of the ROC curves were generated for these models (Li et al., 2017). The model's ability to predict the outcomes at the points in time were compared on account of fewer events occurred after 1800 days in all survival analyses. Finally, we constructed the risk score models according to a combination of gene expression levels weighted by regression coefficient (β) originating from the multivariate Cox regression analysis. The formula was as follows: Risk score = β gene1 × expr gene1 + β gene2 × expr gene2 + ··· + β genen × expr genen .
All analyses were performed using R software (version 3.4.1).

Bioinformatics analyses
Here, the UpSet plot for quantitative analysis of the interaction sets between seven sorts of AS was from Functional pathway enrichment analysis Using the annotations of DAVID (Version 6.8) to visualise and integrate database, gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis were performed for the genes with survival associated AS events (Sherman et al., 2007). GO and KEGG were used to assess the functional categories, terms with a P value and q-value both smaller than 0.05 were considered significant categories. P value < 0.05 was selected as the standard segmentation.

Construction of Potential SFs-ASs Network
A list of 404 SFs was got from a previous study (Supplementary Table 1 Benjamini and Hochberg (BH) correlation, P value < 0.05 was considered significant. Use Cytoscape (3.7.1) to generate a potential SFs-ASs regulatory network among the significantly correlated pairs.

Results
The Overviews of mRNA splicing events in TNBC from TCGA A flowchart in Fig. 1 showed detailed processes of our study design. Integrated mRNA splicing events profiles for 115 TNBC patients were established in depth from RNA sequencing data. In Fig. 2A, seven sorts of AS events were illustrated. As a whole, there are 28,744 mRNA splicing events in 20,353 genes in TNBC cohort, which contains 3215 AA events in 2456 genes, 2624 AD events in 2101 genes, 3480 AP events in 3480 genes, 3715 AT events in 3715 genes, 12988 ES events in 6661 genes, 70 ME events in 70 genes, and 2652 RI events in 1870 genes for evaluating prognostic value (Fig. 2B).

Molecular characteristics of survival-associated AS events
To reveal the molecular characteristics of these genes with survival associated-AS events, several bioinformatics analyses were conducted. According to the functional annotations of DAVID in Supplementary Table 2, "negative regulation of transcription from RNA polymerase II promoter", "aging" and "cellular response to DNA damage stimulus" were the three most obvious biological process terms (Fig. 4A). "cytosol", "nucleoplasm" and "extracellular exosome" were the three most significant cellular component terms (Fig. 4B). For molecular function, "protein binding", "ligase activity" and "chromatin binding" were three most enriched categories (Fig. 4C). More importantly, we found that the "Ubiquitin mediated proteolysis" pathway was most significantly correlated with these genes (Fig. 4D).

Prognostic signatures for TNBC patients
We chose the most important survival-related AS events (excluding ME events, < 20) among the six types of AA, AD, AP, AT, ES, and RI as candidates in seeking for independent prognostic factors for TNBC patients. In order to eliminate the events that may not be independent factors in the prognosis gene models, the six types of candidate splicing events were applied by Multivariate Cox regression models respectively. The six signatures established with different types of AS events revealed a powerful ability to distinguish between improvement and deterioration in TNBC patients. Based on the six different types of AS in the TNBC cohort, the survival times of the two groups in each prognostic model were significantly different as shown in Fig. 5A-G. Furthermore, as presented in Fig. 5H, the AUC were significantly different in different splice type models. ROC curves were used to validate the efficiency of these prognostic models. Compared to other type of AS models, the ROC curve demonstrates that the ultimate use of all types of AS predictors has the highest efficiency in distinguishing between improvement and deterioration in TNBC patients (AUC 0.8). These independent prognostic and specific AS events were further in depth analyzed to establish the final prognostic predictors for TNBC in Table 3. The final prognostic signature was the most ideal predictor (Fig. 6A). This signature was easier and more valuable because it could well distinguish TNBC patients with distinct clinical outcomes in clinical practice (Fig. 6B).

Construction of survival-associated AS events network
Nowadays, the phenomenon that the disordered AS events were mainly caused by many SFs has been widely recognized. Next, it was examined whether important portion of these AS events might be regulated by some key SFs that alter the expression of TNBC. A splicing regulatory network of 208 most significant survival-associated AS events in TNBC was characterized in Table 4. Using the genes expression levels calculated from the RNA sequencing data of level 3 of TNBC in TCGA, 11 SFs with significant genes expression levels in OS in TNBC were identified (P < 0.05) and possess the ability to predict the OS of TNBC patients (Fig. 7A, B). In the TNBC related network, the purple dots represented 11 survival-related SFs, and their expressions were significantly correlated with 12 survival-related AS events. The green dots indicated 7 AS events that were significantly related to the good patient survival (HR < 1), while the red dots indicated 5 AS events that were closely related to poor patient survival (HR > 1). Furthermore, the correlation between the genes expression levels of survivalrelated SFs and the PSI values of most survival-related AS events was calculated by the Spearman test. Related networks were built and only significant correlations as shown in Fig. 7C. We found an interesting phenomenon that most poor survival prognostic AS events were positively correlated (red lines) with the expression of SFs, while most good survival prognostic AS events were negatively correlated (green lines) with the expression of SFs. The dot plots showed the correlation between splicing factor ALYREF and AD of MT1X, and the correlation of splicing factor MFSD11 and ES of ANKDD1A (Fig. 7D, E). The high expression of ALYREF was significantly correlated with low survival rate, while the high expression of MFSD11 was significantly related to the good survival of the patients.

Discussion
AS was a universal regulatory mechanism for gene expression that allowed a single gene to produce . In our study, we not only explored the relationship between AS events and tumorigenesis, but also emphasized the potential role of SF in TNBC. A potential SF-AS related network was constructed between prognostic SF and the most significant survival-related AS events.
AS events with good prognosis were positively correlated with splicing factor expression, while AS events with poor prognosis were negatively correlated with splicing factor expression. For example, ALYREF is an RNA binding protein that ligates to transcription. It has been reported that ALYREF may not only be a molecular marker for early detection of lymph node metastasis, but may also be effective in preventing oral squamous cell carcinoma metastasis (Saito et al., 2013). However, the role of ALYREF in TNBC has not been explored. Therefore, whether the down-regulation of some specific SFs leads to a reduction in favorable prognostic AS events and an increase in poor prognosis AS events requires further verification by functional experiments. But we have raised important questions about the potential key SFs in TNBC. Up-regulation or down-regulation of SFs expression may lead to abnormal splicing and differential expression of splice variants. Upregulation of certain oncogenic SFs could promote TNBC progression. Because of the exploration of prognostic prediction models, more personalized methods for different patients could precisely target and regulate AS in TNBC patients, providing a wealth of biomarker candidates and potential targets for TNBC treatment.

Conclusions
In summary, we revealed the system characteristics of AS events in TNBC, and found that our final model of prognostic indicators related to survivor-related AS events performed well in the risk stratification of TNBC patients, which is promising in clinical applications. In addition, we also found a unique splicing-related network in TNBC patients.

Supplementary Files
This is a list of supplementary files associated with this preprint. Click to download.