Analysis of genome-wide mutation pro�le and establishment of risk signature for prognosis of bladder cancer

Background: Emerging studies have shown that a variety of gene mutations occur in development and progression of cancer and highly mutation genes could play oncogenic or tumor suppressive roles in cancer. Therefore, our aim is to explore mutation genes which affect the prognosis of bladder. Methods: Mutation pro�le was obtained and analyzed from TCGA data set. A mutation-based signature was established by multivariable Cox regression analysis. Kaplan-Meier was performed to assess the prognostic power of signature. Time-dependent ROC was conducted to evaluate predictive accuracy of signature for bladder cancer patients. Results: There are 20177 genes have alteration in 403 bladder patients and 662 of them were frequently variation (mutation frequency > 5%). In this study, we assessed the prognostic predictive ability of 662 highly mutated genes and identi�ed a mutation signature as an independent indicator for predicting the prognosis of bladder. The time-dependent ROC showed that AUC were 0.893, 0.896, 0.916 and 0.965 at 1, 3, 5 and 10 year, respectively. Strati�ed analysis and Multivariate Cox analysis showed that this mutation signature was reliable and independent biomarker. Furthermore, the nomogram predictive model can be used to effectively predict clinical prognosis of bladder patients. The decision analysis curve showed patients with risk threshold of 0.03-0.92 potentially yielded clinical net bene�t. Finally, we identi�ed several signaling pathways that associated with risk score by GSEA and KEGG analysis including PI3K-Akt signaling pathway and so on. Conclusions: In general,


Background
Bladder cancer (BC), the most common malignancy of the urinary tract, was the 9th most common cause of tumor-related death worldwide and was ranked fourth of the most common cancers in male, with an estimated 17,670 deaths (12,870 cases for male and 4800 cases for female) and 80,470 new cases (61,700 cases for male and 18,770 cases for female) in 2019 cancer statistics [1]. Indicated by the degree of detrusor involvement, bladder cancer can be clinically classi ed into the following categories: nonmuscle invasive (NMIBC) or muscle invasive bladder cancer (MIBC) through the stage [2]. Despite early prevention and intervention, many patients with MIBC are still at high risk of recurrence and death [2].
Although the prognosis of NMIBC is better than MIBC, we should not ignore the high recurrence risk of NMIBC patients [3]. Therefore, it is imperative to screen a novel potentially reliable prognostic indicators to forecast the survival of bladder patients.
So far, gene mutation theory, widely accepted by the scienti c community, believes that mutation in gene level are main cause of cancer. When it comes to the development of cancers, we have to talk about somatic mutations. It is reported that partial and complete tumor genome sequences exist hundreds to thousands of variants in most cancers including bladder cancer [4]. Consistently, in recent years, with the deepening research for cancer, increasing reports demonstrated that gene mutation exerted a positive or negative role in cancer [5][6][7][8]. Previous study have identi ed multiple frequently mutation genes in bladder cancer, including TP53, FGFR3, RB1, PIK3CA, KDM6A, CREBBP, EP300 and so on [9][10][11][12]. Several gene variants have been demonstrated to confer susceptibility to bladder cancer [13][14][15] and some mutation genes were identi ed novel biomarker for bladder cancer such as TERT and CCNE1 [16][17][18]. However, due to the high heterogeneity of cancer, including massive random genome alteration, it is therefore necessary to continuously research and evaluate the utility of gene mutation.
In this study, we investigated the whole mutation pro le in bladder patients from TCGA database. Then, we identi ed a mutation signature consisting of 23 mutated genes. This signature is an independent and high accuracy indicator to appraise the survival of bladder patients. Our results con rmed a novel panel of combined mutation genes that connection with the survival of bladder patients and combined risk score with conventional clinical characteristics have high clinical application value.

Patient Samples
Mutation data and expression pro les of genes and corresponding clinical information for bladder patients were available at The Cancer Genome Atlas of the National Cancer Institute (TCGA, http://cancergenome.nih.gov) and UCSC Xena (https://tcga.xenahubs.net). In this study, patients were removed if follow-up information was lacking (1 cases). The remaining 403 samples were further analyzed and clinical characteristics include age, gender, TNM stage and so on.

Mutation gene collection and signature construction
Using univariable and multivariable Cox regression analysis, a prognostic mutation signature and the risk score model formula were constructed as follows: risk score = β 1 gene 1 * mutation status of gene 1 + β 2 gene 2 * mutation status of gene 2 +…β n gene n * mutation status of gene n . In this formula, β indicates the coe cients of every mutated gene in multivariable Cox regression analysis. "0" means that the gene is not mutated, and "1" means that the gene is mutated including Missense Mutation, In Frame Del, Silent, RNA, Splice Site, Frame Shift Del, Frame Shift Ins and Transition Start Site.

Predictive nomogram of the mutation signature
The development and progression can be diagnosed and predicted using nomogram with a variety of indicators [19]. To elucidate the clinical application of our mutation signature, we assembled a predictive nomogram by utilizing the "rms", "nomogramEx" and "regplot" R package to estimate the 1-, 3-and 5-year OS of bladder patients. The factors included risk score, TNM stage, T stage, N stage, tumor status and lymphovascular invasion. To evaluate the clinical application of our nomogram, we performed decision curve analysis (DCA) and clinical impact curve. In DCA, the x-axis represented the percentage of threshold probability and y-axis represented net bene t. The clinical impact curve can appraise the number of highrisk individuals with different possibility and true positive number among 1000 participants [20].

Functional analysis
The GSEA (http://www.broadinstitue.org/gsea/index.jsp) was conducted by the JAVA.8 program using Molecular Signatures Database (MSigDB) [21]. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway was implemented to illustrate related signaling pathways and potential biological functions related to high risk or low risk groups using the Database for Annotation, Visualization, and Integrated Discovery website (DAVID, https://david.ncifcrf.gov/summary.jsp).

Statistical Analysis
The Kaplan-Meier analysis as well as univariate and multivariate Cox regression analysis were performed to identify and validate mutation genes signi cantly related to the survival of bladder patients. Univariate and multivariate Cox proportional hazard regression analysis of mutated genes in bladder were performed with R package 3.5.1 (https://www.r-project.org/, v3.5.1). The cBioPortal database was used to identify the co-occurrence and mutual exclusivity of mutated genes within the TCGA Pancancer Atlas

Identi cation of mutation genes related to the survival of bladder patients
To evaluate the in uence of somatic mutation genes on overall survival, we assembled data encompassing 403 bladder patients form TCGA database, listed in Table 1. By subjecting 662 highly mutated genes to univariate Cox analysis, we identi ed 57 mutated genes related to the overall survival of bladder patients (P < 0.05, Table 2). To construct a prognostic classi er, these 57 candidate mutated genes were considered candidates for further multivariate Cox analysis. The optimal tuning parameter identi ed twenty-three mutated genes: AHNAK2, RYR2, USH2A, DCHS2, BRCA2, RP1L1, FBN2, ZFYVE26, ANK1, NCOR1, F8, WNK1, IGSF10, COL11A1, RANBP2, MED1, FANCM, MYO18B, ZNFX1, MROH2B, AUTS2, CNOT1, COL5A1 ( Table 3). The mutation pro le of these mutated genes generated from cBioPortal for Cancer Genomics (Additional le 1a). The interactions between these mutated genes were also generated from cBioPortal. As shown in Additional le 1b, most of these mutated genes are co-occurrence, suggesting that these genes might play an essential role together. Furthermore, we built a risk score formula so as to calculate the overall survival of bladder patients based on the mutation status of individual mutated genes and their respective coe cients as follows: 23 mutated genes risk score = (-0.6344 * mutation status of AHNAK2) + (-0.5828 * mutation status of RYR2) + (-1.0395 * mutation status of USH2A) + (-0.9495 * mutation status of DCHS2) + (1.0108 * mutation status of BRCA2) + (-1.2290 * mutation status of RP1L1) + (-1.2238 * mutation status of FBN2) + (1.7481 * mutation status of ZFYVE26) + (-1.5427 * mutation status of ANK1) + (-1.4791 * mutation status of NCOR1) + (-1.1631 * mutation status of F8) + (-1.1986 * mutation status of WNK1) + (-0.8949 * mutation status of IGSF10) + (-0.7952 * mutation status of COL11A1) + (-1.3281 * mutation status of RANBP2) + (-1.8627 * mutation status of MED1) + (-2.0523 * mutation status of FANCM) + (-1.6193 * mutation status of MYO18B) + (-1.3542 * mutation status of ZNFX1) + (-1.0805 * mutation status of MROH2B) + (-1.5807 * mutation status of AUTS2) + (-1.5486 * mutation status of CNOT1) + (0.6871 * mutation status of COL5A1). Next, we computed the 23-mutation signature risk score for per patient. The distribution of the mutation risk score, the survival situation of bladder patients and the alteration of mutated genes were obtained. As shown in Fig. 2a, patients in high risk group have shorter survival and higher mortality than those in low risk group. Then, we used the median risk score as the cut-off point to divide bladder patients into highrisk group (n = 201) or low-risk group (n = 202). Kaplan-Meier curve were performed and the result appeared that there are shorter OS and poor prognosis for patients with high risk score (P < 0.0001, Fig. 2b). The time-dependent ROC curve displayed that the predictive sensitivity and speci city of the mutation signature were 0.893, 0.896, 0.916 and 0.965 at 1-, 3-, 5-and 10 year, respectively (Fig. 2c). Taken together, this mutation signature have high accuracy and is a potentially novel indicator to evaluate the patients' survival.

The mutation signature is an independent prognostic indicator
We conducted the univariate and multivariate Cox analysis so as to further determine if the risk score was an independent risk factor for clinical outcome of bladder patients. The univariate Cox analysis showed that risk score, TNM, T stage, N stage, tumor status as well as lymphovascular invasion were signi cantly associated with the prognosis of patients (Fig. 3a). Furthermore, Risk score and tumor status were considered as independent risk elements for the prognosis of bladder patients by using the multivariable Cox analysis (Fig. 3a). These results demonstrated that the mutation signature remain independently related to overall survival after adjustment by conventional clinical characteristics.
Next, we performed Kaplan-Meier curve to examine prognostic effect of clinical characteristics. The results showed that TNM stage (P < 0.0001), T stage (P = 0.0003), N stage (P = 0.0001), Tumor status (P < 0.0001) and Lymphovascular invasion (P = 0.0006) were importantly related to the survival of bladder patients ( Fig. 3b-d, 3 h-i). However, other clinical characteristics were not related to the patient' survival, such as age (P = 0.0941, Additional le 2a-e). Furthermore, we explored the correlation between risk score and clinical features. The results revealed that risk score was elevated in higher T stage and N stage ( Fig. 3f-g). Contrast to patients with tumor, patients with tumor free have lower risk score (Fig. 3j). Further, higher risk score was chie y enriched in patients with lymphovascular invasion (Fig. 3k). However, there are no signi cant correlation between the mutation signature and other clinical characteristic, including TNM stage and so on ( Fig. 3e and Additional le 2f-j).
To appraise whether our mutation signature manifest strong prognostic ability within same clinical features, strati ed analysis were further implemented. Patients were strati ed into different groups according to each clinical features. The strati ed survival analysis results showed that patients were divided into high-risk or low-risk group and there are shorter OS and poor prognosis for patients in highrisk group (Fig. 4). In general, the results above suggested that our mutation signature was independent risk factor to evaluate the survival of patients and could divide patients into subtypes with different prognosis.

Clinical Application of the mutation signature
Based on univariate Cox analysis and relationship analysis, we further constructed a prognostic nomogram including risk score, stage, T stage, N stage, tumor status and lymphovascular invasion to estimate the survival of 1-, 3-and 5-year of bladder patients (Fig. 5a). The points of each prognostic factor add up to get the total points and the higher the patients' points, the worse the prognosis. The internally validated Harrell's C-index was 0.817, demonstrating the mutation signature based nomogram exhibited superior performance in clinical condition. Taken together, these results suggested that the predictive nomogram has potentially high clinical application value. Furthermore, to evaluate clinical utility of predictive nomogram, we performed decision curve analysis as well as clinical impact curves. The decision curve analysis result showed that the risk prediction nomogram with the risk threshold probabilities ranging from 0.03-0.92. When using 0.55 as the risk threshold, patients yielded the best clinical net bene t (Fig. 5b). Clinical impact curve was performed to estimate the numbers of high risk people for each risk threshold and true positives. For instance, if 1000 patients were examined at a risk threshold of 0.55, about 220 patients would be deemed high risk, whereas about 198 patients were truly positives (Fig. 5c).

The mutation signature are better than other biomarkers in survival prediction
To determine if our mutation signature are better than other biomarkers, we also constructed a mRNA signature and a miRNA signature by univariate and multivariate analysis from differentially mRNA and miRNA between cancer and adjacent cancer, respectively. Then formula as follows: mRNA signature = miRNA signature = (-0.3472 * miR-590) + (0.2458 * miR-365a) + (0.6026 * miR-337) + (0.2907 * miR-192) + (-0.5377 * miR-127) + (-0.1839 * miR-378c) + (-0.2279 * miR-576) + (-0.3928 * miR-194-2) + (-0.1462 * miR-4636) + (0.0967 * miR-217) + (-0.2276 * miR-3913-2). Simultaneously, we collected other miRNA signature for predicting bladder patients' survival from previous studies [26]. Kaplan-Meier curve as well as time-dependent ROC analysis were conducted to compare their clinical outcome (Fig. 6a-c). The Kaplan-Meier results showed that compared to patients with other signature's low risk score, patients in our mutation signature's low-risk group have distinctly longer OS and better prognosis, similarly, compared to patients with other signature's high risk score, patients in our mutation signature's high-risk group have distinctly shorter OS and poor prognosis. The results of time-dependent ROC analysis displayed that the mutation signature have clearly higher AUC than other signature (Fig. 6c-d). In general, our mutation signature exhibit a better predictive ability for prognosis of bladder patients.
3.6 The higher risk score was related to higher expression of APOBEC3B Recent studies has suggested that APOBEC3B is a source of mutations in a variety of cancer embracing bladder cancer, ovarian cancer and so on [27][28][29][30]. Hence, we rst investigated the expression of APOBEC3B in various cancers using GEPIA website (Fig. 7a-b). The results con rmed that APOBEC3B is highly expressed in many cancer types, such as bladder cancer (Fig. 7a-c). In addition, the result also showed that APOBEC3B is highly expressed in bladder cancer using ulacan website (Fig. 7d). Then, we investigated the correlation of APOBEC3B expression with TNM stage as well as N stage. The results showed higher expression of APOBEC3B mainly enriched in advanced stage and higher N stage (Fig. 7eg). Next, we analyzed the relationship between our mutation signature and APOBEC3B to explore if our signature is associated with APOBEC3B and result showed that expression of APOBEC3B is higher in patients with high risk score (Fig. 7h). In general, these results demonstrated that APOBEC3B was not only high expression in bladder cancer, but also had a close connection with our signature.

Identi cation of signaling pathways correlated with the mutation signature
To identify the potential signaling pathways correlated with this mutation signature, we performed GSEA analysis and KEGG analysis. In the light of the results of GSEA analysis, we observed that hallmarks involving "In ammatory response", "Interferon alpha response" and "Interferon gamma response", were mainly enriched in low-risk group (Fig. 8a). "Epithelial mesenchymal transition"(EMT) and "Hedgehog signaling" were enriched in high-risk group although they are not statistically signi cant (Fig. 8b).
Simultaneously, we analyzed the differential gene expression data between high-and low-risk groups (|LogFC| > 2, P < 0.05). Then, based on the results of KEGG, upregulated genes in low risk group were primarily involved in "Pancreatic secretion", "Neuroactive ligand-receptor interaction" and so on, while upregulated genes in high risk group were primarily involved in "PI3K-Akt signaling pathway", "Ras signaling pathway" and so on (Fig. 8c). These results suggested this mutation signature might be participated in deterioration of BC through these pathways.

Page 8/24
4 Discussion BC has become one of the most common malignancy due to its increasing incidence and high recurrence rates [31]. Therefore, early diagnosis and prognosis prediction need to be taken seriously. It is common knowledge that the accumulation of genetic alteration in tumor-related genes is necessary for the development and progress of carcinogenesis [32]. Therefore, our study concentrated on addressing the clinical signi cance of mutated genes with highly frequency for bladder cancer.
Single markers have been used clinically for many years, but there are still shortcomings such as low sensitivity and speci city. So far, numerous studies have demonstrated that combination markers could improve sensitivity and speci city and better predict the prognosis of patients [33,34]. Furthermore, given the heterogeneity of bladder cancer, it is unlikely to identify its progression by a single marker. This underlines the importance of biomarker combinations [35]. In this study, we analyzed the whole mutation pro le and identi ed 23 mutated genes which had a clearly correlation with the prognosis of bladder patients. Furthermore, we found this mutation signature has high accuracy by time-dependent ROC analysis and was an independent indicator by multivariable Cox regression analysis.
Besides, more and more researches have demonstrated that combined molecular with clinical features have higher predictive ability than single biomarker [36,37]. Thus, nomogram and DCA analysis were established by combining risk score with clinical features involving TNM stage, T stage, N stage, tumor status and lymphovascular invasion. The results suggested that combined parameters performs better in survival prediction for bladder patients and has good clinical application. Our ndings provide a superior and helpful prognostic biomarker for bladder patients.
Recent evidence has implicated bladder exist four mutational signatures, including signature 1B, signature 2, signature 5 and signature 13. Both Signature 2 and 13 are mainly featured by C > T as well as C > G and have a close connection with the APOBEC families of cytidine deaminases. In our study, we also observed alteration are mainly featured by C > T together with C > G. APOBEC enzymes usually play essential role in innate immune responses, embracing target retroviruses, demonstrating that there are interrelated among immunity mutagenesis, and viral infection in the progression of cancer development [38]. Based on the similarity of the sequence background of cytosine mutations caused by APOBEC enzymes in experimental systems, APOBEC1, APOBEC3A and APOBEC3B appear to play a greater role in human cancer than other members of the family. Hence, we investigated the expression of APOBEC1, APOBEC3A and APOBEC3B and their expression based on clinical characteristics in bladder. APOBEC1 and APOBEC3A are not signi cantly differential expressed in bladder tumor (Additional le 3). And we found APOBEC3B is not only highly expressed in tumor, but also its higher expression is related to advanced TNM stage and higher N stage (Fig. 7). Thus, we further explored relationship between our mutation signature and APOBEC3B and found patients with high risk score have higher APOBEC3B expression, suggesting our signature is a reliable predictor to estimate survival of bladder patients.
It is well known that the epithelial-mesenchymal transition (EMT) process plays a signi cant role in embryonic development as well as tissue repair and tightly associated with cancer metastasis and invasiveness including bladder cancer [39,40]. In human, hedgehog homologs was divided into three types including sonic hedgehog (Shh), desert hedgehog (Dhh) and Indian hedgehog (Ihh) [41]. Shh is currently the best studied of these three homologs, [42]. Previous study suggested that sonic hedgehog (Shh) is a crucial element for the development and progression of bladder cancer [43]. The PI3K-AKT signaling pathway, a well-known cancer-related signaling pathway, has close relationships with cell growth, cell cycle, survival as well as metastasis in various cancer including bladder cancer [44,45]. Therefore, it is highly possible that these selected genes may in uence the bladder progression through these pathways. Besides, interferon is a cytokine produced by monocytes and lymphocytes. It is divided into three types: α, β and γ. Many studies have shown that interferon (especially α-interferon and γinterferon) has an obvious anti-cell proliferation effect [46]. Therefore, low-risk patients have a better prognosis and reduce mortality through these immune response and in ammatory response.
It has been reported that some of the screened genes play an essential role be in the tumorigenesis, including bladder cancer. For example, NCOR1 gene was found to be high frequently mutation in bladder cancer. In our study, NCOR1 is not only a highly mutated gene in bladder cancer but also associated with better prognosis of patients. Another candidate, bladder patients with BRCA2 gene mutation have a better prognosis [47]. In our study, we also observed mutated BRCA2 exerts a protective role. However, when combined with other mutated genes, it played a risky role for bladder patients. As for the rest of mutated genes, such as AHNAK2, RYR2, DCHS2, USH2A, RP1L1 and so on were found to be mutated in various cancer and diseases [48][49][50][51][52][53][54][55]. Although the effect of these genes' mutation on bladder cancer were previously unclear, our ndings indicated that these genes were associated with patients' survival and deserve further investigations in bladder cancer.

Conclusions
In conclusion, we established a mutation signature related to clinical outcome of bladder patients. Furthermore, this signature is a novel independent indicator and high accuracy for survival prediction. More importantly, we perform a predictive model to effectively predict clinical outcome of bladder patients through integrating risk score with clinical characteristics.

Declarations Ethics approval and consent to participate
The data obtained in this study were rooted mainly in the UCSC and TCGA databases, GEPIA and UALCAN website, which are open access.

Consent for publication
Not applicable.

Competing interests
The authors declare that they have no competing interests.       lymphovascular invasion (k) In univariable and multivariable Cox analysis, P value with Bold represents P<0.05. Red represents the factor was risky factor and blue represents the factor was protective factor.

Figure 4
Strati cation analysis of the mutation signature. Kaplan-Meier analysis of the mutation signature for patients with different clinical characteristics, including age < 69 or age ≥ 69, female or male, stage I+II or stage III+IV, N0 stage or N1+2+3 stage, M0 stage or M1 stage, no lymphovascular invasion or lymphovascular invasion, non-papillary or papillary, tumor free or with tumor and non-white or white.