Construction of prognostic model in TCGA-BLCA
We conducted this study as description in the flow chart (Fig.1a). 391 cases with survival follow-up information were involved in subsequent analysis. The patients were randomly divided into a training cohort (n=260) and a test cohort (n=131). All the characteristics of patients with bladder cancer were summarized in Additional file 1: Table S1. Clinical parameters showed no statistical significance between the training cohort and test cohort except for TNM stage.
497 autophagy-related genes were obtained from RNA profiling data in training cohort. 51 survival-related candidates were identified after Log-rank test. 34 of them showed statistical significance by Univariate Cox regression analysis (Additional file 2: Table S2). Next, LASSO penalized analysis was employed to further narrow potential biomarkers (Fig.1b, Fig.1c). Finally, by multivariate Cox regression analysis we screened out six genes for prognostic signature construction from sixteen candidates. The six genes were distinct subgroup of the Ras family member 3 (DIRAS3), epidermal growth factor receptor (EGFR), histone cluster 1 H3 family Member E (HIST1H3E), neuregulin 3 (NRG3), protein kinase C delta (PRKCD), and signal transducer and activator of transcription 2 (STAT2). Here a risk score formula developed as: risk score = (1.54371×DIRAS3) + (0.09894×EGFR) + (0.17889×PRKCD) + (2.3629×NRG3) + (-0.46133×HIST1H3E) + (-0.39365×STAT2). We calculated risk score of each patient in both training and test cohort of TCGA. Based on optimal cutoff defied with “surv_cutpoint” function, the patients were divided into a high- and low-risk-score groups with a cut-off of 1.254 for risk score in training set, 1.786 in the validate cohort (Fig.2) and 1.525 in whole set of TCGA (Fig.3a). Kaplan–Meier curves showed that high-risk-score group had significantly poorer OS than that in the low-risk-score group in each set. The area under the curves (AUC) of ROCs for 1-,3- and 5- year overall survival was 0.728,0.744 and 0.756 respectively in training cohort ,0.796, 0.737, and 0.762 in the validation cohort, and 0.722, 0.691 and 0.714 in the whole TCGA cohort (Fig.4). These results indicated the favorable value of this model in prognostic prediction. Moreover, a clear separation was also found in Kaplan-Meier survival curves of four mRNA expression-based molecular subtypes of BC, introduced by A. Gordon Robertson (7), in Additional file 3: Fig. S1.
Validation of the prognostic gene signature
To further validate the reliability of the signature, GSE13507 serve as an independent dataset for. The risk score of all the individuals were calculated. In accordance with results obtained in TCGA-BLCA, patients with high-risk-score presented notably poorer OS than the counterparts (Fig.3b). The AUCs for 1-,3- and 5-year OS respectively were 0.692,0.634 and 0.579 (Fig. 4d). Collectively, these results implied this signature had potential generalization ability.
External validation in online database
Among the 127 patients in cBioportal, the six genes are altered in 32 (25%) of them (Fig. 5a). EFGR accounted for 11% genetic alterations and amplification mutation was quite common in this signature. In Oncomine database, statistical differences in mRNA level of signature were observed between tumor and normal tissues (Fig. 5b). IHC of five proteins encoded by corresponding genes (NRG3 was not included in this database) were presented in Fig. 5c and information was documented in Additional file 4: Table S3. After that, we investigated their DNA methylation level and status. DiseaseMeth version 2.0 indicated aberrant DNA methylation of the six genes (Fig. 6) and significant differences between the BC tissues and normal tissues. The results could be potential mechanisms for abnormal regulation of them. Some methylation sites in the DNA sequences of EGFR, HIST1H3E, and PRKCD were found in MEXPRESS online tool, which were negatively correlated to their expression levels (Additional file 5: Fig. S2). By using TIMER, we found EGFR, DIRAS3, STAT2 and NGR3 were conversely associated with purity (Additional file 6: Fig. S3, correlation=-0.128, correlation=-0.294, correlation=-0.298 and correlation=-0.116, respectively). While PRKCD was significantly correlated with tumor purity (correlation=0.23). In addition, elevated DIRAS3 correlated with CD8+cell and macrophage infiltration. Simultaneously, increased STAT2 expression positively related to neutrophil, and dendritic cell infiltration (p<0.05), prompting immune infiltration level.
Independent of prognostic value of the signature
To explore whether the signature was independent of clinical parameters, we extracted complete clinical information (including age, gender, race, tumor status and TNM stage) of people with bladder cancer and carried out univariate and multivariate Cox regression analysis. The results indicated that age, tumor status and risk score based on the six-gene signature were independent factors for overall survival of bladder cancer (Fig. 7a).
Setting up a predictive Nomogram
We integrated the six-gene classifier and clinical parameters to construct a nomogram for quantification of possible risk. Risk score, age, tumor status were clinical characteristics included in the nomogram (Fig.7b). The C-index of nomogram was 0.791 and this combined model performed favorably in accuracy regarding the OS (Fig.7c). To justify the clinical value, we applied DCA in TCGA dataset (Fig.8). The curve showed that when the threshold probability more than 20% for 1- year, 3-year and 5-year OS, patients with BC would gain more benefits from the nomogram (orange line) than either treating all (dark green) or treating none (black line) schemes. Within this range, the nomogram was superior to TNM staging system (purple line) and other candidate models in general, even though with several overlaps. These results demonstrated that the nomogram incorporated with age, tumor status and risk score is a favorable classifier to predict the OS of bladder cancer.