Identification metabolism-related genes using GSEA
In the aggregate 409 samples including 391 BLCA and 18 healthy samples with their clinical features information were achieved from the TCGA as training cohort and 165 BLCA samples achieved from the GEO as external validation cohort (Table 1). We integrated mRNA expression profiles and clinical information, and the limma algorithm was used to identify the mRNAs that were differentially expressed in the BLCA samples relative to healthy samples in training cohort, there were 132 up-regulated and 33 down-regulated genes with respect to healthy samples were obtained (p < 0.05). GSEA was used to explore whether there were significant genomic differences between BLCA and normal tissues in different kinds of metabolic phenotypes. Finally we found that these genes are significantly enriched in amino acid (ACID_AMINO_ACID_LIGASE_ACTIVITY), fatty acid (GO_FATTY_ACID_BETA_OXIDATION) and glycolysis metabolism. Glycolysis was the most obvious pathway in two gene sets (HALLMARK_GLYCOLYSIS and REACTOME_GLYCOLYSIS)(Fig. 1). Then we analyzed different expressed mRNAs and found in total 151 participating genes, including 109 genes in HALLMARK_GLYCOLYSIS gene sets (NES = 1.71, nominal P < 0.005, FDR < 0.005) and 42 in REACTOME_GLYCOLYSIS (NES = 1.62, nominal P < 0.05, FDR < 0.05) respectively.
Construction of glycolysisrelated genes and BLCA prognosis models
Univariate Cox regression analysis were used to investigate the relationship between Glycolysis and the prognosis of BLCA patients as preliminary sizing, of which four genes with the cut off P < 0.05 were primary selected. Then used multivariate Cox regression analysis to examine the relevance between the expression of four genes and the survival of patients. Of which we built the prognostic model constructed with a protective role (Arginine kinase 3, AK3) with HR < 1 with lower expression and a risk role (Galactokinase 1, GALK1 and nucleoporins 205, NUP205) with HR > 1 with higher expression (Fig. 2a), and the risk score formula came out as the followed: -0.33227* AK3 + 0.462604* GALK1 + 0.539478* NUP205.
BLCA patients were ranked by their risk scores and split into a high risk group (n = 165) and a low risk group (n = 166) using the median risk score of the series (Fig. 2b). As shown in the Fig. 2c, a higher death rate was notable for high risk group than for those with low risk scores. In the model, low risk group had significantly longer overall survival time than high risk group rendered by a Kaplan-Meier survival curve and a log-rank test (Fig. 2d). Then we used our prognosis model with the ROC curve analysis (Fig. 2e), containing the 3, 5-year survival of the proportions under the receiver operating characteristic curve (AUC) which were 0.651 and 0.709 respectively, to verify the sensitivity and specificity in predicting survival for BLCA patients. We used both univariate and multivariate Cox regression analyses to assess whether our prognosis model was an independent predictor of BLCA. The results revealed a moderate and independent prognostic power for risk scores (Fig. 2f,g), which further validated the dependability of our previous prognostic model.
Hierarchical analysis between the clinical characteristics and the three glycolysis-related gene signature
The results of univariate and multivariate Cox regression analyses turned out obviously differences between the high and low risk groups associated with risk scores, age, clinicopathology and cancer stage according to the AJCC (Fig. 2f,g). On the grounds of our univariate Cox regression analyses results, we used stratification analysis to explored the relationship between the overall survival and different clinical features, and found that age, tumor status including T stage, N stage and M stage and stage according to our three glycolysis-related gene signature were significantly related to the survival, which were rendered by a Kaplan-Meier survival curve and a log-rank test (Fig. 3a-e). Next we classified the patients into different subgroups with median risk core by our signature according to age (≤ 65 to > 65), T stage (T0-2 to T3-4), N stage (N0 to N1-3), M stage (M0 to M1-3) and AJCC stage (stage I–II to stage III-IV). As shown in the Fig. 3f-j and Sfig. 1a-e (Additional files), Kaplan-Meier curves showed that in the training cohort patients with high-risk scores brought out poorer prognosis in T3 − 4 subgroup, N0 subgroup, M0 subgroup, higher stage and elder (age > 65) patients (P < 0.05) relative to the other group, which suggested that our three glycolysis-related gene signature may have a better prognostic value in higher malignancy BLCA patients.
Validation of the three glycolysis-related gene signature for survival prediction
Next we verified our prognosis model in training cohort and whole cohort to further confirm its reliability. We analyzed expression levels of three glycolysis-related genes in both the validation cohort and whole cohort, respectively, GALK1 and NUP205 were expressed higher in the high risk group while AK3 was expressed lower (Fig. 4a,b). To validate the survival prediction of the three glycolysis-related gene signature, risk scores were calculated for each patient in both the validation cohort and whole cohort. As the risk score formula mentioned before, BLCA patients were ranked and split into a high risk group or a low risk group using the median risk score from the training cohort as the cutoff point (Fig. 4c,d). As expected, a higher death rate was noted for BLCA patients with high risk scores than for those with low risk scores in each cohort (Fig. 4e,f). In accord with our training cohort results, low risk patients revealed longer overall survival time than high risk patients in both validation cohort and whole cohort (Fig. 4g,h) as p < 0.001 in each cohort, ROC curve analysis also showed independent prognostic power of our model (Fig. 4i,j). Then we further analyzed the association between our three glycolysis-related gene signature and the clinicopathological features of BLCA in the validation cohort. In accordance with our prediction before, results indicated that our model is notably related to older, higher clinicopathological stage and AJCC stage, suggesting the signature may play a crucial role in BLCA malignant progression (Fig. 4k-o). Then we analyzed the association between various clinical features and survival in our validation group. Kaplan-Meier curves also showed that in the validation cohort patients with progression clinicopathological features brought out poorer prognosis in T3 − 4 subgroup, N1 − 3 subgroup, M1 − 3 subgroup, higher stage and elder (age > 65) patients (P < 0.05) (Sfig. f-j), which also confirmed that our model possess a more accurate prognostic value for higher malignancy BLCA patients.
Genetic information of the three glycolysisrelated genes and validation of the expression levels
The frequently altered, or said mutated, sites of the three glycolysis-related genes and its frequency of mutation were analyzed by cBioPortal software (Fig. 5a,b). The network including AK3, GALK1, NUP205 and their most interacted neighbor genes were constructed by GeneMANIA (Fig. 5c). As shown in the picture, there were 23 genes including three target genes were automatic generated by GeneMANIA, consisted of physical interaction, co-expression and so on. To eliminate the inner relationship of the three hub-genes, we analyzed the correlation of the three genes and found that the absolute values of the correlation coefficient among them were all less than 0.3, indicated that there were no correlation among these three genes (Fig. 5d).
After that we also investigated the expressions of these three genes in normal and BLCA tissues. As illustrated in Fig. 5e, there were significant different expression levels between normal and cancer tissues. Immunohistochemistry (IHC) staining images were downloaded from the Human Protein Atlas database (https://www.proteinatlas.org/) to verify the expression. Images showed that the expression of AK3 in urothelial normal tissue was stronger than that in urothelial carcinoma tissue, while the intensity of GALK1 performs on the contrary (Fig. 5f,g), which also confirmed that in our signature based on three glycolysis-related genes there were both risk factors and protective markers to make a more accuracy prognostic model. But there were no data about IHC staining images of NUP205, which remained a defectiveness to be further investigated.
Nomogram establishment and validation
Then we built a nomogram to evaluate the possibility of the 3- and 5- year OS and establish a clinically method to predict the survival probability of BLCA patients based on the training cohort. Seven independent predictors including age, grade, AJCC stage, T-stage, N-stage, M-stage and risk score were incorporated to build the nomogram (Fig. 6a). The calibration plots indicated satisfying coherence between the nomogram predictions and actual observations (Fig. 6b, c). ROC curve analysis also showed the AUC of the nomogram at 3- and 5-year were 0.707 and 0.76(Fig. 6d).