1. Hierarchical clustering identified immune high and immune low subtypes
Hierarchical clustering analysis identified two clusters, the immune high subtype and the immune low subtype, among 680 samples from three merged GEO datasets with 405 samples from the TCGA-BLCA dataset. In the GEO dataset, 241 samples were clustered into the immune high subtype and 439 samples were defined as the immune low subtype. While in the TCGA-BLCA dataset, the immune high and immune low samples were 340 and 65, respectively. Both the two clusters could be well-separated in the two datasets (Supplementary Fig 1).
2. The landscape of tumor immune microenvironment in the GEO and TCGA datasets.
SsGSEA analysis was performed to estimate the abundance of the 28 immune cell types in each sample. The ssGSEA enrichment score of each sample was calculated to evaluate the landscape of the tumor immune microenvironment in the GEO dataset and the TCGA dataset. In the GEO dataset, the immune scores, stromal scores, and ESTIMATE scores of the immune high subtype samples were higher than the scores in the immune low subtype samples. However, the tumor purity scores were the opposite (Fig 1 a, Fig 2 b). The results of the TCGA dataset were consistent with those of the GEO dataset (Fig 1 b, Fig 2 g). The clinicopathologic characters were also presented in Fig 1. Most of the scores of the anti-tumor immunity cells and immune-suppressive cells of the immune high subtype were higher than those of the immune low subtype in the GEO dataset, except for the type 17 T helper cell, CD56bright natural killer cell, and immature dendritic cell (Fig 2 a). And all the scores of the anti-tumor immunity cells and immune-suppressive cells of the immune high subtype were higher than those of the immune low subtype in the TCGA dataset (Fig 2 f). Moreover, most of the expressions of HLA genes were higher in the immune high subtype than the immune low subtype in the GEO dataset (Fig 2 c), except for HLA-DPB2. And all the expressions of HLA genes were higher in the immune high subtype than the immune low subtype in the TCGA dataset (Fig 2 h). The expressions of the immune checkpoint genes such as CD8A, CTLA4, CXCL9, HAVCR2, LAG3, PDCD1LG2, and PRF1 were higher in the immune high subtype than the immune low subtype in the GEO dataset (Fig 2 d). And the expressions of the immune checkpoint genes such as CD8A, TIGIT, CD274, GZMB, IDO1, HAVCR2, CXCL9, IFNG, PDCD1LG2, CTLA4, TNF, GZMA, LAG3, CXCL10, PRF1, and PDCD1 were higher in the immune high subtype than in the immune low subtype in the TCGA dataset, while the expressions of the SIGLEC15 and TBX2 were lower in the immune high subtype than in the immune low subtype in the TCGA dataset (Fig 2 i). Finally, the immune-related KEGG pathways like antigen processing and presentation, autoimmune thyroid disease, B cell receptor signaling pathway, FC gamma r mediated phagocytosis, the intestinal immune network for IgA production, leukocyte transendothelial migration, natural killer cell mediated cytotoxicity, primary immunodeficiency, and T cell receptor signaling pathway were enriched in the GEO dataset by GSEA analysis (Fig 2 e). While the immune-related KEGG pathways such as antigen processing and presentation, autoimmune thyroid disease, B cell receptor signaling pathway, FC gamma r mediated phagocytosis, the intestinal immune network for IgA production, natural killer cell mediated cytotoxicity, primary immunodeficiency, and T cell receptor signaling pathway, TGF beta signaling pathway, and VEGF signaling pathway were enriched in the TCGA dataset (Fig 2 j). Interestingly, most of the immune cells were highly positively correlated with each other in the two datasets (Supplementary Fig 2).
3. Survival analysis based on immune cells and immune subtypes.
In the GEO dataset, survival analysis indicated that 22 immune cell types had a prognostic value with bladder cancers (Fig 3 a-v). The higher scores of the effector memory CD4 T cell, effector memory CD8 T cell, Immature dendritic cell, monocyte, T follicular helper cell, and type 17 T helper cell in the immune high subtype were associated with a good prognosis of bladder cancers, and the p-value was 0.011, 0.014, 0.003, 0.002, 0.044, and 0.019, respectively. Conversely, The higher scores of the activated CD4 T cell, activated CD8 T cell, activated dendritic cell, CD56dim natural killer cell, central memory CD4 T cell, central memory CD8 T cell, gamma delta T cell, MDSC (Myeloid-derived suppressor cell), memory B cell, natural killer cell, natural killer T cell, neutrophil, plasmacytoid dendritic cell, regulatory T cell, type 1 T helper cell, and type 2 T helper cell in the immune high subtype were related to a shorter survival probability, and the p-value was 0.002, 0.016, 0.003, 0.032, <0.001, 0.002, <0.001, 0.002, 0.036, 0.002, <0.001, <0.001, 0.007, 0.043, 0.029 and 0.013, respectively. Additionally, 16 immune cell types had a prognostic impact on the survival of bladder cancer patients in the TCGA dataset (Fig 3 w-l1). The higher scores of the activated CD4 T cell, activated CD8 T cell, CD56bright natural killer cell, effector memory CD8 T cell, immature B cell, and monocyte in the immune high subtype predicted a longer survival ability, and the p-value was 0.011, 0.004, 0.001, 0.009, 0.018, and 0.028, respectively. In contrast, patients with higher scores of the central memory CD8 T cell, effector memory CD4 T cell, eosinophil, Immature dendritic cell, mast cell, memory B cell, natural killer cell, neutrophil, plasmacytoid dendritic cell, and regulatory T cell had a poor outcome on bladder cancers, and the p-value was 0.002, 0.001, 0.047, 0.030, 0.003, <0.001, 0.042, 0.049, 0.006, and 0.022, respectively. Besides, in the GEO dataset, the percentage of alive and dead patients was 70% and 30% among the immune high subgroup, while among the immune low subgroup it was 73% and 27%, respectively (Fig 3 m1). And in the TCGA dataset, the percentage of alive and dead patients was 55% and 45% among the immune high subgroup, while among the immune low subgroup it was 63% and 37%, respectively (Fig 3 n1). However, the ESTMATEScores of alive and dead patients among the immune high and immune low subgroups were not statistically different in either the GEO or TCGA dataset (Supplementary Fig 3). Finally, Kaplan-Meier survival analysis revealed poorer survival (P=0.011) among the immune high subtype in the gse32894 dataset (Fig 3 o1), while it showed prolonged survival (P=0.039) among the immune high subtype in the gse48276 dataset (Fig 3 p1). However, there were no significant differences in the survival probability between the immune high and immune low subtype in the merged GEO dataset, the gse13507 dataset, and the TCGA-BLCA dataset (Supplementary Fig 4). Similarly, the survival probability of the female and male patients in both the two datasets among the immune high and immune low subgroup was not significantly different (Supplementary Fig 5).
4. Independent prognostic immune cell types in the GEO dataset and the TCGA-BLCA dataset
Univariate Cox regression analysis of the GEO dataset for the OS revealed that activated CD4 T cell, central memory CD8 T cell, gamma delta T cell, type 2 T helper cell, activated dendritic cell, MDSC (myeloid-derived suppressor cells), monocyte, natural killer cell, natural killer T cell, neutrophil, and plasmacytoid dendritic cell were statistically significant prognostic immune cell types for the OS of bladder cancer patients (P<0.05) (Fig 4 a). On multivariate Cox regression analysis, monocyte was revealed as the independent prognostic immune cell type for the OS of bladder cancer patients, for which the hazard ratio (HR) was 0 (0-0.4) and the p-value was 0.001 (Fig 4 b). Moreover, univariate Cox regression analysis of the TCGA-BLCA dataset for the OS demonstrated that activated CD8 T cell, central memory CD8 T cell, effector memory CD4 T cell, memory B cell, CD56bright natural killer cell, mast cell, and plasmacytoid dendritic cell were statistically significant prognostic immune cell types for the OS of bladder cancer patients (P<0.05) (Fig 4 c). By multivariate Cox regression analysis, activated CD8 T cell (P<0.001, HR 0.005(0-0.054)), central memory CD8 T cell (P=0.011, HR 2988.975(6.236-1432610.99)), memory B cell (P=0.025, HR 241.642(1.993-29303.584)), CD56bright natural killer cell (P=0.02, HR 0(0-0.282)) were revealed as the independent prognostic immune cell types for the OS of bladder cancer patients (Fig 4 d).
5. Construction and validation of gene risk score model
Differential expression analysis between the immune high and immune low subtype of the TCGA-BLCA dataset revealed 2403 differentially expressed genes (DEGs), among which 1963 were up-regulated and 440 were down-regulated (Fig 5 a). Heatmap of DEGs was shown in Supplementary Fig 6 a. Then these 2403 DEGs were intersected with 1793 immune-related genes, and 504 genes were found to be overlapped, among which 221 genes were shared by both datasets (Fig 5 b). Heatmap of differentially expressed immune-related genes was shown in Supplementary Fig 6 b. We next screened the prognostic genes (P<0.05) individually within the 221 shared genes in the GEO dataset and the TCGA-BLCA dataset using Univariate Cox regression analysis. As shown in Fig 5 c and d, 56 genes were identified as prognostic genes in the GEO dataset and 51 genes in the TCGA-BLCA dataset. Using Lasso regression analysis of the univariate model (Fig 5 e, f), which were based on the GEO dataset as the training set, we constructed the final gene risk score model, with the following parameters: the lambda.min=0.0225, and the Riskscore=(0.0153)×S100A9+(0.0026)×TNFRSF11B+(0.0045)×SLPI+(-0.3779)×CXCR6+(0.0116)×FCGR3A+(0.0353)×CD70+(0.2527)×IFI30+(0.0126)×PDGFC+(0.0211)×EDN1+(0.0565)×SPP1+(0.0927)×SFTPD+(0.2113)×IL18+(-0.1092)×SLC40A1+(-0.1691)×CD3G+(0.0308)×S100A8+(0.0845)×IFITM1+(0.023)×FCGR3B+(-0.2586)×IL15+(0.0508)×VEGFC+(0.0232)×PTX3+(0.0454)×AQP9+(0.0958)×GRP+(0.0481)×CCL8 (Fig 6 a). The correlations between the 51 prognostic genes in the TCGA-BLCA dataset and corresponding transcription factors were presented in Fig 5 g and h.
Calibration curves of the training cohort and the validation cohort both performed well for the 1-, 3-, and 4-year time points. ROC curves revealed that AUC for the 1-, 3-, and 4-year time points in the training cohort were 0.768, 0.780, and 0.766, respectively. And the AUC for the 1-, 3-, and 4-year time points in the validation cohort were 0.610, 0.577, and 0.561, respectively. The correlations of the 23 risk score genes and the 28 immune cell types in the training set and the validation set were displayed in Fig 6 f-g. The risk curve of the training cohort illustrated with higher risk scores indicating a higher risk of death and the risk score was inversely correlated with OS (the coefficient=-0.25, P=6.3e-08) (Fig 6 h-j). However, in the validation cohort, the negative correlation was not significant (Supplementary Fig 7). Survival analysis indicated poorer OS in the high-risk group both in the training cohort (P<0.001) and the validation cohort (P=0.045) (Fig 6 k-i). Besides, survival analysis indicated that higher TMB predicted a favorable survival outcome in the validation cohort (P<0.001) (Fig 6 m). Furthermore, the high-TMB & low-risk group showed the best survival, the high-TMB & high-risk group was the second-best, followed by the low-TMB & low-risk group, while the low-TMB & high-risk group showed the worst survival (P<0.001) (Fig 6 n). However, the TMB between the high- and low-risk groups was not significantly different. In the training cohort, The immune checkpoint genes like CTLA4, CXCL9, HAVCR2, LAG3, PDCD1LG2, and PRF1 displayed a higher expression in the high-risk group, except CD8A (Fig 6 u). Whereas in the validation cohort, the immune checkpoint genes like TIGIT, CD274, GZMB, IDO1, HAVCR2, CXCL9, PDCD1LG2, CTLA4, TNF, GZMA, LAG3, CXCL10, and PRF1 displayed a higher expression in the high-risk group, while SIGLEC15 and TBX2 showed a lower expression in the high-risk group, and the expressions of CD8A, TIGIT, IFNG, and PDCD1 were not significantly different between the high- and low-risk groups (Fig 6 v).
6. Construction and validation of CGRS model
Univariate cox regression analysis of the GEO dataset (training cohort) demonstrated that age, T stage, and gene risk score model were statistically significant prognostic factors (P<0.05) for OS (Fig 7 a). Multivariate cox regression analysis of the training cohort demonstrated that age (P<0.001, HR 1.039(1.019-1.059)), T stage (P<0.001, HR 1.847(1.555-2.194)), and gene risk score model (P<0.001, HR 2.492(1.800-3.451)) were independent prognostic factors for the OS of bladder cancer patients (Fig 7 b). The nomogram for predicting the probability of OS at 1, 3, and 5 years was presented in Fig 7 c. Each predictor corresponded to a point, and all points summed up to a total point. The total point then corresponded to the survival probabilities. For example, for one patient with a total point of 226, he/she’s survival probability for an OS less than 1, 3, and 5 years was 31%, 60.2%, and 75.5%, respectively. The calibration curves of both the training cohort (Fig 7 d) and the validation cohort (Fig 7 f) for 1, 3, and 4 years illustrated that our nomogram model performed well because the calibration curves showed good consistency between the predictions by the nomogram (the solid lines) and actual observations (the dotted line). The ROC curves of the training cohort (Fig 7 e) and the validation cohort (Fig 7 g) also proved the accuracy of our nomogram model. The AUC for 1, 3, and 4 years in the training cohort was 0.850, 0.744, and 0.756, respectively. The AUC for 1, 3, and 4 years in the validation cohort was 0.719, 0.696, and 0.695, respectively.