Molecular immune phenotypes in MIBC patients
Unsupervised clustering of 1601 MIBC patients using immune signature genes was performed. Three main clusters were identified as cluster 1, cluster 2, and cluster 3 (Fig. 1A). ssGSEA scores indicated that Cluster 3 was infiltrated with high levels of innate and adaptive immune cells. Cluster 2 was heterogeneously infiltrated with immune cells while cluster 1 was non-infiltrated. Cluster 3 showed significantly high expression of signatures of CD8 T cells, macrophages, Th1 cells, NK cells, and Tregs, followed by cluster 2 and cluster 1 (Fig. 1B). Immune-related pathways, involving interferon (IFN) signaling, TGF-beta signaling, Antigen presenting mechanism (APM), angiogenesis, epithelial-mesenchymal transition (EMT) were mostly enhanced in cluster 3, followed by cluster 2 and cluster 1. On the other hand, DNA damage response (DDR) and FGFR3 signaling were mostly enhanced in cluster 3, followed by cluster 2 and cluster 1. Cluster 3 was divided into three subclusters 3A, 3B and 3C and cluster 1 was divided into two subclusters 1A and 1B. Cluster 3C was the most “inflamed” subcluster with high levels of infiltrated immune cells and activation of immune-related pathways. On the contrary, cluster 1A was associated with “non-inflammation” due to the low levels of infiltrated immune cells and inhibition of immune cells activation. We used RNA-based ESTIMATE algorithm to confirm that cluster 3C displayed the highest stromal score and immune score, which was the least pure subcluster. The analysis of ssGSEA scores in metabolic pathways showed that hypoxia regulated genes and glucose deprivation related genes were increased in cluster 3, with a potential to activate immune cells through immunometabolic mechanisms. Cluster 3 also had the best overall survival compared to the other two clusters (Fig. 1C). The expression levels of IFNG, TGFβ, immune checkpoints PD1 and PD-L1 were most enhanced in cluster 3, while the expression level of FGFR3 was most enhanced in cluster 1 (Fig. 1D).
Differences in somatic mutations related to the immune phenotype
After applying unsupervised clustering in TCGA cohort, mutation information of each gene in each sample was exhibited in waterfall plot in different subclusters, as cluster 1A, 1B, 2, 3A, 3B, and 3C (Fig. 2A, 2B, 2C, 2D, 2E, 2F). Various colors with annotations at the bottom represented different mutation types. Detailed information on somatic mutations in different subclusters was shown in supplementary fig. 1. The top 5 mutated genes in subcluster 1A were TP53 (46%), TTN (39%), ARID1A (29%), KDM6A (29%), and SYNE1 (29%), while the most mutated genes in subcluster 3C were TP53 (53%), RB1 (27%), TTN (27%), FLG (24%), and ARID1A (22%).
DEGs between cluster 3C and cluster 1A
After applying unsupervised clustering in TCGA and IMvigor210 cohorts, DEGs between cluster 3C and cluster 1A were analyzed. 3790 significantly up-regulated DEGs and 4794 down-regulated DEGs were identified in TCGA dataset. In IMvigor210 cohort, 2950 up-regulated genes and 1067 down-regulated genes are identified. The intersection includes 2043 significantly up-regulated and 662 down-regulated genes (Fig. 3A). The volcano plots of TCGA cohort and IMvigor 210 cohort were shown (Fig. 3B). The GO analysis of upregulated DEGs in cluster 3C were enriched in immune cell activation involving T cell activation, leukocyte activation, phagocytosis, and leukocyte migration (Fig. 3C). KEGG analysis showed that the downregulated genes in cluster 3C were enriched in various metabolic pathways involving amino acid metabolism and fatty acid degradation (Fig. 3D).
Difference in genomic CNVs and methylation in DEGs between cluster 3C and cluster 1A
We then performed an integrated analysis of Multi-omics data on TCGA cohorts. To evaluate whether copy number variations (CNVs) affect transcription of affected genes, analysis of genomic alteration of DEGs indicated regions of copy-number gains (chromosomes 1,2,3,4,10,12,17,19,21) or loss (chromosomes 2,6,9,10,11,17) as characteristic features of cluster 1A compare with cluster 3C in the TCGA cohort (Fig. 4A). 159 upregulated DEGs in cluster 3C were coded by regions with a higher frequency for copy-number deletions in cluster 1A, while 62 upregulated DEGs in cluster 1A were encoded by regions with more frequent gains in the cluster (Fig. 4B). Global methylation data available for the TCGA cohort were analyzed. In total, differentially methylated probes of 349 genes were identified comparing cluster 1A and cluster 3C. Probes with significantly higher beta values in cluster 3C were located in the proximal promoter of 25 DEGs with higher expression in cluster 1A, while Probes with significantly higher beta values in cluster 1A were located in the proximal promoter of 16 DEGs with higher expression in cluster 3C (Fig. 4C).
WGCNA analysis
The intersected 2705 DEGs were used to perform subsequent WGCNA analysis. The power of β was set at 10 to ensure a scale-free network (Fig. 5C). Gene modules were calculated. The gray module identified the genes that can’t be classified to other modules. 12 gene modules were identified by the hierarchical clustering dendrogram (Fig. 5A). The relationships between modules and clinic traits were assessed. Many modules are correlated with survival time. The correlation coefficients of the red, yellow, blue, and pink modules were greatest, at 0.28 (0.001), 0.28 (0.001), 0.25 (0.005), and 0.23 (0.009), respectively (Fig. 5B). MM and GS value in the upper quartile of genes in the module were considered as key genes of this module (The MM and GS cut-off of these modules were 0.8 and 0.2, respectively). The gene distribution of red, yellow, blue and pink modules were shown (Fig. 5D). A total of 415 hub genes were investigated from 4 modules, which were analyzed using matascape for pathway and process enrichment analysis. GO analysis indicate these genes are closely related to cytokine-mediated signaling, regulation of cell activation, regulation of cytokine production and leukocyte differentiation (Fig. 5E).
Identification and validation of hub genes
Protein-protein interactions (PPI) were applied to identify 8 genes as hub genes (LCK, HLA-E, IRF1, IL2RB, IFNG, CCL13, CXCR6, PSMB10) (Fig. 6A). To validate our finding of hub genes, we examined TCGA and IMvigor210 cohorts. The KM curves for genes with P<0.05 in TCGA cohort were shown in supplementary fig. 2. IFNG, CXCR6, IL2RB, LCK, and PSMB10 were found to be closely related to prognosis in both cohorts (Fig. 7A, 7B, 7C, 7D, and 7E). We constructed an immune infiltration interaction network based on the 5 hub genes using the STRING dataset. The results were imported into Cytoscape for visualization (Fig. 6B)
In order to establish a model for predicting prognostic status, 5 genes were used to perform a lasso regression, with 10-fold cross-validation with 1000 repeats. λ value with the smallest Partial Likelyhood Deviance was shown in 2 of the 5 genes had coefficients that were not zero. A prognostic score model involving LCK and PSMB10 was established.
Risk Score=(-0.1973×LCK) + (-0.0091×PSMB10)
By predicting survival of patients at 5 years, the areas under the curve (AUC) of the ROC curves generated from the risk-based prediction model in the test data were 0.652 (Fig. 6F). The test data were obtained from IMvigor210 cohorts, consisting of patients receiving ICB therapy.
The decision tree illustrated decision rules, among all the 5 hub genes that were entered in the analysis, 4 were selected by the program for the classification tree (Fig. 8). The four parameters were LCK, IFNG, PSMB10, and CXCR6. LCK was the most essential determining factor, which was the first-level split of two initial branches of the classification tree. The mean accuracy, sensitivity, and specificity for predicting SD/PD were 70.1%, 70.0% and 71.7%.