3.1 Identification of BC molecular typing
Details of sample statistics are shown in Table 1. The expression profiles of IRGs was extracted from the TCGA dataset. In total, 646 IRGs were characterized after removing genes with no or low expression. Cox univariate regression analysis identified 81 genes that were significantly associated with prognosis. Three optimal molecular subtypes for IRGCluster were obtained using the NMF algorithm, with C1 (112), C2 (118) and C3 (165) (Fig. 1A). Unsupervised clustering was employed to cluster expression profiles of 81 genes. Three gene sets: gSig1 (23), gSig2 (37) and gSig3 (21) were obtained. gSig1 was highly expressed in C1 subtype but its expression was lowest in C3 subtype. gSig2 was highly expressed in C1/C2, but generally lowly expressed in C3. Expression of gSig3 was highest in C3 but lower in C1/C2 (Fig. 1A).
Table 1
Pre-processed dataset clinical information.
Characteristic | TCGA | GSE31684 | GSE32894 | GSE77952 | UROMOL |
Survival status | Alive | 221 | | 196 | | |
Dead | 174 | | 25 | | |
Mean OS | Alive | 1026 | | 1291.72 | | |
Dead | 564.4 | | 596.3 | | |
pT | T1 | 10 | 10 | 61 | 9 | 108 |
T2/T3/T4 | 379 | 78 | 51 | 14 | 7 |
Ta/CIS | | 5 | 109 | 7 | 343 |
Un | 6 | | | | |
pN | N0 | 228 | | | | |
N1 | 44 | | | | |
N2 | 75 | | | | |
N3 | 7 | | | | |
Un | 41 | | | | |
pM | M0 | 189 | | | | |
| M1 | 10 | | | | |
| Un | 196 | | | | |
Stage | I | 2 | | | | |
II | 124 | | | | |
III | 137 | | | | |
IV | 130 | | | | |
Un | 2 | | | | |
Grade | High | 374 | 87 | 44 | | 179 |
Low | 18 | 6 | 175 | | 272 |
Un/PUNLMP | 3 | | 2 | | 7 |
NewEventTissue | Bladder | 11 | | | | |
Bone | 22 | | | | |
Liver | 12 | | | | |
Lung | 30 | | | | |
Lymph Node Only | 22 | | | | |
Other,specify | 37 | | | | |
Renal Pelvis | 4 | | | | |
Un | 257 | | | | |
NewEventType | Distant metastatic | 78 | | | | |
Locoregional | 46 | | | | |
Metastatic | 10 | | | | |
New primary tumor | 7 | | | | |
Un | 254 | | | | |
Age | ≤ 60 | 106 | | | | |
> 60 | 289 | | | | |
Smoking | Lifelong Non-smoker | 105 | | | | |
Current smoker | 87 | | | | |
Current reformed smoker for > 15 years | 108 | | | | |
Current reformed smoker for ≤ 15 years | 71 | | | | |
Current reformed smoker, duration not specified | 11 | | | | |
Un | 13 | | | | |
Overall, 81 genes were significantly differentially expressed across the three subtypes. In addition, FGFR3 mutations were significantly higher in C3 than in C1/C2 (Fig. 1A), while RB1 mutations were significantly higher in C1/C2 (Fig. 1A). Overall survival (OS) varied significantly among the subtypes (Fig. 1B). The expression of C3 correlated with the best prognosis whereas C1/C2 subtypes were associated with the worst prognosis. In the external independent verification set GSE32894, C1/C2/C3 had equally significant prognosis difference in OS, with C3 subtype having significantly better prognosis compared with C1/C2 (Fig. 1C). In bladder cancer, high-grade often associates with worse prognosis compared with low grade types.
When the high/low grade BC was compared in the IRGCluster, results showed that almost all of the low grade (17/18) were distributed in the best prognostic subtype C3, whereas high grade cancers contained the three subtypes, C1/C2/C3 (Fig. 1D). The expression of the three subtypes was compared with mRNA cluster typing reported by A Gordon Robertson et al. [21]. Our findings showed that 70.29% of samples in Cluster1 were Basal-squamous. Cluster 3 contained 92.59% of Luminal-papillary and 80.77% of Luminal type samples, whereas Cluster 2 contained 80.52% of Luminal-infiltrated and 68.42% of Neuronal subtype samples (Fig. 1E). Marker genes from a previous study [22] were also evaluated. Results showed that the expression of Basal marker gene was significantly high in C1 subgroup than the other two groups, whereas the expression of Immune/Infiltrate marker gene was significantly higher in C2 subgroup than in the other two groups. This may be one of the factors contributing to poor prognosis in C2 group. The expression of Luminal marker gene in C3 group was significantly higher than that in the other two groups. (Fig. 1F). These data indicated that high grade BCs are similar pathologically, but exhibit different immune characteristics. These differences are associated with different prognostic outcomes.
3.2 IRGCluster characteristic gene score and functional analysis
The scores for gSig1, gSig2 and gSig3 were calculated based on the average expression level of each set. The gSig scores were significantly different in the IRGCluster (Fig. 2A). These results were consistent with pathological features of the subtypes generated from the GSE32894 dataset (Fig. 2B). This further validated the uniformity between IRGCluster and gSig. Analysis of OS of gSig showed that gSig score was significantly associated with OS, with gSig1 and gSig2 being associated with poor prognosis, whereas gSig3 was associated with better prognosis (Fig. 2C). GO enrichment analysis revealed that gSig1 was mainly associated with degranulation of neutrophils and neutrophil activation (Fig. 2D). gSig2 was associated with steroid hormone mediated signaling pathway (Fig. 2E), whereas gSig3 was associated with secretion of extracellular matrix and muscle cell proliferation (Fig. 2F). Interestingly, no significant enrichment insight was found in the KEGG pathway. These results suggested that the three gene sets regulate different biological pathways, which may be one of the factors contributing to different clinical outcomes in patients with BC.
3.3 Clinical characteristics of IRGCluster and differential expression of characteristic immune gene sets
We first analyzed the distribution of T, N, M, Stage, Smoking and Age among the three molecular subtypes. Samples with advanced stage of tumors such as T, N, were more likely to fall in C2 subtype, and associate with worst prognosis (Table 2). However, no difference was observed in the distribution of characteristics such as smoking and metastasis between the three molecular subtypes. The worst prognosis category, C2, mainly comprised of old individuals between 70 to 100 years (Table 2). Based on the high/low grade classification provided in the TCGA and GSE32894 and UROMOL data sets, we compared the gSig scores on different levels of samples and found that the gSig scores showed very high agreement on the three sets of data sets (Fig. 3A-C). Both gSig1 and gSig2 scores, which were associated with poor OS, were significantly higher in high grade than in low grade tumors. However, gSig3 score, which was associated with better OS, was significantly higher in low grade than in high grade tumors (Fig. 3A-C). The gSig score was consistently distributed between the muscle invasive and non-muscle invasive (Fig. 3D-F). These results imply that the gSig score could be a marker for BC.
Table 2
Comparison of clinical features with IRGCluster.
Characteristic | C1 | C2 | C3 | p-value |
pT | T1 | 2 | 3 | 5 | 0.00116 |
T2 | 51 | 38 | 97 |
T3 | 44 | 58 | 49 |
T4 | 14 | 17 | 11 |
Un | 1 | 2 | 3 |
pN | N0 | 65 | 60 | 103 | 0.00751 |
N1 | 17 | 15 | 12 |
N2 | 16 | 36 | 23 |
N3 | 2 | 1 | 4 |
Un | 12 | 6 | 23 |
pM | M0 | 55 | 36 | 98 | 0.19091 |
M1 | 1 | 4 | 5 |
Un | 56 | 78 | 62 |
Stage | I | 0 | 0 | 2 | < 1e-5 |
II | 33 | 17 | 74 |
III | 43 | 49 | 45 |
IV | 36 | 52 | 42 |
Un | 0 | 0 | 2 |
Grade | High | 111 | 117 | 146 | 0.00002 |
Low | 1 | 0 | 17 |
Un | 0 | 1 | 2 |
New Event Tissue | Bladder | 4 | 2 | 5 | 0.12681 |
Bone | 4 | 5 | 13 |
Liver | 5 | 3 | 4 |
Lung | 7 | 11 | 12 |
Lymph Node Only | 4 | 11 | 7 |
Other, specify | 16 | 15 | 6 |
Renal Pelvis | 1 | 1 | 2 |
Un | 71 | 70 | 116 |
New Event Type | Distant Metastasis | 21 | 27 | 30 | 0.46538 |
Locoregional | 16 | 13 | 17 |
Metastatic | 2 | 6 | 2 |
New Primary tumor | 3 | 3 | 1 |
Un | 70 | 69 | 115 |
Age | ≤ 60 | 29 | 23 | 54 | 0.0448 |
> 60 | 83 | 95 | 111 |
Smoking | Lifelong Non-smoker | 28 | 27 | 50 | 0.29977 |
Current smoker | 28 | 20 | 39 |
Current reformed smoker for > 15 years | 34 | 37 | 37 |
Current reformed smoker for ≤ 15 years | 15 | 25 | 31 |
Current reformed smoker, duration not specified | 2 | 5 | 4 |
Un | 5 | 4 | 4 |
*Chisq-test |
3.4 Immune landscape features for IRGCluster
The scores for six immune cells in the BC sample from the TCGA dataset were calculated using TIMER (tumor immune estimation resource) tool. Although there was significant difference between scores of the six immune cells in IRGCluster, C3 subtype displayed a different pattern from C1/C2. Except for B cell, the expression of other types of immune cells was significantly higher in C1/C2 than in C3 (Fig. 4A). The scores of the 22 immune cells calculated based on cibersort algorithm [23] also demonstrate that the score for memory B cells in C3 was significantly higher than that of C1/C2 (Fig. 4B). This expression pattern reflects the specific response pattern of the immune system in the IRGCluster. Analysis of immune expression signatures showed that there was a significant difference between C3 and C1/C2. For C3 that was found to be associated with positive prognosis, the scores of leukocyte fraction, proliferation, macrophage regulation, lymphocyte infiltration, IFN gamma response and TGF beta response were all significantly lower than those of C1/C2 (Fig. 4C).
3.5 The relationship between gSigs and immune checkpoints
There were 19 immune checkpoint genes (ICGs) in the TCGA dataset. gSig1 and gSig2 were positively correlated with expression of 18 ICGs, whereas gSig3 was negatively correlated with expression of ICGs (Fig. 5A). Similar findings were observed in independent GSE32894 dataset (Fig. 5B). The correlation analysis revealed that there was a strong positive association of gSig1 and gSig2 with 14 of 15 ICGs and 15 of 16 ICGs, respectively, but there was a strong negative association of gSig3 with 14 of 18 ICGs (Fig. 5C). Similar results were observed after an independent analysis on GSE32894 dataset (Fig. 5D). The relationship between the expression of the immune checkpoint gene IDO1 and the gSig score was further analyzed. In both TCGA and GSE32894 datasets, IDO1 was at a lower expression level in the best prognostic C3, but it was highly expressed in C1/C2 with poor prognosis (Fig. 5E-F).
3.6 Genome heterogeneity of IRGCluster
For the IRGCluster of BC, we analyzed gene mutations among C1/C2/C3. A total of 45 genes with significant mutations were obtained. The frequency of FGFR3 mutation was significantly high in C3 subtypes, whereas RB1 mutations were prominent in C1/C2 subtypes. The prevalence of TP53 and PIK3CA gene mutations was similar among the three molecular subtypes (Fig. 6A). Neoantigens and tumor mutation burden (TMB) between IRGclusters were similar among the three molecular types (C1/C2/C3) (Fig. 6B). There was no correlation between gSig score and TMB (Fig. 6C-E). These findings suggest that hot spot mutations of some important high frequency mutated genes may be associated with the prognosis of different IRGclusters compared to the overall tumor mutation burden.