Identification of potential antigens of BLCA
The workflow was presented in Figure S1. 900 differentially overexpressed genes were screened out within a total of 1810 aberrantly expressed genes to identify potential antigens of BLCA (p < 0.05 & |log(FC)| >= 1) (Figure S2 A-B, SupplementaryTable1 (all differentially expressed genes) and SupplementaryTable2 (overexpressed genes)). It can be clearly found that the difference in the expression of over-expressed genes between cancer and normal samples. The chromosomes distribution of these genes was showed in Fig. 1A. We identified 15751 genes that mutated in TCGA-BLCA cohort and the top 30 mutated genes were showed in Fig. 1B. The altered genome fraction and mutation counts in individual samples were demonstrated in Fig. 1C-D. Of note, AHNAK2 and ARID1A were also the most frequently mutated genes with both altered genome fraction and mutation counts (Fig. 1E-F). In addition to AHNAK2 and ARID1A in top 10 candidates with altered genome fractions, ANK2, SAXL2, CHD2, CREBBP, CSMD3, DCC, DMD as well as DNAH5 all have the high mutation frequency (Fig. 1F). High mutation counts were also observed in FRGFR3, KDM6A, RYR2, TP53, TTN, KMT2D, SPTANI, ZFP36L1(Fig. 1E). Overall 764 genes, which were highly expressed and mutated in TCGA-BLCA cohort, were identified as potential candidate antigens (Fig. 2A, Figure S2f).
Then Gene Ontology (GO) analysis and KEGG、HALLMARK and REACTOME signaling pathway enrichment of these 764 genes were respectively shown in Fig. 2B and Figure S2C-E. The results indicated that these genes might involve in tumor-specific immune response and could be the potential candidates for mRNA vaccine development.
Identification of tumor antigens associated with BLCA prognosis and antigen presenting cells
To identify potential prognostic-related antigens, we performed univariate cox regression analysis on candidate genes based on overall survival (OS) in TCGA-BLCA cohort (P < 0.05), and finally found 102 prognostic-related potential antigens (Fig. 2A). We selected the 6 potential antigens, PTPN6, OAS1, PLA2G2F, ETV7, CHMP4C, and SPA4G, which were most associated with OS (Fig. 2C-D). We further used CIBERSORT algorithm to analyze the immune infiltration in TCGA-BLCA cohort. Then the spearman correlation between the expression of 6 potential antigens and B Cell, Macrophages, and Dendritic scores was detected. The final correlation score is distributed between 0.0011 (CHMP4C in B_cells) and 0.32 (CHMP4C in Macrophages), indicating that the correlation between potential antigens and immune infiltration scores is not evenly distributed, which also indicated the differences in immune levels of different genes. (Figure S3).
Immune subtypes identification
Tumor immune microenvironment might impact the efficacy of immunotherapy. And immune subtypes might be helpful to identify patients who can response to mRNA vaccination. We analyzed the transcriptional profiles of patients in TCGA-BLCA cohort and found 2323 immune-related genes. According to consensus clustering algorithm, by considering its cumulative distribution function and incremental area map, we chosed k = 3 for stable clustering of immune-related genes, and established 3 immune subtypes, named Cluster 1, Cluster 2, and Cluster 3, respectively(Figure 3A-C,3E). Principal component analysis showed the three subtypes were distinguished very well (Fig. 1D) with distinct OS. Cluster 3 associated with better prognosis, while Cluster 1 had the poorest survival probability (Fig. 3F). Subtype distribution across different tumor stages and grades indicated that patients diagnosed as differential stage were irregularly clustered And we could found that with the TNM staging progress, the proportion of Cluster 1 increases significantly (Figure S4). Taken together, the immunotyping could be used to predict prognosis of BLCA patients and its accuracy was superior to traditional grading and staging. We also validate the feasibility of this classification in the testing dataset GSE13507 (P = 0.014) (Fig. 3G).
The association of immune subtypes with mutational status
Previous studies demonstrated that tumor mutational burden (TMB) and mutation used to quantify the number of tumor antigens is closely correlated to immunotherapeutic efficacy, including mRNA vaccine(24). Therefore the TMB and mutations frequency of each immune subtypes were also analyzed(Fig. 4). Our results showed that there was no difference between groups regarding TMB, while there were significantly difference of mutation number among these three groups, and Cluster 1 was obviously higher than the other groups. The waterfall diagrams of different subtypes were showed in Fig. 4A-B. We also visualized the G-score scores of copy number variation in different subtypes (Fig. 4F) and found significant differences between each subtype. Furthermore we also obtained the immune activity scores of the samples from TIP (http://biocc.hrbmu.edu.cn/TIP/), and showed the distribution of Cluster 1, Cluster 2, and Cluster 3 scores in immune cells, and showed statistical significance (Fig. 5A). It can be found that the immune subtypes we have identified are significantly related to immune activity. And then we analyzed differences in homologous recombination defects (HRD), neoantigen load, chromosomal instability, and dryness index between subtypes (Figure S5). These findings suggest that the immune subtype can predict TMB mutation number in BLCA patients, and that patients of Cluster 1 may respond positively to the mRNA vaccine.
Immune subtypes with immune checkpoints (ICPs) and immune infiltrating cells
Previous studies indicated that both ICPs (such as PD-L) (10) and immunogenic cell death (ICD) modulators (such as HMGB1)(25) play critical roles in modulating the host anti-tumor immunity, which could influence the efficacy of mRNA vaccine. Therefore, the differential expression of ICPs and ICD modulators was assessed in these immune subtypes (Fig. 5B-C, 5F). Our results showed that, 43 (91.5%) of ICPs in TCGA cohort were differentially expressed among the immune subtypes. And the expression levels of almost all genes in Cluster 1 and Cluster 3, such as CD274, CTLA4, PDCD1, IDO1 and TIGIT, were lower than those in Cluster 2. Interestingly, some specific ICPs in Cluster 3 seemed to be lower than those in Cluster 1 such as BLTA, CD200, CD200R1, CD27, CD274, CD276, CD28, CD48, CD77, CD80, CD86, CTLA4, ICOS, IDO1, LAIR1, PDCD1, PDCD1LG2, TIGIT, TNFRSF18, TNFRSF4, TNFRSF9, TNFRSF14. This result indicated that Cluster 3 might also be a potentially effective population for mRNA vaccines despite lower levels of TMB. For immune cell deaths (ICDs), The expression level of 17 ICDs of all 25 kinds of ICDs were significantly different in three immune subtypes. The level of CALR, CXCL10, FPR1, HGF, IFNAR2, MET, P2RX7, TLR4, PANX1 and TLR3 were highest in Cluster 2 than Cluster 1 and Cluster 3, EIF2AK2, EIF2AK4, ANXA1 were overexpressed in Cluster 1. While EIF2A, IF2AK1were both overexpressed in Cluster 1 and Cluster 3. Moreover, we showed that Cluster 2 had markedly higher immune score, higher stromal score and lower tumor purity (Figure S6) than Cluster 1 and Cluster 3. The results showed that the subtype scan reflect the expression levels of ICPs and ICD regulators, and can be used as a potential therapeutic biomarker for mRNA vaccines. Also, considering the previous results of immune cell infiltration, Cluster 2 is an immune “hot” and immunosuppressive phenotype, while Cluster 1 and Cluster 3 were more likely to be immune “cold” phenotypes.
In a previous study, Thorsson et al. identified six immune categories (C1-C6) based on the immunogenomic analysis of more than 1000 tumor samples among 33 cancer types (26). These categories were significantly associated with prognosis, genetic, and immune-modulatory alterations in tumors. Thus, the distribution of six categories was also investigated in our study. A distinct distribution over Cluster 1, Cluster 2 and Cluster 3 was observed, and the individual immune categories substantially varied in their proportion in the three immune subtypes (Fig. 5E). Our result indicated that proportion of C2 in Cluster 2 was obviously higher than that in Cluster 1 and Cluster 3, while the proportion of C4 in cluster3 was significantly more than that of C4 in Cluster1 and Cluster2 and there was barely no C4 in Cluster 2. This result further validated the “hot” phenotype of Cluster 2 and the “cold” phenotype of Cluster 1 and Cluster 3. Therefore, mRNA vaccine administration in Cluster 1 and Cluster 3 might stimulate the immune response, namely turning “cold” tumor to “hot”.
Immune landscape of BLCA
Then we used the immune transcriptional profile of patients to construct immune landscape (Fig. 6A). As shown in Fig. 6B, Macrophages.M1 displayed the most obviously correlation among different subtypes. We found the distribution of samples in Cluster 1 and Cluster 2 was relatively discrete in immune landscape (Fig. 6A) and they can be further divided into two subgroups according to the location of immune cell populations (Fig. 6C). We then found the relative amount of immune cells were significantly differences among multiple subgroups (Fig. 6F-G). In addition, patients in group 5 had a higher survival probability when comparing with the samples in the extreme locations within the immune landscape (Fig. 6D-E). The immune landscape based on immune subtypes can accurately identify the immune components of BLCA patients and predict their prognosis, providing favorable conditions for personalized treatment options for mRNA vaccines.
Identification of immune gene co-expression modules of BLCA
Immune gene co-expression modules were identified by clustering the samples by WGCNA.Using the default parameters of the "WGCNA" R package with a soft threshold of 4 for scale-free network. The representation matrix was then converted to an adjacency and then to a topological matrix. We finally got 5 modules, of which the gray modules are not grouped together with other modules (Figure S7 and Fig. 6A). The survival analysis was also identified (Fig. 6B). It is found that the Turquoise module had significant prognostic efficacy, with differently expressed pathways such as cytokine pathway, natural killer meditated cytotoxicity pathway as well as antigen progressing and presentation pathway (Fig. 6C-D) .
Identification of differential expression genes (DEGs) between immune subtypes
To explore our immunotyping more deeply, we intersected the differentially expressed genes between distinct three clusters (p < 0.05) (Figure S7 D and SupplementaryTable3), and showed it in the heat map (Figure S7E).
Then, by intersecting the differential expression genes between subtypes (4505), potential antigen candidate genes (764) and prognostic module hub genes (518), we have finally defined 10 genes as key tumor antigen markers (Fig. 7A). We performed a univariate cox regression analysis on all key tumor antigen markers and screened 3 genes with P < 0.05 (Fig. 6B-C). Through LASSO regression analysis, we built a risk model (Fig. 2D). We found the expression of MMP9 and INHBA in the low-risk group was significantly lower than that in the high-risk group, while ERBB3 was significantly higher in low-risk group, which also confirmed their risk and provides a direction for clinical research (Fig. 2E). In addition, we also found that MMP9 is also a potential prognostic-related antigen, which is consistent with our previous study. It can be found that the risk scores are significantly different between subtypes, and the proportion of high-risk samples in Cluster 2 is relatively high, while the proportion of low-risk samples in Cluster 3 is relatively high (Fig. 2F). Taken together, we believed that ERBB3, MMP9 and INHBA were potential BLCA antigens biomarkers for mRNA vaccine development.