Identification of m7G related genes and Construction the m7G Prognostic Signature
According to the MSigDB and previous literature, we sorted out 42 m7G-related mRNAs. With the use of the RNA-seq data of BLCA samples in TCGA, 1758 m7G-related RNAs were screened based on |Cor| > 0.4 and p-value < 0.001 (Fig. 2A). 231 m7G-related prognostic RNA included in the univariate Cox regression model and the top 50 were shown in Fig. 2B (p < 0.05). Then, the lasso Cox regression analysis was carried out for further screening in the training group. An 11-gene signature was acquired with the ideal λ value (Fig. 2C, D). Based on the following calculation, risk score was computed: risk score = (− 0:09660 * RAPGEF6 exp) + (-0.00036 * METTL7A exp) + (-0.01734 * TMC7 exp) + (0.00502 * TUBGCP5 exp) + (0.00524 * EIF3J exp) + (0.05632 * ASIC1 exp) + (-0.00595 * ATAT1 exp) + (0.00249 * PRPF19 exp) + (0.00123 * MORF4L1 exp) + (-0.57038 * LINC00310 exp) + (-0.00219 * MRPS6 exp). Base on median risk score, the training group was divided into high- and low set. The distributions of m7Gscore and survival status showed that the survival of the low-risk group was significantly higher than that of the high-risk group, and the Kaplan-Meier method supports this finding (Fig. 3A, B). The area under the curve (AUC) of the ROC curves were 0.720, 0.710 and 0.750, indicating that the m7Gscore performed well (Fig. 3C).
Validation of the m7G Prognostic Signature
To validate the model, the test group and the total cohort were analyzed using the same methods. The m7Gscore formula and cut point were the same as the training group. The distribution heatmap showed that the low-risk group had a longer survival time both in the test group and total cohort (Figs. 3D, G). Moreover, the OS of the low-risk patients was also significantly longer than that of the high-risk patients (Fig. 3E, test group: p < 0.001, HR = 2.26 (1.48–3.45); Fig. 3H, total cohort: p < 0.001, HR = 2.32 (1.72–3.13)). Ideal validation was provided by the ROC curves of the test group and total cohort with AUCs of 0.67, 0.63 and 0.61 (Fig. 3F) and 0.68, 0.65, and 0.67 (Fig. 3I).
Development and Evaluation of the Nomogram Model
We compared the clinical characteristics of the high and low-risk groups and presented them in the form of a heatmap (Fig. 4A). The result showed that papillary, lower grade and younger age were associated with significantly lower m7Gscore and better prognosis (Fig. 4B-E). After the univariate and multivariate Cox regression model was constructed, the m7Gscore, stage and age were confirmed as the independent prognostic factors (Fig. 4F, G). On the basis of important clinicopathological features and the m7Gscore, we have developed a nomogram model that can accurately predict OS in training group (Fig. 5A). The results of the calibration curve showed that the nomogram can effectively predict the 1-year, 3-year, and 5-year survival time of BCa patients (Figure S1A). The OS in the high-risk group was worse than that in the low-risk group by
Kaplan-Meier curves ((Figure S1B). The AUC values of ROC curves were 0.84 (1-year), 0.84 (3-year), and 0.85 (5-year) and the C-index was 0.76996 (95% CI: 0.72147–0.81845, p = 1.002E − 27) ((Figure S1C). Besides, compared with clinical features, the nomogram achieved the best clinical net benefit via DCA plot (Figure S1D).
The same method was used to construct the nomograms in the test set and in the total set (Fig. 5B, C). We used the test set and the total set to verify the effectiveness and accuracy of the model, and the calibration curves still showed a good fit (Figure S1E, I). Kaplan-Meier curves showed that patients in the high-risk group also significantly worse than those in the low-risk group, both in the test set and in the total cohort (Figure S1F, test set: HR = 2.94 (1.91–4.52), p < 0.001; Figure S1J, total cohort: HR = 3.34 (2.47–4.52), p < 0.001). The AUCs of the test set were 0.74, 0.66, and 0.63 (Figure S1G). In total cohort, the AUCs were 0.77, 0.74, and 0.74 (Figure S1K). The good predictive power of the model in both cohorts is shown by the DCA curves (Figure S1H, L).
Gene correlation and functional enrichment analysis
The correlation analysis of 11 m7G-related genes in BCa indicated that EIF3J was the center node significantly associated with other genes (Fig. 6A). Previous evidence suggests that eukaryotic translation initiation factors (EIFs) is an important oncogene and its multiple subunits are highly expressed in BCa and promote tumor proliferation and metastasis. In addition, five different gene sets were used to explore the underlying mechanisms of m7G (Figs. 6B-F). The results showed that the high m7Gscore phenotypes were significantly associated with several immune function and cancer pathways, including the antigen processing and presentation, CD8+ TCR downstream pathway, IL6/JAK/STAT3, MYC targets etc.
Tumor Immune Microenvironment of the m7Gscore-Defifined Groups
The ESTIMATE algorithm revealed that estimate, immune and stromal scores were significantly higher in the high-risk group than in the low-risk group, while tumor purity showed the opposite result (Fig. 7A). Figure 8B shows that high-risk patients have higher expression of CD4 + T cells, macrophages and eosinophils, but lower infiltration of B cells, plasma cells and regulatory T cells (Treg), suggesting that high-risk groups are more sensitive to immunotherapy. In addition, the high-risk group had higher scores for multiple immune functions, especially the expression of immune HLA family and checkpoints (Fig. 7C-E). These findings further suggest that immunotherapy may become an important treatment for patients in the high-risk group. To ensure the stability of these results, we analyzed the correlation between different tumor-infiltrating immune cells using several algorithms (Fig. 7F) and showed part of the correlation scatter diagram in Figure S2A-B.
Drug Sensitivity of the m7Gscore Defined Groups
To fully explore the potential of this signature in targeted drugs, the “pRRophetic” was used to predicted IC50 values of various drugs in the two groups. Consequently, patients in the high-risk group are less sensitive to chemotherapy drugs and bladder infusion drugs, such as Cisplatin, Gemcitabine, Docetaxel, Doxorubicin, Mitomycin.C, Camptothecin, Epothilone.B, Bleomycin, Vinblastine, paclitaxel (Fig. 8). In addition, there is evidence that the targeted drugs sorafenib and sunitinib may be more sensitive in the high-risk group.
Identification of m7G Related Patterns by Consensus Clustering
We categorized the TCGA cohort into distinct patterns with consensus clustering on the basis of the expression of the 11 hub genes (Fig. 9A). According to CDF and delta area, we divided the cohort into C1 and C2 groups (Fig. 9B, C). The result of 3dPCA revealed a substantial difference distribution in two patterns (Fig. 9D). The Sankey diagram showed that most samples of C1 belonged to the high-risk group, indicating a worse prognosis (Fig. 9E). Kaplan-Meier curves also found that the survival rate of the C1 group was significantly lower than that of the C2 group (Fig. 9F).
Tumor Immune Microenvironment of m7G Related Patterns
To discover the TME difference among the two patterns, we evaluated the TME score and found that C1 group have higher estimate score, immune score and tumor purity, but lower stromal score (Fig. 10A). We also looked at tumor-infiltrating immune cells and found that T cells, macrophages and DC cells were upregulated in C1 (Fig. 10B). Multiple immune functions were significantly higher in the C1 group than in the C2 group, similar to the M7G-related signature (Fig. 10C). The immune cell heatmap based on various algorithms showed that the level of immune cells in the C1 group was significantly higher than in the C2 group (Figure S2B). We further explore the HLA expression and BCa related immune checkpoint. The result showed the level of multiple HLA in C1 group was higher than C2 (Fig. 10D) and some checkpoint like PD-1, PD-L1, PD-L2 and TIGIT were higher in C1 group (Fig. 10E).