Identification of 146 DEMDGs in RCC
The research process of the study was showed in Fig. 1. Based on Wilcoxon rank test, a total of 5198 DEGs were selected from 28 RNA-seq transcriptome profiling of normal samples in GTEx dataset and 1021 RNA-seq transcriptome profiling (893 RCC samples and 128 normal samples) in TCGA dataset (FDR < 0.05 and |log2FC| > 2) (Table S1). The heatmap shows the expression of DEGs between RCC samples and normal samples (Fig. 2A). We screened the 270 methylation-driven genes (MDGs), whose methylation status negatively correlated with expression levels (Cor < −0.3, |log2FC| > 0 and adjust p < 0.05) (Table S2). The heatmap shows the expression of MDGs between RCC samples and normal samples (Fig. 2B). Then, 270 MDGs and 5198 DEGs were intersected to obtain 146 DEMDGs for further analysis (Fig. 2C). We further visualized the methylation levels (Fig. 2D) and gene expression levels (Fig. 2E) of 146 DEMDGs in RCC samples and normal samples. The comprehensive landscape of DEMDGs interactions and DEMDGs connection for RCC patients was depicted with the DEMDGs network (Fig. 2F). The above results found 146 significantly DEMDGs in RCC and normal samples. These abnormal DEMDGs were interconnected and may be involved in the occurrence and development of RCC.
Consensus clustering based on 146 DEMDGs and immune status analysis of clusters
To select optimized cluster number, we calculated k-means clustering algorithm with the ConsensusClusterPlus R packet. K = 2 was identified with optimal clustering stability (Fig. 3A-D). Then, we analyzed the methylation levels and gene expression levels of 146 DEMDGs, as well as the clinical features of paired patients. There were significant differences in the methylation levels and gene expression levels between cluster 1 and cluster 2, and the clinical features were evenly distributed in two clusters (Fig. 3E-F). The RCC patients in cluster 2 (n = 419) had a better overall survival (OS) than the patients in cluster 1 (n = 435, p < 0.001) (Fig. 3G).
Immune checkpoint inhibitors (ICIs) are administered for the treatment of RCC. We investigated whether the two clusters were related to ICI-related biomarkers. The results showed that cluster 1 was positively correlated with high expression of LAG3 (p < 0.001), CD160 (p < 0.001), HAVCR2 (p < 0.001), CTLA4 (p < 0.001) and TIGIT (p < 0.001), and the stromal score and immune score were significantly higher in cluster 1 compared with cluster 2 (p < 0.001) (Fig. 4A). The abundance of B cells naive (p < 0.001), T cells CD8 (p < 0.001), T cells CD4 memory activated (p < 0.001), T cells follicular helper (p < 0.001), T cells gamma delta (p < 0.001) and Macrophages M1 (p < 0.001) was significantly higher in cluster 1 compared with cluster 2 (Fig. 4B). The higher immune infiltration level corresponded to cluster 1, and lower immune infiltration level corresponded to cluster 2 (Fig. 4C). The RCC patients in high-immune score group had a worse OS than the patients in low-immune score group (p < 0.001) (Fig. 4D). Besides, we accessed the correlation of immune cells in cluster 1 and cluster 2. In cluster 1, the positive correlation between T cells CD8 and T cells follicular helper was the strongest, in which correlation coefficient was 0.55. The correlation coefficient between T cells CD8 and T cells CD4 memory resting was -0.66, the lowest negative correlation (Fig. 4E). However, in cluster 2, the cells with the strongest negative correlation were activated T cells CD8 and Macrophages M2, in which correlation coefficient was -0.46 (Fig. 4F). These results showed that the two clusters based on 146 DEMDGs were closely associated with prognosis and immune status in RCC patients.
To explore the possible reasons causing worse OS in cluster 1, we selected 106 DEGs from two clusters (Cor < −0.3 and |log2FC| > 1). We used heatmap visualizing the gene expression levels of 106 DEGs in RCC and normal samples (Fig. S1). We performed GO and KEGG analysis to analyze underlying functions and pathways of 106 DEGs (p < 0.05). Results of GO analysis were significantly enriched in regulation of T cell activation, T cell proliferation, positive regulation of leukocyte proliferation, positive regulation of T cell proliferation, positive regulation of cell-cell adhesion, etc. (Fig. 4G, S2A and S2B). Results of the KEGG pathways were significantly enriched in pathogenic Escherichia coli infection, natural killer cell mediated cytotoxicity, viral myocarditis, one carbon pool by folate, cytokine-cytokine receptor interaction, etc. (Fig. 4H). The above results indicated that the 106 DEGs were in close contact with immune microenvironment, which may be the causing for OS difference between two clusters.
Construction and evaluation of the prognostic model
To determine the prognostic value of 146 DEMDGs, univariate Cox regression analysis, LASSO and multivariate Cox regression analysis were used to identify them. Subsequently, the prognostic model was constructed based on 17 independent and prognostic DEMDGs (including TRIM4, TCF19, SH3BGR, PPP1R18, NMI, NCKAP1L, NAPSA, MYH14, MOB3A, FMNL1, FAXDC2, ESRP2, EMP3, CX3CL1, C1orf54, BST2, BDH1) (Fig. 5A-D). We analyzed the association between the gene expression and the survival of the patients. The patients’ OS with high-expression of TRIM4, TCF19, SH3BGR, PPP1R18, NMI, NAPSA, MYH14, MOB3A, FMNL1, FAXDC2, ESRP2, EMP3, CX3CL1, C1orf54, BST2 and BDH1 were worse than the low-expression group (p < 0.05) (Fig. S3). The methylation levels of 17 DEMDGs were inversely correlated with their expression level (p < 0.001) (Fig. S4). The risk score was calculated as follows = SH3BGR*(-0.0896) + BDH1*(-0.0667) + MOB3A*(-0.0244) + CX3CL1*(-0.0343) + NMI*(0.0516) + NAPSA*(-0.0068) + PPP1R18*(0.0375) + TCF19*(0.0231) + FMNL1*(0.0946) + C1orf54*(-0.0348) + FAXDC2*(-0.03550) + BST2*(0.0037) + MYH14*(-0.0410) + ESRP2*(-0.0836) + NCKAP1L*(-0.1905) + TRIM4*(-0.0840) + EMP3*(-0.0132). The coefficients of each gene were showed in Table 1.
RCC patients were split into high- and low- risk groups according to optimal cut-off value of the risk score (Fig. 5E). The AUC of the ROC curves was 0.788, 0.743 and 0.747 within 1-, 2-, 3-year, which demonstrated that risk score had a good prognostic value (Fig. 5F). The distributions of the risk score in high- and low-risk groups were showed in Fig. 5G. Patients’ mortality risk increased with increasing risk score (Fig. 5 H). The survival curve was carried out to assess the survival time between high- and low-risk score groups. The survival time of high-risk group was significantly worse than the low-risk group (p < 0.001) (Fig. 5I). RCC samples were clearly structured in two different groups by the principal component analysis, which suggested our study could significantly reflect the prognosis differences of RCC patients (Fig. 5J). To investigate the prognostic value of the signature for OS in RCC patients stratified by clinical features, RCC patients were stratified according to gender, grade, clinical stage, T stage, N stage and M stage. For all different stratifications, the OS time of the high-risk group was shorter than that of the low-risk group (p < 0.001) (Fig. 6). These results showed that the 17 DEMDGs for OS could accurately predict the prognosis of RCC patients and that the prognostic model could accurately reflect the survival of patients under different clinical features.
Clinical verification of prognostic model and enrichment analysis of prognostic model
We investigated the relationship between the risk score of RCC and several variables. We found that T stage (p < 0.001), N stage (p < 0.05), clinical stage (p < 0.0001), tumor grade levels (p < 0.05), cluster (p < 0.0001) and immune score (p < 0.01) were significantly associated to the risk score. The risk score level of the T1, T2 and N0 stage were clearly lower than that of the T3, T4 and N1, N2 stage (Fig. 7A-B). The higher level of clinical stage had higher level of risk score (Fig. 7C). The risk score increased with the grade of the tumor (Fig. 7D). The risk score of the cluster 1 was significantly higher than that of the cluster 2 (Fig. 7E). The low immune score group had lower risk score than high immune score group (Fig. 7F).
To further annotate functions enriched in the high- and low-risk groups, GSEA was queried to confirm the signaling pathways in which the genes are enriched. The results were represented in Fig. 7G-J (p < 0.05). The following biological processes were enriched in the high-risk group: regulation of DNA damage response signal transduction by p53 class mediator, chondrocyte development, negative regulation of gene expression epigenetic, positive regulation of leukocyte proliferation and cardiac epithelial to mesenchymal transition. The following signaling pathways were enriched in the high-risk group: cytokine-cytokine receptor interaction, vascular smooth muscle contraction, cell cycle, nod-like receptor signaling pathway, jak-stat signaling pathway. To clarify the possible molecular mechanism of 17 prognosis-related DEMDGs, we also performed GO and KEGG pathway analysis. Results of GO analysis were significantly enriched in negative regulation of cytokine production, response to interferon-gamma, cortical actin cytoskeleton organization, multivesicular body, SCAR complex, Rac GTPase binding, actin binding, etc. (Fig. 7K, S5A and S5B). Results of KEGG analysis were significantly enriched in synthesis and degradation of ketone bodies, pathogenic Escherichia coli infection, regulation of actin cytoskeleton, butanoate metabolism (Fig. 7L). These results suggested that the risk score was correlated with clinical features, cluster and immune score and that the above biologic functions and signaling pathways might impact RCC patients’ prognosis.
The predictive accuracy of prognostic model
In order to determine whether the risk score could be presented as an independent prognostic factor for RCC patients, we employed univariate and multivariate Cox proportional hazards regression analysis. In the univariate analysis and multivariate analysis, the risk score, age and M stage showed pronounced effects on the RCC prognosis (p < 0.05) (Fig. 8A-B). Beyond this, we constructed a nomogram with the significant variables in the multivariate analysis (Fig. 8C). Results suggested that risk score had the significant influence on survival prediction. The 1-, 2-, and 3-year predicted calibration curves also respectively suggested that the model had a good prediction accuracy (Fig. 8D-F). We also established a dynamic web-based survival rate calculator (http://127.0.0.1:7634/), which could individually predict the survival of patients according to their clinical features and risk score. For example, the 3-year cancer-specific survival rate was approximately 55% (95% CI 42–72%) for patients with low risk, M0 stage and aged < 65 years (Fig. 8G-H).
Prognostic model correlated with tumor-infiltrating immune cells and drug susceptibility
To further analyze the relationship between prognostic model and tumor-infiltrating immune cells, we performed a detailed spearman correlation analysis and the result was presented with the lollipop shape (Fig. 9A). High-risk group were more positively correlated with tumor-infiltrating immune cells, including CD8+ T cells, Macrophage M1, B cell , monocytes, Myeloid dendritic cell, T cell regulatory and myeloid dendritic cells, etc. The detailed results were showed in Table S3. We also attempted to identify associations between prognistic model and the efficacy of six common chemotherapeutic drugs for the treatment of RCC. The high-risk score was associated with the lower half maximal inhibitory concentration (IC50) of chemotherapeutics such as Temsirolimus (p = 1.1e-09), Sunitinib (p < 2.22e-16), Pazopanib (p = 0.0041), Doxorubicin (p = 0.0054), Bleomycin (p = 0.012), Axitinib (p = 7.1e−06) (Fig. 9B-G). The above results showed that this model closely correlated with tumor-infiltrating immune cells and drug susceptibility.