The relationship between the aberrant expression of RNA:m5C methyltransferases and clinicopathological characteristics of gliomas
m5C modifications of RNA play an important role in the occurrence and progression of tumors and influence many biological functions. We comprehensively studied the relationship between each RNA:m5C methyltransferase and the clinicopathological characteristics of gliomas, for example, WHO grades and isocitrate dehydrogenase (IDH)-mutant status. The relationship between aberrant expression of RNA:m5C methyltransferases and WHO grades are shown using heatmaps (Fig. 1a, b). As shown in the heatmaps, the expression of almost all RNA:m5C methyltransferases was significantly correlated with WHO grade. The significant differentially expressed RNA:m5C methyltransferases include NOP2, NSUN2, NSUN4, NSUN5, NSUN6, and NSUN7. Thereafter we quantitatively analyzed the expression of these differentially expressed genes in the CGGA dataset (Fig. 1c) and verified our findings using the TCGA dataset (Fig. 1d). As shown in the figure, NOP2, NSUN2, NSUN4, NSUN5, and NSUN7 were up-regulated and NSUN6 was down-regulated with an increase of WHO grade.
Mutation of the IDH gene has been reported in gliomas, particularly in low-grade gliomas (LGGs) and the prognostic value of the mutation has been confirmed by many authors in the literature. Based on these findings we investigated the relationship between aberrant expression of RNA:m5C methyltransferases and IDH-mutant status in LGGs. The results showed different expression of NSUN3, NSUN4, NSUN5, NSUN6, and NSUN7 between IDH-mutant and IDH-wildtype status in both CGGA (Fig. 1e) and TCGA (Fig. 1f) datasets. In patients harboring IDH mutants, the expression of NSUN4, NSUN5, and NSUN7 increased, while the expression of NSUN3 and NUSN6 decreased. We also studied the expression of RNA:m5c methyltransferases in glioblastomas stratified by IDH mutant status, and findings indicated that NSUN5, NSUN6, and NSUN7 were still differentially expressed. (Additional file 1: Figure S1A, B).
In addition, we predicted the gene mutation frequencies of seven RNA:m5c methyltransferases in the cBioPortal of the Cancer Genomics Database and verified these findings using the TCGA dataset resulting in that mutations of these RNA:m5c methyltransferases were rare (Additional file 1: Figure S1C; Additional file 5: Table S4). Even for the top-ranked RNA:m5c methyltransferases like NOP2, the mutation frequencies were only 3%. It demonstrated that the aberrant expression of RNA:m5C methyltransferases may not be generated by genetic mutation.
Interaction and unsupervised consensus analysis of these RNA:m5C methyltransferases
To investigate the close connection between RNA:m5C methyltransferases and clinicopathological features of gliomas, we systematically investigated the function, interaction, and correlation of RNA:m5C methyltransferases. We found that all the RNA:m5C methyltransferase genes were involved in various types of methylation, among which NOP2, NSUN3, NSUN4, and NSUN5 were mainly involved in rRNA methylation and NSUN2, NSUN6, as well as NSUN3 participated in tRNA methylation. Their function and interaction were supported by text mining and co-expression (Fig. 2a). Thereafter, a Pearson correlation analysis was performed to study the expression profile of seven RNA:m5C methyltransferases in the CGGA (Fig. 2b) and TCGA (Additional file 2: Figure S2A) datasets. The expression of NOP2, NSUN4, and NSUN7 seemed to be closely linked. While the expression of NSUN6 was significantly negatively correlated with NSUN4, NSUN5 as well as NSUN7, and the expression of other genes were positively correlated with each other. These results were consistent with the quantitative analysis of RNA:m5C methyltransferases expression profile in gliomas.
Based on the RNA:m5C methyltransferase expression profiles of 306 patients with gliomas in the CGGA dataset, we used unsupervised consensus clustering analysis to identify three subtypes namely MC1, MC2, and MC3 (Fig. 2c). Clearly, k=3 seemed to be a relatively stable distinction of the samples in the CGGA dataset with clustering stability increasing from k=2 to k=9 (Additional file 2: Figure S2B-E). Furthermore, principal component analysis (PCA) was used to compare the transcription profiles of these three subgroups. The results showed that they could be adequately divided into three distinct clusters (Fig. 2e). Next, we investigated the relationship between the RNA:m5C methyltransferase expression profiles of these three subtypes and clinicopathological features of gliomas (Fig. 2d). In these three subtypes, MC2 subtype compared to MC3 subtype and MC3 subtype compared to MC1 subtype were significantly correlated with a higher grade (P<0.0001), IDH-wildtype status (P<0.0001), 1p/19q-noncodel status (P<0.0001), older average age at diagnosis (P<0.0001) and receipt of additional chemotherapy (P<0.01) (Additional file 5: Table S2). Moreover, we found significant differences in overall survival between the three groups (P<0.0001). The survival of patients who fell into the MC2 subtype was obviously shorter than for the other two subtypes (Fig. 2f). The results above indicating that consensus clustering of RNA:m5C methyltransferases could identify subtypes with different clinicopathological features and prognosis in gliomas.
Functional annotation of the subtypes
To investigate the different clinicopathological features and overall survival rates of the three groups in gliomas, we annotated the biological processes of specific genes associated with the MC2 subtypes. A comparison was performed with the other two groups, in which 664 genes were up-regulated (log FC>1.5, normalized P-value<0.01 and FDR=0.05) and 645 genes were down-regulated (log FC<-1.5 and normalized P-value <0.01, FDR=0.05) in MC2 relative to MC1 and MC3 subtypes. GO analysis of the up-regulated genes revealed that “extracellular matrix organization”, “vasculature development”, “epithelial cell proliferation”, “cell-substrate adhesion” and “cellular response to tumor necrosis factor” were enriched in biological processes and pathways which might be highly correlated with malignant progression of gliomas. Top 20 significantly enriched biological processes were shown in the figure (Fig. 3a). The KEGG pathway analysis further revealed that these genes were also notably associated with tumor-relevant signaling pathways, for example, ECM-receptor interaction, Jak-STAT signaling pathway, and P53 signaling pathway, amongst others (Fig. 3b).
Moreover, GSEA was performed to investigate the hallmarks of tumors in the MC2 subtype. The results indicated that the tumor hallmarks, such as, P53 pathway (NES=1.720, P-value=0.013),P13K/AKT/mTOR signaling (NES=1.811, P-value=0.008), DNA repair (NES=2.050, P-value<0.001), and MTORC1 signaling ((NES=1.894, P-value=0.006) (Fig. 3c, d) were enriched in MC2 subtype. Combined with the above analysis, the subtypes identified by RNA:m5c methyltransferases were significantly associated with the malignant progression of gliomas.
Prognostic value of RNA:m5C methyltransferases and construction of the risk score model by five RNA:m5c methyltransferase genes
Based on the relationship between RNA:m5C methyltransferases and malignant progression of gliomas, we further attempted to explore the prognostic role of RNA:m5C methyltransferases in gliomas by a univariate survival analysis by Cox proportional hazards models of expression levels in the CGGA dataset, which was set up as a training dataset. We obtained six genes associated with prognosis (P<0.01), among which NOP2, NSUN2, NSUN4, NSUN5, and NSUN7 acted as risk factors (HR>1), and NSUN6 played a protective role (HR<1) in gliomas (Fig. 4a). To improve the robustness of the six RNA:m5C methyltransferases, these genes were selected to conduct an additional analysis by the LASSO Cox regression algorithm in the CGGA dataset (Fig. 4b). Five genes of RNA:m5C methyltransferase genes and coefficients (Fig. 4c) were screened to construct the risk score model, and the formula for the risk score is as follows: 0.884 (expression value of NOP2) + 1.167 (expression value of NSUN4) + 0.190 (expression value of NSUN5) + (-0.161) (expression value of NSUN6) + 0.014 (expression value of NSUN7) both in the training dataset (CGGA) and the verification dataset (TCGA). In this risk score model, four genes (NOP2, NSUN4, NSUN5, and NSUN7) were cancer-promoting and NSUN6 was a cancer-suppressor gene. To better understand the role of these five prognostic genes in glioma, the K-M analysis was performed both in the CGGA dataset and TCGA dataset in which samples were classified by high or low expression according to the median gene expression level. All five of these RNA:m5C methyltransferase genes were significantly correlated with OS (P<0.0001) (Fig. 4d; Additional file 3: Figure S3A). Moreover, we acquired the immunohistochemistry pathological specimens of the five RNA:m5C methyltransferases from the website of The Human Protein Atlas, and these samples were taken from patients with a similar age and gender. We found that the protein expression of NOP2, NSUN4, NSUN5, NSUN7 in high-grade glioma tissues were much higher than those in low-grade glioma tissues, while NSUN6 was reversed (Fig. 4e). The above results demonstrate the prognostic value of these five genes in glioma.
The power of the prognostic value of the risk score model in gliomas
To obtain the predictive effect of the risk score model for clinical outcomes in patients with gliomas, the median of all patient’ scores was used as a standard, and the data were divided into high- and low-risk groups both in the CGGA and TCGA datasets. The analyses indicated that the number of patients who died increased significantly as the risk score increased (Fig 5a, d). In addition, there was a significant difference in overall survival (P<0.0001) between the high-risk group and low-risk group (Fig. 5b, e). Thereafter, the ROC curve analyses at 1-year, 3-years, and 5-years for prognostic risk scores were performed to test the predictive efficiency of the risk model. The results showed that the risk score had high accuracy (the area under the curve (AUC) of all results in the ROC curve were >0.750) in distinguishing the OS of gliomas (Fig. 5c, f). Moreover, we performed further studies on the prognostic value of this signature in glioma patients stratified by WHO grade and IDH-mutant status. The results revealed that the risk score model could be used to divide glioma patients, in the CGGA dataset, into two distinct prognosis groups with different WHO grade subtypes (LGG and GBM) and IDH-mutant status subtypes (Fig. 5g-j). The similar results of the risk score model were also obtained using the TCGA dataset (Additional file 3: Figure S3B-E). From the comprehensive analyses above, we concluded that the prognostic efficacy of the risk score was accurate and stable.
The interrelation of the risk scores and clinicopathological characteristics in patients with gliomas
The expression of the five screened RNA:m5C methyltransferases in low- and high-risk patients in the CGGA dataset are represented by heatmaps (Fig. 6a). We found statistically significant differences between the low-risk and high-risk groups both in the CGGA and TCGA datasets, based on WHO grade (P<0.0001), histology (P<0.0001), IDH status (P<0.0001), 1p/19q status (P<0.0001), age (P<0.0001), and receipt of additional chemotherapy (P<0.0001) (Additional file 5: Table S3). Thereafter, we explored the relationship between the risk scores and each clinicopathological characteristic. As shown in the figure, the risk scores were significantly different in these groups with the CGGA dataset (Fig. 6b) compartmentalized by WHO grade, IDH status, 1p/19q status, histology, age and receipt of additional chemotherapy and were verified in the TCGA dataset (Additional file 4: Figure S4A-G). Moreover, considering the importance of reported glioma-associated driver-gene alterations in glioma initiation and progression, including ATRX, P53 pathway (TP53, MDM2, MDM4), RB pathway (CDK4, CDK6, CCND2, CDKNA/B, RB1) and P13K/RTK pathway (PIK3CA, PIK3R1, PTEN, EGFR, PDGFRA, NF1), we obtained the somatic mutation data from the TCGA database. The mutational landscape of tumor driver-gene alterations between low- and high-risk patients with gliomas was rendered as a waterfall plot (Fig. 6c). The high-risk patients were characterized by more frequent alterations in tumor driver-genes than low-risk patients. This means high-risk patients harbored more progressive cancer. Combining the above results, the risk scores can predict not only the overall survival but also clinicopathological features.
Next, we investigated whether this risk score was an independent prognostic factor based on eight clinicopathological features. Univariate and multivariate Cox regression analyses were performed with the CGGA dataset. We observed that the risk score, age, WHO grade, IDH status, 1p/19q codel status, chemotherapy status and radiotherapy status were significantly correlated with prognosis using the univariate analysis (Fig. 7a). Multivariate analysis based on the above factors was performed and the risk score remained strongly associated with the OS (P<0.001, Fig. 7b). In the verification dataset (TCGA), we obtained similar results, including the same factors in the multivariate analysis, the risk score also remained strongly associated with the OS (P=0.002, Fig. 7c, d). The consensus results demonstrated that the risk score constructed by RNA:m5C methyltransferases was a powerful and independent prognostic factor for glioma outcome. Based on this date, we explored the biological differences between the high-risk and low-risk groups. Three hundred and fifty-seven differentially up-regulated genes were obtained by variance and intersection analyses in the CGGA and TCGA datasets, which are represented by Venn diagrams (Additional file 4: Figure S4H). The GO analysis of the up-regulated genes revealed that “cell division”, “angiogenesis”, “cell adhesion”, “cell proliferation” and “extracellular matrix organization” were enriched in high-risk glioma patients (Fig. 7e). The KEGG pathway analysis further revealed that these genes were also notably associated with relevant signaling pathways in ECM-receptor interaction, cell cycle, PI3K-Akt signaling pathway, and P53 signaling pathway, amongst other (Fig. 7f).