3.1 Analysis of m6A/m5C highly correlated lncRNAs
The flow chart of this study was present in Figure S1. By Spearman analysis, adjusting the correlation coefficient to 0.5, we found 190 correlated lncRNAs with potential prognostic value in the TCGA cohort and 95 correlated lncRNAs in the CGGA cohort (Figure S2A). The TCGA and CGGA m6A/m5C-related lncRNA data sets were intersected to obtain 14 highly correlated lncRNAs that are representative of the majority of cases (Figure S2A, Table S2). Then, we incorporated the intersecting lncRNAs and related m6A/m5C genes to produce co-expression network maps by Cytoscape software (Fig. 1A). By joining the normal sample data in the GTEx database, we found that these 14 highly correlated lncRNAs were differentially expressed in healthy individuals and LGG cases (Fig. 1B).
3.2 Identification of two clusters of LGG cases by consensus clustering and survival analysis
The LGG cases in the TCGA cohort were classified into two clusters by clustering analysis with the best K = 2 consensus matrix (Fig. 1C). The PCA computation method was performed to display the characteristics of the two clusters from the TCGA cohort (Fig. 1D) or CGGA cohort (Figure S2B), which were found to be visually distinguishable between these samples. Subsequently, we performed survival analysis on the two clusters of cases and found that the survival time of cluster 1 was significantly reduced compared with cluster 2, both in the TCGA (Fig. 1E) and in the CGGA cohort (Figure S2C).
3.3 Analysis of oncological differences between the two clusters of LGG cases
Through heat map visualization, we showed the differences in expression levels of 14 highly relevant lncRNAs and the clinicopathological features between the two clustered samples (Fig. 2A). Interestingly, we found that cases in cluster 1 had fewer IDH mutations, 1p19q co-deletions, and MGMT promoter methylation both in the TCGA (Fig. 2A, Table S3) and in the CGGA cohort (Figure S2D, Table S5). In addition, it was found that the ESTIMATE score, immune score, and stromal score of cluster 1 were higher compared with cluster 2 (Fig. 2A). Then, the clusters were then analyzed for differences in immune checkpoint expression levels, where we found that immune checkpoints, including CD276, CD40, IDO1, CTLA4, LAG3, PDCD1, TNFSF14 and et al., were significantly upregulated in cluster 1 (Fig. 2B). Furthermore, the two clusters were remarkably different in immune infiltrating B cells, CD4+ T cells, and macrophages by immune cell infiltration analysis with Cibersort and Timer methods (Fig. 2C, D and Figure S2E, F). By GSEA enrichment analysis, it was found that cluster 2 was highly enriched in the Allograft rejection, asthma, GABAergic synapse, graft-versus-host disease, and intestinal immune network for IgA production pathways, while cluster 1 was enriched in morphine addiction, nicotine addiction, and synaptic vesicle cycle pathways (Fig. 2E). Mutation landscape analysis showed that the highly mutated genes both in clusters 1 and 2 were mainly IDH1, TP53, and ATRX, while cluster 1 had a higher mutated EGFR and PTEN gene compared with cluster 2 in the TCGA cohort (Fig. 2F).
3.4 Constructing and evaluating the risk model by m6A/m5C-related lncRNAs
Firstly, LASSO regression analysis was performed to reduce the variables for risk model construction with 14 lncRNAs narrowed down to 8 lncRNAs (Fig. 3A-C). These 8 lncRNAs were combined to build our risk model: GDNF-AS1, HOXA-AS3, LINC00346, LINC00664, LINC00665, MIR155HG, NEAT1, RHPN1-AS1. Subsequently, we grouped patients according to risk scores and found a significantly lower probability of survival among patients in the high-risk group from the TCGA cohort (Fig. 3D, Table S4) and CGGA cohort (Fig. 3G, Table S5). Next, a K-M curve was applied to analyze the survival differences between the different risk groups, verifying that patients in the high-risk group had shorter survival times (Fig. 3E, H). Furthermore, we used time-ROC curve analysis to confirm that our model had promising predictive power in survival prediction for both the TCGA (AUC at 1/3/5years respectively: 0.86, 0.84 and 0.77, Fig. 3F) and CGGA cohorts (AUC at 1/3/5years respectively: 0.73, 0.76 and 0.76, Fig. 3I). The risk scores were in agreement with previous clustering analyses and provided good discriminatory functions in terms of tumor molecular characteristics, such as fewer IDH1 mutations, 1p19q co-deletion, and MGMT methylation in the high-risk group (Fig. 3J, Figure S4A).
3.5 The value of risk models compared with other clinical factors in applications
Univariate Cox regression analysis revealed that our risk score was as favorable a prognosis judgment as other commonly used clinical prognostic indicators such as grade, IDH status, 1p19q co-deletion, and MGMT promoter methylation (Fig. 4A, B). Then, through multivariate COX regression analysis, we verified that the risk scores took a better prognostic predictive power compared with molecular characteristics in different cohorts (Fig. 4C, D). Using ROC curves to analyze the degree of predictive accuracy, we observed that the risk scores possessed the greatest AUC values among the various clinically relevant prognostic factors in the TCGA cohort as well as CGGA cohorts (Fig. 4E, F). Visualization of our multivariate COX model through Nomogram to visually depict that our risk score model have a greater weight in the evaluation system (Fig. 4G, Figure S3A). Moreover, our time-ROC curve analysis (Fig. 4H, Figure S3B) as well as the calibration curves (Fig. 4I, Figure S3C) demonstrated the high accuracy of our prediction model in predicting survival at different time durations.
3.6 Risk model distinguishes immune landscapes and oncological characteristics of different risk groups
The Sankey diagram showed the correspondence between our subgroups, and the risk scores for prognostic prediction yielded similar results to the consensus clustering analysis (Fig. 5A). Then, analyzing the expression levels of immune checkpoints, CD276, CD274, CTLA4, ICOS, ICOSLG, TNFSF14, and PDCD1 were observed to be elevated in the high-risk group (Fig. 5B). In addition, the stromal score, immune score, and ESTIMATE score were all higher in the high-risk group as measured by ESTIMATE immune rating (Fig. 5C, Figure S4B). With immune infiltration cell analysis, we found that B cells, CD4 + T cells, macrophages, and myeloid-derived DC cells were significantly increased among the high-risk group (Fig. 5D, E, Figure S4C, D). Via a further correlation analysis, we verified that neutrophils, myeloid DC cells, and M2 macrophages were all positively correlated with risk scores (Fig. 5F, Figure S4E). Combining the tumor TMB analysis, we found that patients with high TMB had shorter survival times (Fig. 5G). Additionally, risk scores remained well discriminating among patients in TMB loading groups, suggesting that patients with high-risk scores had shorter survival times regardless of whether they were classified in the high or low TMB group (Fig. 5H). Considering that tumor behavior is highly associated with mutations in genetic background, it was found by CNV analysis that the high-risk group possessed more CNVs, and a higher G-score (Fig. 5I).
In addition, the expression level of each screened lncRNA as our risk model could be an individual indicator to predict the prognosis of LGG patients (Figure S5A). Mutation landscape analysis from different risk groups showed that low-risk group had a relatively higher mutation rate in IDH1(low-risk group vs. high-risk group: 93% vs. 60%), while high- risk group had a higher mutated EGFR, NF1 and PTEN gene compared with low-risk group (Figure S5B). GO analysis revealed that cell functions enriched by heterogeneous genes between the high-risk group and its counterpart were as follows: regionalization, skeletal system development, pattern specification process, collagen fibril organization, anterior/posterior pattern specification (Figure S5C, Table S6). KEGG analysis unveiled that the signal pathways enriched by diverse genes between the high-risk group and its counterpart were: complement and coagulation cascades, ECM-receptor interaction, phagosome, focal adhesion and hematopoietic cell lineage (Figure S5D).
3.7 Predictive value of risk scores in clinical treatment subgroups
The routine treatment of glioma includes TMZ therapy and radiotherapy. In both the TMZ-treated and non-TMZ-treated subgroups, high-risk patients had shorter survival times in the TCGA cohort (Fig. 6A). Similarly, the same findings could be obtained in the CGGA group, and the survival differences between high- and low-risk patients were more pronounced (Fig. 6B). Patients in the high-risk group had shorter survival times than those in the low-risk group when grouped according to the presence or absence of radiotherapy both in TCGA and CGGA cohorts (Fig. 6C, D).
3.8 In vitro validation of biological functions of risk model lncRNAs
LINC00664 was sorted from risk model lncRNAs to validate its biological function in U87 and U251 cell lines. Firstly, we downregulated the expression level of LINC00664 in U87 and U251 cell lines by small interfering RNA and confirmed that it could promote the proliferation of glioma cells with the CCK8 proliferation assay (Fig. 7A, B). And then, it was demonstrated by a trans-well assay that the downregulation of LINC00664 could significantly reduce the invasion ability both of U87 and U251 cells (Fig. 7C). Subsequently, it was reconfirmed by wound healing assay that downregulation of LINC00664 reduced the migration ability of glioma cell lines U87 and U251 (Fig. 7D).