WGCNA identifies key modules
We combined GTEx and TCGA data to construct a co-expression network for all lincRNAs, using the R package ‘WGCNA’, and confirmed that the β value in the network was 4 (Figure 1A). Further, the dynamic tree cutting method was used to generate co-expression modules, and the closely related modules were merged into larger modules. Finally, 19 modules were generated in the lincRNA co-expression network (Figure 1B). The module eigengenes (MEs) of the turquoise module had the strongest correlation with GBM traits (Figure 1C). Figure 1D shows the correlation between gene significance and module membership in the turquoise module, which was considered a key module containing 2,206 lincRNAs.
We also constructed the co-expression networks of all mRNAs for GTEx and TCGA combined data and GEO combined data, comprising 20,270 and 14,807 mRNAs, respectively. In the mRNA co-expression networks of the two sets of data, β values were 8 and 4 (Figure 2A and Figure 3A), and 22 and 21 modules were generated (Figure 2B and Figure 3B), respectively. In the combined GTEx and TCGA data, the MEs of the blue and dark green modules have the strongest correlations with the tumour and normal traits, respectively (Figure 2C). The blue and dark green modules were considered key modules, comprising 7,736 mRNAs. In the GEO data, Figure 3C shows that the MEs of the blue and turquoise modules (comprising 8,022 mRNAs) have the strongest correlations with the tumour and normal traits, respectively. Figures 2D-E and 3D-E show the correlation between gene significance and module membership.
Identification of differentially expressed lincRNAs (DE lincRNAs) and mRNAs (DEmRNAs)
We identified DElincRNAs and DEmRNAs from the GTEx and TCGA combined data, including 163 up-regulated lincRNAs, 176 down-regulated lincRNAs, 2,953 up-regulated mRNAs, and 2,932 down-regulated mRNAs. From the GEO data, 2,434 up-regulated and 1,995 down-regulated mRNAs were identified. The two groups of mRNAs were then comprehensively analysed to obtain overlapping 1,040 down-regulated mRNAs and 1,358 up-regulated mRNAs. Finally, 339 DElincRNAs and 2,398 DEmRNAs were identified.
Preliminary construction of ceRNA network
Through WGCNA analysis of lincRNAs, we obtained 2,206 lincRNAs in the turquoise module and then integrated the analysis with 339 DElincRNAs to obtain 251 overlapping lincRNAs. Further, using the miRcode online tool, 251 lincRNAs were used to predict miRNAs. Additionally, using the miRDB, TargetScan, and miRTarBase datasets, the predicted miRNAs were used to obtain the corresponding mRNAs. By analysing the predicted mRNAs and DEmRNAs, we obtained 111 mRNAs (24 down-regulated and 87 up-regulated mRNAs). The predicted miRNAs and 251 lincRNAs were analysed using the 111 mRNAs, and 25 lincRNAs and 30 miRNAs were finally obtained. Subsequently, we used Cytoscape version 3.7.2 software to build a lincRNA-miRNA-mRNA ceRNA network (Figure 4).
Functional Enrichment Analysis
A functional enrichment analysis was performed on the 111 mRNAs obtained previously. The biological processes of enrichment were mainly cell cycle, epithelial cell proliferation, ossification, and sex differentiation (Figure 5A). The cellular components were concentrated in the transcription factor, protein kinase, and serine/threonine protein kinase complexes (Figure 5 B). Enriched molecular function was mainly involved in DNA-binding transcription activator activity, RNA polymerase II-specific core promoter binding, and HMG box domain binding (Figure 5 C). KEGG pathway analysis showed that the genes were associated with cellular senescence, cell cycle, multiple cancers, miRNA in cancer, and TGF-beta signalling pathway (Figure 5 D).
Construction of prognostic models
Among the CGGA GBM samples, complete survival information was available for 236 samples, and all clinical information was available for 198 samples. Univariate Cox regression analysis was performed on the 111 mRNAs previously obtained, and 27 mRNAs related to survival time were obtained (p<0.05) (Figure 6A). Multivariate Cox analysis was performed using 27 mRNAs, and a Cox proportional hazards regression model of GBM patients containing 13 mRNAs was constructed (Figure 6B). Based on the median risk score, all patients were divided into two groups (high-risk and low-risk groups). Figure 6C shows the survival status, survival time, and mRNA expression levels of patients. Survival analysis showed that patients in the low-risk group survived longer than those in the high-risk group (p<0.001) (Figure 6D). The univariate Cox regression analysis showed that recurrence (p=0.004), isocitrate dehydrogenase (IDH) mutation (p=0.009), 1p19q codeletion (p=0.014), and risk score (p=0.001) were predictors (Figure 6E). Moreover, multivariate Cox regression analysis confirmed that recurrence (p=0.002) and risk score (p<0.001) were independent risk factors (Figure 6F). The risk score and recurrence AUCs of the nomogram were 0.668 and 0.663, respectively (Figure 6G).
ceRNA network in GBM
The 13 mRNAs obtained through the previous multivariate COX regression analysis, as well as the 14 miRNAs and 23 lincRNAs interacting with them, were used to construct the final ceRNA network (Figure 7).
GSEA reveals the close relationship between lincRNA and GBM
Using | log2FC | ≥2 as a condition to screen 23 kinds of lincRNA, we obtained seven of them: MALA1, MEG3, NEAT1, MIR7-3HG, FAM95B1, EPB41L4A-AS1, and AC125494.1. Among them, extensive research has been conducted on MALAT1 and NEAT1. Thus, we selected MEG3, MIR7-3HG, FAM95B1, and EPB41L4A-AS1 in 248 CGGA samples for GSEA. As shown in Figure 8, the GSEA results revealed that these lincRNAs were closely related to cancer. MEG3 showed potential effects on prostate cancer, colorectal cancer, and cancer pathways (Figure 8A), MIR7-3HG demonstrated potential role in small cell lung cancer, prostate cancer, and cancer pathways (Figure 8B), FAM95B1 was closely related to endometrial cancer, non-small cell lung cancer, renal cell carcinoma, and MTOR signalling pathway (Figure 8C), while EPB41L4A-AS1 was associated with small cell lung cancer, prostate cancer, and JAK/STAT signalling pathway (Figure 8D). AC125494.1 is not found in the CGGA data.