Expressions of m6A methylation regulatory genes and identification of their lncRNA
Two sets of gene expression matrices for OS were downloaded from the GEO database (GSE19276, GSE99671). The GSE19276 data set includes sequencing data of 41 cases of OS after chemotherapy. The differential expressions of m6A methylation regulatory genes in good and poor responses to chemotherapy are shown in Figure 2a. The HNRNPA2B1 and RBM15 genes are differ between the good and poor chemotherapy groups. The GSE99671 data set includes sequencing data of 36 metastatic and non-metastatic patients. The expression levels of m6A methylation regulatory genes are shown in Figure 2b. The IGFBP1, YTHDF3, ALKBH5 and YTHDF1 genes are differ between the metastatic and non-metastatic groups. At the same time, we obtained 14,589 lncRNAs from the TARGET database. Then, the expression data of 23 m6A methylation regulatory genes was extracted from the OS dataset in the TARGET database. When lncRNA is significantly associated with one or more m6A methylation regulatory genes, it is the m6A-related target lncRNA (correlation coefficient >0.5 and P <0.05). At the end, we obtained 706 lncRNAs from the TARGET database, and the co-expression network of m6A methylation regulatory genes and related lncRNAs are showed in Figure 2c. The workflow chart is shown as in Fig. 1.
Consensus Clustering Identified Two Clusters of OS
The above research results confirmed that m6A methylation regulatory genes were changed in the metastasis of OS due to chemotherapy. However, how m6A methylation modulates lncRNA to regulate the occurrence and development of OS? We tried to further explore it. We found that, in total, 706 lncRNAs are related to m6A methylation regulatory genes and 26 lncRNAs related to the prognosis of OS were screened using a univariate Cox regression analysis combined with the clinical data of the patients (RP11-79N23.1, AL591893.1, CTB-193M12.5, RBM26-AS1, RP11-1017G21.5, LINC01534, RP11-180M15.7, CTC-444N24.8, RP1-92O14.6, SLX1A-SULT1A3 , SNHG6, LINC01355, RP11-2B6.2, RP11-876N24.5, YEATS2- AS1, TIPARP-AS1, CTC-268N12.3, CTD-2303H24.2, KB-431C1.5, NPTN-IT1, RP11-794A8.1, CTC-428H11.2, RP11-24N18.1, RP11-540B6.6, MIR210HG, and CTC-360G5.9), as shown in Figure 3a. In order to explore the role of m6A RNA methylation regulatory genes in OS metastasis, the consensus clustering method was used to divide patients with OS into subgroups based on the expressions of 26 prognostic-related lncRNAs. When k = 2–9, the cumulative distribution function (CDF) of consensus clustering and the increment of AUC are shown in Supplementary Figures 1a and 1b. According to the similarity of m6A-related prognostic lncRNA expression levels, it is found that k = 2 has the best cluster stability between k = 2 and 9 (shown in Figure 3b). Through a Kaplan–Meier survival analysis, it was identified that there are significant differences between the two subgroups (shown in Figure 3c), and suggested that changes in m6A lncRNA expressions affect OS prognosis.
Analysis of immune cell infiltration in OS
The immune microenvironment also affects the occurrence and development of tumors. We found that the immune score (Figure 4a, P = 0.02), stromal score (Figure 4b, P = 0.027), and estimate score (Figure 4c, P = 0.015) were higher in cluster 1 than the cluster 2. The abundance of 22 immune cells in cluster 1 and cluster 2 is shown in Figure 4d, indicating cluster 1 has higher T cells follicular helpers (P = 0.031) and compared to cluster 2. After, the mRNA expression level of each subtype immune checkpoint were tested. We found that compared to cluster 2, HAVCR2, LAG3, and PDCD1 were highly expressed in cluster 1, while the SIGLEC15 was low in expression, as shown in Figure 4e.
Development of a Prognostic Signature
Based on 26 m6A-related prognostic lncRNAs that were from the results single-factor Cox regression analysis, the prognostic signature was constructed for OS patients by using Lasso Cox analysis. We get that the model has the good predictive ability when the prognostic model consists of 12 lncRNAs, and the coefficient and partial likelihood deviance of the prognostic signature are shown in Figures 5(a) and 5(b). The risk score can be obtained by the following calculation formula: -0.0606×RP11-1017G21.5 expression +0.24817×LINC01534 expression-0.3080× CTC-444N24.8 expression-0.1920×RP1- 92O14.6 expression+0.0007×SNHG6 expression-0.2496×RP11-876N24.5 expression +0.2154×TIPARP-AS1 expression+ 0.4944×CTC-268N12.3 expression-0.1731×CTC- 428H11.2 expression+ 0.0343× RP11-24N18.1 expression+0.1530RP11-540B6.6 expression+0.0027× MIR210HG expression. Then, the patients with OS from the TARGET database were randomly divided into two groups (training set and a test set). In each group, OS patients were further divided into two data sets (high-risk and low-risk groups) based on the risk score using the median as the cut-off point. It can be found that there are significant differences between the high-risk and low-risk groups in the survival time of OS patients, and the OS patients in the low-risk group live longer from Figure 5(c) and Figure 5(d). The AUC in the training cohort was 0.871 (Figure 5e ), and the AUC in the test cohort was 0.870 (Figure 5f), it was showed that the prognostic model is robust in predicting the prognostic survival time of patients with OS.
To explore further the factors affecting OS prognosis, we explored the impact of gender and race on the prognosis of patients with OS combined with other clinical data. We found that the risk score was associated with the prognosis of OS in both sets (Figure 6a, 6b both P <0.05), but gender and race were not associated with the prognosis of OS. In addition, in the training and testing sets, the patients of OS in the high-risk group appear to was worse than the low-risk group (Figure 6e, 6f). LINC01534 and RP11-24N18.1 were all highly expressed in the high-risk group, while CTC-444N24.8 was all highly expressed in the low-risk group (Figure 6g, 6h).
GSEA and Risk Score Associated with Immune Infiltration
Based on the differential genes of the high-risk and low-risk groups, we found that in the high-risk group, the Linoleic acid metabolism and the glycine, serine and threonine metabolism pathway were mainly involved through GSEA (P <0.05), as shown in Figures 7a and 7b. In addition, we also found that two types of immune cells are associated with risk scores. As shown in Figures 7c and 7d, the abundance of activated mast cells (P = 0.024) and naive CD4 T cells (P = 0.0044) was positively associated with the risk score.