1. Identification of m6A-related genes
Based on our workflow (Fig. 1), we obtained 6,803 m6A-related and referred to as GeneSet 1. These genes intersected with 5,220 DEGs from the TCGA database, and 1,291 differentially expressed m6A-related genes were screened for GO and KEGG analysis. In the KEGG and GO analysis, we extracted a total of 5,642 genes from the significantly enriched pathways, which were referred to as GeneSet 2. In addition, we also clustered 1,291 differentially expressed m6A-related genes, and finally obtained two sets of samples, and screened out 644 DEGs between them, which were referred to as GeneSet 3. The three gene sets contained a total of 10,893 genes (Fig. 2A). The screening process is shown in Supplementary Fig. 1.
2. 33 Key Genes Were Identified By Wgcna Analysis
To obtain key genes related to m6A modification, we performed WGCNA on 10,893 genes and finally obtained 18 modules (Fig. 2B-C). Then, the GS value of each module was calculated, with a larger GS indicating that the module was more related to the phenotypic characteristics of the sample (Fig. 2D). We calculated the Pearson correlation coefficient between each module and the phenotypic characteristics of the sample (Fig. 2E).
According to the results, we identified two key modules, greenyellow and brown, and selected the genes in these two modules for subsequent analysis. We constructed a protein-protein interaction (PPI) network based on two modules, and 105 hub genes were screened in the network (Supplementary Fig. 2A). At the same time, 185 hub genes were screened in these two modules according to the thresholds of MM > 0.6 and GS > 0.2 (Supplementary Fig. 2B-C). The intersection contains 33 genes, which are considered to be the key genes related to m6A (Fig. 2F).
3. Construction Of The M6a-gpi In The Tcga Dataset
Univariate Cox analysis was performed on 33 key genes, and the results showed that 10 genes (IL12RB1, IL2RB, IFNG, FASLG, CXCL9, CXCL13, GBP2, CXCL10, CXCR6, and CIITA) had a significant relationship with the prognosis of CRC (P < 0.05). The forest plot is presented (Fig. 3A). To further determine the genes used to construct m6A-GPI, LASSO analysis was performed to identify the 7 most important genes and their coefficients (IL12RB1, FASLG, CXCL13, GBP2, CXCL10, CXCR6, and CIITA) (Fig. 3B-C), we utilized m6A-GPI to determine the patient’s risk score and divided all patients into a high-risk group and a low-risk group based on the median risk score. (Fig. 3D).
Compared with patients with high-risk scores, lower risk scores represent better DFS and a relatively longer survival time in the K-M curves (P < 0.05) (Fig. 3E). At the same time, a ROC curve was used to test the accuracy of m6A-GPI in predicting patient survival. The AUC of the 1-, 3-, and 5-year DFS rates reach 0.66, 0.67 and 0.65, respectively (Fig. 3F), which indicated that m6A-GPI has the potential to predict the DFS of patients in the TCGA cohort.
4. Validation Of The M6a-gpi
Generally, the pathological stage is of great significance to the prognosis of CRC [18], but other factors such as age and gender can also affect the prognosis. Therefore, we tested the stability of m6A-GPI in different clinical characteristics. In the stratified samples, the results showed that the high- and low-risk groups still had significant survival differences after distinguishing age, sex, and stage (P < 0.05) (Fig. 4A-F), which indicates that the m6A-GPI has good stability in stratified samples.
The external dataset was obtained from the GSE17538 cohort, and we used the same formula to calculate the risk score of the patients in this cohort. Similarly, patients were divided into high- and low-risk groups (Fig. 4G). Patients with higher risk scores had poor DFS in the GEO cohort (P < 0.05), which is consistent with the previous analysis of the TCGA cohort (Fig. 4H).
Furthermore, we compared the risk score in the clinical characteristics of the TCGA cohort (age, cancer type, sex, stage and cancer site), and the results showed that the risk score was significantly different in stages and cancer sites (P < 0.05) (Fig. 5A-E). As the stage increases, the risk score has an upward trend, and the risk score of left colon cancer is higher than that of right colon cancer.
5. The Molecular And Mutation Characteristics Of Different M6a-gpi Groups
We compared some potential factors that determine tumor immunogenicity in two subgroups, and the results indicated that the risk score was positively correlated with HRD, CNA, and mRNAsi (Fig. 6). HRD mainly include loss of heterozygosity (LOH), telomere allele imbalance (TAI), and large-scale transition (LST). These three indicators can be used to determine the genomic instability score (GIS) and then evaluate the HRD status. CNA, LOH, TAI, and LST represent the level of chromosome instability [19]. The mRNAsi is an index that can assess the similarity between tumor cells and stem cells and is related to the active biological processes in stem cells and the high degree of tumor dedifferentiation [20].
In addition, we used the “maftools” R package to analyze the distribution of somatic mutations between two subgroups in the TCGA cohort. Then we sorted the genes according to the mutation rate and identified the genes with the highest mutation rate in two groups. The mutation rates of APC, TP53, KRAS, TTN, MUC16, PIK3CA, SYNE1, FAT4, OBSCN, and MUC4 were higher than 20% in both groups (Fig. 7A-B). After we grouped samples according to the risk score, the high-risk samples showed significant amplifications on chromosomes 8, 11, 12, 17, and 20, while deletions were found on chromosomes 1, 3 to 6, 8, 10, 12, 16 to 20. However, the low-risk samples showed significant amplifications on chromosomes 5, 6, 8, 10, 12, 13, 16, 17, 19, and 20, while deletions were found on chromosomes 1, 3 to 8, 10, 15 to 22 (Fig. 7C-D).
6. Immune Characteristics Of Different M6a-gpi Groups
The effect of tumor treatment depends not only on the tumor immunogenicity of the tumor but also on the TIME. TIME is formed by various cells, including immune cells (such as T cells, Tregs, NK cells, and macrophages), endothelial cells, and inflammatory mediators [21]. The role of immune cells is particularly important, and it may affect the patient's response to treatment. To compare the distribution of immune cells in m6A-GPI subgroups, we analyzed the relative proportions of immune cells between the two m6A-GPI subgroups. Compared with low-risk patients, high-risk patients showed more infiltration of NK cells, M0 macrophages, and mast cells. However, in this group, there were fewer CD4 + T cells, CD8 + T cells, M1 macrophages, and M2 macrophages (Fig. 8A).
Then we explored the level of the stromal score, immune score and tumor purity among the two groups, and the results showed that the high-risk group had higher tumor purity and that the low-risk group had a higher stromal score and immune score (Fig. 8B-D). It suggests that the high-risk patients had a higher proportion of cancer cells in the tissue, and the TIME of the low-risk group contains contained abundant immune or matrix components.
7. Independent Prognostic Factor And Nomogram
By using univariate Cox regression analysis and multivariate Cox regression analysis, we sought to determine whether m6A-GPI was an independent prognostic factor for patients with CRC. Univariate Cox analysis showed that m6A-GPI was closely related to the prognosis of CRC [hazard ratio (HR) = 3.041, 90% CI: 2.06–4.5, P < 0.001]. Multivariate Cox analysis further showed that m6A-GPI can be used as an independent predictor in CRC [hazard ratio (HR) = 2.4, 90% CI: 1.6–3.59, P < 0.001] (Fig. 9A). At the same time, we constructed a nomogram and calibration plots of DFS at 1-, 3-, and 5-year in the TCGA dataset (Fig. 9B-E), which provides doctors with a method to quantitatively predict the prognosis of CRC. The accuracy of prediction at 3 years can be increased to 0.75 after combining the risk score and clinical characteristics (Fig. 9F).
8. Validation Of M6a-related Genes Expression Levels In Crc Tissues
To further investigate the expression levels of m6A-related genes in CRC clinical tissues. We first detected the mRNA expression of m6A-related genes (IL12RB1, FASLG, CXCL13, GBP2, CXCL10, CXCR6, and CIITA). Notably, the expression of CIITA was substantially increased in CRC tissues (Fig. 10A). However, there was no significant difference of others genes between CRC and adjacent normal tissues (Fig. 10A). Subsequently, protein level of CIITA was detected in CRC and adjacent normal tissues. Consistent with mRNA levels, the protein level of CIITA was also obviously increased in CRC tissues (Fig. 10B).