Survival-associated Metabolic Genes in Colon and Rectal Cancers

Uncontrolled proliferation is the most prominent biological feature of tumors. To rapidly proliferate and maximize the use of available nutrients, tumor cells regulate their metabolic behavior and the expression of metabolism-related genes (MRGs). In this study, we aimed to construct prognosis models for colon and rectal cancers, using MRGs to indicate the prognoses of patients. We rst acquired the gene expression proles of colon and rectal cancers from the TCGA and GEO database, and utilized univariate Cox analysis, lasso regression, and multivariable cox analysis to identify MRGs for risk models. Then GSEA and KEGG functional enrichment analysis were utilized to identify the metabolism pathway of MRGs in the risk models and analyzed these genes comprehensively using GSCALite. rectal the present pictures immunohistochemistry stain. the relationship between ENPP2 expression level and survival with colon the relationship between ENPP2 expression level and survival of patients with rectal cancer.


Background
Colorectal cancer is a common digestive tract malignant tumor that poses a serious threat to human health 1 . Given that the early symptoms of colorectal cancer are atypical, con rmed cases are often in the middle and late stages of the disease. The development of surgical techniques and adjuvant chemoradiotherapy has improved the survival time of colorectal cancer patients to some extent, but the 5 year survival rates of patients with colorectal cancer remain extremely low because of chemical drug resistance and other reasons 2 . Therefore, biomarkers that can provide accurate prognosis and predict therapeutic effects are bene cial for patients with colon cancer. The prognoses of patients with colorectal cancer considerably vary, and thus establishing an accurate and effective prognosis model is necessary.
Such model can be useful in assessing patients' risk and in treatment selection 3 . The AJCC-TNM stage is currently the most widely used prediction and clinical decision selection system for providing colorectal cancer prognosis. However, large differences in the prognoses of patients with the same clinical stage are often observed particularly because of the different gene expression pro les among individuals; thus, prognosis varies among patients 4 .
Current studies have shown that the origins of colorectal embryos, blood supply, function, and environmental carcinogens at different sites are different. Therefore, clinical characteristics, pathological stage, and the prognosis of colorectal cancer differ among primary sites [5][6][7] . The analysis of the differences in the molecular biology of colorectal cancer among different primary sites is necessary in the selection individualized treatment for patients with colorectal cancer. At present, with the deepening understanding of tumor biology and the complexity of tumor metabolism, metabolic reprogramming is found to be a marker of malignant tumor 8 . Compared with normal tissues, cancer cells have a high degree of metabolism heterogeneity 9 . Metabolic phenotypes evolve at the different stages of tumorigenesis, early tumor growth requires nutrient uptake and biosynthesis, and other subtypes of selective metabolic requirements occur during local in ltration and distant metastasis 10 . Two studies proposed prognostic models of prognosis-related metabolic genes for colorectal cancer prognosis 11,12 . In the present study, we modeled the prognosis genes of colorectal and rectal cancers separately for the individualized detection of patients with colorectal cancer.
We constructed prognostic models of prognosis-related metabolic genes for colon and rectal cancers, utilizing the expression pro les of TCGA databases. The prognostic models were further optimized through Lasso regression and Cox regression analyses, and the speci city and sensitivity of the models were evaluated through ROC curve analysis. Meanwhile, metabolism-related genes (MRGs) related to overall survival were screened out. Our data showed that colon and rectal cancers have speci c prognostic models, which are useful in guiding individualized prognosis and treatment.

Data collection
The transcriptome data and the relevant clinical data of colon cancer (including 398 cancer and 39 normal tissues) and rectal cancer (including 84 cancer and 2 normal tissues) were downloaded from TCGA database (https://portal.gdc.cancer.gov/) to construct risk prediction models, and GSE17538 of GEO database (https://www.ncbi.nlm.nih.gov/geo/) including 238 colon cancer patients was used as the validation dataset. MRGs were extracted from Gene Set Enrichment Analysis (GSEA). Meanwhile, the gene mutation data were obtained from GSCALite (http://bioinfo.life.hust.edu.cn/web/GSCALite/). The clinical and pathological characteristics of patients in TCGA and GEO cohorts are summarized in Table 1

Differential expression analysis of MRGs and gene set enrichment analyses
For colon cancer, the differences of MRGs between COAD and non-tumor samples were analyzed by Wilcoxon signed-rank test. An adjusted P < 0.01 and a |log2 (FC) |> 2 were considered the cutoffs for identifying differential expression MRGs (DEMRGs). KEGG analyses was then performed to discover the primary biological characteristics of these genes.

Construction and validation of risk prediction models using MRGs
We rst used univariate Cox analysis to initially identify potential MRGs, and least absolute shrinkage and selection operator (Lasso) regression analysis was subsequently applied by using "glmnet" and "survival" packages to eliminate false parameters caused by over tting. Multivariable regression analysis was used to generate MRGs with independent prognostic signi cance, and these MRGs were used to construct prognostic models with the following formula: Risk score = (Coe cient mRNA1 ×mRNA1 expression) + (Coe cient mRNA2 ×mRNA2 expression) + ⋯ + (Coe cient mRNAn × mRNAn expression).
Finally, Cox regression was used to establish overall survival prognostic risk model. In addition, univariate and multivariate regressions were performed to determine whether the model was an independent prognostic factor.

Gene set enrichment analysis (GSEA)
GSEA was performed to explore the characteristics and pathways that were enriched in the high-risk group and low-risk group. Using normalized enrichment score (NES) and normalized P-value calculation prede ned tags and KEGG pathway enrichment. Terms with |NES|>1 and P<0.05 were considered signi cantly enriched.

Patient samples
117 colon cancer and 114 rectal cancer were obtained from Tianjin cancer hospital. All tissue sections were con rmed by specialists to make a nal diagnosis. Histopathological diagnoses were made using the World Health Organization criteria. Table 5 summarized all patients' characteristics. This study has been approved by the hospital's Human Ethics and Research Ethics Committee and obtained written informed consent from all patients.

Immunohistochemistry Staining and Evaluation
Para n-embedded tissue sections (4µm) were dewaxed and rehydrated with xylene and fractional alcohol solution, the endogenous peroxidase activity was hardened with 3% hydrogen peroxide, and the sections were exposed to antigen by boiling in 10 mM citric acid buffer (pH 6.0). The sections were incubated with primary antibody at 4°C for 18 h, incubated with PV6001at 37°C for 30 min, and stained with DAB (Zhongshan Goldbridge Biotechnology Company) for 1 ~ 2 min. The control group was incubated with PBS instead of primary antibody.
To score staining of tissue sections, we selected 5 consecutive high-power elds on each section to estimate the average percentage of stained cells. For the expression of GSR in the nucleus, the mean value of expression in all cancer tissues was truncated, expression level ≤50% in tissues was classi ed as low expression group, and 50% of > was classi ed as high expression group ( gure 11A). For the expression of GSR in cytoplasm and ENPP2, both percentage and intensity were considered in the semiquantitative evaluation. The percentage of positive cells was divided into 0(0% positive cells), 1(1-25% positive cells), 2(26-50% positive cells), 3(50-75% positive cells), 4(>75% positive cells), and the strength was divided into 0(negative), 1(weak), 2(medium), and 3(strong). The intensity score (0-3) is multiplied by the percentage score (0-4), 0-4 being low expression and 5-12 being high expression 2.8 Statistical analysis All statistical analyses were performed using R software (version 3.6.0). Wilcox test or Kruskal-Wallis test were used to evaluate the distribution differences among variables. Kaplan-meier survival curve analysis and log-Rank test were used to analyze overall survival. Cox regression model was used to analyze the factors affecting patient survival. Cox proportional hazard regression model was used for univariate and multivariate analysis. Prognostic accuracy of the model was assessed by time-dependent ROC analysis.
P<0.05 was considered statistically signi cant.

Flow chart of this study
The detailed work ow for the construction of the risk models and downstream analysis is shown in Figure 1. We rst used the TCGA database to extract MRGs from the COAD and READ data sets and then observed the differential expression of MRGs at the gene expression level. Speci c prognostic risk models for colon and rectal cancers were constructed using the MRG set data, and the predictive ability of each risk model was tested through ROC analysis. The risk model of colon cancer was validated in the GEO database. Finally, Kaplan-Meier analysis and expression level detection of the genes in the risk models were performed, and the mutation information of each gene in COAD and READ in the models was examined.

Differential expression and functional annotation of MRGs in COAD
From the TCGA database, we rst downloaded mRNA expression data and the clinical information of 398 COAD tissue samples and 39 nontumor tissues (Table 1). A total of 772 MRGs were detected in this database, and 196 differentially expressed MRGs were obtained. The expression patterns of the differentially expressed MRGs in COAD and nontumor tissues are shown in the volcanic and heat maps (Figure 2A and B). Of the 196 differentially expressed genes, 98 genes were down-regulated, and 98 genes were up-regulated in the tumor tissues. To observe metabolic differences between colon cancer tissues and normal tissues, we used the KEGG pathway to conduct a functional enrichment analysis of the differentially expressed MRGs. We found that the up-regulated genes are mainly involved in the following metabolic pathways: purine metabolism, amino acid biosynthesis, carbon metabolism, and cysteine and methionine metabolism ( Figure 2C). Meanwhile, the down-regulated genes are mainly involved in fatty acid degradation, starch and sucrose metabolism, retinol metabolism, and galactose metabolism ( Figure  2D). We also downloaded the mRNA expression data and corresponding clinical information of 84 READ tissue samples and two nontumor tissues (Table 1). Owing to the small number of normal rectal tissue samples, we were unable to obtain effective differential gene data.

Construction and veri cation of the COAD risk model
To investigate the relationship between MRGs and colon cancer prognosis, we constructed a prognostic risk model. Initially, univariate regression analysis was performed on 13 genes that are signi cantly associated with prognosis, including two low-risk MRGs (GSR and SUCLG2P2) and 11 high-risk MRGs (CPT1C, PLCB2, PLCG2, PLA2G2D, PTGDS, GAMT, ENPP2, SULG2P2, PIK3CD, PIP4K2B, GPX3, and MAT1A; Figure 3A). Then, we used Lasso regression and multivariate regression to generate a nal prognosis model with eight MRGs, namely, CPT1C, PLCB2, PLA2G2D, GAMT, ENPP2, PIP4K2B, GPX3, and GSR ( Figure 3B and C). The speci c information of these genes, including full name, coe cient, and relevant pathway, is shown in Table 2. The patients with colon cancer in the TCGA database were divided into low-and high-risk groups, and Kaplan-Meier survival analysis was performed. The results showed that patients with high risk scores had signi cantly worse overall survival than those with low risk scores ( Figure 3D). Figure 3E shows the association between survival time and risk score. Survival time decreased signi cantly with increasing risk score. Time-dependent ROC analysis indicated that this prognosis model can accurately predict the OS for patients with colon cancer ( Figure 3F). The heatmap developed to show the gene expression pro les indicated that the high-risk genes (CPT1C, PLCB2, PLA2G2D, GAMT, ENPP2, PIP4K2B, and GPX3) were always expressed in the high-risk patients, whereas GSR was often expressed in the low-risk patients ( Figure 3G).
To verify the validity of the prediction model, we used the colon cancer data of GSE17538 from the Gene Expression Omnibus (GEO) as validation data. Consistent with the results obtained from TCGA database, our nding showed the prediction model can provide reliable prognoses for the patients in GSE17538. The patients with high risk scores had considerably poor overall survival ( Figure 4A). ROC curve analysis showed the sensitivity and speci city of the risk score for the OS when combined with clinical characteristics ( Figure 4B). The association between survival time and risk score indicated that the survival time decreased with the signi cant increase in risk score ( Figure 4C).

Construction of the READ risk model
Using the approach used in colon cancer analysis, we performed univariate regression analysis to construct a prognostic model for rectal cancer. Seven MRGs were used ( Figure 5A). A nal prognosis model with six MRGs (TDO2, PKLR, GAMT, EARS2, ACO1, and WAS) was constructed using Lasso regression and multivariate Cox regression ( Figure 5B and C). Speci c information on the genes, including full name, coe cient, and relevant pathways, is shown in Table 4. Meanwhile, the results of Kaplan-Meier survival analysis showed that patients with low risk scores had signi cantly better overall survival than those with high risk scores, suggesting that the established prognostic model is available ( Figure 5D). Survival time increased when the risk score decreased signi cantly ( Figure 5E), and ROC curve analysis showed the sensitivity and speci city of the risk score for OS when combined with clinical characteristics ( Figure 5F). Furthermore, the heatmap showed that the high-risk genes (TDO2, PKLR, and GAMT) were always expressed in the high-risk patients ( Figure 6G). Given the small number of rectal cancer cases in the GEO database, we were unable to use these data in validating the risk model.

Prognostic risk models of COAD and READ were independently related to OS
To further validate this prediction model, we performed univariate and multivariate regression analyses and used TCGA and GSE17538 in exploring the correlations of clinical parameters and risk score with overall survival in patients with colon cancer. The TCGA results of univariate Cox regression showed that pathological stage; T, N, and M stages; and risk score were all signi cantly correlated with overall survival (all P<0.001; Figure 6A). Multivariate Cox regression showed that T stage and risk score were correlated with overall survival in patients with colon cancer (P<0.05; Figure 6B). The GSE17538 results showed that not only univariate Cox regression analysis but also multivariate regression analysis indicated that the prognostic risk model of COAD was independently related to overall survival (P<0.05; Figure 6C and D). As to READ, the results of univariate regression and multivariate Cox regression analysis indicated that the obtained risk model was independently related to overall survival (P<0.05; Figure 6E and F).

Comprehensive analysis of all the genes in the prognostic models of COAD and READ
According to the previous results, the COAD risk model has eight genes, and the READ risk model has six.
To further evaluate the prognostic value of the above genes, we used the GEPIA database in performing Kaplan-Meier analysis. We found that high expression of CPT1C, PLCB2, or GPX3 indicates signi cantly poor prognosis, whereas the high expression of GSR indicates excellent prognosis in patients with colon cancer in the GEPIA database ( Figure 7A17). Although PLA2G2D, GAMT, ENPP2, and PIP4K2B cannot signi cantly predict prognosis, patients with high expression level of PLA2G2D, GAMT, ENPP2, or PIP4K2B always had poor prognosis ( Figure 7A). In READ, high TDO2 and GAMT expression levels were correlated with poor prognosis, whereas high EARS2 and ACO1 expression levels predicted good prognosis ( Figure 7B). Overall, the results of Kaplan-Meier analysis were generally consistent with the univariable Cox analysis results.
Meanwhile, we analyzed the mRNA expression levels of the genes in relative tumor and normal tissues with the GEPIA database. As shown in Figure 8A, in the COAD risk model, the gene expression levels of PLA2G2D, ENPP2, and GPX3 in colon cancer tissues were signi cantly lower than corresponding normal tissues, but changes in the other genes were not obvious. As to the READ risk model, only GAMT had signi cantly lower expression in rectal cancer tissues than in corresponding normal tissues ( Figure 8B). Furthermore, we investigated genetic alterations of genes in the prognosis models with the data of Gene Set Cancer Analysis Lite (http://bioinfo.life.hust.edu.cn/web/GSCALite/). The results showed that the genes in the COAD model were changed in all the 38 queried samples (100%; Figure 9A), and the genes in the READ model were changed in all the ve queried samples (100%; Figure 9B). In COAD, genetic alterations included single nucleotide polymorphism (SNP), delete mutation, and insert mutation, whereas the genetic alterations in READ just included SNP.

Involved metabolic pathways of the genes in the risk models analyzed by GSEA
To understand the metabolic characteristics of colon cancer and rectal cancer tissues, we used the GSEA approach in analyzing major differences in metabolic pathways between high-and low-risk patients. In colon cancer, inositol phosphate metabolism, tyrosine metabolism, arachidonic acid metabolism, glycerophospholipid metabolism, and ether lipid metabolism were activated in high-risk tumors, whereas fatty acid metabolism, butanoate metabolism, amino sugar and nucleotide sugar metabolism, propanoate metabolism, and glycolysis gluconeogenesis were inhibited in low-risk tumors ( Figure 10A). In rectal cancer, glycerophospholipid metabolism, arachidonic acid metabolism, alpha linoleic acid metabolism, linoleic acid metabolism, and glutathione metabolism were activated in high-risk tumors, whereas the TCA cycle, propanoate metabolism, one carbon pool by folate, lysine, valine leucine and isoleucine degradation were inhibited in low-risk tumors ( Figure 10B).

Expression of GSR and ENPP2 in colon and rectal cancers detected through immunohistochemistry staining
To test the validity of the prognostic models, we detected the protein expression levels of GSR and ENPP2 through immunohistochemistry staining in clinical colorectal cancer specimens and analyzed the relationship between the expression levels and patient prognosis. Immunohistochemical studies have demonstrated that GSR is expressed in the cytoplasm and nuclei of cancer cells ( Figure 11A) and ENPP2 is mainly expressed in the cytoplasm of cancer cells ( Figure 11H). Kaplan-Meier survival analysis showed that the high expression levels of GSR in the cytoplasm is signi cantly correlated with the poor prognosis of patients with colon cancer ( Figure 11B) but not with the prognosis of patients with rectal cancer ( Figure 11E). No signi cant correlation was found between the expression levels of GSR in nucleus and prognoses of patients with colon cancer ( Figure 11C) and rectal cancer ( Figure 11F). Meanwhile, the high expression of GSR the cytoplasm and nucleus could not reminder the prognoses of patients of colon cancer ( Figure 11D) or rectal cancer ( Figure 11G). In addition, patients with colon cancer and high ENPP2 expression levels in cancer tissues always had poor survival ( Figure 11I), and no signi cant correlation between ENPP2 expression and patient prognosis was found in rectal cancer ( Figure 11J).

Discussion
The Warburg effect is one of the important characteristics of tumor cells, which shows hyperactive glycolysis even when oxygen is su cient. However, this effect is only a part of metabolic reprogramming, and metabolic abnormalities in tumor cells are far more complex than previously thought. Tumor cells reprogram many metabolic pathways to the meet nutrient, energy, and REDOX requirements of rapid proliferation 13 . Metabolic reprogramming causes changes in the levels and types of speci c metabolites inside and outside tumor cells. Such changes can promote tumor growth and metastasis by in uencing gene expression, tumor microenvironment, and cell status 14 . Currently, drugs targeting these metabolic reprogramming can inhibit the malignant progression of tumors and promote the death of tumor cells [15][16][17] . Therefore, using metabolic gene changes in constructing a prognosis model for cancer is feasible.
Such model can be used to predict the patients' prognoses at the metabolic level.
In our study, we explored the expression pro les of MRGs using the TCGA database and identi ed molecular markers related to the prognoses of patients with colon or rectal cancer. We screened DEMRGs between colon cancer and nontumor tissues and performed KEGG analysis on the DEMRGs to understand metabolism in colon cancer. We found that the genes involved in amino acids biosynthesis and carbon metabolism were up-regulated, whereas the genes involved in fatty acid degradation were down-regulated. These results suggested that colon cancer cells are more prone to anabolism than normal tissues and inhibit catabolism. In addition, the genes involved in galactose, starch, and sucrose metabolism were down-regulated. In READ, owing to the small number of normal rectal tissue samples, we were unable to obtain effective differential gene data.
Recently, two study teams analyzed DEMRGs between colorectal cancer and normal tissues using TCGA database, then used DEMRGs in constructing a prognosis MRG model. One study constructed a model comprising six genes (NAT2, XDH, GPX3, AKR1C4, SPHK1, and ADCY5) for colorectal cancer 11 and another model comprising AOC2, ENPP2, ADA, ACADL, GPD1L, and CPT2 12 . Although they used the same data, they obtained completely different prognostic models. We considered that the difference might be due to differences in methods used in data processing or different P values de ned. In these studies, the authors modeled colon and rectal cancers together for prognosis. As the origin of colorectal embryos, blood supply, function, and environmental carcinogens vary among different sites, the prognosis of colorectal cancer varies among different primary sites [18][19][20] . Thus, in our study, we constructed a prognostic model of prognosis-related metabolic genes for colon cancer and another for rectal cancers, using the expression pro les of TCGA databases. Differentially expressed genes in the tumor and relative normal tissues have been used in constructing prognostic models in many studies. We think that by doing so, numerous genes that actually have prognostic signi cance would be overlooked. The reason is that genes that predict prognosis in cancer patients are in fact not necessarily expressed differently in tumors unlike in normal tissues. We did not directly use differential genes in constructing the prognostic models. Instead, we analyzed all the genes involved in metabolism. Finally, we constructed models using Cox regression and Lasso regression in COAD and READ, respectively. The nal prognosis model of COAD with eight MRGs (CPT1C, PLCB2, PLA2G2D, GAMT, ENPP2, PIP4K2B, GPX3, and GSR), and the mode READ with six MRGs (TDO2, PKLR, GAMT, EARS2, ACO1, and WAS) were identi ed, and multivariable Cox regression analysis showed that the risk score calculated by the model independently predicted the prognosis of patient with COAD or READ. Thus, prognostic models for colon and rectal cancers are quite different. Only one gene, GAMT, appeared in the risk models for colon and rectal cancers. This nding further showed that modeling colorectal and rectal cancers separately for the prediction of the prognoses of patients is necessary and reliable.
Among the eight genes in the prognostic model of colon cancer, three genes (ENPP2, CPT1C, and PLA2G2D) are involved in lipid metabolism, two genes (PIP4K2B and PLCB2) are involved in phosphoinositol metabolism, and two genes (GPX3 and GSR) are involved in glutathione metabolism, indicating that the metabolism of lipid, phosphoinositol, and glutathione are signi cantly important to colon cancer. GPX3, as a risk gene, is responsible for the peroxide metabolism of glutathione, and GSR, as a protect gene, is responsible for the reductive metabolism of glutathione. Thus, glutathione peroxidation may be associated with the malignancy of colon cancer. In our study, we used immunohistochemistry to verify the relationship between GSR and the prognosis of colorectal cancer. We found that high GSR expression level the cytoplasm is signi cantly correlated with the poor colon cancer prognosis but not with rectal cancer prognosis. Among the six genes in the model of rectal cancer, two genes (TDO2 and GAMT) are involved in amino acid metabolism, suggesting that amino acid metabolism is active, and the prognosis of rectal cancer patient is poor. Meanwhile, two genes (EARS2 and WARS) are involved in tRNA aminoacylation of tryptophany and glutamine, which is associated with good rectal cancer prognosis. In addition, PKLR is involved in pyruvate metabolism, and ACO1 is involved in the TCA cycle. GAMT, which is mainly involved in creatine synthesis, was present in the two tumor prognostic models, but the creatine system in relation to malignancy requires extensive investigation 21 . In addition to supporting the biosynthesis of creatine, GAMT supports the metabolism of polyamine and methionine synthesis in tumor cells, both of which are in high demand in proliferating sarcoma cells 22 .
Notably, two genes, GPX3 and ENPP2, were also identi ed in the other two studies for MRGs in colorectal cancer 11,12 . GPX3, the only extracellular GPX of the family of oxidoreductases, can catalyze the detoxi cation of lipid hydroperoxides by reduced glutathione. GPX3 has dual properties in tumors, acting as a tumor suppressor in some tumors and as a tumor promoter in others 23 . In colorectal cancer, tumor cells with low GPX3 expression were more sensitive to chemotherapy drugs, such as oxaliplatin and cisplatin 24 . ENPP2 can encode a secreted glycoprotein autotaxin, which converts lysophosphatidylcholine into lysophosphatidic acid, and the autotaxin-LPA axis is essential for cell survival, migration, proliferation, and differentiation in many cancers [25][26][27][28] . Moreover, ATX expression is positively correlated with microvascular density and tumor angiogenesis in colorectal cancer 29 . In our study, patients with colon cancer and high ENPP2 expression in cancer tissues always had poor survival, but no signi cant correlation between ENPP2 expression and patient prognosis was found in rectal cancer.
Gene mutations were signi cantly varied between the two cancer types. In COAD, genetic alterations included SNP, delete mutation, and insert mutation, whereas the genetic alterations in READ included SNP only. Moreover, the major differences in metabolic pathways between the low-and high-risk patients with colon or rectal cancer also differed signi cantly. In colon cancer, arachidonic acid metabolism, tyrosine metabolism, glycerophospholipid metabolism, inositol phosphate metabolism, and ether lipid metabolism were activated in the high-risk tumors, whereas in rectal cancer, glycerophospholipid metabolism, arachidonic acid metabolism, linoleic acid metabolism, alpha linoleic acid metabolism, and glutathione metabolism were activated in the low-risk tumors. Basing on this result, we found that arachidonic acid metabolism and glycerophospholipid metabolism were all activated in high-risk groups for colon and rectal cancers. However, no changes in the same metabolic pathway were found in low-risk patients with colon or rectal cancer.
Our study has some limitations. Although our prognostic models were conducted based on massive data mining, additional functional and mechanism experiments are still needed for exploring the effect of each gene in the prognostic model on the metabolism of colorectal cancer. Moreover, some important clinical factors and lifestyle habits, such as dietary habits, obesity, smoking, and diabetes, were not included in our analysis. However, these factors in uenced the prognoses of patients with colorectal cancer [30][31][32][33] .

Conclusions
In summary, using MRGs in colon and rectal cancers, we constructed prognostic risk models in the corresponding tumors. Meanwhile, through immunohistochemistry, we veri ed the relationship between the expression levels of some proteins in the models and patient prognosis. These ndings provide evidence of the role of metabolism in the pathogenesis of colorectal cancer and potential biomarkers for the diagnosis and treatment of colorectal cancer.     Construction of COAD risk model using TCGA database. A, The univariate Cox regression was performed to calculate HR and 95% CI of MRGs. B and C, The Lasso regression was performed using prognosissigni cant MRGs. D, Kaplan-Meier analysis was used to represent that the patients in the high-risk group had a signi cantly shorter overall survival time, while the patients in the low-risk group had longer overall survival time. E, The association between survival time and risk score was detected. F, ROC curve analysis of the sensitivity and speci city of the OS for the combination of risk score and clinical characteristics. G, The heatmap of the key MRGs expression pro les was showed.

Figure 3
Construction of COAD risk model using TCGA database. A, The univariate Cox regression was performed to calculate HR and 95% CI of MRGs. B and C, The Lasso regression was performed using prognosissigni cant MRGs. D, Kaplan-Meier analysis was used to represent that the patients in the high-risk group had a signi cantly shorter overall survival time, while the patients in the low-risk group had longer overall survival time. E, The association between survival time and risk score was detected. F, ROC curve analysis of the sensitivity and speci city of the OS for the combination of risk score and clinical characteristics. G, The heatmap of the key MRGs expression pro les was showed. Figure 4 certi cate the risk model for prognosis. B, ROC curve analysis of the sensitivity and speci city of the OS for the combination of risk score and clinical characteristics. C, The association between survival time and risk score was detected.

Figure 4
Veri cation of COAD risk model using GSE17538 data from GEO. A, Kaplan-Meier analysis was used to certi cate the risk model for prognosis. B, ROC curve analysis of the sensitivity and speci city of the OS for the combination of risk score and clinical characteristics. C, The association between survival time and risk score was detected.

Figure 5
Construction of READ risk model using TCGA database. A, The univariate Cox regression was performed to calculate HR and 95% CI of MRGs. B and C, The Lasso regression was performed using prognosissigni cant MRGs. D, Kaplan-Meier analysis was used to represent the prognosis of patients. E, The association between survival time and risk score was detected. F, ROC curve analysis of the sensitivity and speci city of the OS for the combination of risk score and clinical characteristics. G, The heatmap of the key MRGs expression pro les was showed.

Figure 5
Construction of READ risk model using TCGA database. A, The univariate Cox regression was performed to calculate HR and 95% CI of MRGs. B and C, The Lasso regression was performed using prognosissigni cant MRGs. D, Kaplan-Meier analysis was used to represent the prognosis of patients. E, The association between survival time and risk score was detected. F, ROC curve analysis of the sensitivity and speci city of the OS for the combination of risk score and clinical characteristics. G, The heatmap of the key MRGs expression pro les was showed.

Figure 6
Univariable and multivariable Cox regression analyses of OS in COAD and READ. A, univariable cox regression analyses of OS in COAD using TCGA data. B, multivariable cox regression analyses of OS in COAD using TCGA data. C, univariable cox regression analyses of OS in COAD using GEO data. D, multivariable cox regression analyses of OS in COAD using GEO data. E, univariable cox regression analyses of OS in READ using TCGA data. F, multivariable cox regression analyses of OS in READ using TCGA data.

Figure 6
Univariable and multivariable Cox regression analyses of OS in COAD and READ. A, univariable cox regression analyses of OS in COAD using TCGA data. B, multivariable cox regression analyses of OS in COAD using TCGA data. C, univariable cox regression analyses of OS in COAD using GEO data. D, multivariable cox regression analyses of OS in COAD using GEO data. E, univariable cox regression analyses of OS in READ using TCGA data. F, multivariable cox regression analyses of OS in READ using TCGA data.

Figure 8
Page 36/42 Expression levels of MRGs mRNA in prognosis models of COAD and READ. A, the expression level of genes in COAD risk model in colon cancer tissues and corresponding normal tissues. B, the expression level of genes in READ risk model in rectal cancer tissues and corresponding normal tissues.

Figure 9
The genetic alterations of the genes in the prognosis models of COAD and READ. A, The genetic alterations of the genes in the COAD model. B, The genetic alterations of the genes in the READ model.   The expression of GSR and ENPP2 in colon cancer and rectal cancer detected by immunohistochemistry stain. A, the present pictures of GSR detected by immunohistochemistry stain. B-D, the relationship between GSR expression level and survival of patients with colon cancer. E-G, the relationship between GSR expression level and survival of patients with rectal cancer. H, the present pictures of ENPP2 detected by immunohistochemistry stain. I, the relationship between ENPP2 expression level and survival of patients with colon cancer. J, the relationship between ENPP2 expression level and survival of patients with rectal cancer.