Basic information of CC patient data
Overall, data from 566 colon cancer (CC) samples and 19 normal controls profiled on the Affymetrix Human Genome U133 Plus 2.0 Array platform were downloaded from the GEO database. The corresponding clinical characteristics of the samples are shown in Table 1. The clinical features include gender, age, survival status, survival time, clinical tumor-node-metastasis (TNM) stage, clinical T stage, clinical N stage and clinical M stage.
Table 1
Clinical information of CC patients from TCGA.
Variables | Category | Patients (n = 566)
Survival status | Alive | 370
 | Dead | 191
 | Unknown | 5
Age (years) | ≤65 | 222
 | >65 | 344
Gender | Male | 256
 | Female | 310
TNM stage | I | 33
 | II | 264
 | III | 205
 | IV | 60
 | 0 | 4
T stage | T1 | 11
 | T2 | 45
 | T3 | 367
 | T4 | 119
 | Tis | 3
 | T0 | 1
 | Unknown | 20
M stage | M0 | 482
 | M1 | 61
 | Unknown | 23
N stage | N0 | 302
 | N1 | 134
 | N2 | 98
 | N3 | 6
 | Unknown | 26
Screening of differentially expressed glycolysis-related genes between CC samples and normal controls
Six glycolysis-related gene sets were collected from MSigDB: REACTOME_GLYCOLYSIS, WP_COMPUTATIONAL_MODEL_OF_AEROBIC_GLYCOLYSIS, WP_GLYCOLYSIS_IN_SENESCENCE, REACTOME_REGULATION_OF_GLYCOLYSIS_BY_FRUCTOSE_2_6_BISPHOSPHATE_METABOLISM, HALLMARK_GLYCOLYSIS and GO_GLYCOLYTIC_PROCESS. A total of 301 glycolysis-related genes from these six gene sets were selected for further analysis, of which 217 were significantly differentially expressed in CC samples compared with normal controls (Fig. 1).
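The gene-pooling and screening step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: we assume the 301 genes are the de-duplicated union of the six gene sets and that significance was judged on Benjamini-Hochberg adjusted P values below 0.05; the text does not state the test or the correction used. The gene-set contents and the helper names (`union_gene_sets`, `benjamini_hochberg`, `significant_genes`) are hypothetical.

```python
def union_gene_sets(gene_sets):
    """Merge several gene sets into one de-duplicated, sorted gene list."""
    merged = set()
    for genes in gene_sets.values():
        merged.update(genes)
    return sorted(merged)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_genes(pvalues_by_gene, alpha=0.05):
    """Genes whose BH-adjusted P value falls below alpha (assumed threshold)."""
    genes = list(pvalues_by_gene)
    adjusted = benjamini_hochberg([pvalues_by_gene[g] for g in genes])
    return [g for g, q in zip(genes, adjusted) if q < alpha]


# Toy gene sets for illustration only; not the real MSigDB membership.
toy_sets = {"set_a": ["HK2", "PKM"], "set_b": ["PKM", "LDHA"]}
print(union_gene_sets(toy_sets))
```

With the real gene sets, the length of the merged list would correspond to the 301 genes reported above.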
Construction of the glycolysis-related gene prognostic model in CC patients.
To construct a novel biomarker model for predicting the prognosis of CC patients, univariate and multivariate Cox regression analyses were performed on the differentially expressed glycolysis-related genes. Thirteen risk genes (NUP107, SEC13, ALDH7A1, ALG1, CHPF, FAM162A, FBP2, GALK1, IDH1, TGFA, VLDLR, XYLT2 and OGDHL) were identified (P < 0.05, Table 2). A prognostic model based on these 13 glycolysis-related genes, which divides patients into low- and high-risk groups, was then established as follows: Risk score = -0.29 × NUP107 - 0.35 × SEC13 + 0.30 × ALDH7A1 - 0.29 × ALG1 + 0.45 × CHPF - 0.46 × FAM162A + 0.91 × FBP2 - 0.65 × GALK1 - 0.34 × IDH1 + 0.36 × TGFA + 0.18 × VLDLR - 1.28 × XYLT2 - 0.41 × OGDHL, where each gene symbol denotes the expression level of that gene. Next, the alteration status of the 13 genes in CC tissue was analyzed: the alteration frequencies of NUP107, SEC13, ALDH7A1, ALG1, CHPF, FAM162A, FBP2, GALK1, IDH1, TGFA, VLDLR, XYLT2 and OGDHL were 3%, 0.8%, 1.9%, 1%, 1.9%, 0.2%, 2.3%, 1.5%, 1.7%, 0.4%, 4%, 4% and 4%, respectively (Fig. 2A). Many of the mutations were located within protein domains, and the specific mutation sites are presented in Fig. 2B. Finally, the expression of these 13 genes was compared between CC and normal tissues: NUP107, SEC13, ALDH7A1, ALG1, CHPF, GALK1, XYLT2 and OGDHL were highly expressed in CC, whereas FAM162A, FBP2, IDH1, TGFA and VLDLR were downregulated (P < 0.05, Fig. 2C).
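The risk-score formula above is a weighted sum of gene expression values, sketched below with the coefficients transcribed from the text. We assume the expression values are on the same scale used to fit the model (for example, normalized log2 intensities); the text does not state the scale, and the names `COEFFICIENTS` and `risk_score` are hypothetical.

```python
# Coefficients transcribed from the risk-score formula in the text.
COEFFICIENTS = {
    "NUP107": -0.29, "SEC13": -0.35, "ALDH7A1": 0.30, "ALG1": -0.29,
    "CHPF": 0.45, "FAM162A": -0.46, "FBP2": 0.91, "GALK1": -0.65,
    "IDH1": -0.34, "TGFA": 0.36, "VLDLR": 0.18, "XYLT2": -1.28,
    "OGDHL": -0.41,
}


def risk_score(expression):
    """Sum of coefficient * expression over the 13 model genes.

    expression: dict mapping gene symbol -> expression value, assumed to be
    on the same (unstated) scale used when the model was fitted.
    """
    missing = set(COEFFICIENTS) - set(expression)
    if missing:
        raise KeyError(f"missing model genes: {sorted(missing)}")
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


# Hypothetical patient: every model gene at expression 1.0,
# so the score equals the sum of the coefficients.
patient = {gene: 1.0 for gene in COEFFICIENTS}
print(round(risk_score(patient), 2))  # -1.87
```

Positive coefficients (for example FBP2 and TGFA) raise the score, and negative ones (for example XYLT2 and GALK1) lower it, so a higher score corresponds to a higher predicted risk.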
Table 2
Thirteen genes selected by univariate Cox regression analysis (HR, hazard ratio; CI, confidence interval).
Gene | HR | 95% CI | P value
NUP107 | 0.64 | 0.49-0.85 | <0.01
SEC13 | 0.64 | 0.43-0.95 | 0.03
ALDH7A1 | 0.70 | 0.50-0.97 | 0.03
ALG1 | 0.60 | 0.44-0.82 | <0.01
CHPF | 1.38 | 1.03-1.85 | 0.03
FAM162A | 0.59 | 0.40-0.88 | 0.01
FBP2 | 2.41 | 1.33-4.38 | <0.01
GALK1 | 0.38 | 0.24-0.60 | <0.01
IDH1 | 0.54 | 0.36-0.80 | <0.01
TGFA | 1.75 | 1.26-2.43 | <0.01
VLDLR | 1.17 | 1.02-1.35 | 0.03
XYLT2 | 0.28 | 0.13-0.61 | <0.01
OGDHL | 0.56 | 0.41-0.75 | <0.01
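Table 2 reports each hazard ratio with its 95% CI. Assuming these are Wald intervals on the log-hazard scale (exp(coef ± 1.96 × SE), which is how standard Cox regression output reports them), the standard error, z statistic and two-sided P value can be back-calculated as a consistency check. The helper `wald_from_hr_ci` is hypothetical, and rounding of the tabulated values limits the precision.

```python
import math


def wald_from_hr_ci(hr, lower, upper, z_crit=1.96):
    """Recover (SE of log HR, z, two-sided P) from an HR and its 95% CI.

    Assumes a symmetric Wald interval on the log scale:
    log(upper) - log(lower) = 2 * z_crit * SE.
    """
    log_hr = math.log(hr)
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = log_hr / se
    # Two-sided normal tail: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p_value


# SEC13 row of Table 2: HR 0.64, 95% CI 0.43-0.95.
se, z, p = wald_from_hr_ci(0.64, 0.43, 0.95)
print(f"SE={se:.3f} z={z:.2f} P={p:.3f}")
```

For SEC13 this gives a P value of roughly 0.03, in line with the table, and for NUP107 (HR 0.64, 95% CI 0.49-0.85) it gives a value well below 0.01.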
Efficacy of the risk score in CC patients.
Based on the median risk score, 280 CC patients were classified into the high-risk group and 281 into the low-risk group (Fig. 3A). Kaplan-Meier (KM) analysis showed that the high-risk group had a markedly poorer prognosis than the low-risk group (P < 0.05, Fig. 3B). In ROC analysis, the area under the curve (AUC) was 0.716, indicating good predictive performance (Fig. 3C). Among the 13 model genes, TGFA, FBP2, CHPF and VLDLR were expressed at significantly higher levels in the high-risk group, whereas NUP107, OGDHL, SEC13, IDH1, GALK1, FAM162A, ALG1, XYLT2 and ALDH7A1 were downregulated (P < 0.05, Fig. 3D). The risk plot indicated that a high risk score was closely associated with poor prognosis (Fig. 3E). Principal component analysis (PCA), which reduces the expression of the model genes to a few principal components, showed that the high- and low-risk groups could be clearly separated (Fig. 3F). Together, these results indicate that the risk score-based gene model performs well.
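The median split and the Kaplan-Meier estimate described above can be sketched as follows. This is a minimal illustration, not the authors' code: the tie rule (a score equal to the median goes to the low-risk group) is an assumption, although for 561 patients with known survival status and no ties at the median it yields exactly 280 high-risk and 281 low-risk patients, matching the reported counts. The function names are hypothetical.

```python
from statistics import median


def split_by_median(scores):
    """Label each score 'high' if above the median, else 'low' (assumed tie rule)."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(time, survival)] at each distinct death time.

    events: 1 = death observed, 0 = censored (alive at last follow-up).
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects that share this follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Multiply by the conditional probability of surviving time t.
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored subjects both leave the risk set after time t.
        n_at_risk -= removed
    return curve


# Toy data: five risk scores, then follow-up times and event flags.
print(split_by_median([0.1, 0.5, 0.3, 0.9, 0.7]))
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```

In practice, one curve would be estimated per risk group and the two curves compared with a log-rank test.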
Risk score as an independent prognostic factor and its relationship with the clinical characteristics of CC patients.
Univariate and multivariate independent prognostic analyses were performed to determine whether age, gender, TNM stage and risk score were independent prognostic factors. Age, TNM stage and risk score were independent prognostic factors in CC patients, and all were positively correlated with poor survival (P < 0.05, Fig. 4A-B). To analyze the relationship between risk factors and overall survival (OS) more precisely, a series of Kaplan-Meier analyses was first carried out. The results revealed that age > 65 years, TNM stage III-IV, T3-4, N1-3 and M1 were associated with poor OS (P < 0.05, Fig. 4C). The relationship between risk group and OS was then examined within clinical subgroups: high-risk patients had significantly poorer OS in the age ≤ 65 years, age > 65 years, female, male, TNM I-II, TNM III-IV, T1-2, T3-4, M0, M1, N0 and N1-3 subgroups (P < 0.05, Fig. 4D). These results further support the reliability of our glycolysis-related gene model in CC.
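The Kaplan-Meier group comparisons above report P values; a common way to obtain them is the two-group log-rank test, sketched below. The text does not name the test, so this is an assumption, and `logrank_test` is a hypothetical helper. The example data are toy values.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, two-sided P).

    events: 1 = death observed, 0 = censored.
    """
    records = [(t, e, 0) for t, e in zip(times_a, events_a)]
    records += [(t, e, 1) for t, e in zip(times_b, events_b)]
    death_times = sorted({t for t, e, _ in records if e})
    observed_a = expected_a = variance = 0.0
    for t in death_times:
        # Subjects still at risk (follow-up time >= t) in each group.
        n_a = sum(1 for rt, _, g in records if g == 0 and rt >= t)
        n_b = sum(1 for rt, _, g in records if g == 1 and rt >= t)
        d_a = sum(1 for rt, e, g in records if g == 0 and rt == t and e)
        d_b = sum(1 for rt, e, g in records if g == 1 and rt == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        # Deaths expected in group A if both groups shared one hazard.
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of d_a at this death time.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy example: group A dies early, group B late.
chi2, p = logrank_test([1, 2, 3, 4, 5], [1] * 5, [6, 7, 8, 9, 10], [1] * 5)
print(f"chi2={chi2:.2f} P={p:.4f}")
```

The same function applies unchanged to each clinical subgroup; only the input times and events differ.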
Enrichment analysis of differentially expressed genes between the low- and high-risk groups
A total of 9,656 differentially expressed genes were identified between the low- and high-risk groups, and GO and KEGG enrichment analyses were then conducted. GO analysis indicated that these genes were enriched in mitochondrion- and ATP-related terms (P < 0.05, Fig. 5A), and KEGG analysis showed enrichment in carbon metabolism and oxidative metabolism pathways (P < 0.05, Fig. 5B). These metabolism-related enrichments are consistent with the glycolytic basis of the model and support its ability to distinguish the high-risk group from the low-risk group.
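GO and KEGG enrichment P values of this kind are commonly computed with a one-sided hypergeometric (over-representation) test; the text does not name the tool or test, so the sketch below is an assumption. `hypergeom_enrichment_p` is a hypothetical helper, and the example numbers are toy values.

```python
from math import comb


def hypergeom_enrichment_p(overlap, term_size, selected, universe):
    """One-sided over-representation P value, P(X >= overlap).

    universe:  number of background genes
    term_size: background genes annotated to the GO term or KEGG pathway
    selected:  differentially expressed genes drawn from the background
    overlap:   selected genes that carry the annotation
    """
    upper = min(term_size, selected)
    total = comb(universe, selected)
    # Sum hypergeometric probabilities for every overlap at least as large.
    # comb() returns 0 for impossible draws, so no extra bounds are needed.
    tail = sum(
        comb(term_size, k) * comb(universe - term_size, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy example: 10 background genes, 5 annotated, 5 selected, all 5 annotated.
print(hypergeom_enrichment_p(5, 5, 5, 10))  # 1/252, about 0.00397
```

Enrichment tools typically also adjust these P values across all tested terms, for example with the Benjamini-Hochberg procedure.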
Previous studies have demonstrated that glycolysis can promote epithelial-mesenchymal transition (EMT) in cancers. To preliminarily explore the role of EMT in CC, we analyzed the expression of the EMT biomarkers SNAI1, SNAI2, TWIST1, TWIST2 and ZEB2. SNAI1 and TWIST1 showed significantly higher expression in CC tissue than in normal tissue (P < 0.05, Fig. 6A). In contrast, all five markers (SNAI1, SNAI2, TWIST1, TWIST2 and ZEB2) were upregulated in the high-risk group (P < 0.05, Fig. 6B). These findings suggest that EMT may play a role in the high-glycolysis tumor environment of CC patients; the relationship between glycolysis and EMT in CC requires further investigation.
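The two-group marker comparisons above (CC versus normal tissue, and high- versus low-risk group) can be illustrated with a Wilcoxon rank-sum (Mann-Whitney U) test. The text does not state which test was used, so this choice is an assumption. The sketch uses midranks for ties and a normal approximation without tie correction, which is adequate for large groups; `mann_whitney_u` is a hypothetical helper and the data are toy values.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test for samples x and y.

    Uses midranks for tied values and a normal approximation to the
    U distribution (no tie correction in the variance).
    Returns (U for sample x, P value).
    """
    values = sorted(x + y)
    # Midrank for each distinct value: average of ranks i+1 .. j.
    ranks = {}
    i = 0
    while i < len(values):
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1
        ranks[values[i]] = (i + 1 + j) / 2
        i = j
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(ranks[v] for v in x)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u_x, p_value


# Toy expression values: low-risk group versus high-risk group.
u, p = mann_whitney_u([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(u, round(p, 4))
```

For very small groups an exact test would be preferable to the normal approximation used here.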
To lay the foundation for subsequent basic research, we validated the expression of all model genes by Western blot (WB). NUP107, SEC13, ALDH7A1, ALG1, CHPF, GALK1, XYLT2 and OGDHL were highly expressed, whereas FAM162A, FBP2, IDH1, TGFA and VLDLR showed low expression in CC patients (P < 0.05, Fig. 7). These expression levels were consistent with those observed in GSE143985.