Functional pathway screening using GSEA
Clinical features information of a total of 587 samples including 552 EC and 35 healthy samples were achieved from the TCGA. Based on the mentioned data, GSEA indicated that whether the identified gene sets showed significant differences between EC and adjacent healthy tissues. And we found that these genes are significantly enriched in glycolysis, cholesterol homeostasis, fatty acid metabolism and xenobiotic metabolism. Glycolysis was shown to be the most relevant pathway (Fig. 1).
Establishment of Glycolysis-related genes and EC prognosis models
Firstly we integrated mRNA expression profiles and clinical information so as to screen out 520 EC samples. We analyzed 520 EC samples and found a total of 179 participating genes on the Glycolysis pathway in order to research the relationship between Glycolysis and the prognosis of EC patients. Then we randomly selected 260 samples as training cohort and built a prognostic model for 260 samples. univariate Cox regression analysis screened out 11 genes with the cutoff of p < 0.05 (TABLE1). These prognosis-related Glycolysis genes were further analyzed with the least absolute shrinkage and selection operator (LASSO) Cox regression algorithm (Fig. 2A-B). Then multivariate Cox proportional hazards regression analysis built the risk signature. We constructed prognostic models and the risk scores were calculated (TABLE2). Nine genes including B3GALT6, PAM, LCT, GMPPB, GLCE, DCN, CAPN5, GYS2 and FBP2 were identified as prognosis-related genes. The risk score come out as the followed: 0.000755345* B3GALT6-5.19E-05* PAM + 0.029807032* LCT-0.000708518* GMPPB + 0.000784398* GLCE-0.00015091* DCN + 0.000258397* CAPN5 + 0.031956259* GYS2 + 0.00431111* FBP2. According to the median levels of risk score, EC patients were classified into low- and high-risk groups. In the model, survival analysis indicated that low-risk patients had significantly longer overall survival time than high-risk patients (Fig. 2C). We also performed the receiver operating characteristic curve (ROC) analysis. As shown in Fig. 2D, ROC curve analysis was also completed according to the 1,3,5-year survival of the area under the receiver operating characteristic curve (AUC) value, the specificity and sensitivity were highest when the risk score was 0.794, 0.765, 0.773. The risk score and survival status indicated by the prognostic model was displayed in Fig. 3A-C. To assess whether the model was an independent predictor of EC, univariate and multivariate Cox regression analyses were conducted, including risk scores and clinical factors. And the results showed that this prognostic model showed moderate and independent prognostic power for Glycolysis pathway (Fig. 3D-E).
Validation of Glycolysis-related genes and EC prognosis
In order to verify the authenticity of the above prognostic model, we built another prognostic model using the testing cohort(260 samples). Based on the training cohort' cut-off, samples were divided into low- and high-risk group according to the median levels of risk score. Survival analysis indicated that low-risk patients had significantly longer overall survival time than high-risk patients (Fig. 4A). ROC curve analysis showed that the specificity and sensitivity were highest when the risk score was 0.717, 0.613, 0.643 according to the 1,3,5-year survival of the area under the receiver operating characteristic curve (AUC) value (Fig. 4B). The risk score and survival status indicated by the prognostic model was displayed in Fig. 4C-E. To assess whether the model was an independent predictor of EC, univariate and multivariate analyses were completed, including clinical factors and risk scores. The results showed that this prognostic model showed moderate and independent prognostic power for Glycolysis pathway (Fig. 4F-G). These conclusions were all consistent with previous prognostic model trends, validating the reliability of our speculation that Glycolysis is involved in the development of EC and affects the prognosis of EC.
Complete Glycolysis-related prognostic model
We finally built a complete prognostic model based on entire cohort(520 samples). Based on the training cohort' cut-off and median levels of risk score, samples were classified into low- and high-risk group, and survival analysis indicated that low-risk patients had significantly longer overall survival time than high-risk patients (Fig. 5A). ROC curve analysis showed that the specificity and sensitivity were highest when the risk score was 0.763, 0.692, 0.705 according to the 1,3,5-year survival of the area under the receiver operating characteristic curve (AUC) value (Fig. 5B). The risk score and survival status indicated by the prognostic model was displayed in Fig. 5C-E. To assess whether the model was an independent predictor of EC, univariate and multivariate analyses were done, including clinical factors and risk scores. The results showed that this prognostic model showed moderate and independent prognostic power for Glycolysis pathway (Fig. 5F-G). These further validate the reliability of our previous two prognostic models.
Hierarchical analysis of clinical features and Glycolysis-related hub genes
Univariate and multivariate Cox proportional hazards regression analysis identified nine genes including B3GALT6, PAM, LCT, GMPPB, GLCE, DCN, CAPN5, GYS2 and FBP2 to be prognosis-related. Among the 9 genes, we found significant differences in the expression levels of 7 genes in the high-risk and low-risk groups (Fig. 6A). In addition, the heatmap showed the expression of the nine genes in high- and low-risk patients in the TCGA dataset. We observed significant differences between the high- and low-risk groups associated with tumor status, grade, histological type and stage (Fig. 6B). We further analyzed the relationship between nine genes and various clinical features including risk, tumor status, grade, histological type, stage and age. We found that tumor status, grade, histological type and stage were significantly related with the 9 genes. We analyzed 9 genes for different clinical features respectively. We found that expression level of CAPN5, DCN, GLCE and GMPPB were significantly different in different age groups (FIGS1A-D).For different histological type, the expression level of B3GAL, CAPN5, GLCE, GMPPB and PAM were significantly different (FIGS2A-E). For different grade, the expression level of CAPN5, DCN, GLCE, GMPPB and PAM were significantly different (FIGS3A-E). For different tumor status, DCN, GMPPB and PAM expressed differently (FIGS4A-C). In next, the stratification analysis was done according to histological type, grade, stage, tumor status and age. Patients were stratified into endometrioid subgroups, grade G1&G2 subgroup, grade G3&G4 subgroup, tumor free subgroup, with tumor subgroup, stage I & stage II subgroup, stage III & stage IV subgroup, age > 60 subgroup and age ≤ 60 subgroup. For the patients in endometrioid subgroup, the survival time of patients in the low-risk case was significantly longer than that of patients in the high-risk case (Fig. 7A), which was consistent with the results belonging to the grade G1&G2 subgroup, grade G3&G4 subgroup, stage I & stage II subgroup, stage III & stage IV subgroup, tumor free subgroup, with tumor subgroup, age > 60 subgroup and age ≤ 60 subgroup. (Fig. 7B-I).
Building predictive nomogram
For the goal of establishing a clinically method to predict the survival probability with EC patients, we created a nomogram based on the TCGA cohort to estimate the probability of the 3- and 5‐year OS. The predictors of the nomogram contained 6 independent prognostic factors including stage, age, histological type, grade, tumor status and risk score Fig. 8A). The C‐index of the model for evaluation of OS was 0.871. The 45° line represented the best prediction. Calibration plots suggested that the nomogram performed well (Fig. 8B-C). ROC curve analysis also showed that the risk score AUC value of the model was 0.757, the clinical factors AUC value was 0.772, both much significantly higher than the clinical stage (AUC = 0.690), grade (AUC = 0.622), histological type (AUC = 0.608), tumor status (AUC = 0.751) and patients’ age (AUC = 0.578). Interestingly, when combined the risk score with clinical factors, the ROC curve of combination model was much higher than each alone (AUC = 0.805).
Based on 9 glycolysis-related gene expression, principal component analysis of the training cohort, testing cohort, and entire EC cohort displayed a significantly different distribution pattern of high and low risk which indicating their difference in glycolysis phenotype (Fig. 9A-C).
Genetic Information of the Glycolysis-related genes
The genetic alteration in the Glycolysis-related genes was analyzed with cBioPortal software. The network constructed by B3GAL, DCN, GLCE, GYS2 and their most associated neighbor genes were exhibited (only four out of the 9 genes had a joint node, while the remaining 5 genes had no junctions and were not shown) (Fig. 10A). Figure 10B-C illustrated that the 9 genes were altered in 92 (17%) from the 547 patients; LCT and CAPN5 showed most diverse alteration including amplification, missense mutation etc.
Validation of Glycolysis-related hub genes
All of the 9 glycolysis-related hub genes were validated in TCGA data. We found that DCN had the higher expression level of EC tissues than that of healthy tissues, while the B3GALT6, PAM, LCT, GMPPB, GLCE, CAPN5, GYS2 and FBP2 had the lower expression levels of EC tissues than that of healthy tissues (FIGS5A-I). We further validated the 9 glycolysis-related hub genes including B3GALT6, PAM, LCT, GMPPB, GLCE, DCN, CAPN5, GYS2 and FBP2 using immunohistochemistry. PAM, GMPPB, GLCE, CAPN5, GYS2 and FBP2 had the consistent expression trend. B3GALT6 and LCT were not available in the database (Fig. 11A-G). AUC value was used to identify the diagnostic efficacy of distinguishing normal and cancerous tissues, AUC value of 9 genes combined diagnosis was 0.992, which means the 9 genes can well identify cancer tissue and normal tissue (Fig. 11H). Regarding prognosis, Kaplan-Meier curves showed that higher expression of CAPN5, FBP2 and GYS2 correlated significantly with poor overall survival (OS), while the lower expression of DCN, GMPPB and PAM correlated significantly with OS (FIGS7).
Association between Nine hub genes and genetic mutations
We compared the frequency of genetic mutations between high- and low- risk score groups through R package maftools. The high-risk group had somatic mutations in the following order: TP53 > PTEN > PIK3CA > ARID1A > TTN > PIK3R1 > KMT2D > CTNNB1 > CTCF > MUC16 (FIGS6A). The low-risk group had somatic mutations in the following order: PTEN > ARID1A > PIK3CA > TTN > PIK3R1 > CTCF > KMT2D > ZFHX3 > MUC16 > MUC5B (FIGS6B). Furthermore, we found that the patients in the low-risk group showed obvious mutation signatures, compared with patients in the high-risk group (FIGS6C-D).
Identification of nine cell cycle-related genes risk score associated biological pathways
GSEA further analyzes high- and low- risk group samples, revealing the main enrichment pathway. The samples of the high-risk group were mainly concentrated on the pathways such as dna repair, g2m checkpoint, myc targets v1 and myc targets v2. The samples of the low-risk group were mainly concentrated on the pathways such as bile acid metabolism, fatty acid metabolism, heme metabolism and xenobiotic metabolism (Fig. 12).