3.1 Identification of prognosis-related DE-MRGs
The detailed flow chart for the prognostic predictive model construction in this study is shown in Figure 1. From the metabolic pathways in the KEGG database, we extracted a total of 944 metabolism-related genes. After the integration of the MRG expression data of 552 TCGA EC cancerous and 35 nontumor samples, we obtained 156 upregulated and 64 downregulated MRGs (Figure 2A). Univariate Cox regression analysis identified 47 genes significantly correlated with EC patients' OS (Figure 2B and Table 1). The expression pattern of 47 prognosis-related MRGs is shown in the heatmap and box plot in Figure 2C-D.
3.2 Functional enrichment of the prognosis-related DE-MRGs
Function annotation analyses of the 220 DE-MRGs and 47 prognostic DE-MRGs were then performed (Supplementary Figure 1 and Figure 3). GO enrichment showed that the prognostic MRGs were mainly involved in "carboxylic acid biosynthetic process ", "cofactor binding ", "alpha-amino acid metabolic process " and "cellular amino acid metabolic process" (Figure 3A). The enriched GO terms and related gene expression profiles are presented in Figure 3B. KEGG pathway enrichment analysis showed that the prognostic DE-MRGs were mainly involved in "purine metabolism", "biosynthesis of amino acids", "glycolysis / gluconeogenesis" and "glycerophospholipid metabolism" (Figure 4A). The expression levels of the correlated genes in the enriched KEGG pathways are displayed in the heatmap (Figure 4B).
3.3 The PPI network of 47 prognosis-related DE-MRGs and hub gene alteration analysis
Through the STRING website, we built a PPI network of the 47 DE-MRG proteins (Supplementary Figure 2). There were 47 nodes and 81 edges included in the network based on the interaction score criteria. The top 15 hub proteins with the highest connectivity degrees in the PPI network were as follows: ALDH18A1, GOT1, DGAT2, GPD1, GPI, PHGDH, GPT, ENTPD3, ASNS, ADCY9, CKMT1A, LPCAT1, MTHFD2, PPAP2C and PSAT1 (Figure 5A-B). The alteration results of the hub genes showed that GPI, GPT, ADCY9, and LPCAT1 ranked as the most frequently altered genes. GPI, GPT, and LPCAT1 were frequently overamplified in endometrial cancer patients, while ADCY9 more often had missense mutations (unknown significance) (Figure 5C).
3.4 Identification of a nine DE-MRG-based prognostic model
Next, we randomly divided the 541 TCGA EC patients into a training cohort (n = 272) and a testing cohort (n = 269). LASSO and multivariate Cox regression analyses identified 9 genes significantly associated with prognosis: CYP4F3, CEL, GPAT3, LYPLA2, HNMT, PHGDH, CKM, UCK2 and ACACB (Figure 6 and Table 2). According to the results of multivariate Cox regression analysis, we constructed the prognostic model as follows: risk score = (0.110103 × expression value of CYP4F3)+ (0.013456 × expression value of CEL) + (0.104444× expression value of GPAT3) + (0.017777 × expression value of LYPLA2) + (-0.10986 × expression value of HNMT)+ (0.017183 × expression value of PHGDH)+ (0.200964 × expression value of CKM)+ (0.051913 × expression value of UCK2)+ (0.634313 × expression value of ACACB).
Based on the mean risk score from the training cohort, the patients were divided into high-risk (n=136) and low-risk (n=136) subgroups. Each individual's risk score and survival status were ranked and displayed on the dot plot, which showed significant differences in OS between the groups (Figure 7A-B). Likewise, the Kaplan-Meier curve analysis demonstrated that the OS of the higher-risk group was significantly shorter than that of the low-risk group (P =1.971e-06) (Figure 7D). The expression profiles of the 9 prognostic DE-MRGs showed that UCK2, PHGDH, ACACB, LYPLA2, CYP4F3, GPAT3, CEL, and CKM were expressed at higher levels in the high-risk subgroup, while HNMT was expressed at lower levels (Figure 7C). ROC curve analysis revealed that the area under the ROC curve (AUC) of the prognostic MRG model was 0.771 (Figure 7E).
3.5 Validation of the efficacy of the 9-MRG prognostic signature
The risk model was then introduced into the testing cohort and entire cohort, and each individual's risk score was calculated. Based on the training cohort cut-off risk score, the patients in the testing cohort were classified as 121 high-risk and 148 low-risk individuals. In agreement with the results from the training cohort, the survival status (Figure 8A), survival time (Figure 8B) and KM curve analysis of the high-risk subgroup presented worse outcomes than those of the low-risk subgroup in the testing cohort with a shorter overall survival time(P =4.151e-04)(Figure 8D). The expression patterns of the 9 MRGs were consistent with those in the training cohort (Figure 8C), and the AUC of the risk model in the testing cohort was 0.796 (Figure 8E). Similar results were also observed for the entire cohort. The low-risk subgroup presented longer survival times and better survival statuses (Figure 9A-B). In addition, the expression profiles of the 9 MRGs were in line with those in both the training cohort and the testing cohort (Figure 9C). Kaplan-Meier curve analysis showed that the low-risk subgroup had longer OS times (P =1.242e-08) (Figure 9D), and ROC curve analysis showed that the AUC of the model was 0.78 (Figure 9E).
3.6 Validation of the expression levels of the 9 MRGs in clinical samples
The expression signatures of the 9 MRGs were subsequently explored in 30 endometrial cancer clinical specimens. The results demonstrated that UCK2, PHGDH, ACACB, LYPLA2, CYP4F3, GPAT3, CEL, and CKM mRNA level were upregulated in cancerous tissues, while HNMT was downregulated, which was in accordance with the above findings (Figure 10A-I).
3.7 The clinical independence and correlation estimation of the risk signature
Then, we combined the risk model with other clinical factors and performed univariate and multivariate analyses to examine the clinical independence of the model. The results showed that the model was able to serve as an independent prognostic indicator (both P<0.001) (Figure 11A-B). The AUC value of the prognostic model was 0.781, which was significantly higher than that of patients’ age (AUC= 0.535) and weight (AUC= 0.633), clinical-stage (AUC= 0.710), tumor grade (AUC= 0.656), histology (AUC= 0.522), and lymph node status (AUC= 0.697)(Figure 11C). Next, we assessed the risk scores, clinical features, and nine-gene expression profiles of EC patients and displayed them in the heatmap shown in Figure 11D. Interestingly, the clinical characteristics of the patients were highly in accordance with the risk level calculated from the model. The high-risk subgroup patients were characterized by late-stage, high-grade, serous carcinoma, and more metastatic lymph nodes (Figure 11D-E), which all presented worse outcomes. The correlations between each gene from the prognostic model and the patients' clinical features were also measured. PHGDH, ACACB, HNMT, CYP4F3, and LYPLA2 were shown to be significantly associated with patient prognosis (Figure 11F). The other clinical features of each prognostic MRG from the signature are presented in Supplementary Figure 3.
3.8 Nomogram building and validation
Based on the patients' risk scores and clinical features, we built a comprehensive prognostic nomogram to estimate EC patients' survival probability for 5 years based on the entire TCGA set. Seven independent prognostic parameters, including metabolic risk signature, age, grade, weight, histology, stage and lymph node status, were integrated into the nomogram (Figure 12A). The calibration plots showed excellent consistency between the nomogram predictions and actual observations in terms of the 3- and 5-year survival rates in the TCGA cohort (Figure 12B-C).