Construction and Validation of a Metabolic Reprogramming Related Prognostic Model for Hepatocellular Carcinoma

analysis and analysis


Abstract Background
Metabolic reprogramming is an important hallmark in the development of malignancies. Numerous metabolic genes have been demonstrated to participate in the progression of hepatocellular carcinoma (HCC). However, the prognostic signi cance of the metabolic genes in HCC remains elusive.

Methods
We downloaded the gene expression pro les and clinical information from the GEO, TCGA and ICGC databases. The differently expressed metabolic genes were identi ed by using Limma R package. Univariate Cox regression analysis and LASSO (Least absolute shrinkage and selection operator) Cox regression analysis were utilized to uncover the prognostic signi cance of metabolic genes. A metabolism-related prognostic model was constructed in TCGA cohort and validated in ICGC cohort. Furthermore, we constructed a nomogram to improve the accuracy of the prognostic model by using the multivariate Cox regression analysis.

Results
The high-risk score predicted poor prognosis for HCC patients in the TCGA cohort, as con rmed in the ICGC cohort (P < 0.001). And in the multivariate Cox regression analysis, we observed that risk score could act as an independent prognostic factor for the TCGA cohort (HR (hazard ratio) 3.635, 95% CI (con dence interval)2.382-5.549) and the ICGC cohort (HR1.905, 95%CI 1.328-2.731). In addition, we constructed a nomogram for clinical use, which suggested a better prognostic model than risk score.

Conclusions
Our study identi ed several metabolic genes with important prognostic value for HCC. These metabolic genes can in uence the progression of HCC by regulating tumor biology and can also provide metabolic targets for the precise treatment of HCC. Background Hepatocellular carcinoma (HCC) ranks sixth among the most common neoplasms with one of the highest mortality rates among cancers [1,2]. And HCC is a highly recurrent and drug-resistant malignancy that is usually diagnosed at an advanced stage [3]. An in-depth recognition of the regulatory mechanisms in tumor development can improve the e ciency of early diagnosis and precise treatment, thus improving the clinical prognosis of HCC patients.
Metabolic reprogramming is recognized as an important hallmark in cancer [4]. It has been reported that metabolic reprogramming provides a selective advantage for the growth and proliferation of tumor cells by modifying their metabolic programs to adapt to their increasing requirements of energy and macronutrients [5]. It has the ability to regulate both genetic and epigenetic events in tumor cells [6].
Moreover, cellular metabolic pathways have been reported to in uence tumor immunotherapy by regulating T cell function and longevity [7]. And changes in tumor metabolic activity may help identify new forms of communication in the tumor microenvironment [8]. The liver provides a critical platform for many physiological functions including macronutrient metabolism, homeostasis of lipid and cholesterol, immune system support, decomposition of exogenous compounds and so on [9]. The metabolic processes in human hepatocellular carcinoma are also complicated. The upregulated GLUT1 and GLUT2 isoforms in HCC can enhance glucose uptake [10,11]. Aberrant changes in lipid metabolism such as phospholipids and fatty acids metabolites were found in serum samples of HCC patients [12,13]. And tumor metabolic heterogeneity in HCC can also affect the bone marrow-derived inhibitory cells (MDSC) and tumor-related macrophages (TAM) in the tumor microenvironment [14][15][16]. Therefore, metabolic reprogramming is of great signi cance in HCC, and its regulatory directions are extensive and complex.
However, the underlying mechanisms of metabolic abnormalities in the development of HCC remain unclear. Nevertheless, few studies have systematically investigated the metabolic reprogramming and its relationship with prognosis in HCC.
In this study, we analyzed the high-throughput data of HCC to systematically investigate the metabolic changes and their potential regulatory mechanisms in hepatocellular carcinoma. And we also performed LASSO Cox regression analysis to select a gene set with the most prognostic value to construct a metabolism-related prognostic model, thus shedding a light into the clinical signi cance and application value of the metabolic reprogramming.

Microarray Datasets
The gene expression matrix les (including 183 HCC samples and 15 adjacent noncancerous samples) from GSE112790 based on platform GPL570 were downloaded from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/).

RNA-Sequencing Datasets
The gene expression data and the corresponding clinical information of HCC were downloaded from The Cancer Genome Atlas (TCGA) website (https://portal.gdc.cancer.gov/) and the International Cancer Genome Consortium (ICGC) website (https://icgc.org/). 424 TCGA samples from the United States and 445 ICGC samples from Japan were further studied. The log2 transformations were performed for all the expression data. The gene expression data and the corresponding clinical information are publicly available and analyzed under the guiding of pertinent instructions and regulations.

Differentially Expressed Genes (DEGs) Analysis
We used the limma R package to determine the DEGs of HCC cohorts in TCGA, ICGC and GSE112790. By studying metabolism-related signaling pathways in GSEA software(c2.cp.kegg.v7.0.symbols.gmt), we screened out 944 metabolic genes. The metabolism-related DEGs were obtained by intersecting the previously obtained DEGs with metabolic genes.

Functional and Pathway Enrichment Analysis
Gene annotation enrichment analysis was performed by the clusterPro ler R package to investigate the metabolism-related DEGs at the functional level. Metabolism-related biological processes were identi ed by Gene Ontology (GO) analysis. And Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis identi ed the metabolism-related pathways. The analyses were performed with the cutoff of p < 0.05 and q < 0.05, false discovery rate (FDR) < 0.05.

Construction and Validation of a Metabolic Prognostic Model
We calculated the HR of DEGs in the TCGA and ICGC cohorts by using the univariate Cox regression analysis (P < 0.05). Then we used LASSO Cox regression analysis to screen out the genes with the most prognostic value in DEGs. The formula that risk score = Σ cox gene coe cient of gene Xi × scale express the value of Xi was applied to establish a risk score model to predict the patient survival. The HCC cohorts with integrate clinical information were analyzed to determine independent prognostic factors through multivariate Cox regression analysis. Finally, the area under the curve (AUC) of receiver Operating characteristic (ROC) curve was quanti ed by the timeROC R package to show the sensitivity and speci city of risk score in predicting the overall survival for HCC. We also established a Nomogram to show the prognostic value of risk scores by using the coe cients of multivariate Cox regression analysis. Concordance index (C-index) was calculated and used to evaluate the predictive accuracy of the prognostic model and the nomogram. Moreover, we plotted the calibration curves to appraise the predicting ability of the nomogram through the rms R package.
2.6 Correlation analysis of metabolism-related genes and tumor immunity in the TCGA HCC cohort.
We obtained the immune related genes from immune-related signaling pathways in GSEA software and performed the correlation analysis between risk genes and immune related genes. Then we utilized GSVA and GSEABase R packages to perform ssGSEA analysis on the TCGA cohort. Through limma and sparcl R packages, we clustered the HCC cohort into high, medium and low immune score groups, and used limma R package to identify the DEGs between high and low immune score groups. The immune DEGs and metabolism-related genes were intersected to obtain metabolic genes in the high and low immune score groups. Then, the metabolism genes related to immune grouping were subjected to GO and KEGG analyses in the Metascape website (https://metascape.org/gp/index.html), and their correlations with immune in ltration were obtained in the TIMER website (https://cistrome.shinyapps.io/timer/).

Statistical Analysis
R software was used to conduct all the statistical analyses. The statistical signi cance of two normally distributed variables was accessed by unpaired t-test and the correlation between the gene sets was determined with Spearman's test. Kaplan-Meier method was used to compare the survival curves of each subgroup, and log-rank test was used to determine the statistical signi cance of individual variable in survival. All statistical tests were two-tailed and p < 0.05 was considered statistically signi cant.

Results
3.1 Identi cation of metabolism-related DEGs between HCC samples and normal samples.
In order to identify metabolism-related DEGs between HCC and normal tissues, we conducted differential expression analysis in TCGA(LICH), ICGC(LIRI-JP) and GSE112790 datasets by using the limma R package (FDR < 0.05 and |log2 FC| > 1). We obtained 457 DEGs, of which 100 were related to metabolic reprogramming. (Supplementary Fig.S1 and Supplementary Table S1).

GO and KEGG analyses in metabolism-related genes.
To gain further insight into the function of the identi ed metabolism-related DEGs, we performed GO and KEGG analyses. The DEGs were mainly involved in biological processes associated response to xenobiotic stimulus, nucleoside phosphate biosynthetic process, carboxylic acid biosynthetic process and so on (Fig. 1a). Moreover, the DEGs were involved in multiple KEGG pathways, including retinol metabolism, drug metabolism-cytochrome P450, arachidonic acid metabolism, et al (Fig. 1b).

Construction and validation of a metabolism-related prognostic model in the TCGA and ICGC cohorts.
Considering the difference in the expression of metabolism related genes between HCC and normal tissues, we determined to analyze the clinical application value of the DEGs in depth. Among the 100 metabolic DEGs, 37 genes were demonstrated to be signi cantly associated with the overall survival status of HCC patients through the univariable Cox regression analysis ( Supplementary Fig. S2a), and 9 of them were identi ed to have the maximum prognostic value through the LASSO Cox regression analysis ( Supplementary Figs. S2b,c ). Finally, we applied these genes to establish a metabolism-related model in the TCGA cohort to estimate the prognostic value of metabolic reprogramming in HCC patients. And the identical formula was utilized in the ICGC cohort to determine whether the prognostic model was reliable in different populations (Figs. 2a-f). The formula for the prognostic model was that risk score = Σ Cox coe cient of gene Xi × scale expression value of gene Xi. Then, HCC patients were categorized into high-or low-risk groups with the optimal cut-off value of risk score. The high-risk score predicted poor prognosis for HCC patients in the TCGA cohort (P = 1.847e-07), as con rmed in the ICGC cohort (P = 7.243e-06). And we found that the risk score in TCGA cohort had excellent prognostic value in both univariate and multivariate Cox regression analyses, which was consistent with the results in ICGC cohort (Figs. 3a-d). The results showed that metabolic alterations had important clinical signi cance in the prognosis of HCC patients. And the risk score of the metabolism-related prognostic model can act as an independent prognostic factor for the TCGA cohort (HR (hazard ratio) 3.635, 95% CI (con dence interval)2.382-5.549) and the ICGC cohort (HR1.905, 95%CI 1.328-2.731) in HCC patients.

Evaluation of the metabolism-related prognostic model in the TCGA and ICGC HCC cohorts.
To further evaluate the prognostic accuracy of the metabolism-related prognostic model, we calculated the area under ROC curve (AUC) and the C-index value of the model (Figs. 4a-f). And we also constructed a nomogram with comprehensive consideration of risk score and clinical characteristics for clinical use (Fig. 4g). We found that the AUC of the prognostic model on OS status was 0.758 at 1 year, 0.742 at 2 years, 0.693 at 3 years, 0.7 at 4 years and 0.693 at 5 years in TCGA cohort, while 0.749 at 1 year, 0.748 at 2 years, 0.763 at 3 years, 0.83 at 4 years and 0.672 at 5 years in ICGC cohort (Figs. 4a,d). We also found the overall AUC (0.755) of risk score was superior to that of grade (0.478), stage (0.698), T (0.704), N (0.506), M (0.508) in TCGA cohort. And the overall AUC (0.749) of risk score in ICGC cohort was superior to that of priorMalignancy (0.526), indicating that the prognostic model was well constructed (Figs. 4b,e). In addition, through principal component analysis (PCA) of unsupervised learning, we found that HCC samples were well distinguished between high and low risk grouping in the predictive model (Figs. 4c,f). Furthermore, we observed that the C-index of risk score was 0.7260 in TCGA cohort and 0.7195 in ICGC cohort. And the C-index of nomogram was 0.7510 in TCGA cohort and 0.8240 in ICGC cohort, indicating higher predictive accuracy of nomogram than risk scores. Finally, we conducted the calibration curves at 3 years and at 5 years, to validate the nomogram's performance and observed that the predictive curves worked well (Figs. 4h,i).

Clinical relevance of metabolism-related prognostic model and its constructed genes in TCGA cohort
We conducted a clinical correlation analysis for risk score of the prognostic model to further understand the in uence of metabolic reprogramming on the clinical prognosis of HCC patients. The results showed that risk score had signi cant correlations with the status(p = 1.11e-06), grade(p = 0.003), stage(p = 0.014) and T(p = 0.013) in HCCs in TCGA cohort (Figs. 5a-d). The results suggested that Risk score correlates with the progression of hepatocellular carcinoma. And a higher risk score indicates a higher degree of malignancy in HCC. Therefore, metabolic alterations are closely related to the clinical prognosis of HCC patients.

The role of risk score in the prognosis of HCC patients strati ed by clinicopathological variables
To explore the prognostic value of risk score in HCC patients strati ed by different clinicopathological variables, we strati ed HCC patients by age, sex, grade, stage and TNM stage. The results showed that the high-risk group had a poor prognosis among patients strati ed by different clinical variables (Fig. 6). These results suggested that metabolism-related prognostic risk score was not limited by clinicopathological variables in predicting the prognosis of HCC patients.
3.7 Correlation between metabolism-related genes and tumor immunity in the TCGA HCC cohort.
Since our analysis found that there was a signi cant correlation between the expression of metabolismrelated risk genes and immune-related genes, we further analyzed the metabolic alterations in the high and low immune-rating groups (Supplementary Figs. S4a,b). And we found that risk score was signi cantly higher in the low immune group than in the high immune group (Supplementary Fig. S4e). There were 23 metabolism-related genes overexpressed in the high immune score group, while 37 in the low immune score group (FDR < 0.05 and |log2 FC| > 1, Supplementary Figs.S4c,d). By GO and KEGG analyses, it was found that the metabolic genes in the high immune score group were mainly involved in the organophosphate biosensitization and arachidonic acid signaling pathway, et al. While in the low immune score group, the metabolic genes were mainly enriched in lipid biosensitization, glutathione metabolism and so on (Figs. 7a,b). We also found that there was a close relationship between metabolism-related genes and tumor immune in ltration, especially in the high immune score group ( Supplementary Fig.S5). We speculated that metabolism-related genes may indirectly regulate tumor immunity through metabolic alterations to some extent, thus affecting the prognosis of HCC patients.

Discussion
Metabolic reprogramming plays an important role in the development of HCC [17]. Patients with reduced serum albumin levels had larger and more aggressive tumors in HCC [18]. HCC cells sustain the antioxidant reaction by delivering glucose into the pentose phosphate pathway through ROS-mediated PKM2 inhibition [19]. And TAM used the lactic acid released by the HCC cells to induce tumor stem celllike characteristics and drug resistance [20]. Therefore, further studies on the role of metabolic reprogramming in HCC development and its prognostic value are urgent. And our study provides insights into the underlying mechanisms of metabolic alterations and their clinical prognostic signi cance in hepatocellular carcinoma.
In this study, we screened out 100 metabolism-related DEGs to explore the metabolic alterations in the development of HCC. And the results of gene annotation enrichment analyses suggested that metabolism-related DEGs were enriched in the biological process, including response to xenobiotic stimulus, xenobiotic metabolic process, nucleoside phosphate biosynthetic process and carboxylic acid biosynthetic process, et al (Fig. 1a). Previous studies have shown that these biological processes are indeed involved in the development and progression of hepatocellular carcinoma by mediating druginduced liver disease, affecting DNA replication and repair, RNA synthesis, and acting as potential pharmacophore [21][22][23]. And the KEGG analysis revealed that metabolism-related DEGs were involved in pathways like retinol metabolism, drug metabolism, metabolism of xenobiotics by cytochrome P450, glycolysis/gluconeogenesis, fatty acid degradation and tyrosine metabolism, et al (Fig. 1b). This suggested that metabolic reprogramming may mediate tumorigenesis, metastasis, and tumor resistance in HCC by these pathways, which was consistent with the current researches [24][25][26]. In particular, glycolytic related genes such as ALDH3A1, ALDOA were up-regulated in HCC compared with normal tissues, while the expression of gluconeogenic related genes such as FBP1 and PCK1 were decreased. Enhanced glycolysis in hepatocellular carcinoma can be explained as metabolic adaptation to the tumor microenvironment. In addition, our ndings were consistent with those of Curtis C, et al [27], in which glycosoplasmas were transferred to regulate oxidative stress and countered SAM elevation, thereby regulating hepatocellular carcinoma biosynthetic pathways.
In the retrospective analysis, we constructed and veri ed a metabolism-related model to predict the prognosis for HCC patients (Fig. 2). The results showed that there was a signi cant difference in overall survival between the high-and low-risk groups. Moreover, through univariable and multivariable Cox regression analyses, we found that risk score could be used as an independent prognostic factor for HCC, and its prognostic value was similar or even superior to some traditional clinical characteristics to a certain extent. The 9 metabolism-related genes that constituted our prognostic model were RRM2, G6PD, HMOX1, LCAT, AKR1B10, CYP2C9, HMGCS2, TK1 and DPYS, which covered key enzymes of various metabolic pathways. It has been demonstrated that ribonucleoside diphosphate is responsible for transforming ribonucleoside diphosphate into 2'-deoxyribonucleoside, and the ribonucleoside reductase subunit M2 (RRM2) is a therapeutic target for HCC [28]. Up-regulated G6PD can promote the migration and invasion of HCC cells by inducing epithelial-mesenchymal transformation [29]. And HMOX1 in uences hepatocellular carcinoma progression by regulating iron metabolism [30]. LCAT is hypermethylated in HCC and can affect the progression of HCC [31]. Immunohistochemical staining with AKR1B10 showed a negative correlation with tumor proliferation in HCC [32]. As one of the most critical drug metabolic enzymes, CYP2C9 regulates the metabolic processes of many carcinogens and drugs in HCC [33]. HMGCS2 catalyzes the formation of HMG-CoA in HCC [34]. And TK1 can catalyze the generation of deoxythymine monophosphate (dTMP) and promote the proliferation of HCC and other malignancies [35]. Among the 9 risk genes, the relationship between DPYS and the development of HCC has not been reported. DPYS can affect the levels of dihydrouracil and dihydrothymine by regulating the metabolism of dihydropyrimidine, which is related to cell transformation and cancer progression [36]. Furthermore, we plotted the receiver operator characteristic curves (ROC) and calculated the AUC value to evaluate the sensitivity and speci city of the prognostic model (Fig. 4). The results suggested that the AUC value of risk score in the TCGA cohort was greater than that of age, gender, grade, stage and TNMstaging. This further demonstrated the clinical application value of this metabolism-related prognostic model. Finally, by establishing a nomogram and comparing the C-Index values, we found that comprehensive consideration of risk score and other traditional clinical features could signi cantly improve the prognostic accuracy. However, there are still some limitations in our study. For example, the patient's gene expression data and the corresponding clinical characteristics are all publicly available, which are often not comprehensive enough, and more prospective studies need to be carried out clinically. In addition, although we used hepatocellular carcinoma data from the United States and Japan for comparative analysis, more regional datasets need to be included to eliminate geographic differences. By exploring the differential expression of metabolic genes between the high and low immune score groups as well as the biological processes and signaling pathways involved, we observed that metabolic genes in the high immune score group participated in the NAD metabolic process, regulation of leukocyte apoptosis process, arachidonic acid metabolism and so on, while metabolic genes in the low immune score group were mainly involved in the glutathione metabolism, alcohol metabolism process, et al (Fig. 7). These results were consistent with the existing studies such as nicotinamide adenine dinucleotide (NAD+) regulated the immune function of macrophage in in ammation [37], and Arachidonic acid (AA) and its derivatives mediated in ammatory responses [38]. Activated T cells can resist the oxidation of glutathione (GSH) and prevent oxidative damage [39]. It can protect host immune cells through antioxidant mechanisms and plays an active role in immunocompromised states [40]. Alcohol-mediated oxidative stress can disrupt immune signaling pathways and increase the risk of immune system dysfunction [41]. However, oxidative metabolites of ethanol can induce Kupffer cells to release proin ammatory cytokines, promoting neutrophil in ltration [42]. This indicated that the mechanism and direction of alcohol metabolism regulating immunity were complex in the immunocompromised state. Moreover, metabolic genes in the high immune score group were signi cantly correlated with immune in ltrating cells in HCC, which further demonstrated the interaction between metabolic changes and the tumor immunity. These results suggested that metabolic reprogramming may affect the tumor microenvironment in HCC to some degree.

Conclusions
In summary, we systematically analyzed metabolic alterations in HCC through bioinformatic methods in this study. This increases our understanding of the impact of metabolic reprogramming on tumorigenesis and progression. Then, we established and veri ed a metabolism-related prognostic model by using LASSO analysis to clarify the clinical signi cance of metabolism-related genes in the prognosis of HCC. It shed a new light into the metabolic prospective in predicting the prognosis of HCC, which is helpful to further investigate the potential metabolic targets in the clinical treatment of HCC.