The Prognostic Value of Metabolic Genes in Gastric Adenocarcinoma

Background: A great number of metabolic genes have been discovered in gastric cancer (GC); however, their prognostic roles remain incompletely clear so far. Methods: The Cancer Genome Atlas-stomach adenocarcinoma, and GSE84437 dataset collected via the Gene Expression Omnibus (GEO) database were utilized for retrieving those clinicopathological data and RNA expression patterns. Besides, the signature was obtained through the lasso Cox regression model and univariate Cox regression analysis. A novel 13-gene metabolic signature (including GSTA2, POLD3, GLA, GGT5, DCK, CKMT2, ASAH1, OPLAH, ME1, ACYP1, NNMT, POLR1A, and RDH12) was constructed for predicting the prognosis for gastric adenocarcinoma. Results: The results suggested that, the survival in high risk group was remarkably dismal compared with that in low risk group. In addition, that as-constructed signature had been identied as a factor to independently predict the prognosis for gastric adenocarcinoma patients. Moreover, the signature-based nomogram showed certain benets in predicting the overall survival (OS). Besides, results of gene set enrichment analysis (GSEA) suggested that some pathways were markedly enriched, which help to illustrate the possible mechanisms. Conclusions: The new 13-gene metabolic model is constructed in this study to predict the prognosis for GC. It probably reects dysregulation in the metabolic microenvironment, in the meantime of providing metabolic treatment biomarkers, and predicting the treatment response to gastric adenocarcinoma.


Introduction
Previous studies indicate that, the gastric cancer (GC) incidence and mortality have decreased in the last few decades in developed countries. [1][2][3]. However, GC still leads to severe health problems in the developing countries [4]. Moreover, the incidence of GC shows a 15-to 20-fold difference between highincidence region and the low-incidence one [1]. The highest GC incidence rates are reported in East Asia, South and Central America, and Eastern Europe [4]. In Japan and Korea, GC ranks the top in terms of its morbidity among men [4]. Interestingly, in western countries, GC can hardly be cured since it is usually discovered at late stage. However, in Japan, GC may be detected early, leading to a better prognosis [1].
As one of the most important genes, metabolic genes have been investigated for several years, but their prognostic roles should be further discussed. Speci cally, metabolic reprogramming has emerged as the new cancer hallmark [5]. In 1920s, Otto Warburg et al. [6] rst observed that certain cancer cells showed preferential dependence on glycolysis, even when there was enough oxygen. Additionally, the non-glucose metabolites have also been found, such as glutamate [7], and fatty acid [8,9] in cancer, which represent the signi cant metabolic alterations.
Up to now, great attention has been paid to the discussion and the novel systems used to predict tumor prognosis, including immune score, stromal score, or DNA damage-repair for assessing the biological processes of tumor [10][11][12]. Such phenomenon reminds us that a metabolic scoring system can be built in various tumors. Previously, the study conducted by Liu et al. [13] presented an example of the use of metabolic genes to build a scoring system. As one of the prognostic markers, metabolic genes are associated with the survival for cancer patients [14][15][16]. Typically, one of the reasons to explain the prognostic value of metabolic genes is that their high expression is probably related to the therapeutic effect of cancer treatments (chemotherapy [17,18] and targeted treatment [19]). This work aimed to use data from TCGA-STAD, as well as Gene Expression Omnibus (GEO) dataset to build a scoring system for clinical prognosis.

Results
Construction and validation of that as-constructed gene signature for prognosis prediction in TCGA All cases were classi ed as the low-or high-risk group according to the best threshold of risk score. 0.520 for risk score, stage, N stage, age, grade, T stage, gender, and M stage, separately. In addition, high risk group had evidently poor overall survival (OS) compared with that in low risk one (P<0.001, Figure 4).
Afterwards, that as-constructed prognostic model was validated using the GSE84437 cohort. The AUC values for N stage, T stage, age, risk score, and gender were 0.691, 0.608, 0.574, 0.514 and 0.484, respectively. (Figure 4)

Correlations between prognostic model and various clinicopathological features
A total of 813 patients (including 380 from TCGA and 433 from GSE) with complete data about gender, age, and Tumor-Node Metastasis (TNM) stage were included to compare hazard ratios (HRs) of the clinicopathological characteristics. As for the TCGA dataset, results of univariate cox regression analysis revealed that, risk score, N stage, T stage, and stage were related to prognosis. However, multivariate cox regression analysis revealed that, only risk score (HR 5.955, 95%CI 3.267-10.855), gender (HR 1.596, 95%CI 1.038-2.452) and age (HR 1.035, 95%CI 1.014-1.056) were associated with prognosis ( Figure 5).
For the GEO dataset, gender, age, risk score, N stage, and T stage were included to carry out univariate cox regression analysis; however, only age (HR 1.025, 95%CI 1.013-1.038), T stage (HR 1.600, 95% CI 1.255-2.039), N stage (HR 1.501, 95%CI 1.275 -1.766), and risk score (HR 1.677, 95%CI 1.177-2.390) were included in the results ( Figure 6). Moreover, the nomogram was constructed by including the prognostic model. (Figure 7) Gene Set Enrichment Analysis (GSEA) Upon GSEA, a total of 31 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways with signi cant enrichment were identi ed based on TCGA-STAD, most of which were associated with the metabolism (including the metabolism of purine, pyrimidine, dicarboxylate, and glutamate) or metabolic disease (such as the Huntingtons Disease). In the meantime, the remaining ones showed no correlation with metabolism, but they displayed frequent deregulatin in tumors. (Figure 8) Discussion GC morbidity and mortality show a growing trend among numerous countries. In recent years, the mRNA gene signatures on the basis of characteristics involving cell cycle and immune signature are the hotspots in research on predicting cancer mortality risk. Nowadays, more knowledge is needed for the genomics in GC, and the next generation sequencing (NGS) is used to identify the treatment and/or clinical trial enrollment. Her2 testing (including immunohistochemistry (IHC), and in situ hybridization (ISH)) [20][21][22], microsatellite instability and PD-L1 expression [23] can be used as the prognostic factors.
In this study, a novel prognostic metabolic signature involving 13 genes was constructed on the basis of TCGA dataset, while its predicting e ciency was validated within the GSE84437 dataset. According to our results, the as-constructed signature contributed to effectively stratifying patient OS. Further the signature e cacy was demonstrated using the validating set, the training set, together with subgroup from TCGA, and the results suggested that this signature had a high prognostic e cacy. Furthermore, this signature indicated a signi cant correlation with negative clinicopathological parameters, which supported that this signature was robust in predicting prognosis.
According to GSEA analysis, a lot of the markedly enriched pathways were identi ed for this signature, and most of them showed relation to metabolism. Moreover, our ndings further supported that there was close association between our signature and the metabolic systems; in addition, this signature had great potential.
The genes reported by our study have been shown to be involved in GC. Firstly, we are interested in the Malic enzyme 1 (ME1). Lu, et al. [24] indicated that the over-expression of ME1 was related to shorter OS as well as the disease-free survival (DFS) among GC patients. Additionally, they also demonstrated that, NADPH production mediated by ME1 played a critical part within GC proliferation and progression.
According to the study conducted by Shi, et al. [25], ME1 was the critical factor for GC, which was possibly pro-oncogenic in GC progression and migration, and was associated with the OS of GC patients.
Moreover, for GSTA1, one of the included genes, the study conducted by Zhao et al. [26] indicated that it might be used as the prognostic marker and targeted biomarker for GC chemotherapy.
Lastly, we are interested in the Nicotinamide N-methyltransferase (NNMT), which is one of the phase II metabolizing enzymes for catalyzing nicotinamide and other pyridine methylation to pyridinium ions [27]. NNMT is found to be over-expressed in gastric carcinoma tissues relative to the adjacent non-carcinoma tissues at protein and mRNA levels. Further, Liang, et al. [28] indicated in their research that, NNMT upregulation in GC cells probably promoted the epithelial-mesenchymal transition (EMT). Moreover, as con rmed by the results of survival analysis, the NNMT level serves as the factor to independently predict the prognosis for the OS of GC cases [29].

Conclusion
All in all, the new 13-gene metabolic signature is constructed in this study to predict the prognosis for gastric adenocarcinoma using the TCGA dataset. Our results suggest that, this signature probably represent that abnormal metabolic microenvironment, which also provides the underlying biomarkers to predict treatment response and to carry out metabolic therapy. Nonetheless, more functional experiments should be performed to examine those metabolic genes for prognosis prediction in the future.

Data collection
The TCGA-STAD dataset, and the Gene Expression Omnibus (GEO) database were used to retrieve the clinical data and the expression patterns of messenger RNA (mRNA). Meanwhile, the metabolic genes were acquired based on a previous study.
Identi cation of the differentially expressed mRNAs from TCGA-STAD dataset Log2 transformation and normalization of transcripts in every million were used in the expression pattern data. Additionally, the Limma 3.6.1 R package was used to analyze the differential expression of those annotated genes encoding proteins. Meanwhile, those 9 above-mentioned gene expression pro les were identi ed from TCGA. The consistently altered metabolic genes would then be screened for subsequent prognostic analysis.
Establishment of the metabolic gene signature for prognosis prediction Metabolic genes related to prognosis were identi ed through lasso-penalized Cox regression and univeriate Cox regression analysis [30], and later a gene signature as constructed for prognosis prediction. Upon univariate Cox regression analysis, P<0.001 represented statistical signi cance. In addition, that gene signature for prognosis might be presented as follows, risk score= (Coe cient mRNA1 x mRNA1 expression) + (Coe cient mRNA2 x mRNA2 expression)+…+(Coe cient mRNAn x mRNAn expression). In addition, the "survminer" and "survival" [31] in R package were adopted for exploring the best threshold of risk score and plotting the Kaplan-Meier survival curve. Particularly, the best threshold for dividing cases as low-or high-risk group was determined using the "survminer" and "surv_cutpoint" functions of R package. Moreover, the "survivalROC" of R package [32] was adopted for investigating the signi cance of the gene signature in predicting the prognosis in a time-dependent manner.
The as-constructed gene signature for prognosis prediction was independent from additional clinical parameters from TCGA The thorough related clinical data from 380 patients with gastric adenocarcinoma were retrieved to carry out later analyses. Meanwhile, univariate as well as multivariate Cox regression analysis was conducted according to the forwarding stepwise procedure. A difference of P<0.05 indicated statistical signi cance.

Construction and validation of the predictive nomogram
Each of the independent prognostic factor was incorporated to construct the nomogram [23] in this study.
Besides, the nomogram discrimination and calibration were investigated using concordance index (Cindex) and the calibration plot (through the bootstrap approach using 1000 resamples), separately. In addition, decision curve analysis, C-index, and time-dependent receiver operating characteristic curve (time-ROC curve) were used to compare gene signature, the TNM mode, as well as the combined model comprising gene signature and TNM.
GSEA GSEA [24] was performed to identify the possible KEGG pathways in that as-constructed gene signature, and to search for those terms enriched within TCGA-STAD. A false discovery rate q<0.25 and P <0.05 indicated statistical signi cance.

Statistical methods
The R software v3.6.1 (R Foundation for Statistical Computing, Vienna, Austria) and SPSS 22.0 were used for all statistical analyses. A difference of P<0.05 indicated statistical signi cance. The Cancer Genome Atlas-stomach adenocarcinoma, and GSE84437 dataset collected via the Gene Expression Omnibus (GEO) database were utilized for retrieving those clinicopathological data and RNA expression patterns. Besides, the signature was obtained through the lasso Cox regression model and univariate Cox regression analysis.