Development and Validation of a Novel Glycolysis-Related Risk Signature for Predicting Survival in Pancreatic Adenocarcinoma

Abstract


Introduction
Pancreatic adenocarcinoma (PAAD), the third leading cause of cancer death worldwide, is the most common type of pancreatic cancer (PC) [1].Due to lack of speci c clinical symptoms, PAAD is usually diagnosed at an advanced stage with invasive and metastatic characters [2,3].At present, surgery, radiotherapy, and chemotherapy are still the major treatment methods for PAAD.The therapeutic potential of these methods is limited, leading to a high mortality rate [1,4].Moreover, the prognosis of PAAD patients is mainly evaluated through histopathology.However, patients with the same stage can show different prognoses and treatment responses [5].Thus, it is crucial to develop reliable prognostic biomarkers and improve novel therapy targets for PAAD.
Based on mass data generated by next generation sequencing technology and abundant clinical data, a variety of genomic databases have been built up, giving us a deeper understanding of the relationship between genomic alteration and disease.Through data mining, we have identi ed many mRNAs, miRNAs and lncRNAs as prognostic biomarkers of patients with PAAD [6-8].Whereas, the ability of single gene biomarkers to predict patient survival is still insu cient.Studies have shown that evaluating genetic traits involving multiple genes can improve prognosis prediction [9].However, not all pathways have been study to develop PAAD risk signature.Therefore, more efforts are needed to nd more e cient biomarkers for PAAD.
Accordingly, in the present study, we aimed to develop a novel risk signature for predicting survival of pancreatic adenocarcinoma.We have drawn hallmark gene sets from 177 PAAD patients with whole genome expression pro les from the TCGA database.We identi ed 199 mRNAs were signi cantly associated with glycolysis using Gene Set Enrichment Analysis (GSEA) and built a three-gene risk signature that can accurately predict patient prognosis.This risk signature can successfully predict PAAD patients who are in the high-risk group with poor prognosis.Additionally, based on Cox multivariate regression analyses, we proved that this three-gene risk signature had a better prognostic value than other clinical information of patients with PAAD.

Clinical data and mRNA expression dataset
The clinical data and mRNA expression pro les of PAAD patients were extracted from the TCGA database (https://cancergenome.nih.gov/).Additionally, the clinical data patients age, gender, grade, TNM stage, T stage, N stage, and M stage was included (Table 1).

Functional enrichment analyses
GSEA (http://www.broadinstitute.org/gsea/index.jsp)was used to explore whether the identi ed gene sets showed signi cant differences between the groups.The algorithm can present an overall trend of the raw data and does not need to specify differential gene threshold.Next, we analyzed the expression levels of mRNAs in PAAD samples and adjacent noncancerous tissues.Finally, we determined gene set for further analysis by normalized p values (p<0.05) and normalized enrichment score.

Data processing and risk-parameter calculation
Univariate Cox regression analysis was used to analyze and identify genes related to overall survival (OS), then subjected to multivariable Cox regression to further con rm the prognostic genes and obtain the coe cients.The selected mRNAs were divided into the risk (hazard ratio: HR>1) gene and protective (0<HR<1) gene.The expression level of selected genes weighted by their coe cients through linearly combining, we constructed a risk score formula as follows: Risk score=∑(βi×Expi) βi representing coe cients and Expi indicating the expression level of each selected mRNAs.The median risk score was used as cut-off value, the 177 patients were classi ed into high-risk and low-risk groups.

Statistical analysis
Kaplan-Meier survival curves and the log-rank method were used to validate the prognostic signi cance of the risk score.The survival function of the glycolysis based prognostic risk model was analyzed by receiver operating characteristic (ROC) curve.Moreover, we performed univariate Cox, multivariate Cox regression analysis, and data strati cation analysis to test whether the risk score was independent of the clinical features, including age, grade, and stage.P<0.05 was considered as statistically signi cant.Statistical analyses were performed using SPSS 20.0 software (SPSS, Inc., Chicago, IL, USA).

Results
Initial screening of genes using GSEA We obtained clinical data from 177 patients with PAAD, along with an expression data for 56753 mRNAs, from the TCGA database.The hallmark gene sets from the Molecular Signatures Database (MSigDB) were selected to represent well-de ned biological process or courses, which contained 50 speci c gene sets.GSEA was analyzed using the above database to detect whether the gene sets showed statistically differences between PAAD and adjacent normal tissues.A mount of 36 gene sets were upregulated in PAAD.Among them, HALLMARK_GLYCOLYSIS, and HALLMARK_ESTROGEN_RESPONSE_LATE gene sets were signi cantly enriched (Figure1).We then selected GLYCOLYSIS (P=0.028,NES=1.54) for further analysis, which has the top-ranking function and contained 199 genes.

Identi cation of survival-associating glycolysis-related mRNAs
In order to identify novel genetic biomarkers associated with the survival of patients with PAAD, we applied univariate Cox regression of 199 genes that were enriched via glycolysis.A total of 7 genes were signi cantly associated with OS (P<0.05) and then entered them into multivariate Cox regression analysis.Three independent genes (KIF20A, CHST2, and MET) were selected via multivariate COX regression analysis (Table 2).With HR>1 associated with poorer survival (KIF20A and MET), with HR<1 associated with better survival (CHST2).
Construction of a three-mRNA signature to predict patient outcomes Based on integrating the expression level and corresponding regression coe cients derived from multivariate Cox regression analysis, a prognostic risk score model was established as follows: Risk score = 0.1755 * expression of KIF20A + (-0.1400) * expression of CHST2 + 0.0214 * expression of MET We then ranked the risk score in ascending order.According to the risk score formula, 177 PAAD patients were classi ed into high-risk groups (n=88) and low-risk groups (n=89), using the median risk score as cut off value (Figure 2A).The survival time of the patients are shown in Figure 2B.Patients with the lowrisk group had lower mortality rates, whereas patients in high-risk group showed poorer survival.
Additionally, a heatmap displayed the expression pro les of 3 mRNAs (Figure 2C).Compared with the low-risk group, the high-risk mRNAs (KIF20A and MET) was upregulated in high-risk group, whereas the expression of the protective type of mRNAs (CHST2) was downregulated in high-risk group.The sensitivity and speci city of the three-mRNA signature by area under the receiver operating characteristic curve (AUC) value in the ROC curve at 1-year survival was calculated.The ROC curve analysis score was 0.718 (Figure 3), indicating the good sensitivity and speci city of the three-mRNA signature in predicting survival of PAAD patients.
Risk score generated from the signature as an independent prognostic indicator To verify whether the risk score is independent of other clinical features, we performed univariate and multivariate Cox regression analyses to evaluate the importance of these indicators in PAAD patients, which included risk score, age, gender, grade, and TNM stage as covariables (Figure 4).We found that the risk score (HR: 1.242; 95% CI: 1.157-1.334;P<0.001) age (HR: 1.028; 95% CI: 1.006-1.050;P=0.012), and grade (HR: 1.377; 95% CI: 1.020-1.859;P=0.037) were associated with patient survival in the univariate analysis.We identi ed risk score and age had independent prognostic value, as these factors showed signi cant differences in both univariate and multivariate analyses.All these indicated that the three-gene signature had competitive prognostic value for survival prediction.
Validation of three-mRNA signature for survival prediction by Kaplan-Meier curve analysis Compared high-risk group and low-risk group, the Kaplan-Meier analysis showed a signi cant difference in the survival of the patients (P<0.001; Figure 5A).Additionally, the risk score was a stable prognostic marker for patients with PAAD strati ed by age (<65 or >=65), TNM stage (stage I or stage II-IV), T stage (stage 1-2 or stage 3-4), and N stage (Figure 5B, 5C, and 5D).The patients in the high-risk group had signi cantly shorter OS than those in the low-risk group in M0 subgroup (Figure 5E).Since there were only 4 patients in M1 subgroup, it is regrettable that the analysis of M1 subgroup has not been completed.All these results indicated that the risk score was robust in predicting the prognosis of patients with PAAD.

Discussion
Currently, the study of energy metabolism has attracted people's attention.Human malignancy is inextricably linked to energy metabolism.Tumor cells have the ability to proliferate inde nitely and are prone to distant metastasis, which requires enough biosynthetic precursors and energy to promote cell division, migration, and invasion.Hypovascularization in PAAD reduce the transfer of biosynthetic precursors to cancer cells, leading to an energy crisis [10].However, tumor cells have "metabolic reprogramming" ability to change their energy metabolism [11].Then, the major pathway of glucose metabolism is transformed from oxidative phosphorylation to anaerobic glycolysis to meet cell proliferation needs [12].Biomarkers have been thought of in terms of cancer and energy metabolism.Although many studies have been conducted on PAAD and glycolysis [13][14][15], the research that involved in prognostic biomarkers of cancer related to glycolysis is limited.In this study, we attempted to identify a glycolysis-related prognostic biomarker for PAAD patients.
Recent studies showed that clinicopathological features such as age, sex, and metastatic diagnosis are not su cient to precisely predict the outcome of patients with cancer [16].The ideal biomarker is readily available, cost-effective, reliably measurable, and highly accurate.Thus, a growing number of mRNAs have been identi ed as biomarkers for predicting survival of PAAD.For example, Strippoli et al. report that the overexpression of ERCC1 showed to be signi cantly associated with shorter survival and poor disease control [17].Giovannetti et.al reported that hENT1 is predictive factor of response to gemcitabine, the overexpression is linked to a longer overall survival in PAAD [18].Additionally, miRNA-21 and miRNA-155 have been considered as novel biomarkers shown prognostic and predictive value [19,20].However, these biomarkers cannot serve as independent prognostic indicator for predicting patient prognosis.Particularly, single gene expression levels interfered by many factors, preventing these markers from providing powerful predictive effects.Therefore, we constructed a gene signature containing multiple related genes by statistical model, and improved the prediction e ciency by combining the prediction effect of each component gene.In our study, we reported a glycolysis-related risk signature (KIF20A, CHST2, and MET) and then proved the prognostic value in PAAD.

Conclusions
In conclusion, we rstly develop a glycolysis-related risk signature for predicting survival of pancreatic adenocarcinoma.The three-gene signature had competitive prognostic value for survival prediction which could help clinicians to stratify pancreatic adenocarcinoma patients and improve their prognosis.Additionally, it is of great signi cance to improve novel therapy targets for PAAD.

Figure 3 The
Figure 3