A Novel Glycolysis-related Gene Signature for Survival Prediction in Patients with Head and Neck Squamous Cell Carcinoma

Background: Head and neck squamous cell carcinoma (HNSCC) is most diagnosed at an advanced stage with poor prognosis. Single gene biomarkers cannot have enough predictive ability in HNSCC. Glycolysis participating in cancers was veried. Thus, this study aimed to identify glycolysis-related gene signature predict the outcome of HNSCC. Methods: The mRNA expression data of HNSCC downloaded The Cancer Genome Atlas (TCGA) project was analyzed by Gene Set Enrichment Analysis (GSEA). We use the Cox proportional regression model to construct a prognostic model. Kaplan–Meier and receiver operating characteristic (ROC) curves were employed to estimate the signature. We also analyzed the relationship of the signature and cancer subtypes. Results: We identied nine glycolysis-related genes including G6PD, EGFR, ALDH2, GPR87, STC2, PDK3, ELF3, STC1 and GNPDA1 as prognosis-related genes signature in HNSCC. HNSCC patients were divided into high and low risk group according to the signature. High risk group showed more poor prognosis and the risk score can precisely predict the prognosis of HNSCC. Additionally, the signature also can be used in cancer subtypes. Conclusion: This study established the 9-mRNA glycolysis signature which may serve as a prospective biomarker for prognosis and novel treatment target in HNSCC. HNSCC: Head and neck squamous cell carcinoma; TCGA: The Cancer Genome Atlas; GSEA: Gene Set Enrichment Analysis; ROC: receiver operating characteristic; MSigDB: Molecular Signatures Database; OS: overall survival; GEO: Gene Expression Omnibus; G6PD: Glucose-6-phosphate dehydrogenase; EGFR: Epidermal growth factor receptor; EMT: epithelial-mesenchymal transition.


Background
Head and neck squamous cell carcinoma (HNSCC) includes multiple head and neck cancers and is pervasive cancer that sources from the squamous epithelium of the oral cavity, pharynx and larynx.
HNSCC are the sixth in the most common cancers and have approximately 700,000 new patients globally every year [1]. Many HNSCC often attack local part in body with advanced stage and involve lymph nodes. Surgery, radiotherapy and chemotherapy are still the typical treatments. After platinum-based chemotherapy, the one-year survival rate of patients that are not suitable to chemotherapy or keep the progressive situation less than 5% [2]. Additionally, it often leads to short term and long term morbidity with the fty percent in cure rate [3]. The most common etiologies of HNSCC comprise smoking and drinking, and human papillomavirus (HPV) infection. A new agent named cetuximab was approved by FDA to therapy this cancer, which has brought hope to patients. But the application of additive cetuximab in chemotherapy only increases 2.7-month survival and decrease 20% in the relative risk of death [4]. Therefore, to improve the prognosis for these patients, it is an urgent need to identify more targets.
It is great of importance to identify dependable biomarkers which can predict clinical outcomes of patients for helping us to make decisions in treatment and diagnosis. Recently, some studies have been performed to con rm biomarkers in the prognosis of HNSCC. By using bioinformatics, Guorun Fan et al. [5] found that HSPH1, HSPD1, SERPINH1, HSPA4 and HSP90AA1 genes are potential targets and prognostic biomarkers for head and neck cancer. Patients carrying high expression levels of ALKBH5, FTO, METTL14, WTAP, YTHDC1, YTHDF1, and YTHDF2 possessed poor overall survival than patients with low expression [6]. Hao Li et al. [7] suggested that overexpressed IFIT1 and IFITM3 indicated poor prognoses for HNSCC. Non-coding RNA also was observed as biomarkers in HNSCC. Expression levels of miR-let-7a-5p and miR-3928 displayed better sensitivity and speci city in distinguishing HNSCC patients and healthy people [8]. After integrated bioinformatics analysis, Dan Xiong et al. [9] thought that LINC00958 and HOXC13-AS both are new diagnostic biomarkers for HNSCC patients. Nevertheless, the single gene biomarker cannot adequately present good action in prediction. Gene signatures are new means with better effect for predicting prognosis and survival in cancer [10,11].
Emerging ndings illustrated that metabolic reprogramming is an important hallmark in the development of cancer [12]. The approach of glycolysis is the most frequent metabolic reprogramming. Aerobic glycolysis named "Warburg effect" has revealed in various cancers. Utilizing this way, glucose is transformed into lactate as a kind of energy. As a result, cancer cells are surrounded by highly acidic microenvironment. it is believed that glycolysis is conducive to the biological behaviors of cancer cells [13]. The new glycolysis-related gene signature has possible implications for predict cancers, such as endometrial adenocarcinoma, diffuse glioma and breast cancer [14][15][16]. Up to now, the understanding of

Acquisition of clinical data and mRNA expression
The Cancer Genome Atlas (TCGA) (https://portal.gdc.cancer.gov/) is a genomics program with clinical features involving 33 cancer types [17]. We directly download the mRNA data of HNSCC with HTseq-FPKM extension. A total of 520 tumor samples and 49 normal samples from the database was used for the analysis. The following clinical pathological features: age, gender, stage, survival time and survival status were also download.

Gene set enrichment analysis
Gene Set Enrichment Analysis (GSEA) can determine two biological states using a set of genes with statistical signi cance. Molecular Signatures Database (MSigDB) was applied to select the gene sets of glycolysis [18]. GSEA software was installed to analyze the gene sets with the signi cant differences between the healthy group and the tumor group. P < 0.05 was the cutoff to choose gene sets for the next investigation. cBioPortal database analysis cBioPortal (www.cbioportal.org) possessed by the Center for Molecular Oncology at Memorial Sloan Kettering Cancer Center is a public website [19]. Currently, the database is considered a tool to con rm gene mutations because it has been recognized by multiple studies. We used the online cBioPortal to explore mutations of candidate genes in HNSCC.

Statistical analysis
Expression of all mRNA was calculated by the log2 analysis. We used Cox regression to identify the relationship of genes and overall survival (OS) under the P values less than 0.05. Consequently, multivariate Cox proportional hazards regression was used to con rm the candidate genes. Hazard ratio was selected to divide the high risk or low risk according which whether surpasses 1. The risk score module was constructed based on the Cox regression analysis that expression levels times regression coe cients.

Preliminary analysis of genes through GSEA
We successfully used the GSEA to analyze glycolysis-related gene sets in the two groups. Identi cation of glycolysis-related mRNAs Univariate Cox analysis was used to preliminary screening and 43 genes were obtained. Subsequently, multivariate Cox regression analysis evaluated the relation between the expression of 43 genes and the survival of HNSCC. 9 mRNAs including G6PD, EGFR, ALDH2, GPR87, STC2, PDK3, ELF3, STC1 and GNPDA1 were identi ed as independent indicators of prognosis. Among them, G6PD, EGFR, GPR87, STC2, PDK3, STC1 and GNPDA1 act the role of risk because the HR was > 1. ALDH2 and ELF3 are protective genes with the HR was < 1, (Table 1). Using the cBioPortal to assess the alterations in the candidate genes in HNSCC patients. In the patients with HNSCC, they have the 5% G6PD alteration, 13% EGFR alteration, 0.4% ALDH2 alteration, 9% GPR87 alteration, 06% STC2 alteration, 2.8% PDK3 alteration,1.2% ELF3 alteration, 2.8% STC1 alteration and 0.6% GNPDA1 alteration, Figure 3.

Construction of the mRNAs-signature
A prognostic risk score module was constructed by multivariate Cox regression: Risk score=0.002107×expression of G6PD+0.001515×expression of EGFR-0.02187×expression of ALDH2+0.006771×expression of GPR87+0.024477×expression of STC2+0.071147×expression of PDK3-0.00868×expression of ELF3+0.009341×expression of STC1+0.049854×expression of GNPDA1. Based on the risk score equation, we calculated the score of every patient and divided them into the high risk group and the low risk group (Figure 4a). As a result, Figure 4b shown the mortality rates in patients with the high risk score group was elevated than that in low risk score group. The expression situation of the nine mRNAs was shown in the heatmap ( Figure 5). AUC area of the ROC curve was 0.705 which means that the mRNAs signature has good in sensitivity and speci city to predict survival of HNSCC ( Figure 6).
The increasing risk score of patients have the upregulated G6PD, EGFR, GPR87, STC2, PDK3, STC1 and GNPDA1. The decreasing risk score of patients have the downhearted ALDH2 and ELF3.
Risk score from the nine-mRNA signature is an indicator of prognosis Univariate and multivariate analyses were used to analyze clinical pathological parameters for evaluating the value of risk score. In the Figure 7a and 7b, we found that the risk score, stage and age were well independent prognostic indicators due to statistical differences in univariate and multivariate analysis (P<0.05). No matter in univariate analysis or multivariate analysis, the risk score represented nice values in prognosis. Similarly, the analysis of survival time and risk score showed that HNSCC patients with the high-risk score exist in the poorer survival ( Figure 8). These results further con rmed the accuracy of analysis.

Validation of nine mRNA for expression and survival prediction
Hence, we continue to analyze the expression of nine mRNA in tumor samples and controls samples. As showed in Figure 9a-9i, the expression of G6PD, EGFR, ALDH2, GPR87, STC2, PDK3, ELF3, STC1 and GNPDA1 was signi cant differences in tumor samples and control samples (P<0.01). As showed from Figure 10a-10m, regardless of age, gender, G, M, N, T and stage in tumor, the signature is a stable prognostic marker that patients with the high risk score has more poor prognosis in HNSCC.

Discussion
Despite advances in improving the roles of genomic and immunity in HNSCC, to date biomarkers is still lacked. Additional biomarkers in the clinical will be needed to guide the prognosis in this cancer. Recently, studies found some mRNAs could be biomarkers to predict the prognosis for HNSCC and indicated they have the clinical signi cance. For example, Qun Li et al. [20] con rm that levels of P4HA1 mRNA and protein in tissues were obviously higher in HNSCC than that in controls. High expression was related to tumor category, lymphatic metastasis and pathological stage. Xu He et al. [21] downloaded the GSE9716, GSE61772 and GSE20549 from the Gene Expression Omnibus (GEO) database and reported that high POPDC3 expression associated with poor a prognosis for patients with HNSCC. But these genes cannot accurately predict patient survival because a single gene is modulated by many factors. So, gene signature including multiple genes has been the important model to predict outcome in cancers. Results from these signatures re ect more effects and present the more accurate prediction compared to biomarker of a single gene.
Some signatures have been discovered in cancers, a gene glycolysis-related signature consists of HMMR, B4GALT1, SLC16A3, ANGPTL4, EXT1, GPC1, RBCK1, SOD1, and AGRN could be a biomarker which were correlated with metastasis and overall survival in lung adenocarcinoma [22]. Using multivariate analysis, Pankaj Ahluwalia et al. [23] identi ed a novel prognostic signature which has clinical utility in colorectal cancer. In this study, we also conducted the GSEA to analyze the mRNA expression data from TCGA database and nine mRNAs expression levels have the differences. Additionally, Furthermore, in univariate Cox regression and multivariate Cox regression analyses, we identi ed that a signature of the nine genes pose the value to predict the prognosis to HNSCC. Consequently, the risk signature also can con rm the clinical results. Interestingly, the high-risk score was correlated with a poor prognosis. Glucose-6phosphate dehydrogenase (G6PD) is a vital enzyme in preventing cells from oxidative damage. Some studies have found that G6PD play an important role in cancers to in uence its occurrence and development. Chin An Yang et al. [24] suggested that G6PD could be a marker to predict risk, prognosis and chemo-sensitivity of glioma. Epidermal growth factor receptor (EGFR) is a key gene to regulate biological behavior of tumor cells. EGFR-targeted drugs have widely been utilized in therapy of lung and colorectal cancers. Xiaozhen Liu et al. [25] recruited 200 molecular apocrine breast cancer patients and found that EGFR may be helpful to assess the prognosis of the cancer. Expression of ALDH2 were signi cantly associated with poor prognosis in hepatocellular carcinoma patients [26]. We also found the protective role in prognosis of HNSCC. Intravesical recurrence-free survival of patients with positive GPR87 tumors is shorter than those with negative GPR87 tumors [27]. Overexpression of STC2 can suppress cell apoptosis and promote cell proliferation, migration, invasion, and make cell cycle arresting at the G1/S transition in HNSCC [28]. Consistent with this, in this study, the over expression impacted the poor prognosis of HNSCC. Multivariate analysis indicated that high expression of PDK3 acted an independent risk factor in event-free survival and overall survival of acute myeloid leukemia [29]. ELF3 can regulate the cell cycle and epithelial-mesenchymal transition (EMT) in non-small cell lung cancer through PI3K/AKT and ERK signaling pathways and downstream effectors [30]. H M H Abaza et al. [31] reported expression of STC1 gene could be a useful biomarker for predicting clinical outcome and monitoring therapeutic response in acute leukemia patients. GNPDA1 had good effects on CD4 + memory T cells, which were affected by the gemcitabine therapeutic effect in gemcitabine-resistant pancreatic cancer cell [32]. These results are in line with our that these genes served as carcinogenic roles or cancerocidal roles. As far as know, the mRNAs signature can perform better predict the prognosis of HNSCC.
Increased glycolysis is one of metabolic characteristics named Warburg effect. Deregulation or alteration of energy metabolism has been considered an important hallmark of cancer. Gangcai Zhu et al. [33] revealed that hypoxia can promote migration, invasion and glycolysis in HNSCC by HIF-1α-metadherin loop. Decreased expression of GRIM-19 enhanced aerobic glycolysis in HNSCC through DNA hypermethylation [34]. Moreover, high expression of HK2 was tested in head and neck cancerous tissues compared to normal counterparts. HK2-de cient HNSCC cells had greater sensitivity to cisplatin and 5uorouracil [35]. It is crucial that hexokinase is a key enzyme glycolytic prognosis. Hence, metabolism of glycolysis may involve the development and progression of HNSCC. Some single gene involving glycolysis which predict prognosis of HNSCC has been reported in the study, but no report in the glycolysis-related gene signatures. We rst reported and illustrated that the glycolysis-gene signature including G6PD, EGFR, ALDH2, GPR87, STC2, PDK3, ELF3, STC1 and GNPDA1 which has prognostic value for HNSCC.
Over all, this study rstly identi ed that a glycolysis related gene signature (G6PD, EGFR, ALDH2, GPR87, STC2, PDK3, ELF3, STC1 and GNPDA1) can predict the prognosis of patients with HNSCC. This signature could be a new target for the therapy of HNSCC.