Initial screening of genes by GSEA
The mRNA expression data set and clinical information of 501 patients with LUSC were obtained from the TCGA database (Figure 1). We found five glycolysis-related gene sets on the Molecular Signatures Database v7.0, including (1)BIOCARTA_GLYCOLYSIS_ PATHWAY, (2) GO_GLYCOLYTIC_PROCESS, (3) HALLMARK_GLYCOLYSIS, (4) KEGG_GLYCOLYSIS_GLUCONEOGENESIS, (5) REACTOME_ GLYCOLYSIS. We performed GSEA to explore whether the identified gene sets were significant difference between LUSC and normal tissues. We found these 5 gene sets were significantly enriched (Figure 2 and Table 2). Then, we collected 443 genes from 5 gene sets for further analysis.
Identification of glycolysis-related genes associated with patient survival
First, univariate Cox proportional hazard regression analysis was conducted to 443 genes that were significantly enriched in LUSC samples from GSEA. A total of 4 genes were obtained which were significantly correlated to the survival of patients (p<0.01). Next, we performed multivariate Cox regression analysis to further explore the association between the 4 mRNA expression profiles and the overall survival of patients. Finally, 3 genes, including hexokinase domain-containing protein 1 (HKDC1), aldehyde dehydrogenase 7A1 (ALDH7A1), and malate dehydrogenase 1 (MDH1), were included to construct prognostic model. As shown in Table 3, two of the three genes were verified as independent prognostic markers in LUSC. Among three genes, one gene (MDH1) was considered as protective factor owing to 0<HR<1, whereas the remaining two genes (HKDC1 and ALDH7A1) might be prognostic risky factors with their HR>1.
Subsequently, we explored the alterations of three selected genes in 501 LUSC samples using cBioPotral database. The results showed that the rate of alterations in HKDC1, ALDH7A1, and MDH1 genes were 1.9%, 1.1%, and 5%, respectively (Supplementary Figure 1).
The expression level of three genes was conducted between adjacent normal tissues and LUSC tissues. We found that all the three genes were upregulated in LUSC tissues compared with in normal tissues (Figure 3).
Construction of the three-gene signature to predict patient prognosis
To predict patient’s prognosis using glycolysis-related genes expression, a prognostic risk model was developed based on the regression coefficients of multivariate Cox
regression model to weight the expression level of each gene in the three-gene signature:
risk score=0.1598 × expression value of HKDC1 + 0.1571× expression value of ALDH7A1 + (-0.2636) × expression value of MDH1. Owing to 6 of 501 patients lacked the data of survival time, a total of 495 patients were included in the survival analysis. According to the risk score formula, patients were classified into the high-risk (n=247) and the low-risk group (n=248) with a medial value of risk score as a cut-off (Figure 4A). The survival time of each patient was shown in Figure 4B. As shown in Figure 4D, patients in high-risk group had a shorter survival than in low-risk group (p<0.001). The 3-year and 5-year survival rates of patient in high-risk group were 45.4% and 35.0%, respectively. However, the 3-year and 5-year survival rates of low-risk group were 71.9% and 58.1%, respectively. Additionally, a heatmap presented the expression profiles of three mRNAs (Figure 4C). As the risk score increased in the patients with LUSC, the mRNA expression of HKDC1 and ALDH7A1 was obviously upregulated; in contrast, the mRNA expression of MDH1 was downregulated. The area under the receiver operating characteristic (ROC) curve (AUC) for the risk score at 1, 3, and 5-year overall survival were 0.629, 0.665, and 0.636, respectively. (Figure 5).
Risk score from three-gene signature is an independent prognostic indicator
Univariate and multivariate Cox regression analysis were performed to evaluate the independent risk factors in patients with LUSC. Several clinicopathological parameters , including age, gender, clinical stage, T, N, and M, as well as risk score were included. The results showed that only risk score was associated with prognosis in the univariate Cox analysis (HR=2.553, 95%CI: 1.710-3.811, p<0.0001) (Table 4). In the following multivariate Cox analysis, we found age and risk score as independent prognostic indicators (Table 4). These results indicated that the risk score was reliable in predicting the prognosis of patients with LUSC.
Validation of three-gene signature for survival prediction by Kaplan-Meier curve analysis
To further verify the prognostic value of the risk score of the three-gene signature associated with glycolysis, patients with LUSC were stratified by age (≤65 or >65), gender (Female or Male), clinical stage (I+II or III+IV), T (T1+T2 or T3+T4), N (N0 or N1+N2+N3), and M (M0 or M1) (Figure 6). We found no significant difference between high-risk and low-risk in patients with tumor remote metastasis (Figure 6). However, in the subgroup of patients without tumor remote metastasis, the risk score of the three-gene signature was still an independent prognostic indicator. Additionally, regardless of the age, gender, clinical stage, T, N, patients in the high-risk group based on the risk score had a poor prognosis than patients in the low-risk group. These findings demonstrated that the three-gene signature predicts effectively the survival of LUSC patients.