Initial Screening Of Genes By Gsea
The mRNA expression data set and clinical information of 501 patients with LUSC were obtained from the TCGA database. We found five glycolysis-related gene sets on the Molecular Signatures Database v7.0, including (1)BIOCARTA_GLYCOLYSIS_ PATHWAY, (2) GO_GLYCOLYTIC_PROCESS, (3) HALLMARK_GLYCOLYSIS, (4) KEGG_GLYCOLYSIS_GLUCONEOGENESIS, (5) REACTOME_ GLYCOLYSIS. We performed GSEA to explore whether the identified gene sets were significant difference between LUSC and normal tissues. We found these 5 gene sets were significantly enriched (Fig. 1 and Table 2). Then, we collected 443 genes from 5 gene sets for further analysis.
Table 2
Gene set enriched in LUSC
Gene sets follow link to MSigDB | Size | NES | NOM p-val | FDR q-val |
BIOCARTA_GLYCOLYSIS_ PATHWAY | 3 | 1.43 | 0.029 | 0.029 |
GO_GLYCOLYTIC_PROCESS | 106 | 2.01 | 0 | 0 |
HALLMARK_GLYCOLYSIS | 200 | 2.29 | 0 | 0 |
KEGG_GLYCOLYSIS_GLUCONEOGENESIS | 62 | 1.56 | 0.028 | 0.028 |
REACTOME_ GLYCOLYSIS | 72 | 2.28 | 0 | 0 |
Identification Of Glycolysis-related Genes Associated With Patient Survival
First, univariate Cox proportional hazard regression analysis was conducted to 443 genes that were significantly enriched in LUSC samples from GSEA. A total of 4 genes were obtained which were significantly correlated to the survival of patients (p < 0.01). Next, we performed multivariate Cox regression analysis to further explore the association between the 4 mRNA expression profiles and the overall survival of patients. Finally, 3 genes, including HKDC1, ALDH7A1, and MDH1, were included to construct prognostic model. As shown in Table 3, two of the three genes were verified as independent prognostic markers in LUSC. Among three genes, one gene (MDH1) was considered as protective factor owing to 0 < HR < 1, whereas the remaining two genes (HKDC1 and ALDH7A1) might be prognostic risky factors with their HR > 1.
Table 3
Details of three genes for constructing the prognostic model
Gene | Ensemble ID | Location | HR (95%CI) | Coefficient | p value |
HKDC1 | ENSG00000156510 | chr10: 69,220,303 − 69,267,559 | 1.1733 | 0.1598 | 0.0446 |
ALDH7A1 | ENSG00000164904 | chr5: 126,531,200 − 126,595,390 | 1.1701 | 0.1571 | 0.0097 |
MDH1 | ENSG00000014641 | chr2: 63,588,609 − 63,607,197 | 0.7682 | -0.2636 | 0.0559 |
Subsequently, we explored the alterations of three selected genes in 501 LUSC samples using cBioPotral database. The results showed that the rate of alterations in HKDC1, ALDH7A1, and MDH1 genes were 1.9%, 1.1%, and 5%, respectively (Supplementary Fig. 1).
The expression level of three genes was conducted between adjacent normal tissues and LUSC tissues. We found that all the three genes were upregulated in LUSC tissues compared with in normal tissues (Fig. 2).
Construction Of The Three-gene Signature To Predict Patient Prognosis
To predict patient’s prognosis using glycolysis-related genes expression, a prognostic risk model was developed based on the regression coefficients of multivariate Cox
regression model to weight the expression level of each gene in the three-gene signature:
risk score = 0.1598 × expression value of HKDC1 + 0.1571 × expression value of ALDH7A1 + (-0.2636) × expression value of MDH1. Owing to 6 of 501 patients lacked the data of survival time, a total of 495 patients were included in the survival analysis. According to the risk score formula, patients were classified into the high-risk (n = 247) and the low-risk group (n = 248) with a medial value of risk score as a cut-off (Fig. 3A). The survival time of each patient was shown in Fig. 3B. As shown in Fig. 3D, patients in high-risk group had a shorter survival than in low-risk group (p < 0.001). The 3-year and 5-year survival rates of patient in high-risk group were 45.4% and 35.0%, respectively. However, the 3-year and 5-year survival rates of low-risk group were 71.9% and 58.1%, respectively. Additionally, a heatmap presented the expression profiles of three mRNAs (Fig. 3C). As the risk score increased in the patients with LUSC, the mRNA expression of HKDC1 and ALDH7A1 was obviously upregulated; in contrast, the mRNA expression of MDH1 was downregulated. The area under the ROC curve (AUC) for the risk score on 3-year overall survival was 0.665 (Fig. 4).
Risk score from three-gene signature is an independent prognostic indicator
Univariate and multivariate Cox regression analysis were performed to evaluate the independent risk factors in patients with LUSC. Several clinicopathological parameters, including age, gender, TNM stage, T stage, N stage, and M stage, as well as risk score were included. The results showed that only risk score was associated with prognosis in the univariate Cox analysis (HR = 2.553, 95%CI: 1.710–3.811, p < 0.0001) (Table 4). In the following multivariate Cox analysis, we found age and risk score as independent prognostic indicators (Table 4). These results indicated that the risk score was reliable in predicting the prognosis of patients with LUSC.
Table 4
Univariate and multivariate Cox regression analysis of clinicopathologic factors and glycolysis-related genes signature for OS
Clinical features | Univariate analysis | Multivariate analysis |
HR | 95%CI of HR | P value | HR | 95%CI of HR | P value |
Age (> 65 vs. <=65) | 1.376 | 1.000-1.892 | 0.050 | 1.489 | 1.078–2.055 | 0.016 |
Gender (Male vs. Female) | 1.360 | 0.944–1.958 | 0.099 | 1.300 | 0.901–1.877 | 0.161 |
TNM stage (III-IV vs. I-II) | 1.392 | 0.974–1.989 | 0.070 | 1.185 | 0.716–1.962 | 0.510 |
T stage (T3 + T4 vs. T1 + T2) | 1.412 | 0.973–2.048 | 0.069 | 1.335 | 0.845–2.110 | 0.216 |
M stage (M1 vs. M0) | 2.432 | 0.898–6.591 | 0.080 | 2.216 | 0.768–6.399 | 0.141 |
N stage (N1 + N2 + N3 vs. N0) | 1.062 | 0.780–1.444 | 0.704 | 1.087 | 0.765–1.545 | 0.642 |
Risk score (high risk vs. low-risk) | 2.553 | 1.710–3.811 | < 0.0001 | 2.663 | 1.790–3.962 | < 0.0001 |
Validation of three-gene signature for survival prediction by Kaplan-Meier curve analysis
To further verify the prognostic value of the risk score of the three-gene signature associated with glycolysis, patients with LUSC were stratified by age (≤ 65 or > 65), gender (Female or Male), TNM stage (I + II or III + IV), T stage (T1 + T2 or T3 + T4), N stage (N0 or N1 + N2 + N3), and M stage (M0 or M1) (Fig. 5). We found no significant difference between high-risk and low-risk in patients with tumor remote metastasis (Fig. 5). However, in the subgroup of patients without tumor remote metastasis, the risk score of the three-gene signature was still an independent prognostic indicator. Additionally, regardless of the age, gender, TNM stage, T stage, N stage, patients in the high-risk group based on the risk score had a poor prognosis than patients in the low-risk group. These findings demonstrated that the three-gene signature predicts effectively the survival of LUSC patients.