Identification of differential hypoxia-associated genes in HNSCC
First, in the TCGA HNSCC database (n = 502) and normal samples (n = 44), the expression of 200 HAGs was compared and then identified and analyzed. The RNA expression levels of 54 identified genes varied among the tumor and the normal groups, which are presented as heatmap and volcano plot (adjusted p < 0.05, Fig. 1A-B). Then, GO enrichment analysis was performed on these 54 HAGs, which identified the main pathways involved in these 54 genes: response to hypoxia/response to oxygen levels/cell chemotaxis/glucose metabolic process (Fig. 1C). Finally, the KEGG enrichment analysis was also performed on these 54 HAGs, and the findings revealed that they were involved in the following signaling pathways: MAPK signaling pathway, HIF-1 signaling pathway, insulin signaling pathway, and focal adhesion (Fig. 1D).
Development of prognostic biomarkers for hypoxia-associated genes
A univariate Cox regression approach was applied to screen HAGs significantly associated with OS in HNSCC. The analysis showed that 49 of 200 HAGs were highly related to OS in HNSCC, and 16 of these 49 genes varied considerably between HNSCC tissues and adjacent paracancerous tissues (Fig. 2A). To further narrow down the gene selection, LASSO analysis was performed on the 16 genes screened in the previous step and selected 12 genes (SERPINE1, HOXB9, SELENBP1, CXCR4, DTNA, ISG20, STC2, HS3ST1, ADM, STC1, CSRP2, and TGFBI) as candidate genes (Fig. 2B-C). Next, multifactorial Cox regression analysis was performed on the 12 candidate genes. Finally, eight genes (HOXB9, SELENBP1, DTNA, ISG20, STC2, HS3ST1, CSRP2, and TGFBI) were screened for the construction of prognostic risk models; among them, SELENBP1, CSRP2, and ISG20 were considered as biomarkers indicative of good prognosis (HR < 1), while TGFBI, STC2, HOXB9, DTNA, and HS3ST1 were indicative of poor prognosis (HR > 1). A risk model of HNSCC based on eight prognostic genes was generated (risk score = (0.199 × HOXB9 exp.) + (-0.238 × SELENBP1 exp.) + (0.359 × DTNA exp.) + (-0.134 × ISG20 exp.) + (0.193 × STC2 exp.) + (0.418 × HS3ST1 exp.) + (-0.159 × CSRP2 exp.) + (-0.094 × TGFBI exp.)). Based on this risk model, the Kaplan-Meier method was applied to further investigate patients’ survival in HNSCC. 499 patients with HNSCC were distributed equally into two subgroups based on median risk scores, compared with the high-risk subgroup, those in the low-risk subgroup lived longer (Fig. 2D-E). Then, the model's capability of predicting future HNSCC risk was evaluated using a time-dependent ROC analysis. The risk model predicted 1-year, 3-year, and 5-year survival with AUC values of 0.660, 0.714, and 0.672, respectively (Fig. 2F).
Validation and clinical value of prognostic features
The 8-gene prognostic risk model was validated in four GEO datasets (GSE27020, GSE41613, GSE42743, and GSE117973). According to the same risk score construction method in TCGA database, the four GEO datasets were classified into high and low risk groups based on the median risk scores and evaluated the performance of the risk model. The results of the analysis are shown in Figs. 3and Additional file 2: Figure S1. The AUC for 1-year, 3-year, and 5-year OS was calculated to present the accuracy of the model prediction. In GSE27020, the AUC was 0.739, 0.701, and 0.687 respectively (Fig. 3A); In GSE41613, the AUC was 0.674, 0.733, and 0.680 respectively (Fig. 3B); In GSE42743, the AUC was 0.662, 0.717, and 0.933 respectively (Additional file 2: Figure S1 A); In GSE117973, the AUC was 0.663, 0.748, and 0.736 respectively (Additional file 2: Figure S1 B). Meanwhile, in all four GEO datasets (GSE27020, GSE41613, GSE42743, and GSE117973), patients in the low-risk group survived longer than those in the high-risk group (Fig. 3C-F, Additional file 2: Figure S1 C-F). Taken together, both results in TCGA and the 4 GEO external validation datasets confirmed that the risk prognostic model had a high predictive value.
Association between the prognostic model and clinicopathological characteristics
Additionally, to explore the model's prognostic value in HNSCC patients after stratification based on clinicopathological variables in the TCGA database, patients were classified into subgroups according to their Age, Gender, Grade, M stage, N stage, Radiation therapy, stage, and T stage to plot Kaplan-Meier survival curves. The sample was divided into 16 subgroups: Young (< 60 years) and Old (≥ 60 years), Female and Male, Grade I-II and Grade III-IV, Mstage M0 and Mstage M1 + Mx, Nstage N0 and Nstage N+, Radiation therapy NO and Radiation therapy YES, Stage I-II and Stage III-IV, Tstage T0-T2, and Tstage T3-T4. In each subgroup, the previous thresholds were selected and the patients were further allocated into the low and high groups. OS was considerably shorter in the high-risk group compared to the low-risk group across all subgroups (Additional file 2: Figure S2 A-H, Additional file 2: Figure S3 A-H). The results suggest that the risk model can accurately predict the outcome of HNSCC patients without considering the clinicopathological subgroups. In conclusion, risk characteristics are crucial in determining a patient's prognosis for HNSCC.
Comparison of different immune status between the two risk score-related subgroups
To investigate the correlations between the risk scores model and immune status, immune cell infiltration and immune-related function scores were quantified by various algorithms (ESTIMATE, CIBERSORTx, TIMER, MCP counter, and ssGSEA). Figure 4A demonstrates that the low-risk group's Immune Scores were considerably higher than those of the high-risk group, while the ESTIMATE Score and Stromal Score did not differ significantly, indicating a higher level of immunity in the low-risk group (p < 0.01). The variation of tumor microenvironment cells may be the cause of risk score heterogeneity. Then, based on TCGA-HNSCC data, analyses of the difference in immune cell subpopulations revealed significant differences between the two subgroups for scores of immune cells (B cells, T cells, Myeloid dendritic cells, plasma cells, Macrophage M0, T cell CD4 memory resting, T cell follicular helper, Tregs, NK cells resting, dendritic cells resting, Mast cells, iDCs, pDCs, Th2 cells, and TIL) (p < 0.001; Fig. 4B-E). Furthermore, immune function scores (Checkpoint, Cytolytic activity, HLA, Inflammation promoting, T cell co-inhibition, T cell co-stimulation, and Type I IFN response) were significantly different (p < 0.05; Fig. 4F). In summary, these results suggest that risk score may influence the tumor microenvironment, especially immune infiltration.
Construction of the Prognostic nomogram containing characteristic risk scores and clinical staging
In univariate Cox regression analyses, risk score (HR = 1.822, 95% CI = 1.549–2.142, p-value < 0.001), Age (HR = 1.378, 95% CI = 1.053–1.804, p-value = 0.0193), Stage (HR = 1.421, 95% CI = 1.187–1.702, p-value < 0.001), Tstage (HR = 1.296, 95% CI = 1.124–1.495, p-value < 0.001), and Nstage (HR = 1.541, 95% CI = 1.303–1.822, p-value < 0.001) were considerably correlated to OS in patients with HNSCC (Fig. 5A). In multivariate Cox regression analysis, characteristic risk score (HR = 1.830, 95% CI = 1.479–2.265, p-value < 0.001), Nstage (HR = 1.480, 95% CI = 1.186–1.847, p-value < 0.001) and Age (HR = 1.449, 95% CI = 1.041–2.016, p-value = 0.0279) were shown to be independent predictors of OS in HNSCC patients (Fig. 5A). Then, the risk score, Age, and N stage were combined to construct a nomogram for predicting OS of 1-year, 3-year, and 5-year in HNSCC patients (Fig. 5B). The performance of the nomogram was evaluated, and it showed superior predictive potential compared to the risk score and other clinical characteristics (Age, Gender, Stage, T stage, Nstage, and Grade) for OS at 1, 3, and 5 years (AUC = 0.704, 0.758 and 0.733 respectively, Fig. 5C). The calibration curves for the nomogram demonstrated good agreement between the actual OS probabilities and the OS probabilities predicted by the nomogram (Fig. 5D). The nomogram had good discrimination compared to the risk score, Age, and Nstage, with a C-index close to 0.7 (Fig. 5E). The DCA curves showed that the nomogram had more net benefit than the risk score, Age, and Nstage (Fig. 5F).
Enrichment analysis based on hypoxia 8-gene signature
To clarify the signaling pathways related to the hypoxia 8-gene signature, we screened the genes with the highest correlation coefficients with risk scores (Pearson |R| > 0.5, P < 0.05) in the TCGA database and applied functional enrichment analysis to them. A total of 50 negatively and 51 positively correlated genes were screened by correlation analysis, and the correlation heatmap was presented in Fig. 6A. Next, GO enrichment analysis was performed on these 101 genes, which revealed that these genes were mainly involved in signaling pathways including divalent inorganic cation homeostasis, response to insulin stimulus, fibroblast proliferation, positive regulation of hemostasis and positive regulation of coagulation (Fig. 6B). Subsequently, KEGG enrichment analysis showed that they were involved in signaling pathways including Axon guidance, Inflammatory mediator regulation of TRP channels, Rap1 signaling pathway, and EGFR tyrosine kinase inhibitor resistance (Fig. 6C). Additionally, the signaling pathways involved in the hypoxia 8-gene signature were investigated using GSVAs and the results showed that in the high-risk subgroup, the top 5 signaling pathways significantly activated included Glycolysis, Adipogenesis, MYC Targets V1, MTORC1 Signaling, and PI3K-AKT-mTOR Signaling (Fig. 6D). Finally, the signaling pathways involved in the hypoxia 8-gene signature were screened by GSEA enrichment analysis, which showed that the significantly enriched signaling pathways in the high-risk subgroup included: Dilated cardiomyopathy, ECM Receptor interaction, Focal adhesion, Hypertrophic cardiomyopathy and pathways in cancer (Fig. 6E). Significantly enriched signaling pathways in the low-risk subgroup included Alpha-linolenic acid metabolism, Autoimmune thyroid disease, Oxidative phosphorylation, Parkinson disease, and Ribosome (Fig. 6F).
Core gene expression validation
Survival analysis found that high expressions of TGFBI, STC2, HOXB9, DTNA, and HS3ST1 were closely associated with poor prognosis in HNSCC patients. To further determine the expression of these 5 hypoxia-related genes in HNSCC tumor tissues, we collected 9 pairs of tissues from oral squamous cell carcinoma patients for qRT-PCR verification. Three genes (DTNA, STC2, and TGFBI) were differentially expressed between HNSCC tumor tissue and normal tissue, and STC2 and TGFBI were significantly up-regulated in tumor tissue, which was consistent with the analysis results (Fig. 7A-C). Consistent with the above qRT-PCR results, immunohistochemical staining of clinical HNSCC specimens showed high expression levels of STC2 and TGFBI in tumor tissues (Fig. 7D).