DOI: https://doi.org/10.21203/rs.3.rs-2747494/v1
Head and neck squamous cell carcinoma (HNSCC) is the most common malignant tumor of head and neck, which seriously threatens human life and health. However, the mechanism of hypoxia-associated genes(HAGs)in HNSCC remains unelucidated. This study aims to establish a hypoxia-associated gene signature and the nomogram for predicting the prognosis of patients with HNSCC.
Previous literature reports provided a list of HAGs. The TCGA database provided genetic and clinical information on HNSCC patients. First, a hypoxia-associated gene risk model was constructed for predicting overall survival (OS) in HNSCC patients and externally validated in four GEO datasets (GSE27020, GSE41613, GSE42743, and GSE117973). Then, immune status and metabolic pathways were analyzed. A nomogram was constructed and assessed the predictive value. Finally, experimental validation of the core genes was performed by qRT-PCR and immunohistochemistry.
A HNSCC prognostic model was constructed based on 8 HAGs. This risk model was validated in four external datasets and exhibited high predictive value in various clinical subgroups. Significant differences in immune cell infiltration levels and metabolic pathways were found between high and low risk subgroups. The nomogram was highly accurate for predicting OS in HNSCC patients.
The 8 hypoxia-associated gene signature can serve as novel independent prognostic indicators in HNSCC patients. The nomogram combining the risk score and clinical stage enhanced predictive performance in predicting OS compared to the risk model and clinical characteristics alone.
Head and neck squamous cell carcinoma (HNSCC) is the most common malignancy of the head and neck, arising from the mucosal epithelium of the mouth, pharynx, and larynx. Each year, HNSCC is identified in over 870,000 new cases worldwide, killing about 440,000 people[1]. Smoking, alcohol consumption, and exposure to environmental pollutants are all risk factors for HNSCC[2]. Although HNSCC can be treated with surgical resection supplemented with radiotherapy or chemotherapy plus radiotherapy, HNSCC patients' 5-year survival rate keeps low since there are few early diagnoses[3]. Therefore, to develop new therapies and improve patient prognoses, it is essential to identify new predictive biomarkers for HNSCC.
Over the past few years, high-throughput sequencing and RNA sequencing have made it easier to develop molecular markers, which have made individualized treatment and better cancer prognosis possible[4]. Among these, the presence of hypoxia in solid tumors is an intrinsic characteristic and the role of HAGs in cancer is receiving increasing attention[5]. In addition, the role of hypoxia in tumor angiogenesis, cell proliferation, differentiation, and apoptosis has been established[6, 7]. Hypoxia influences the immune microenvironment and is linked to a poor patient prognosis[8]. In HNSCC, hypoxia induces epithelial-mesenchymal transition (EMT) which provides a powerful driver for tumor progression[9] and enhances the proliferation, migration, and invasion of tumors [10, 11]. Ding et al. reported immune cells infiltrated into HNSCC in a variety of risk groups[12].
In this study, public databases were employed to evaluate the mRNA profiles and associated clinical characteristics of HNSCC patients. Univariate and multivariate Cox regression algorithms were used to screen eight genes from The Cancer Genome Atlas (TCGA) database, namely selenium-binding protein 1 (SELENBP1), Cysteine and glycine-rich protein 2 (CSRP2), interferon-stimulated exonuclease gene 20 (ISG20), transforming growth factor beta-induced protein (TGFBI), Stanniocalcin 2 (STC2), Homeobox B9 (HOXB9), Dystrobrevin Alpha (DTNA) and Heparan Sulfate-Glucosamine 3-Sulfotransferase 1 (HS3ST1). Prognostic features were then established in TCGA database and validated using Gene Expression Omnibus (GEO) datasets. Afterward, we explored potential mechanisms of prognosis by exploring the relationship between the risk model and immune status. Finally, the expression levels of core genes in HNSCC were validated by quantitative real-time polymerase chain reaction (qRT-PCR) and immunohistochemistry.
From the TCGA database, RNA sequence (RNA-seq) data and corresponding clinical parameters of 502 HNSCC patients and 44 adjacent normal tissues were obtained on 23 May 2022. HNSCC patients with expression and survival data were screened for analysis, and finally, 499 HNSCC patients were included for subsequent analysis. For external validation purposes, RNA-seq data and clinical parameters of HNSCC patients were obtained from the 4 GEO datasets (GSE27020, GSE41613, GSE42743, and GSE117973).
A total of 200 HAGs were extracted from the previously published study[13]. The "limma" package of R software was performed to distinguish the differentially expressed HAGs between HNSCC and adjacent normal tissues[14]. To explore the biological functions of the hypoxia-associated gene signature, GO and KEGG enrichment analyses were performed by the R-package "Clusterprofiler" (version 4.0)[15, 16].
To establish a prognostic model, the R package "glmnet" was used to perform the univariate Cox regression and LASSO analyses on the differentially expressed genes screened previously. The penalty factor λ was identified by the minimum parameters. Next, a risk prognostic model was developed by multi-factor Cox regression, and the following formula was used to calculate the risk score: Risk Score = Gene1 CoefixExpi + Gene2 CoefixExpi + ...GeneN CoefixExpi (Coef: coefficients, Exp: gene expression levels). 499 HNSCC patients from the TCGA database were classified into low and high risk subgroups based on the median risk score. Subsequently, the overall survival curves of different subgroups were compared by Kaplan-Meier analysis, the overall survival at 1, 3, and 5 years was described by time-dependent receiver operating characteristic (ROC) analysis, and the area under the curve (AUC) was used to access the model's predictive power. Finally, this prognostic gene signature was also demonstrated to have prognostic value in predicting OS in HNSCC patients using the datasets GSE27020, GSE41613, GSE42743, and GSE117973.
The TCGA-HNSCC samples were randomized into two groups to clarify the association between the prognostic model and various clinical characteristics, such as staging, grade, age, gender, T stage, N stage, and M stage. There were two subgroups of patients: stage I/II and III/IV subgroups, grade I/II and III/IV subgroups, age < 60 and age ≥ 60 subgroups, male and female subgroups, T0-T2 and T3/4 subgroups, N0 and N + subgroups, and M0 and M1 + Mx subgroups, respectively. To confirm the 8-gene prognostic features' independent prognostic value, Kaplan-Meier survival analysis was performed on specific subgroups with various clinical characteristics.
Immune differences between the two groups of the TCGA database were synthesized by using computational methods for assessing immune infiltration and function, including ESTIMATE[17], TIMER[18], MCP-counter[19], CIBERSORTx[20], and single-sample gene set enrichment analysis (ssGSEA). A two-sample Wilcoxon test was applied to compare immune infiltration and immune-related functions between the high and low risk groups.
Based on the TCGA HNSCC cohort, a nomogram containing characteristic risk scores and other clinical characteristics was developed to better predict HNSCC prognosis. Using univariate Cox regression analysis, clinical characteristics with significant associations with HNSCC prognosis were screened. Then, the variables screened in the previous step were included in a multivariate Cox regression analysis to search for independent predictive variables for OS in HNSCC patients and to construct the nomogram by statistically significant variables. Finally, the predictive performance of the established nomogram, risk score, and clinical characteristics was compared by using the decision curve analysis (DCA), calibration curves, ROC curves, and consistency index (C-index).
To better understand the biological processes of hypoxia-associated genes, the most relevant genes were identified in the TCGA database (Pearson |R|>0.5, P < 0.05), and functional enrichment analysis was performed using the R-package "Clusterprofiler". Subsequently, gene set variation analysis (GSVA) enrichment analysis[21] and Gene set enrichment analysis (GSEA)[22] were used to screen the HNSCC cohort from TCGA for signaling pathways significantly associated with risk groupings and to screen out significant pathways according to a false discovery rate (FDR) < 0.05.
The Zhongnan Hospital of Wuhan University provided nine pairs of HNSCC and adjacent non-cancerous tissues for this study, all of which were authorized by the ethics committee. Surgically resected patients with HNSCC who did not receive chemotherapy or radiotherapy were recruited for the study. Total RNA was extracted from tissues using TriQuick Reagent (Solarbio, Beijing, China); reverse transcription was carried out using a Prime Script RT kit (TaKaRa, Dalian, China); and quantitative PCR was performed using standard protocols from the SYBR Green PCR kit (Toyobo, Osaka, Japan). The primer sequences for PCR are shown in Additional file 1: Table S1. Candidate genes' relative mRNA levels were normalized to GAPDH mRNA expression, while differences were compared using paired t-tests. IHC was used to further verify the differential expression of core genes. Samples were dewaxed with xylene, rehydrated with ethanol, and repaired by immersion in EDTA (pH 9.0) at high pressure for 20 min. Then, the samples were incubated with 3% H2O2 for 25 min away from light to block endogenous peroxidase activity. After blocking with BSA at room temperature for 30 min, the sections were incubated overnight with monoclonal antibodies against DTNA (1:100, abcam), STC2 (1:200, abcam), and TGFBI (1:200, abcam) at 4°C and then incubated with secondary antibodies (1:300, Servicebio, Wuhan, China). The images were taken with the Nikon E100 microscope (Nikon, Japan).
The Student's t-test and one-way ANOVA were applied to compare continuous variables, while the chi-square test was used to compare categorical variables. Using a log-rank test, Kaplan-Meier survival curves were compared. Hazard ratios (HRs) and 95% confidence intervals (CIs) for genes and clinical parameters associated with hypoxia were calculated using univariate and multivariate Cox regressions. Statistical analyses were performed using R software (v4.0.2). P < 0.05 was considered a significant level.
First, in the TCGA HNSCC database (n = 502) and normal samples (n = 44), the expression of 200 HAGs was compared and then identified and analyzed. The RNA expression levels of 54 identified genes varied among the tumor and the normal groups, which are presented as heatmap and volcano plot (adjusted p < 0.05, Fig. 1A-B). Then, GO enrichment analysis was performed on these 54 HAGs, which identified the main pathways involved in these 54 genes: response to hypoxia/response to oxygen levels/cell chemotaxis/glucose metabolic process (Fig. 1C). Finally, the KEGG enrichment analysis was also performed on these 54 HAGs, and the findings revealed that they were involved in the following signaling pathways: MAPK signaling pathway, HIF-1 signaling pathway, insulin signaling pathway, and focal adhesion (Fig. 1D).
A univariate Cox regression approach was applied to screen HAGs significantly associated with OS in HNSCC. The analysis showed that 49 of 200 HAGs were highly related to OS in HNSCC, and 16 of these 49 genes varied considerably between HNSCC tissues and adjacent paracancerous tissues (Fig. 2A). To further narrow down the gene selection, LASSO analysis was performed on the 16 genes screened in the previous step and selected 12 genes (SERPINE1, HOXB9, SELENBP1, CXCR4, DTNA, ISG20, STC2, HS3ST1, ADM, STC1, CSRP2, and TGFBI) as candidate genes (Fig. 2B-C). Next, multifactorial Cox regression analysis was performed on the 12 candidate genes. Finally, eight genes (HOXB9, SELENBP1, DTNA, ISG20, STC2, HS3ST1, CSRP2, and TGFBI) were screened for the construction of prognostic risk models; among them, SELENBP1, CSRP2, and ISG20 were considered as biomarkers indicative of good prognosis (HR < 1), while TGFBI, STC2, HOXB9, DTNA, and HS3ST1 were indicative of poor prognosis (HR > 1). A risk model of HNSCC based on eight prognostic genes was generated (risk score = (0.199 × HOXB9 exp.) + (-0.238 × SELENBP1 exp.) + (0.359 × DTNA exp.) + (-0.134 × ISG20 exp.) + (0.193 × STC2 exp.) + (0.418 × HS3ST1 exp.) + (-0.159 × CSRP2 exp.) + (-0.094 × TGFBI exp.)). Based on this risk model, the Kaplan-Meier method was applied to further investigate patients’ survival in HNSCC. 499 patients with HNSCC were distributed equally into two subgroups based on median risk scores, compared with the high-risk subgroup, those in the low-risk subgroup lived longer (Fig. 2D-E). Then, the model's capability of predicting future HNSCC risk was evaluated using a time-dependent ROC analysis. The risk model predicted 1-year, 3-year, and 5-year survival with AUC values of 0.660, 0.714, and 0.672, respectively (Fig. 2F).
The 8-gene prognostic risk model was validated in four GEO datasets (GSE27020, GSE41613, GSE42743, and GSE117973). According to the same risk score construction method in TCGA database, the four GEO datasets were classified into high and low risk groups based on the median risk scores and evaluated the performance of the risk model. The results of the analysis are shown in Figs. 3and Additional file 2: Figure S1. The AUC for 1-year, 3-year, and 5-year OS was calculated to present the accuracy of the model prediction. In GSE27020, the AUC was 0.739, 0.701, and 0.687 respectively (Fig. 3A); In GSE41613, the AUC was 0.674, 0.733, and 0.680 respectively (Fig. 3B); In GSE42743, the AUC was 0.662, 0.717, and 0.933 respectively (Additional file 2: Figure S1 A); In GSE117973, the AUC was 0.663, 0.748, and 0.736 respectively (Additional file 2: Figure S1 B). Meanwhile, in all four GEO datasets (GSE27020, GSE41613, GSE42743, and GSE117973), patients in the low-risk group survived longer than those in the high-risk group (Fig. 3C-F, Additional file 2: Figure S1 C-F). Taken together, both results in TCGA and the 4 GEO external validation datasets confirmed that the risk prognostic model had a high predictive value.
Additionally, to explore the model's prognostic value in HNSCC patients after stratification based on clinicopathological variables in the TCGA database, patients were classified into subgroups according to their Age, Gender, Grade, M stage, N stage, Radiation therapy, stage, and T stage to plot Kaplan-Meier survival curves. The sample was divided into 16 subgroups: Young (< 60 years) and Old (≥ 60 years), Female and Male, Grade I-II and Grade III-IV, Mstage M0 and Mstage M1 + Mx, Nstage N0 and Nstage N+, Radiation therapy NO and Radiation therapy YES, Stage I-II and Stage III-IV, Tstage T0-T2, and Tstage T3-T4. In each subgroup, the previous thresholds were selected and the patients were further allocated into the low and high groups. OS was considerably shorter in the high-risk group compared to the low-risk group across all subgroups (Additional file 2: Figure S2 A-H, Additional file 2: Figure S3 A-H). The results suggest that the risk model can accurately predict the outcome of HNSCC patients without considering the clinicopathological subgroups. In conclusion, risk characteristics are crucial in determining a patient's prognosis for HNSCC.
To investigate the correlations between the risk scores model and immune status, immune cell infiltration and immune-related function scores were quantified by various algorithms (ESTIMATE, CIBERSORTx, TIMER, MCP counter, and ssGSEA). Figure 4A demonstrates that the low-risk group's Immune Scores were considerably higher than those of the high-risk group, while the ESTIMATE Score and Stromal Score did not differ significantly, indicating a higher level of immunity in the low-risk group (p < 0.01). The variation of tumor microenvironment cells may be the cause of risk score heterogeneity. Then, based on TCGA-HNSCC data, analyses of the difference in immune cell subpopulations revealed significant differences between the two subgroups for scores of immune cells (B cells, T cells, Myeloid dendritic cells, plasma cells, Macrophage M0, T cell CD4 memory resting, T cell follicular helper, Tregs, NK cells resting, dendritic cells resting, Mast cells, iDCs, pDCs, Th2 cells, and TIL) (p < 0.001; Fig. 4B-E). Furthermore, immune function scores (Checkpoint, Cytolytic activity, HLA, Inflammation promoting, T cell co-inhibition, T cell co-stimulation, and Type I IFN response) were significantly different (p < 0.05; Fig. 4F). In summary, these results suggest that risk score may influence the tumor microenvironment, especially immune infiltration.
In univariate Cox regression analyses, risk score (HR = 1.822, 95% CI = 1.549–2.142, p-value < 0.001), Age (HR = 1.378, 95% CI = 1.053–1.804, p-value = 0.0193), Stage (HR = 1.421, 95% CI = 1.187–1.702, p-value < 0.001), Tstage (HR = 1.296, 95% CI = 1.124–1.495, p-value < 0.001), and Nstage (HR = 1.541, 95% CI = 1.303–1.822, p-value < 0.001) were considerably correlated to OS in patients with HNSCC (Fig. 5A). In multivariate Cox regression analysis, characteristic risk score (HR = 1.830, 95% CI = 1.479–2.265, p-value < 0.001), Nstage (HR = 1.480, 95% CI = 1.186–1.847, p-value < 0.001) and Age (HR = 1.449, 95% CI = 1.041–2.016, p-value = 0.0279) were shown to be independent predictors of OS in HNSCC patients (Fig. 5A). Then, the risk score, Age, and N stage were combined to construct a nomogram for predicting OS of 1-year, 3-year, and 5-year in HNSCC patients (Fig. 5B). The performance of the nomogram was evaluated, and it showed superior predictive potential compared to the risk score and other clinical characteristics (Age, Gender, Stage, T stage, Nstage, and Grade) for OS at 1, 3, and 5 years (AUC = 0.704, 0.758 and 0.733 respectively, Fig. 5C). The calibration curves for the nomogram demonstrated good agreement between the actual OS probabilities and the OS probabilities predicted by the nomogram (Fig. 5D). The nomogram had good discrimination compared to the risk score, Age, and Nstage, with a C-index close to 0.7 (Fig. 5E). The DCA curves showed that the nomogram had more net benefit than the risk score, Age, and Nstage (Fig. 5F).
To clarify the signaling pathways related to the hypoxia 8-gene signature, we screened the genes with the highest correlation coefficients with risk scores (Pearson |R| > 0.5, P < 0.05) in the TCGA database and applied functional enrichment analysis to them. A total of 50 negatively and 51 positively correlated genes were screened by correlation analysis, and the correlation heatmap was presented in Fig. 6A. Next, GO enrichment analysis was performed on these 101 genes, which revealed that these genes were mainly involved in signaling pathways including divalent inorganic cation homeostasis, response to insulin stimulus, fibroblast proliferation, positive regulation of hemostasis and positive regulation of coagulation (Fig. 6B). Subsequently, KEGG enrichment analysis showed that they were involved in signaling pathways including Axon guidance, Inflammatory mediator regulation of TRP channels, Rap1 signaling pathway, and EGFR tyrosine kinase inhibitor resistance (Fig. 6C). Additionally, the signaling pathways involved in the hypoxia 8-gene signature were investigated using GSVAs and the results showed that in the high-risk subgroup, the top 5 signaling pathways significantly activated included Glycolysis, Adipogenesis, MYC Targets V1, MTORC1 Signaling, and PI3K-AKT-mTOR Signaling (Fig. 6D). Finally, the signaling pathways involved in the hypoxia 8-gene signature were screened by GSEA enrichment analysis, which showed that the significantly enriched signaling pathways in the high-risk subgroup included: Dilated cardiomyopathy, ECM Receptor interaction, Focal adhesion, Hypertrophic cardiomyopathy and pathways in cancer (Fig. 6E). Significantly enriched signaling pathways in the low-risk subgroup included Alpha-linolenic acid metabolism, Autoimmune thyroid disease, Oxidative phosphorylation, Parkinson disease, and Ribosome (Fig. 6F).
Survival analysis found that high expressions of TGFBI, STC2, HOXB9, DTNA, and HS3ST1 were closely associated with poor prognosis in HNSCC patients. To further determine the expression of these 5 hypoxia-related genes in HNSCC tumor tissues, we collected 9 pairs of tissues from oral squamous cell carcinoma patients for qRT-PCR verification. Three genes (DTNA, STC2, and TGFBI) were differentially expressed between HNSCC tumor tissue and normal tissue, and STC2 and TGFBI were significantly up-regulated in tumor tissue, which was consistent with the analysis results (Fig. 7A-C). Consistent with the above qRT-PCR results, immunohistochemical staining of clinical HNSCC specimens showed high expression levels of STC2 and TGFBI in tumor tissues (Fig. 7D).
HNSCC is characterized by late diagnosis, easy metastasis, relapse, and resistance to treatments. The five-year survival rate of patients is very low, which seriously endangers the health of patients [23]. To improve HNSCC treatment effectiveness, it is critical to find new molecular biomarkers. HNSCC progression has been revealed to be significantly associated with hypoxia in growing number of studies, however, the prognostic value of hypoxia in HNSCC remains unclear.
In this study, an eight hypoxia-associated gene signature (SELENBP1, CSRP2, ISG20, TGFBI, STC2, HOXB9, DTNA, and HS3ST1) that can predict patients' prognoses for HNSCC was established in the TCGA database and externally validated in four GEO datasets. At the same time, the risk factors of poor prognosis of HNSCC (TGFBI, STC2, HOXB9, DTNA, and HS3ST1) were verified by qRT-PCR and immunohistochemistry.
As a member of the family of selenium-binding proteins, SELENBP1 forms a covalent bond with selenium and is widely expressed across multiple tissues and cell lines, including the lung, heart, liver, and kidney[24]. Low SELENBP1 expression has been reported to contribute to a poor prognosis in the lung[25], ovarian[26], and colorectal cancers[27]. In our study, down-regulation of SELENBP1 was related to poor prognosis in HNSCC, a finding consistent with Ming et al. who found that SELENBP1 was up-regulated in the low-risk group of HNSCC[28]. CSRP2, a protein containing two LIM structural domains, is involved in tumor cell proliferation, migration, and invasion in breast cancer[29], gastric cancer[30], and lymphocytic leukemia[31]. However, it is controversial what function CSRP2 plays in various tumor types and its exact cellular functions need further elucidation. We found that a low level of CSRP2 expression in HNSCC was associated with poor prognosis, which is consistent with that of Nie et al. who found that CSRP2 was associated with a better prognosis in oral squamous cell carcinoma [32]. In addition, Zhou et al. also reported that high expression of CSRP2 in squamous cell carcinoma of the tongue implied a better prognosis[33]. ISG20 is a 3'-5' exonuclease that can degrade viral RNA in vitro[34]. In our analysis, the downregulation of ISG20 was associated with poor prognosis in HNSCC. This result appears to be inconsistent with the results of Gao et al. who found that in human gliomas, ISG20 suppressed adaptive immune responses and that those with gliomas expressing high ISG20 had a poor prognosis[35]. TGFBI is a secreted extracellular matrix (ECM) protein that is induced by transforming growth factor β (TGFβ). In our study, high TGFBI expression predicted poor OS in HNSCC patients, a result consistent with that of Kisoda et al. They reported that p-EMT-related genes including TGFBI were highly expressed in HNSCC samples compared to normal tissue, and this was linked to a poor prognosis[36]. It is important to note that even though many reports suggest that TGFBI suppresses tumors[37], there have been reports that it promotes tumors as well [38, 39]. The glycoprotein hormone STC2 regulates the balance of calcium and phosphate and is implicated in cancer progression[40]. Under hypoxia conditions, the expression of STC2 is activated[41], which drives the growth, proliferation, and tumorigenesis of tumor cells. The expression of STC2 is associated with the prognosis of hepatocellular carcinoma patients[42]. A study has shown that STC2 promotes the occurrence and development of osteosarcoma by inhibiting CD8 + T cells[43]. In this study, we studied the relationship between STC2 expression level and prognosis of HNSCC patients based on the TCGA database and found that high STC2 expression was associated with poor prognosis of HNSCC. HOXB9 is a HOX gene that, in addition to its key role in embryonic development, is also involved in the regulation of several human cancers[44]. Our finding that HOXB9 is highly expressed in HNSCC tissues with poor prognosis is consistent with the study by Darda et al., who demonstrated that HOXB9 expression is increased both in vitro and in HNSCC tissues[45].
The relationship between DTNA, HS3ST1, and the prognosis of HNSCC has not been fully elucidated. DTNA encodes a scaffolding protein that keeps muscle cells structurally intact. Liu et al. have screened for biomarkers of early-stage colon cancer diagnosis and prognosis by RNA sequencing and identified DTNA as a valuable diagnostic marker for colon adenocarcinoma[46]. HS3ST1 is a rate-limiting enzyme involved in the biosynthesis of heparan sulfate. Wang et al. found that conditional deletion of HS3ST1 significantly inhibited CRC tumor development in colorectal cancer, and targeting HS3ST1 may be an ideal strategy for CRC immunotherapy[47].
Considering their biological behavior and function in different cancers, as well as the results of this study, it is reasonable to assume that these eight prognostic HAGs can be used as potential biomarkers for the prognostic assessment of HNSCC patients.
Additionally, we combined characteristic risk scores and clinical staging to develop a nomogram. The nomogram calibration curves predicted the OS of HNSCC patients more accurately. Meanwhile, we used ESTIMATE, TIMER, MCP counter, CIBERSORTx, and single sample gene set enrichment analysis (ssGSEA) to assess the immune state between various risk groups, indicating that the tumor microenvironment may influence the prognosis of HNSCC patients.
Admittedly, there are also some limitations in our study. First, the prognostic model we developed was only obtained and processed from the TCGA database. Although we have validated it in four GEO datasets, the reliability and stability of the model cannot be guaranteed in a real-world cohort because of the limited sample size and biological complexity of the study cohort. In future studies aiming to establish prognostic features, not only the mechanism of the feature selection method should be considered, but also the sample size and data distribution.
Admittedly, there are also some limitations in our study. The prognostic model we developed was obtained and processed from the TCGA database and validated in four GEO datasets, and the results were confirmed through experimental verification in tissues from HNSCC patients. However, these tissue samples lack some clinical follow-up information. In the future, large-sample multicenter clinical studies are needed to guarantee the stability of the model.
We have developed a risk model based on 8 HAGs to predict prognosis in HNSCC patients. Additionally, we developed a nomogram combining characteristic risk scores and clinical staging, which showed better predictive performance than the risk score and clinical characteristics. Our findings suggest that the risk model based on HAGs proved valuable in prognostic prediction and clinical decision-making in HNSCC patients.
AUC area under the curve
CI confidence interval
C-index consistency index
DCA decision curve analysis
GEO Gene Expression Omnibus
GSEA Gene set enrichment analysis
GSVA gene set variation analysis
HAG hypoxia-associated gene
HNSCC head and neck squamous cell carcinoma
HR Hazard ratio
OS overall survival
qRT-PCR quantitative real-time polymerase chain reaction
RNA-seq RNA sequence
ROC receiver operating characteristic
ssGSEA single-sample gene set enrichment analysis
Ethical approval and consent to participate
This study was carried out in accordance with Helsinki Declaration and was approved by the Medical Ethics Committee, Zhongnan Hospital of Wuhan University (approve number:2022095K). Informed consent was obtained from all participants in this study.
Consent for publication
Not applicable.
Availability of data and materials
The datasets presented in this study can be found in GEO (https://www.ncbi.nlm.nih.gov/) and TCGA (https://www.genome.gov/Funded-Programs-Projects/Cancer-Genome-Atlas) databases.
Competing interests
The authors declare that they have no competing interests.
Funding
This work was supported by the grants from Hubei province unveiling science and technology projects (No. 2021BEC027), and General program of Health Commission of Hubei Province (No. WJ2023M122).
Authors' Contributions
Jingyi Luo performed the data analysis and drafted the manuscript.
Yuejiao Huang optimized the analysis protocol and completed the experimental part.
Jiahe Wu prepared the figures and contributed towards the study design.
Lin Dai, Mingyou Dong and Bo Cheng revised the figures, designed and supervised the study.
The first author: Jingyi Luo, Yuejiao Huang
The corresponding author: Lin Dai, Mingyou Dong, Bo Cheng
All authors have read and approved the final manuscript.
Acknowledgments
We sincerely appreciate the researchers for providing their Transcriptome data in GEO and TCGA databases, we are truly honored to acknowledge their contributions.