Development and Validation of a Hypoxia-Related Gene Pairs Signature to Predict Overall Survival in Head and Neck Squamous Cell Carcinoma

Objective Head and neck squamous cell carcinoma (HNSCC) are a highly aggressive tumor with an extremely poor prognosis. Thus, we aimed to develop and validate a robust prognostic signature that can estimate the prognosis for HNSCC. Methods Data on gene expressions and clinical were downloaded from TCGA and GEO database. To develop the best prognosis signature, a LASSO Cox Regression model was employed. Time-dependent receiver-operating characteristic (ROC) was used to determine the best cut-off value. Patients were divided into high-risk and low-risk hypoxia groups according to cut-off value. Survival differences were evaluated by log-rank test, while multivariate analysis was performed by a Cox proportional hazards model. Results A 17-HRGPs composed of 24 unique genes was constructed, which was signicantly related to OS. In the TCGA and GEO datasets, patients in the high hypoxia risk group have a poor prognosis (TCGA: P < 0.001, GEO: P < 0.05). After adjusting for other clinicopathological parameters, the 17-HRGP signature was independent prognostic factors in patients with HNSCC (P < 0.05). Functional analysis revealed that mRNA binding, gene silencing by RNA, RNA binding involved in posttranscriptional gene silencing signaling pathway were enriched in the low-risk groups. For this model, C-index was 0.684, which was higher than that of many established risk models. Macrophages M0, Mast cells activated, NK cells resting, and T cells CD4 memory resting, etc. were signicantly higher in the high-risk group, and B cells memory, Plasma cells, T cells follicular helper, T cells gamma delta, and T cells CD8, etc. were signicantly higher in the low-risk group. Conclusion In summary, our study constructed a robust HRGPs signature as molecular markers for predicting the outcome of HNSCC patients.


Introduction
Head and neck squamous cell carcinoma (HNSCC) are the sixth most prevalent malignancy in humans, with an annual worldwide incidence of more than 600,000 [1]. Tobacco use, alcohol consumption, and human papillomavirus (HPV) infection are potential high-risk factors for HNSCC [2]. Patients with these risk factors are more likely to have worse treatment outcomes and lower survival rate [3,4]. In the past few years, treatments for head and neck cancer include surgery, radiation therapy, targeted therapy, and chemotherapy alone or in combination. Unfortunately, the majority of patients are diagnosed with locally advanced stage, and bone and lungs are the most common metastatic sites [5]. Nowadays, research on HNSCC mainly focuses on prevention methods, molecular mechanisms, and new treatment strategies, while there are still shortcomings in prognosis prediction. Therefore, it is particularly important to develop and verify a new prognostic signature.
Recently, several studies based on gene expression signatures have been shown to classify HNSCC patients into different risk groups with different prognosis [6][7][8]. Unfortunately, due to a series of problems such as over tting of small development datasets and absence of su cient validation cohorts, it has not been accepted in routine clinical practice [9]. Because of the technical biases between different platforms and the heterogeneity of biological data, it brings great challenges to data processing [10]. To make full use of these public data, it is necessary to solve the problem of data comparability. Fortunately, the researchers have developed a new algorithm, which was based on the relative ranking of gene expression level to solve the problem of data comparability, and it has been produced robust results in previous studies [11][12][13][14].
Hypoxia is a common characteristic of solid tumors [15]. When cells are deprived of oxygen, they undergo a variety of biological reactions, including the activation of signaling pathways that regulate proliferation, angiogenesis and death. However, cancer cells have adapted to these changes, allowing tumors to survive and even grow under conditions of hypoxia, which is associated with poor prognosis and resistance to radiation therapy [16][17][18]. Therefore, an accurate, noninvasive biomarker is needed to assess hypoxia status of HNSCC. In our research, hypoxia genes that are signi cantly related to prognosis were screened out. We used these prognostic genes to develop a personalized prognostic signature, and veri ed it with an independent dataset.

Materials And Methods
Acquisition of head and neck squamous cell carcinoma dataset and hypoxia-related genes Level three RNA-seq data of 546 HNSCC samples, as well as their clinical follow-up information were acquired from TCGA data portal. GSE41613 (N = 97, Affymetrix Human Genome U133 Plus 2.0 Array) is a set of independent queues from different high-throughput platforms, and it is used as a data set for external veri cation of signatures. According to the annotation information on the platform, the probes were translated into the corresponding gene symbol. Average values would be represented as gene expression if multiple probes corresponded to the same gene. The expression data of cohorts together with the corresponding clinical parameters were downloaded from Gene Expression Omnibus (GEO). We further downloaded 200 hypoxia-related genes from the Molecular Signatures Database (MSigDB version 6.0). Hypoxia-related gene expression matrix was extracted from two public datasets. Hypoxia-related genes with low variation or imbalanced distribution were ltered out (determined by median absolute deviation >0.5). Finally, 95 key hypoxia-related genes were obtained from the two cohorts.
Establishment of prognostic hypoxia-related gene pairs (HRGPs) signature Each IRGP was computed by pairwise comparison the gene expression values in a given sample. In more detail, in a pairwise comparison, the output is 1 if the rst hypoxia gene is higher than the later one; otherwise, the output was 0. HRGPs in the intersection of the two cohorts were included in the initial candidate factors for prognosis prediction. With P<0.05 as the threshold, univariate survival analysis was performed using the Kaplan-Meier estimator method and the log-rank test, and HRGPs that were signi cantly related to overall survival (OS) were screened. It is worth mentioning that, in order to nd the optimal feature, the least absolute shrinkage and selection operator (Lasso) algorithm was used for feature selection (iteration =1000). The HRGPs' signature risk assessment score formula was as follows: risk score =∑ (Coef gene pair-n × expression of gene pair-n ). The Coef in the formula is the regression coe cient of the optimal prognostic gene pair obtained from the lasso regression model. To classify patients into high or low risk groups, the optimal cut off of the HRGP was determined using timedependent receiver operating characteristic (ROC) at 3 years in the TCGA cohort for OS.

Validation of the HRGPs signature
To verify the previously established signature, the risk score of each sample was calculated by the expression of each prognostic HRGPs and its coe cient. Patients in validation cohorts were assigned into high hypoxia and low hypoxia risk group based on the same cut-off values. The independent prognostic factors were determined by both univariate and multivariate Cox proportional hazard models.

Pro ling of in ltrating immune cells
To understand the enrichment of tumor in ltrating immune cells in the different groups, we employed the CIBERSORT algorithm, which is a novel algorithm that can calculate immune cell composition from bulk tissue gene expression pro les. CIBERSORT quanti ed the relative proportion of 22 in ltrating immune cells in each sample, mainly including T cells, B cells, neutrophils, mast cells and dendritic cells, amongst others. More speci cally, standardized gene expression pro les were uploaded to the CIBERSORT portal website (http://cibersort.stanford.edu/).

Gene ontology (GO) and gene set enrichment analysis (GSEA)
GO function enrichment analysis of the prognostic signature was performed using the R package "gPro ler". GSEA was performed with the Bioconductor R package "fgsea" with default parameters. Gene sets (c5.all.v7.1.symbols.gmt) of interest were retrieved from the Molecular Signature Database. An FDR < 0.25 was considered statistically signi cant.

Statistical analysis
All statistical analysis was performed with GraphPad Prism (version 7), R statistical software. Survival analyses were made using survival curves created by the Kaplan-Meier estimate with the Log-rank test. A Cox proportional hazards regression model was utilized to calculate the hazard ratio (HR) and con dence interval (CI) in univariate and multivariate analyses. Survival curves were visualized using the "survminer" package. ROC curves were plotted with the "survivalROC" package in R. Wilcox test were used to compare differences of immune cells abundance among high-and low-risk groups. The graph of the difference was drawn by GraphPad Prism software. Differences were considered signi cant for all statistical tests at P values < 0.05.

Results
Construction and validation of the HRGP signature Among 200 hypoxia-related genes from the Molecular Signatures Database, 95 HRGs were identi ed by two cohorts and 491 HRGPs were constructed. To explore whether IRGPs are related to OS, samples in the TCGA cohort were selected for the univariate Cox regression analysis. After the initial screening, 18 HRGs which were signi cantly related to the OS of patients were re ected in our eyes. Then, we used Lasso penalized Cox regression to construct a prognostic gene model containing 17 HRGPs, which are composed of 24 hypoxia-related genes ( Figure 1, Table 1). We further calculated a risk score for each patient in HNSCC samples with the risk score formula. The ROC curve analyses showed that the best cutoff value was 0.786 (Fig. 2). Then the patients of the TCGA cohort were assigned into high and low risk groups based on the cut-off value. Patients of high-risk group had a substantially worse outcome compared with patients of low-risk group (P < 0.001, Figure 3A). Using the same risk model and cutoff values as the TCGA cohort in the GSE41613 cohort, patients with a high-risk score had a greater mortality rate than those with a low risk score (P = 0.014, Figure 3B). It was worth noting that risk scores had greater prognostic value in the TCGA cohort, because they differ signi cantly in univariate and multivariate analyses, implying that risk scores are independent prognostic indicators for HNSCC ( Figure  4A). Then, we used the independent dataset GSE41613 as a validation cohort, and con rmed that the risk score can still be used as an independent prognostic factor ( Figure 4B).

Immune cells in ltration between different risk groups
Based on the CIBERSORT algorithm, we determined the abundance of 22 immune in ltrating cells in different risk groups. Figure 5 shows the comparison summary of tumor immune in ltrating cell components based on the CIBERSORT algorithm in the high and the low risk group. As expected, most of the immune in ltrating cells, such as regulatory T cells, follicular helper T cells, eosinophils, M0 macrophages, etc., were enriched in different groups. Dendritic cells activated, Eosinophils, Macrophages M0, Mast cells activated, NK cells resting, and T cells CD4 memory resting were signi cantly highly expressed in the high risk group, but B cells memory, Mast cells resting, T cells CD8, T cells regulatory (Tregs), T cells follicular helper, T cells CD4 memory activated , Plasma cells , T cells gamma delta, and NK cells activated were signi cantly higher in the low risk group( Figure 6).
Functional assessment of the HRGP signature GO analysis revealed that hypoxia genes relevant to the signature in the TCGA cohort were mainly involved in regulation of gene expression epigenetic, mRNA binding, gene silencing, gene silencing by RNA, RNA binding involved in posttranscriptional, T cell receptor complex signaling pathways (Figure 7). In order to explore the potential biological processes and pathways that may change signi cantly between different risk groups, we performed GSEA on TCGA dataset. According to the normalized enrichment score, the most signi cant enrichment signal pathway was determined. As showed in Figure  8, Sensory perception of smell, Olfactory receptor activity, Odorant binding and other related pathways in the low risk group were highly enriched. The enrichment of these pathways provides strong evidence for elucidating the molecular mechanism of HRGP signaling affecting HNSCC, thus predicting the prognosis of HNSCC.

Comparison with previously established prognostic signatures
In order to show that our prognostic model is superior to other models, we compared the HRGP prognostic signature with three published signatures [19][20][21], all data are from TCGA. As we thought, the cindex of the prognostic signature in this study was 0.684, and the HRGP prognostic signature was higher than the existing autophagy-related prognostic signature (c-index = 0.637), the immune-related gene prognostic signature (c-index = 0.592 ) and the RNA-binding protein-related gene prognostic signature (cindex = 0.57) have higher clinical application value.

Discussion
Out of all patients with HNSCC, approximately 60% patients diagnosed with local advanced stage (III-IV) [22]. Nevertheless, in patients with the same TNM staging, the clinical outcomes are different, which indicates that the current staging system is insu cient for prognosis. Tumor hypoxia usually develops as a result of the imbalance between tumor growth rate and tumor angiogenesis rate resulting in reduced intratumoral blood perfusion [23,24]. In fact, hypoxia itself can lead to changes in these biological processes of tumors, including changes in transcription pro les and higher metastatic potential [25]. These studies show that hypoxia can be used as a promising therapeutic target [26,27]. In recent years, hypoxia-related gene signatures for predicting prognosis have been developed for the majority of cancer types, such as colorectal cancer [28], bladder cancer [29], prostate cancer [30] and breast cancer [31].
However, it is important to recognize the inherent intra-tumoral heterogeneity and the technical differences brought about by different sequencing platforms. The di culty in establishing a prognostic risk model is to standardize the expression data correctly. We adopt a novel data analysis method that does not consider the technical deviation of different platforms, so it can improve the robustness of the prognostic model [32]. The prognostic signatures constructed in this study do not need to additional processing of the data, because these prognostic signatures were developed by using in-sample relative orderings of genes. In addition, an advantage that could not be ignored in this study was that the established signature and cutoff value were applicable to datasets from different platforms According to reports, many studies have obtained credible results using this approach [33].
A HRGPs signature was developed to predict OS of HNSCC patients in this study, and veri ed it on other platforms. The prognostic signature we established had potential clinical value, and could effectively divide patients into subgroups with different survival outcomes. The hypoxia genes concern in the signature were mainly involved in mRNA binding, gene silencing by RNA, RNA binding involved in posttranscriptional gene silencing signaling pathway. GSEA further showed that the ten pathways obtained from GO enrichment were highly enriched in the low risk group. We combined clinicopathological factors and the signature risk score to perform univariate and multivariate Cox regression analysis, and it was revealed that HRGPs could be an independent factor to predicting prognosis. The signature was subsequently con rmed in independent validation sets. In addition, compared with the existing three prognostic signatures, the HRGPs signatures identi ed in this study performed well, which is another important advantage in our research.
Notably, the prognostic model contains 24 unique hypoxia-related genes. Previous studies have demonstrated that the signature genes identi ed in this study have been identi ed for important roles in numerous types of human cancers. For instance, BHLHE40 can promote breast cancer cell survival and metastasis by regulating the secretion of HBEGF [34]. SERPINE1 promotes the proliferation and invasion of gastric adenocarcinoma cells by regulating EMT process. [35]. Patients with salivary adenoid cystic carcinoma have poor clinical outcomes when PIM1 is overexpressed [36]. Studies have shown that increased expression of DDIT4 may be a poor prognostic factor for acute myeloid leukaemia patients treated with chemotherapy alone [37]. The present study demonstrated that S100A4 protein may be used as a potential biomarker for prognosis in patients with colorectal cancer [38]. The study by Céline Hoffmann et al. found that CSRP2 is a novel direct cytoskeletal target of HIF-1 and promotes hypoxicinduced invasion of breast cancer cells by promoting invasive pseudopodia formation [39]. Moreover, another study showed that CXCR4 nuclear localization plays an important role in promoting the metastasis of renal cell carcinoma by facilitating HIF-1α entry into the nucleus [40]. Collectively, the above evidence indicates that the genes constituting the signature may regulate certain biological processes, and the disorder of its function may be closely related to the prognosis of HNSCC It is increasingly recognized that the tumor microenvironment plays an huge role in cancer initiation and progression [41]. According to the results of previous studies, the immune component imbalance may be responsible for the differences in survival rates between different subgroups as currently de ned by signature [13]. Increased CD8 T cell in ltration is associated with a good prognosis in cervical cancer patients [42]. In ER-positive breast cancer, the presence of M0 macrophages was associated with poor prognosis [43]. Naive B cells has been shown to be related to good prognosis in hepatocellular carcinoma patients [44]. It is worth pointing out that the results of the present study are consistent with the above results. It has been published that hypoxia can induce immunosuppression directly and indirectly [45]. The dysregulation of tumor immune in ltrating cells may explain the difference in prognosis between the subgroups.
Our study has several limitations should be acknowledged, but they are not enough to change the main conclusions. First, the signature was constructed by retrospective analysis of publicly available datasets. Secondly, further experimental research is required to clarify the biological function of tumor hypoxiarelated to HNSCC gene signature.

Conclusion
Taken together, the current study developed a new HRGP prognostic signature, which provided a novel potential strategy for HNSCC treatment. Moreover, this signature revealed the in ltration of immune cells in HNSCC and can assist in predicting and selecting patients most likely to bene t from immunotherapy.    Time-dependent ROC curve for the signature risk score in the TCGA cohort. Hypoxia-related gene pair risk score of 0.786 was used as cut-off to stratify patients into low-and high-risk groups Kaplan-Meier survival analysis of patients in the high-risk and low-risk groups in the TCGA cohort (a), and GSE41613 dataset (b).

Figure 4
Univariate and multivariate analyses identi ed independent prognostic factors for overall survival of head and neck squamous cell carcinoma in the TCGA cohort (a), and validated in GSE41613 dataset (b) Figure 5 22 different immune cells' abundance inferred by CIBERSORT for different risk groups.