A novel comprehensive immune-related gene signature as a promising survival predictor for head and neck squamous cell carcinoma

Background : Head and neck squamous cell carcinoma (HNSCC), the most frequent subtype of head and neck cancer, continues to have a poor prognosis with no improvement. Growing evidence has demonstrated that the immune system plays a crucial role in the development and progression of HNSCC. The goal of our study was to develop an immune-related signature for accurately predicting the survival of HNSCC patients. Methods: Gene expression profiles were established from a total of 546 HNSCC and normal tissues to establish a training set and 83 HNSCC tissues for a validation set. Differentially expressed prognostic immune genes were identified by univariate Cox regression analysis and a corresponding network of differentially expressed transcription factors (TFs) were identified using Cytoscape. The immunerelated gene signature was established and validated by univariate Cox regression analysis, least absolute shrinkage and selector operation (LASSO), and multivariate Cox regression analyses. In addition, the prognostic value of the immune-related signature was analyzed by survival and Cox regression analysis. Finally, the correlation between the immune-related signature and the immune microenvironment was established. Results: In this study, the TF-mediated network revealed that Foxp3 plays a central role in the regulatory mechanism of most immune genes. A prognostic signature based on 10 immune-related genes, which divided patients into high and low risk groups, was developed and successfully validated using two independent databases. Our prognostic signature was significantly related to worse survival and predicted prognosis in patients with different clinicopathological factors. A nomogram including clinical characteristics was also constructed for accurate prediction. Furthermore, it was determined that our prognostic signature may act as an independent factor for predicting the survival of HNSCC patients. ROC analysis also revealed that our signature had superior predictive value compared with TNM stage. As for the immune microenvironment, our signature showed a positive correlation with activated mast cells and M0 macrophages, a negative correlation with Tregs, and immune checkpoint molecules PD-1 and CLTA-4. Conclusions: Our study established an immune-related gene signature, which not only provides a in this study, and patients who did not die from HNSCC were excluded. The survival time threshold in our study was greater than one month. The overall survival (OS) was defined as the date of the study enrollment to the last follow-up time. genes from the TCGA dataset, which consists of data from 502 cases of HNSCC and 44 normal tissues. Using GO and KEGG enrichment analysis, we found that the immune-related genes are primarily associated with immune response, cancer, and drug resistance pathways (i.e. MAPK signaling pathway, EGFR tyrosine kinase inhibitor resistance, Ras signaling pathway and endocrine resistance). Similar studies have demonstrated that the MAPK signaling pathway participates in cancer progression, including proliferation, apoptosis and immune escape, and it is fundamental to cancer control[27].

promising biomarker for survival prediction, but may be evaluated as an indicator for personalized immunotherapy in patients with HNSCC.

Background
Head and neck squamous cell carcinoma (HNSCC) is the sixth most common malignant tumor.
Worldwide, approximately 830,000 patients suffer from head and neck cancer and about 430,000 people die from this disease annually [1]. Approximately 75% of these cases are attributable to tobacco smoking and alcohol abuse, which are the two major risk factors for head and neck cancer [2].
Human papillomavirus (HPV) infection is also a significant etiological factor for HNSCC [3]. However, despite advances in surgery, radiotherapy and chemotherapy for the treatment of HNSCC, the 5-year overall survival rate is only 40%-50% [4]. In addition, the survival rate of advanced cancer patients is only 34.9%, primarily due to metastasis and recurrence [5]. Traditionally, TNM stage, which consists of T classification (T), lymph-node metastases (N), and distant metastases (M), is the most important prognostic indicator for predicting postoperative outcome of HNSCC [6]. Nevertheless, the limitation to TNM staging in evaluating patient prognosis is gradually emerging. For example, patients with the same TNM stage and treatment regimen have different clinical outcomes and TNM stage does not predict response to therapy [7]. Therefore, it is necessary to identify molecular biomarkers that can accurately predict patient prognosis and to stratify high-risk HNSCC patients for more intensive treatment regimens.
It is well known that immune escape is an important mechanism of tumor development. An increasing body of evidence suggests that the immune system plays a vital role in patient outcome and tumor molecular profiles may be useful for predicting clinical outcome, as well as identifying therapeutic targets [8,9]. In recent years, cytotoxic T-lymphocyte-associated protein 4 (CTLA-4) and programmed cell death protein-1 (PD-1)/programmed cell death-ligand 1 (PD-L1) were found to be important immune checkpoint components, that enable tumors to escape immune surveillance [10,11]. As a result of intense study of the tumor immune-related mechanism, the emergence of checkpoint immunosuppressive therapy has completely changed treatment regimens for many advanced malignant tumors including HNSCC. Checkpoint blockade immunotherapeutics, such as nivolumab and pembrolizumab, are primarily used in the patients with recurrent or metastatic disease, but the observed objective response rates are in the range of only 16-25% [12,13]. Increasing evidence has also revealed that a prognostic signature containing several to dozens of genes have laid a foundation for predicting the survival of HNSCC [14][15][16]. To our knowledge, there is no immune-related prognostic signature which can not only predict survival, but may be associated with the immune checkpoint.
In this study, we obtained differentially expressed immune-related genes and its function enrichment of HNSCC through a bioinformatics analysis of the The Cancer Genome Atlas (TCGA) database. Next, we identified differentially expressed transcription factors (TFs) that were significantly associated with immune-related prognosis genes. Subsequently, an immune-related prognostic signature was constructed with the TCGA database and further validated for its prognostic value using the Group on Earth Observations (GEO) database. Moreover, the relationship between our signature, infiltrating immune cells, and immune checkpoints was determined. Finally, the mutation characteristics and Gene Set Enrichment Analysis (GSEA) of our gene signature were established. Our goal is to provide a novel molecular biomarker that more effectively predict prognosis and are strongly associated with the immune microenvironment in HNSCC patients.

Materials And Methods Study samples
The training dataset was acquired from the TCGA data portal (https:// portal.gdc.cancer.gov/cart; up to March 22, 2020), and consisted of processed RNA-Seq FPKM data for HNSCC tumors (n = 502) and adjacent normal tissues (n = 44). The corresponding clinical data, including age, gender, smoking, HPV status, alcohol abuse, differentiation grade, clinical TNM stage, T stage, N stage, M stage, recurrence, and survival information were also retrieved from the database. Next, data from 83 cases (GSE41613) were downloaded from the GEO database (https://www.ncbi.nlm.nih.gov/geo/) along with clinical data and follow-up time as a validation set. Only patients with complete clinical and expression data available at that time were included in this study, and patients who did not die from HNSCC were excluded. The survival time threshold in our study was greater than one month. The overall survival (OS) was defined as the date of the study enrollment to the last follow-up time.

Differentially Expressed Genes Of Irgs And Tfs
We obtained immune-related genes (IRGs) from the Immunology Database and Analysis Portal (ImmPort) (https://www.immport.org/) and TFs data were downloaded from the Cistrome Cancer resource (http://cistrome.org/CistromeCancer/CancerTarget/). To establish a training set, we used the R package Limma to identify differentially expressed genes (DEGs) for IRGs and TFs from HNSCC and normal tissues, where FDR < 0.05 and |log(FC)| ≥ 1 were set as the screening criteria [17]. Heat maps and volcano plots were also generated with R software. Furthermore, we assessed the biologic functions of differentially expressed immune-related genes using the Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway databases. Enrichment analysis was done with the Cluster Profiler package [18] in R and functional categories with p < 0.05 were shown.

Regulatory Tf Networks
To further investigate the relationship between differentially expressed immune-related genes and HNSCC prognosis, we used the univariate Cox proportional hazard model. Genes with a p-value < 0.05 were selected for further analysis and genes with a hazard ratio (HR) value > 1 were defined as highrisk IRGs, whereas the remainder were considered low-risk. Finally, the association between the above prognosis immune-related genes and differentially expressed TFs was analyzed by Pearson's correlation test. The cut-off criteria included correlation coefficients > 0.5 and p-values < 0.05, which were determined by the cor.test function in R (Additional file 1: Table S1). To more clearly express the relationships, we used Cytoscape for constructing and visualizing the regulatory network [19].

Construction And Validation Of An Immunerelated Signature
First, to normalize the gene expression values in the training and validation datasets, gene expression values lower than the median were defined as 0, otherwise a value of 1 was assigned. Second, to construct an immunerelated prognostic signature, we selected genes with a p-value < 0.01 by univariate cox analysis. We then used LASSO Cox regression and multivariate Cox regression model to assess the relationship between prognostic immune-related gene expression and OS in the training set using the survival and glmnet R packages. The smallest parameter model for the immunerelated prognostic signature was constructed with a 10-fold cross-validation and used one standard error of the best penalty parameter λ to prevent overfitting [20]. Finally, the risk score for each patient was calculated by gene expression and its corresponding coefficients from the multivariate Cox regression analysis. Patients were then divided into high and low risk groups based on median risk score. To validate the immunerelated prognostic signature, we used the same formula as the training set to calculate each patient risk score followed by classification into high and low risk groups.

Mutation Characteristics and Gene Set Enrichment Analysis of the immunerelated signature
The mutation characteristics of our immune-related signature in all HNSCC patients from the TCGA dataset were obtained using cBioPortal (http://www.cbioportal.org/). GSEA was performed to identify the pathways that were significantly enriched between high risk groups based in the immune-related signature. An FDR < 0.25 and nominal p < 0.05 were used as the screening criteria to identify significant gene sets by the GESA software.

Statistical Analysis
Kaplan-Meier analysis was done to compare the OS between high and low-risk groups using the logrank test. Meanwhile, a receiver operating characteristic (ROC) curve was used to evaluate the accuracy of the immune-related gene signature. The clinicopathological characteristics on OS was determined by univariate and multivariate analyses on the basis of the Cox proportional hazards model for both training and validation sets. Furthermore, the correlation between immune-related signature and tumor-infltrating immune cells and immune checkpoint molecules was evaluated by Pearson's correlation test. All analyses were conducted using R software (version 3.6.2) and the results were considered significant when corresponding p-values < 0.05.

Differentially Expressed Genes And Functional Enrichment Analysis
For the training set, a total of 400 IRGs (305 upregulated and 95 downregulated) and 63 TFs (46 upregulated and 17 downregulated), which were differentially expressed in HNSCC tissues compared with normal tissues, were identified and presented in heat maps and volcano plots (Additional file 3:

Regulatory Network Of Tfs
We evaluated the association of immune-related genes and overall survival using a log-rank test with a univariate Cox analysis. We identified 51 genes with a p-value < 0.05 for analysis with differentially expressed TFs using a Pearson's correlation test. As a result, correlation coefficients > 0.5 with p < 0.05 were used to establish a network, which was done using Cytoscape software (Fig. 2, A). As shown in the network, we found that Foxp3 occupied a major position and positively regulated most of the low-risk IRGs including LTA, CXCR4, CXCR3, IL21R, CD247, ZAP70, SH2D1A, ICOS, and CTLA4. This suggests that Foxp3 may play an important role in the immune regulation of HNSCC.
Construction And Validation Of The 10 Immune-related Gene Signature The 24 prognostic immune-related genes, which were obtained from the univariate Cox analysis with P value < 0.01 (Fig. 2, B), were subjected to Lasso analysis and 18 genes were eliminated (Fig. 3, A-B).
The multivariate Cox regression model was then applied to select the final gene set. As a result, 10 immue-related genes were filtered to establish a prognostic model. The formula for the prognostic model was as follows: risk score = (-0.319322778 * DEFB1 status) + (-0.417843642 * EDNRB status) A 10-gene Immune-related Signature Is An Independent Prognostic Factor To determine whether the 10-gene immune-related signature was an independent prognostic factor for patient survival in the training and validation sets, univariate and multivariate Cox models were established ( Table 1). The results of the univariate Cox analysis showed that gender, smoking, TNM stage, T stage, N stage, recurrence and risk score were the factors associated with patient prognosis in the training set. TNM stage and risk score were the factors associated with patient prognosis in the validation set. In addition, our immune-related signature was an independent prognostic risk factor in both the training set (461 cases: HR = 2.08, 95% CI 1.85-3.56, P < 0.001) and the validation set (83 cases: HR = 9.11, 95% CI 3.21-25.82, P < 0.001) by multivariate analysis. The association of our signature with clinicopathologic factors was carried out and the results indicated that only T stage, TNM stage, recurrence, and HPV exhibited statistical significance (p < 0.05, Fig. 4).  showed consistency between predicted and actual survival in the training set (Fig. 5, D-E). A nomogram with clinicopathologic factors and risk score was established to quantitatively predict the prognosis of HNSCC patients. It revealed that our prognostic signature was a key factor for predicting the survival, while the clinicopathological characteristics showed an inferior impact (Fig. 5, F).
We further determined whether our 10-gene immune-related signature could predict the survival of patient subgroups. We stratified patients on the basis of clinicopathological factors including age, gender, smoking, alcohol, HPV, TNM stage, recurrence, and M0 stage. The results revealed that our signature might be an independent and significant prognostic predictor for clinical outcome in patient subgroups (Fig. 6). The patients with M1 stage disease or non-smoking status were not included because of the small number of cases.
Association between immnue-related gene signature, immune cell infiltration, and immune checkpoint molecules

Genetic Alterations And Gsea Analysis In The High-risk Groups
The genomic alterations of our 10 immune-related genes in each patient were analyzed using the cBioPortal tool (Fig. 8, A). The genetic alteration percentages ranged from 0.2-3%, and likely have little influence on mRNA expression. The 10 immune-related genes were altered in 62 (13%) of the 496 patients. DEFB1 and GNRH1 were primarily affected by deep deletion, while the CTLA-4, DKK1 and EDNRB were frequently amplified. The pathways enriched in the high-risk group of the training set were analyzed by GSEA. As a result, there were six pathways that were significantly enriched in the high-risk patients ( Table 2, Fig. 8, B).

Discussion
Head and neck squamous cell carcinoma is considered to be a heterogeneous disease and its biological behavior is frequently aggressive. The high mortality rate (40-50%) observed for HNSCC is primarily due to the frequent recurrence of advanced tumors [21]. There is a significant need to improve the survival rate as TNM stage is no longer an accurate prognostic indicator. Therefore, it is crucial to identify new markers that predict clinical outcome, achieve personalized approaches to therapy, and establish early intervention treatments. To date, many studies have tried to establish prognostic signatures, including gene sets [22], miRNAs [23], lncRNAs [24] and methylation analyses [25], as promising predictors of prognosis for HNSCC. In recent years, the immune system has been recognized as playing an important role in cancer development and progression [26].
Nevertheless, the contribution of immune-related molecular mechanisms to HNSCC remain unclear.
In the present study, we screened 400 differentially expressed immune-related genes from the TCGA dataset, which consists of data from 502 cases of HNSCC and 44 normal tissues. Using GO and KEGG enrichment analysis, we found that the immune-related genes are primarily associated with immune response, cancer, and drug resistance pathways (i.e. MAPK signaling pathway, EGFR tyrosine kinase inhibitor resistance, Ras signaling pathway and endocrine resistance). Similar studies have demonstrated that the MAPK signaling pathway participates in cancer progression, including proliferation, apoptosis and immune escape, and it is fundamental to cancer control [27].
Transcription factors regulate gene expression and their dysregulation or mutation is well known to contribute to tumorigenesis [28]. Foxp3, a member of the forkhead transcription factor family, is one of the key transcription factors that controls the development and function of Treg cells [29]. An analysis of the TF-mediated network was done to reveal the regulatory mechanisms of prognostic immune-related genes. The results indicated that Foxp3 was a crucial TF that upregulated most of the low-risk prognostic immune-related genes. This suggests that Foxp3 may be a key factor in the immune regulatory mechanism of HNSCC. Foxp3 may also control the immune microenvironment by regulating the expression of genes that contribute to the immunotherapy of HNSCC. Foxp3 and CTLA4 were also determined to be positively correlated in our study, consistent with that of previous studies [30,31].
It has been demonstrated that immune gene signatures can predict prognosis in many solid tumors including ovarian cancer [32], clear cell renal cell carcinoma, cervical cancer [33], lung adenocarcinoma [35], and hepatocellular carcinoma [36]. Recently, a 27-gene immune-related prognostic model was generated to predict the prognosis of HNSCC, but it did not analyze the relationship of immune checkpoint and was not clinically applicable as too much genes were measured. In the present study, we developed a prognostic signature based on 10 immune-related genes from TCGA datasets and validated them using GEO datasets. The patients in the high risk group were considered to have short survival times in both datasets, in accordance with previous studies [15]. ROC analysis indicated that our immune signature exhibited better sensitivity and specificity for survival prediction at 2-, 3-and 5-years, even exceeding the predictive ability of TNM stage. Moreover, multivariate Cox analysis indicated that the 10-gene immune-related signature is an independent prognosis risk factor for HNSCC. Patients exhibiting recurrence had higher risk scores compared with patients without recurrence, suggesting that the prognostic signature had a broader predictive value for recurrence. A previous study demonstrated that HPV-positive HNSCC patients have improved survival compared with HPV-negative patients [37]. This is consistent with our findings that HPV-positive patients had a lower risk score. We further predicted survival time of patients with different clinical factors based on our 10 immune-related gene signature. This demonstrates that our signature can act as accurately and strongly biomarkers for predicting prognosis in HNSCC patients with various clinicopathologic factors.
Other studies corroborate that our 10 immune-related gene signature is closely related to the development of cancer. For example, DEFB1 (encoding human ß-defensin-1 [HBD-1]), is a potential tumor suppressor which has been shown to participate in the innate immune response and can suppress tumor migration and invasion in oral squamous cell carcinoma [38]. EDNRB promoter methylation, which is associated with the histologic diagnosis of premalignancy and the presence of malignancy, may be a promising marker for the early detection of premalignant lesions in oral cavity cancer [39]. The ADM gene, which plays a role in carcinogenesis by regulating cellular processes including proliferation, differentiation, migration, growth, immunosuppression and hypoxia, increases lymph node metastasis risk in oral and oropharynx cancer [40]. Increased BTC mRNA expression is associated with worse survival in HNSCC [41]. Dkk-1, a tumor suppressor gene, contributes to better prognosis for HNSCC patients upon loss of its locus [42]. Overexpression of FAM3D-AS1 was demonstrated to inhibit cell proliferation, invasion, EMT, and cell survival rate in colorectal cancer [43].
GNRH1 has been shown to promote cell proliferation and apoptosis in ovarian and endometrial carcinoma [44]. The downregulation of STC2 plays a vital role in the metastasis and progression of HNSCC [44]. TNFRSF12A contributes to carcinogenesis by promoting angiogenesis, proliferation, apoptosis, migration and inflammation in tumors, can cause cachexia, and is a promising therapeutic target to prolong survival [46]. CTLA-4, a checkpoint for tumor immunotherapy, can induce T cells to be nonreactive and participate in the repression of T cell proliferation, cell cycle progression, and the immune response [47].
To further understand the immune-related prognostic signature and immune microenvironment relationship, 22 immune infiltrating cells, derived from CIBERSORT, and 3 immune checkpoint molecules were selected for analysis. In cancer, Tregs contribute to tumor immune escape by suppressing the antitumor response. Some studies had reported that higher Foxp3 + Treg cell infiltration was associated with poorer patient survival in most tumors, but not in HNSCC [48,49]. Our results show that the high level of Tregs infiltration is significantly associated with the low risk score, which is associated with favorable prognosis. Macrophages that infiltrate into the tumor microenvironment may facilitate tumor growth, angiogenesis, invasion, and metastasis, and are associated with poor prognosis in HNSCC [50]. Our study indicated that they are positively associated with risk score and an increase in M0 macrophages portends a poor prognosis.
Recently, the immunocheckpoints involving PD-1, PD-L1, and CTLA-4 represent promising immunotherapy targets for antitumor therapy [51]. PD-1 is a transmembrane protein that is mainly expressed on the surface of T lymphocytes. CTLA-4 is a membrane glycoprotein that is frequently expressed on Tregs. The mechanism of action of CTLA-4 and PD-1 remain controversial. Our study revealed that high CLTA-4 expression was significantly associated with a lower risk score. This may be caused by the upregulation of Tregs in HNSCC, suggesting that CTLA-4 may be involved in some aspect of the antitumor effect. Studies have reported that HNSCC patients exhibiting high PD-L1/PD-1 expression tend to have prolonged survival outcomes and a lower probability of recurrence [52,53]. In addition, our signature was negatively correlated with PD-1, but not PD-L1.
Studies have found that the genetic variation of HBD-1 contributes to lower RNA expression and may be involved in carcinogenesis of oral squamous cell carcinoma [54]. The genetic alterations of our 10 immune-related genes may help explain the aberrant expression of these genes to some extent in tumors, and patients that carry such genetic alterations may be more responsive to immunotherapy.
Meanwhile, our GSEA results indicated that six enriched pathways in the high-risk immune group were significantly correlated with the biological processes in HNSCC progression. For example, the pathway of focal adhesion in HNSCC participates in the development of distant metastasis to lymph nodes [55].
The keratan sulphate in the tumor environment plays a vital role in the promotion or regulation of tumor development [56]. The chondroitin sulfate glycosaminoglycan biosynthesis pathway can promote and regulate tumor progression and metastasis by influencing important biological processes such as cell growth, adhesion, signal transduction, and lipid metabolism [57].
The above results demonstrate that our signature has potential clinical prognostic value and is associated with response regulated by the immune microenvironment. This may provide a potential target for diagnosis and for development of new targeted therapies.
There are several limitations to our study. First of all, our study was performed using public databases and has a retrospective design, so further studies should be done with prospective clinical data sets to validate our results. Second, although we established and verified our model using different gene expression databases, concerns regarding sample bias and model over-fitting still remain.
Furthermore, the biological functions of our immune signature should be further validated in biological experiments.

Conclusion
In conclusion, the 10 immune-related gene signature that we developed for HNSCC patient survival and outcome prediction represents a promising prognostic biomarker compared with TNM stage.    The association between our prognostic signature and HPV, T stage, TNM stage and recurrence. A The HPV-positive patients exhibited a lower risk score compared with HPVnegative patients (p = 3.719 x 10-5). B The immune-related signature risk scores for T3&4 stage HNSCC patients were notably higher compared with that of T1&2 patients (p = 0.006).
C The immune-related signature risk scores for stage III&IV HNSCC patients were notably higher compared with that of stage I&II patients (P = 0.002). D Patients with recurrence of HNSCC have higher risk scores compared with those having no recurrence (p = 0.020).   Correlation between our prognostic signature and that of tumor-infiltrating immune cells and immune checkpoint molecules. A Association between risk score and T regulatory cells (Tregs). B Association between risk score and activated Mast cells. C Association between risk score and M0 macrophages. D Association between risk score and CTLA-4. E Association between risk score and PD-1. F Association between risk score and pd-l1.

Supplementary Files
This is a list of supplementary files associated with this preprint. Click to download. Additionalfile1.doc