Prognostic model of head and neck squamous cell carcinoma based on 12 ferroptosis-related genes

Background: The purpose of our study is establishing a model based on ferroptosis-related genes predicting the prognosis of patients with head and neck squamous cell carcinoma (HNSCC). Methods: In our study, transcriptome and clinical data of HNSCC patients were from The Cancer Genome Atlas, ferroptosis-related genes and pathways were from Ferroptosis Signatures Database. Differentially expressed genes (DEGs) were screened by comparing tumor and adjacent normal tissues. Functional enrichment analysis of DEGs, protein-protein interaction network and gene mutation examination were applied. Univariate Cox regression analysis and least absolute shrinkage and selection operator (LASSO) regression were used to identied DEGs. The model was constructed by multivariate Cox regression analysis and veried by Kaplan-Meier analysis. The relationship between risk scores and other clinical features was also analyzed. Univariate and multivariate Cox analysis was used to veried the independence of our model. The model was evaluated by receiver operating characteristic analysis and calculation of the area under the curve (AUC). A nomogram model based on risk score, age, gender and TNM stages was constructed. Results: We analyzed data including 500 tumor tissues and 44 adjacent normal tissues and 259 ferroptosis-related genes, then obtained 73 DEGs. Univariate Cox regression analysis screened out 16 genes related to overall survival, and LASSO analysis ngered out 12 of them with prognostic value. A risk score model based on these 12 genes was constructed by multivariate Cox regression analysis. According to the median risk score, patients were divided into high-risk group and low-risk group. The survival rate of high-risk group was signicantly lower than that of low-risk group in Kaplan-Meier curve. Risk scores were related to T and grade. Univariate and multivariate Cox analysis showed our model was an independent prognostic factor. The AUC was 0.669. The nomogram showed high accuracy predicting the prognosis of HNSCC patients. Conclusion: Our model based on 12 ferroptosis-related genes performed excellently in predicting the prognosis of HNSCC patients. Ferroptosis-related genes may be promising biomarkers for HNSCC treatment and prognosis.


Introduction
HNSCC is the sixth most common cancer worldwide and has been showing an increase in incidence in recent years [1]. In 2018, more than 500,000 cases of HNSCC were reported worldwide, with 380,000 yearly deaths [2]. HNSCC is a highly heterogeneous disease, with approximately 60 precents of patients being diagnosed in advanced stages [3]. These patients have a poor prognosis and no effective clinical treatments are currently available [4]. A major di culty related to HNSCC is the lack of early diagnostic and predictive biomarkers [5]. Accordingly, biomarkers for HNSCC are urgently needed for an early identi cation, prognosis and treatment of this condition.
Ferroptosis is an iron-dependent and oxidative form of regulated cell death (RCD) [6]. It is induced through the production of reactive oxygen species (ROS) in cascade reactions of free unstable iron atoms with lipid peroxides [7]. An excessive accumulation of ROS is toxic and related to the damage of membrane structures and other cell-damaging reactions including oxidative stress, eventually leading to cell death [8]. Oxidative stress can induce DNA damage, impacts cellular signaling pathways and the development of tumor vessels, all of which contribute to the formation, proliferation and metastasis of cancers [9]. Promoting oxidative stress within cancer cells can lead to their death or apoptosis and, in this way, can be applied in the treatment of cancers [10]. Similarly, ferroptosis can be used in the treatment of cancers due to its capacity to activate oxidative stress. Many types of proteins are involved with processes related to the activation of ferroptosis [11], including iron metabolism and two key inhibitory processes [7]. One of these inhibitory processes involves a reduction in cystine transfer into cells by the cystine/glutamate antiporter system (Xc-) [12]. This process is related to the production of GSH [13], with low or absent levels of GSH inducing oxidative stress. The other inhibitory process involves a reduction in glutathione peroxidase 4 (GPX4) [6], which is related to lipid peroxidation [14]. Ferroptosis-related proteins play an important role in cancers and have been used as tumor markers [15]. As one example, GPX4 serves as a prognostic marker in patients with PDAC [16]. It has been suggested that other ferroptosisrelated proteins may function as potential tumor markers.
Ferroptosis-related proteins have been shown to play an important role in HNSCC, in particular through their effects on GSH. As a key protein involved in ferroptosis, GSH levels are signi cantly altered in HNSCC and this imbalance in GSH may indicate a metastasis of HNSCC. Accordingly, changes in GSH levels have proved to serve as a prognostic factor for the diagnosis of HNSCC in its early stages [17].
Such results suggest that the genes regulating ferroptosis-related proteins can function as prognostic markers of HNSCC. As one example, SLC7A11 which impacts GSH by regulating cystine and affects amino acids in cancer cells, may provide a potential biomarker for HNSCC treatment [18]. Currently, research directed at constructing prognostic models for HNSCC have been based on single ferroptosisrelated genes, such as the model based on CDGSH iron sulfur domain2 (CISD2) [19]. However, levels of the sensitivity and speci city associated with a single tumor marker are low. In contrast, multigene models have the potential of resolving these issues and therefore may prove more effective in identifying cancer stages and prognosis. The value of this approach has been revealed in predicting the prognosis of other cancers, especially in that of the prognosis model for HCC as based on ferroptosis-related genes [20]. Construction of a prognostic model for HNSCC as based on ferroptosis-related genes would be a decidedly warranted endeavor. Therefore, our goal in this study was to establish a prognostic model of HNSCC based on ferroptosis-related genes. Such a model would then afford a new and reliable prognostic marker for the treatment of HNSCC.

Data capturing
The RNA sequencing data for HNSCC patients including 500 tumor tissues and 44 adjacent normal tissues were downloaded from The Cancer Genome Atlas (TCGA; https://cancergenome.nih.gov ) as well as corresponding clinical data. The data of ferroptosis-related genes was downloaded from the FerrDb (http://www.zhounan.org/ferrdb/ ).
The study need not be authorized by ethics committee, since it is of bioinformatics.

Ferroptosis-related DEGs analysis
The differential expressed genes (DEGs) were identi ed by the R package with the criteria of | log2FC | > 0.5 and false discovery rate (FDR) < 0.05. Then, the expression levels of the DEGs in multiple samples were visualized by volcano plot and heatmap.

Functional enrichment analysis
Gene ontology (GO) analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis were performed by "cluster pro ler" R package for the functions of the DEGs. GO enrichment was carried out mainly from the following three levels: cellular components (CC), biological processes (BP), and molecular functions (MF), while KEGG analysis was about metabolic pathways and molecular mechanisms.

Construction of Protein-Protein Interaction (PPI) network
To further explore the relationship and interaction among ferroptosis-related DEGs, a PPI network was built by STRING (https://string-db.org/ ), and Cytoscape software (version 3.6.0) was used for an intuitive understanding of PPI.

Examination of gene mutations
For a further exploring of the connection between HNSCC and mutations of DEGs, gene mutation examination was needed. After downloading the exon data set of TCGA samples and extracting mutation data, 224 samples from TCGA was used to detect the mutation frequency of DEGs and analyze their translational effects. Waterfall chart was drawn to display the result. The analysis of gene mutation was based on Cancer Genomics Research Center (http://www.cbioportal.org/ ).

Signature establishment and veri cation
Univariate Cox regression analysis was used to identi ed the ferroptosis-related genes associated with overall survival (OS) by the "limma" R package. The LASSO algorithm was used to identi ed the ferroptosis-related genes with prognostic values by the "glmnet" R package. The risk score was evaluated by the coe cient of each prognostic ferroptosis-related gene got from multivariate Cox regression analysis. The risk scoring formula was constructed as Risk scores = Risk score = β gene(1) × expression gene(1) + β gene(2) × expression gene(2) + ···+ β gene(n) × expression gene(n). After patients were divided into high-risk and low-risk groups with the median risk score as the critical point, the Kaplan-Meier curve was performed to describe the relationship between survival probability and survival time for high-risk group and low-risk group. The analysis including distribution of risk score, risk-related survival status and prognosis DEGs heatmap were also involved. Then the relationships among the risk score and clinical parameters were analyzed. Univariate Cox regression and multivariate Cox regression were used to identi ed the independent prognostic factors among risk score, age, gender, grade, stage and TNM. ROC analysis and calculated the area under the curve (AUC) were performed for evaluating the predictive performance of risk score.

Nomogram construction
To provide doctors and patients with a quantitative method for individual survival prediction, the prognostic nomogram model was constructed based on the regression algorithm combining age, gender and TNM stage with risk score. The consistency between the predicted results of the model and the actual results was appraised by calibration curves.
Statistical analysis R language was applied in the TCGA database analysis. ROC curve and area under ROC curve were executed by "survival ROC" package in R software. R language (version 3.6.3) was used for all statistical analysis in this study. All statistical tests were bilateral tests, with statistical signi cance of P < 0.05.

Identi cation ferroptosis-related DEGs in HNSCC
The RNA sequencing data and clinical data of HNSCC patients were obtained from TCGA and 259 ferroptosis-related genes were downloaded from FerrDb. 73 DEGs were identi ed between tumor tissues and normal tissues with 61 up-regulated genes and 12 down-regulated genes. (Figs,1A, 1B, 1C) Functional enrichment of ferroptosis-related DEGs GO and KEGG analyses was performed in 73 DEGs to gain a further research about their biological functions and pathway and the rst 10 biological processes of them respectively were summarized. The related biological processes were mainly involved in response to oxidative stress, multicellular organismal homeostasis, cellular response to oxidative stress, carboxylic acid biosynthetic process, organic acid biosynthetic process, reactive oxygen species metabolic process, response to metal ion, peptidyl-serine phosphorylation, peptidyl-serine modi cation and superoxide anion generation. (Fig. 2A) The results of GO analysis about cell components and molecular function were also shown. (Fig. 2B,2C) In addition, the pathways in relation to DEGs were involved in HIF-1 signaling pathway, mTOR signaling pathway, AGE-RAGE signaling pathway in diabetic complications, cysteine and methionine metabolism, ferroptosis, necroptosis, uid shear stress and atherosclerosis, microRNAs in cancer, central carbon metabolism in cancer, pancreatic cancer by KEGG analysis. (Fig. 2.D) The PPI network was constructed for a further exploring about the interactions among 73 DEGs. (Fig. 3A) There were 67 remaining DEGs after removing unconnected nodes. (Fig. 3B) The interactions among some basic genes were shown. (Fig. 3C)

Gene mutation analyses
The gene mutations of the genes were examined for the connection between HNSCC and mutations of 73 ferroptosis-related DEGs and nonsense mutation and missense mutation were the two most common types. There were 10 genes with mutation rate ≥ 5%, among which CDKN2A was the gene with the highest mutation rate. (Fig. 4) Construction and assessment of the ferroptosis related DEGs prognostic signature 16 ferroptosis-related genes related to OS were chosen following the criteria of P value less than 0.05 by univariate Cox regression analysis. (Fig. 5) LASSO analysis was employed to shrink and choose prognosis-related ferroptosis genes to improve the accuracy. 12 genes involved prognosis were further identi ed from the 16 genes selected before by lambda method to construct the prognostic signature of HNSCC. (Fig. 6A, B). The risk score was calculated based on their expression level and associated Cox regression coe cient based on multivariate Cox regression. The patients with HNSCC were divided into high-risk and low-risk groups based on median risk score. In order to predict the clinical outcome of HNSC patients, the distribution of different risk groups (Fig. 7A) and the survival rate of patients (Fig. 7B) were de ned according to the risk scores. The levels of expression of 12 ferroptosis-related genes in both high-risk and low-risk groups were shown in heatmap (Fig. 7C). It was shown by the Kaplan-Meier log-rank test that patients in the high-risk group had a worse OS than those in the low-risk group (Fig. 7D).
Subsequently, the relationships among risk scores and other clinical parameters for the clinical signi cance of risk scores in HNSCC were analyzed. As a result, increasing risk scores were associated with higher T stages (P = 0.007) (Fig. 8A) and progressive tumor grades (P = 0.023) (Fig. 8B). It was con rmed by univariate (Fig. 9A) and multivariate (Fig. 9B) COX regression analysis that the risk score was an independent prognostic predictor for the OS of patients with HNSCC (HR = 1.071, P < 0.001). ROC analysis was used to evaluate the predictive performance of the risk score, and the AUC of risk score was 0.669 (Fig. 9C), which was higher than that of other risk factors such as gender, age, and TNM stage.

Constructing a predictive nomogram
After statistical analysis, ve independent clinical variables were chosen to combine with risk scores to build a comprehensive model for monitoring progression in HNSCC, including age, gender, T stages, M stages and N stages. According to LASSO logistic regression algorithm, it was shown in our nomogram that the survival rate of HNSCC individually was constructed based on these ve clinical indicators and risk scores. (Fig. 10A) Moreover, the nomogram was used to predict the 1-year, 3-year and 5-year OS of patients (n = 447). The calibration curves were used to describe the good curve t between 1-year, 3-year and 5-year progress events and actual observations. (Fig. 10B, C, D)

Discussion
The results of our study provide the rst evidence demonstrating the feasibility of combining ferroptosisrelated genes in a multigene model to predict the prognosis of HNSCC.
To accomplish this goal, 73 DEGs were obtained by identifying differentially expressed ferroptosis-related genes between tumor and normal tissues. To further verify the potential role of ferroptosis-related DEGs, GO and KEGG enrichment analyses were performed. Results of these analyses showed that these genes were abundantly expressed in processes involving oxidative stress, superoxide anion generation, reactive oxygen species metabolic processes, cellular responses to oxidative stress and peptidyl-serine phosphorylation, thus indicating that these 73 DEGs may play an important role in HNSCC. Oxidative stress is induced by excess formation of ROS, including superoxide anion; and, the generation of superoxide anions and ROS metabolic processes are associated with the proliferation, differentiation and growth of HNSCC [21,22]. In addition, oxidative stress, which is associated with smoking [23] and HPV infection [24], also represents a source for HNSCC [25,26]. Peptidyl-serine phosphorylation, like Akt phosphorylation, is sensitive to oxidative stress by inactivating PRAS40 and removing mTORC1 [27]; and, Akt phosphorylation is an indispensable step in the carcinogenesis of HNSCC [28]. This Akt phosphorylation is closely related to the disease progression of HNSCC [29] as can be achieved with many downstream targets of key cellular processes like GSK-3β, CREB1 and TSC2 [30].
Among these 73 DEGs, 12 genes with prognostic value were screened to establish the risk scoring model. Several proteins with increased expression levels were accompanied with poor prognosis, such as SLC7A5, EGFR, CISD2 and ASNS. Levels of SLC7A5 were highly expressed in HNSCC [31] and are closely related to tumor size and grade, effects which constitute the Xc system in ferroptosis [32] and provide amino acids for promotion of cancer cell growth [33]. EGFR, which affects GSH through the phosphorylation of Akt [34], is related to the later stages, tumor size, invasion, decreased survival rates and poor prognosis of HNSCC [35,36]. Further evidence indicating a role for EGFR has been demonstrated in response to treatment with Cetuximab, an EGFR-targeted drug shown to have survival bene ts for metastatic HNSCC disease [4]. CISD2 encodes the NEET protein [37], NAF-1, which is related to tumor growth and metastasis by the iron ion [38]. CISD2 can be used as a molecular biomarker indicating a poor prognosis of HNSCC [19] due to its signi cant relationships with T stage, lymph node metastasis, clinical stage and disease progression of HNSCC. ASNS affects GSH synthesis and participates in tumor growth, with the protein related to ASNS being associated with metastasis and recurrence of HNSCC [39,40]. Based on results from previous studies, these 12 ferroptosis-related genes are closely related to the development and treatment of HNSCC and may thus serve as reliable markers in identifying potential therapeutic targets for HNSCC patients.
Based on our multigene model, signi cant differences in risk scores of head and neck cancer patients are present among different T-stage groups. These results suggests that our model correlates with T staging and can function as a reliable indicator in predicting tumor malignancy. Furthermore, our model has a better ability of predicting prognosis for HNSCC than that of TNM staging, with increased AUC (Area Under Curve) values being obtained with ROC analysis. In addition, our model shows a better prognostic capacity based on AUC values as compared with previously published models based on TCGA HNSCC database analysis [41][42][43]. The enhanced ability of our model in evaluating the prognosis of HNSCC should substantially improve the clinical treatment of this condition.
Two notable limitations present in our study include: 1) the retrospective nature of the study, which will require data from prospective studies to verify the prognostic ability of the model and substantiate our ndings and 2) the bioinformatics analysis performed fails to reveal potential functional mechanisms that may be involved with this model.

Conclusions
In a word, the prediction model we built for the prognosis of HNSCC patients, is based on 12 ferroptosisrelated genes and clinical characters of patients. And it can provide a reliable prediction about OS to patients with HNSCC. It provides new markers for the treatment of HNSCC.

Declarations
Ethics approval and consent to participate Not applicable.

Consent for publication
Not applicable.

Availability of data and material
Data availability could be obtained from the TCGA website.

Competing interests
The authors declare that they have no competing interests.     Gene mutation analyses. A total of 10 genes had the mutation rate more than 5%, and CDKN2A had the highest mutation rate in DEGs, which was 41%.

Figure 5
Identi cation of DEGs about overall survival rate. There were 16 DEGs related to OS in total in the forest map by the univariate Cox regression. Up-regulated genes were red and down-regulated genes were green.    Evaluation of the prediction performance based on the risk scores and clinical features. (A) The relationships among OS and risk scores and clinical features, which involved in age, gender, grade, stage and TNM, were shown in the forest map by the univariate Cox analysis. (B) It was proved that the 12-gene signature was an independent prognostic indicator for HNSCC patients by the multivariate Cox analysis.
(C) AUC of the model was 0.669 shown in the ROC curve.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download. LiTablesSupplementalDataforFig7.exel.xlsx