Signatures Based On Tumor Microenvironment Genes Can Reliably Predict The Prognosis of Laryngeal Cancer Patients

Purpose: To develop a tumor microenvironment (TME) related genes based prognostic model for laryngeal cancer patients risk prediction. Methods: A innovative prognosic model was generated based on TME related genes (760 genes). Based on the model, laryngeal cancer patients were categoried into high and low-risk group. The immune status including immune cell inltration ratio as well as checkpoints have been exploreds. Results: It can be shown here 15 genes demonstrate signicant differences, which can better predict laryngeal cancer patient prognosis. The accuracy of the model prediction is evaluated and approved by the AUC value. From the immune cell inltration ratio analysis, there is a signicant difference in the inltration degree of several types of immune cells and 6 immune checkpoints between high and low-risk laryngeal cancer patients. At the same time, the close related genes as well as TME pathways have been also investigated. Conclusion: This study has explored a potential prognostic biomarker and developed a novel TME-associated prognostic model for laryngeal cancer, which provides a valuable reference for future clinical research.


Introduction
The incidence of laryngeal cancer accounts for about 1 to 5% of human systemic tumors [1] . In the eld of otolaryngology, it is second only to nasopharyngeal cancer and nasal cavity as well as sinus cancer [2] .
Laryngeal cancer occurs more commonly in men than in women (5.8 cases per 100,000 vs 1.2 per 100,000, respectively) [3] . The incidence of laryngeal caner has a male to female ratio of 5:1. With the decrease in tobacco use, the incidence of laryngeal cancer has been decreasing 2.4% each year for the last 10 years [4] . However, the 5-year survival rate of 60.9% has changed little over the past several years.
The major risk factor for laryngeal cancer is tobacco [5] . In addition to tobacco, there are still some other factors that can induce the development of laryngeal cancer Laryngopharyngeal re ux, human papillomavirus infection, environmental or occupational exposures, and alcohol are some other major contributors to laryngeal cancer development [6][7][8] . Treatment for laryngeal cancer has evolved from radical resections and intensive radiation or chemoradiation. Despite signi cant advances in early diagnosis and multidisciplinary cancer treatment, the long-term prognosis is still far rom sati cication [9] . Therefore, it is urgent to identify the sensitive and speci c molecular markers in patients with laryngeal cancer.
Recent studies have shown that the tumor microenvironment (TME), composed of a variety of immune cells and stromal cells, is critical in the occurrence and development of cancer [10] . The multiple immune cells, according to the innate and adaptive systems, as well as other cytokines and signaling molecules, contribute to the tumor microenvironment. Although some immune cells possess antitumoral properties, others function to suppress antitumor immunity and are manipulated by the tumor to evade the immune system [11] . Increasing evidence indicates that innate and adaptive immune mediators in the tumor microenvironment play an important role in tumor progression and development of laryngeal carcinomas [12] . However, the relationship between TME-related genes and the prognosis of laryngeal cancer patients has rarely been reported.
Based on the fact of poor accuracy of prognosis, the laryngeal cancer has brought great di culty to clinical treatment. Since TME has attracted more and more attention in the cancer research currently, a more comprehensive approach to build a TME-associated prognostic model for laryngeal cancer is required. To address these issues, in this study, we analyzed the expression of 760 tumor microenvironment-related genes in laryngeal cancer patients and their prognostic value in laryngeal cancer patients. Moreover, the 22 speci c immune cell in ltration as well as 6 immune checkpoints in laryngeal cancer have been also studied comprehensively. All of these promising outcomes enriched the precise prognosis of the disease, which provide tremendous help for future laryngeal cancer study.

Data Source
We downloaded the gene information pro le and corresponding clinical information of 513 head and neck squamous cell carcinoma (HNSC) patients from the database: The Cancer Genome Atlas (TCGA,Https://tcga-data.nci.nih.gov/tcga/). Of which, 180 of these 513 patients were laryngeal cancer patients. Excluding samples with incomplete survival information, 169 laryngeal cancer patients were used for the following analysis.
In addition, we also downloaded a data set from the Gene Expression Omnibus (Geo https://www.ncbi.nlm.nih.gov/GEO/) database, GSE27020, which contains 109 laryngeal cancer patients with complete survival information. Samples in the GEO Dataset were examined using the Affymetrix Human Genome U133A Array platform.

LASSO Cox regression analysis
Based on the expression values of 760 TME related genes in laryngeal carcinoma samples, a single factor Cox regression analysis for prognosis of laryngeal carcinoma was developed, with a threshold of P < 0.01. Then the glmnet package of R language was generated for LASSO Cox regression analysis in order to further select the genes associated with the prognosis of laryngeal carcinoma. The selected genes were established to calculate the risk score for each sample by the following formula: where Coe is the risk coe cient of each factor calculated by the LASSO-Cox model; Xi is the expression value of each factor, in this study refers to the expression value of the gene. Then the patients were divided into high-risk group and low-risk group according to the median of the risk score.

Survival analysis
The survival analysis was performed using the R language survival package and survminer (https://CRAN.R-project.org/package=survminer). The package estimates the overall survival rate of different TCGA groups based on the Kaplan-Meier method, and uses log-rank test to investigate the signi cance of survival difference between different groups. Calculation of immune cell in ltration ratio The software CIBERSORT was utilized to calculate the relative proportion of 22 immune cells in HCC patients [13] . The CICERSORT software could use the deconvolution algorithm with the preset 547 barcodes to characterize the composition of immune in ltrating cells according to the gene expression matrix. The sum of the proportions of all estimated immune cell types in each sample is equal to 1.

Establishment of Nomogram prognostic prediction model
Nomogram is widely used to predict the prognosis of cancer. In order to predict the survival probability of patients at 1 year, 3 years as well as 5 years, we preformed the R language RMS package to establish the prediction model based on all independent prognostic factors determined by multi-factor Cox regression analysis nomogram, draw the calibration curve of nomogram, and analyze the relationship between the predicted probability of Nomogram and the actual incidence rate. Differential gene analysis The differentially expressed genes analysis were based on the limma function package [14] of the R language (version3.5.2), with the difference factor greater than 1.5 times and FDR ≤ 0.05 as a criteria. Functional enrichment analysis For the obtained differentially expressed genes, we use the "clusterPro ler" [15] function package in the R language to perform Gene Ontology (GO, including Biological Process, Molecular Function and Cellular Component) terms and Kyoto Encyclopedia of Genes and Enrichment analysis of Genomes (KEGG) pathways. The signi cantly enriched GO term and KEGG pathway were further investigated using the Pvalue < 0.05 corrected by the "BH" method as the standard.

Statistical analysis
The multi-factor Cox regression model was built to analyze whether risk score could predict the survival of patients with laryngeal carcinoma independently of all other factors. The Wilcoxon signed rank sum test method was used to compare the differences of immune cell in ltration in different groups, with P < 0.05 as the signi cant threshold. The statistical analysis was established by R software, with version number v3.5.2.

Results
Construction and veri cation of the prognosis model The single-factor cox regression analysis was developed by the TCGA data set samples with 760 TMErelated gene expression values as continuous variables. The P -value < 0.01 was used as a threshold. 50 genes were nally screent out. Protective genes with HR value less than 1 are favorable for prognosis, while risk genes with HR value greater than 1 are unfavorable for prognosis. 2 of the 50 genes are protective genes, and the remaining 48 genes genes are risk genes. The forest diagram of the top 20 genes with the smallest P-value among the 50 genes is shown in Fig. 1A.
After that, the 50 selected genes were subjected to LASSO Cox regression analysis. According to the lambda values of the LASSO Cox analysis, we have determined that the optimal number of genes is 15 (Fig. 1B, whereas the lambda value is the smallest). The 15 genes are BCAT1, RANBP1, CORO2B, RASIP1, EZH2, TCEAL4, FST, CD63, GEMIN7, SPOCK1, S100A11, CXCL3, RUVBL2, ETV5 and EID1. Then the gene expression was weighted with the regression coe cients of the LASSO Cox regression analysis to establish a risk score model for predicting patient survival, according to Risk Score = (0.1459482*GABARAPL2)+(0.1549568*SAR1A)+(0.1554117*ST13)+(0.1892597 *GAPDH)+ (0.1121593*FADD)+(0.0857282*LAMP1). We calculated the risk score of each patient, and divided the samples of the TCGA data set and GEO data set into high-risk groups and low-risk groups according to the median of the risk scores. Based on the survival analysis, we found that in the TCGA data set and GEO validation set, high-risk laryngeal cancer samples have poor overall survival compared with lowerrisk group (Fig. 1C).
In addition, it can be demonstrated from the time-dependent ROC that the AUC of the 1-year, 3-year, and 5year survival period of the TCGA data set are 0.7478, 0.8688 and 0.7829 respectively (Fig. 1D), indicating the risk model can effectively predict the prognosis of patients with laryngeal cancer. At the same time, we found that the 15 genes were differentially expressed between the high and low-risk groups (Fig. 1E). In general, the risk score constructed by 15 TME-related genes: BCAT1, RANBP1, CORO2B, RASIP1, EZH2, TCEAL4, FST, CD63, GEMIN7, SPOCK1, S100A11, CXCL3, RUVBL2, ETV5, and EID1 can better predict the prognosis of patients with laryngeal cancer. Risk score is an independent prognostic marker of HCC We included age, sex, TNM stage, HPV index and risk score in multivariate Cox regression analysis to determine whether the risk score was an independent prognostic indicator. The results are shown in Fig. 2A. It was found that risk score was signi cantly associated with overall survival, and the samples with high risk score had a higher risk of death and were unfavorable for prognosis (HR = 5.50, 95% CI : 3.54-8.6,P < 0.001).
In order to further explore the prognostic value of risk score in laryngeal cancer patients with different clinicopathological factors (including age and TNM stage), we regrouped patients, performing a Kaplan-Meier survival analysis. It could be demonstrated that the overall survival rate of the high-risk group is signi cantly lower than that of the low-risk group of the samples in different ages and stages (Fig. 2B-2C). These results con rm that the risk score can be used as an independent indicator to predict the prognosis for HCC patients.

Nomogram model can better predict the prognosis of HCC patients
We use four independent prognostic factors: age, gender, radiotherapy status as well as risk score to construct the nomogram model (Fig. 3A). For each patient, draw three lines upwards to determine the points obtained from each factor in the Nomogram. The "Total Points" axis is determined by the sum of these points, of which draw a line to generate the probability of HCC patients surviving 1, 3, and 5 years. The one-year and two-year corrected curves in the calibration chart are relatively close to the ideal curves (the 45-degree line that passes through the origin of the coordinate axis with a slope of 1), indicating that the predicted results of the model in one, three, and ve years are in good agreement with the actual results ( Fig. 3B-3D). Differential gene analysis and functional enrichment results of patients with laryngeal cancer patients In order to further investigate the differences of gene expression between high-risk and low-risk groups, we conducted a differential gene analysis and a functional enrichment analysis between the two sets of samples. Compared to the low-risk group, a total of 673 genes display a speci c expression manner, including 583 up-regulated genes and 90 down-regulated genes (Fig. 4A).
We performed GO and KEGG enrichment analysis for these 673 genes. It was found that these 673 genes were signi cantly enriched in GO terms such as extracellular matrix organization and KEGG Pathway such as ECM-receptor interaction. The top 20 most signi cantly enriched genes and pathways are shown in Fig. 4B and 4C. Immune status of laryngeal cancer patients in high and low-risk groups We used the CIBERSORT method combined with LM22 feature matrix to estimate the difference of immune in ltration between 22 immune cells in high-risk and low-risk groups of patients with laryngeal cancer. Figure 5A summarized the results of immune cell in ltration in 169 laryngeal cancer patients. There are signi cant differences in the in ltration ratios of 6 types of immune cells, such as B.cells.naive, between high and low-risk groups (Fig. 5B).
Since the expression of immune checkpoints has become a biomarker of immunotherapy for laryngeal cancer patients, we analyzed the correlation between patient risk score and key immune checkpoints (CTLA4, PDL1, LAG3, TIGIT, IDO1, TDO2). It could be seen that the risk score is closely associated with all the 6 checkpoints, which demonstrate the signi cant differences between the high and low-risk groups of laryngeal cancer patients (Fig. 5C).

Discussion
Laryngeal cancers represent one-third of all head and neck cancers and maybe a signi cant source of morbidity and mortality. They are most often diagnosed in patients with signi cant smoking history [16] . Currently, researchers have been trying to nd breakthroughs in the prevention, early screening, diagnosis and treatment of laryngeal cancer, and have made many progress in these areas, but the prediction of its prognosis is still inferior [17] . In this study, we established a innovative prognotic analysis based on TMErelated gene expression. Among 760 TME-related genes, 50 of them display speci c expression. Interestingly, majority of the differentially expressed genes are risk genes. The tumor microenvironment is established by a wide array of cells from both adaptive and innate immune systems. Based on their phenotypical and functional characteristics, these cells can be subclassi ed as antitumoral or protumoral [18] . Based on our results, the protumoral may play a key function in the laryngeal cancer development as well as progress. These cells promote tumor cell immune evasion by dampening the immune response and generating immune tolerance [19] . As previously suggested, they include regulatory T cells (Tregs), myeloid-derived suppressor cells (MDSC) and tumor-associated macrophages (TAM) [20] . This is consistent with our nding shown in Fig. 5A. These types of cells may be involved in tumor immune escape mechanisms. At the same time, there are evidences showing the increased amount of protumoral immune cells in the peripheral blood of laryngeal cancer patients compared with healthy donors [18] , Eventhough, the percentage of protumoral immune cells contribute the tumor size of laryngeal cancer [21] .
Here, we deeply investigate the immune cell in ltration in different groups of laryngeal cancer patients, where differences in the proportion of immune cell in ltration may be an intrinsic feature of individual differences receiving immunotherapy. In addition, the correlation between different types of immune cells is weak, which indicates that there is a large heterogeneity in the in ltration of different immune cells in tumor patients. Here, in this study, all the 6 immune checkpoints demonstrate signi cant differences in the high and low-risk groups. Immune checkpoints function through the regulation of the extent of T cells activation and to prevent excessive activation and autoimmunity [22] . These inhibitory pathways are critical in the tumor microenvironment and have been employed by tumor cells as a mechanism of immune evasion [23] . The relation between expression immune checkpoints in the peripheral blood and aggressive tumor growth in patients with laryngeal cancer has also been described by several studies [24,25] , indicating that the poor prognosis of laryngeal cancer patients with high risk may be due to the immunosuppressive microenvironment.
Here, using TME-related gene expression as an independent variable, we generated a risk score model aiming for laryngeal cancer prognostic prediction. The accuracy of the model prediction is evaluated and approved by the AUC value, which demonstrates not only the rationality of our method on the one hand but also the importance of the prediction model for future laryngeal cancer prognosis. In addition to the prognosis model establishment for laryngeal cancer, another feature of this study is to explore multiple key associated genes. BCAT stands for branched-chain amino-acid transaminase enzymes. It has been found that up-regulation of BCAT1 is associated with poor prognosis in numerous types of tumors, such as gastric cancer [26] . The RanBP1 protein, which binds Ran and regulates its interaction with effectors, is overexpressed in many cancer types [27] . Several observations indicate that RanBP1 contributes to regulate the function of the mitotic apparatus [28] . Enhancer of zeste homolog 2 (EZH2), function as a master regulator of chromatin, provides several molecular-based evidences in cancer therapy. Genetic, transcriptional, and posttranscriptional dysregulation of EZH2 is frequently observed in many cancer types [29,30] . The elevated expression of SPOCK1 has been shown to correlate with EMT-related markers in human gastric cancer tissue, clinical metastasis and a poor prognosis in patients with gastric cancer [31] . In addition, knockdown of SPOCK1 expression signi cantly inhibits the invasion and metastasis of gastric cancer cells in vitro and in vivo. CXCL3 belongs to the CXC-type chemokine family and is known to play a multifaceted role in various human malignancies [32] . RUVBL2 represents RuvB-like 2 protein, deregulation of which in hepatocellular carcinoma is in uenced at the genomic, epigenetic and transcriptional levels. It has been suggested as a promising prognostic marker as well as a therapeutic target for hepatocellular carcinoma [33] . Beside these, the connection between the other genes and laryngeal cancer are not clear yet. All of these deserve further investigation.
In conclusion, in the light of the fact that there remains no gold standard prognosis and no reliable disease-speci c prediction for laryngeal cancer, we establish a innovative prognostic model based on TME-related gene expression. According to the model, several related genes and immune cells as well as 6 key immune checkpoints demonstrate signi cant difference in high and low-risk group. Overall we shed light on questions and challenges posed by the laryngeal cancer, and we establish a innovative prediction target which can provide great help for future understanding of the disease. Figure 1 Construction of a prognostic model for laryngeal cancer. (A) Forest plot of the top 20 most signi cant TME genes related to the prognosis of laryngeal cancer. HR is the Hazard ratio, and 95% CI is the 95% con dence interval. (B) The graph of determining the tuning parameter lambda in the LASSO regression model. The horizontal axis is log (lambda), and the vertical axis is the partial likelihood deviation value respectively. The lambda value corresponding to the smallest value is the best. (C) Kaplan Meier survival curve in the TCGA data set. The horizontal axis indicates time, while the vertical axis indicates survival rate. Different colors represent different groups. The P-value is based on log-rank test. (D) The timedependent ROC curve. The horizontal axis is the false positive rate, while the vertical axis is the true positive rate respectively. The accuracy of the prediction is evaluated by the AUC (area under the ROC curve) value. (H) The mRNA expression heat map of the 15 genes selected in the TCGA data set and GEO validation data set.