A Novel Immune-Related lncRNA Signature predict the Immune Landscape and Prognosis for Head and Neck Squamous Cell Carcinoma

Background: The tumor immune microenvironment is known to play an important role in head and neck squamous cell carcinomas (HNSCC). Reliable prognostic signatures that could accurately predict immune landscape and survival in HNSCC patients are vitally needed to promote a better individualized and effective treatment. Methods: HNSCC transcriptome data and clinical data in The Cancer Genome Atlas (TCGA) were embedded in our study. The differentially expressed irlncRNAs were identied by differential co-expression analysis, and recognized differently expressed irlncRNA (DEirlncRNA) pairs using univariate analysis. Cox and lasso regression analysis was used to identify DEirlncRNA pairs related to overall survival (OS) and build the prediction model. Then, we compared the areas under curve (AUC) with other published lncRNA signatures, counted the akaike information criterion (AIC) values of 3-year receiver operating characteristic curve, and identied the cut-off point to set up an optimal model for distinguishing the high- or low- risk groups among patients with HNSCC. Receiver operating characteristic (ROC)curves and Kaplan–Meier plot curves were used to validate the prediction model. Besides, We then reevaluated them from the viewpoints of clinical factor, tumor-inltrating immune cells, chemotherapeutics ecacy, and immunosuppressed biomarkers. Results: We built a risk score model based on 18 DEirlncRNA pairs. The risk model is closely related to the OS of HNSCC patients. The hazard ratio (HR) is 1.376 [95% CI (condence interval) 1.302-1.453] and log-rank P-value < 0.0001. Compared with two recently published lncRNA signatures, DEirLncRNA pairs signature has higher AUC score which showed the better prognostic performance. Additionally, the signature score showed a positive correlation with aggressive outcomes of HNSCC, such as low immunity score, signicantly reduced CD8+ T cell inltration and lowly expressed immunosuppressed biomarkers. However, high-risk groups of patients may have high chemotherapeutics sensitivity. Conclusions: The signature established by paring irlncRNA


Background
Head and neck tumors are the sixth most common malignancy in the world, head and neck squamous cell carcinoma (HNSCC) is the most common histological type [1]. HNSCC encompasses a heterogeneous group of malignancies that arise in the oral cavity, larynx and pharynx and are mainly associated with tobacco, alcohol consumption and human papillomavirus (HPV) infection [2][3][4]. However, due to the early symptoms of HNSCC are not obvious, the 5-year survival rate is still below 50%.
Furthermore, HNSCC is characterized by high lymph node metastasis rate, local recurrence and metastasis can reduce the survival rate to 35% [5]. Currently, HNSCC is estimated to have an annual incidence of 900,000 and causes 450,000 death every year [1].
HNSCC is a disease characterized by profound immunosuppression [6]. immune checkpoint inhibitors (ICIs) therapy is becoming a new hope for HNSCC treatment. In recent years, considerable progress has been made in ICIs therapy for HNSCC. However, the response rate of recurrent or metastatic HNSCC for PD-1/PD-L1 inhibitors was only found to be 13.3-22% in previous clinical trials [6]. Therefore, biomarkers which could effectively predict the e cacy of ICIs are crucial for patient selection. Generally, PD-L1 expression level is used as the representative marker for predicting the e cacy of ICIs. However, for most tumors, only the detection of PD-L1 alone cannot screen the dominant population [7][8][9].
Long non-coding RNAs (lncRNAs) refer to functional RNA molecules with a length more than 200 nucleotides and have minimal or no function to encode proteins [10]. It has become increasingly recognized as lncRNAs can interact with DNA, RNA and proteins which regulate gene expression at the transcription and post-transcription levels and is associated with various types of cancer [10,11].
Recently, several lncRNA signatures have been established for prognosis prediction in some cancers, including HNSCC. Mao et al. [12] identi ed an eight lncRNA signature is an independent prognostic factor of HNSCC patients. Bao et al. [13] identify genome instability-associated lncRNAs and constructed a signature to evaluate the prognosis of patients with breast cancer.
It is know that the diagnosis, evaluation, and treatment of cancer is closely related to the tumor immune microenvironment, especially to in ltrating immune cells [14,15,16]. Now research con rms that, lncRNAs are critical regulators of gene expression in the immune system. Research shows that lncRNAs direct the expression of genes related to the activation of immune cells that directing the development of diverse immune cells so resulting in tumor immune cell in ltration and altering the immune microenvironment [17,18]. Therefore, in this study, we tried to identify immune-related lncRNAs (irlncRNAs) to investigate the relationship between irlncRNAs and immune in ltration, construct a powerful prognostic model for risk assessment of HNSCC and predict the response of patients to immunotherapy. Different from simple genes in the past, we utilized a novel modeling algorithm, paring, and iteration to construct an irlncRNA signature, which adopted two-biomarker combinations will help improve the accuracy of the cancer prognostic model.

Data collectin
HNSCC RNAseq expression pro le data were downloaded from The Cancer Genome Atlas (TCGA) database (https://portal.gdc.cancer.gov/). The database contained 501 cases tumor tissue sequencing information with HNSCC patients and 44 cases of normal tissue. Clinical and follow-up data of patients with HNSCC were retrieved from the HNSC project of TCGA, valid data were extracted after eliminating the data with follow-up time of 30 days and repeated data. "Ensembl_Stable_id" is converted to "Gene Symbol" in the RNAseq expression pro le according to the GTF le which downloaded from Ensembl (http://asia.ensembl.org), and annotation to distinguish the mRNAs and lncRNAs for subsequent analysis. The list of immune-related genes (ir-genes) was downloaded from the ImmPort database (http://www.immport.org), and correlation analysis was conducted between ir-genes and all lncRNAs to screen for irlncRNAs. Immune gene correlation coe cients greater than 0.4 or less than − 0.4 and p value less than 0.001 were considered as irlncRNAs. The differentially expressed irlncRNAs (DEirlncRNAs) were identi ed through the "limma" R package, FDR < 0.05 and |log 2 FC|>1.5 were de ned as DEirlncRNAs.

Pairing DEirlncRNAs
The DEirlncRNAs were cyclically singly paired, and construct a 0-or-1 matrix. If the expression level of the former lncRNA was higher than that of the latter, it was de ned as 1; otherwise, it was de ned as 0. Then the constructed 0-or-1 matrix is further screened. When the amounts of DEirlncRNA pairs of which expression quantity was 0 or 1 accounted for more than 20% and less than 80% of total pairs, it was considered a valid match.
Building the prognostic pairing DEirlncRNAs-based signature Univariate Cox analysis was performed by "survival" R package to screen DEirlncRNA pairs with prognostic values and P < 0.01 DEirlncRNA pairs is considered to be signi cantly related to the prognosis of HNSCC patients. Next, Lasso regression analysis performed by "glmnet" and "survival" R package to select the minimum error value to eliminate over-tting data, the optimal penalty parameter "lambda" value was calculated via a 1,000 cross-validations. Then multivariate analyses were performed using the Cox regression model to identify the prognostic-associated DEirlncRNA pairs factors in HNSCC. Based on the coe cients from the multivariate regression analysis and the status of DEirlncRNA pairs, we constructed a DEirlncRNA pairs signature(DEirLnc-PSig) for outcome prediction as follows: where βi indicates the coe cient for each DEirlncRNA pairs and Pi indicates the DEirlncRNA pairs status of 0-or-1 matrix.

Validation of the constructed risk model
A time-dependent receiver operating characteristic (ROC) curve and area under curve(AUC) was estimated using the " survivalROC" R package to predict the accuracy of DEirLnc-PSig for HNSCC patients. The 1-, 2-, and 3-year ROC curves and AUC of the model were plotted. The AIC values of every points of the 3-year ROC curve were evaluated to identify the maximum in ection point, which was considered as the cut-off point to classify patients into the high-risk group with high DEirLnc-PSig or low-risk group with low DEirLnc-PSig.
The Kaplan-Meier method was used to construct survival curves, and the log-rank test was used to assess the difference in survival between the high-and low-risk groups with a signi cant level of 0.05. The R packages utilized in these steps included "survival" and "survminer". To verify the clinical application value of the DEirLnc-PSig, multivariate Cox regression and strati ed analysis were used to assess the independence of DEirLnc-PSig from other key clinical factors. A forest map was used to demonstrate the results. The R packages utilized in these operations were "survival" and "survminer". In addition, we compared our de ned DEirLnc-PSig with two previously published lncRNA signatures to further compare the prognostic model performance.

Investigation of tumor-in ltrating immune cells
To explore the differences in immune cell in ltration between different risk groups, we used the TIMER, CIBERSORT, XCELL, QUANTISEQ, MCPcounter, EPIC, and CIBERSORT methods to assess the samples from the TCGA project of the HNSCC dataset. Differences in the number of different types of immune in ltrating cell between high-and low-risk groups of the constructed model were analyzed using the Wilcoxon's signed-rank test and the results are shown in a box chart. The correlation between the risk scores and immune cell in ltration were evaluated using Spearman's correlation analysis. The correlation coe cients of the results were shown in a lollipop diagram. The signi cance threshold was set as p < 0.05. The R packages utilized in these operations were "limma", "scales", "ggplot2", "ggtext" and "ggpubr".

Exploration of the signi cance of the model in the clinical treatment
We calculated the half inhibitory centration (IC 50 ) of common administrating chemotherapeutic drugs in the TCGA project of the HNSCC dataset by " pRRophetic" R package to evaluate the model in the clinic for head and neck carcinoma treatment. Antitumor drugs such as cisplatin, docetaxeland, methotrexate and paclitaxel are recommended for HNSCC treatment by NCCN guidelines. The difference in the IC 50 between the high-and low-risk groups was compared by Wilcoxon's signed-rank test and the results are shown as box drawings.

Analyses of the immunosuppressive molecules expressing related to ICIs
The relationship between the model and the expression level of genes related to ICIs were performed by "limma" and "ggpubr" R package and violin plot visualization.

Identi cation of irlncRNAs and differentially expressed irlncRNAs
A study process ow chart is given in Fig. 1. A total of 805 irlncRNAs were identi ed by co-expression analysis of ir-genes and lncRNAs. The differential expression analysis was subsequentially carried out using "limma" R package, 132 DEirLncRNAs (75 upregulated and 16 downregulated) were identi ed between the tumor and normal tissues (Fig. 2). Establishment of DEirlncRNA pairs and a DEirlncRNA pairs signature Using an iteration loop and a 0-or-1 matrix screening among 132 DEirlncRNAs and 6,686 valid DEirlncRNA pairs were identi ed. By conducting the univariate Cox regression analysis, 500 survivalrelated DEirlncRNA pairs were selected (p < 0.01). To further avoid result over tting, the Lasso regression analysis was employed to screen out 33 DEirlncRNA pairs showing the highest correlation with survival ( Fig. 3a, b). Those 33 DEirlncRNA pairs were further subjected to multivariable Cox regression analysis, nalizing the most signi cant 18 DEirlncRNA pairs was developed as a DEirlncRNA pairs signature for HNSCC patients by the stepwise elimination method and obtaining corresponding coe cients (Fig. 3c). We then applied the 1-, 2-, and 3-year ROC curves to verify the predictive capacity of the DEirLnc-Psig; the AUCs of the ROC were almost all ~ 0.75, suggesting a satisfactory predictive performance (Fig. 4a). In addition, we compared the 3-year ROC curve with other clinical features and the superiority of DEirLnc-PSig have been further demonstrated (Fig. 4b).
Clinical evaluation by risk assessment model and independence of the DEirLnc-PSig from other clinical factors We recognized the maximum in ection point as the cut-off point on the 3-year ROC curve using the Akaike information criterion (AIC) values and according to the cut-off point (Fig. 4c, d), HNSCC patients from the TCGA cohort was divided into a low-risk group(n = 255) and a high-risk group(n = 235). We then compared the survival of low-risk group with the high-risk group and found that the DEirLnc-PSig possessed powerful capacity to predict the prognosis of HNSCC patients between the low-and high-risk group, in which the low-risk group are signi cantly better than the high-risk group (p < 0.001, log-rank test) (Fig. 5a, b, c).
To evaluate whether the prognostic value of the DEirLnc-PSig was independent of common clinical variables, univariate and multivariate Cox regression analyses were performed on age, gender, pathologic grade, clinical stage and our DEirLnc-PSig-based prognostic risk score model (Fig. 6a, b). The results demonstrated that risk score showed statistical differences by univariate and multivariate Cox regression analysis which presented as an independent prognostic predictor. In addition to the DEirLnc-PSig, there were other two clinical factors, age and clinical stage, that were observed to be signi cant in the multivariate analysis. Therefore, strati cation analysis was performed to determine whether the DEirLnc-PSig possessed a prognostic value that was independent of the age and clinical stage. Patients in the TCGA set were strati ed into a young-patient group ( < = 65years, n = 124) and an old-patient group (> 65 years, n = 253). Using the DEirLnc-PSig, patients in each age group could be further divided into high-risk or low-risk group. There was a signi cant difference in OS between the high-and low-risk groups in the young-patient group (p < 0.001, log-rank test) as was in the old-patient group (p < 0.001, log-rank test) (Fig. 6c, d). Next, all HNSCC patients were then strati ed into the early-stage group (stage I and II, n = 70) and the late-stage group (stage III and IV, n = 307). Using the DEirLnc-PSig, patients in each stage group could be further divided into high-risk or low-risk group. There was a signi cant difference in OS between the high-and low-risk groups in the early-stage group (p = 0.013, log-rank test) as was in the late-stage group (p < 0.001, log-rank test) (Fig. 6e, f). These results indicated that the DEirLnc-PSig was an independent prognostic factor associated with OS in HNSCC patients.
Performance comparison of the DEirlncRNA pairs signature with existing lncRNA-related signatures in survival prediction We further compared the prediction performance of the DEirLnc-PSig with two recently published lncRNA signatures: 3-lncRNA signature derived from Jiang's study (hereinafter referred to as JiangLncSig) [19] and 5-lncRNA signature derived from Liu's study (hereinafter referred to as LiuLncSig) [20] using the same TCGA patient cohort. As shown in Fig. 7, the AUC at 3 years of OS for the DEirLnc-PSig is 0.809, which is signi cantly higher than that of JiangLncSig (AUC = 0.632) and LiuLncSig (AUC = 0.5). These results demonstrated the better prognostic performance of the DEirLnc-PSig in predicting survival than two recently published lncRNA signatures.

Estimation of tumor-in ltrating immune cells and immunosuppressive molecules with risk assessment model
Since the model was established by irlncRNAs, we consequently investigated whether the model was related to the tumor immune microenvironment. Correlation analysis of immunocyte in ltration revealed that the high-risk group was more positively associated with CD4 + T cells, cancer associated broblast, monocyte and macrophages, whereas were negatively associated with CD8 + T cells and T cell regulatory (Tregs), as revealed by the Wilcoxon signed-rank test (see supplementary le, Table S1 and Figure S1). In addition, the immune scores in the low-risk group were signi cantly higher than those in the high-risk group (p = 0.0038) in the analysis with xCell, which suggesting that the tumor cells in the low-risk group had more immune cell in ltration than those in the high-risk group. A detailed Spearman correlation analysis was conducted, and the resulting diagram exhibited a lollipop shape, as shown in Fig. 8a. Immunotherapy with ICIs has changed HNSCC treatment strategies dramatically, improve survival of HNSCC patients. We investigated whether the risk model was related to ICI-related biomarkers and discovered that high risk scores were positively correlated with low expression of PDCD1 (p < 0.001, Fig. 8b), LAG3 (p < 0.01, Fig. 8c) and CTLA4 (p < 0.001, Fig. 8d).

Analysis of the correlation between the risk model and chemotherapeutics
As common therapy for HNSCC, we attempted to identify associations between risk and the e cacy of chemotherapeutics in treating head and neck squamous cell carcinoma in the TCGA project of the HNSCC dataset. We showed that a high risk score was associated with a lower IC 50 of chemotherapeutics such as docetaxel (p < 0.001), paclitaxel (p = 0.02), whereas it was associated with a higher IC 50 for methotrexate (p < 0.001), which indicated that the model acted as a potential predictor for chemosensitivity (Fig. 9).

Discussion
Despite the effectiveness of targeted therapies and immunotherapy agents in HNSCC, not all patients respond to treatment and development of resistance is almost universal after a time. In clinical practice, ICIs therapy does not appear to improve survival in HNSCC patients. Precision medicine approaches that seek to individualise therapy through the use of predictive biomarkers and strati cation strategies, have been the key to improve therapeutic success in HNSCC [21,22].
In recent years, many studies have shown that lncRNA is an important and signi cant factor in the current and future prognosis of patients with HNSCC [22]. For this reason, many prognostic lncRNA expression signatures have been applied to diagnose and predict prognosis of HNSCC. However, it should be pointed out that HNSCC are a type of immunosuppressive disease [23]. In the case of HNSCC, a deeper understanding of the heterogeneity and complex of the tumor immune microenvironment is essential to develop sensitive predictive biomarkers for providing information of tumor microenvironment and help to improve the e cacy of tumor immunotherapy. An ideal signature would not only to predict the prognosis of cancer patients, but also re ect the characteristics of the tumour and immune microenvironment that have an impact on disease progression and treatment response. Therefore, we established a prediction model of immune-related lncRNAs and evaluated the relationships between the risk score draw from DEirLnc-PSig, immune checkpoints, tumor-in ltrating immune cell abundances and the response to chemotherapy.
Unlike most signatures, which are based on quanti ed transcript expression levels, we attempted to construct a reasonable model with two-lncRNA combinations and did not adopt their exact expression levels in the signature. Thus, pairs with higher or lower expression only had to be detected instead of examining speci c expression values of every lncRNA. The results of this study showed that this novel model had a good predictive performance and achieved a better prediction performance with an AUC value of 0.809, which is higher than other two expression based lncRNA signature, risk score derived from the DEirLnc-PSig is reliability and independent for predicting HNSCC prognosis.
In this study, some of the irlncRNAs in the process of modeling that have been already identi ed play an important role in malignant phenotypes of various cancer types, such as LINC01063 [24,25], HOXC-AS1 [26,27], LINC02454 [28,29] and SCAT1 [30]. Furthermore, previous reports have shown that LINC00460 [31], C5orf66-AS1 [32] and TMPO-AS1 [33] are associated with prognosis and treatment outcomes in HNSCC. Jiang et al. [31] revealed that LINC00460 promotes EMT in HNSCC by facilitating peroxiredoxin-1 into the nucleus. Lu et al. [32] rst report that C5orf66-AS1 was able to prevent OSCC progression by inhibiting OSCC cell growth and metastasis via regulating CYC1 expression. Xing et al. [33] disclosed that a novel TMPO-AS1/miR-320a/SOX4 pathway associated with nasopharyngeal carcinoma(NPC) progression, suggesting that TMPO-AS1 may be a potential therapeutic target for NPC. However, there are also some lncRNAs that have been disclosed for the rst time in this model which suggested that the proposed model can identify novel biomarkers for further research.
Tumor-in ltrating immune cells which are integral to the tumor immune microenvironment (TIME) are recognized as highly important for prognosis prediction, while the responsiveness to immune checkpoint blockade may be subject to the features of TIME. To explore the relationship between risk scores and tumor-in ltrating immune cells, we used seven common acceptable methods to estimate the immune in ltrating cell, including xCell [34,35], TIMER [36,37], QUANTISEQ [38,39], MCPcounter [40], EPIC [41], CIBERSORT-abs [42], and CIBERSORT [43]. Multiple studies have indicated that greater in ltration of CD8 T cells and high immune scores correlated with better response to standard chemotherapy and to immune checkpoint blockade therapy in tumor patients [44,45]. By integrating analyzing, our results revealed that the high-risk group identi ed by DEirLnc-PSig was negatively associated with CD8 + T cell and immune scores. Correspondingly, the high-risk group was also negatively correlated with checkpointrelated biomarkers such as PDCD1, LAG3 and CTLA4. Our results demonstrated that patients in high-risk groups identi ed by our model may not bene t from immunotherapy. But these conclusions need to be veri ed through further studies especially supported by clinical trials. On the other hand, we checked several more drugs like cisplatin, paclitaxel, doxorubicin and methotrexate, we observed from the estimated IC 50 of these chemotherapeutic drugs, that the high-risk group could be more sensitive to chemotherapies than those in low-risk group. It would be more helpful to identify patients at high clinical risk and give sensitive chemotherapy rather than immunotherapy. However, more benchwork and clinical studies are needed to further validation.

Conclusions
In summary, our study revealed a comprehensive landscape of the tumor immune microenvironment in HNSCC, and provides a promising prognostic signature for HNSCC. Patients with the low risk scores could bene t more from ICIs therapy while patients with the high risk scores may bene t more from chemotherapy. However, further investigations are needed to verify the accuracy in estimating prognoses and to test its clinical utility in patient management.

Consent for publication
Not applicable.

Availability of data and materials
The datasets analyzed during the current study are available in the TCGA repository, https://portal.gdc.cancer.gov/.

Competing interests
The authors report no competing interest in this study. Authors' contributions ZMC designed the research, collected the data, analyzed the date and wrote the manuscript. LC took part in designing the research. YL collected the data, analyzed the date. WBL took part in designing the research and solved the disagreements between ZMC and LC. All authors read and approved the nal manuscript.

Figure 5
Survival prediction of HNSCC patients with the risk score model. a-b Patients were scored and grouped into the low-risk group (green) and high-risk group (red). Then scatter diagrams of patient's survival were plotted based on the risk scores from low to high, with green referring to survival patients and red referring to deaths. c Kaplan-Meier survival curves Figure 6 Univariate and multivariate Cox analysis to identify the risk factors and strati cation analyses by age and stage. a Univariate. b Mutivariate. c-d KM curve analysis of OS in high-and low-risk groups for young patients and old patients. e-f KM curve analysis of OS in high-and low-risk groups for early-stage patients and late-stage patients. Statistical analysis was performed using the log-rank test.

Figure 8
Estimation of tumor-in ltrating cells and immunosuppressed molecules by the risk assessment model. a Lollipop diagram of the correlation between the risk scores and immune cell in ltration. b-d Violin plot showed the expression of genes related to ICIs between low-and high-risk group