Immune ‐ related Long Noncoding RNA Signature for Patients with Cervical Cancer

Background: Cervical cancer (CC) is a common gynecological malignancy for which prognostic and therapeutic biomarkers are urgently needed. The signature based on immune ‐ related lncRNAs(IRLs) of CC has never been reported. This study aimed to establish an IRL signature for patients with CC. Methods: The RNA-seq dataset was obtained from the TCGA, GEO, and GTEx database. The immune scores (cid:0) IS (cid:0) based on single-sample gene set enrichment analysis (ssGSEA) were calculated to identify the IRLs, which were then analyzed using univariate Cox regression analysis to identify signicant prognostic IRLs. A risk score model was established to divide patients into low-risk and high-risk groups based on the median risk score of these IRLs. This was then validated by splitting TCGA dataset(n=304) into a training-set(n=152) and a valid-set(n=152). The fraction of 22 immune cell subpopulations was evaluated in each sample to identify the differences between low-risk and high-risk groups. Additionally, a ceRNA network associated with the IRLs was constructed. Results: A cohort of 326 CC and 21 normal tissue samples with corresponding clinical information was included in this study. Twenty-eight IRLs were collected according to the Pearson’s correlation analysis between immune score and lncRNA expression (P < 0.01). Four IRLs (BZRAP1-AS1, EMX2OS, ZNF667-AS1, and CTC-429P9.1) with the most signicant prognostic values (P < 0.05) were identied which demonstrated an ability to stratify patients into low-risk and high-risk groups by developing a risk score model. It was observed that patients in the low ‐ risk group showed longer overall survival (OS) than those in the high ‐ risk group in the training-set, valid-set, and total-set. The area under the curve (AUC) of the receiver operating characteristic curve (ROC curve) for the four IRLs signature in predicting the one-, two-, and three-year survival rates were larger than 0.65. In addition, the low-risk and high-risk groups displayed different immune statuses in GSEA. These IRLs were also signicantly correlated with immune cell inltration. Conclusions: Our results showed that the IRL signature had a prognostic value for CC. Meanwhile, the specic mechanisms of the four-IRLs in the development of CC were ascertained preliminarily.


Introduction
Cervical cancer(CC)is a malignant gynecologic tumor threatening the health of women. The morbidity and mortality for CC ranks fourth worldwide among women [1]. Infection with high-risk human papillomavirus (HPV), especially HPV16 and HPV18, is the main etiologic risk factor for CC, and plays an important role in diagnostic tests [2]. Surgery is the main treatment method for CC in early stages while advanced-stage CC can be treated with radiotherapy, chemotherapy, or concurrent chemoradiation, thereby improving the survival rate of CC patients [3]. However, a considerable number of CC patients have poor prognosis due to metastasis or recurrence within two years after treatment [4]. Hence, effective prevention to reduce morbidity and individual treatments to improve the prognosis of CC are important for obstetricians and gynecologists.
The immune system can recognize tumor antigens expressed on the surface of tumor cells. It generates an immune response via the activation of effector cells and triggers the release of a series of effector molecules to attack and eliminate tumor cells and to inhibit tumor growth [5]. The immune imbalance in tumor microenvironment plays an important role in the occurrence and development of cancer. With the development of cellular molecular biology and immunology, immunotherapy has become a new treatment approach for cervical cancer [6]. Tumor immunotherapy acts mainly by increasing the immunogenicity of tumor cells and the effect of cell damage sensitivity, stimulate and enhance the antitumor immune response, with the aid of biological agents, doping effects of immune cells and molecules into the body, together, the body's immune system can not only kill cancer cells remaining small, still can prevent tumor metastasis and recurrence [7].
Long noncoding RNAs (lncRNAs), de ned arbitrarily as transcripts lacking protein-coding potential, are RNAs with more than 200 nucleotides [8]. LncRNAs are not only associated with the invasion, migration, and proliferation of CC, but are also involved in autophagy and epithelial-mesenchymal transition (EMT) [9]. Emerging studies have demonstrated that lncRNAs also modulate the immune response to tumors.
For instance, the down-regulated lncRNA AGER has been demonstrated to be closely related to T-cell status in lung cancer [10]. lncRNA GM16343 is regulated by Interleukin 36β to strengthen the anti-tumor immune response of CD8 + T-cells [11]. lncRNA LINK-A has been demonstrated to downregulate antigen presentation and intrinsic suppression in triple-negative breast cancer [12]. LncRNAs can also affect the development of cancer by regulating NK cells. LncRNA GAS5 can inhibit tumor growth, and the overexpression of GAS5 can regulate Mir-544/RUNX3 to enhance the killing effect of NK cells in liver cancer [13]. LncRNAs can also modulate tumor immunity by regulating Treg cells. The expression of Mir-448 can be increased by interfering with lncRNA SNHG1, which down-regulates the expression of indoleamine 2,3-dioxygenase (IDO), inhibits the differentiation of Treg cells, and reduces the immune escape of breast cancer cells [14]. EGFR is an important member of the tyrosine kinase receptor family. By binding to EGFR speci cally, LNC-EGFR promotes Treg cell differentiation and promotes the immune escape of liver cancer cells [15]. LncRNAs can affect the tumor microenvironment and thus play an important role in immunotherapy. Nonetheless, the effect research on immune-related lncRNAs in CC is rarely reported.
The purpose of our research was to identify an IRL signature, which might serve as prognostic and therapeutic biomarkers in CC. We developed a prognostic signature and mechanisms of IRLs using single-sample gene set enrichment analysis (ssGSEA), survival analysis, a Cox regression risk model, gene set enrichment analysis (GSEA), ceRNA network, and other analysis methods.

Page 4/25
The RNA sequencing pro les associated with CC were obtained from The Cancer Genome Atlas (TCGA) [16] (https://toil.xenahubs.net), which consisted of 306 CC and 13 normal tissue samples. The lowexpression genes were ltered, and the genes whose expression level was greater than 0 in more than a third of the samples were retained. Additionally, the RNAs were identi ed as mRNAs or lncRNAs based on their annotation information in the GENCODE database [17]( https://www.gencodegenes.org/). GPL570 [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array platform was used to obtain the microarray dataset GSE6791 from gene expression omnibus (GEO) repository (GeneExpressionOmnibus,GEO http://www.ncbi.nlm.nih.gov/geo/) [18], of which twenty CC and eight normal tissue samples were included. All 326 CC samples and 21 normal tissue samples were contained corresponding clinical information. For the RNA sequencing pro les obtained from TCGA TARGET GTEx, empirical Bayes and linear regression along with Benjamini and Hochberg multiple comparison methods from limma package [19](Version 3.10.3, http://www.bioconductor.org/packages/2.9/bioc/html/limma.html were performed to gain adjusted-Pvalue and |logFC| (adj.P.Value 0.05, |logFC| 0.585). For GSE6791, GEO2R (http://www.ncbi.nlm.nih.gov/geo/geo2r/) was used with the seqmap [20] tool to map probes to mRNA and lncRNA sequences. The immune-related genes (IRGs) were downloaded from the InnateDB database (http://www.innatedb.com) [21]. Focusing on screening genes that were up-and down-regulated consistently, we used venn analysis to select the intersection genes of the aforementioned datasets.
Single-sample gene set enrichment analysis (ssGSEA) [22] was used to identify the immune scores(IS of each sample. Pearson's correlation coe cient between lncRNAs and IS was calculated for each corresponding samples to identify immune-related lncRNAs IRLs (P 0.01).

Signature development of IRLs
Univariate Cox regression analysis with hazard ratio (HR) was gained from overall survival (OS) and OS time from the candidate IRLs. HR 1 indicated that expression was higher as well as the risk and the survival rate was lower. LncRNAs that are up-regulated in the tumor should, in principle, have an HR greater than 1. The Kaplan-Meier analysis was generated by survminer(version 0.4.3)in R package based on the expression value, survival time and survival status to determine the optimal cut point. The log-rank test was performed based on survival version 2.42-6 in R package to sort the IRLs with a signi cant prognostic value (P < 0.05), then the survival curves were drawn. Multivariate Cox regression analysis was performed based on the expression value, OS, and OS time of IRLs in each sample. Subsequently the individual prognostic risk model for corresponding samples was established. Expr gene indicated the expression of corresponding IRL for each sample. All the samples were divided into low-risk and high-risk according to the median of risk scores the following study.All the samples in TCGA were regarded as a total-set. Training-and valid-sets were constructed by dividing the total-set equally into two parts to validate the risk score formula. The Kaplan-Meier survival curves of all three sets were drawn to determine the prognostic difference between the risk groups. The one-year, two-year and three-year survival receiver operating characteristic (ROC) curves predicted by the risk model were drawn.
Clinical and pathological characteristics of the risk score model Clinical and pathological characteristics, including age, pathologic M, pathologic N, pathologic T, clinical stage, neoplasm histologic grade, neoplasm cancer status, primary therapy outcome success, radiation therapy, tobacco smoking history, along with the risk score, the immune score and the expression of IRLs in each sample were included in the analysis for illustration of a heat map. We observed the differences in clinical pathological characteristics in risk groups or the risk score of clinical pathological characteristics using the Student's t-test. A scatter plot was drawn using GraphPad Prism 5 [23] based on signi cant clinicopathological characteristics.

Nomogram model construction and visualization
Univariate Cox regression analyses were performed respectively based on the risk groups and clinicopathological factors(age, pathologic-M, pathologic-N, pathologic-T, clinical-stage, neoplasmhistologic-grade, radiation-therapy and tobacco smoking-history). Multivariate Cox regression analysis was conducted on the factors with a p 0.05. Using the IRL signature along with the factors with a p 0.05, a nomogram was constructed to make the multivariate Cox regression analysis more clear.

Gene set enrichment analysis
To elucidate the biological differences between risk groups, a gene set enrichment analysis (GSEA) [24] of "c5.bp.v7.1.symbols.gmt" background was carried out using GSEA(version 4.0.3) software. A nominal p value < 0 .05(NOM p-val < 0.05) was considered signi cant. We focued on selecting immune-related terms for display.

Evaluation of in ltrating immune cells
To further observe the differences in the abundance of in ltrating immune cells of the risk groups, we used the CIBERSORT algorithm, a deconvolution method [25], coupled with LM22 that distinguished 22 immune cell subpopulations from Cibersort (a web server) and a heat map for all the samples was drawn using ggplot2(Version: 3.2.1). The Student's t-test was applied to nd signi cant immune cell subpopulations in risk groups to chart Violin Plot by boxplot version 0.3.2 in R package.
Only miRNA-mRNA pairs recognized by all four databases were considered as candidate targets. Subsequently, the ceRNA network based on the same miRNA of lncRNA-miRNA and miRNA-mRNA pairs was established and visualized using Cytoscape (version:3.4.0 http://chianti.ucsd.edu/cytoscape-3.4.0/) [29].Sorting the mRNA with a signi cant prognostic value ((P < 0.05) in the ceRNA network, the Kaplan-Meier analysis was generated using survminer (version 0.4.3) to determine the optimal cut point, and the log-rank test was performed based on survival (version 2.42-6) in R package.

Differential analysis of genes
In the aggregate, 1995 differentially expressed lncRNAs, with corresponding clinical information were extracted from TCGA and GSE6791 databases mentioned above. Of these, 567 lncRNAs were highly expressed and 1428 lncRNAs were expressed at low levels in the CC samples (Table 1). A total of 64 lncRNAs were obtained from the intersection of the two databases (Fig. 1A), among which 10 lncRNAs were consistently up-regulated and 54 lncRNAs were consistently down-regulated in the two databases. A total of 1040 IRGs were identi ed from the InnateDB. The ssGSEA analysis was based on 201 IRGs obtained from the intersection of TCGA and InnateDB databases (Fig. 1B) to compute IS. Finally, a cohort of 28-IRLs was obtained based on the Pearson correlation coe cient analysis between the 64 lncRNAs and IS of 201 IRGs(P 0.01). Four IRLs (BZRAP1-AS1, EMX2OS, ZNF667-AS1, and CTC-429P9.1) with low expression were found to be in accordance with our expectation and were then included in the signature development. The results are shown in Table 2. The Kaplan-Meier survival analysis revealed that four IRLs were related to OS in CC signi cantly (p 0.05) (Fig. 2). The four IRLs were de ned as protective factors due to their HR value 1, which showed that the high expression of the four IRLs was associated with lower OS.  Expr gene indicated the expression value of corresponding IRL for each sample. An RS higher than the median was identi ed as a high-risk group while an RS lower than the medium was identi ed as a lowrisk group. Using this approach, a risk model signature based on four IRLs was constructed, which was further validated in the total-set and valid-set using the same β to con rm the prediction potential of the 4-IRL signature. The Kaplan-Meier survival curves based on logRank test revealed that OS in the low-risk group was markedly longer than that in the other group in all three sets (P Training−set =0.0068, P Valid−set =0.02, P Total−set =0.0015, Fig. 3A, 3B, 3C). The one-year, two-year and three-year survival ROC curves predicted by the risk model indicted that the AUCs were larger than 0.65 (0.695, 0.66, 0.676, Fig. 3D),thus predicting that the risk score model could e ciently forecast over 65% of OS for CC patients. Therefore, the risk model signature based on four IRLs was accurate in predicting OS of CC patients.
Clinicopathological characteristics of the risk score model A heat map was constructed by combining the expression values of four IRLs and their clinicopathological characteristics (Fig. 4A). The higher the RS, the lower the IS. RS and IS values showed a signi cant negative correlation based on the scatterplot of the correlation coe cient (r=-0.14, p = 0.01631, Fig. 4B). The scatterplot for the distribution of IS in risk groups showed that the IS of the highrisk group was signi cantly lower than that of the low-risk group (Fig. 4C). Furthermore, the high expression of the four IRLs could be seen with low RS, which suggested that the up-regulation of the four IRLs were associated with better prognosis. In contrast, low expression of the four IRLs had the opposite consequence. An RS contrast of neoplasm histologic grade showed that the RS in stages IIB-III-IV was signi cantly greater than that in stages I-II-IIA (Fig. 4C). This part of the result suggests that a high-risk score has an adverse effect on prognosis, which may be caused by a decrease in immune score.

Nomogram model construction and visualization
The univariate and multivariate Cox regression analyses of clinicopathological characteristics and the four-IRL signature for total TCGA dataset demonstrated that age, AJCC stage along with the four-IRL signature were all independent prognostic factors (P 0.05, Table 4). Nonetheless, neoplasm histologic grade, radiation therapy, and a history of tobacco-smoking were not correlated with prognosis independently. To better predict prognosis at one-, three-, and ve-year OS of CC patients, we constructed a new nomogram using variables such as age, AJCC stage, and the four-IRL signature (Fig. 5). Gene set enrichment analysis GSEA analysis of the two risk groups was carried out to predict enrichment status disparities of molecular mechanism functions. The enrichment analysis showed that 118 biological functions were markedly enriched in the low-risk group, whereas only one biological function(GO_ATP_SYNTHESIS_COUPLED_ELECTRON_TRANSPORT)was enriched in the high-risk group.
In the enrichment status enriched in the low-risk group, we sought out couple immune-related responses as shown in Fig. 6.

Evaluation of in ltrating immune cells
Previous studies have shown that in ltrating immune cells are closely related to the prognosis and treatment of malignant tumors [29]. From the GSEA analysis in our study, we discovered that the four-IRL signature was associated with many immune characteristics. Hence, the abundance of twenty-two in ltrating immune cells (Fig. 7A) were estimated which showed that the abundance of nine in ltrating immune cells (B cells naïve, B cells memory, T cells CD4 memory resting, NK cells resting, macrophages M1, dendritic cells activated, mast cells resting, mast cells activated and neutrophils) was signi cantly different (P 0.05) between the risk groups (Fig. 7B). The abundance of four in ltrating immune cells (B cells naïve, T cells CD4 memory resting, macrophages M1 and mast cells resting) in the low-risk group were signi cantly higher than that in the high-risk group as shown by the Student's t-test(P 0.05).

Construction of an IRL-associated ceRNA network
Four-IRLs,forty-ve hub mRNAs and thirty-eight miRNAs were involved in ceRNA network (Fig. 8). A total of 46 lncRNA-miRNA pairs, 232 miRNA-mRNA pairs, and 55 lncRNA-mRNA pairs were identi ed. The downregulation of 12 mRNAs in the ceRNA network was signi cantly related to OS in CC. Among them, the downregulation of seven mRNAs (CXCL12, FREM1, IGF1, IRF4, NFATC2, NTN1, and STAT6) had an adverse effect on prognosis, which was in line with what was expectated (Fig. 9).

Discussion
In this study, a cohort of 326 CC and 21 normal tissue samples from two datasets (TCGA, GSE6791) were included to identify the differential lncRNAs for patients with CC. A total of 1040 IRGs were collected from the InnateDB. Four IRLs (BZRAP1-AS1, EMX2OS, ZNF667-AS1, and CTC-429P9.1) were identi ed after ssGSEA analysis, the Pearson correlation coe cient analysis, univariate Cox regression analysis, and Kaplan-Meier survival analysis between the lncRNAs and IS. The prognostic risk model based on the four IRLs could divide CC patients into two risk groups according to the median RS, which was validated by dividing the total samples equally. The univariate and multivariate cox regression analyses of clinicopathological characteristics showed that age, AJCC stage, and the four-IRL signature were all independent prognostic factors. A nomogram was constructed based on age, AJCC stage, and the four-IRL signature to predict OS for CC patients more clearly. We also found that RS and IS showed signi cant negative correlation, which indicated that high RS had an adverse effect on prognosis due to a decrease in IS.
In our study, the four IRLs (BZRAP1-AS1, EMX2OS, ZNF667-AS1, and CTC-429P9.1) were identi ed to be protective against CC; only ZNF667-AS1 had been previously reported in CC. In the existing literature, ZNF667-AS1 with low expression has been identi ed to be negatively correlated with the OS, tumor size, and FIGO stage in CC [30]. Additionally, ZNF667-AS1 can competitively bind to miR-93-3p, which targets PEG3, to regulate the progression of CC [31]. Recent research has shown that inhibiting PEG3 would promote the immune escape of cancer cells [32]. BZRAP1-AS1 was found to be a novel biomarker associated with prostate cancer(PC), being down-regulated in PC samples [33]. It was shown, however, to be highly expressed in hepatocellular carcinoma and inhibited the transcription of THBS1 by recruiting DNMT3b to its promoter region [34]. Previous research has revealed that the downregulation of EMX2OS in classical papillary thyroid cancer might independently predict shorter recurrence-free survival [35], while the overexpression of EMX2OS in ovarian cancer and EMX2OS/miR-654/AKT3 axis may target PD-L1(programmed cell death protein 1) to suppress the initiation and progression of cancer [36]. Accumulating evidences have suggested that Thrombospondin-1 (THBS1) may affect tumor immunity [37]. PD1-PDL1(PD1 ligand) has already been shown to be an important immune checkpoint pathway, which can be used by cancer cells to evade immune attacks [38]. Thus, BZRAP1-AS1 and EMX2OS may play a dual role in cancer and directly or indirectly regulate tumor immunology. By contrast, no reports concerning CTC-429P9.1 in cancer have been published, and therefore, the role of CTC-429P9.1 remains unclear.
Our GSEA analysis showed that certain immune-related enrichment statuses were dramatically enriched in low-risk group. Non-canonical WNT signaling has been demonstrated to be closely associated with cancer stem cell survival, bulk-tumor expansion and invasion/metastasis, which have the potential for tumor immunological [39]. Ubiquitination has been shown to be crucial for tumor immunity [40].
Compared to a certain number of studies carried out to explore IRL signatures in several human malignancies [48][49][50][51][52][53][54], this was the rst study based on a comprehensive analysis that focused on IRL signature in CC. However, our study has some limitations. Additional data sets are needed to validate our study. The speci c molecular mechanisms in which these genes may be involved should be veri ed through in vitro and in vivo experiments.

Conclusions
We established a four-IRL-based signature with a prognostic value for CC, which could stratify CC patients into low-and high-risk groups. Meanwhile, the speci c mechanisms of the four-IRLs in the development of CC were preliminarily ascertained. This may provide the basis for tumor prevention and immunotherapy in the future.  Not applicable.

Competing interests
The authors declare that they have no con icts of interest.
Authors' contributions JZ, BC, NZ, and XZ contributed to data analysis, interpretation, and drafting. JZ and JT contributed to study design, study supervision, and nal approval of the manuscript. All authors read and approved the nal manuscript.
51. Wei C, Liang Q, Li X, Li H, Liu Y, Huang X, Chen X, Guo Y, Li J. Bioinformatics pro ling utilized a nine immune-related long noncoding RNA signature as a prognostic target for pancreatic cancer. J Cell Biochem. 2019;120 (9)