Construction and Validation of an Immune-Related lncRNA Prognosis Model for Thyroid Cancer

: Background: Immune-related long noncoding RNAs (lncRNAs) play an important role in the development of cancer. This study aimed to identify immune-related lncRNAs in thyroid cancer (THCA) and to develop a prognostic model for THCA. Method: We downloaded immune-related gene sets from the Gene Set Enrichment Analysis (GSEA) website and obtained THCA gene expression and clinical data from The Cancer Genome Atlas (TCGA) database. Immune-related lncRNAs were then obtained by performing a correlation analysis on the expression of lncRNAs and immune-related genes. Prognostic model for THCA immune-related lncRNAs was developed though univariate Cox regression and multiple Cox regression analyses. We confirmed the results in clinical samples using quantitative real-time PCR. Results: A totally of 26 immune-related lncRNAs in THCA were obtained. Then we constructed a prognosis model composed of seven lncRNAs (LINC01614, AC017074.1, LINC01184, LINC00667, ACVR2B-AS1, AC090673.1 and LINC00900). Our model can be used as an independent prognostic factor. Principal component analysis displayed that the lncRNAs in the model can distinguish between high and low-risk groups. Clinical correlation analysis showed that the expression levels of AC090673.1 (P<0.05), LINC01184 (P<0.001), and LINC01614 (P<0.001) were related to disease stage, and LINC00900 (P<0.001) and LINC01614 (P<0.001) were related to T stage. We validated this model in cancer and paracancerous tissues from 24 THCA patients. Conclusion: We identified and experimentally validated seven immune-related lncRNAs that can serve as potential biomarkers for THCA prognosis.


INTRODUCTION
Thyroid cancer (THCA) is a malignant tumour of the thyroid epithelium. Five-year survival rate for patients with invasive THCA is less than 50%. Most noninvasive patients have a good prognosis [1]. However, some patients with noninvasive THCA are prone to recur and distal metastases may occur [2]. Existing prognostic predictors for THCA do not accurately predict patients' survival and recurrence [3,4]. Therefore, novel and reliable markers for the prognosis of patients with THCA are urgently needed.

Tumour immunotherapy in thyroid cancer
Tumor immunotherapy has developed into an important treatment approach. It is gradually becoming a hot topic in oncology therapeutics and brings hope to cancer patients. In advanced follicular carcinoma and anaplastic THCA, multiple preclinical studies of cancer immunotherapy showed prospect for anticancer applications. 1 Anti PD-L1 treatment enhanced the effect of the BRAF inhibitor on THCA regression [5]. A variety of immune cells are present in the THCA microenvironment. These cells can fight tumors or promote them. In addition, proinflammatory cytokines and chemokines secreted by immune cells can promote or inhibit the proliferation of tumor cells [6]. The fact that cancer can survive in this immune microenvironment illustrates the critical role of immune regulation in THCA. Chronic inflammation is associated with the presence and severity of THCA [7]. The expression of immune-related genes in cancer cells can regulate the abundance of infiltrating immune cells, Affecting tumor diagnosis, prognosis, and sensitivity to clinical treatment. These findings have prompted researchers to explore the expression of immune-related coding and noncoding genes in THCA and provide ideas for the clinical diagnosis, treatment, and prognosis of THCA from an immune perspective.

Immune-related LncRNA in thyroid cancer
The immune-related genes associated with the prognosis of THCA patients have not been fully revealed. Long noncoding RNAs (lncRNAs) are important components of noncoding RNAs and play an important role in pretranscriptional, post-transcriptional, and posttranslational levels as well as in tumor development. Liyanarachi et al. found that the expression of lncRNAs XLOC051122 and XLOC006074 correlated significantly with lymph node metastasis and mutations in BRAF (V600E) in papillary THCA patients [8]. Lei et. al found that lncRNA TUG1 is related to the tumor progression of THCA [9]. Teng et al. showed that lncRNA PACERR and CYP4A22-AS1 are associated with the aggressiveness and disease-free survival of papillary THCA [10]. The clinical relevance and prognostic importance of immune-related lncRNAs need to be further investigated in THCA.
In this study, we provide insights into immune-related lncRNAs associated with THCA prognosis. A predictive model was developed and experimentally validated. The model served as an independent predictor. Our study provides potential immunotherapeutic targets for THCA.

Data collection
Expression profiles and clinical information of mRNA and lncRNA of patients with THCA were obtained from The Cancer Genome Atlas (TCGA: https://cancergenome.nih.gov/). It contains 58 paracancerous tissues and 509 THCA tissues. Patients with survival times of less than 30 days and unknown survival times were excluded.

Immune-related LncRNAs
We searched the Gene Set Enrichment Analysis (GSEA: https://www.gsea-msigdb.org/gsea/msigdb/index.jsp), using immune-related keywords "IMMUNE_RESPONSE" and "IMMUNE_SYSTEM_PROCESS", and then downloaded the gene sets. The expression of immune-related genes was extracted from TCGA and analyzed with Perl. Immunerelated lncRNAs were obtained by analyzing relevance between lncRNAs and immune-related genes by "limma"in R. The filter criteria were as follows: (1) all the genes in the simples had level of>0.5; (2) the levels of lncRNAs in all samples were in fluctuation.

Prognostic model construction
To screen immune-related lncRNAs that are significantly associated with prognosis, we analyzed the expression of lncRNAs in relation to patient survival time and status though univariate and multivariate COX regression analyses. The"survival" package in R was used (P<0.01). We used the function predicted to calculate the risk value for each patient. The calculation formula was: (coefficient of lncRNA1× expression of lncRNA1) + (coefficient of lncRNA2× expression of lncRNA2) + … + (coefficient of lncRNAn× expression of lncRNAn).

Patients and samples collection
A total of 24 patients with THCA admitted to Chongqing University Cancer Hospital from January to June 2017 were randomly selected. The pathological diagnosis of each patientwas assessed by three pathologists simultaneously according to the WHO classification criteria. Patients who received preoperative chemotherapy, radiotherapy, or any other adjuvant treatment were excluded from this study. The ethics has been approved by the Ethics Committee of Chongqing University Cancer Hospital. All participants signed informed consent forms.

RNA extraction and quantitative real-time PCR analysis
Tissues frozen in liquid nitrogen were collected, and RNA was extracted with on miRNeasy FFPE kit (Qiagen). MiRcute enhanced miRNA cDNA first strand synthesis Kit (TIANGEN) was used to reverse transcribe the RNA into complementary DNA. The qRT-PCR analysis was performed using the qPCR miRcute enhanced miRNA fluorescence quantification kit (TIANGEN). The results were normalised to glyceraldehyde-3-phosphate dehydrogenase (GAPDH) expression. We conducted three replicate experiments. The primers were synthesized by Thermofisher (Table S1).

Statistical analysis
All analyses were performed using R 3.4.3. The "survival" and "survminer" packages were used to plot survival curves for the high and low-risk groups. Heat maps were drawn using the "heatmap" package. The "ggpubr" package was used in correlation between lncRNAs and clinical features. Principal component analysis was performed using the "limma" and "scatterplot3d" packages. Univariate and multivariate Cox regression analysis were used to determine whether the constructed model can be used as an independent prognostic factor. SPSS 24.0 statistical software was used in analyzing the experimental data. The relative expression of lncRNA was calculated using the 2 -ΔΔCt method relative to GAPDH. The difference was considered significant when P<0.05.

Acquired the immune-related LncRNAs in THCA
To make the research process clear, a flow chart was drawn to illustrate the research design and analysis methodology ( Figure 1). We obtained THCA patients' sequencing data and clinicopathological information from TCGA, composed of 58 paraneoplastic and 497 cancerous tissues (Table 1). Immune-related gene sets named "Immune Response" and "Immune System Processes" were downloaded from the GSEA website. 331 immune-related genes were obtained by Gene expression analysis. Then, 822 immune-related lncRNAs were obtained by conducting correlation analysis of immune-related genes with lncRNAs (|Cor |>0.4, P<0.001). The process of building the prognostic model.

Prognostic model construction
To obtain immune-related lncRNAs associated with prognosis, we analyzed the relationship between the expression of immune-related lncRNAs and survival status. Univariate Cox regression analysis revealed that 26 immunerelated lncRNAs were significantly associated with patients' survival (P<0.01, Table 2). Multivariate Cox regression analysis was performed on the above 26 immune-related lncRNAs for the screening of seven lncRNAs with high prognostic diagnostic values and establishment of a prognostic model for immune-related lncRNAs in THCA (

Predicted good and poor prognosis patients with THCA
According to their median risk scores, the patients were divided into high-risk group (n=248) and low-risk group (n=249). We ranked the risk score and marked the survival status of the patients in a scatter plot. Mortality was higher in the high-risk group, and mortality increased with risk-score ( Figure 2A). The heat map revealed the relationship between the expression of lncRNAs in the model and the risk score in each patient ( Figure 2B). As the expression risk score increased, five lncRNAs had elevated expression, which were defined as high-risk lncRNAs (i.e., LINC01614, AC017074.1, LINC01184, LINC00667 and ACVR2B-AS1), whileereas AC090673.1 and LINC00900 had decreased expression, which were defined as low-risk lncRNAs. In addition, the results of the survival analysis showed a significant difference between the groups (P=1.41e-05, Figure 3).  Overall survival analysis of THCA patients in high-risk group and low-risk group.

Discrimination between high-and low-risk groups in different genomes
To verify the importance of lncRNAs as prognostic markers for THAC patients in our model, we conductedprincipal component analysis to determine whether lncRNAs can distinguish between high-and low-risk groups. Significant differences in the expression profiles of immune-related lncRNAs were found between the groups (Figure 4). The results indicated that the high-and low-risk groups can be discriminated by the lncRNAs in our model.

Evaluation of the prognostic models
We performed univariate and multivariate independent prognostic analyses to validate the accuracy and specificity of the prediction model and assess whether the model can be used as an independent prognostic factor. Univariate independent prognostic analysis revealed that the following factors were statistically significant, namely, age (P<0.001), stage (P=0.003), T stage (P=0.030) and the prognostic model (P<0.001) ( Figure 5A). Multivariate independent prognostic analysis displayed that age (P=0.002) and the model (P=0.025) could predict the overall survival (OS) for patients with THCA ( Figure 5B). This indicated that the model we constructed could predict the prognosis of THCA patients.

Analysis of the correlation between lncRNAs and clinical features
We performed correlation analysis to investigate the correlation between the expression of lncRNAs in the model and clinical characteristics,. The results suggested that the levels of AC090673.1 (P<0.05)，LINC01184 (P<0.001) and LINC01614 (P<0.001) were related to clinical stage. As the disease stage increased, the expression of AC090673.1 and LINC01614 fluctuated, and overexpression was observed in the late stage. By contrast, the expression of LINC01184 was relatively high and showed a trend of increasing before decreasing ( Figure 6A). The levels of LINC00900 (P<0.001) and LINC01614 (P<0.001) were associated with the T stage. As T stage increased, the expression of LINC00900 increased slightly and then declined, whereas the expression of LINC01614 decreased and then increased ( Figure 6B). These lncRNA prognostic markers accurately predicted the clinicopathological characteristics of patients.

Detected lncRNA expression levels in patients with THCA
To further evaluate the reliability of the constructed model, we examined the expression levels of seven lncRNAs in cancer and paracancer tissues of 24 patients with THCA (Table 4) through qRT-PCR (Table S2). The results showed that LINC01614, ACVR2B-AS1, AC017074.1, LINC01184, and LINC00667 were upregulated in THCA, whereas LINC00900 and AC090673.1 were downregulated ( Figure  7). The experimental results were consistent with the bioinformatic analysis. qRT-PCR validation of immune-related lncRNAs expression in paracancerous and cancer tissues of THCA patients. The experiment was repeated three times independently.

DISCUSSION
lncRNAs play critical roles in different stages of cancer immunity, including antigen release, antigen presentation, immune activation, immune cell migration, infiltration of cancer and killing of cancer cells. lncRNAs may be key regulators in reshaping the tumor immune microenvironment [11,12]. We analyzed the correlation between lncRNAs and immune-related genes in THCA and developed an immunerelated lncRNA prediction model composed of seven lncRNAs (LINC01614, AC017074.1, LINC01184, LINC00667, ACVR2B-AS1, AC090673.1 and LINC00900). The high-risk lncRNA LINC01614 in this model has been studied in breast cancer (BC). Vishnubalaji et al. found increased expression of LINC01614 in primary BC tissues and BC cell lines and concluded that it is an unfavorable prognostic marker for BC [13]. Wang et al. confirmed this result [14]. Sun and Ling found that high expression of LINC01614 indicated low OS rate in patients with non-small cell lung cancer (NSCLC) [15]. In our model, another highrisk lncRNA INC00667 was found to be a tumor promoter in NSCLC. Involved in the proliferation, migration, and angiogenesis of NSCLC cells, it is significantly associated with the OS of patients with NSCLC [16]. INC00667 correlated with the prognosis of BC and ovarian and hepatocellular carcinomas [17][18][19][20]. Zhou et al. found that LINC01184 is overexpressed in tumor-infiltrating B lymphocytes and involved in the immunosuppression of cancer [21]. ACVR2B-AS1 was also one of the high-risk lncRNAs in our model. The high expression of ACVR2B-AS1 is an independent poor prognostic factor for OS and relapse-free survival in patients with liver cancer [22].
The prognostic role of lncRNAs in THCA has been investigated from different perspectives, such as methylation and competing endogenous RNA. Li et al. analyzed the methylation and transcriptome of THCA, identified 483 epigenetically regulated lncRNAs associated with the development of THCA, and found that the methylationdriven 5-lncRNA-based signature is an independent prognostic factor [23]. Li et al. identified Lnc5m4 as an independent prognostic factor in a competing endogenous RNA network consisting of lncRNA, miRNA, and mRNA [24]. In this study, we constructed an immune-related prognostic model of THCA, which can be used as an independent prognostic factor.
In addition, we analyzed the correlation between the clinical characteristics of patients with THCA and the seven lncRNAs in our model. The results showed that AC090673.1, LINC01184, and LINC01614 were associated with clinical stage. LINC00900 and LINC01614 were associated with the T stage. LINC01614 is associated with clinical stage in patients with glioma and NSCLC [15,25]. LINC01184 is associated with the several clinicopathological profiles of rectal cancer, including clinical stage, tumor size, lymph nodes, and distal metastasis [26]. Regarding LINC00900, Wang et al. showed that the expression of LINC00900 increased with tumor grade in patients with primary glioblastoma [27]. LINC00900 is not differentially expressed in the neurogliomas of different clinical stages [28]. The relationship between these lncRNAs and the clinical characteristics of THCA patients needs to be further investigated.
The present study has some limitations. The model needs sample validation before it can be used in clinical settings. Recently, an immune-related lncRNA prognosis model based on the Gene Expression omnibus database was constructed [29]. The immune-associated gene sets in our study were derived from the GSEA database, and the prognostic model constructed was validated in multiple clinical samples, which made our conclusions more convincing. The prognostic model constructed in that study crossed over with our model but was not fully consistent. The posible reasons are the different sources of data and different sets of immune-related genes selected.

CONCLUSION
In summary, we constructed and experimentally validated a prognostic model for immune-related lncRNAs in THCA. The lncRNAs were significantly correlated with the clinical stage and T stage of THCA. Our results provide new targets for the prognosis and treatment of THCA.  Figure 1 The process of building the prognostic model.  Overall survival analysis of THCA patients in high-risk group and low-risk group.    qRT-PCR validation of immune-related lncRNAs expression in paracancerous and cancer tissues of THCA patients. The experiment was repeated three times independently.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.