Immune-Related lncRNAs to Construct Novel Signature for Predicting Prognosis in Gastric Cancer

Background: Long non coding RNAs (lncRNAs) have many functions, including immune response. The signal irlncRNAs with no requirement of specic expression level seems to be valuable in predicting the prognosis of patients with gastric cancer (GC). Results: Our results suggested that immune related lncRNA signaling is of great value in predicting prognosis, and it may be possible to measure the response to immunotherapy. This feature may guide the choice of immunotherapy for GC. Conclusion: Immune-related lncRNA signals show independent prognostic signicance in GC. These results of this research could predict the prognosis of GC patients without detecting specic expression level of lncRNA, providing a possible method for predicting the survival of GC patients, and providing a potential lncRNA target for immunotherapy. analysis. (B-E) High risk scores were uncorrelated with (B) CTLA4, (C) PDCD1, (D) LAG3, and (E) HAVCR2 levels, whereas these results showed no statistical difference in patients with GC. (F-J) The model acted as a potential predictor for chemosensitivity as high-risk scores was related to a lower IC50 for chemotherapeutics such as cisplatin and docetaxel, whereas they were related to a higher IC50 for paclitaxel, mitomycin and doxorubicin. whereas the results of docetaxel and doxorubicin showed no statistical difference in patients with GC.

processes, including cell growth and development, and promote the proliferation of cancer cells (18,19).
With the emergence of new sequencing technology, abundant research has found that lncRNAs play a novel role in tumor biology (20). It is obvious that lncRNA, as a new prognostic and diagnostic biomarker, has great clinical application prospects. Given that the number of non-coding RNAs far exceeds proteincoding genes and show a high degree of tissue and cancer type speci city, characterizing new lncRNA targets may revolutionize cancer treatment. Meanwhile,recent evidence has suggested that lncRNAs contribute to the malignant phenotypes of cancer not only through genomic or transcriptomic alterations but also by altering the immune microenvironment.
By way of contrast, the accuracy of cancer prognosis model based on the combination of two biomarkers is superior than that of single gene marker (21). There are few prognostic models have explored the signi cance of lncRNA before. Therefore, in this research, we used a fresh modeling algorithm to construct immune-related lncRNAs (irlncRNAs) by pairing and iteration, which did not require any speci c expression levels. Then we evaluated the predictive value of this model in the prognosis, chemotherapy e cacy and tumor immune in ltration of GC patients.

Identi cation of Differentially Expressed irlncRNAs (DEirlncRNAs)
The process ow of this study is shown in Fig. 1. We retrieved the transcriptome pro ling data of GC from the stomach adenocarcinoma (STAD) project of the The Cancer Genome Atlas (TCGA) database; 32 normal and 375 tumor samples were included. Next, we annotated the data by gene transfer format (GTF) les from Ensembel. First, PCA principal component analysis indicated that gene expression of GC tissue was not identical with that of normal tissue ( Fig. 2A). Then, a co-expression analysis was performed between known immune-related genes and lncRNAs. A total of 4361 irlncRNAs were identi ed (shown in Table S1). We set parameters as |log FC|>2.0 and p < 0.005, and distinguished 161 differentially expressed immune-related lncRNAs were as DEirlncRNAs (Fig. 2B), among which 102 were upregulated and 59 were downregulated (Fig. 2C).

Establishment Of Deirlncrna Pairs And A Risk Assessment Model
Using an iteration loop and a 0-or-1 matrix screening among 161 DEirlncRNAs, 9,326 valid DEirlncRNA pairs were identi ed. We performed a univariate Cox proportional hazard regression analysis (P < 0.05) to identify survival-related DEirlncRNA pairs. A total of 81 DEirlncRNA pairs were analyzed further. Then the least absolute shrinkage and selection operator (LASSO) model ( Fig. 3A and 3B) was used to nd vital DEirlncRNA pairs from these 81 lncRNA pairs for further construction of the model. Finally, 31 pairs of vital DEirlncRNA pairs were selected, preferable 10 of which were included in a Cox proportional hazards model by the stepwise method (Fig. 3C). Next, we calculated the areas under curve (AUCs) for receiver operating characteristic (ROC) curve of this model, drew the curved line, and found the AUC value of 1year, 2-year and 3-year was 0.829, 0.822 and 0.818, respectively (Fig. 3D).

Clinical Evaluation By Risk Assessment Model
We calculated the risk score for each sample was based on the expression level of the 10 DEirlncRNA pairs. According to the median risk score, GC samples were divided into high-risk group and low-risk group. 187 cases were classi ed into the high-risk group and 188 into the low-risk group. Risk scores of each case are shown in Fig. 4A and 4B. These gures suggested that the clinical outcome of patients in the low-risk group was superior to that of patients in the high-risk group. Kaplan-Meier analysis showed that patients in the low-risk group exhibited a longer survival time than those in the high-risk group (p < 0.0001) (Fig. 4C). Univariate and multivariate Cox regression analyses were used to explore whether the risk assessment model was a prognostic factor for gastric cancer independent of clinicopathological factors, such as age, gender, and pathological stage. The hazard ratio (HR) of risk score and 95% CI were 1.378 and 1.264-1.503 in univariate Cox regression analysis (p < 0.001), and 1.357 and 1.234-1.492 in multivariate Cox regression analysis (p < 0.001), respectively, suggesting that the risk assessment model was a prognostic factor in patients with GC ( Fig. 4D and 4E). Then, we performed a series of chi-square tests to investigate the relationship between the risk of GC and clinicopathological characteristics. The strip chart (  Table S2.

Estimation of Tumor-In ltrating Immune Cells and Immunosuppressive Molecules with Risk Assessment Model
Because lncRNAs and immune-related genes were initially connected, we consequently investigated whether this model was related to the tumor immune microenvironment. We revealed that the high-risk group was more positively associated with tumor-in ltrating immune cells such as memory cancer associated broblasts, endothelial cells, macrophages, monocytes, memory CD4 + T cells and activated myeloid dendritic cells, whereas they were negatively associated with plasmacytoid dendritic cells, follicular helper T cells and CD4 + T cells, as revealed by the Wilcoxon signed-rank test. A detailed Spearman correlation analysis was conducted, and the resulting diagram exhibited a lollipop shape, as shown in Fig. 6A. Because ICIs are administered for treating GC in clinical practice, we investigated whether this risk model was related to ICI-related biomarkers. Nevertheless, these results showed no statistical differences (Fig. 6B-6E)

Analysis of the Correlation between the Risk Model and Chemotherapeutics
Besides checkpoint blockades therapy, we attempted to identify associations between risk and the e cacy of common chemotherapeutics in treating GC in the TCGA project of the STAD dataset. We showed that a high-risk score was associated with a lower half inhibitory centration (IC 50 ) of chemotherapeutics such as cisplatin (p = 0.023) and docetaxel (p = 0.44), whereas it was associated with a higher IC 50 for paclitaxel (p = 0.0096), mitomycin (p = 0.014) and doxorubicin (p = 0.089). Though the results of docetaxel and doxorubicin showed no statistical difference in patients with GC, above results indicated that the model acted as a potential predictor for chemosensitivity (Fig. 6F). Enrichment Analysis we further analyzed differentially expressed genes (DEGs) between the low-risk and high-risk groups in the cohort from TCGA. A total of 90 DEGs (73 upregulated and 17 downregulated genes, p value < 0.05) were identi ed in the high-risk group compared with the low-risk group (Figs. 7A). Next, we conducted GESA and KEGG enrichment analysis to further clarify biological processes related to the risk score. As illustrated in Figs. 7B, enrichment analysis indicated that KEGG was mainly enriched in calcium signaling pathway, gastric acid secretion, neuroactive ligand-receptor interaction, and pancreatic secretion. In the GSEA enrichment analysis, B cell differentiation pathway, B cell activation pathway, cellular response to TGF-β stimulus pathway, and regulation of leukocyte migration pathway were signi cantly enriched in the high-risk group (Figs. 7C-F).

Discussion
GC is one of the most common malignant tumors with a high mortality rate in the world, with highly heterogeneous biological characteristics (2,7). The high heterogeneity of GC not only exists in the genotype and phenotype of tumor cells, but also exists in the tumor microenvironment (22). GC tissue is not only composed of GC cells, but also mixed with various normal cells, such as immune cells, stromal cells and broblasts (23). These different types of cells interact and co-evolve, eventually forming a complex whole. In recent years, in-depth sequencing studies of the transcriptome have found that about 4/5 of the transcripts in the human genome are protein non-coding genes, including lncRNAs (24). They participate in the occurrence, development, invasion and metastasis of GC through a variety of ways (25)(26)(27)(28). LncRNA is also closely related to tumor immunity. Hu et al. reported that long non coding RNA-LINK-A speci cally expressed in human tissue induced metastatic breast cancer in mice by reducing phosphorylation of E3 ubiquitin ligase TRIM71 mediated by protein kinase A(29);Li's research results suggest that tumor derived lncRNA TUC339 was involved in the regulation of macrophage activation and played a key role in the regulation of macrophage M1/M2 polarization (30). Zhao et al. found for the rst time that SNHG14/miR-5590-3p/ZEB1 positive feedback loop promotes the progression and immune evasion of diffuse large B cell lymphoma (DLBCL) by regulating PD-1/PD-L1 checkpoint, suggesting that targeting SNHG14 would be a potential way to improve the effect of DLBCL immunotherapy (31).
The continuous research on lncRNAs and immune system makes researchers realize that immune related lncRNAs can not only be used as potential prognostic biomarkers, but also as latent therapeutic targets.
Based on immune related lncRNAs and tumor immune in ltration, the signature shows good predictive and prognostic value in tumor diagnosis, evaluation and treatment. Cao et al. selected ve prognostic lncRNAs and constructed the immune related lncRNAs signature by Lasso Cox regression analysis, then con rmed that it was a reliable independent prognostic factor and signi cantly positively correlated with the in ltration of immune cells in tumor microenvironment and the expression of key immune checkpoints (32). Song et al. constructed a signature based on eight lncRNAs, identi ed an immunerelated prognostic signature based on lncRNAs and found 4 key immune-related genes (LIG1, TBX1, CTSG and CXCL12) in bladder urothelial carcinoma (33). Ma constructed and veri ed a robust signature of 8 immune-related lncRNAs for the prediction of breast cancer patient survival (34). In this study, we established a model based on immune related lncRNA by univariate Cox regression and Lasso regression analysis, and used the model to verify the clinical characteristics, chemotherapy drugs and immunotherapy. The results indicated that the model showed prefer predictive performance and the signature was robust and reliable, which can effectively divide gastric cancer patients into high-risk group and low-risk group.
Speci cally, we retrieved the original data of lncRNAs from TCGA, performed differential co-expression analysis to classify DEirlncRNAs, and used the improved cyclic single pairing method and 0 or 1 matrix to verify lncRNA pairs. Then we performed univariate analysis combined with Lasso regression to determine 10 vital DEirlncRNAs pairs and established a novel assessment model. We calculated the AUC value of 1year, 2-year and 3-year of this model's ROC curve. We scored the risk of the model and divided data into a low-risk group and a high-risk group based on the median score. The prognostic prediction e cacy of risk score was validated from several aspects. Firstly, Kaplan-Meier analysis was performed, which indicated that patients in the low-risk group exhibited a longer survival time than those in the high-risk group. Secondly, in order to explore the feasibility of prognostic markers in clinical application, we analyzed the age, gender, pathological stage and other clinical indicators of GC patients, evaluated the association between risk score and clinical characteristics. Although it was an independent prognostic indicator irrespective of other clinical symptoms, the patients divided by risk score showed signi cantly different characters. The model was then used to analyze the e cacy of chemotherapy of GC, tumor immune in ltration, and biomarkers related to checkpoint inhibitors, which means that the modeling algorithm was working well.
Lastly, the enrichment analysis of GSEA and KEGG pathways revealed several signi cantly enriched pathway signals. Patients in the high-risk group mainly focused on B cell differentiation pathway, B cell activation pathway, cellular response to transforming growth factor beta stimulus pathway, and regulation of leukocyte migration. Literature has shown that these pathways are closely related to the immune process. But more evidence is needed to support this hypothesis. In addition, the research results also revealed the underlying molecular mechanism, providing a promising direction for immunotherapy. However, we recognized some shortcomings and limitations of this research. For example, the original data set used for the initial analysis only comes from the TCGA database, and objectivity was relatively insu cient. We were unable to simultaneously search data sets of other databases that support lncRNA expression levels, clinicopathological characteristics and survival results of patients with GC. This model at the same time requires external clinical veri cation, because the expression level of each sample is different, which may cause the nal model to be unreliable. But we creatively constructed a 0-or-1 matrix to screen all lncRNA pairs to reduce sample errors caused by expression changes. In addition, we also used single factor and multi-factor analysis, lasso regression analysis, roc curve and other methods to verify the new modeling algorithm, optimized and applied it. Based on these results, we assumed that our model was acceptable, despite the lack of external data veri cation. However, external veri cation of clinical data sets would be bene cial. Therefore, in future work, we will re-collect clinical samples and expand the sample size for further veri cation, and the evaluation of the sample size will be very timeconsuming.

Conclusion
In conclusion, immune-related lncRNA signals show independent prognostic signi cance in GC. These results of this research could predict the prognosis of GC patients without detecting speci c expression level of lncRNA, providing a possible method for predicting the survival of GC patients, and providing a potential lncRNA target for immunotherapy.

Retrieval of Transcriptome Data, Preparation, and Differentially Expressed Analysis
The data of RNA expression pro les and clinical information for GC were downloaded from The Cancer Genome Atlas (TCGA) database (https://portal.gdc.cancer.gov/repository), including 375 GC tissues and 32 non-tumor tissues. GTF les were downloaded from Ensembel (http://asia.ensembl.org) for annotation to distinguish the mRNAs and lncRNAs for subsequent analysis. A list of recognized immunerelated genes (ir-genes) was downloaded from the ImmPort database (http://www.immport.org) and was used to screen irlncRNAs by a co-expression strategy. Correlation analysis was performed between irgenes and all lncRNAs. Those with immune gene correlation coe cients more than 0.4 and p value less than 0.001 were considered as irlncRNAs. To identify the DEirlncRNA, we used R package Dseq2 for differential expression analysis among irlncRNAs. The thresholds were set as log fold change |FC| >2.0 along with a p value <0.05.

Pairing DEirlncRNAs
The DEirlncRNAs were cyclically singly paired, and a 0-or-1 matrix was constructed assuming C is equal to lncRNA A plus lncRNA B; C is de ned as 1 if the expression level of lncRNA A is higher than lncRNA B, otherwise C is de ned as 0. Then, the constructed 0-or-1 matrix was further screened. No relationship was considered between pairs and prognosis if the expression quantity of lncRNA pairs was 0 or 1 because pairs without a certain rank could not properly predict patient survival outcome. When the amounts of lncRNA-pairs of which expression quantity was 0 or 1 accounted for more than 20% of total pairs, it was considered a valid match.

Establishment of a Risk Model to Evaluate the risk score
Univariate Cox analysis was performed to assess the association between the expression levels of DEirlncRNA pairs and the overall survival (OS) of patients with a p value < 0.05. 81 pairs of DEirlncRNA pairs related to prognosis were selected. Then the least absolute shrinkage and selection operator (LASSO) model was used to nd vital DEirlncRNA pairs from these 81 lncRNA pairs by utilizing the package glmnet in the R software as well as for construction of the model. Finally, 31 pairs of vital DEirlncRNA pairs were selected. The random forest plot was performed using the R package survminer. Preferable 10 of the 31 pairs were chosen and calculated the risk score by the formula: risk score=∑ki=1βiSi. Then the risk score for each sample was calculated based on the expression levels of the 10 DEirlncRNA pairs. According to the median risk score, GC samples were divided into high-risk group and low-risk group. 187 cases were classi ed into the high-risk group and 188 into the low-risk group. The AUC value of the model was calculated and drawn as a curve. The 1-, 2-, and 3-year ROC curves of the model were plotted.

Validation of the Constructed Risk Model
The Kaplan-Meier analysis was performed to single out the survival difference signi cantly associated with the OS from DElncRNAs pairs, which were selected by the LASSO method. The R packages utilized in these steps included survival, glmnet, pbapply, survivalROC, survminer, and pHeatmap.
To verify the clinical application value of the constructed model, we performed the chi-square test to analyze the relationship between the model and clinicopathological characteristics. Wilcoxon signed-rank test was used to calculate the risk score differences among different groups of these clinicopathological characteristics. To con rm whether the model can be used as an independent clinical prognostic predictor, we performed univariate Cox regression analyses between the risk score and clinicopathological characteristics. A forest map was used to demonstrate the results. The R packages utilized in these operations were survival, pHeatmap, and ggupbr.

Investigation of Tumor-In ltrating Immune Cells
To analyze the relationship between the risk and immune-cell characteristics, we considered the currently acknowledged methods to calculate the immune in ltration statues among the samples from the TCGA project of the STAD dataset including XCELL, TIMER, QUANTISEQ, MCPCOUNTER, EPIC, CIBERSORT-ABS and CIBERSORT. The differences in immune in ltrating cell content explored by these methods between high-risk and low-risk groups of the constructed model were analyzed by Wilcoxon signed-rank test; the results are shown in a box chart. Spearman correlation analysis was performed to analyze the relationship between the risk score values and the immune in ltrated cells. The correlation coe cients of the results were shown in a lollipop diagram. The signi cance threshold was set as p <0.05. The procedure was performed using R ggplot2 packages.

Exploration of the Signi cance of the Model in the Clinical Treatment
To evaluate the model in the clinic for gastric cancer treatment, we calculated the IC 50 of common administrating chemotherapeutic drugs in the TCGA project of the STAD dataset. Antitumor drugs such as cisplatin, docetaxel, paclitaxel, mitomycin and doxorubicin are recommended for GC treatment by AJCC guidelines. The difference in the IC 50 between the high-risk and low-risk groups was compared by Wilcoxon signed-rank test and the results are shown as box drawings obtained using with pRRophetic and ggplot2 of R.

Analyses of the Immunosuppressive Molecules Expressing Related to ICIs
To study the relationship between the model and the expression level of genes related to ICIs, we performed ggstatsplot package and violin plot visualization.

Functional Enrichment Analysis
GESA and KEGG pathway enrichment analyses were performed in R using the function of clusterPro ler. The signi cance threshold was set at p < 0.05. Abbreviations GC, Gastric Cancer; ICIs, immune checkpoint inhibitors; lncRNAs, long non-coding RNAs; irlncRNAs, immune-related lncRNAs; DEirlncRNAs, Differentially Expressed irlncRNAs; TCGA, The Cancer Genome Atlas; GTF, gene transfer format; LASSO, least absolute shrinkage and selection operator; AUC, areas under curve; ROC, receiver operating characteristic; HR, hazard ratio; DEGs, differentially expressed genes.

Declarations
Ethics approval and consent to participate Not necessary.

Consent for publication
Not applicable.

Availability of data and materials
The raw data of this study are derived from the TCGA database (https://portal.gdc.cancer.gov/), which are publicly available databases.

Competing interests
The authors declare that they have no competing interests.

Funding
There was no funding for this work.

Authors' contributions
This research was conducted in collaboration with all authors. TB performed the data curation and analysis. TB and ZW analyzed and interpreted the results. ZW and JX drafted and reviewed the manuscript. All authors read and approved the nal manuscript.     Estimation of Tumor-In ltrating Cells and Immunosuppressed Molecules by the Risk Assessment Model.
(A) Patients in the high-risk group were more positively associated with tumor-in ltrating immune cells such as memory cancer associated broblasts, endothelial cells, macrophages, monocytes, memory CD4+ T cells and activated myeloid dendritic cells, whereas they were negatively associated with plasmacytoid dendritic cells, follicular helper T cells and CD4+ T cells, as shown by Spearman correlation analysis. (B-E) High risk scores were uncorrelated with (B) CTLA4, (C) PDCD1, (D) LAG3, and (E) HAVCR2 levels, whereas these results showed no statistical difference in patients with GC. (F-J) The model acted as a potential predictor for chemosensitivity as high-risk scores was related to a lower IC50 for chemotherapeutics such as cisplatin and docetaxel, whereas they were related to a higher IC50 for paclitaxel, mitomycin and doxorubicin. whereas the results of docetaxel and doxorubicin showed no statistical difference in patients with GC. Figure 7 GESA and KEGG enrichment analysis with differentially expressed genes (DEGs) between the low-risk and high-risk groups. (A)Volcano plot showing differentially expressed genes between the low-risk and highrisk groups in the cohort from TCGA. Genes labeled in red or green are signi cantly differentially up or downregulated, respectively. (B)KEGG Enrichment analysis indicting the biological process risk score was mainly involved in calcium signaling pathway. (C-F) GSEA between low-risk and high-risk groups revealing B cell differentiation pathway, B cell activation pathway, cellular response to transforming growth factor beta stimulus pathway, and regulation of leukocyte migration.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.