Construction of a Novel Prognostic Immune-Related LncRNA Risk Model for Gastric Cancer

Studies have recently shown that immune-related lncRNAs play a vital role in the occurrence and development of human malignancies. However, their role in gastric cancer (GC) remains unclear. Here, we aimed to identify immune-related lncRNAs and construct a risk score model to predict the prognosis of GC patients. Methods: RNA expression data and clinical characteristics of GC were downloaded from The Cancer Genome Atlas (TCGA) database. Immune genes were obtained from the Molecular Signatures Database (MSigDB). Immune-related lncRNAs were identified from the correlation coefficients between the immune genes and lncRNAs using the limma R package and Cytoscape 3.6.1. The risk score model was constructed by univariate and multivariate Cox regression, and its prognostic value was verified in the TCGA cohort. Results: A total of 146 immune-related lncRNAs were identified by comparing 375 GC samples with 32 normal samples. GC patients with a higher risk score had poorer overall survival (OS). Furthermore, ROC analysis revealed that the risk score model had the best predictive performance compared with clinicopathological features at 5-year follow-up (AUC = 0.679). PCA showed that patients in the low- and high-risk groups were clearly separated in different directions based on the risk score model. Conclusion: This study indicated five lncRNA


Introduction
Gastric cancer (GC) is the fifth most commonly diagnosed cancer and the fourth leading cause of cancer-related death worldwide (Sung et al., 2021). In 2020, there were an estimated 1,089,103 new cases of GC and 768,793 GC-related deaths, a substantial burden on global health (Sung et al., 2021). Although combined treatment with surgery, chemotherapy, radiotherapy and immunotherapy has improved survival in GC patients, survival rates remain unsatisfactory (Akshatha et al., 2021). Therefore, it is crucial to explore the molecular mechanisms of GC and to develop new diagnostic and prognostic biomarkers that can improve clinical outcomes.
Long non-coding RNAs (lncRNAs) are defined as non-protein-coding RNA transcripts more than 200 nucleotides in length (Geisler and Coller, 2013). LncRNAs account for a large part of the human genome and were once considered transcriptional noise (Derrien et al., 2012). However, a growing number of studies have reported that lncRNAs play vital roles in gene regulation at the transcriptional, post-transcriptional, and epigenetic levels (Lee, 2012; Ponting et al., 2009). In the present study, we constructed a novel prognostic model through a comprehensive analysis of immune-related lncRNAs, based on data from 375 GC samples and 32 normal samples downloaded from The Cancer Genome Atlas (TCGA). We identified five immune-related lncRNAs associated with the prognosis of GC patients, which might serve as potential prognostic markers and immunotherapeutic targets for GC patients in the future.

Data download
The RNA-seq FPKM (fragments per kilobase of transcript per million mapped reads) data of 375 GC samples and 32 normal samples, together with the corresponding clinical characteristics, were downloaded from TCGA (https://portal.gdc.cancer.gov). Patients with a survival time of less than 30 days were excluded. The list of immune-related genes was downloaded from the Molecular Signatures Database (MSigDB).
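The sample grouping and follow-up filter described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' R workflow: the `ClinicalRecord` fields and the example barcodes are hypothetical, and the sketch assumes tumor and normal samples are distinguished by the two-digit sample-type code in the TCGA barcode (01 to 09 for tumor, 10 to 19 for normal).

```python
from dataclasses import dataclass

MIN_SURVIVAL_DAYS = 30  # patients followed for fewer days are excluded


@dataclass
class ClinicalRecord:
    # Hypothetical fields; real TCGA clinical tables use different column names.
    patient_id: str
    survival_days: float
    event: int  # 1 = death observed, 0 = censored at last follow-up


def sample_group(barcode: str) -> str:
    """Classify a TCGA sample barcode by its two-digit sample-type code.

    Characters 14-15 of the barcode hold the code: 01-09 are tumor
    samples, 10-19 are normal samples, anything else is 'other'.
    """
    code = int(barcode[13:15])
    if 1 <= code <= 9:
        return "tumor"
    if 10 <= code <= 19:
        return "normal"
    return "other"


def exclude_short_survival(records):
    """Keep patients whose survival time is at least MIN_SURVIVAL_DAYS."""
    return [r for r in records if r.survival_days >= MIN_SURVIVAL_DAYS]


if __name__ == "__main__":
    records = [
        ClinicalRecord("P1", 12, 1),   # excluded: fewer than 30 days
        ClinicalRecord("P2", 400, 0),  # kept
        ClinicalRecord("P3", 30, 1),   # kept: exactly 30 days
    ]
    print([r.patient_id for r in exclude_short_survival(records)])  # ['P2', 'P3']
    print(sample_group("TCGA-XX-0000-01A"))  # tumor (hypothetical barcode)
    print(sample_group("TCGA-XX-0000-11A"))  # normal (hypothetical barcode)
```

Patients with exactly 30 days of follow-up are retained, which is how this sketch reads the "less than 30 days" exclusion.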

Identification of immune-related differentially expressed lncRNAs
Differentially expressed lncRNAs between GC samples and normal samples were identified using the limma package in R 4.0 with the criteria |log2 fold change (FC)| ≥ 1 and false discovery rate (FDR) < 0.05. We then calculated the correlation coefficients between the immune-related genes and the lncRNAs and defined immune-related lncRNAs by the criteria correlation filter > 0.6 and p value < 0.05.
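A minimal Python sketch of the two filters is shown below. It assumes that per-lncRNA log2FC values and raw p-values have already been produced (in the study, by limma, whose moderated t-test is not reimplemented here), applies Benjamini-Hochberg adjustment to obtain the FDR, and interprets "correlation filter > 0.6" as |r| > 0.6 on Pearson's r. The correlation p-value uses Fisher's z-transform, a large-sample approximation that will differ slightly from the exact t-based test in R's `cor.test`. All function names are hypothetical.

```python
import math


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (same rule as R's p.adjust(method = "BH"))."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Step down from the largest p-value so adjusted values stay monotone and <= 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def differential_lncrnas(names, log2fc, pvalues, fc_cut=1.0, fdr_cut=0.05):
    """Names passing |log2FC| >= fc_cut and FDR < fdr_cut."""
    fdr = bh_adjust(pvalues)
    return [n for n, lfc, q in zip(names, log2fc, fdr)
            if abs(lfc) >= fc_cut and q < fdr_cut]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_pvalue(r, n):
    """Two-sided p-value for r via Fisher's z-transform (approximate, needs n > 3)."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite when |r| == 1
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def is_immune_related(lnc_expr, gene_expr, r_cut=0.6, p_cut=0.05):
    """True if the lncRNA/immune-gene pair passes |r| > r_cut and p < p_cut."""
    r = pearson_r(lnc_expr, gene_expr)
    return abs(r) > r_cut and correlation_pvalue(r, len(lnc_expr)) < p_cut
```

In practice the correlation filter is applied to every lncRNA and immune-gene pair, and a lncRNA is retained if at least one pair passes; that aggregation rule is an assumption of this sketch.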

Prognostic model construction
Prognosis-related immune-related lncRNAs were identified through univariate and multivariate Cox regression analyses using the survival package in R. A risk score model was then constructed from the expression levels and regression coefficients of these lncRNAs. The risk score of each GC patient was calculated as follows: Risk score = β1 × Exp1 + β2 × Exp2 + … + βn × Expn, where βi is the regression coefficient and Expi the expression level of the i-th prognosis-related immune-related lncRNA, and n is the number of lncRNAs in the model.
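The risk score formula and the median split used later for grouping can be sketched as below. The coefficient values and lncRNA names in the example are hypothetical placeholders, since the fitted Cox coefficients are not reported in this section; assigning patients who fall exactly on the median to the low-risk group is also an assumption.

```python
import statistics


def risk_score(expression, coefficients):
    """Risk score = sum of beta_i * Exp_i over the lncRNAs in the model."""
    return sum(beta * expression[name] for name, beta in coefficients.items())


def split_by_median(scores):
    """Assign 'high' or 'low' risk relative to the median score.

    Ties at the median go to the low-risk group (an assumption).
    """
    cutoff = statistics.median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


if __name__ == "__main__":
    # Hypothetical coefficients and expression values, for illustration only.
    coefs = {"lncA": 0.5, "lncB": -0.3}
    patients = {
        "p1": {"lncA": 2.0, "lncB": 1.0},
        "p2": {"lncA": 1.0, "lncB": 2.0},
        "p3": {"lncA": 4.0, "lncB": 0.0},
        "p4": {"lncA": 0.0, "lncB": 1.0},
    }
    scores = {pid: risk_score(expr, coefs) for pid, expr in patients.items()}
    print(split_by_median(scores))  # {'p1': 'high', 'p2': 'low', 'p3': 'high', 'p4': 'low'}
```

When the number of patients is even and no score coincides with the median, each group receives exactly half of the patients, consistent with the 167/167 split reported in the Results.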

Prognostic and independence analysis
Kaplan-Meier survival curves were used to evaluate the difference in survival between the low-risk and high-risk groups, defined by the median risk score, using the survival package in R. Univariate and multivariate Cox regression analyses were then applied to determine whether the risk score was a prognostic factor independent of clinicopathological features (age, gender, grade, stage, T stage, N stage and M stage). Moreover, receiver operating characteristic (ROC) curves were plotted and the area under the curve (AUC) was calculated with the survivalROC package to evaluate the sensitivity and specificity of the risk score model and the clinicopathological features.
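The Kaplan-Meier (product-limit) estimator behind the survival curves can be sketched in pure Python as follows. It is a simplified illustration rather than a reimplementation of R's `survfit`: it returns only the survival estimate at each death time, without confidence intervals or a between-group test, and follows the usual convention that patients censored at time t are still at risk at t. The example data are hypothetical.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times: follow-up time per patient; events: 1 = death, 0 = censored.
    Returns a list of (time, S(time)) at each distinct death time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # deaths plus censorings at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Patients censored at t still count as at risk at t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


if __name__ == "__main__":
    # Hypothetical follow-up times for one risk group.
    print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```

Running it separately on the high- and low-risk groups yields the two curves compared in Figure 4A; a formal comparison between them (for example, a log-rank test) is not included in this sketch.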

Immune status analysis
Principal component analysis (PCA) was performed with the limma and scatterplot3d packages to discriminate the immune status of GC patients on the basis of the total gene expression profiles, all immune-related lncRNAs, and the lncRNAs in the risk score model.
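PCA on a samples-by-features matrix can be sketched without external libraries as below, using power iteration with deflation to obtain the leading principal axes. This is an illustrative substitute for the R routines named in the text and is suited only to small matrices; for genome-wide expression data an SVD-based function such as R's `prcomp` is the usual choice. The three returned coordinates per sample correspond to the three axes of a scatterplot3d-style plot. Function names are hypothetical.

```python
import math
import random


def center_columns(matrix):
    """Subtract each feature's mean; rows are samples, columns are features."""
    n, p = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in matrix]


def covariance(x):
    """Sample covariance matrix of a column-centred matrix."""
    n, p = len(x), len(x[0])
    return [[sum(x[k][i] * x[k][j] for k in range(n)) / (n - 1)
             for j in range(p)] for i in range(p)]


def principal_axes(cov, k=3, iters=500, seed=0):
    """Leading eigenvectors of `cov` via power iteration with deflation."""
    rng = random.Random(seed)
    p = len(cov)
    a = [row[:] for row in cov]
    axes = []
    for _ in range(min(k, p)):
        v = [rng.random() + 0.1 for _ in range(p)]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(c * c for c in w))
            if norm < 1e-12:  # no variance left to explain
                return axes
            v = [c / norm for c in w]
        eigval = sum(v[i] * sum(a[i][j] * v[j] for j in range(p)) for i in range(p))
        axes.append(v)
        # Deflate: remove the component just found before searching for the next.
        a = [[a[i][j] - eigval * v[i] * v[j] for j in range(p)] for i in range(p)]
    return axes


def pca_scores(matrix, k=3):
    """Project centred samples onto the first k principal axes."""
    x = center_columns(matrix)
    axes = principal_axes(covariance(x), k)
    return [[sum(xi * ai for xi, ai in zip(row, axis)) for axis in axes] for row in x]
```

Because each axis is only defined up to sign, groups are judged by whether they separate along an axis, not by which side they fall on.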

Statistical analysis
All analyses were performed with publicly available packages in R 4.1.0. P < 0.05 was considered statistically significant.

Identification of immune-related lncRNAs
The flow diagram of this study is shown in Figure 1. A total of 14,142 lncRNAs were retrieved from the TCGA data of 375 GC samples and 32 normal samples, and 331 immune genes were obtained from MSigDB. Immune-related lncRNAs were identified by constructing a co-expression network between the lncRNAs and the immune genes using the limma package in R and Cytoscape 3.6.1 (Figure 2). In total, 146 immune-related lncRNAs were obtained in GC using the criteria correlation filter > 0.6 and p value < 0.05.
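The co-expression network step can be sketched as filtering correlation results and exporting the surviving pairs as a Cytoscape SIF edge list (source, relationship, target on each tab-separated line). The function names, the positive/negative relationship labels and the example pairs are hypothetical; the thresholds are the ones stated in the text, with the filter read as |r| > 0.6.

```python
def coexpression_edges(pairs, r_cut=0.6, p_cut=0.05):
    """Keep (lncRNA, gene, r) for pairs passing |r| > r_cut and p < p_cut.

    pairs: iterable of (lncrna, gene, r, p) tuples.
    """
    return [(lnc, gene, r) for lnc, gene, r, p in pairs
            if abs(r) > r_cut and p < p_cut]


def to_sif(edges):
    """Render edges as tab-separated SIF lines: source, relationship, target."""
    return "\n".join(
        f"{lnc}\t{'positive' if r > 0 else 'negative'}\t{gene}"
        for lnc, gene, r in edges)


def immune_related_lncrnas(edges):
    """Distinct lncRNAs with at least one qualifying immune-gene partner."""
    return sorted({lnc for lnc, _, _ in edges})


if __name__ == "__main__":
    # Hypothetical correlation results, for illustration only.
    pairs = [
        ("lnc1", "GENE1", 0.72, 0.001),
        ("lnc1", "GENE2", -0.65, 0.004),
        ("lnc2", "GENE1", 0.30, 0.010),  # fails |r| > 0.6
        ("lnc3", "GENE3", 0.90, 0.200),  # fails p < 0.05
    ]
    edges = coexpression_edges(pairs)
    print(to_sif(edges))
    print(immune_related_lncrnas(edges))  # ['lnc1']
```

Counting the distinct lncRNAs in the edge list gives the number of immune-related lncRNAs, the quantity reported as 146 for this dataset.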

Prognostic analysis of the risk score model
Based on the median risk score, 167 patients were assigned to the low-risk group and 167 to the high-risk group. Survival analysis showed that patients with a higher risk score had poorer overall survival (OS) than those with a lower risk score (Figure 4A). Moreover, the risk score distribution and the survival status of patients ranked by risk score revealed that high-risk patients had relatively poor clinical outcomes (Figure 4B).

Independent prognostic analysis of the risk score model
We further assessed whether the five immune-related lncRNA risk score model had independent prognostic value compared with other clinical risk factors (age, gender, grade, stage, T stage, N stage, and M stage) through univariate and multivariate Cox regression analyses. The results showed that the risk score was an independent prognostic factor for GC patients (P < 0.001) (Figure 5A, Figure 5B). Furthermore, ROC analysis revealed that the AUC of the five immune-related lncRNA risk score model at 5-year follow-up was 0.679, the best predictive performance compared with age, gender, grade, T stage, N stage and M stage (Figure 5C).
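The idea behind a 5-year AUC can be illustrated with a simplified Python sketch. It is not the estimator used by the survivalROC package, which accounts for censoring: here patients censored before the horizon are simply dropped, and the AUC is computed as the Mann-Whitney probability that a patient who died by the horizon has a higher risk score than one still followed beyond it. Survival times are assumed to be in days, so five years is taken as 5 × 365 days; the function name is hypothetical.

```python
def auc_at_horizon(scores, times, events, horizon):
    """Simplified AUC for predicting death by `horizon`.

    Cases died on or before the horizon; controls were still under follow-up
    after it. Patients censored before the horizon are dropped, a
    simplification that censoring-aware estimators avoid.
    AUC is the Mann-Whitney probability that a case outranks a control.
    """
    cases = [s for s, t, e in zip(scores, times, events) if t <= horizon and e == 1]
    controls = [s for s, t in zip(scores, times) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5  # ties count as half
    return wins / (len(cases) * len(controls))


if __name__ == "__main__":
    # Hypothetical risk scores and follow-up (days); the last patient is
    # censored before five years and is therefore excluded.
    print(auc_at_horizon([3.0, 2.0, 1.0, 0.5, 9.0],
                         [100, 3000, 200, 4000, 500],
                         [1, 0, 1, 0, 0],
                         horizon=5 * 365))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of cases from controls.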

The correlation between the five immune-related lncRNAs and clinicopathological characteristics
We evaluated the correlation between the five immune-related lncRNAs and clinicopathological characteristics (T stage, N stage and M stage) of GC patients using the chi-square test. The expression of AP001528.2, PVT1 and LINC01094 was significantly associated with the depth of invasion (Figure 6A), and the expression of LINC02542 was significantly correlated with distant metastasis (Figure 6B). However, none of the five immune-related lncRNAs was significantly associated with lymph node metastasis (Figure 6C).
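A chi-square test of independence for a 2 × 2 table can be sketched as follows. The dichotomization is an assumption, since the text does not state it: rows might be high versus low expression of one lncRNA (for example, split at its median) and columns T1-2 versus T3-4. The sketch computes Pearson's statistic without continuity correction, whereas R's `chisq.test` applies Yates' correction to 2 × 2 tables by default, so p-values from the two can differ. The function name and counts are hypothetical.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table, no continuity correction.

    table: [[a, b], [c, d]] of observed counts. Returns (statistic, p_value), df = 1.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column must contain at least one count")
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With one degree of freedom, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical counts: rows = high/low lncRNA expression, columns = T1-2/T3-4.
    stat, p = chi_square_2x2([[10, 20], [20, 10]])
    print(round(stat, 3), round(p, 4))
```

Tables with more categories (for example, N0 to N3) require the general chi-square distribution with more degrees of freedom, which this df = 1 sketch does not cover.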

The immune state of different risk groups
Principal component analysis (PCA) was performed to map the low- and high-risk groups using the total gene expression profiles, all immune-related lncRNAs, and the five lncRNAs of the risk score model in GC. The low- and high-risk groups could not be distinguished on the basis of total gene expression or all immune-related lncRNAs (Figure 7A and Figure 7B). In contrast, patients in the low- and high-risk groups were clearly distributed in different directions on the basis of the five-lncRNA risk score model (Figure 7C).
In the present study, 146 immune-related lncRNAs were obtained by Pearson correlation analysis between the differentially expressed lncRNAs and the immune genes from MSigDB in GC. Five immune-related lncRNAs (AP001528.2, LINC02542, LINC02526, PVT1 and LINC01094) were then identified as prognosis-related lncRNAs by univariate and multivariate Cox regression analysis. A risk score model constructed from these five lncRNAs was significantly associated with OS in survival analysis and showed a satisfactory predictive value for the 5-year survival of GC patients in ROC analysis. Moreover, the expression of AP001528.2, PVT1 and LINC01094 was significantly correlated with the depth of invasion, and that of LINC02542 with distant metastasis. In addition, the five-lncRNA risk score model clearly distinguished the high- and low-risk groups in PCA, whereas total gene expression and all immune-related lncRNAs did not. Therefore, the present study suggests that these five immune-related lncRNAs are novel biomarkers with satisfactory prognostic value for GC patients, and that they might be new immunotherapeutic targets for GC in the future.

Discussion
Recently, several studies have identified immune-related lncRNAs with a satisfactory capacity to predict the prognosis of human malignancies. For instance, a prediction model based on seven immune-related lncRNAs was constructed in lung adenocarcinoma (LUAD); it showed satisfactory predictive efficiency and could guide personalized treatment for LUAD patients (Li et al., 2020). Zhao K et al. identified a signature of six immune-related lncRNAs in bladder cancer that was closely associated with patient prognosis, suggesting that these six lncRNAs might be immunotherapy targets for bladder cancer (Zhao et al., 2021). A prediction model based on nine immune-related lncRNAs was constructed in colon cancer and was closely related to overall survival, so these lncRNAs could be regarded as potential prognostic biomarkers for colon cancer (Lin et al., 2020). In the present study, we identified for the first time five immune-related lncRNAs (AP001528.2, LINC02542, LINC02526, PVT1 and LINC01094) in GC and constructed a risk score model based on them. GC patients with a higher risk score had poorer overall survival than those with a lower risk score. Therefore, these five immune-related lncRNAs might serve as novel prognostic biomarkers and immunotherapy targets for GC patients.
The lncRNA PVT1 is dysregulated in several human malignancies. For instance, PVT1 expression was significantly up-regulated in pancreatic ductal adenocarcinoma (PDAC) tissues compared with adjacent normal tissues, and patients with higher PVT1 expression had shorter overall survival than those with lower expression (Huang et al., 2015). Similarly, PVT1 expression was higher in colon cancer tissues than in adjacent tissues, and higher PVT1 expression was associated with shorter disease-free survival and overall survival in patients with colon cancer (Fan et al., 2018). In addition, PVT1 expression was dramatically increased in gastric cancer tissues compared with normal controls, and increased PVT1 expression was associated with poor overall survival and disease-free survival in patients with gastric cancer (Yuan et al., 2016). In contrast, we found that high PVT1 expression was associated with a favorable prognosis for GC patients in the present study. The clinical significance of PVT1 in GC therefore requires further verification.
Chen HY et al. reported that LINC01094 expression was markedly increased in ovarian cancer tissues compared with adjacent normal tissues, and that LINC01094 overexpression promoted the viability, migration and invasion of ovarian cancer cells. Consistently, LINC01094 was highly expressed in clear cell renal cell carcinoma compared with adjacent normal tissues, and its overexpression was associated with a poor prognosis in patients with clear cell renal cell carcinoma. In the present study, we found for the first time that overexpression of LINC01094 was associated with a poor prognosis for GC patients.
This study has several limitations. First, all data were obtained from online databases, so further studies are needed to verify our findings. Second, we found for the first time that overexpression of AP001528.2, LINC02542 and LINC02526 was associated with a poor prognosis for GC patients; however, these three immune-related lncRNAs have not yet been reported in other human malignancies, so their biological functions and mechanisms require further exploration. Third, the mechanisms by which these five immune-related lncRNAs contribute to the occurrence and development of GC should be explored in vitro and in vivo in the future.
In conclusion, we constructed a novel risk score model based on five immune-related lncRNAs with satisfactory prognostic value for GC patients. These five immune-related lncRNAs might serve as potential prognostic biomarkers and immunotherapy targets for GC patients in the future.

Figure 2
Co-expression network between immune genes and lncRNAs.

Figure 4
Validation of the risk score model of the five immune-related lncRNAs in the TCGA cohort. A, The high-risk group had poorer overall survival than the low-risk group; B, Overexpression of the five immune-related lncRNAs contributed to high risk scores and poorer OS.

Figure 5
Stratification analysis of clinicopathological features. A, Forest plot of univariate Cox regression showing that age, T stage, N stage and the risk score were associated with overall survival; B, Forest plot of multivariate Cox regression showing that age and the risk score were associated with overall survival; C, ROC analysis showing that the risk score model effectively predicted the prognosis of GC patients.

Figure 6
The relationship between the five immune-related lncRNAs and clinicopathological features. A, Expression of AP001528.2, PVT1 and LINC01094 was associated with the depth of GC invasion; B, Expression of LINC02542 was correlated with distant metastasis; C, Expression of the five immune-related lncRNAs was not related to lymph node metastasis.