The Comprehensive Analysis of Competitive Endogenous RNA Networks and Tumor-Infiltrating Immune Cells in Gastric Cancer With Lymph Node Metastasis

Background: Gastric cancer is a highly heterogeneous tumor. Long non-coding RNAs (lncRNAs) acting as competing endogenous RNAs (ceRNAs) play significant roles in tumor development. Methods: We divided all TCGA gastric cancer patients into whole, intestinal and diffuse cohorts, constructed a competing endogenous RNA network for each cohort, and evaluated immune cell fractions using CIBERSORTx. Support vector machine recursive feature elimination (SVM-RFE) was used to screen significant signatures, and a support vector machine (SVM) was used to build a model predicting lymph node metastasis. Results: The SVM model performed well in the intestinal and diffuse cohorts, whereas the model in the whole cohort performed relatively poorly. Co-expression analysis between immune cells and the ceRNA network revealed several notable patterns, including a significant correlation between CD70 and dendritic cells. Conclusion: Our research inferred a competing endogenous RNA network of lymph node metastasis and built an excellent prediction model.


Introduction:
Gastric cancer (GC) is the sixth most common cancer and the second leading cause of cancer-related deaths worldwide [1]. Worldwide mortality rates for GC have declined over the past 10 years; however, the survival rate remains low [2]. Many clinical, molecular, and pathologic data suggest that GC is a heterogeneous disease [3]. It is therefore urgent to investigate the underlying mechanisms by analyzing different groups of patients. Lymph node metastasis is the most common metastasis pattern of GC. According to the Lauren classification, intestinal-type GC is associated with lymphatic or vascular invasion, with lesions scattered in distant positions, whereas diffuse-type GC is characterized by non-cohesive, scattered tumor cells. The Lauren classification is therefore a suitable classification method for our research. The ceRNA regulatory mechanism is involved in many carcinomas [5][6][7]. Not only tumor cells but also tumor-infiltrating immune cells participate in lymph node metastasis [8].
In this study, we constructed a ceRNA network for each of the whole, intestinal and diffuse cohorts and evaluated the immune cell fractions using the CIBERSORTx algorithm [9]. The support vector machine recursive feature elimination (SVM-RFE) algorithm was applied to select RNAs and tumor-infiltrating immune cells associated with lymph node metastasis. Moreover, we built a prediction model for lymph node metastasis using a support vector machine (SVM). The flowchart of this research is shown in Figure 1.

Results:
Construction of the ceRNA network and screening of the optimal RNAs: ceRNAs are thought to act as sponges for miRNAs and thereby shield mRNAs from miRNA binding. To better understand the role of differentially expressed lncRNAs and mRNAs in GC, we constructed a ceRNA regulatory network to elucidate the interactions that differ between patients with and without lymph node metastasis. In the whole patient cohort, we identified 65 lncRNA-miRNA interaction pairs and 65 miRNA-mRNA interaction pairs (Tables S1 and S2) and constructed a ceRNA network with 22 lncRNAs, 43 miRNAs and 22 mRNAs (Figure S1). In the intestinal-type patient cohort, we identified 233 lncRNA-miRNA interaction pairs and 305 miRNA-mRNA interaction pairs (Tables S3 and S4) and constructed a ceRNA network with 39 lncRNAs, 157 miRNAs and 66 mRNAs (Figure S2). In the diffuse-type patient cohort, we identified 3 lncRNA-miRNA interaction pairs and 3 miRNA-mRNA interaction pairs (Tables S5 and S6) and constructed a ceRNA network with 3 lncRNAs, 3 miRNAs and 2 mRNAs (Figure S3). Applying SVM-RFE, we screened out one lncRNA and 3 mRNAs in the whole cohort, 6 lncRNAs and 17 mRNAs in the intestinal cohort, and 3 lncRNAs and one mRNA in the diffuse-type cohort.
Functional enrichment analysis: To further clarify the potential biological functions of the mRNAs in gastric cancer, the KOBAS 3.0 online database was used to perform functional enrichment analysis. Gene ontology analysis showed 3 significantly enriched GO terms in the whole cohort (Figure 2A), 20 GO terms in the intestinal cohort (Figure 2B) and 18 GO terms in the diffuse cohort (Figure 2C). The number of KEGG pathways enriched by the mRNAs of the ceRNA network was relatively small. Only biosynthesis of amino acids was enriched in the whole cohort. Neuroactive ligand-receptor interaction and nicotine addiction were significantly enriched in the intestinal cohort, and pyrimidine metabolism, thyroid hormone synthesis and purine metabolism were significantly associated with the diffuse cohort.
Composition of immune cells in GC and screening of the optimal immune cell types: Immune cell fractions estimated by the CIBERSORT algorithm are displayed in Tables S7, S8 and S9. The violin plots depict the results of the Wilcoxon rank-sum test (Figure 3). Resting mast cells differed significantly between patients with and without lymph node metastasis in the whole cohort (p=0.012) (Figure 3A) and the intestinal-type cohort (p=0.05) (Figure 3B), and plasma cells (p=0.049) and CD8 T cells (p=0.035) differed significantly in the diffuse-type cohort (Figure 3C).
In addition, we applied SVM-RFE and obtained 5 immune cell types related to lymph node metastasis in the whole cohort: NK cells activated, macrophages M0, macrophages M2, mast cells resting and neutrophils. In the intestinal-type cohort, we obtained NK cells resting, dendritic cells resting and mast cells resting. Finally, we identified 4 immune cell types in the diffuse-type cohort: B cells memory, plasma cells, T cells CD8 and macrophages M0.
The co-expression analysis: We performed co-expression analysis of the immune cells and RNAs significantly associated with lymph node metastasis. In the whole cohort and the diffuse cohort, there was no evident co-expression between RNAs and immune cells (Figure 4A and 4C). In the intestinal cohort, FGF13-AS1 and KCNJ2 were positively associated with NK cells resting, and CD70 was positively associated with dendritic cells resting (Figure 4B).
Construction of SVM classifier: The lncRNA, mRNA and immune cell datasets associated with lymph node metastasis were merged into a single dataset to improve the performance and reliability of the model. The three categories of signatures were used to build an SVM classification model, where the datasets were split into 10 folds and divided into training and testing datasets at a 7:3 ratio. We assessed prediction performance with accuracy, sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV); the detailed results are in Table 2. The models showed considerably good prediction performance on all 5 metrics. Compared with the whole cohort, the SVM classifiers in the other two cohorts performed better.

Discussion:
With the development of technology, the diagnosis and treatment of GC have made great progress. However, tumor recurrence and metastasis remain important factors affecting prognosis, and lymph node metastasis is the most common form of metastasis in GC. It is therefore urgent to investigate the mechanism of lymph node metastasis and to estimate its status. Intestinal-type GC is preceded by premalignant lesions, including chronic atrophic gastritis and intestinal metaplasia. Diffuse-type GC is more common in young patients, among whom there is a female preponderance, and it behaves more aggressively than the intestinal type [17,18]. We therefore reasoned that investigating GC as a whole could miss subtype-specific findings. Previous studies of ceRNA networks [19][20][21][22][23][24][25] and immune cell fractions suggest that both may affect lymph node metastasis. In the present study, we screened differentially expressed mRNAs and lncRNAs and constructed ceRNA networks for lymph node metastasis in GC to reveal their potential functions and mechanisms. The immune cell fractions were evaluated via CIBERSORTx.
The lncRNAs, mRNAs and immune cell fractions selected by SVM-RFE were used to establish SVM models for predicting the status of lymph node metastasis in the whole, intestinal and diffuse cohorts, respectively. The results showed that the models in the intestinal and diffuse cohorts predicted better than the model in the whole cohort, which suggests their potential for clinical application.
In this study, we systematically integrated gene expression profiles and identified lncRNAs and mRNAs in GC. We did not identify differentially expressed miRNAs because we regard miRNAs as intermediates in the ceRNA network. As an extreme example, the ceRNA network still works when the expression of a miRNA is constant. Using only differentially expressed miRNAs to construct the ceRNA network could therefore lead to the neglect of important lncRNA-miRNA-mRNA triples. We examined the DElncRNAs, miRNAs and differentially expressed genes (DEGs) in the ceRNA networks and intersected the RNAs of the three cohorts. There were not only common RNAs but also subtype-specific RNAs, which supports our hypothesis that the behavior of lymph node metastasis differs between Lauren subtypes. The detailed mechanism requires further exploration.
The performance of the prediction models in the three cohorts, built on the significant signatures selected by SVM-RFE, illustrated that different Lauren subtypes have different mechanisms of lymph node metastasis. Probably because few mRNAs were included, only a few KEGG pathways were significantly enriched (p < 0.05). The KEGG pathways and GO terms of the diffuse cohort warrant less attention because the number of selected mRNAs in that cohort was small. We found that the GO term intrinsic component of membrane was common to the whole and intestinal cohorts, which suggests that the intrinsic component of membrane may be related to lymph node metastasis. In addition, cell-cell signaling by wnt was specific to the whole cohort, and vesicle lumen, catalytic activity, positive regulation of nitrogen compound metabolic process, cellular anatomical entity and others were specific to the intestinal cohort.
We observed differences in the composition of immune cells between patients with and without lymph node metastasis and found NK cells activated, macrophages M0, macrophages M2, mast cells resting and neutrophils to be significant in the whole cohort; NK cells resting, dendritic cells resting and mast cells resting in the intestinal cohort; and B cells memory, plasma cells, T cells CD8 and macrophages M0 in the diffuse cohort. The tumor microenvironment contains innate and adaptive immune cells, which display pro- or anti-tumor functions [29]. Evidence accumulated from many cancer models suggests that macrophages [30][31][32], dendritic cells [33][34], mast cells [35][36] and NK cells [37] contribute to the lymph node metastasis of tumors. The significantly different immune cells that we observed support this association with lymph node metastasis.
In the co-expression analysis of the intestinal cohort, our study suggested that dendritic cells were significantly associated with CD70. CD70 is implicated in tumor cell and regulatory T cell survival through interaction with its receptor, CD27 [38]. Moreover, CD70 has been reported to be related to dendritic cells and NK cells [39][40].

Several limitations of our study should be acknowledged. The public data we used all come from populations of Western countries, so our conclusions should be applied cautiously to Asian countries. Because we analyzed Lauren subgroup data, the number of cases in the diffuse subgroup was relatively small, which may make the results for that subgroup less reliable. Moreover, many of our inferences need to be confirmed by further experiments.
Although our research has some inadequacies, it also has several strengths. First, we established ceRNA networks and combined them with immune cell fractions estimated by CIBERSORTx, and these analyses were based on two Lauren subgroups, which can reduce the effect of tumor heterogeneity. Second, the prediction results of the SVM models support our decision to classify patients with GC before analysis. The good performance of the SVM models suggests that they could be useful in clinical diagnosis.

Conclusion:
We speculate that CD70 might also play a role in the lymph node metastasis of GC and that it may be a potential therapeutic target. Our research inferred a competing endogenous RNA network of lymph node metastasis and built an excellent prediction model.

Materials And Methods:
Data collection and differential gene expression analysis: RNA-seq data, as fragments per kilobase of transcript per million mapped reads (FPKM) and as counts, together with clinical data for the 375 GC samples in The Cancer Genome Atlas (TCGA; https://cancergenome.nih.gov/) database, were downloaded using the "TCGAbiolinks" package in R [10]. We matched and selected lncRNAs and mRNAs using GENCODE. According to the node stage (N stage) of the tumor node metastasis (TNM) staging system, we divided all patients from TCGA into the whole cohort (all patients with N stage information), the intestinal cohort (patients with intestinal-type classification and N stage information) and the diffuse cohort (patients with diffuse-type classification and N stage information). We obtained the differentially expressed mRNAs and lncRNAs using the "DESeq2" package in R with |log2 fold change| > 1 and a false discovery rate (FDR)-adjusted P < 0.05 [13].
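The thresholding step described above can be illustrated with a minimal sketch. It is written in Python purely for illustration (the study itself used DESeq2 in R), and it assumes the results table has been exported as rows carrying DESeq2's `log2FoldChange` and `padj` columns. The function name `select_differential` and the toy gene labels are hypothetical and are not data from the study.

```python
# Illustrative sketch only: mimics the cutoff step applied to a DESeq2
# results table; it does not perform the DESeq2 analysis itself.

def select_differential(results, lfc_cutoff=1.0, fdr_cutoff=0.05):
    """Return gene IDs with |log2 fold change| > lfc_cutoff and adjusted P < fdr_cutoff."""
    selected = []
    for row in results:
        padj = row["padj"]
        if padj is None:  # DESeq2 can report NA for the adjusted P value
            continue
        if abs(row["log2FoldChange"]) > lfc_cutoff and padj < fdr_cutoff:
            selected.append(row["gene"])
    return selected


# Hypothetical toy rows (not data from the study)
toy_results = [
    {"gene": "GENE_A", "log2FoldChange": 1.8, "padj": 0.001},   # passes both cutoffs
    {"gene": "GENE_B", "log2FoldChange": -2.3, "padj": 0.03},   # passes (down-regulated)
    {"gene": "GENE_C", "log2FoldChange": 0.4, "padj": 0.0001},  # fold change too small
    {"gene": "GENE_D", "log2FoldChange": 3.1, "padj": 0.20},    # not significant
    {"gene": "GENE_E", "log2FoldChange": -1.5, "padj": None},   # no adjusted P value
]

print(select_differential(toy_results))
```

Both cutoffs are applied jointly, so a gene with a large fold change but a non-significant adjusted P value is excluded, as is a highly significant gene with a small fold change.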
Construction of the ceRNA Network: First, we predicted lncRNA-miRNA interaction pairs from the differentially expressed lncRNAs (DElncRNAs) using LncBase Predicted v.2 with a threshold of 0.9 [14] in all three groups. We reasoned that a miRNA with zero expression in many samples cannot function in the ceRNA network, so miRNAs with zero expression in more than 20% of patients were removed in each of the three groups. We then intersected the predicted lncRNA-miRNA pairs with the retained miRNAs. miRTarBase [15] was used to predict the miRNA-mRNA interaction pairs. The miRNAs were then intersected across the two kinds of interaction pairs. Finally, we established matched lncRNA-miRNA-mRNA triples.

CIBERSORTx Estimation: To further explore the cellular causes of lymph node metastasis, the CIBERSORT algorithm [9], with B-mode batch correction and 1000 permutations, was used to estimate the fractions of 22 immune cell types in all patients. Only cases with CIBERSORT P < 0.05 were considered eligible for subsequent analysis.
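The triple-matching logic of the ceRNA network construction can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the authors' pipeline: the interaction pairs are assumed to be already available as plain (source, target) tuples from LncBase and miRTarBase, and the function names, miRNA labels and expression values are hypothetical.

```python
# Illustrative sketch only: filter miRNAs by zero-expression fraction, then
# join lncRNA-miRNA and miRNA-mRNA pairs on their shared miRNA.

def expressed_mirnas(mirna_expr, max_zero_fraction=0.2):
    """Keep miRNAs whose expression is zero in at most max_zero_fraction of patients."""
    kept = set()
    for mirna, values in mirna_expr.items():
        zero_fraction = sum(1 for v in values if v == 0) / len(values)
        if zero_fraction <= max_zero_fraction:
            kept.add(mirna)
    return kept


def build_triples(lnc_mir_pairs, mir_mrna_pairs, kept_mirnas):
    """Return sorted (lncRNA, miRNA, mRNA) triples that share a retained miRNA."""
    triples = []
    for lnc, mir in lnc_mir_pairs:
        if mir not in kept_mirnas:
            continue
        for mir2, mrna in mir_mrna_pairs:
            if mir2 == mir:
                triples.append((lnc, mir, mrna))
    return sorted(triples)


# Hypothetical toy inputs (not data from the study); 5 patients per miRNA
mirna_expr = {
    "miR-a": [5, 0, 3, 2, 7],  # zero in 20% of patients: retained
    "miR-b": [0, 0, 1, 4, 2],  # zero in 40% of patients: removed
    "miR-c": [1, 2, 3, 4, 5],  # never zero: retained
}
lnc_mir_pairs = [("LNC_1", "miR-a"), ("LNC_1", "miR-b"), ("LNC_2", "miR-c")]
mir_mrna_pairs = [("miR-a", "MRNA_X"), ("miR-b", "MRNA_Y"),
                  ("miR-c", "MRNA_Z"), ("miR-a", "MRNA_Z")]

kept = expressed_mirnas(mirna_expr)
triples = build_triples(lnc_mir_pairs, mir_mrna_pairs, kept)
for triple in triples:
    print(triple)
```

Because the text removes miRNAs with zero expression in more than 20% of patients, a miRNA at exactly 20% is retained in this sketch.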
SVM-RFE and SVM: SVM-RFE is a machine learning method based on the support vector machine that finds the optimal variables by recursively eliminating the lowest-ranked features according to the SVM-generated feature weights. The variables closely related to lymph node metastasis status were selected separately from the mRNAs of the ceRNA network, the lncRNAs of the ceRNA network and the immune cells. We then merged the three data matrices into a single matrix together with lymph node metastasis status. To avoid overfitting of the prediction model, each patient cohort was divided into two groups at a ratio of 7:3; the first group was named the train cohort and the second the test cohort. We built SVM classifiers in the train cohort using the lncRNAs, mRNAs and immune cells screened above and examined the performance of the classifiers in the test cohort. The above analysis was implemented using the e1071 package in R, available from https://CRAN.R-project.org/package=e1071.
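The recursive elimination idea behind SVM-RFE can be illustrated with a small, self-contained sketch. It is not the e1071-based implementation used in the study: it assumes a linear SVM trained by plain batch subgradient descent on the hinge loss, ranks features by their squared weight, and removes one feature per round. The function names, hyperparameters, feature labels and toy data are all hypothetical.

```python
# Illustrative sketch only: linear SVM-RFE on a toy matrix, pure Python.

def train_linear_svm(X, y, epochs=200, lr=0.05, lam=0.01):
    """Fit weights w and bias b by batch subgradient descent on the L2-regularized hinge loss."""
    n_samples, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularization term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # only margin violators contribute to the hinge subgradient
                for j in range(n_features):
                    grad_w[j] -= yi * xi[j] / n_samples
                grad_b -= yi / n_samples
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_rfe(X, y, names, n_keep):
    """Repeatedly retrain and drop the feature with the smallest squared weight."""
    remaining = list(range(len(names)))
    eliminated = []
    while len(remaining) > n_keep:
        X_sub = [[row[j] for j in remaining] for row in X]
        w, _ = train_linear_svm(X_sub, y)
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(names[remaining.pop(worst)])
    return [names[j] for j in remaining], eliminated


# Hypothetical toy data: +1 = lymph node metastasis, -1 = no metastasis.
# Only the first feature separates the two classes.
names = ["RNA_signal", "RNA_noise1", "RNA_noise2"]
X = [[2.0, 0.1, -0.2], [1.5, -0.1, 0.1], [-2.0, 0.1, 0.2], [-1.5, -0.1, -0.1]]
y = [1, 1, -1, -1]

selected, eliminated = svm_rfe(X, y, names, n_keep=1)
print("selected:", selected)
print("eliminated:", eliminated)
```

In practice the number of features to keep would be chosen by cross-validated performance rather than fixed in advance, and the selected features would then be used to fit the final classifier on the train cohort only.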