Mining the TCGA Database for Microenvironment ‐ Related Genes with Prognostic Value in Gastric Cancer


 BackgroundGastric cancer (GC) is the fifth most common cancer worldwide. Previous studies have suggested that the tumor microenvironment (TME) plays an important role in the development and prognosis of GC. In this study, we aimed to identify genes in tumor-infiltrating immune cells (TICs) that influence the progression and prognosis of GC. MethodsWe used the ESTIMATE algorithm to calculate the scores of the stromal and immune components of the TME in 407 GC samples collected from The Cancer Genome Atlas (TCGA) database.The differentially expressed genes (DEGs) were intersected by a protein-protein interaction (PPI) network and analyzed by univariate Cox regression.Further analysis showed the correlation between MCEMP1 and the clinicopathological characteristics of GC patients (clinical stage, distant metastasis) and survival.Then we used Gene set enrichment analysis (GSEA) and CIBERSORT analysis to examine the relationship between MCEMP1 and the TME.ResultsThe analysis revealed that the expression of MCEMP1 was positively correlated with the clinicopathological characteristics of GC patients (clinical stage, distant metastasis) and negatively correlated with survival. Gene set enrichment analysis (GSEA) indicated that gene sets in the MCEMP1 high expression group were concentrated mainly in immune-related pathways. CIBERSORT analysis of the proportion of TICs revealed that neutrophils and M2 macrophages were positively correlated with MCEMP1 expression, suggesting that MCEMP1 is responsible for preservation of the immune-dominant status of the TME. ConclusionHigh MCEMP1 expression might be a biomarker of a poor prognosis in GC patients and provide a clue regarding the different statuses of the TME, offering additional insight into therapy for GC.


Results
The analysis revealed that the expression of MCEMP1 was positively correlated with the clinicopathological characteristics of GC patients (clinical stage, distant metastasis) and negatively correlated with survival. Gene set enrichment analysis (GSEA) indicated that gene sets in the MCEMP1 high expression group were concentrated mainly in immune-related pathways. CIBERSORT analysis of the proportion of TICs revealed that neutrophils and M2 macrophages were positively correlated with MCEMP1 expression, suggesting that MCEMP1 is responsible for preservation of the immune-dominant status of the TME.

Conclusion
High MCEMP1 expression might be a biomarker of a poor prognosis in GC patients and provide a clue regarding the different statuses of the TME, offering additional insight into therapy for GC.

Background
Gastric cancer (GC) is the fth most common malignant tumor in the world and the third leading cause of cancer-related death [1]. The incidence of GC has been increasing rapidly [2]. Currently, the treatments for GC are based mainly on surgery, chemotherapy and radiotherapy [3]. However, these therapies have limited bene ts on patient survival. Therefore, it is urgently needed to explore the carcinogenesis and therapeutics of GC.
Research on molecular markers for early detection has attracted increasing attention. Recently, accumulating evidence has indicated that the tumor microenvironment (TME) is one of the fastest developing and most promising elds in solid tumors. The TME is the internal and external environment in which tumors are closely related to the occurrence, growth and metastasis of tumors, including cancer cells, stromal cells, immune cells, endothelial cells, various chemokines, cytokines and hormones [4,5]. Collaborative interactions between cancer cells and their supporting cells contribute to the malignant phenotypes of cancer (e.g., immortal proliferation, resistance to apoptosis, and evasion of immune surveillance). Therefore, the TME signi cantly in uences the therapeutic response and clinical outcome of cancer patients [4,6]. In the TME, immune and stromal cells are two major types of nontumor components and have been proposed to be valuable for the diagnostic and prognostic assessment of tumors. The ESTIMATE algorithm can be used to infer the TME score, which includes the stromal score, immunity score, and tumor purity, with expression pro le data [7]. This ESTIMATE algorithm has been effectively applied to prostate cancer, colon cancer and breast cancer [8][9][10]. Moreover, a large number of studies have shown that tumor-in ltrating immune cells (TICs) in the TME serve as a promising indicator for therapeutic effects [11]. These studies indicated that the TME shows great promise in improving prognosis and predicting the response to immunotherapy in GC patients.
In this study, we used the ESTIMATE algorithm to evaluate the StromalScore and ImmuneScore. We searched for the differentially expressed genes (DEGs) that correlated with the TME and GC prognosis. MCEMP1 was identi ed as a DEG that is closely related to the prognosis of GC. MCEMP1 encodes a oneway transmembrane protein that participates in the immune response and the regulation of mast cell differentiation [12]. Increased intratumoral mast cells can foster immune suppression and GC progression [13]. Previous studies have suggested that MCEMP1 may play a crucial role in the TME. Furthermore, we revealed that MCEMP1 was a potential prognostic indicator of GC. Therefore, the ndings of our study will have the potential to reveal a novel biomarker for early GC detection and prognosis prediction.

Data Pro le
The transcriptome RNA-seq data of 407 GC cases (normal samples, 32 cases; tumor samples, 375 cases) and the corresponding clinical data were downloaded from The Cancer Genome Atlas (TCGA) database (https://portal.gdc.cancer.gov/).
Estimation of the Immunity Score/Stromal Score/ESTIMATE Score We utilized the R package "estimate" to estimate the ratio of immune-stromal components in the TME for each sample, and the ImmuneScore, StromalScore, and ESTIMATEScore were calculated. According to the median score, the patients with GC were divided into a high score group and a low score group, and survival was analyzed according to stromal/immune components. Then, the StromalScore and ImmuneScore among different clinical indexes were compared with the R language packages "survival" and "survminer". A P value < 0.05 was considered statistically signi cant.

Survival Analysis
The R language packages "survival" and "survminer" were applied for the survival analysis. A total of 375 of 407 patients had detailed survival data, with a time span ranging from 0 to 10 years; thus, these data were subjected to survival analysis. The Kaplan-Meier method was used to plot the survival curve, and the log-rank test was used to examine statistical signi cance; p < 0.05 was considered signi cant.

Identi cation of DEGs
A total of 375 tumor samples were divided into high score and low score groups according to the median ImmuneScore or StromalScore. Differential analysis of gene expression was carried out using the R package "limma", and then the DEGs were obtained.

Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO) Enrichment Analyses
DEGs were obtained from the intersection of differential stromal cells and differential immune cells. In R 4.0.2, the R packages "enrichplot", "ClusterPro ler", "ggplot2" and "org.Hs.eg.db" were used to perform KEGG and GO enrichment analyses. A P value (adjusted P value) < 0.05 was set as the threshold value.

Heatmaps
Heatmaps of the DEGs were produced by R language with the package "heatmap".

Analysis of Differential Scores or Gene Expression with Clinical Stages
The clinicopathological characteristics of the GC patients were downloaded from the TCGA. The analysis was performed with R language, and the Wilcoxon rank-sum or Kruskal-Wallis rank-sum test was used to determine signi cance depending on the number of clinical stages for comparison.

Protein-Protein Interaction (PPI) Network to Screen DEGs
The PPI network was constructed with the STRING database (https://string-db.org) with a con dence level > 0.9 as the lter value, and then the network was visualized with Cytoscape (version 3.6.1, https://cytoscape.org).

Cox Regression Analysis
The R language package "survival" was used for the univariate Cox regression analysis. The 18 genes with signi cant expression differences (ordered by p value from smallest to largest in the univariate Cox regression analysis) are shown as a forest map.

Gene Set Enrichment Analysis
The gene expression matrix le and cls le of group descriptions were used as input les. The C7 gene set v7.1 and the C2 KEGG gene set v7.1 were selected as the two main gene sets. All gene sets were permuted 1000 times for each analysis. Those with P < 0.05, NOM P < 0.05, and |NES| > 1 were considered statistically signi cant.

Evaluation of Immune Cell In ltration in GC
The CIBERSORT computational method was applied to estimate the TIC abundance pro le in all tumor samples. Then, GC patients were divided into a high expression group and a low expression group according to the expression level of the target gene (P < 0.05). The correlation test was used to assess the correlation between immune cell content and target gene expression (P < 0.05).

A high StromalScore was negatively correlated with survival
The analysis process of our study is shown in Fig. 1. A total of 407 samples (32 normal gastric tissue samples and 375 GC tissue samples) were collected from the TCGA database. A total of 375 cases of GC were divided into a high score group and a low score group according to the StromalScore/ImmuneScore, and then overall survival (OS) analysis of the StromalScore and ImmuneScore was performed. As shown in Fig. 2a, only the StromalScore was negatively correlated with the survival rate, while the ImmuneScore and ESTIMATEScore had no signi cant correlation with the OS rate ( Fig. 2b-2c). This suggests that the stromal components in the TME may be suitable for predicting the prognosis of GC patients.

StromalScore/ImmuneScore and clinical correlation analysis
To determine the relationship between the proportions of immune and stromal components with clinicopathological characteristics, the patients were divided into high and low StromalScore/ImmuneScore groups and intersected, and then the clinical characteristics of the two groups were compared based on stromal/immune components. Both the StromalScore and ImmuneScore showed positive correlations with tumor grade, tumor stage and T classi cation of the TNM staging system (Fig. 3a-3f, Supplementary Fig. 1a-1f). These results suggest that the ratio of immune and stromal components is associated with the progression of GC (e.g., invasion or metastasis).
DEGs shared by the ImmuneScore and StromalScore were overlapped to obtain effective upregulated and downregulated genes According to the median values of stromal/immune components, patients were divided into high and low score groups to screen DEGs. The heat map showed the expression pro le of special genes in the stromal ( Fig. 4a) and immune (Fig. 4b) components in 375 patients with GC. Based on the stromal/immune cell score group, 2173 upregulated DEGs and 382 downregulated DEGs were obtained from the StromalScore, and 1566 upregulated DEGs and 606 downregulated DEGs were obtained from the ImmuneScore. A Venn plot showed that a total of 1305 DEGs were identi ed after intersecting the StromalScore and ImmuneScore: 1070 DEGs in the upregulated groups (Fig. 4c) and 235 genes in the downregulated groups (Fig. 4d). These DEGs (a total of 1305 genes) could be used to determine the status of the TME.
Functional enrichment analysis indicated the enrichment of immune function: 5 DEGs were identi ed by the PPI network and Cox regression analysis R software was used to perform enrichment analysis of the DEGs. KEGG enrichment analysis (Fig. 5a) showed that most of the pathways related to the DEGs were signi cantly related to the immune response(cytokine-cytokine receptor interaction, cell adhesion molecules, B cell receptor signaling pathway and primary immunode ciency). GO enrichment analysis (Fig. 5b) also indicated that the DEGs mapped to immune-related GO terms, such as the immune response, signal transduction process, membrane composition, receptor complex, surface receptor activity and protein binding. Therefore, the overall functions of the DEGs seem to be closely related to immune-related activities, suggesting that immune factors play an important role in the TME of GC.
To further explore the underlying mechanism, 1305 overlapping DEGs were used to establish a PPI network with the STRING database. The network was visualized with Cytoscape software (Fig. 5c). Then, we displayed the rst 150 core genes of the network according to the number of network nodes. Five DEGs, MCEMP1, HTR2A, PTGFR, F13A1 and HGF, were intersected by the top node count genes in the PPI network and the prognostic genes of GC obtained from the univariate Cox regression analysis (Fig. 5d-5e).
MCEMP1 was signi cantly associated with a poor survival rate and prognosis To explore the potential value of these identi ed DEGs, differential analysis of gene expression was carried out on 5 DEGs between the normal and tumor groups. We determined that the expression of MCEMP1 was upregulated in tumor tissues compared to normal tissues (Fig. 6a-6b). Then, the survival and clinical correlation analyses of DEGs also showed that MCEMP1 was signi cantly correlated with poor survival (Fig. 6c) and prognosis ( Fig. 6d-6h, Supplementary Fig. 2a-2b). These data revealed that the MCEMP1 gene is involved in the development of GC and may have a great in uence on prognosis.

GSEA enrichment results of MCEMP1
Given that the level of MCEMP1 was correlated with the survival and TNM stages of GC patients, GSEA was implemented in the high expression and low expression groups based on the median level of MCEMP1 expression. In addition, the results of the KEGG enrichment analysis were visualized. As shown in Fig. 7, MCEMP1 expression was negatively correlated with antigen processing and presentation, galactose metabolism, glycosaminoglycan degradation, natural killer (NK) cell-mediated cytotoxicity, the NOD-like receptor signaling pathway, the proteasome, and the Toll-like receptor signaling pathway. These results demonstrated again that the expression of MCEMP1 in the TME of GC has a great in uence on the signaling pathways related to activation and the cytotoxic effect of immune cells.

Correlation of MCEMP1 with the proportion of TICs
To further determine the relationship between MCEMP1 and the TME, the proportion of tumor in ltrating immune subsets was analyzed using the CIBERSORT algorithm. The content of immune cells in each sample was obtained, and the correlation between immune cells was assessed (Fig. 8a). Subsequently, the correlation between MCEMP1 and 22 immune cells was analyzed (Fig. 8b). The results showed that a total of 14 kinds of TICs were correlated with the expression of MCEMP1, among which 5 kinds of TICs were negatively correlated with MCEMP1 expression (naive B cells, resting mast cells, activated NK cells, memory resting CD4 T cells, and T regulatory cells (Tregs)), and 9 kind of TICs were positively correlated with MCEMP1 expression (activated dendritic cells, eosinophils, M0 macrophages, M2 macrophages, activated mast cells, neutrophils, resting NK cells, memory activated CD4 T cells, and gamma delta T cells) (Fig. 8c-8d). These results further indicate that the expression of MCEMP1 is closely related to immune cells in the TME.

Discussion
In the present study, we attempted to identify DEGs associated with OS and the classi cation of TNM stages in GC patients from the TCGA database. MCEMP1 was identi ed to be involved in immune activities. Importantly, in this study, a series of bioinformatics analyses indicated that MCEMP1 might be an indicator of the status of the TME in GC patients. Furthermore, many studies have demonstrated that the TME is signi cantly correlated with the initiation, progression and metastasis of GC [14,15] and that TICs are closely related to the clinical prognosis of GC patients [16].
A large number of studies have shown that speci c genes and transcription factors play an important role in the TME of GC [15,17]. In recent years, many drugs targeting various components of the TME have been approved for clinical use, including aromatase, vascular endothelial growth factor (VEGF) and immune checkpoint inhibitors (ICIs) [18]. They have made great achievements in the treatment of GC.
Inhibition of the programmed death-1 (PD-1)/programmed death ligand 1 (PD-L1) axis by ICIs, including nivolumab and pembrolizumab, has become a new therapeutic strategy for advanced GC [19]. Immunotherapies targeting PD-1 and PD-L1 have been approved for GC patients [20]. However, ICIs have shown limited survival bene ts to GC patients. Most patients with GC either do not respond to ICIs, develop primary and acquired treatment resistance [21], or experience immune-related adverse events [22].
For advanced GC, tumor mutation load can also be used as a prognostic biomarker independent of PD-L1 expression [23]. Although both PD-L1 and tumor mutational burden (TMB) are widely used as biomarkers to judge the prognosis of patients, it is not suitable to use these two markers alone in many patients, so it is urgent to explore more biomarkers to meet the clinical requirements of immunotherapy. In our present research, we forested the correlation between TICs in the TME and prognosis and incorporated the clinical stage, survival rate and TNM stage of GC patients by analyzing data from the TCGA database. Among the DEGs identi ed, we found that the expression of MCEMP1 was closely related to the survival rate and advanced clinicopathological status (clinical stage and distant metastasis). These results indicate that MCEMP1 is a potential prognostic biomarker and a target for the treatment of GC.
MCEMP1 was identi ed as a high-risk gene for GC. MCEMP1 encodes a unidirectional transmembrane protein and participates in the regulation of mast cell differentiation or the immune response [12]. It is expressed mainly by monocytes and mast cells [24]. MCEMP1 is closely related to the innate immune system and plays an important role in regulating the immune response. Deletion of MCEMP1 can increase the activity of T lymphocytes, the expression of immunoglobulin and the activity of NK cells, inhibit the release of in ammatory factors and induce T lymphocyte apoptosis [25]. A clinical study of bladder cancer con rmed that MCEMP1 expression is related to the occurrence and prognosis of bladder cancer [24]. Another study established a risk score model to predict the prognosis of GC patients, which also included the MCEMP1 gene [26]. These results suggest that MCEMP1 is closely related to the prognosis of GC. In addition, the GSEA results showed that many immune-related signaling pathways, such as antigen processing and presentation, galactose metabolism, glycosaminoglycan degradation, NK cell-mediated cytotoxicity, the NOD-like receptor signaling pathway, the proteasome, and the Toll-like receptor signaling pathway, were downregulated in the MCEMP1 high expression group. These results indicate that MCEMP1 plays promotes the development of GC.
To further demonstrate the role of MCEMP1 in the prognosis of GC, we predicted the relationship between MCEMP1 and in ltrating immune cells. In this study, we found that MCEMP1 expression was related to 14 kinds of TICs and positively correlated with neutrophils (|R| = 0.68). It has been reported that tumor cells use neutrophils to enhance the metastatic ability of cancer. Neutrophils are the main kind of TIC in the TME of GC according to many reports, and neutrophil extracellular traps (Nets) are related to the biological behavior of many kinds of malignant tumors. It has been reported that tumor-educated neutrophils (TENS) can signi cantly promote the growth and metastasis of GC by inducing mesenchymal stem cells (MSCs) to transform into cancer-associated broblasts (CAFs) [27]. This result suggests that with the upregulation of MCEMP1 expression, the main response in the microenvironment of GC changes from an antitumor immune response to promoting tumor cell metabolism.

Conclusions
Overall, through bioinformatic analysis, our study innovatively revealed the gene expression characteristics of GC and their in uence on the TME. In addition, we found that MCEMP1 plays a regulatory role in the TME of GC and is closely related to the clinical prognosis of GC patients. These results might have implications for prognostic prediction and therapeutic guidance for GC. This study also further expands our knowledge on the mechanism of GC development. LT analyzed and interpreted data on gastric cancer patients in the TCGA database and wrote articles. GZ made an in-depth revision of the article. All the authors read and approved the nal manuscript. Figure 1 Overview fowchart of this study. StromalScore is associated with GC stages and their overall survival. a STAD cases were divided into two groups based on their StromalScore: As shown in the Kaplan-Meier survival curve, median survival of the low score group is longer than the high score group, as indicated by the log-rank test; P=0.003. b STAD cases were divided into two groups based on their ImmuneScore:The median survival of the low score group is longer than the high score group; however, it is not statistically different as indicated by the log-rank test; P =0 .717. c STAD cases were divided into two groups based on their ESTIMATEScore:The median survival of the low score group is longer than the high score group; however, it is not statistically different as indicated by the log-rank test; P = 0.540.

Figure 3
Distribution of StromalScore and ImmuneScore. a Association between StromalScore and different fuhrman grades. b Association between StromalScore and TNM staging. c Association between StromalScore and invasion depth. d Association between ImmuneScore and different fuhrman grades. e Association between ImmuneScore and TNM staging. f Association between ImmuneScore and invasion depth.  Analysis of DEGs. a KEGG term enrichment results with P < 0.05. b GO term enrichment results with P < 0.05. c Using Cytoscape to visualize the interaction between 1305 overlapping DEGs(with a con dence level of 0.9)from PPI network. d Forest map of multivariate regression analysis. e Venn diagrams show ve that overlap. Clinical correction analysis with MCEMP1. a Expression level of MCEMP1 in gastric tumors and normal tissues. b MCEMP1 mRNA levels in GC tissues and matched normal tissues in the TCGA dataset. c Kaplan-Meier curves for overall survival of MCEMP1,median survival of the low score group is longer than the high score group. d Association between MCEMP1 expression and age. e Association between MCEMP1 expression and grade. f Association between MCEMP1 expression and stage. g Association between MCEMP1 expression and T classi cation. h Association between MCEMP1 expression and lymph node metastasis. N0, N1, N2, and N3 represent 0, 1, 2, and 3 lymph node metastases. GSEA was used to show the enrichment results of high expression of MCEMP1. The horizontal axis represents the gene name, and the vertical axis represents the enrichment score. Each path was plotted with a different color curve, and the signal pathways of P < 0.05and FDR < 0.05were signi cant. The peak of the curve was on the right side, indicating that these pathways were negatively correlated with the high expression of MCEMP1. The green line represents the low group, and the red line represents the high group. c Venn diagrams