CXCR4 negatively correlates with NKT cell inltration during gastric precancerous lesions progression to early gastric cancer

Background: With the coming of immunotherapy era, immunotherapy is gradually playing a vital role in the treatment of gastric cancer (GC). However, immune microenvironment in gastric precancerous lesions (GPL) and early gastric cancer (EGC) still remain largely unknown. Methods: From the Gene Expression Omnibus (GEO), data of three GPL-related gene expression proles (GSE55696, GSE87666 and GSE130823) and three GC data sets with clinical information (GSE66229, GSE15459 and GSE34942) were downloaded. Three GC data were consolidated as a GC meta-GEO cohort. RNA sequencing data of 375 stomach adenocarcinoma (STAD) samples with clinical information from The Cancer Genome Atlas (TCGA) and 175 stomach normal controls (NC) from Genotype-Tissue Expression (GTEx) datasets were obtained from the UCSC Xena browser, which were merged as a STAD TCGA-GTEx cohort. The abundance of immune cells in above datasets were estimated using Immune Cell Abundance Identier (ImmuCellAI) algorithm. Firstly, key immune cells associated with GPL progression to EGC were identied using one ‐ way analysis of variance (ANOVA) test as well as Spearman’s correlation test in two GPL and EGC related datasets (GSE55696 and GSE87666). Then, weighted gene co-expression analysis (WGCNA) and pathway enrichment were adopted to identify hub gene co-expression network. Candidate hub genes were identied based on network parameters. Combining expression comparison and prognosis analysis in STAD TCGA-GTEx and GC meta-GEO cohort, Genes with signicant difference between GC and NC and prognostic signicance were identied as real hub genes. Correlation between real hub genes and key immune cells was evaluated using Pearson’s correlation test. The pattern of key immune cells inltration and hub genes expression as well as their correlation during GPL progression to EGC were validated in an independent cohort GSE130823. The correlation was also veried in

Conclusion: CXCR4 and NKT cell are possible to serve as biomarkers in monitoring GPL progression to EGC. Besides, CXCR4 may be involved in regulating NKT cell in ltration during GPL progression to EGC, which may provide a new immunotherapeutic target.

Background
Gastric cancer (GC) is the fth most common malignancy and the third leading cause of cancer-related mortality [1]. Histologically, Lauren categorized GC into two subtypes: intestinal and diffuse types [2]. Intestinal-type GC (IGC), with a clear and multistep histological evolution starting from chronic in ammation and progressing to atrophy, intestinal metaplasia, gastric precancerous lesion [GPL, including low-grade intraepithelial neoplasia (LGIN) and high-grade intraepithelial neoplasia (HGIN)] and frank malignancy, is closely related with in ammatory and immune response [3,4]. As early gastric cancer (EGC) has a far better prognosis than advanced GC, early diagnosis and treatment are essential [5].
In ammatory cell in ltration exhibits a positive or negative effect on tumor invasion, growth, metastasis, and prognosis [6]. With the coming of immunotherapy era, immunotherapy is gradually playing a vital role in the treatment of GC and much attention has been paid to the speci c stage of GC [7]. However, immune microenvironment of GPL and EGC still remain largely unknown.
With the development of public databases, such as the Gene Expression Omnibus (GEO), large amounts of genome, epigenome, transcription and proteomics data have become more accessible [8].
Correspondingly, multiple algorithms for analyzing omics data have been developed over the past few years [9], which provides more opportunities to uncover the immune microenvironment and molecular mechanisms of carcinogenesis. For instance, Immune Cell Abundance Identi er (ImmuCellAI) algorithm can estimate the abundance of immune cell based on transcriptome data [10]. Weighted gene coexpression analysis (WGCNA) can identify key biological targets and therapeutic targets through constructing a co-expressed gene network and exploring its correlation with clinical traits [11].
In current study, we used ImmuCellAI to quantify the proportions of immune cells of GPL and EGC from GEO database and combining with one-way ANOVA and correlation analysis, NKT cell was found signi cantly related to GPL progression to EGC. WGCNA was applied to identify gene co-expression network and hub genes associated with NKT cell abundance. Finally, in three GPL cohorts, we found that during GPL progression to EGC, CXCR4 was gradually up-regulated while NKT cell abundance decreased and there was a signi cantly negative correlation between them. CXCR4 had poor prognosis and its negative correlation with NKT cell in ltration was also con rmed in GC datasets (GC meta-GEO cohort and STAD TCGA-GTEx cohort). These ndings may provide targets for immunotherapy on GPL and EGC.
Characteristics of datasets in this study were displayed in Table 1. Raw data of the three GPL-related gene expression pro les were extracted. Background correction and normalization were performed by "limma" package of R respectively [19].
Data of the three GC data sets were extracted and normalized using the RMA algorithm. Then they were transformed to the form of log base 2 and consolidated as a GC meta-GEO cohort totally with 100 NC and 556 GC. The batch effects were removed by the "combat" function of "sva" package of R [20].
As for GTEx and TCGA datasets, RNA sequencing data (FPKM values) were transformed into log2 (FPKM + 1). Then, they were merged as a STAD TCGA-GTEx cohort and normalized according to the "normalizeBetweenArrays" function of the package of "limma" in R software so that the expression values have similar distribution across a set of arrays [19].

Identi cation of key immune cells associated with GPL progression to EGC
We used "stage" to represent the severity of the pathology of tissues. In the following correlation calculation, CG was represented by "1", LGIN was represented by "2", HGIN was represented by "3" and EGC was represented by "4". The relationship between immune cell abundance with pathological stage was preliminarily analyzed by one-way analysis of variance (ANOVA) test with p < 0.05 set as the cut-off. Then, correlation between immune cell abundance and pathological stage was calculated using Spearman's correlation test. Common immune cells with a Spearman's r greater than 0.5 and a p-value less than 0.05 in GSE55696 and GSE87666 cohorts were considered as key immune cells. After further veri cation between GC and NC in the GC meta-GEO cohort and STAD TCGA-GTEx cohort, immune cells with correspondingly consistent alterations were subjected for identifying related gene co-expression network.

Construction Of Gene Co-expression Networks
According to sample clustering, there was an outlier sample in GSE87666 while none in GSE55696 ( Fig. 4a and 4b). After removing the outlier sample (GSM2338022), 4359 genes in GSE55696 and 4663 genes in GSE87666 with the highest expression variance (top 25%) were selected for subsequent WGCNA. β = 6 (scale-free R 2 = 0.87) and β = 13 (scale-free R 2 = 0.86) were the lowest power t scale-free index over 0.85 and determined as the soft-thresholding power parameter to ensure a scale-free network in GSE55696 and GSE87666 respectively ( Fig. 4c-f). Eventually, genes with similar expression patterns were grouped into eight co-expression modules respectively in GSE55696 and GSE87666 ( Fig. 5a and 5c). The grey module is a set of genes that cannot be clustered to any module. In GSE55696, the modules of blue (r = 0.46, p = 3e-05) and yellow (r = 0.26, p = 0.02) were signi cantly positively correlated with the abundance of NKT cell while pink (r = -0.40, p = 3e-04), green (r = -0.75, p = 5e-15), and red (r = -0.50, p = 5e-06) displayed signi cantly negative (Fig. 5b). In GSE87666, the red module (r = 0.83, p = 4e − 06) was signi cantly positively correlated with the abundance of NKT cell while the green module (r = -0.65, p = 0.002) displayed signi cantly negative (Fig. 5d).

Identi cation Of Hub Modules Associated With Key Immune Cells
Module eigengenes (MEs) was the major component of a gene module. To identify the modules signi cantly associated with key immune cells in ltration, we calculated the correlation between MEs and the in ltration level of key immune cells using Pearson's correlation test in GSE55696 and GSE87666 cohorts respectively, with p-value < 0.05 set as the cut-off. Then, we took intersection of the modules with same correlation direction in GSE55696 and GSE87666 cohorts respectively, and performed KEGG pathway enrichment analysis on the overlapping genes to understand their potential functions to assist in determining the hub modules, with a cut-off criterion of adjusted p-value < 0.05.

Identi cation Of Candidate Genes
After hub modules identi cation, candidate hub genes were identi ed. Module hub genes, which are highly connected intra-modular genes, have the high Module Membership (MM) scores to the respective module. MM is the parameter that correlates the expression pro le of a gene with the MEs of a module which can measure the distance between genes and a given module. The absolute value of gene signi cance (GS), measuring the Pearson's correlation between a given gene and the clinical trait, can also serve as the parameter to lter out the hub genes. We set the |MM| ≥ 0.70 and the |GS| ≥ 0.50 for hub genes in co-expression network. Meanwhile all genes in the hub module were uploaded to the Search Tool for the Retrieval of Interacting Genes (STRING; https://string-db.org/) database [27] to construct protein-protein interaction (PPI) network with the species limited to 'Homo sapiens' and con dence > 0.9. We used Cytoscape to visualize the network (https://cytoscape.org/) [28]. Genes with node degree ranking in top 10% were considered as central nodes in PPI network. Both in the hub modules of GSE55696 and GSE87666 datasets, common genes serve as hub genes in co-expression network and central nodes in PPI network were considered as the candidate hub genes, which was visualized in the form of Venn diagram (http://bioinformatics.psb.ugent.be/webtools/Venn/).

Identi cation Of Real Hub Genes
The expression level comparison between GC and NC and survival analysis of 4 candidate genes were conducted to identify the real hub genes. In the GC meta-GEO cohort, CXCR4 was signi cantly upregulated in GC and had poor prognosis ( Fig. 7a and 7e). For CD53, the expression between GC and NC and the prognosis had no signi cant difference ( Fig. 7b and 7f). IL10RA and CD19 were signi cantly downregulated in GC but the prognosis was not signi cant (Fig. 7c, 7d, 7 g and 7 h). In the STAD TCGA-GTEx cohort, CXCR4 was signi cantly upregulated in GC and had poor prognosis ( Fig. 8a and 8e). CD53 and CD19 were signi cantly upregulated in GC but the prognosis was not signi cant (Fig. 8b, 8f, 8d and   8 h). For IL10RA, there was no signi cant difference in expression between GC and NC and prognosis ( Fig. 8c and 8 g). Combining the results of above two cohorts, CXCR4 was identi ed as the real hub gene.
Veri cation of the pattern of NKT cell in ltration, CXCR4 expression and their relationship The pattern of NKT cell in ltration and CXCR4 expression during GPL progression were veri ed in the independent GPL cohort GSE130823. Their relationship was also veri ed in the independent GPL cohort GSE130823 and GC datasets (TCGA-GTEx and GC meta-GEO cohort). During GPL progression, data indicated that the abundance of NKT was gradually decreased with GPL progression (p in one-way ANOVA test = 0.0007, Fig. 10a) and signi cantly negatively correlated with tumorigenesis (r = -0.5799, p < 0.0001, Fig. 10b) while CXCR4 expression was opposite (p in one-way ANOVA test = 0.0004, Fig. 10c; r = 0.5799, p < 0.0001, Fig. 10d). CXCR4 expression also displayed a signi cantly negative correlation with NKT cell in ltration both in GPL progression to EGC (r = -0.8070, p < 0.0001, Fig. 10e) and GC (GC meta-GEO cohort: r = -0.3651, p < 0.0001, TCGA-STAD cohort: r = -0.2518, p < 0.0001 Fig. 10f-g). Above analysis results were consistent with those in GSE55696 and GSE87666 cohorts.

Results
The immune cell in ltration landscape and key immune cells identi cation during GPL progression to EGC 24 kinds of immune cell abundance were estimated by ImmuCellAI and analyzed by one-way ANOVA test in terms of pathological stage in GSE55696 and GSE87666 cohort. Combining with the results of above two cohorts, the abundances of CD8 naive cell and NKT cell were signi cantly different across pathological stages (one-way ANOVA test p-value < 0.05) while only NKT consistently showed a linear downward trend in the transformation process in the two cohorts ( Fig. 2a and 2b). The correlation between 24 immune cell abundance and pathological stage in two cohorts was displayed in heatmaps ( Fig. 2c and 2d). Under the criterion of Spearman's r greater than 0.5 and a p-value less than 0.05, in GSE55696, Effector memory T cell (r =-0.58, p = 3e-08) and NKT cell (r = -0.58, p = 3e-08) were signi cantly negatively correlated with pathological stage while Macrophage (r = 0.61, p = 3e-09) was opposite. As for GSE87666, only NKT cell (r = -0.63, p = 0.01) met the criterion of signi cance. Combining with the correlation results of above two cohorts, NKT cell was considered as key immune cell associated with GPL progression to EGC (Fig. 2e). With further veri cation in the GC meta-GEO cohort and STAD TCGA-GTEx cohort, it was signi cantly downregulated in GC compared with NC (Fig. 3).

Identi cation Of Hub Modules Correlated With Nkt Cell In ltration
To identify hub modules correlated with NKT cell in ltration, we took intersection of the modules with same correlation direction in GSE55696 and GSE87666 datasets respectively [positive: blue module (GSE55696) ∩ red module (GSE87666), yellow module (GSE55696) ∩ red module (GSE87666)); negative: pink module (GSE55696) ∩ green module (GSE87666), green module (GSE55696) ∩ green module (GSE87666), red module (GSE55696) ∩ green module (GSE87666)]. Then, KEGG pathway enrichment analysis was performed on the overlapping genes. The overlapping genes of GSE55696 and GSE87666 green modules were enriched in immune-related pathways (Fig. S1a), while others had not found any pathway that can be enriched (Fig. S1b-d) or the enriched pathway had no obvious relationship with immunity ( Fig. S1e). Therefore, green modules of GSE55696 and GSE87666 were considered as hub modules correlated with NKT cell in ltration and subjected for further analysis.

Identi cation Of Candidate Hub Genes
There were 66 overlapping genes shared in two hub modules (Fig. 6a) and they were enriched in immunerelated pathways including 'chemokine signaling pathway', 'cytokine-cytokine receptor interaction', 'leukocyte transendothelial migration' 'primary immunode ciency', 'hematopoietic cell lineage', 'intestinal immune network for IGA production', 'Leishmania infection' and 'B cell receptor signaling pathway' (Fig. 6b). The correlation between green module's GS and MM in GSE55696 and GSE87666 was calculated and found that they have a positive relationship (r = 0.87, p = 2.3e − 119; r = 0.50, p = 9.7e − 19), which indicated that hub genes of module tend to be associated with NKT cell in ltration ( Fig. 6c and  6d). PPI networks of the green modules in GSE55696 and GSE87666 were constructed with a cutoff con dence > 0.9 ( Fig. 6e and 6f). Based on |MM| ≥ 0.70 and |GS| ≥ 0.50, 184 genes in the green module of GSE55696 and 94 genes in the green module of GSE87666 were selected as candidate hub genes in co-expression network, respectively. As for PPI network, 34 genes in the green module of GSE55696 and 22 genes in the green module of GSE87666 were considered as central nodes with node degree ranking in top 10%. After conducting Venn analysis, in the green modules of GSE55696 and GSE87666 cohort, 4 genes (CXCR4, CD53, IL10RA and CD19), serving as both candidate hub gene in co-expression network and central node in PPI network, was regarded as the candidate hub gene negatively associated with NKT cell in ltration (Fig. 6g).

Discussion
In current study, we estimated the abundance of immune cells in GPL and EGC related datasets using ImmuCellAI algorithm [10]. NKT was found gradually decreased with GPL progression to EGC and was identi ed as key immune cell associated with tumorigenesis using one-way ANOVA test and Spearman's correlation test. Further veri cation indicated that it was signi cantly downregulated in GC in meta-GEO cohort and STAD TCGA-GTEx cohort. Co-expression networks of NKT cell were constructed using WGCNA. Green modules in GSE55696 and GSE87666 having a signi cantly negative correlation with NKT cell in ltration were identi ed as hub modules as their overlapping genes were signi cantly enriched in immune-related pathways. In further screening, CXCR4, CD53, IL10RA and CD19 were identi ed as the candidate hub genes according to gene network related parameters. Combining expression comparison and prognosis analysis, CXCR4 was determined as the real hub gene. CXCR4 expression was increased with GPL progression, signi cantly positively correlated with tumorigenesis and negatively correlated with NKT cell in ltration. The pattern of NKT cell in ltration and CXCR4 expression as well as their relationship stay consistent in the independent GPL cohort GSE130823 and GC datasets (STAD TCGA-GTEx and GC meta-GEO cohort).
NKT cells (usually de ned as CD3 + CD56+), are a broad group of CD3 + T cells that co-express T-cell antigen receptor (TCR) and NK-cell markers [29,30] They share characteristics from both NK and T cells and possess both innate and acquired immune functions. On the one hand, they can be activated to secrete cytotoxic enzymes and cytokines to kill target cells once TCR ligation [31]. On the other hand, they can mediate non-MHC restricted lysis and cytokine production in the absence of TCR activation. Therefore, they were considered to play an important role in anti-tumor and anti-virus immune response [32,33]. Zhuang et al [34] has found that the frequencies of CD3 + CD56 + NKT-like cells in GC tumors were signi cantly decreased and the effector function had impaired, which were consistent with our ndings. They also found that the mechanism of the impairment of CD3 + CD56 + NKT-like cells from GC patients was not resulted from coinhibitory molecules such as PD-1, Tim-3, LAG-3 and TIGIT but from some soluble inhibitory factors released by tumor cells or tumor-in ltrating immune cells. However, the speci c factors remained to be explored.
We found that CXCR4 expression was increased with GPL progression and negatively correlated with NKT cell in ltration during GPL progression to EGC. CXCR4, is a seven-span transmembrane G-protein coupled-receptor that is the primary receptor for CXCL12 [35]. It is reported that CXCR4 plays an important role in tumor biological behaviour, such as growth, metastasis, angiogenesis and cancer cellmicroenvironment interaction [36]. In GC, CXCR4 is upregulated and associated with poor prognosis [37,38]. It was reported that SDF-1/CXCR4 axis could facilitate myeloid-derived suppressor cells (MDSCs) accumulation in tumor microenvironment and suppresses T cell immune responses [39,40]. In current study, we speculated that some kinds of immunosuppressive cells expressing CXCR4 may gradually accumulate in the tumor microenvironment and inhibit the immune response of NKT cells, leading to GPL progression.

Conclusions
In conclusion, NKT cell and CXCR4 are possible to serve as biomarkers in monitoring GPL progression. CXCR4 may be involved in NKT cell in ltration during GPL progression to EGC, which may provide a new immunotherapeutic target. The authors declare that they have no known competing nancial interests or personal relationships that could have appeared to in uence the work reported in this paper.

Declarations
38. He H, Wang C, Shen Z, Fang Y, Wang X, Chen W, Liu F, Qin X, Sun Y. Upregulated expression of C-X-C chemokine receptor 4 is an independent prognostic predictor for patients with gastric cancer. PLoS    Identi cation of candidate hub genes. a Intersection of hub modules in GSE55696 and GSE87666 cohorts. b KEGG pathway enrichment analysis for overlapping genes of green modules from GSE55696 and GSE87666 cohorts. Scatter plot of module eigengenes in the green module in c GSE55696 cohort and d GSE87666 cohort. Protein-protein interaction network of genes in the green module in e GSE55696 cohort and f GSE87666 cohort. The color intensity in each node was proportional to the node degree. g Intersection of hub genes in PPI network and co-expression network and 4 candidate genes (CXCR4, CD53, IL10RA and CD19) were obtained. PPI, protein-protein interaction Expression level comparison and prognosis analysis of 4 candidate hub genes in meta-GEO cohort.