Analysis of Key Pathways, Genes and Immune Cell Inltration in Metastatic Esophageal Squamous Cell Carcinoma Based on Bioinformatics

[Abstract] : Objective : To seek potential metastatic tumor markers in esophageal squamous cell carcinoma (ESCC) by bioinformatics. Methods : Four datasets (GSE70409, GSE26886, GSE161533, GSE17351) were downloaded from GEO database, and the differential expression of mRNA was screened by Limma software package. Gene ontology function and kyoto encyclopedia of genes and genomes pathway enrichment analysis were performed using the database for annotation; Protein–protein interaction network of these DEGs was constructed based on STRING database; then, the hub gene were identified by six algorithms of Cytoscape software. Using GEPIA2 database and OncoLnc database to analyze the survival of hub genes. Immune infiltration was analyzed by TIMER2 database and CIBERSORT deconvolution method. Results : A total of 244 differentially expressed genes （ DEGs ） were screened from 4 data sets of GEO database, including 164 up-regulated genes and 80 down-regulated genes, which were mainly enriched at extracellular region and extracellular space, and involved in biological processes such as cell adhesion and proteolysis related to tumor invasion and metastasis. KEGG results showed that DEGs were mainly enriched in PI3K-Akt signal pathway and extracellular matrix (ECM) receptor interaction signal pathway. Through the protein interaction analysis of DEGs, we found that three hub genes (CXCL8, MMP9 and MMP13) were highly expressed in ESCC. However, the GEPIA2 database shows that only CXCL8 is associated with the overall survival of ESCC patients. The results of immune infiltration showed that macrophages and resting dendritic cells increased significantly, while mast cells and monocytes decreased significantly. Conclusion : Three hub genes and immune infiltrating cells may play an important role in ESCC, and are expected to become potential tumor markers of ESCC and be used in the diagnosis and treatment of ESCC. level of both increases with the increase of tumor invasiveness 26 . The same results were also confirmed in prostate cancer 9 , knockout of CXCL8 in PC-3M-LN4 cells decreased the expression of MMP9, while up-regulation of CXCL8 increased the expression of MMP9. A study on the analysis of gastric cancer microarray data showed that the co-expression of CXCL8 and MMP9 could be an early detection marker of perineural invasion(PNI) 27 . To sum up, these three hub genes are of great value and significance as tumor markers and prognostic indicators of ESCC.


Introduction
Esophageal cancer (ESCA) is one of the most common malignant tumors in the world. In the global cancer statistics 2020 1 , the incidence of esophageal cancer ranks seventh, and esophageal squamous cell carcinoma(ESCC) accounts for about 90% of esophageal cancer. Due to the absence of typical and early clinical symptoms and the lack of simple and effective early clinical diagnosis methods, esophageal squamous cell carcinoma was often diagnosed in the middle and late stage. Up to now, surgical treatment is still the main treatment for ESCC in the early stage, but the five-year clinical survival rate is still not ideal 2 . So far, the molecular mechanism of the occurrence and development of ESCC has not been fully clarified, and there is an urgent need to find more potential tumor markers for effective evaluation of diagnosis and prognosis.
CXCL8, also known as IL-8, has been shown to up-regulate in many malignant tumors and is related to tumor progression. High levels of CXCL8 in serum and tissues are often associated with distal metastasis and poor prognosis of ESCC 3 . At present, numerous studies have demonstrated that CXCL8 promotes the progression of ESCC mainly through the recruitment of tumorassociated macrophage 4 and myeloid-derived suppressor cell 5 . MMP9 and MMP13 belong to the matrix metalloenzyme family. Gu et al 6 found that MMP9 could be a negative independent prognostic factor of ESCC, which played a role in early ESCC with MMP13. Some scholars believe that the degree of immune cell infiltration in tumor microenvironment is helpful to analyze the immune status of tumor and evaluate the effect of immunotherapy 7 . Most immune cells can secrete CXCL8, previous studies have found that CXCL8 in bladder cancer 8 and prostate cancer 9 can increase the activity of MMP9 and promote further metastasis of tumor cells.
Meanwhile, MMP9 and MMP13 are highly expressed in tumors such as ESCC, and MMPs may become an excellent biomarker 10 . With the development of gene chip technology and highthroughput sequencing technology, more and more valuable biomolecules [11][12][13] can be identified by mining big data with bioinformatics technology. Therefore, screening the potential tumor markers of ESCC is of great significance for its early diagnosis, treatment and prognosis.
In this study, differentially expressed genes (DEGs) were screened from Gene Expression Omnibus (GEO). Then, we established a protein-protein interaction network and do gene enrichment analysis s to find the important hub genes. Finally, the immune infiltration analysis was carried out to explore the infiltration of all kinds of immune cells in ESCC, which provided a new idea for the prognosis diagnosis and treatment of ESCC.

Data processing of DEGs
Limma package (version: 3.40.2) of R software was used to study the differential expression of mRNAs. The adjusted P-value was analyzed to correct for false positive results in GEO datasets. "Adjusted P < 0.05 and Log (Fold Change) >1 or Log (Fold Change) < −1" were defined as the thresholds for the screening of differential expression of mRNAs. The overlapping upregulation and down-regulation DEGs from these four datasets were identified using the Sangerbox online tool (http://sangerbox.com/Tool).

Analysis of GO enrichment and KEGG pathway
Gene annotation plays a crucial role in the identification of biological characteristics in transcriptome data. Gene Ontology and KEGG are widely used in functional enrichment analyses; gene functions can be classified into biological process (BP), cellular component (CC), molecular function (MF). KEGG stores a lot of data about genomes, biological pathways, diseases, chemical substances, and drugs. In this study, the DAVID database(https://david.ncifcrf.gov/) was used to conduct the Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses in overlapping DEGs. The p value < 0.05 was considered statistically significant.

Protein-protein interaction (PPI) network construction and module analysis
The Search Tool for the Retrieval of Interacting Genes (STRING) 14 is an interacting gene database designed to analyze PPI information .The PPI network of DEGs was constructed by using STRING(https://string-db.org/). An interaction score of 0.7 was regarded as statistically significant. Then the PPI network was built and visualized in Cytoscape software (3.8.2). Next, we use molecular complex detection (MCODE) plugin to determine the significant PPI network modules. In addition, six algorithms 15 in the CytoHubba plugin were used to screen hub genes.

Verification of hub genes in multi-database
The GEPIA2(http://gepia.cancer-pku.cn/) is an online public accessed platform, which includes TCGA and GTEx data. Accompany with gene expression data of 32 types of cancers which are derived from 10,882 RNA-seq and 10,546 miRNA-seq data, ENCORI (http://starbase.sysu.edu.cn/)allows researchers performing gene analysis. In the present study, expression level of these identified hub genes in ESCC was analyzed at GEPIA2 database and ENCORI database, and the cutoff values were set as | log2FC | = 1.0 and p value = 0.01.
Meanwhile, the GEPIA2 was applied to analyze expression of hub genes in different stages of ESCC.

Hub Gene correlation and Survival Analysis
We employed GEPIA2 to perform the correlation between hub genes. The survival time of patients with ESCA was analyzed by GEPIA2 in order to explore the correlation between gene expression and prognosis of patients with ESCA. The percentiles of the high expression group and the low expression group were set to 70% and 30%.

Immune infiltration analysis
In this experiment, we estimated the associations between hub gene expression and immune cell population (B cells, CD8+T cells, CD4+T cells, neutrophils, macrophages and DC cells) in Timer2.0 database (http://timer.cistrome.org/), p value < 0.01 was considered to be statistically significant. At the same time, the infiltration and distribution of 22 kinds of immunocytes in ESCC group and normal group were obtained by CIBERSORT deconvolution algorithm.

Identification of DEGs
In this study, 4 gene expression profiles were selected from GEO. After screening the DEGs according the criteria, 1694, 3209, 583 and 1889 genes were identified from GSE70409, GSE26886, GSE161533 and GSE1735, respectively. Among them, the number of up-regulated genes were 941, 1614,328 and 873 respectively. Meanwhile, the down-regulated genes were 753,1595,255 and 1016, respectively. Then, the intersection of the DEG profiles was get using venn analysis. Finally, we found 164 up-regulated genes (Fig1a) and 80 down-regulated genes (Fig1b).

Gene ontology and pathway functional enrichment analysis
As shown in (Tab2),the results of GO analysis showed that the overlapping DEGs were principally enriched in the biological functions ,especially in the proteolysis, extracellular matrix organization, oxidation-reduction process, cell adhesion and positive regulation of cell proliferation; In terms of cell composition, the DEGs were mainly related to extracellular exosome , extracellular region, extracellular space, endoplasmic reticulum and endoplasmic reticulum membrane; In regard to molecular function, the DEGs were mainly binding to calcium ion and actin, affecting the activities ofserine-type endopeptidase , oxidoreductase and metallopeptidase. Also, KEGG pathway analysis showed that the overlapping DEGs were dramatically concentrated in the PI3K-Akt signal pathway and ECM-receptor interaction.

PPI network construction and hub gene identification
The protein interaction of DEGs was constructed by STRING. The PPI network for DEGs is demonstrated in (Fig2a), which included 103 nodes and 175 edges, with a comprehensive score of more than 0.7 p < 1.0e-16. In addition, six algorithms in the CytoHubba plugin were used to evaluate the PPI network. And we selected the top 10 genes of each algorithm (Tab3). Based on the intersection of the results of the six algorithms, the three hub genes CXCL8, MMP9 and MMP13 were screened. At the same time, we found that three hub genes appeared in the same module in MCODE analysis(Fig2b).

Verifying of the hub genes
In order to verify the dependability of the results from the bioinformatics analysis, we next  (Fig4a-c). The online database ENCORI was also used to confirm the significant overexpression of three hub genes in ESCC (Tab4). These results suggest that the expression levels of CXCL8, MMP9 and MMP13 were higher in ESCC tissues than in normal esophageal tissues, indicating their potential oncogenic effects..

Association of the hub genes with survival time of patients
The online database GEPIA2 were used to confirm the association of the hub genes with the survival curve of patients. The results of total survival analysis of hub genes in ESCA showed that there was a significant correlation between the high expression of CXCL8(P=0.026) and the overall survival time of patients(Fig5a). However, there was no significant correlation between MMP9 (P=0.49), MMP13 (P=0.53) and prognosis of patients (Fig5b-c). Furthermore ,we found that there was a certain correlation between the expression of the three hub genes in ESCA, and the p values were＜0.05(Fig5d-f).

Association of the hub genes with tumor-infiltrating immune cells
In this study, we attempted to explore the relationship between the mRNA expression of hub genes and immune cell infiltration using TIMER2.0. As shown in (Fig6a-c), each of the hub genes correlated with tumor purity in ESCA. We observed that there was a certain correlation between

Discussion
Esophageal squamous cell carcinoma is one of the most common malignant tumors, which is often diagnosed as middle and advanced stage. As we all know, invasion and metastasis are the main biological characteristics of malignant tumors and a marker of human cancer 16 . The main steps of its occurrence include extracellular matrix degradation, cell adhesion and so on. In our study, DEGs were determined by comprehensive analysis of four data sets, and the results showed that DEGs were mainly enriched in extracellular region and extracellular space, and for functional analysis, DEGs were mainly concentrated in cell adhesion and proteolysis related to tumor invasion and metastasis. In addition, the results of KEGG signal pathway analysis also showed that DEGs were enriched in the functional pathways of tumor invasion and metastasis, the changes of PI3K-Akt signal pathway and ECM receptor pathway were involved in tumor invasion and metastasis in many kinds of tumors 17,18 .
In this study, based on six algorithms, three hub genes were screened, among which chemokine CXCL8, also known as interleukin-8, can stimulate neutrophil granules to release and produce superoxide and hydrogen peroxide 19  In our study, CIBERSORT was utilized to analyze and evaluate the immune cell infiltration of ESCC. We found that compared with normal tissues, the levels of macrophage, myeloid dendritic cell resting and T cell follicular helper in ESCC group were higher, while the levels of B cell, monocyte, CD8+T cell and mast cell activated were lower. Chen et al 28 found that M0 and M1 macrophage were significantly increased in ESCC, and M1 macrophage was involved in inhibiting the invasion and migration of ESCC cells. In large body of experimental evidence has found that M1 macrophage can produce cytokines such as IL1, IL-12 and TNF-α, which produce anti-tumor effect 29  To sum up, this study uses a variety of bioinformatics methods to systematically analyze the DEGs of ESCC, which provides clues for exploring the mechanism of ESCC. However, this study only analyzes the information data, and further experimental studies are needed to confirm the above results in the future. Overall, these findings suggest that three hub genes may be potential biomarkers in ESCC.