Bioinformatics analysis of aberrantly methylated-differentially expressed genes in gastric cancer

Background: This study aimed to identify aberrantly methylated-differentially expressed genes in gastric cancer (GC). Methods: We downloaded the gene expression microarray dataset GSE118916 and the gene methylation microarray dataset GSE25869 from the Gene Expression Omnibus (GEO) database. Differentially expressed genes (DEGs) and differentially methylated genes (DMGs) were identified with the limma software package and overlapped using a Venn diagram. A protein-protein interaction (PPI) network was constructed, and enrichment analysis was conducted with the DAVID database. The GEPIA online tool, the Oncomine database, the Human Protein Atlas (HPA), and the cBioPortal tool were used to verify the hub genes. Results: We obtained 110 hypomethylated, highly expressed genes (Hypo-HGs), 9 upregulated-hypomethylated oncogenes, 23 hypermethylated, lowly expressed genes (Hyper-LGs), and 2 downregulated-hypermethylated tumor suppressor genes. The biological processes of Hypo-HGs mainly involved cell adhesion and extracellular matrix organization, whereas those of Hyper-LGs mainly involved response to nicotine and xenobiotic metabolic process. KEGG analysis showed that Hypo-HGs were significantly enriched in Focal adhesion, PI3K-Akt signaling pathway, and ECM-receptor interaction, and that Hyper-LGs were significantly enriched in Drug metabolism-cytochrome P450, Chemical carcinogenesis, and Metabolism of xenobiotics by cytochrome P450. The hub genes were COL1A1, THBS1, COL5A2, COL12A1, and CXCR4. Conclusion: COL1A1, THBS1, COL5A2, COL12A1, and CXCR4 may be used as targets for the precise diagnosis and treatment of GC. Focal adhesion, PI3K-Akt signaling pathway, and ECM-receptor interaction are important mechanisms of GC.


Introduction
Gastric cancer (GC) is an epithelial malignant tumor that originates in the stomach and is the third most common cause of cancer death, second only to lung cancer and liver cancer [1]. The occurrence of GC is affected by many factors, including dietary, environmental, and genetic factors [2][3][4]. Treatment of GC mainly includes surgery, chemotherapy, radiotherapy, immunotherapy, and targeted therapy [5]. Although treatment methods are constantly evolving, the survival rate remains low because advanced GC is prone to recurrence and metastasis [6]. According to statistics, the 5-year survival rate of GC patients is only 20% to 30% [7]. Therefore, searching for hub genes and biomarkers that affect the occurrence and development of GC is of great significance for the early diagnosis, treatment, prognosis, and drug discovery of GC.
Epigenetics refers to heritable changes in gene expression that are not mediated by changes in the DNA sequence [8]. DNA methylation in gene promoter regions is related to the silencing of oncogenes and tumor suppressor genes and is considered a hallmark of many tumors [9]. Although some studies have confirmed that certain genes show abnormal DNA hypermethylation or hypomethylation in GC [10,11], it is still difficult to determine the comprehensive profile and pathways of the interaction network.
In recent years, high-throughput microarray platforms have been used to screen for genetic and epigenetic changes in cancer [12]. Many gene expression profiling microarray analyses and abnormal methylation studies have been carried out with these platforms, and various differentially expressed genes (DEGs) [13] and differentially methylated genes (DMGs) [14] in GC have been discovered. However, these studies did not conduct a joint analysis, which may cause some hub genes to be missed. In this study, we combined gene expression profiles and gene methylation microarray data to identify abnormally methylated and differentially expressed genes and pathways between GC tissue and normal tissue. We then used the GEPIA database, the Oncomine database, the HPA, and the cBioPortal tool to identify the hub genes involved in the pathogenesis of GC. The protocol of our study procedures is shown in Fig. 1.

Microarray data
The GC gene expression dataset GSE118916 and the methylation dataset GSE25869 were retrieved by searching the keyword "gastric cancer" in the GEO database.

Gene ontology and pathway functional enrichment analysis
The 110 overlapping hypomethylated/upregulated genes and the 23 overlapping hypermethylated/downregulated genes were imported into the DAVID database [17] (https://david.ncifcrf.gov/) for GO function [18] and KEGG pathway enrichment analysis [19], with the species restricted to Homo sapiens. The GO enrichment analysis comprised biological process (BP), cellular component (CC), and molecular function (MF).
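As an illustration of the statistics behind over-representation analysis, the following minimal sketch computes a one-sided hypergeometric p-value for a single annotation term. It is not DAVID's implementation (DAVID reports its own modified Fisher exact score); the function name `enrichment_p_value` and the example counts are hypothetical and chosen only for illustration.

```python
from math import comb


def enrichment_p_value(background: int, term_genes: int,
                       list_size: int, overlap: int) -> float:
    """One-sided hypergeometric p-value, P(X >= overlap).

    background : number of annotated genes in the reference set
    term_genes : number of background genes annotated to the term
    list_size  : number of genes in the submitted list
    overlap    : number of submitted genes annotated to the term
    """
    upper = min(term_genes, list_size)
    total = comb(background, list_size)
    # Sum the probability of every outcome at least as extreme as the observed one.
    tail = sum(
        comb(term_genes, k) * comb(background - term_genes, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 110 submitted genes, 12 of which fall in a term
# that annotates 200 of 20000 background genes.
p = enrichment_p_value(background=20000, term_genes=200, list_size=110, overlap=12)
print(f"p = {p:.3e}")
```

In practice such p-values would also be corrected for multiple testing across all GO and KEGG terms before terms are ranked.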

Validation of the hub genes in the TCGA database
The top 10 Hypo-HGs and Hyper-LGs ranked by Degree value, the 9 upregulated-hypomethylated oncogenes, and the 2 downregulated-hypermethylated tumor suppressor genes were entered into the GEPIA online tool (http://gepia.cancer-pku.cn/index.html) to verify their expression in TCGA-STAD and to draw Kaplan-Meier overall survival (OS) curves. The hub genes were then identified.
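For readers unfamiliar with the survival curves mentioned above, the following minimal sketch shows the Kaplan-Meier product-limit estimator on toy data. It is not GEPIA's code; the function name `kaplan_meier` and the follow-up values are hypothetical. A real analysis would compute one curve per patient group (for example, high versus low expression of a gene, which is an assumption here) and compare the groups with a statistical test.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  : follow-up duration of each patient
    events : 1 if death was observed at that time, 0 if the patient was censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # every patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving this time point.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Toy follow-up data (hypothetical): months and event indicators.
months = [1, 2, 2, 3, 4]
died = [1, 1, 0, 1, 0]
for t, s in kaplan_meier(months, died):
    print(t, round(s, 3))
```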

Analysis of hub genes in Oncomine database
Oncomine (https://www.oncomine.org/) [20] is an online cancer microarray database with an integrated data-mining platform. The Oncomine database was used to confirm the expression of the 5 hub genes in 20 different types of cancer and to explore the mRNA expression differences between GC and normal gastric tissue.

Validation in HPA and Genetic information of the hub genes
In this study, the protein expression and distribution of the hub genes in GC tissues were compared with those in normal tissues using the HPA (https://www.proteinatlas.org/) [21]. An enrichment analysis circle diagram of the hub genes was also drawn. The cBioPortal tool (http://www.cbioportal.org/) was used to explore the genetic information of the GC hub genes and the correlation between messenger RNA (mRNA) expression and DNA methylation.

DEGs and DMGs in GC
The GSE118916 expression matrix contained 1163 DEGs, including 528 up-regulated genes and 635 down-regulated genes, which were drawn into a cluster heat map (Fig. 2A) and a volcano map (Fig. 2B). A total of 2589 DMGs were obtained from GSE25869, including 680 hypermethylated genes and 1909 hypomethylated genes, which were drawn into a cluster heat map (Fig. 3).

Fig. 2B The volcano map visualizes all DEGs. The red dots represent up-regulated genes, the blue dots represent down-regulated genes, and the gray dots represent genes that are not differentially expressed.

Fig. 3 Heat map of DMGs
The hierarchical clustering of the heat map reveals the DMGs in GC. Orange and purple indicate higher and lower expression levels, respectively.
Hypermethylation may trigger tumorigenesis by downregulating the expression of these genes (Fig. 4B).
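The overlap step that yields the Hypo-HGs and Hyper-LGs can be sketched as two set intersections, which is the operation a Venn diagram visualizes. This is a minimal illustration, not the authors' script; the function name `overlap_genes` and the placeholder gene labels are hypothetical.

```python
def overlap_genes(up_degs, down_degs, hypo_dmgs, hyper_dmgs):
    """Intersect expression and methylation gene lists.

    Returns (hypo_hgs, hyper_lgs):
      hypo_hgs  = up-regulated DEGs that are also hypomethylated DMGs
      hyper_lgs = down-regulated DEGs that are also hypermethylated DMGs
    """
    hypo_hgs = set(up_degs) & set(hypo_dmgs)
    hyper_lgs = set(down_degs) & set(hyper_dmgs)
    return sorted(hypo_hgs), sorted(hyper_lgs)


# Placeholder labels only; they do not describe the real datasets.
up = ["GENE_A", "GENE_B", "GENE_C"]
down = ["GENE_X", "GENE_Y"]
hypo = ["GENE_B", "GENE_C", "GENE_Z"]
hyper = ["GENE_Y", "GENE_W"]

hypo_hgs, hyper_lgs = overlap_genes(up, down, hypo, hyper)
print(hypo_hgs, hyper_lgs)
```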

Fig. 5 Protein-Protein Interaction Network of Hypo-HGs
A larger Degree or combined-score value indicates a larger node size and deeper node color.
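The Degree value used to size the nodes is the number of distinct interaction partners of each gene. A minimal sketch of how genes could be ranked by Degree from an exported edge list follows; the function name `rank_by_degree` and the toy edges are hypothetical, and the sketch ignores edge weights such as the combined score.

```python
from collections import defaultdict


def rank_by_degree(edges, top_n=10):
    """Rank nodes of an undirected network by their number of distinct neighbours."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbours[a].add(b)
        neighbours[b].add(a)
    # Sort by descending degree, then alphabetically so that ties are stable.
    ranked = sorted(neighbours.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(gene, len(partners)) for gene, partners in ranked[:top_n]]


# Toy edge list (hypothetical); a duplicate edge in reverse order is counted once.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "A")]
print(rank_by_degree(toy_edges, top_n=3))
```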

Gene ontology and pathway functional enrichment analysis
The GO analysis of Hypo-HGs showed that BP terms were mainly concentrated in cell adhesion and extracellular matrix organization, CC terms in the extracellular region, extracellular space, and extracellular exosome, and MF terms in protein binding, calcium ion binding, and heparin binding.
KEGG analysis results showed that Hypo-HGs were significantly enriched in Focal adhesion, PI3K-Akt signaling pathway, and ECM-receptor interaction. R was used to draw the GO and KEGG bubble charts according to the count values of the top 10 terms (Fig. 6; Table 1).

Fig. 6 Bubble diagram for GO and KEGG enrichment analysis of Hypo-HGs
The size of the bubble in the figure represents the number of enriched genes, and the difference in bubble color represents the significance of gene enrichment.

Validation of the hub genes in the TCGA database
GEPIA database was used to view Hypo-HGs (Fig. 7A

Analysis of the hub genes in Oncomine database
The results from the Oncomine database showed that COL1A1, THBS1, COL5A2, and COL12A1 were all expressed in GC with statistical significance. Although CXCR4 was not expressed in GC, 41 studies showed its expression in other tumors (Fig. 9).

Validation in HPA and Genetic information of the hub genes
The HPA online tool was used to analyze the protein expression of COL1A1, THBS1, COL5A2, COL12A1, and CXCR4 (Fig. 10). The results showed that the COL1A1 protein was not expressed in normal gastric tissues and showed low expression in GC tissues. The THBS1 protein was highly expressed in normal gastric tissues but not expressed in GC tissues. The COL12A1 protein showed low expression in normal gastric tissues but was not expressed in GC tissues. No pathological images of COL5A2 or CXCR4 expression were available in the HPA database. We then drew an enrichment analysis circle diagram of the hub genes (Fig. 11). The cBioPortal tool showed that 101 of 393 patients with gastric adenocarcinoma (26%) had genetic mutations in these five genes (Fig. 12A). An overview of the genetic variation of the 5 hub genes was also analyzed (Fig. 12B). Fig. 12C shows the correlation between COL12A1 mRNA expression and DNA methylation. No data plot was available in the database for the correlation between mRNA expression and DNA methylation of COL1A1, THBS1, COL5A2, or CXCR4.
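A correlation between mRNA expression and DNA methylation, such as the one shown for COL12A1, can be quantified with a correlation coefficient. The following minimal sketch computes Pearson's r on paired per-sample values; the function name `pearson_r` and the toy values are hypothetical, and this is not a reproduction of the cBioPortal computation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Toy paired values (hypothetical): methylation level and mRNA expression per sample.
methylation = [0.1, 0.2, 0.4, 0.7, 0.9]
expression = [9.5, 8.8, 7.1, 5.0, 4.2]
print(round(pearson_r(methylation, expression), 3))
```

A negative coefficient would be in line with the usual expectation that higher promoter methylation accompanies lower expression.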

Fig. 11 Enrichment analysis circle diagram of hub genes
To explore the functions of the 5 hub genes in GC, we conducted an enrichment analysis of them.
The results showed that the BP terms of the 5 hub genes mainly include collagen fibril organization, collagen

Fig. 12B shows an overview of the genetic variation of the 5 genes. Fig. 12C shows the correlation between COL12A1 mRNA expression and DNA methylation. No data plot was available in the database for the correlation between mRNA expression and DNA methylation of COL1A1, THBS1, COL5A2, or CXCR4.

Discussion
In recent years, despite the continuous development of GC treatment, treatment outcomes and prognosis remain poor because of difficulties in early diagnosis. Explaining the potential molecular mechanism of GC will help its early diagnosis and prognosis. Bioinformatics analysis is increasingly used to screen possible target biomolecules that have a guiding role in tumor diagnosis and treatment [22].
In this study, bioinformatics tools were used to analyze gene expression and methylation datasets. The results showed that the biological processes of Hypo-HGs mainly involved cell adhesion and extracellular matrix organization. Cell adhesion is involved in the pathological and physiological processes of a variety of tumor cells [23], and changes in cell-cell adhesion and cell-matrix adhesion can promote cancer cell metastasis [24].
Intercellular adhesion molecule-1 (ICAM-1) is a member of the immunoglobulin superfamily (IGSF) of adhesion molecules. It is a key protein for intercellular signal communication and is related to a variety of pathological processes. Under conditions such as inflammation, infection, and immune stress, ICAM-1 can be over-activated and overexpressed, and it participates in regulating the body's cellular immune response [25]. High-level expression of ICAM-1 can be detected in GC cells with a high metastasis rate, which shows that ICAM-1 expression is significantly related to the invasion and metastasis of GC and can be used for clinical monitoring of hematogenous and lymphatic metastasis of GC [26]. The extracellular matrix (ECM) is a key component that plays an active role in all cancer characteristics [27] and mediates cell-microenvironment interactions [28]. The biological processes of Hyper-LGs mainly involved response to nicotine and xenobiotic metabolic process. Nicotine can significantly up-regulate the expression of matrix metalloproteinase 7 (MMP7), high expression of MMP7 has been shown to play a key role in cancer invasion, and smoking addiction increases the risk of GC [29]. The xenobiotic metabolic process may regulate the sensitivity of GC [30].
KEGG analysis results showed that Hypo-HGs were significantly enriched in Focal adhesion, PI3K-Akt signaling pathway, and ECM-receptor interaction. Studies have found that focal adhesion is involved in the occurrence and metastasis of GC, and that calcium release-activated calcium modulator 2 (ORAI2) promotes the tumorigenicity and metastasis of GC through PI3K/Akt signal transduction and MAPK-dependent focal adhesion disassembly [31]. The PI3K-Akt pathway is widely distributed in various cells and can regulate a variety of their biological behaviors [32]. An abnormal PI3K-Akt pathway may trigger the occurrence and development of cancer [33]. Studies have shown that this pathway can promote the proliferation of GC cells and inhibit their apoptosis, and that it is closely related to the invasion and metastasis of GC cells [34]. The ECM is an important part of the tumor microenvironment [35], and ECM-receptor interaction plays a crucial role in many cancers [36,37]. Hyper-LGs were significantly enriched in Drug metabolism-cytochrome P450, Chemical carcinogenesis, and Metabolism of xenobiotics by cytochrome P450. Cytochrome P450 family genes participate in the development of GC through the xenobiotic metabolism of cytochrome P450 [38].
Overexpression of cytochrome P450 family 2 subfamily E polypeptide 1 (CYP2E1) promotes the proliferation and invasion of GC cells and inhibits their apoptosis [39]. SPARC, CDH1, and TMEM45B were highly expressed in GC tissues, whereas PXMP2 expression was low in GC tissues. Most of these genes have been studied in GC. It has been confirmed that COL3A1 is overexpressed in other cancers such as bladder cancer and glioblastoma [41,42], but its mechanism of influence in GC is not fully understood. COL1A2 is related to the invasion and metastasis of GC [43], and high expression of COL1A2 may indicate a poor clinical prognosis in patients with GC [44]. High SPARC expression increases tumor cell activity and enhances epithelial-mesenchymal transition and angiogenesis [45]. Pathogenic mutations and germline deletions of CDH1 are important pathogenic factors for early-onset diffuse GC [46]. TMEM45B is abnormally expressed in many types of tumors. Knockdown of TMEM45B can inhibit the JAK2/STAT3 signaling pathway, thereby inhibiting the proliferation, migration, and invasion of GC cells [47]. PXMP2 plays a vital role in lipid and reactive oxygen metabolism and is a new target for studying depression [48], but its effect in tumors has not been reported. COL1A1, COL5A2, and COL12A1 belong to the collagen-forming gene family [49], and each collagen is composed of 3 polypeptide chains numbered with Arabic numerals [50]. The collagen-forming gene family is involved in the formation of collagen in extracellular matrix proteins [51] and is overexpressed in a variety of cancers [52].
Collagen is the main component of the extracellular matrix of GC cells and of the mesenchymal microenvironment, and it can induce tumor cell migration [53]. When GC occurs, collagen synthesis increases and induces epithelial-mesenchymal transition, leading to tumor cell infiltration and metastasis [54]. The expression of COL1A1 is higher in GC tissues than in normal tissues [55] and is related to the prognosis of GC patients [56]. COL5A2 is related to the pathological process of osteosarcoma [57], bladder cancer [58], and GC [59]. COL12A1 is known to be abnormally expressed in connective tissue diseases, and COL12A1 mutations are associated with poor prognosis [60]. COL12A1 is highly expressed in intestinal diffuse GC, and its high expression is associated with poor overall survival (OS) and progression-free survival (PFS) [61]. Studies have found that THBS1 mutations are related to early GC. THBS1 may become a new prognostic target of GC by affecting tumor purity, TMB, TME score, and multiple oncogenic signaling pathways [62]. CXCR4 is related to the aggressiveness of GC [63] and to lymph node metastasis [64]. Blocking CXCR4 can inhibit the growth and invasion of GC cells [65].
Consistent with these reports, this study suggests that CXCR4 may trigger the occurrence of GC through upregulated gene expression after aberrant methylation.

Conclusion
This study combined gene expression microarrays and gene methylation microarrays, conducted a comprehensive bioinformatics analysis, and identified abnormally methylated and differentially expressed tumor-promoting genes and tumor suppressor genes (TSGs) in GC tissues, as well as their related functions and pathways. The five hub genes verified with the TCGA, Oncomine, HPA, and cBioPortal databases are COL1A1, THBS1, COL5A2, COL12A1, and CXCR4, which may serve as targets for the precise diagnosis and treatment of GC. However, because of the limitations of bioinformatics, further experimental studies are needed to verify their molecular mechanisms in detail.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that there are no conflicts of interest.