Investigation of Aberrantly Methylated Differentially Expressed Genes in Pancreatic Cancer by Integrated Bioinformatics Analysis


 Background: Abnormal methylation of DNA sequences plays an important role in the development and progression of pancreatic cancer (PC). The purpose of this study was to identify abnormal methylation genes and related signaling pathways in PC by comprehensive bioinformatic analysis of three datasets in the Gene Expression Omnibus (GEO). Methods: Datasets of gene expression microarrays (GSE91035, GSE15471) and gene methylation microarrays (GSE37480) were downloaded from the GEO database. Aberrantly methylated-differentially expressed genes (DEGs) were analysis by GEO2R software. GO and KEGG enrichment analyses of selected genes were performed using DAVID database. A protein–protein interaction (PPI) network was constructed by STRING and visualized in Cytoscape. Core module analysis was performed by Mcode in Cytoscape. Hub genes were obtained by CytoHubba app. in Cytoscape software. Results: A total of 267 hypomethylation-high expression genes, which were enriched in biological processes of cell adhesion, biological adhesion and regulation of signaling were obtained. KEGG pathway enrichment showed ECM-receptor interaction, Focal adhesion and PI3K-Akt signaling pathway. The top 5 hub genes of PPI network were EZH2, CCNA2, CDC20, KIF11, UBE2C. As for hypermethylation-low expression genes, 202 genes were identified, which were enriched in biological processes of cellular amino acid biosynthesis process and positive regulation of PI3K activity, etc. The pathways enriched were the pancreatic secretion and biosynthesis of amino acids pathways, etc. The five significant hub genes were DLG3, GPT2, PLCB1, CXCL12 and GNG7. In addition, five genes, including CCNA2, KIF11, UBE2C, PLCB1 and GNG7, significantly associated with patient's prognosis were also identified. Conclusion: Novel genes with abnormal expression were identified, which will help us further understand the molecular mechanism and related signaling pathways of PC, and these aberrant genes could possibly serve as biomarkers for precise diagnosis and treatment of PC.

In this study, the gene expressions profiling dataset (GSE91035, GSE62165) and gene methylation profiling dataset (GSE37480) were downloaded from the GEO database (https://www.ncbi.nlm.nih.gov/geo/). The dataset GSE91035 which was produced on the platform GPL22763 (Agilent-039714 LincRNA SurePrint G3 Human GE 8 x 60K Microarray PVD 028004) includes 27 human PDAC and 8 normal pancreatic tissues. The dataset GSE62165 containing 118 PDAC tissues and 13 normal pancreatic tissues was produced on the platforms GPL13667 (Affymetrix Human Genome U219 Array). The methylation gene dataset GSE37480 includes 7 PDAC and 4 normal tissues, and the methylation genes dataset was produced on the platform GPL9767 (Agilent-014791 Human CpG Island ChIP-on-Chip Microarray 244K).
Data processing The GEO2R online software was used to analyze the raw submitter supplied data of microarrays (GSE91035, GSE62165 and GSE37480) and identify differentially expressed genes (DEGs) and differentially methylated genes (DMGs). GEO2R is a widely used web tool which can be used to compare different groups of samples in a GEO series, with the purpose of screening DEGs genes under the experimental conditions. P < 0.05 and |logFC| > 1 were applied as the cut-off criteria to find DEGs. And the cut-off criteria in finding DMGs were P < 0.05 and |t| > 1. Finally, Venn diagram (http://bioinformatics.psb.ugent.be/webtools/Venn/) was used to identify the hypomethylation and high-regulated genes by overlapping the high DEGs in GSE91035 and GSE62165 and the high DMGs in GSE37480. The hypermethylation and down-regulated genes were retrieved similarly as above.

Functional and pathway enrichment analysis
The Database for Annotation, Visualization and Integrated Discovery (DAVID) and R package clusterProfiler were applied to perform the Gene Ontology (GO) analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis for the genes [11,12]. The Gene ontology (GO) analysis was composed of the cellular component, molecular function, and biological process. P < 0.05 was regarded as statistical significance.
PPI network construction and module analysis STRING database (Search Tool for the Retrieval of Interacting Genes) was used to construct PPI network of genes, and an interaction with a combined score > 0.4 was regarded as statistically significant. The Visualization of the PPI network was realized by the Cytoscape software. Then MOCDE was used in Cytoscape to screen the PPI network with the MCODE scores > 5, degree cut-off = 2, node score cut-off = 0.2, Max depth = 100 and k-score = 2. Hub genes were selected with the connection degree > 10 in the HUBITT integrated in Cytoscape. The functional enrichment analysis of the genes in module was achieved in R package clusterProfiler with a significant threshold of P < 0.05.

Validation of hub genes
Oncomine is online microarray analysis database based on RNA sequencing of the tumor. In this study, the Oncomine were used to investigate the hub gene expression between the tumor and normal tissues. The TCGA database comprises comprehensive, multidimensional maps of key cancer genome alterations among various cancers. MEXPRESS (http://mexpr ess.be) is a database which was used for analysis the DNA methylation based on the TCGA database. In this study, the MEXPRESS was used to validate hypomethylation-high expression genes and hypermethylation-low expression genes.

Genetic alterations and survival analysis of hub genes
The cBioPortal (http://www.cbioportal.org/) is a visual web which can provide visualization, online analysis and downloading genome data of various cancer based on the TCGA database. Besides, the cBioPortal tool can be used to analysis the relationship between gene expression and the methylation status. The Cancer Genome Atlas (TCGA) is a collaboration between the National Cancer Institute (NCI) and National Human Genome Research Institute (NHGRI), which has generated comprehensive, multi-dimensional maps of the key genomic changes in 33 types of cancer. GEPIA is web server for analyzing the RNA sequencing expression data of 9,736 tumors and 8,587 normal samples from the TCGA by using a standard processing pipeline [13]. To analyze the effect of hub gene on the prognosis of patients with pancreatic cancer, the GEPIA database was used to analyze the prognosis of patients in the TCGA database.

Identification of aberrantly methylated-differentially expressed genes in PDAC
The GEO2R was used to analyze the DEGs or DMGs in the microarrays (GSE91035, GSE62165 and GSE37408). A total of 5031 high-expression genes were found in the datasets (2300 in GSE91035 and 2731 in the GSE62165), and 2989 low-expression genes were obtained from the datasets (1598 in the GSE91035 and 1391 in the GSE62165). In the methylation dataset GSE37408, 4216 hypomethylation genes and 5335 hypermethylation genes were identified, respectively. 267 high-expression-hypomethylation genes were found by overlapping the high-expression and hypomethylation genes, while 202 low-expression-hypermethylation genes were obtained by overlapping the low-expression and hypermethylation genes (Fig. 1). The expression of the top 25 high-expression-methylation genes and top 25 low-expression-hypomethylation genes in the GSE37408 were shown in the Fig. 2.
GO and pathway functional enrichment analysis GO and KEGG enrichment analyses were performed by R package clusterProfiler and DAVID software was used to explore the function of the different expression genes. As showed in Table 1, the high-expression-hypomethylation genes were enriched in biological processes (BP) of cell adhesion, biological adhesion and regulation of signaling. Molecular function (MF) showed that these genes were correlated with kinase activity, cell adhesion molecule binding and protein complex binding. As for cell component (CC), the enrichments were mainly in cell-substrate junction and adherens junction. Besides, the KEGG pathway assay showed that the enrichment was mostly involved in focal adhesion (Table 2).
However, for the low-expression-hypermethylation genes, just as list in Table. 1, the main genes that were enriched were about the CC of intrinsic component of plasma membrane and bicellular tight junction, and the BP of cellular amino acid biosynthetic process and positive regulation of PI3K activity. The enrichment of MF was mostly in the amino acid transmembrane transporter activity and calmodulin binding. Additionally, KEGG analysis displayed that the genes were associated with the pathway of pancreatic secretion and amino acids biosynthesis (Table 2).
PPI network construction, module analysis and hub gene selection PPI networks were constructed on the basis of STRING database. Module analysis was conducted by MCODE in Cytoscape software. The hub genes were obtained by using the cytoHubba embedded in the Cytoscape. The PPI network of hypomethylation-high expression genes was showed in the Fig. 3A and the hub genes that were detected included EZH2, CDC20, CCNA2, KIF11 and UBE2C. The top modules were displayed in the Fig.3B. The pathway of the core modules demonstrated the important functions of the cell cycle, ECM-receptor interaction, focal adhesion, and PI3K-Akt signaling pathway ( Table 3). As for hypermethylation-high-expression genes, the PPI network was showed in Fig. 4A and the top 5 hub genes were DLG3, GPT2, PLCB1, CXCL12 and GNG7. The core modules of PPI network were displayed in Fig. 4B, C and D, respectively. The pathway associated with the core module was the Chemokine signaling pathway (Table 3).

Validated the hub genes in the TCGA
To further investigate the results, we exam the expression of the hub genes in the Oncomine database. Through analysis the expression of the hub genes, we found the expression of all the hub gene were in consistent with the expression in the GEO datasets ( Figure 5 ) . For validating the methylation status of the hub genes, the online methylation web MEXPRESS were used to investigate the methylation status, and the results shows the overexpression hub genes have significant hypermethylation status, and the downregulation genes in the GEO datasets have hypomethylation status. All the online validated results were in consistent with our bioinformatics analysis ( Figure 6).
Furthermore, in order to verify the expression in protein level of the hub genes, we investigate the immunohistochemical analysis of the hub genes in the Human Protein Atlas database. For the hypomethylated upregulated genes, EZH2, CDC20, CCNA2, KIF11 and UBE2C have a high expression trend in the pancreatic cancer. when it come to the hypermethylated downregulated genes, the results show that these genes were low-expression in the tumor tissue compared with the normal tissue ( Figure 7). However, due to the limitation of the number of samples in the Human Protein Atlas database, the experimental results need to be further verified in a large number of clinical samples.

Genetic alterations and survival analysis of hub genes
In addition, we analysis the gene genetic alterations of the hub genes with the usage of cBioPortal tools. The results show that more 30% of the tumor patient in the cBioPortal have at least one hub gene alteration and the hub gene GNG7 is the most frequently altered gene with a 4% gene alteration rate( Figure 8). Then, we analysis the correlation between the mRNA expression of hub genes and the clinic outcome of the patient basing on the TCGA datasets in the GEPIA. The results show that the hypomethylated upregulated genes CDC20, KIF11, UBE2C have a significant negative correlation with the overall survival of the patients. And hypermethylated downregulated genes PLCB1, GNG7 have a significant positive correlation with the overall survival according to the GEPIA databases( Figure 9). However, the hub genes EZH2, CCNA2, DLG3, GPT2, CXCL12 seem to have no significant correlation the outcome of the patient, which need to explore in a large clinic sample.

DISCUSSION
Pancreatic cancer is a common malignancy of the digestive system with a high mortality rate. The occurrence of pancreatic cancer is a complex biological process, which involves abnormal expression of many tumor-related genes and abnormal activation of signaling pathways [14]. Therefore, the discovery of abnormal expression of genes and related signaling pathways is of great significance for the exploration of the molecular mechanism of tumor genesis. Currently, the use of gene chips and tumor bioinformatics analysis makes it easier to identify abnormal genes and signaling pathways involved in tumor development.
DNA methylation is one of the most important gene epigenetic modifications and is widely involved in tumor development, which was usually occurred on the cytosines that precede a guanine nucleotide [15]. And CPG island usually appears in the promoter region of genes and plays a decisive role in gene expression. Methylation on CpG can alter the spatial structure of DNA, thus affecting the binding of DNA binding proteins to DNA and suppressing the gene expression [16]. So far, a large number of studies have shown that demethylation of oncogenes and hypermethylation of tumor suppressor genes play important roles in the development of tumors. For example, in breast cancer, the DNA methylation of the CLDN6 promoter can down-regulate the CLDN6 expression and induced the enhancement of the invasion ability of the breast cancer [17]. And in pancreatic cancer, the histone demethylase KDM3A can promotes tumorigenesis of cancer by regulating the expression of DCLK1 [18].But previous studies have focused on the effect of a particular methylated gene or an individual's gene methylation profile on tumors. Therefore, systematic analysis of abnormal gene expression and abnormal methylation state can more comprehensively and accurately reveal the effect of methylation on tumor.
As we know, the methylation of gene including suppressor gene and cancer-promoting gene plays an important role in the development of the cancer [7,19]. The methylation of CpG island can suppress the expression of gene [20]. In pancreatic cancer, the high-methylation of genes can promote the proliferation and invasion of the cancer [21]. In addition, the malignancy of cancer can be enhanced through the way of methylation. [22] Thus, it can be concluded that the gene methylation has a significant function in the oncogenesis. In recent years, high-throughput GeneChip analysis provides a convenient and effective way to obtain thousands of aberrant gene [23]. In this study, we analyzed the relationship between the different expression -methylation gene and the prognosis of the patients with PADC.
In present study, we identified 267 hypomethylated, upregulated genes and 202 hypermethylated, downregulated gene with the using the bioinformatic tools. Through the GO analysis, we found that the BP of the hypomethylated, upregulated genes were enriched in the cell adhesion, biological adhesion, regulation of signal transduction, cell morphogenesis involved in differentiation and regulation of signaling. The CC analysis showed the enrichments in the focal adhesion, adherens junction, cell-substrate adherens junction. The MF enrichment of the genes were in the kinase activity and cell adhesion molecule binding. It is reasonable that in the early stage of tumorigenesis, the weaken of the attachment of a tumor cell, either to another cell or to an underlying substrate promotes its migration [24]. On morphological changes, the epithelial-mesenchymal transformation ( EMT ) can promote invasion and migration of the pancreatic cancer cells [25]. In addition, the changes in cellular signaling pathways such as activation of EGFR signal in pancreatic cancer cells are very common during tumor development [26]. The activation of MAPK signaling pathway induced by increased activity of p38 kinase promotes tumor cell proliferation [27]. KEGG pathway enrichment analysis suggested significant enrichments in the pathways of ECM-receptor interaction, PI3K-Akt signaling pathway and focal adhesion. These results are consistent with the facts that the up-regulation of PI3K-Akt signaling pathway promotes the pancreatic cancer [28].
The overview function connection of the hypomethylated, upregulated genes gene was illustrated by PPI network. The top 5 hub genes obtained from the PPI network included EZH2, CDC20, CCNA2, KIF11 and UBE2C. The high expression of EZH2, a histone methyltransferase, promotes EMT transformation of pancreatic cancer and also the tumor metastasis, which is negatively correlated with the prognosis of patients [29,30]. This result is consistent with what were found in the gene enrichment analysis. The CDC20 is an anaphase-promoting complex activator, the high expression of CDC2 is associated with pancreatic cancer differentiation and progression [31]. The CCNA2, a regulator of the cell cycle, is usually abnormally high expressed in cancer cell and can enhance the cell proliferation [32]. The high-regulation of KIF11 in glioblastoma can enhance the invasion and proliferation [33]. In oral cancer and lung cancer, the KIF11 can treated as a prognostic indicator [34,35]. UBE2C is an oncogene in breast cancer and gastric cancer, and the increased expression is negatively correlated with the prognosis of patients [36,37]. However, the relationships between the expression of KIF11 and UBE2C with pancreatic cancer is still unclear and need further study.
For the hypermethylated, downregulated genes, BP analysis showed the genes were mainly enriched in the progresses of nervous system development, and cellular amino acid biosynthetic process. Neural infiltration is a major factor in causing the poor prognosis of pancreatic cancer. Abnormal expression of neural development-related genes promotes neural invasion of pancreatic cancer, while the neural ablation can enhance the prognosis [38]. Abnormal amino acid metabolism is common in the pancreatic cancer cells. The inhibition of glutamine metabolism can suppress Pancreatic Cancer [39]. The CC analysis of the genes showed the enrichments in the neuron part, bicellular tight junction and the intrinsic component of plasma membrane. The KEGG analysis showed the enrichments were in the amino acid transmembrane transporter activity, fibroblast growth factor-activated receptor activity. The results are reasonable that the weaken of the bicellular junction can promote the immigration of the cells, and it had been found that the down-regulation of tight junction-associated MARVEL protein marvelD3 can promote the transformation of EMT [40] and activity decline of the amino acid transmembrane transporter such as folic acid transporter can result in a disorder in the metabolism in cancer.
PPI network of hypermethylated, downregulated genes illustrated the overview of their functional connections, of which top 5 hub genes were also selected: DLG3, GPT2, PLCB1, CXCL12 and GNG7. DLG3, a tumor suppressor, encodes a member of the membrane-associated guanylate kinase protein family. And the down-regulation of DLG3 promotes the invasion and proliferation in glioblastoma, however, in pancreatic cancer, the function of DLG3 has not been reported [41]. CXCL12 plays a role as a tumor suppressor gene in pancreatic cancer, the up-regulation of the CXCL12 suppresses human pancreatic cancer growth and metastasis [42]. The loss of PLCB1 can induce cell damage under oxidative stress and enhance the alpha-synuclein aggregation [43], but the function in the process of pancreatic cancer formation still needs further study. In breast cancer, the expression level of GPT2 is significantly increased, and is related to the pathological grade [44].
Module analysis for the hypermethylated, downregulated gene indicates that the cell cycle， ECM-receptor interaction, focal adhesion, and PI3K-Akt signaling pathway may play great roles in the development of pancreatic carcinoma. It is well-known that the cell cycle is the critical cellular processes with DNA replication and translation and it's arrest can inhibit cell proliferation [45]. As we all know, the adhesion function of cells is the primary step for tumor metastasis [46], and the adhesion function enhanced by the focal adhesion and ECM-receptor interaction can accelerate cancer cell metastasis in the blood vessels. The PI3K signaling pathway is activated in a variety of tumors including pancreatic cancer, and the activation of PI3K signaling pathway promotes the metastasis of tumor cells [47].

Conclusion
In summary, through bioinformatic analyses of gene expression and methylated profile in the microarray datasets, we have discovered a series of genes that are abnormally methylated and expressed in the cancer cell of PADC patients. Five survival-related hub genes (DLG3, GPT2, PLCB1, CXCL12 and GNG7) were found. These results help us to understand the role of apparent modification in the development of pancreatic cancer and can also be used as indicators for early diagnosis and prognosis evaluation of pancreatic cancer. However, the data in this paper needs to be further verified in biological experiments.
Ethics approval and consent to participate :Not applicable Consent for publication : Not applicable Availability of data and materials : The datasets GSE91035, GSE62165 and GSE37480 are available in the Gene Expression Omnibus (GEO, https ://www.ncbi.nlm.nih.gov/ geo/).   The protein levels of (a) EZH2, (b) CDC20, (c) CCNA2, (d) KIF11, (e) UBE2C were upregulated in the tumor compared to the normal tissues. And the protein levels of (f) DLG3, (g) GPT2, (h) PLCB1, (i) CXCL12, (j) GNG7 were down-regulated in the tumor tissues compared to the normal tissues according to the database FIGURE 8 Genetic alterations of ten hub genes in the TCGA pancreatic cancer study using the cBioPortal database. a Genetic alteration frequency of hub genes. Different colors represent different kinds of genetic alterations. b A visual summary of alterations of hub genes in per sample. Each sample is presented in a column with each gene in a row. Different kinds of genetic alterations are highlighted in different colors FIGURE 9 The survival analysis of hub gene in pancreatic cancer. (TCGA database) The mRNA expression levels of (b) CDC20, (d) KIF11, (e) UBE2C have a significant negative correlation with the overall survival of the patients. And the mRNA expression levels of (h) PLCB1, (j) GNG7 have a significant positive correlation with the overall survival according to the GEPIA databases.