Identification of the co-expression module-trait relationships and hub genes associated with the diffuse-type gastric cancer.

Diffuse-type gastric cancer (DGC) is a gastric cancer subtype defined by the Laurén histological classification, with a poor prognosis compared with intestinal-type gastric cancer. A comprehensive analysis is needed to identify the pathways and genes involved in DGC. RNA expression data of DGC (including 52 samples, top 5000 genes) and fourteen patient clinical traits were downloaded from The Cancer Genome Atlas (TCGA) database. Co-expression modules and module-trait relationships were constructed by weighted gene co-expression network analysis (WGCNA). Ten co-expression modules were identified from the DGC samples. The turquoise co-expression module correlated positively with longest dimension. The magenta module correlated positively with gender and person neoplasm cancer status. The green module correlated positively with pathologic N stage and pathologic stage. In addition, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment identified cancer-related signaling pathways in the turquoise and green modules, involving proliferation, drug resistance, and cell metabolism. We also identified hub genes in the turquoise, green and magenta modules, including ANK2, GNAO1, MAP6, RERG (turquoise module), MEP1A, GUCY2C, ALDOB (green module), XIST, TTTY15, NCRNA00185, CYorf15B (magenta module); some of these hub genes were related to overall survival in DGC patients. The study provides new biomarkers and potential mechanisms in DGC.


INTRODUCTION
Globally, gastric cancer is the fifth most common cancer and the third leading cause of death from cancer, making up 7% of cases and 9% of deaths [1]. In 1965, Laurén classified gastric cancer into intestinal-type and diffuse-type according to its histological structure and biological behavior [2].
Diffuse-type gastric cancer (DGC) originates from the inherent gastric mucosa and has a poor prognosis. It is poorly differentiated, grows diffusely, lacks cell junctions, and generally does not form glandular ducts. Many poorly differentiated adenocarcinomas and signet ring cell carcinomas belong to DGC.
The incidence of DGC is higher in women than in men [3]. Treatments may include some combination of surgery, chemotherapy, radiation therapy and targeted therapy. The poor prognosis may be because most patients are diagnosed with advanced disease. Identification of new biomarkers and potential mechanisms is urgently needed to improve the prognosis of DGC [4].
The Cancer Genome Atlas (TCGA), a large public database, provides abundant genomic and clinical data for various cancers. So far, a number of biomarkers and molecular mechanisms in many kinds of cancer have been discovered based on TCGA [5]. To identify novel prognostic markers and explore the underlying mechanisms of gastric cancer, weighted gene co-expression network analysis (WGCNA) can be used to cluster highly correlated genes into groups (co-expression modules) [6]. WGCNA has been widely used in various biological contexts, such as yeast genetics, cancer, and the analysis of genetic, proteomic, metabolomic, and imaging data [7,8]. Co-expression module-trait relationships reflect the correlation between gene-based networks and clinical phenotypes. Eigengene networks describe the relationships between modules, and intramodular connectivity helps to identify hub genes in modules of interest.
One study using WGCNA found that a preserved module consisting of 506 genes was associated with clinical traits including pathologic T stage and histologic grade [9]. Another study used WGCNA to identify two novel lncRNAs, PCAT18 and LINC01133, associated with GC development and with metabolic pathways of gastrointestinal disease and function [10]. However, identification of the co-expression module-trait relationships and hub genes associated with diffuse-type gastric cancer has not been reported. Diffuse-type gastric cancer is one subtype of gastric cancer with poor prognosis.
In this study, to investigate DGC-related gene modules and key biomarkers, gene expression data and clinical trait information were obtained from TCGA. WGCNA identified co-expression modules (turquoise, magenta, and green) that were significantly correlated with clinical traits, and a series of hub genes was extracted from them. Some important hub genes were significantly related to overall survival in DGC patients. Our findings might reveal new biomarkers and mechanisms for DGC.

Genetic and clinical data of DGC patients
DGC-related data were downloaded from the TCGA database, including 72 DGC patient samples, 20530 genes, and 14 clinical traits. For all these samples, the clinical information included age at initial pathologic diagnosis, gender, longest dimension, lymph node examined count, neoplasm histologic grade, new tumor event after initial treatment, number of lymph nodes positive, pathologic M stage, pathologic N stage, pathologic T stage, pathologic stage, person neoplasm cancer status, radiation therapy, and targeted molecular therapy.
Genes with missing expression values (expression = 0) in more than 20% of samples were excluded, and the top 5000 remaining genes were extracted for further study.
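The filtering step can be illustrated with a minimal Python sketch. It assumes that "missing" means an expression value of exactly 0 and that the "top" genes are ranked by variance across samples; the ranking criterion is not stated in the text, so variance is only an assumption here. The function name `filter_genes` and the toy gene names are hypothetical.

```python
from statistics import pvariance


def filter_genes(expr, max_zero_frac=0.20, top_n=5000):
    """Drop genes with too many zero values, then keep the most variable ones.

    expr maps a gene name to its list of expression values (one per sample).
    Ranking by variance is an assumption; the text does not name the criterion.
    """
    kept = {}
    for gene, values in expr.items():
        zero_frac = sum(1 for v in values if v == 0) / len(values)
        # Exclude only genes with MORE than max_zero_frac zeros.
        if zero_frac <= max_zero_frac:
            kept[gene] = values
    ranked = sorted(kept, key=lambda g: pvariance(kept[g]), reverse=True)
    return ranked[:top_n]


# Toy data: four hypothetical genes measured in five samples.
toy = {
    "GENE_A": [0, 0, 0, 5, 7],  # 60% zeros, excluded
    "GENE_B": [1, 9, 2, 8, 3],
    "GENE_C": [4, 4, 5, 4, 4],  # kept by the zero filter, but low variance
    "GENE_D": [0, 6, 1, 7, 2],  # exactly 20% zeros, kept
}
selected = filter_genes(toy, top_n=2)
print(selected)
```

With real data the same function would be called with `top_n=5000` on the full gene-by-sample table.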

Construction of weighted gene co-expression network
To find clusters (modules) of highly correlated genes in the DGC samples, WGCNA was used to select an appropriate scale-free fit index (R²) for constructing the co-expression modules.
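As an illustration of how a scale-free fit index can be evaluated, the following Python sketch computes an unsigned adjacency |cor|^β for several candidate powers β, derives each gene's connectivity, and regresses log10 p(k) on log10 k over binned connectivities to obtain R². This is a simplified sketch under stated assumptions, not the WGCNA implementation: the binning scheme, the unsigned network type, the candidate powers, and the simulated data are all assumptions.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def connectivity(genes, beta):
    """k_i = sum over j != i of |cor(i, j)| ** beta (unsigned adjacency)."""
    n = len(genes)
    k = [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):
            a = abs(pearson(genes[i], genes[j])) ** beta
            k[i] += a
            k[j] += a
    return k


def scale_free_r2(k, n_bins=5):
    """R^2 of the linear fit of log10 p(k) against log10 k over binned k."""
    lo, hi = min(k), max(k)
    if hi == lo:
        return 0.0
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for value in k:
        bins[min(int((value - lo) / width), n_bins - 1)].append(value)
    xs, ys = [], []
    for members in bins:
        if members and sum(members) > 0:  # skip empty bins
            xs.append(math.log10(sum(members) / len(members)))
            ys.append(math.log10(len(members) / len(k)))
    if len(xs) < 3 or len(set(xs)) < 2 or len(set(ys)) < 2:
        return 0.0  # too few informative bins for a regression
    return min(1.0, pearson(xs, ys) ** 2)


# Simulated data: 30 genes x 12 samples sharing one latent factor.
rng = random.Random(0)
shared = [rng.gauss(0, 1) for _ in range(12)]
genes = []
for g in range(30):
    w = g / 30  # weight of the shared factor for this gene
    genes.append([w * s + (1 - w) * rng.gauss(0, 1) for s in shared])

fit = {}
for beta in (1, 4, 8):  # hypothetical candidate powers
    k = connectivity(genes, beta)
    fit[beta] = (scale_free_r2(k), sum(k) / len(k))
    print(beta, fit[beta])
```

In practice one would pick the smallest power whose R² exceeds a chosen cutoff while mean connectivity remains reasonable; the cutoff itself is a design choice that the text does not specify.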
The next step was module detection and network construction. Modules with fewer than 30 genes were merged into one module using the blockwiseConsensusModules function of the WGCNA R package. The third step was to calculate the associations between clinical variables and the co-expression modules using the module eigengene (ME). Each module's ME therefore serves as a summary measure of the overall co-expression within that module. In addition, the gene significance (GS) was defined as the log-transformed p-value of each gene (GS = lg P) in the linear regression between gene expression and the clinical traits.
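The role of the module eigengene can be sketched in Python as follows. The ME is taken here as the first principal component of the standardized module expression matrix, computed by power iteration, and then correlated with a clinical trait. This is a minimal sketch assuming complete data; the function names, the toy expression values, and the trait values are hypothetical, and the sign convention (aligning the ME with the average expression profile) is an assumption.

```python
import math


def standardize(values):
    """Center to mean 0 and scale to unit sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def pearson(x, y):
    """Pearson correlation via standardized values."""
    zx, zy = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


def module_eigengene(module_expr, n_iter=100):
    """First principal component (one value per sample) by power iteration."""
    z = [standardize(gene) for gene in module_expr]
    n_samples = len(z[0])
    # Start from a gene profile: a constant vector would be orthogonal
    # to every centered gene and the iteration would collapse to zero.
    v = list(z[0])
    for _ in range(n_iter):
        scores = [sum(g[s] * v[s] for s in range(n_samples)) for g in z]
        v = [sum(sc * g[s] for sc, g in zip(scores, z)) for s in range(n_samples)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    # Orient the eigengene along the average expression profile.
    mean_profile = [sum(g[s] for g in z) / len(z) for s in range(n_samples)]
    if sum(a * b for a, b in zip(v, mean_profile)) < 0:
        v = [-x for x in v]
    return v


# Toy module of three genes across six samples (hypothetical values).
module_expr = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [2.0, 4.0, 5.0, 8.0, 10.0, 13.0],
    [1.5, 1.9, 3.2, 3.8, 5.1, 5.9],
]
longest_dimension = [1.0, 2.1, 2.9, 4.2, 5.0, 6.1]  # hypothetical trait

me = module_eigengene(module_expr)
module_trait_cor = pearson(me, longest_dimension)
print(round(module_trait_cor, 3))
```

Correlating each gene's expression with the ME in the same way gives a module membership value for that gene, which is the quantity usually plotted against GS.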

Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis based on the co-expression modules
KEGG is a reference knowledge base integrating chemical information, systems information, and genomic information. KEGG pathway enrichment analysis of the co-expression modules was performed with DAVID (https://david.ncifcrf.gov/home.jsp) at a statistical significance level of p < 0.05.
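The idea behind pathway enrichment can be sketched with a one-sided hypergeometric test: given a background of N genes, of which K belong to a pathway, how surprising is it to see at least k pathway genes in a module of n genes? The Python sketch below is a generic over-representation test, not DAVID's exact statistic, which may differ; the function name and the example counts are hypothetical.

```python
from math import comb


def enrichment_p(background, pathway_size, module_size, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    X is the number of pathway genes seen when module_size genes are drawn
    without replacement from a background containing pathway_size pathway genes.
    """
    upper = min(pathway_size, module_size)
    total = comb(background, module_size)
    tail = sum(
        comb(pathway_size, i) * comb(background - pathway_size, module_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 5000 background genes, a 100-gene pathway,
# a 200-gene module, and 12 genes shared between module and pathway.
p_value = enrichment_p(5000, 100, 200, 12)
print(p_value < 0.05)
```

The expected overlap in this example is 200 × 100 / 5000 = 4 genes, so observing 12 is well above chance and the pathway would pass the p < 0.05 threshold.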

Identification of hub genes with molecular complex detection
Hub genes were selected using a molecular complex detection (MCODE) score >6 in Cytoscape software (version 3.2.0; National Resource for Network Biology) and a statistical significance of p < 0.05 [11].

Construction of the protein-protein interaction (PPI) network and biological processes (BP) analysis of hub genes
The PPI network of hub genes was constructed with the STRING database.

Statistical analysis
All data were analyzed with R software 3.4.1 (https://www.r-project.org/). In all cases, p < 0.05 was considered statistically significant, with Benjamini-Hochberg correction for multiple testing.
Table 2 and these co-expression modules were represented in different colors (Figure 3A). A heatmap of the topological overlap matrix was plotted based on this gene network, showing the correlation between different co-expression modules (Figure 3B).

Gene co-expression modules correspond to clinical traits
A module-trait relationship heat map was constructed for the ten identified modules (Figure 4 and Supplementary Table 4). The number of genes in each of the ten co-expression modules is shown in Table 1. The results revealed that MM in the turquoise co-expression module was significantly correlated with longest dimension (cor=0.55, p<0.0001) (Figure 6A). MM in the magenta module was significantly correlated with gender (cor=1, p<0.0001) and person neoplasm cancer status (cor=-0.65, p<0.0001) (Figure 6B). MM in the green module was significantly correlated with pathologic N stage (cor=0.48, p<0.0001) and pathologic stage (cor=-0.3, p=0.00064) (Figure 6C).

KEGG pathway analysis of genes in co-expression modules of interest
KEGG pathway analysis revealed 28 statistically significant signaling pathways involving genes in the turquoise co-expression module and 12 pathways in the green module (Supplementary Table 5).
No statistically significant signaling pathways were identified in the magenta module. These results suggest that DGC may be related to multiple signaling pathways involving proliferation, drug resistance, cell metabolism, and adhesion molecules. Some of the identified pathways in the turquoise module are closely associated with cancer, including the calcium signaling pathway, the MAPK signaling pathway, the Ras signaling pathway, transcriptional misregulation in cancer, and cell adhesion molecules (CAMs). Calcium signaling is important for cellular signaling, because once calcium ions enter the cytoplasm they exert allosteric regulatory effects on many enzymes and proteins [12]. The MAPK pathway communicates a signal from a receptor on the cell surface to the DNA in the nucleus [13]. When the Ras pathway is 'switched on' by other signals, it subsequently switches on other proteins, which turn on genes involved in cancer cell growth, differentiation and survival [14]. In essence, cell adhesion molecules help cells stick to each other and to their surroundings. Cell adhesion is important in cellular mechanisms of growth, contact inhibition, and apoptosis [15]. Those [16]. Studies have shown that energy metabolism in tumor cells is characterized by aerobic glycolysis, namely the Warburg effect [17].

Hub genes in the co-expression modules
Genes with high intramodular connectivity (MCODE score >6 and p < 0.05) are  Figure 8C).

DISCUSSION
Early symptoms of GC may include upper abdominal pain, heartburn, nausea and loss of appetite, but there are no specific clinical symptoms [18].
Most GC patients are diagnosed at an advanced stage, when the disease can no longer be treated by surgery. Laurén histologic type is one of the most important factors associated with the pattern of recurrence following resection of gastric adenocarcinoma. DGC has a poor prognosis and is accompanied by lymph node metastasis.
The incidence of DGC is higher in women than in men [19]. The study of risk factors, such as pathologic N stage or gender, is therefore necessary [20]. Furthermore, new targeted therapies may provide new insights into the treatment of DGC patients.
In this study, ten co-expression gene modules were constructed by WGCNA from the top 5000 genes of the 72 human DGC samples, in order to identify the relationship between the DGC transcriptome (RNA sequencing data) and clinical traits. The turquoise, magenta, and green co-expression modules were significantly related to some clinical traits. KEGG pathway analysis of the genes in these modules revealed 28 statistically significant signaling pathways in the turquoise module and 12 pathways in the green module. Here, we discuss some important cancer-related pathways. For example, the turquoise module was enriched in the MAPK signaling pathway, and FGF18 may participate in angiogenesis in gastric cancer and may be involved in tumor invasion and metastasis [21]. Additionally, neurotrophic tyrosine kinase receptor type 2 (NTRK2) promotes proliferation and invasion and is related to cell dedifferentiation, tumor budding, and poor prognosis of GC [22]. The combination of microtubule-associated protein tau (MAPT) and β-tubulin III (TUBB3) was found to predict chemosensitivity to paclitaxel in gastric cancer in vitro and in vivo. This merits further study and may help guide individual therapy [23]. The turquoise module was also enriched in the Ras signaling pathway and contained some key genes (FGF18, FGF5, FGF14, FGF9, EFNA3, GNG13, FGF10, FGF13, PAK3, RASGRP2, GNG3, SHC3, GNG4, FGF1, FGF2, SHC4, GRIN2A, IGF1, HGF, HTR7, NGFR, GNB3, PLA2G3, PLA2G5,
Furthermore, long noncoding RNAs (lncRNAs) are a cluster of noncoding transcripts that play key roles in various biological processes in cancer [30]. We found that some lncRNAs in the magenta co-expression module were also related to DGC survival, including XIST, TTTY15, NCRNA00185, and CYorf15B. This result might point to a new mechanism in DGC.
In conclusion, we used WGCNA to identify co-expression modules that were significantly correlated with five clinical traits. These