Detecting Hub Genes Associated With Lauren Subtype of Gastric Cancer by Weighted Gene Co-Expression Network Analysis


Background: Gastric cancer is a highly heterogeneous malignant tumor. Among the several classification systems, the Lauren classification reflects biological and pathological differences between gastric cancers. Methods: To provide a systematic biological perspective, we employed weighted gene co-expression network analysis (WGCNA) to reveal transcriptomic characteristics of gastric cancer. The GSE15459 and TCGA STAD datasets were downloaded, a co-expression network was constructed, and gene modules were identified. Results: Two key modules, blue and red, were found to be associated with diffuse gastric cancer. Functional enrichment analysis of the genes in these two modules was performed. After validation in the TCGA STAD dataset, we propose 10 genes (HSPB6, TNS1, PGM5, CPXM2, LIMS2, AOC3, CRYAB, ANGPTL1, BOC and TOP2A) as hub genes for diffuse gastric cancer. Finally, these ten genes were associated with gastric cancer survival. Conclusion: These genes deserve more attention, and further experimental study is required to elucidate their roles.


Background
Gastric cancer (GC) is the fifth most common cancer globally. In 2018, it ranked third among all cancers as a cause of death (1). Multiple interactions of genetic, environmental and host factors bring tremendous complexity and heterogeneity to gastric cancer (2). For medical management, several classification systems have been developed, such as the Lauren, WHO, and Goseki & Ming classifications. Based on multi-omics data, the Asian Cancer Research Group (ACRG) and The Cancer Genome Atlas (TCGA) have also issued their own molecular subtype systems (2,3). However, the Lauren classification is still the most commonly used, as it reflects biological differences and individual subtypes within this classification do not transform into others (4).
In 1965, Lauren first described two distinct histologic types of gastric cancer: intestinal and diffuse (5). The intestinal type pathologically exhibits components of glandular or intestinal architecture and is thought to emerge from a sequence of multistep carcinogenesis, often caused by chronic mucosal inflammation due to Helicobacter pylori (H. pylori) infection. This type is more common in male and older patients. The diffuse type manifests as poorly cohesive cells infiltrating the gastric wall and can be a progressive disease ultimately leading to linitis plastica. This type is more common in female and younger patients and is frequently associated with a family history (6,7). A recent study reported the Lauren classification as an indicator of survival and response to chemotherapy in advanced gastric cancer (8). Lauren subtype is also an important factor associated with the pattern of recurrence following resection of gastric adenocarcinoma (8). Thus, uncovering the molecular mechanisms underlying the Lauren subtypes may improve understanding of the pathogenesis of GC and provide potential therapeutic targets (9). Although several typical oncogenes such as CDH1, KRAS and RHOA have been linked to particular subtypes, it is still unclear how intestinal or diffuse GCs are driven by particular genes (10)(11)(12).
Weighted gene co-expression network analysis (WGCNA) is a systems biology method that provides insights into gene networks that might be responsible for phenotypic traits of interest (13,14). In this study, we performed an integrated bioinformatic analysis using this algorithm to identify hub genes for diffuse and intestinal GC based on transcriptomic data from GEO datasets and The Cancer Genome Atlas (TCGA). Furthermore, we found 10 genes that were expressed at distinct levels in diffuse gastric cancer and correlated with overall survival in GC.

Materials And Methods
Data acquisition and study design
Two datasets, GSE15459 and the TCGA stomach adenocarcinoma gene expression data, were used in this study. The gene expression matrix and clinical information of dataset GSE15459 were downloaded from the Gene Expression Omnibus database of NCBI (https://www.ncbi.nlm.nih.gov/gds/). TCGA level 3 data and clinical phenotypic data were downloaded from the UCSC Xena portal (https://xenabrowser.net/datapages/). The data analysis flow is shown in Figure 1. Briefly, the GSE15459 data were normalized, and genes whose expression variation across all samples ranked in the top 50% were selected to construct gene modules by WGCNA. Then, the relation between each module and the clinical characteristics was analyzed. Hub genes from the modules with the strongest correlation with the diffuse subtype were picked as candidate genes, validated in the TCGA dataset, and used for later survival analysis.
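The variance-based gene filter described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that "expression variation" means the per-gene variance across samples, and the function name `top_variance_genes`, the toy gene names, and the toy values are hypothetical.

```python
from statistics import pvariance


def top_variance_genes(expr, fraction=0.5):
    """Return the genes whose variance across samples ranks in the top `fraction`.

    `expr` maps a gene name to its expression values across samples.
    Using the plain variance as the ranking measure is an assumption.
    """
    ranked = sorted(expr, key=lambda gene: pvariance(expr[gene]), reverse=True)
    n_keep = max(1, int(len(ranked) * fraction))
    return ranked[:n_keep]


# Toy expression matrix (hypothetical values): gene -> four samples.
toy_expr = {
    "geneA": [1.0, 5.0, 1.0, 5.0],
    "geneB": [2.0, 2.1, 2.0, 2.1],
    "geneC": [0.0, 3.0, 6.0, 9.0],
    "geneD": [4.0, 4.0, 4.0, 4.0],
}

selected = top_variance_genes(toy_expr, fraction=0.5)
print(selected)  # ['geneC', 'geneA']
```

Only the selected genes would then be passed on to network construction.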

Construction of Co-expression Networks
The gene co-expression networks were constructed with the WGCNA package in R (15). After building a sample cluster tree to detect outlying samples, we used the pickSoftThreshold function to calculate the soft-thresholding power (β) that yields an approximately scale-free network topology; the selected power was then used to generate a weighted adjacency matrix. Finally, automatic module detection via dynamic tree cutting was performed on the resulting cluster tree, and modules were defined as branches of the tree.
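To make the role of the power β concrete, the sketch below builds a weighted adjacency from pairwise correlations. It assumes the unsigned WGCNA form, in which the adjacency of two genes is the absolute Pearson correlation raised to β; the text does not state whether a signed or unsigned network was used, and the helper names and toy profiles are hypothetical. The analysis itself was run with the WGCNA R package, so this Python code is illustrative only.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant inputs assumed)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def soft_threshold_adjacency(profiles, beta):
    """Unsigned weighted adjacency: a_ij = |cor(x_i, x_j)| ** beta for i != j."""
    genes = list(profiles)
    return {
        (g, h): abs(pearson(profiles[g], profiles[h])) ** beta
        for g in genes
        for h in genes
        if g != h
    }


# Toy expression profiles across four samples (hypothetical values).
toy_profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],    # perfectly correlated with g1
    "g3": [1.0, -1.0, 1.0, -1.0],  # weakly anti-correlated with g1
}

adjacency = soft_threshold_adjacency(toy_profiles, beta=3)
```

Raising the correlation to a power keeps strong correlations close to 1 while pushing weak ones toward 0, which is why a larger β yields a sparser, more scale-free-like network.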

Hub gene identification
In the network, the module eigengene (ME) is the first principal component of a module and serves as an optimal summary of the gene expression profiles of that module. To measure how close a gene is to an identified module, WGCNA defines module membership (MM) as the correlation between each gene's expression and the ME. Gene significance (GS) is the absolute value of the correlation between a gene and an external sample trait. Module significance is defined as the average absolute gene significance over all genes in a given module.
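The three quantities defined above can be illustrated with a short sketch. It takes the module eigengene as a precomputed input rather than deriving it by principal component analysis, and it uses Pearson correlation throughout; both choices, along with all names and toy values, are assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant inputs assumed)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def gene_significance(profile, trait):
    """GS: absolute correlation between a gene's expression and the trait."""
    return abs(pearson(profile, trait))


def module_membership(profile, eigengene):
    """MM: correlation between a gene's expression and the module eigengene."""
    return pearson(profile, eigengene)


def module_significance(module_profiles, trait):
    """Average GS over all genes assigned to the module."""
    values = [gene_significance(p, trait) for p in module_profiles.values()]
    return sum(values) / len(values)


# Toy data (hypothetical): trait coded 1 = diffuse, 0 = intestinal.
trait = [1, 1, 0, 0]
eigengene = [0.5, 0.5, -0.5, -0.5]  # assumed to be precomputed elsewhere
module = {
    "g1": [3.0, 3.0, 1.0, 1.0],  # tracks the trait exactly
    "g2": [1.0, 3.0, 3.0, 1.0],  # uncorrelated with the trait
}

gs_g1 = gene_significance(module["g1"], trait)
mm_g1 = module_membership(module["g1"], eigengene)
ms = module_significance(module, trait)
```

In this toy module, g1 has both high GS and high MM, which is the pattern a hub gene candidate is expected to show.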

Functional enrichment analysis of module genes
After the key module gene lists were obtained, Gene Ontology (GO) and KEGG pathway analyses were performed using the "clusterProfiler" package in R (16). P value<0.01 and q value<0.05 were chosen as the thresholds to define significantly enriched terms.
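As an illustration of how an enrichment P value of this kind can arise, the sketch below computes a one-sided hypergeometric (over-representation) P value for a single term. It assumes the enrichment was an over-representation test; the gene counts are hypothetical, and the multiple-testing adjustment that produces q values is omitted.

```python
from math import comb


def enrichment_p_value(n_universe, n_term, n_module, n_overlap):
    """One-sided hypergeometric P value: P(overlap >= n_overlap).

    n_universe: number of background genes
    n_term:     background genes annotated to the term
    n_module:   genes in the module being tested
    n_overlap:  module genes annotated to the term
    """
    total = comb(n_universe, n_module)
    upper = min(n_term, n_module)
    tail = sum(
        comb(n_term, k) * comb(n_universe - n_term, n_module - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy numbers (hypothetical): 20 background genes, 5 annotated to the term,
# a 5-gene module, 4 of which carry the annotation.
p_value = enrichment_p_value(n_universe=20, n_term=5, n_module=5, n_overlap=4)
is_enriched = p_value < 0.01  # P value threshold stated in the text
```

With these toy counts the term passes the P value threshold; in a real analysis the q value threshold would be applied as well.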

Differentially expressed gene (DEG) analysis
DEGs between diffuse and intestinal gastric cancer in the TCGA dataset were identified with the "limma" package in R. Genes with |logFC|>1 and adj.P.Val<0.01 were regarded as DEGs. Afterward, the DEGs were intersected with the hub genes identified by WGCNA in the GSE15459 dataset to obtain a final candidate gene list for survival analysis.
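The filtering and intersection steps described above amount to a threshold filter followed by a set intersection. The sketch below assumes a results table with `logFC` and `adj.P.Val` columns, as named in the text; the rows, gene names, and the function name `select_degs` are hypothetical.

```python
def select_degs(rows, lfc_cutoff=1.0, padj_cutoff=0.01):
    """Return the set of genes with |logFC| > lfc_cutoff and adj.P.Val < padj_cutoff."""
    return {
        row["gene"]
        for row in rows
        if abs(row["logFC"]) > lfc_cutoff and row["adj.P.Val"] < padj_cutoff
    }


# Toy differential expression results (hypothetical values).
toy_table = [
    {"gene": "geneA", "logFC": 1.8, "adj.P.Val": 0.0004},
    {"gene": "geneB", "logFC": -1.2, "adj.P.Val": 0.003},
    {"gene": "geneC", "logFC": 0.6, "adj.P.Val": 0.0001},  # fold change too small
    {"gene": "geneD", "logFC": 2.1, "adj.P.Val": 0.2},     # not significant
]

# Toy hub gene list from the co-expression analysis (hypothetical).
toy_hub_genes = {"geneA", "geneC", "geneE"}

degs = select_degs(toy_table)
candidates = degs & toy_hub_genes
print(sorted(candidates))  # ['geneA']
```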

Survival analysis
The R packages "survival" and "survminer" (17,18) were used to perform survival analysis and draw survival curves. The log-rank test was used to evaluate the association between hub gene expression and patient survival. Genes with P value<0.05 were regarded as prognostic genes.
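For readers unfamiliar with the log-rank test, the following sketch implements the standard two-group version. It is not the authors' code: how patients were split into groups (for example, high versus low expression at some cut-off) is not stated in the text and is assumed here, and the survival times are toy values.

```python
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, P value with 1 df).

    times_*:  follow-up times
    events_*: 1 if the event (death) was observed, 0 if censored
    Assumes at least one event time at which both groups are still at risk.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        at_risk = [(ti, ei, gi) for ti, ei, gi in data if ti >= t]
        n = len(at_risk)
        n_a = sum(1 for _, _, g in at_risk if g == 0)
        d = sum(1 for ti, ei, _ in at_risk if ti == t and ei)
        d_a = sum(1 for ti, ei, g in at_risk if ti == t and ei and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy cohorts (hypothetical): group A dies early, group B dies late.
chi2, p_value = logrank_test(
    times_a=[1, 2, 3, 4], events_a=[1, 1, 1, 1],
    times_b=[5, 6, 7, 8], events_b=[1, 1, 1, 1],
)
is_prognostic = p_value < 0.05  # threshold stated in the text
```

A gene would be called prognostic under the stated rule when the two expression groups separate this clearly.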

Results
Construction of the co-expression network and identification of co-expression modules
We included 10243 gene probes with the top 50% expression variance in 192 samples with complete clinical information from the GSE15459 dataset to construct the co-expression network. First, a sample cluster tree was built to identify outlying samples. No significant outliers were detected among these 192 samples (Figure 2A). To obtain a scale-free network, we selected 3 as the power value suggested by the pickSoftThreshold function, with a scale-free R² of 0.89 (Figure 2B, Figure S1). With this power value, the co-expression matrix and the topological overlap matrix (TOM) were established (Figure S2). We obtained a network consisting of 12 gene modules (Figure 3); the number of genes within each module is shown in Table 1. Inter-module correlation analysis suggested that the black, blue, and green modules were close to each other, while the red, green and yellow modules were close to each other (Figure 4B).
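The topological overlap matrix mentioned above can be illustrated with a small sketch. It assumes the standard unsigned topological overlap measure, in which the overlap of two genes combines their direct adjacency with the connection strength they share through other genes; the TOM variant actually used is not stated in the text, and the adjacency values below are hypothetical.

```python
def topological_overlap(adj):
    """Unsigned topological overlap from a symmetric adjacency matrix.

    tom[i][j] = (sum_u a[i][u] * a[u][j] + a[i][j]) / (min(k_i, k_j) + 1 - a[i][j]),
    where u runs over all nodes other than i and j, and k_i is the
    connectivity of node i (sum of its adjacencies to other nodes).
    """
    n = len(adj)
    connectivity = [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]
    tom = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
            denom = min(connectivity[i], connectivity[j]) + 1 - adj[i][j]
            tom[i][j] = (shared + adj[i][j]) / denom
    return tom


# Toy three-gene adjacency matrix (hypothetical values; diagonal unused).
toy_adj = [
    [0.0, 0.8, 0.5],
    [0.8, 0.0, 0.4],
    [0.5, 0.4, 0.0],
]

tom = topological_overlap(toy_adj)
```

Module detection then typically clusters genes on the dissimilarity 1 - TOM, so genes that share many strong neighbours end up on the same branch of the tree.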

Intramodular analysis
To measure how a given gene is connected to biologically interesting modules, WGCNA defines module  (Figure 5A). For KEGG pathway analysis, genes in the red module were involved in "cell cycle", "DNA replication", "base excision repair", "oocyte meiosis" and "p53 signaling pathway", while blue module genes were involved in "Axon guidance", "cardiomyopathy" and "vascular smooth muscle contraction" (Figure S5).

Hub gene identification
To identify hub genes in the red and blue modules, we chose genes that met the following criteria as hub gene candidates: ((abs(GS.diffuse)>0.3)&(abs(datKME$kMEblue/red)>0.9)&(NS1$q.Weighted <0.01)). In this screening, 23 genes in the red module and 32 genes in the blue module were obtained (Table 2). To validate these 55 genes in an external dataset, we analyzed their differential expression between the diffuse and intestinal subtypes in the TCGA stomach adenocarcinoma data (Figure 5B). All of these genes were differentially expressed between the diffuse and intestinal subtypes (P<0.01). The 32 blue module candidate genes were more highly expressed in diffuse gastric cancer than in the intestinal subtype, consistent with the positive correlation between the blue module and the diffuse subtype (Figure 3A). The opposite held for the 23 red module genes, which tended to be more highly expressed in the intestinal subtype, in line with the positive association between the red module and the intestinal trait. Although all 55 genes were differentially expressed between the two subtypes, only ten genes (HSPB6, TNS1, PGM5, CPXM2, LIMS2, AOC3, CRYAB, ANGPTL1, BOC and TOP2A) had an absolute logFC>=1. Thus, we regarded these 10 genes as the most likely hub genes in diffuse gastric cancer.
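The screening criteria quoted above (absolute GS above 0.3, absolute kME above 0.9, and a weighted q value below 0.01) can be restated as a simple filter. The sketch below mirrors those three thresholds; the record field names and all values are hypothetical stand-ins for the columns of the R objects named in the text.

```python
def screen_hub_candidates(genes, gs_cutoff=0.3, kme_cutoff=0.9, q_cutoff=0.01):
    """Keep genes that pass all three hub gene screening thresholds."""
    return [
        g["gene"]
        for g in genes
        if abs(g["GS_diffuse"]) > gs_cutoff
        and abs(g["kME"]) > kme_cutoff
        and g["q_weighted"] < q_cutoff
    ]


# Toy per-gene statistics for one module (hypothetical values).
toy_genes = [
    {"gene": "geneA", "GS_diffuse": 0.45, "kME": 0.93, "q_weighted": 0.001},
    {"gene": "geneB", "GS_diffuse": -0.38, "kME": 0.95, "q_weighted": 0.004},
    {"gene": "geneC", "GS_diffuse": 0.50, "kME": 0.70, "q_weighted": 0.001},  # kME too low
    {"gene": "geneD", "GS_diffuse": 0.10, "kME": 0.96, "q_weighted": 0.001},  # GS too low
    {"gene": "geneE", "GS_diffuse": 0.40, "kME": 0.92, "q_weighted": 0.03},   # q too high
]

hub_candidates = screen_hub_candidates(toy_genes)
print(hub_candidates)  # ['geneA', 'geneB']
```

Because absolute values are used, a gene that is negatively correlated with the diffuse trait (geneB here) can still qualify as a candidate.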

Discussion
WGCNA is an informative method for detecting biologically relevant patterns in high-dimensional data sets and has been widely applied in transcriptome analysis. It identifies gene modules by assigning genes with strongly covarying expression patterns to groups and allows the relation of each module to sample traits to be assessed (19).
The advantage of WGCNA lies in its focus on the association between co-expression modules and clinical traits, which provides highly reliable and biologically significant results (20).
The Lauren classification divides gastric cancer into two main subtypes, diffuse and intestinal, and the greatest advantage of this system is that it readily captures the histological and biological traits of gastric cancer (21).
To our knowledge, the current study is the first to use WGCNA to identify hub genes for the Lauren subtypes of gastric cancer. We identified 12 gene modules in gastric cancer tissue samples; among these, the blue and red modules were the most strongly correlated with the diffuse and intestinal subtypes, respectively.
Intramodular analysis showed that MM correlated strongly with gene significance for the diffuse phenotype, suggesting that key genes in these two modules may play a role in the diffuse subtype.
Actin, which is involved in cytoskeleton remodeling, is critical for focal adhesion, which is deficient in diffuse gastric cancer (7,22,23). In our study, gene functional enrichment analysis showed that actin-binding proteins were enriched in the blue module, which was positively correlated with the diffuse phenotype, implying that abnormal actin binding may occur in diffuse gastric cancer. Genes linked to cation channel activity were also enriched in the blue module. Several recent publications have reported that calcium channel proteins such as TRPV4 and TRPM2 are involved in gastric cancer cell survival, invasion and disease progression (24)(25)(26). Taking our results into consideration, we propose that this involvement may be confined to a particular gastric cancer subtype.
As for the red module, which is positively associated with intestinal gastric cancer, GO terms including GO:0140097 (catalytic activity, acting on DNA), GO:0008017 (microtubule binding), GO:0003678 (DNA helicase activity), and GO:0016538 (cyclin-dependent protein serine/threonine kinase regulator activity) were enriched among the module genes. Lei and colleagues identified 3 subtypes of gastric adenocarcinoma: proliferative, metabolic, and mesenchymal. The proliferative subtype consists mainly of intestinal gastric cancer (2,27). As the genes in these terms are involved in DNA replication and cell proliferation, this result provides a similar picture of intestinal gastric cancer biology.
We identified 10 hub genes in the blue and red modules. Nine of these genes (HSPB6, TNS1, PGM5, CPXM2, LIMS2, AOC3, CRYAB, ANGPTL1, and BOC) were from the blue module and were associated with gastric cancer survival. As a member of the tensin family, TNS1 functions as a scaffold for adhesion-related signaling by binding to the actin cytoskeleton and β1-integrin (28), and it has been studied in acute myeloid leukaemia and colorectal cancer (29,30). LIMS2, a member of the PINCH protein family containing 5 LIM domains, also plays important roles in cytoskeletal organization and cell-extracellular matrix adhesion, migration, proliferation, and survival (31).
HSPB6 and CRYAB are two members of the small heat shock protein family (sHSPs), which confer on cells the ability to survive under stress conditions (32,33). A previous study found that mesenchymal stem cells overexpressing HSPB6 were resistant to oxidative stress through increased secretion of growth factors including VEGF, FGF-2, and IGF-1 (34). CRYAB, also a chaperone, can interact with apoptosis-associated proteins such as caspase-3, Bax, and Bcl-xS to inhibit apoptosis (35). Taken together, higher expression of these two genes suggests that diffuse gastric cancer may be more tolerant to stress.
In the present study, we also identified CPXM2 as a survival-related hub gene in diffuse gastric cancer.
CPXM2, a carboxypeptidase that cleaves the C-terminus of peptides or proteins, has recently been shown to contribute to the progression of liver cancer and osteosarcoma (36,37). AOC3, an amine oxidase with copper as a cofactor, is an inflammation-inducible endothelial cell molecule that mediates leucocyte interactions with blood vessels through its adhesive properties (38). To date, AOC3 has been shown to be a prognostic marker in breast cancer and astrocytomas (39,40).

Conclusion
Overall, the Lauren subtypes of gastric cancer have different expression patterns, which can be identified by WGCNA. HSPB6, TNS1, PGM5, CPXM2, LIMS2, AOC3, CRYAB, ANGPTL1, and BOC might be potential hub genes in diffuse gastric cancer, a subtype with relatively poorer survival. These nine genes were also prognostic markers for gastric cancer; thus, more attention should be paid to them, and further experimental study is required to elucidate their roles.

Figure legend: Construction of the co-expression network. Cluster dendrogram of co-expressed genes with the soft threshold set to three. Gene modules are assigned different colors.
Figure legend: A. Gene ontology analysis of blue and red module genes; B. Differential expression of the 55 hub gene candidates in the blue and red modules.