Weighted Gene Co-expression Network Analysis Reveals ALDH1B1 As a Prognositc Biomarker Involved in Gastric Cancer Development

Gastric cancer (GC) is one of the leading cancers associated with high mortality and poor prognosis mainly due to its relatively late diagnosis and the limited therapeutic options. Consequently, screening for prognostic GC biomarkers and novel molecular therapeutic targets is necessary to promote patient outcomes. Methods Weighted gene co-expression network analysis (WGCNA), a systems biology approach, was applied to analyze the mRNA sequencing data and clinical information of GC patients obtained from The Cancer Genome Atlas (TCGA). Gene modules and clinical traits were constructed according to the Pearson correlation analysis, and the gene ontology (GO) and functional enrichment analysis of meaningful modules were carried out. Hub genes from meaningful modules were screened out by two approaches: the intra-modular and protein-protein interaction (PPI) analysis methods. Next, through upstream regulatory analysis, hub genes with high connectivity degree were further validated with differential expression analysis, Kaplan-Meier survival analysis, and the Cox regression model. We found that seven modules were associated with the following clinical traits: anatomical location of gastric adenocarcinoma, histological type, histological grade, and pathological stage. The hub gene ALDH1B1 was found to have potential as a biomarker for gastric cancer cells, the relationship between this hub gene and gastric cancer drug treatment is also worthy of attention. These ndings may contribute to understanding the GC tumourigenic mechanisms, as well as provide new potential prognostic factors and molecular therapeutic targets for GC. The ALDH1B1 hub gene also provides a new vantage point for further clinical experiments and large-scale cohort studies to validate its association with GC patient survival, and provide a new direction for the research of gastric cancer drug treatment.


Abstract
Background Gastric cancer (GC) is one of the leading cancers associated with high mortality and poor prognosis mainly due to its relatively late diagnosis and the limited therapeutic options. Consequently, screening for prognostic GC biomarkers and novel molecular therapeutic targets is necessary to promote patient outcomes.

Methods
Weighted gene co-expression network analysis (WGCNA), a systems biology approach, was applied to analyze the mRNA sequencing data and clinical information of GC patients obtained from The Cancer Genome Atlas (TCGA). Gene modules and clinical traits were constructed according to the Pearson correlation analysis, and the gene ontology (GO) and functional enrichment analysis of meaningful modules were carried out. Hub genes from meaningful modules were screened out by two approaches: the intra-modular and protein-protein interaction (PPI) analysis methods. Next, through upstream regulatory analysis, hub genes with high connectivity degree were further validated with differential expression analysis, Kaplan-Meier survival analysis, and the Cox regression model.

Results
We found that seven modules were associated with the following clinical traits: anatomical location of gastric adenocarcinoma, histological type, histological grade, and pathological stage. The hub gene ALDH1B1 was found to have potential as a biomarker for gastric cancer cells, the relationship between this hub gene and gastric cancer drug treatment is also worthy of attention.

Conclusion
These ndings may contribute to understanding the GC tumourigenic mechanisms, as well as provide new potential prognostic factors and molecular therapeutic targets for GC. The ALDH1B1 hub gene also provides a new vantage point for further clinical experiments and large-scale cohort studies to validate its association with GC patient survival, and provide a new direction for the research of gastric cancer drug treatment.

Background
Despite advances in cancer research, gastric cancer remains the fourth most common cancer and the second leading causes of cancer-related mortalities worldwide. [1] Over 950,000 new cases of GC are diagnosed annually, and an estimated 720,000 GC-related deaths were reported in 2012. [2] GC is a highly heterogeneous disease in terms of tumour architecture and growth, cell differentiation, histogenesis, and molecular pathogenesis. This diversity is re ected by the array of GC histopathological classi cation schemes. [3] For example, the Lauren classi cation [4] classi es GC tumours into intestinal and diffuse types.
A biopsy sample of intestinal-type GC demonstrates well-demarcated, polypoid-glandular formation, with non-in ltrative proliferative stromal and tubular epithelium, resembling colorectal adenocarcinomas. The diffuse-type exhibits solitary and/or poorly cohesive in ltrative neoplastic which spread diffusely along the gastric wall and invaginate into muscularis propria and underlying lymphatics, with no gland formation.
The majority of new cases of GC present in an advanced and unresectable stage because asymptomatic patients were inadequately screened or managed for non-speci c early-stage GC. Despite receiving the standard of care chemotherapy, these patients with advanced GC have a median survival time less than one year. This may be explained by the limited therapeutic e cacy of chemotherapy for late-stage GC. [5] [6] To address the limited therapeutic options, molecular-targeted therapies for advanced GC have been introduced clinically. Up to 20% of GC biopsies overexpress human epidermal growth factor receptor 2 (HER2) due to HER2 gene ampli cation within malignant cells [7]. Furthermore, HER2 expression is correlated to the grading of stages of tumour differentiation. Trastuzumab, a humanized monoclonal antibody directed against HER2, has been demonstrated to improve overall survival in advanced GC patients and is now approved for rst-line treatment of advanced HER2-positive gastric adenocarcinomas.
Anti-angiogenic agents have also been investigated as treatments for advanced GC, as vascular endothelial growth factor (VEGF) expression is correlated to tumour angiogenic microenvironment, tumour size, and metastatic dissemination. Patients receiving a combination of capecitabine, cisplatin, and bevacizumab, a monoclonal antibody directed against VEGF, did not demonstrate greater survival compared to the placebo, capecitabine-cisplatin, or bevacizumab-only groups in the AVAGAST trial [8]. Given the poor long-term therapeutic outcome associated with chemotherapy and scarce molecular-targeted therapy for advanced GC, developing and trialing other molecular-targeted agents is warranted to improve prognosis and increase overall survival rate.
Discovering novel candidate genes involved in tumorigenesis may pave the way for novel molecular-targeted therapies.
While the weighted gene co-expression network analysis (WGCNA) approach is widely used in tumourigenesis research, it has not yet been widely applied to the GC-derived dataset documented in The Cancer Genome Atlas (TCGA). The TCGA's large-scale database, detailing gene sequencing data and clinicopathological information, enables systematic analysis of molecular mechanisms underlying the development and progression of various clinical features associated with cancers. [9] WGCNA is a genome-wide analysis method used to construct gene co-expression networks based on inter-sample similarities in expression pro les. It is capable of ascribing genes with function and infer gene-disease associations. Highly co-expressed genes are interconnected in the network. Among these, similar genes can be sub-classi ed into different co-expression modules based on the different clinical traits. Each co-expression modules consists of functionally related genes that are also involved in solitary functions. Hub genes are identi ed through the most central and connected genes in the modules. These modules and their candidate genes may be associated with the biological processes of GC, such as tumorigenesis, pathogenesis, tumour progression, neoplasm invasion, and metastasis. Thus, the signi cant module genes have the potential to have widespread clinical implications as either potential warning signatures or therapeutic targets.
In this study, WGCNA is applied to screen out key biomarkers associated with GC clinical features based on the correlation analysis between mRNA data of GC patient samples and clinical information acquired from the TCGA database. These information modules may lead to new prognostic markers or therapeutic targets.

Methods
Gene expression data and pre-processing RNA sequencing data sets of GC and the corresponding clinical traits were obtained from the TCGA database (May 3, 2018) (http://portal.gdc.cancer.gov). The sequencing data of mRNA expression pro les was derived from the platform of Illumina HiSeq V2 RSEM genes. Gene expression level was calculated as measurement of fragments per kilobase of transcript per million mapped reads (FPKM). Related clinical information was extracted for WGCNA analysis, which included sex, age at initial pathological diagnosis, anatomical location, AJGG clinical TNM staging (clinical stage, Tumor, Lymph Node and Metastasis), pathological stage (pathological stage I, II, III and IV), histological grade (grade 1, 2, 3 and 4), and histological type.
According to the Lauren classi cation, GC is categorized into two main histological types, diffuse and intestinal, in addition to the mixed and indeterminate types. [4] In the WHO classi cation, GC is subdivided into tubular, papillary, mucinous, poorly cohesive, or rare variants, based on the predominant histological patterns of the carcinoma. [10] The tubular and papillary carcinomas corresponds closely to the Lauren classi cation intestinal-type. Similarly, the poorly cohesive carcinomas (encompassing cases constituted partly or totally by signet ring cells) corresponds closely to the Lauren classi cation diffuse-type.

Gene co-expression network construction analysis
Co-expression network were conducted based on the protocol of WGCNA package in R software [11,12]. Firstly, the similarity matrix between the gene expression pro le was constructed according to the pairwise Pearson's correlation coe cients, which re ect the degree of consistency between gene expression pro les. Next, we used the function power adjacency to transform the similarity matrix into an adjacency matrix. The adjacency matrix was used to measure the connectivity strengths between pairwise genes.
Scale-free gene co-expression networks were constructed using the function power. [11] To validate the results of network construction as being reliable, outliers with connectivity values of less than -2.5 were excluded [12].When a scale-free topology index R 2 calculated by function PickSoftThreshold is greater than 0.85, it is suggestive of the network topology approximating to scale-free status, and the corresponding power value is selected. Subsequently, the adjacency matrix was transformed into a Topological Overlap Matrix (TOM) and the corresponding dissimilarity TOM (dissTOM). The topological overlap is a biologically meaningful measure of gene similarity on basis of co-expression relationships between paired genes. [13] To classify the genes with similar expression pro les into the same module, module classi cation was conducted with the dynamic branch-cut method to generate modules according to the dissTOM-based hierarchical clustering. During the process, a deep-split value of 2 to branch splitting and a minimum-size cut-off of 30 (minClusterSize=30) for the module size were selected to prevent abnormal modules from being produced.

Constructing module-trait relationships of Gastric Cancer
The module eigengene (ME) as the dominant component of the selected module was evaluated by function ME. It represents the gene expression pro les of a given module and captures the maximum variation in the module. If the MEs' correlation of modules were greater than 0.75, the modules would show the similar expression pro les and be integrated together. The correlation between MEs and clinical traits were assessed by Pearson's correlation tests, and p < 0.05 was deemed strongly correlational.

Finding meaningful modules and functional annotation
The correlation between modules and corresponding clinical traits was estimated by using Pearson's correlation test. Modules found to be closely correlated to clinical traits were identi ed as biologically meaningful modules. However, the mechanism of how module genes in uenced the related clinical-traits remained unknown, so all genes in the meaningful modules were mapped onto the DAVID database for Gene Ontology (GO) analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis. [14] A P <0.01 and false discovery rate (FDR) <0.01 were set as the cut-off criteria to screen the annotation information.

Hub genes identi cation and association analysis
Module hub genes are highly connected intra-modular genes, which have the highest module membership (MM) scores (Pearson's correlation >0.8) to every module. MM is a virtual gene which describes the relationship between the ME and the gene expression pro le. As well, MM quanti es an associated degree to describe how close a gene is to the corresponding module. Gene signi cance (GS) parameter is introduced to weigh the correlation (Pearson's correlation >0.2) between a hub gene and clinical signi cance. Therefore, based on GS and MM, genes having a signi cance for a clinical feature and module membership can be identi ed for intra-modular analysis. 12 Representative network plots depicting the modules were constructed using Cytoscape software. Node centrality, assessed by node degree and number connections, identi es functionally hub genes. Moreover, protein-protein interaction (PPI) information is assessed by the online tool, search tool for the retrieval of interacting genes (STRING) [15]. The software, STRING, was used to explore any relationship of PPI networks among the meaningful modules. Meaningful modules genes were uploaded onto the database and a plausible PPI network was constructed with chosen con dence >0.4 and was imported into Cytoscape software. In the PPI network, genes having a connectivity value of ≥10 were de ned as hub genes. Hub genes being in co-expression network and in PPI network were regarded as common hub genes.

Hub gene with upstream regulatory network analysis
To explore the regulatory mechanisms at the transcriptional level, we investigated the hub genes as targets. First, we used the comprehensive experimentally validated miRNA-gene interaction data collected from TarBase and miRTarBase database to identifying upstream miRNAs of the hub genes. The ENCODE ChIP-seq database was then used to examine their upstream TFs. The networks were analyzed by NetworkAnalyst online tool and constructed by using Cytoscape to search for TFs and for miRNAs.

Hub genes validation
The gene expression pro ling interactive analysis (GEPIA) online tool (http://gepia.cancer-pku.cn)[16] was used to perform the differential expression of hub gene. The GEPIA was a newly developed interactive web server that uses a standard processing pipeline to analyze RNA sequencing expression data referenced from TCGA and GTEx Projects. Furthermore, survival analyses were performed with another online database (http://kmplot.com) which provided published microarray datasets for four cancer types besides GC, and included clinical data and gene expression information of 1065 GC patients' datasets. A Kaplan Meier survival plot was generated and a hazard ratio with 95% con dence intervals and log-rank p values were calculated and plotted in R, by using Bioconductor packages [17].

Statistical analysis
The data were analyzed using stata14.0 software. Continuous variables were assessed by student's t-test, while group comparisons of count data were assessed by chi-square test. The univariate and multivariate Cox proportional hazards regression models were evaluated for estimating the independent prognostic values. A P<0.05 was deemed statistically signi cance.

Gene co-expression network of GC
A ow chart of this study is shown in Figure 1. The RNA sequencing data of GC from 375 patients downloaded from TCGA database and the expression values of 56831 genes,were applied to construct the co-expression network. Cluster analysis was performed on these samples using the ashClust package.
After outliers were removed, the soft threshold power value was sat to four (Figure 2), to de ne the adjacency matrix between all genes with power function, according to the standard scale-free network distribution.
Forty-six gene co-expression modules, clustered in size from 31 to 2474 genes, were partitioned by hierarchical clustering and by dynamic tree cutting ( Figure   3A). The unique colour identi er for each module is listed in Additional Table 1. Among these modules, three were merged because of their similar MEs, and forty-three modules were identi ed in total. The gray module represented a gene set that was not assigned to any of the modules. In total, the interaction of the forty-two co-expression modules were analyzed ( Figure 3B ). Abbreviation: GEJ: gastroesophageal junction.
As shown in the network heatmap plot (Additional Figure. 1), each module represented individual validation to one another. To further explore the coexpression similarity of the whole modules, MEs were quanti ed and were clustered according to modules' correlation ( Figure. 4A). These 42 modules were classi ed into four main cluster sets. Within these cluster sets, the big clustering branch included 26 modules that could be subdivided into 12 sub-clusters.
The remaining three clusters were subdivided into 16 other modules. The left and right branch of the clustering tree were both also divided into ve subclusters. This result was consistent with the heatmap plot of the adjacencies (Figure. 4B).

Modules correspond to clinical signi cance
To investigate the corresponding clinical feature of the module, the Pearson correlations were analyzed between MEs and clinical traits, including: sex, age, clinical TNM stage, histological type, anatomical subdivision of neoplasm, histological grade and pathological stage ( Figure 5). Eight modules were positively or negatively correlated with the de ned clinical traits. The relative highest association with the module-feature relationship, was between the tan module and the neoplasm of anatomical location (at the gastroesophageal junction, GEJ) in histological region. The second signi cant association was between turquoise, violet modules, and the histopathological differential features of diffuse-type, histological grade G3, and pathological stage Ia.
A scatter plot of gene signi cance (GS) versus module membership (MM) were produced with select modules that demonstrated high modulefeature associations. (Additional Figure 2). GS and MM were that demonstrated high correlation include the tan, turquoise, purple modules. The black modules had a relatively weaker GS-MM correlation and were consequently excluded from the functional enrichment analysis.

Functional enrichment analysis of genes in meaningful modules
Biological signi cance of selected modules was assessed using GO term function analysis, speci cally: biological process, cellular component and molecular function, and KEGG pathway enrichment analysis. All genes in modules of interest were imported to DAVID software and string-db online analysis.
Light-green module genes were from the nuclear component and involved in RNA binding and poly (A) RNA binding, and were correlated to RNA splicing and RNA processing (Additional Table 2). Enrichment analysis showed that the tan module was involved mainly in epidermis tissue development and cell-cell junction during GC progression. Coding proteins from the tan module genes were also mostly correlated to the composition of extracellular exosome and zona adherent junction. Grey60 module genes were involved in cellular transcription, their coding proteins were the component of nucleus and nucleoplasm.
In modules of histological type, the turquoise, orange and salmon modules underwent enrichment analysis. The steel-blue module was excluded from this analysis. The turquoise module was associated with oxidation-reduction process and cell-cell adhesion, which were enriched in metabolic pathways. The orange module genes were correlated to angiogenesis. The salmon module genes were shown to play a role in exerting regulatory roles in in ammatory and immune responses. The coding proteins of these genes were the component of extracellular exosome and lysosomal lumen, the pathway was therefore involved with lysosome and phagosome.
The histological grade and pathological stage were also assessed in the modules. Here, the violet modules were excluded from enrichment analysis due to insu cient enrichment results. The purple module genes were from cytosolic transduction signal molecules, which were involved in the positive regulation of GTPase activity and protein binding, and were mainly enriched in immune response such as leucocyte migration. The pathway of this module was mainly regulated via NF-κB signaling pathway and human T-lymphotropic virus I (HTLV-I) infection.

Identi cation of hub genes from meaningful modules
The role of hub genes in modules suggest the biological signi cance of these modules. High MM value determines which genes were in network center and played essential biological function in whole network. To identify central nodes of these modules, intramodular analysis were conducted in seven modules including grey60, light-green, orange, purple, salmon, tan, and turquoise modules. From this, 293 genes with high intramodular connectivity value were identi ed as hub genes ( Figure 6 and Additional Table 3). Another analytical approach, PPI network analysis, was introduced to analyze module genes. Hub gene criteria was set as the threshold value of con dence >0.4 and connectivity degree of ≥10. A total of 140 genes with high connectivity degree were obtained from six modules analyzed (the grey60, light-green, orange, purple, salmon and tan modules) (Additional Table 3). Turquoise modules were not assessed by String software, the number of genes exceeded the threshold setting of gene capacity. Finally, 47 intersectional hub genes from two analytical methods were collated, which were de ned as common hub genes.

Common hub gene upstream regulator analysis
To investigate the upstream regulators of common hub genes, the miRNAs and TFs were predicted with the two databases of TarBase and miRTarBase [18]. High connectivity value ≥10 was set as a criterion to screen for the common hub genes. As shown in Additional Table 4, predicted miRNAs of four modular genes were constructed, while grey60 and tan modules were excluded due to low connectivity of modular genes. To further understand the regulatory mechanism between TFs and common hub genes, TF-gene networks of each module were constructed by an online tool, NetworkAnalyst[18], (Additional Table 5). Finally, 16 common hub genes with high connectivity degree with miRNAs and TFs from different modules, were selected as GC biomarkers.

Identi cation and validation of common hub genes
To validate the 16 common hub genes of modules further, differential expression and survival analyses were performed by GEPIA online tool and KM plotter database separately. Five common hub genes, ITGAL, CD86, ALDH1B1, FAM83D and HSPB6 had signi cantly different expression ( Figure 7A). Survival analysis showed ITGAL, ALDH1B1 and FAM83D were correlated with better prognostic outcomes for GC patients ( Figure 7B). The genes CD86 and HSPB6 were not included in the analysis because their abnormal regulation implied a longer survival time. The clinicopathological parameters of three common hub genes were analyzed in Table 1. Compared to ITGAL and ALDH1B1, high FAM83D expression was only associated with the living status of GC patients (p = 0.002).

Discussion
The progression and prognosis of GC in different patients remains variable. Further studies in biomarkers with improved prognostic or predictive value are needed to investigate GC molecular pathogenesis mechanisms as this can help lead to more statistically accurate conclusions and provide clinical information for best evidence-based practice. Here, we used WGCNA analysis to screen biomarkers associated with GC progression and prognosis. WGCNA has proven to have signi cant advantages over other methods, such as focusing on the association between co-expression modules and clinic traits, which can produce results with much higher reliability and biological signi cance. [19] The characteristics of clinical traits-associated modules was changed variously followed by the different clinical traits. Seven modules that were positively correlated with GC clinical traits were selected with the Pearson correlation and p value. Related to the anatomical location of GC, functional enrichment analyses showed that three modules genes, light-green, tan, grey60, were involved in RNA splicing or RNA processing, cell-cell adhesion, epidermis tissue development, and cellular transcription. RNA alternative splicing is often implicated in tumourigenic progression, including cancer initiation and metastasis.
Aberrant patterns of splicing have been shown to lead to transcriptome instability, growth-inhibiting signals resistance, uncontrolled metabolic processes, and dysregulated cellular phenotype, all of which contribute to cancerous growth [20][21][22][23]. Moreover, cell-cell adhesion plays a pivotal role in metastasis and invasion as the lost of adhesive molecules increases the in ltrative and metastatic potential of tumor cells. [24] Cellular transcription regulation, including coding RNA and non-coding RNAs, has also been closely correlated to cancer progression and prognosis. Although the above biological processes were associated with GC oncogenesis, no reported studies thus far have unveiled the relationship between the aberrant biological processes and GC anatomical location.
Turquoise, orange, and salmon modules, represented the three histological subtypes, diffuse, mucious and tubular type. These modules were shown to be independently in cell adhesion, angiogenesis and in ammatory response. The turquoise module was also found to be associated with patients' age. Similarly, the salmon module was also implicated in the size or expansive scope of tumors. Diffuse-type GC has a well-delineated pathogenesis wherein most diffusetype GC has been shown to be driven primarily by defective intercellular adhesions caused by the loss of E-cadherin expression [25]. About 50% of the diffusetype GC and lymph-node metastases is associated with somatic mutations within the E-cadherin gene; this is not the case in intestinal-type GC. [26] Consequently, abnormal E-cadherin variants is closely related to the diffuse-type GC. However, no direct proof exists to elucidate how cell-cell adhesion mediates tumourigenesis of GC in patients with increasing median age. Studies have described the role of angiogenesis, but there is no consensus on the molecular subtyping of GC, further investigation on whether the mucinous gastric adenocarcinoma is closely associated with the angiogenesis molecules will be necessary. In ammatory responses facilitate GC progression through several mechanisms. First, the activation of the COX2/PGE-2 signaling appears to be among the major pathways. [27] The innate immune response also mediates carcinogenesis through the TLR/MyD88 adapter signaling [28,29]. There also appears to be crosstalk between tumour cell-derived and macrophage-derived in ammatory factors during H. pylori infection which promote the in ammatory microenvironment disorder that leads to the acquisition of tumour stem cell properties and progression into gastric epithelium. While many studies have investigated how in ammation plays a role in GC, limited studies have examined the in ammatory changes that occur in tubular-type GC.
The histological grades were associated with the turquoise and purple modules. Cell-cell adhesive loss of turquoise module genes were involved in inhibiting apoptosis and invasion of GC cells, thereby providing opportunities for tumour cell expansion and metastasis [30][31][32]. Adhesive molecules played a crucial role in histological malignancy grading, but which histological grade the module genes being to could not be predicted de nitely. Purple module involved components of signal transduction in NF-κB signaling. The NF-κB signaling pathway, has been shown to be upregulated in many steps of tumorigenesis. For example, H. pylori infection in gastric epithelial cells in GC induces activation and nuclear translocation of NF-κB [33]. Consequently, HuR, a direct transcript target of NF-κB, is activated in GC cell lines, exerting proliferative and anti-apoptotic effects on GC [34]. However, there is no signi cant link between NF-κB and pathological grade.
Among these meaningful GC progression-associated modules, high connectivity value of module genes from intramodular analysis and PPI analysis were selected out as common hub genes. Meanwhile, the upstream regulatory networks of common hub genes also were investigated. Next, differential analyses and survival analyses led to the selection of three common hub genes, ITGAL, ALDH1B1, FAM83D, from the turquoise module. Further analysis and literature review illustrated that the hub gene ALDH1B1 have potential as a biomarker for gastric cancer cells.
ALDH1B1 is a member of the ALDH family and one of the isoenzymes of the aldehyde dehydrogenase 1(ALDH1), and the highly ALDH1 activity has been determined in a variety of cancer stem cells including gastric cancer [35][36][37]. On the basis of previous studies on ALDH1,many researchers have conducted research on the ALDH1 isoenzymes. Kai Li and Jia-Xin Shen et al both found that the high level of ALDH1B1 was associated with better overall survival in GC patients in their [38,39].And this is consistent with our study.
In addition,ALDH can also play a role as a functional regulators in cancer stem cells,which is also worthy of attention [40].First, ALDH is an essential component for the biosynthesis of retinoic acid (RA) and other molecular regulators of cellular function.In this way,ALDH can promote drug resistance through retinoid signaling. Second, Cancer cells with high expression level of ALDH can develop drug resistance of the ALDH-speci c activity catabolyzes particular drugs through oxidation of the speci c aldehyde of the drugs,such as cyclophosphamide and its active derivative hydroperoxycyclophosphamide (4-HC) [41,42],doxorubicin [43], cisplatin [44], arabinofuranosyl cytidine (Ara-C) [45], and dacarbazine [46].As a member of the ALDH family, we believe that ALDH1B1 may have similar functions and these studies will provide us with directions for further research on ALDH1B1.
Several limitations of our study should be taken into account. Although the most pivotal hub genes out of 16 common hub genes was ltered out by bioinformatics methods, inadequate persuasive experimental evidence exist to support our ndings' impact on the prognostic value of biomarker in GC patients.
In summary, WGCNA, a systems biology approach, was adopted and used to analyze RNAseq data and clinical information of GC patients in TCGA database. ALDH1B1, which has been previously suggested to play a crucial molecule in GC development and progression, was as a possible new GC prognostic marker and therapeutic target. Nevertheless, this biomarker needs to be further validated with basic experiments and large-scale cohort studies.

Declarations
Ethics approval, guidelines and consent to participate: Not applicable

Consent for publication
Not applicable Availability of data and materials The basic used in this study can be obtained from the TCGA(https://www.genome.gov/Funded-Programs-Projects/Cancer-Genome-Atlas) database, and other data have been included in the manuscript.And we declare that all methods were carried out in accordance with relevant guidelines and regulations.

Competing interests
The author reports no con icts of interest in this work. Cluster dendrogram was generated by hierarchical clustering based on dissimilarity measure (dis-TOM) of genes. The branches correspond to modules of highly interconnected groups of genes. Two colored bars below the dendrogram represent the original modules and merged modules. Forty-ve modules were identi ed by Dynamic Tree Cutting method. Each module was assigned a color as an identi er. Forty-three modules were generated after merging according to the correlation of modules.