Integrating Multi-Datasets Analysis with Molecular Docking To Explore Potential Therapeutic Compound and Its Mechanism and Target Sites in Gastric Carcinoma

Background Gastric carcinoma still threatens public health with high morbidity and mortality, especially in Eastern Asia. Targeted therapy is of great value and has made significant progress in cancer treatment, so it is essential to explore more potential targeted drugs for gastric carcinoma. Gene expression data were obtained from TCGA (The Cancer Genome Atlas) and GEO (Gene Expression Omnibus) datasets, and chemical-gene interaction networks were obtained from CTD (Comparative Toxicogenomics Database). Then, GSEA (Gene Set Enrichment Analysis) was utilized to identify compounds significantly related to gene expression in gastric carcinoma. Moreover, PPI (Protein-Protein Interaction) network construction, hub-gene screening, KEGG (Kyoto Encyclopedia of Genes and Genomes), and GO (Gene Ontology) analyses were employed to explore the potential mechanisms and targets of these compounds. Finally, molecular docking was used to analyze stable binding between the compounds and target proteins. By integrating the datasets through bioinformatics analysis, Bazedoxifene was selected as a candidate compound. UHRF1, MCM10, HELLS, and DTL, which are highly expressed in cancer tissues, may be new targets in gastric carcinoma. Furthermore, Bazedoxifene may target UHRF1 and the ubiquitin-like protein transferase pathway in gastric carcinoma and has potential therapeutic value. It is inferred that Bazedoxifene may target UHRF1 in gastric carcinoma.


Introduction
Gastric carcinoma is the fifth most common cancer worldwide, with one million new cases diagnosed annually [1]. It is the third most common cause of cancer death, with 784,000 deaths globally in 2018, which may be attributable to its frequently advanced stage at first diagnosis [2]. Gastric carcinoma therefore poses a serious threat to public health.
Traditional treatments, including surgery, chemotherapy, and radiotherapy, have reached a bottleneck because of their considerable limitations in gastric carcinoma. Because early-stage gastric carcinoma has atypical symptoms, many patients are already at a middle or late stage, or even have metastasis, at first diagnosis, and thus miss the best opportunity for surgical treatment. Chemotherapy is the main treatment for patients with advanced gastric carcinoma. Although chemotherapy can prolong overall survival in some patients, it causes adverse reactions such as nausea, vomiting, and decreased white blood cell and platelet counts [3]. The cure rate remains unsatisfactory, and the curative effect of chemotherapy is very limited. In recent years, as the mechanisms underlying the occurrence and development of gastric carcinoma have been gradually elucidated, molecular targeted therapy, including agents targeting HER2 and VEGF, has made great progress [4,5]. However, the number of patients who meet the clinical indications for targeted drugs is limited, and patients are prone to developing drug resistance [6]. Therefore, finding new targets for gastric carcinoma treatment and developing related therapeutic drugs remains an urgent task.
The Comparative Toxicogenomics Database (CTD) is a database that includes data on chemical-gene/protein interactions, chemical-disease relationships, and gene-disease relationships. Many researchers have utilized it together with the GSEA method to explore the effects of chemicals on gene expression [7,8]. AutoDock is a suite of automated docking tools that predicts how small molecules bind to a receptor of known 3D structure. It has been widely used for drug prediction and verification [9,10]. This study aims to explore potential drugs and related targets in gastric carcinoma based on the CTD, TCGA, and GEO datasets and bioinformatics methods, including GSEA and AutoDock.

Methods
Gene expression data obtained from TCGA and GEO
The Cancer Genome Atlas (TCGA) is a public database supported by the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI) of the USA. It is a landmark cancer genomics program that has molecularly characterized over 20,000 primary cancer and matched normal samples spanning 33 cancer types [11]. In this work, normalized gene expression data of stomach adenocarcinoma (TCGA-STAD-FPKM) and paired normal tissues were obtained through the GDC Data Portal (https://portal.gdc.cancer.gov/), the official website of TCGA. NCBI-GEO (https://www.ncbi.nlm.nih.gov/gds) is also an international public repository that includes next-generation sequencing and microarray gene expression profiles.
This study obtained three mRNA microarray expression profiles of stomach tumor tissues and paired normal tissues from GEO. The limma package (version 4.0), an R/Bioconductor package widely used to process next-generation sequencing and microarray expression profiles, was utilized to normalize these expression data [12].
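The text does not say which normalization method was applied with limma. Assuming quantile normalization (one of the methods limma's `normalizeBetweenArrays` offers), the minimal Python sketch below shows what that step does to a genes x samples matrix: every sample ends up with the same distribution of values. It is an illustration under that assumption, not the authors' pipeline, and it breaks ties by sort order, whereas limma treats tied values more carefully.

```python
def quantile_normalize(matrix):
    """Quantile-normalize a genes x samples matrix given as a list of rows.

    After normalization every sample (column) contains the same set of
    values: the mean, across samples, of the values found at each rank.
    Ties are broken by sort order (a simplification).
    """
    n_genes = len(matrix)
    n_samples = len(matrix[0])
    # Split the matrix into per-sample columns.
    columns = [[row[j] for row in matrix] for j in range(n_samples)]
    # Mean of the i-th smallest value across all samples.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(col[i] for col in sorted_cols) / n_samples
                  for i in range(n_genes)]
    result = [[0.0] * n_samples for _ in range(n_genes)]
    for j, col in enumerate(columns):
        # Gene indices ordered from lowest to highest value in sample j.
        order = sorted(range(n_genes), key=lambda i: col[i])
        for rank, gene_idx in enumerate(order):
            result[gene_idx][j] = rank_means[rank]
    return result


# Usage with a toy 4-gene x 3-sample matrix (hypothetical values).
expr = [[5.0, 4.0, 3.0],
        [2.0, 1.0, 4.0],
        [3.0, 6.0, 6.0],
        [4.0, 2.0, 8.0]]
normalized = quantile_normalize(expr)
```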
The Comparative Toxicogenomics Database (CTD, http://ctdbase.org/about/) is a robust, publicly available database that aims to advance understanding of how environmental exposures affect human health. It provides manually curated information about chemical-gene/protein interactions, chemical-disease relationships, and gene-disease relationships [13,14]. Chemical-gene interaction data and the summarized chemical-gene interaction network were downloaded and processed using the dplyr package (version 0.7.8) of R software.
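To run GSEA with chemicals as gene sets, the CTD interactions must first be grouped into one gene set per chemical. The sketch below shows that grouping in Python. The column order in `ASSUMED_FIELDS` and the use of `OrganismID == "9606"` to keep human interactions are assumptions about CTD's tab-separated chemical-gene file; check them against the `# Fields:` header of the file actually downloaded. The `min_size` filter is a hypothetical parameter, since the text does not state any gene-set size limits.

```python
import csv
from collections import defaultdict

# ASSUMPTION: column order of CTD's chemical-gene interactions TSV.
ASSUMED_FIELDS = ["ChemicalName", "ChemicalID", "CasRN", "GeneSymbol", "GeneID",
                  "GeneForms", "Organism", "OrganismID", "Interaction",
                  "InteractionActions", "PubMedIDs"]


def build_chemical_gene_sets(lines, organism_id="9606", min_size=1):
    """Group chemical-gene interaction rows into {chemical: set of genes}.

    lines: iterable of text lines; '#' comment lines and blanks are skipped.
    organism_id: NCBI taxonomy ID to keep (9606 = Homo sapiens).
    min_size: hypothetical filter dropping very small gene sets.
    """
    data_lines = (ln for ln in lines if ln.strip() and not ln.startswith("#"))
    reader = csv.DictReader(data_lines, fieldnames=ASSUMED_FIELDS, delimiter="\t")
    gene_sets = defaultdict(set)
    for row in reader:
        if row["OrganismID"] == organism_id:
            gene_sets[row["ChemicalName"]].add(row["GeneSymbol"])
    return {chem: genes for chem, genes in gene_sets.items()
            if len(genes) >= min_size}
```

With a real download, the lines would come from opening the (decompressed) TSV file and passing the file object directly.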
Identification of chemical compounds related to gastric carcinoma by GSEA
Gene Set Enrichment Analysis (GSEA) is a computational method that determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states [15]. Several studies have utilized it to explore the relationship between predefined gene sets (such as KEGG pathways, immune-infiltrating cell signatures, and chemical-related gene sets) and their own gene expression datasets [14,16]. In this study, GSEA was conducted to explore the relationships between chemical-related gene sets and each gene expression dataset. A p-value < 0.05 was considered statistically significant for chemical-related gene sets. A Venn diagram was then used to summarize all chemicals obtained from GSEA.
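The core of GSEA is a running-sum enrichment score (ES): walk down the genes ranked by a tumor-versus-normal statistic, step up at genes in the chemical's gene set (weighted by their score) and down otherwise, and keep the maximum deviation from zero. The Python sketch below follows the weighted statistic described by Subramanian et al. (weight 1). Two points are assumptions: the ranking metric is taken as given, and the p-value permutes gene-set membership, which is a simplification of GSEA's default of permuting phenotype labels.

```python
import random


def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Weighted running-sum enrichment score.

    ranked_genes: genes ordered from most up- to most down-regulated
    scores: ranking metric for each gene, in the same order
    gene_set: genes linked to one chemical (e.g. from CTD)
    """
    in_set = [g in gene_set for g in ranked_genes]
    n, n_hit = len(ranked_genes), sum(in_set)
    if n_hit == 0 or n_hit == n:
        raise ValueError("gene set must overlap the ranked list without covering it")
    hit_norm = sum(abs(s) ** weight for s, hit in zip(scores, in_set) if hit)
    if hit_norm == 0:
        raise ValueError("all genes in the set have a zero ranking score")
    miss_step = 1.0 / (n - n_hit)
    running, best = 0.0, 0.0
    for s, hit in zip(scores, in_set):
        # Step up (weighted) at set members, step down uniformly otherwise.
        running += (abs(s) ** weight) / hit_norm if hit else -miss_step
        if abs(running) > abs(best):
            best = running
    return best


def permutation_p_value(ranked_genes, scores, gene_set, n_perm=1000, seed=0):
    """Gene-set permutation p-value (simplified relative to GSEA's default)."""
    rng = random.Random(seed)
    observed = enrichment_score(ranked_genes, scores, gene_set)
    size = len(set(ranked_genes) & set(gene_set))
    count = 0
    for _ in range(n_perm):
        random_set = set(rng.sample(ranked_genes, size))
        es = enrichment_score(ranked_genes, scores, random_set)
        # Count null scores at least as extreme, on the same side as observed.
        extreme = es >= observed if observed >= 0 else es <= observed
        if extreme:
            count += 1
    return (count + 1) / (n_perm + 1)
```

A gene set concentrated at the top of the ranking gives a positive ES near 1, and one concentrated at the bottom gives a negative ES.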
KEGG and GO analysis for potential functions and pathways of the compound
After the compound was identified, its paired chemical-related genes were also selected. Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) are databases that aim to link sets of genes in the genome to functions of the cell and the organism [17]. The clusterProfiler package (version 3.16) was used to analyze and visualize the functional profiles (GO and KEGG) of the selected chemical-related genes. p < 0.05 was defined as statistically significant.
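Over-representation analysis of the kind clusterProfiler performs for GO and KEGG terms rests on a one-sided hypergeometric test: given the background genes and those annotated to a term, how surprising is it that k of the n chemical-related genes fall in that term? The Python sketch below computes that tail probability. It is a simplified illustration; clusterProfiler additionally restricts the background to annotated genes and adjusts p-values for multiple testing, which this sketch omits.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws).

    k: chemical-related genes annotated to the term
    n: chemical-related genes considered
    K: background genes annotated to the term
    N: background genes in total
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the probability of seeing k, k+1, ..., upper annotated genes.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total
```

For example, if all 5 selected genes fall in a term that covers 5 of 10 background genes, the p-value is 1/252.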

Establishment of PPI network and identification of hub-genes
The Search Tool for the Retrieval of Interacting Genes database (STRING, version 11.0) provides credible information on protein-protein interactions together with detailed annotation [18]. A PPI network of compound-related genes was constructed using STRING. Cytoscape (version 3.7) is a general-purpose, open-source software platform capable of integrating large-scale molecular interaction network data. Molecular Complex Detection (MCODE) is a Cytoscape app that finds densely connected regions in large molecular interaction networks [19]. The most important module, consisting of hub-genes, and several relatively important modules in the PPI network were identified by MCODE with the following criteria: degree cut-off = 2, node score cut-off = 0.2, max depth = 100, and k-score = 2.
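MCODE's full algorithm weights each node by the density of its neighborhood and then grows complexes outward from high-scoring seed nodes. One ingredient is the k-core: the largest subgraph in which every node has at least k neighbors. The Python sketch below extracts a k-core by repeatedly peeling off low-degree nodes. Reading the "k-score = 2" criterion as MCODE's K-Core parameter is an assumption, and this is not a reimplementation of MCODE itself.

```python
def k_core(edges, k=2):
    """Nodes of the k-core: the largest subgraph where every node has degree >= k."""
    adj = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    changed = True
    while changed:
        changed = False
        # Peel off every node whose current degree is below k.
        for node in [n for n, nbrs in adj.items() if len(nbrs) < k]:
            for nbr in adj.pop(node):
                if nbr in adj:
                    adj[nbr].discard(node)
            changed = True
    return set(adj)
```

In a network made of a triangle with one pendant node attached, the 2-core is just the triangle, because the pendant node has only one neighbor.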
The expression difference of hub-genes between tumor and normal tissue
The Gene Expression Profiling Interactive Analysis (GEPIA) is a web server for analyzing RNA sequencing expression data of 9,736 tumors and 8,587 normal tissues from the TCGA and GTEx projects, processed with a standard pipeline [20]. It provides key interactive and customizable functions, including differential expression analysis, profiling plotting, correlation analysis, and patient survival analysis [21]. In this study, it was utilized to explore differences in hub-gene expression between gastric carcinoma tissues and normal tissues. A difference was considered statistically significant when log2FC > 2 and p < 0.05.
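The significance rule above combines an effect-size cut-off with a p-value cut-off. The Python sketch below makes that rule concrete. It assumes that expression is compared on a log2(x + 1) scale and that log2FC is the difference of group means on that scale; GEPIA's exact computation and its p-value are not reproduced here, so the p-value is supplied as an input.

```python
from math import log2
from statistics import mean


def log2_fold_change(tumor, normal, pseudocount=1.0):
    """Difference of mean log2(x + pseudocount) between tumor and normal samples."""
    tumor_mean = mean(log2(x + pseudocount) for x in tumor)
    normal_mean = mean(log2(x + pseudocount) for x in normal)
    return tumor_mean - normal_mean


def is_significant(log2fc, p_value, fc_cutoff=2.0, p_cutoff=0.05):
    """Apply the rule stated in the text: log2FC > 2 and p < 0.05."""
    return log2fc > fc_cutoff and p_value < p_cutoff
```

A gene averaging 15 units in tumor and 1 unit in normal tissue gives log2(16) - log2(2) = 3 and passes the fold-change cut-off. A gene with log2FC = 1.5 fails it regardless of its p-value.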
Molecular docking to explore potential binding proteins of compound
ZINC (http://zinc.docking.org/) is a free database of commercially available compounds for virtual screening, containing over 230 million purchasable compounds in 3D formats [22]. The 3D structures of the chemical compounds were downloaded from ZINC. UniProt is a web database that provides comprehensive, high-quality, and freely accessible protein sequence and functional information [23]. The Protein Data Bank (PDB, https://www.rcsb.org/) was established as the first open-access digital data resource in biology and medicine and contains a vast amount of 3D structural data for large biological molecules (proteins, DNA, and RNA) [24]. Accordingly, the 3D structures of potential binding proteins were obtained from UniProt or PDB.
AutoDock is a suite of automated docking tools designed to predict how small molecules bind to a receptor of known 3D structure. AutoDock Vina is a newer generation of docking software; it achieves significant improvements in the average accuracy of binding mode predictions while being up to two orders of magnitude faster than AutoDock 4 [25]. In this study, AutoDock (version 4.2) was used for the preparation process, in which water molecules were removed, Gasteiger charges were computed, polar hydrogens were added, and non-polar hydrogens were merged. It was then used to construct a centered grid box covering the binding site. AutoDock Vina was utilized to perform the docking simulations, generating conformations of the ligand in complex with the receptor and ranking them by binding energy. The best conformations (ranked by binding energy) were visualized in PyMOL (version 2.4). A conformation with a docking affinity < -7.5 kcal/mol was regarded as a reliable prediction [26,27].
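The grid-box and scoring steps above can be sketched in Python as follows, under stated assumptions. `grid_box` places an axis-aligned box around binding-site atom coordinates; the 5 Å padding is a hypothetical choice, since the text gives no box dimensions. `vina_config` writes the standard Vina configuration keys (`receptor`, `ligand`, `center_x`..`size_z`, `exhaustiveness`); the file names are placeholders. `best_affinity` reads the `REMARK VINA RESULT:` lines that Vina writes into its output PDBQT, where the first number is the affinity in kcal/mol.

```python
def grid_box(coords, padding=5.0):
    """Center and edge lengths (angstroms) of an axis-aligned box around atoms.

    coords: iterable of (x, y, z) tuples for binding-site atoms.
    padding: hypothetical margin added on each side of the box.
    """
    xs, ys, zs = zip(*coords)
    center = tuple((max(v) + min(v)) / 2 for v in (xs, ys, zs))
    size = tuple(max(v) - min(v) + 2 * padding for v in (xs, ys, zs))
    return center, size


def vina_config(receptor, ligand, center, size, exhaustiveness=8):
    """Render an AutoDock Vina config file (receptor/ligand paths are placeholders)."""
    lines = [f"receptor = {receptor}", f"ligand = {ligand}"]
    lines += [f"center_{axis} = {c:.3f}" for axis, c in zip("xyz", center)]
    lines += [f"size_{axis} = {s:.3f}" for axis, s in zip("xyz", size)]
    lines.append(f"exhaustiveness = {exhaustiveness}")
    return "\n".join(lines) + "\n"


def best_affinity(pdbqt_text):
    """Most negative affinity (kcal/mol) among the poses in a Vina output PDBQT."""
    affinities = [float(line.split(":", 1)[1].split()[0])
                  for line in pdbqt_text.splitlines()
                  if line.startswith("REMARK VINA RESULT:")]
    if not affinities:
        raise ValueError("no 'REMARK VINA RESULT:' lines found")
    return min(affinities)
```

In use, the config text would be saved to a file and passed to Vina with `--config`, and the resulting output file would be read back with `best_affinity`.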

Statistical analysis and visualization
All packages mentioned above were run in R (version 3.6.1). P < 0.05 was defined as statistically significant in this study. The Venn diagram was constructed with FunRich software (version 3.13).

Results
The overview of gastric carcinoma gene profile datasets
The gastric carcinoma expression data from TCGA contain 343 tumor cases and 30 normal cases. GSE26899 is an array expression dataset containing 96 tumor cases and 12 normal cases, submitted by Baylor College of Medicine. GSE54129 is also an array profile dataset, including 111 tumor samples and 21 normal samples, submitted by Ruijin Hospital. GSE103236, submitted by the Stefan S. Nicolau Institute of Virology, contains 10 tumor cases and 9 normal cases.
The chemical compound associated with gastric carcinoma
Using GSEA on the TCGA dataset, 149 chemical compounds related to gastric carcinoma were identified (p < 0.05), and the top 10 are shown in Table 1. Likewise, 50, 160, and 76 compounds were obtained from the GSE26899, GSE54129, and GSE103236 datasets, respectively (p < 0.05), and their top 10 compounds are also shown in Table 1. The results of all these datasets were then summarized and intersected. Only one gastric carcinoma-related chemical compound, Bazedoxifene, was shared by all datasets in the Venn diagram (Fig 1). Details of the GSEA results for Bazedoxifene can be seen in Table 2. Taken together, these results suggest that Bazedoxifene is a chemical compound associated with gastric carcinoma. The signaling pathways it may affect in gastric cancer were explored next.
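The intersection step can be written out in a few lines of Python. In the sketch below, each dataset is first reduced to its significant compounds (p < 0.05). Then every compound is assigned to the exact combination of datasets it appears in, which is what each region of a Venn diagram counts. The central region holds the compounds shared by all datasets. The dataset and compound names used with it are placeholders, not the study's results.

```python
from collections import Counter


def significant(pvalues, alpha=0.05):
    """Compounds whose GSEA p-value is below alpha."""
    return {name for name, p in pvalues.items() if p < alpha}


def venn_regions(sets_by_dataset):
    """Count items in each exclusive Venn region (exact combination of datasets)."""
    membership = {}
    for dataset, items in sets_by_dataset.items():
        for item in items:
            membership.setdefault(item, set()).add(dataset)
    return Counter(frozenset(names) for names in membership.values())


def shared_by_all(sets_by_dataset):
    """Items present in every dataset (the central Venn region)."""
    sets = [set(s) for s in sets_by_dataset.values()]
    return set.intersection(*sets) if sets else set()
```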
GO and KEGG analysis
Bazedoxifene was selected as a gene-related compound of gastric carcinoma, and the enriched genes related to Bazedoxifene in gastric carcinoma were also selected by GSEA, as shown in Table 2. The union of these Bazedoxifene-related genes was taken, and GO and KEGG analyses were conducted to explore the potential biological signaling pathways and functions of Bazedoxifene in gastric carcinoma. The GO results showed that Bazedoxifene was mostly linked with ubiquitin-like protein transferase activity, ubiquitin-protein transferase activity, and ubiquitin-like protein ligase activity.
The KEGG results showed that it was related to the MAPK signaling pathway, human T-cell leukemia virus infection, and viral carcinogenesis. The detailed results are shown in Figure 2. Overall, these results indicate that Bazedoxifene may act on multiple fronts in gastric carcinoma, so identifying its main target sites and functions is essential.

Construction of PPI network and selection of hub-genes
Following the functional analysis of Bazedoxifene-related genes, a PPI network including 13 nodes and 18 edges was constructed with STRING, as shown in Figure 3-A. Two sub-networks were then screened from it by applying the Cytoscape MCODE algorithm, and a total of eight hub genes (EGR1, EGR2, FOS, DUSP1, UHRF1, MCM10, HELLS, and DTL) were obtained, as shown in Figures 3-B and 3-C.
Meanwhile, the GEPIA website was utilized to explore differences in the expression levels of these hub-genes between gastric carcinoma and control groups. As shown in Figure 4, there were significant differences between gastric carcinoma tissue and normal tissue in the expression levels of UHRF1, MCM10, HELLS, and DTL, all of which were significantly higher in tumor tissues than in normal tissues. Thus, potential target genes of Bazedoxifene were identified through a comprehensive analysis of gene profiles. Whether Bazedoxifene can stably bind to the proteins encoded by these genes was then tested based on the 3D structures of the proteins.
Molecular docking
UHRF1, MCM10, HELLS, and DTL, which showed distinct expression differences between tumor and normal tissues, were selected from the hub-genes to explore whether they could stably bind Bazedoxifene. First, the 3D structures of the proteins encoded by these genes were obtained from the UniProt and PDB databases. The structures of UHRF1 and DTL correspond to PDB entries 4GY5 and 6QCO, while MCM10 and HELLS correspond to UniProt entries Q7L590 and Q9NRZ9. The structure of Bazedoxifene was obtained from ZINC. The detailed structures are shown in Figure 5.
AutoDock Vina was utilized to analyze the binding sites and binding energies between these proteins and Bazedoxifene. The best model of each protein (based on binding energy) was visualized in PyMOL, as shown in Figure 6. Only UHRF1 (affinity: -7.8 kcal/mol) had a stable binding site (affinity < -7.5 kcal/mol). The best models of MCM10, HELLS, and DTL were regarded as unstable, because their affinities were -6.2 kcal/mol, -7.2 kcal/mol, and -6.6 kcal/mol, respectively, all of which lie above the -7.5 kcal/mol cut-off (i.e., weaker predicted binding).
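Because docking affinities are negative, "stronger binding" means "more negative", and the cut-off test is affinity < -7.5 kcal/mol. The short Python check below applies that rule to the four best-pose affinities reported above. It also prints each protein's margin from the cut-off, where a negative margin means the pose passes.

```python
AFFINITY_CUTOFF = -7.5  # kcal/mol; more negative = stronger predicted binding

# Best-pose affinities reported in the text (kcal/mol).
reported = {"UHRF1": -7.8, "MCM10": -6.2, "HELLS": -7.2, "DTL": -6.6}


def passes(affinity, cutoff=AFFINITY_CUTOFF):
    """True when the predicted binding is stronger than the cut-off."""
    return affinity < cutoff


stable = sorted(protein for protein, a in reported.items() if passes(a))
margins = {protein: round(a - AFFINITY_CUTOFF, 2) for protein, a in reported.items()}
print(stable, margins)
```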

Discussion
Gastric carcinoma still threatens public health with high morbidity and mortality, especially in Eastern Asia, Eastern Europe, and South Africa [2,28]. Recently, targeted therapy has shown great value and made considerable progress in cancer treatment, for example in lung adenocarcinoma and colorectal adenocarcinoma [29,30]. However, apart from the major targets HER2 and VEGF/VEGFR, other target options, including EGFR and HGF/MET, have emerged but have not been proven effective in prolonging survival. Homodimers or heterodimers formed by HER family members can activate MAPK, PI3K, and other signaling pathways, thereby regulating cell proliferation, adhesion, migration, and differentiation [31,32]. Trastuzumab is an anti-HER2 drug that binds to extracellular domain IV of HER2 and inhibits homodimer formation. In combination with cisplatin and a fluoropyrimidine, it has shown a survival benefit for patients with HER2-positive gastric carcinoma [33]. Ramucirumab is a monoclonal antibody that binds to the extracellular domain of VEGFR-2 to suppress angiogenesis. A clinical trial demonstrated that it effectively prolonged survival when combined with chemotherapy in patients with advanced gastric carcinoma [34]. Cetuximab and panitumumab are anti-EGFR antibodies. Clinical trials showed no improvement in survival for advanced gastric carcinoma when chemotherapy was combined with cetuximab, and panitumumab not only failed to improve survival but also increased toxicity [35][36][37]. Meanwhile, rilotumumab, an anti-HGF antibody, could improve the prognosis of gastric cancer patients with high MET expression but showed a worse safety profile [38,39]. Under these circumstances, new targets and drugs are needed for gastric carcinoma.
Based on this GSEA analysis, which used multiple transcriptome datasets and the CTD database, Bazedoxifene was found to be closely related to specific genes that are highly expressed in gastric carcinoma, suggesting that it may be a new drug candidate for patients with gastric carcinoma. Bazedoxifene is a third-generation selective estrogen receptor modulator (SERM) that received FDA approval for the prevention of postmenopausal osteoporosis and for moderate to severe vasomotor symptoms associated with menopause [40]. It mainly targets estrogen receptor alpha (ESR1) and estrogen receptor beta (ESR2), both of which are nuclear hormone receptors that can activate the expression of reporter genes containing estrogen response elements (ERE) [41].
Furthermore, several studies have demonstrated that Bazedoxifene could be a novel and useful drug in cancer treatment. Jia et al. found that Bazedoxifene significantly inhibited colorectal cancer cell proliferation and migration by blocking the IL-11/GP130 signaling pathway in vivo and in vitro [42]. SJW et al. also demonstrated that Bazedoxifene is a novel inhibitor of the GP130 signaling pathway and acts synergistically with conventional chemotherapy in human pancreatic cancer cells [43]. Another study found that, compared with either drug alone, Bazedoxifene combined with paclitaxel more potently inhibited cell viability, colony formation, and cell migration and induced more apoptosis in vitro, and more strongly inhibited breast cancer tumor growth in vivo [44]. A further study implied that it suppressed the development of hepatocellular carcinoma [45].
Interestingly, IL-11/GP130 and ERE-related signaling pathways are associated with the progression of gastric cancer. Previous studies identified that deregulation of the gp130-dependent STAT1/STAT3 signaling pathway, driven by the key cytokine IL-11, is involved in human Helicobacter pylori infection and the pathogenesis of gastric cancer [46][47][48]. It has also been shown that decreased binding affinity of the estrogen receptor to the estrogen response element can reduce expression of the TFF1 gene, which may be involved in the development of gastric cancer [49].
In this study, UHRF1 was shown to be a potential target site of Bazedoxifene in gastric carcinoma: it was screened as a hub-gene from the PPI network, identified as clearly overexpressed, and showed a stable binding site for Bazedoxifene at the protein level. UHRF1 encodes a nuclear protein involved in tumorigenesis and tumor development, which is a member of a subfamily of RING-finger type E3 ubiquitin ligases. Through specific domains, UHRF1 participates in DNA methylation and histone post-translational modification, regulates gene expression, and contributes to the occurrence and development of malignant tumors [50]. This is consistent with our pathway analysis, which indicated that Bazedoxifene mainly targets ubiquitin-like protein transferase activity, ubiquitin-protein transferase activity, viral carcinogenesis, and the MAPK pathway in gastric carcinoma. Moreover, UHRF1 is up-regulated in various cancers, including lung, cervical, and pancreatic cancer. Studies have reported that knocking down UHRF1 reduces tumor cell proliferation and promotes apoptosis, so it is considered a therapeutic target [50][51][52][53]. In gastric carcinoma, UHRF1 expression in serum and tissue was significantly higher than in healthy controls and may represent a novel biomarker for diagnosis and prognosis [54]. Zhou L et al. also reported that UHRF1 was overexpressed and was an independent and significant predictor. Moreover, downregulation of UHRF1 suppressed cell proliferation in vitro and in vivo, whereas UHRF1 upregulation showed the opposite effects. UHRF1 overexpression also promoted gastric carcinoma proliferation and carcinogenesis by inhibiting apoptosis, increasing the G1/S transition, and inducing hypermethylation of seven tumor suppressor genes (CDX2, CDKN2A, RUNX3, FOXO4, PPARG, BRCA1, and PML) [55].
Another study also indicated that UHRF1 could promote the growth, migration, and invasion of gastric carcinoma cells and inhibit apoptosis via a ROS-associated pathway [56]. Therefore, Bazedoxifene may act on UHRF1 to exert a therapeutic effect in patients with gastric carcinoma.
As shown in this study, other genes, including MCM10, HELLS, and DTL, may also be targets in gastric carcinoma. MCM10 is a member of the minichromosome maintenance (MCM) protein family and is essential for replication origin firing [57]. MCM10 can combine with other proliferative markers to initiate DNA replication in the cell cycle, thereby mediating cell proliferation, and recruits DNA polymerase-α and a catalytic subunit, preventing DNA damage [58]. Moreover, MCM10 is overexpressed in a series of cancers, including pancreatic, cervical, and esophageal cancers, where it promotes tumor progression; it may be a prognostic biomarker and potential therapeutic target [59]. HELLS is the primary member of the SNF2 chromatin remodeling enzyme family; it is involved in DNA replication, repair, recombination, and transcription, and typically interacts with DNA methyltransferases [60].

Conclusion
This study conducted an integrated analysis of gene expression data and chemical-gene interaction networks. GSEA was utilized to identify Bazedoxifene as a gastric cancer-related compound. This study also performed a functional analysis and constructed a PPI network to infer the potential function of Bazedoxifene in gastric carcinoma. Moreover, hub-genes were selected, and molecular docking was used to verify whether they could stably bind Bazedoxifene. It was found that Bazedoxifene may exert an anti-tumor effect by targeting UHRF1 in gastric carcinoma. Furthermore, the present method can be used to analyze other chemicals and malignant diseases, helping to assess potential drugs and target sites.
Availability of data and materials: The datasets used and analyzed during the current study are available from the TCGA and GEO databases.

Abbreviations
Authors' contributions: JR conceived and supervised the study; JR, SHB, LC, AMJ, YYL, HJK, ZDF, GZL, WM, RL and ZJJ analyzed data; JR and SHB wrote the manuscript; JR, SHB and XZZ made manuscript revisions. All authors have read and approved the final version of this submission.
Ethics approval and consent to participate: This study was approved by the Ethics Committees of Xi'an Jiaotong University.
Figure 1 Venn diagram of chemical compounds. The chemical compounds from the TCGA, GSE26899, GSE54129, and GSE103236 datasets were intersected, and only one compound was retained.
Figure 6 The results of molecular docking. A, the best model (affinity: -6.6 kcal/mol) of Bazedoxifene binding to DTL. B, the best model (affinity: -7.8 kcal/mol) of Bazedoxifene binding to UHRF1. C, the best model (affinity: -6.2 kcal/mol) of Bazedoxifene binding to MCM10. D, the best model (affinity: -7.2 kcal/mol) of Bazedoxifene binding to HELLS.