Identification of Significant Genes with Poor Prognosis in Liver Metastasis of Colorectal Cancer via Bioinformatical Analysis


Background: Colorectal cancer (CRC) is a common malignant tumor worldwide, and more than 50% of patients develop liver metastases. Purpose: This study aimed to identify significant genes associated with poor outcome and the underlying mechanisms of CRC liver metastasis. Methods: The gene expression profiles GSE50760, GSE41568 and GSE14297 were obtained from the GEO database. Differentially expressed genes (DEGs) between CRC liver metastases and primary tissues were identified with the GEO2R tool and Venn diagram software. The Database for Annotation, Visualization and Integrated Discovery (DAVID) was used for Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway and gene ontology (GO) analysis. The protein-protein interaction (PPI) network of these DEGs was then built with the Search Tool for the Retrieval of Interacting Genes (STRING) and visualized in Cytoscape. Results: A total of 147 DEGs were consistently expressed across the three datasets, including 123 up-regulated and 24 down-regulated genes. They were enriched in complement and coagulation cascades, drug metabolism-cytochrome P450, metabolism of xenobiotics by cytochrome P450, prion diseases, chemical carcinogenesis, Staphylococcus aureus infection and linoleic acid metabolism. From the PPI network, 39 genes were selected with the Molecular Complex Detection (MCODE) plug-in. Kaplan–Meier survival analysis of these genes in CRC showed that 4 of the 39 genes (SERPING1, ITIH2, CDH2 and APOE) were associated with significantly worse prognosis. Conclusion: Using integrated bioinformatical methods, we identified four significant DEGs associated with poor prognosis in CRC liver metastasis, which could be potential therapeutic targets for CRC patients with liver metastases.


Introduction
Colorectal cancer (CRC) is the third most common cancer and one of the leading causes of cancer-related death worldwide (Siegel, Miller et al. 2018). Liver metastasis is a major cause of death in patients with cancer, including CRC. Nearly 20% of patients already have liver metastasis when they are diagnosed with colorectal cancer, and more than 50% develop metastases during the course of their disease (Fakih 2015, Zarour, Anand et al. 2017). Although some potential genes and mechanisms related to liver metastasis of colorectal cancer have been found (Huang, Tan et al. 2018, Pretzsch, Bosch et al. 2019), the treatment of liver metastasis of colorectal cancer has not been well developed. Therefore, more reliable targets are needed to improve treatment and to better understand the mechanism of CRC metastasis. With the development of gene chip and RNA sequencing technology, gene expression data can be obtained from public databases more accurately and quickly (Vogelstein, Papadopoulos et al. 2013). In addition, the application of bioinformatics in the field of colorectal cancer in recent years has provided more effective tools to find new mechanisms and targets.
In this study, we selected GSE50760, GSE41568 and GSE14297 from Gene Expression Omnibus (GEO) and obtained the differentially expressed genes (DEGs) common to these three datasets. The Database for Annotation, Visualization and Integrated Discovery (DAVID) was used to analyze these DEGs in terms of molecular function (MF), cellular component (CC), biological process (BP) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. We then established the protein-protein interaction (PPI) network and used the MCODE app of Cytoscape to further analyze the core genes. Finally, we verified the relationship between these core genes and prognosis (P < 0.05) via the UCSC Xena online database. Taken together, we found that four common DEGs (SERPING1, ITIH2, CDH2 and APOE) were associated with poor prognosis, and all four genes were associated with extracellular exosomes. In conclusion, our bioinformatics analysis provides additional biomarkers that could be effective targets for patients with CRC liver metastasis.

GEO data information
We obtained the CRC liver metastasis datasets GSE50760, GSE41568 and GSE14297 from the free public database NCBI-GEO. GSE50760, GSE41568 and GSE14297 were based on the GPL11154, GPL570 and GPL6370 platforms and contained 18 liver metastases and 18 primary foci, 94 liver metastases and 39 primary foci, and 18 liver metastases and 18 primary foci, respectively.

Data processing of differentially expressed genes (DEGs)
DEGs between primary specimens and liver metastasis specimens were identified with |logFC| > 1 and adjusted P value < 0.05. The DEG lists were then input into online Venn software to detect the DEGs common to the three datasets. DEGs with logFC < -1 were considered down-regulated genes and those with logFC > 1 were considered up-regulated genes.
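
The thresholding and Venn-overlap steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the GEO2R or Venn software actually used: the function names, the `(gene, logFC, adjusted P)` row layout, the placeholder gene names and the toy values are all assumptions made for the example, and it assumes that the overlap is taken separately for up- and down-regulated genes.

```python
from typing import Dict, List, Set, Tuple

# One row per gene: (gene symbol, logFC, adjusted P value).
# This layout is an assumption for the sketch, not the GEO2R export format.
Row = Tuple[str, float, float]


def split_degs(rows: List[Row],
               lfc_cutoff: float = 1.0,
               padj_cutoff: float = 0.05) -> Tuple[Set[str], Set[str]]:
    """Return (up, down) gene sets using |logFC| > cutoff and adj. P < cutoff."""
    up: Set[str] = set()
    down: Set[str] = set()
    for gene, logfc, padj in rows:
        if padj >= padj_cutoff:
            continue  # not significant after multiple-testing adjustment
        if logfc > lfc_cutoff:
            up.add(gene)
        elif logfc < -lfc_cutoff:
            down.add(gene)
    return up, down


def common_degs(datasets: Dict[str, List[Row]]) -> Tuple[Set[str], Set[str]]:
    """Intersect the up and down sets across all datasets (the Venn overlap)."""
    ups, downs = [], []
    for rows in datasets.values():
        up, down = split_degs(rows)
        ups.append(up)
        downs.append(down)
    return set.intersection(*ups), set.intersection(*downs)


# Toy input with placeholder genes and made-up values (not study data).
datasets = {
    "dataset_1": [("GENE_A", 2.3, 0.001), ("GENE_B", 1.5, 0.20),
                  ("GENE_C", 0.4, 0.001), ("GENE_D", -1.8, 0.01)],
    "dataset_2": [("GENE_A", 1.2, 0.03), ("GENE_B", 1.9, 0.01),
                  ("GENE_D", -2.5, 0.002)],
    "dataset_3": [("GENE_A", 3.1, 0.0004), ("GENE_B", 1.1, 0.04),
                  ("GENE_D", -1.1, 0.049)],
}
common_up, common_down = common_degs(datasets)
print(sorted(common_up), sorted(common_down))
```

In this toy input, GENE_B is dropped because it misses the adjusted P threshold in one dataset, and GENE_C because its fold change is too small.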

Gene ontology (GO) and pathway enrichment analysis
Gene ontology (GO) is an international standard classification system of gene function, which is widely used to annotate genes and to identify the distinctive biological properties of high-throughput transcriptome or genome data (Ashburner, Ball et al. 2000). KEGG is a large comprehensive database covering genomes, diseases, biological pathways, drugs and chemicals (Kanehisa 2000). We used DAVID, an online bioinformatic tool (Huang da, Sherman et al. 2009), to visualize the enrichment of the DEGs in BP, MF, CC and pathways (P < 0.05).

Protein-protein interaction (PPI) network and module analysis
PPI information was evaluated using the online tool STRING (Szklarczyk, Franceschini et al. 2015). The PPI results were then imported into Cytoscape, and the MCODE app (Shannon, Markiel et al. 2003) was used to detect the modules in the PPI network (degree cutoff = 2, max. depth = 100, k-core = 2, node score cutoff = 0.2).
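
To give a sense of what the k-core = 2 setting means, the sketch below prunes a small undirected network down to its k-core, that is, the nodes that remain once every node with fewer than k neighbours has been removed repeatedly. It covers only this pruning idea and is not the MCODE algorithm itself, which also weights nodes by local density and grows modules from seed nodes. The function name and the toy edge list are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple


def k_core(edges: Iterable[Tuple[Hashable, Hashable]], k: int) -> Set[Hashable]:
    """Return the nodes of the k-core of an undirected graph."""
    # Build an adjacency map, ignoring self-loops.
    adj: Dict[Hashable, Set[Hashable]] = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Repeatedly remove nodes whose current degree is below k.
    changed = True
    while changed:
        changed = False
        low_degree = [n for n, nbrs in adj.items() if len(nbrs) < k]
        for node in low_degree:
            for nbr in adj.pop(node):
                if nbr in adj:
                    adj[nbr].discard(node)
            changed = True
    return set(adj)


# Toy network: a triangle A-B-C with a chain C-D-E hanging off it.
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E")]
core_nodes = k_core(toy_edges, k=2)
print(sorted(core_nodes))
```

With k = 2, node E is removed first, which leaves D with a single neighbour, so D is removed as well and only the triangle survives.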

Results
Identification of DEGs between primary and liver metastasis of CRC
We obtained 1449, 442 and 238 DEGs from GSE50760, GSE41568 and GSE14297, respectively. Venn diagram software was then used to identify the DEGs common to these three datasets. A total of 147 common DEGs were detected, including 24 down-regulated genes (logFC < -1) and 123 up-regulated genes (logFC > 1) in the metastases (Table 1 & Fig. 1).

DEGs gene ontology and KEGG pathway analysis in CRC liver metastases
All 147 DEGs were analyzed by DAVID. The results of GO analysis showed that 1) for biological processes (BP), DEGs were particularly enriched in acute-phase response, negative regulation of endopeptidase activity, platelet degranulation, fibrinolysis, regulation of complement activation, triglyceride metabolic process, receptor-mediated endocytosis, lipoprotein metabolic process, phospholipid efflux, positive regulation of cholesterol esterification, complement activation-classical pathway, etc.; 2) for molecular function (MF), DEGs were enriched in serine-type endopeptidase inhibitor activity, serine-type endopeptidase activity, heparin binding, phosphatidylcholine-sterol O-acyltransferase activator activity, lipase inhibitor activity, receptor binding, etc.; 3) for GO cell component (CC), DEGs were significantly enriched in extracellular region, blood microparticle, extracellular space, extracellular exosome, platelet alpha granule lumen, chylomicron, etc. (Table 2 and Table S1). KEGG analysis demonstrated that DEGs were particularly enriched in complement and coagulation cascades, drug metabolism-cytochrome P450, metabolism of xenobiotics by cytochrome P450, prion diseases, chemical carcinogenesis, Staphylococcus aureus infection and linoleic acid metabolism (P < 0.05) (Table 3, Table S2).
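
The enrichment P values above were produced by DAVID. As a rough illustration of what an over-representation P value measures, the sketch below computes a plain one-sided hypergeometric tail probability: the chance of seeing at least the observed number of term-annotated genes in a gene list drawn at random from the background. DAVID reports its own, more conservative variant of this kind of test (the EASE score), so this is a simplified stand-in rather than DAVID's statistic. The function name is hypothetical and the counts are toy numbers, not values from this study.

```python
from math import comb


def hypergeom_enrichment_p(list_hits: int, list_size: int,
                           term_size: int, background_size: int) -> float:
    """P(X >= list_hits) for X ~ Hypergeometric(background, term, list).

    list_hits       genes in the DEG list annotated with the term
    list_size       genes in the DEG list
    term_size       genes in the background annotated with the term
    background_size genes in the background
    """
    max_hits = min(list_size, term_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(term_size, k) * comb(background_size - term_size, list_size - k)
        for k in range(list_hits, max_hits + 1)
    )
    return tail / total


# Toy numbers: 10 background genes, 4 carry the term, the list has 3 genes
# and 2 of them carry the term.
p_toy = hypergeom_enrichment_p(list_hits=2, list_size=3,
                               term_size=4, background_size=10)
print(round(p_toy, 4))
```

For the toy counts, 40 of the 120 possible three-gene lists contain at least two annotated genes, so the P value is 1/3.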

Protein-protein interaction network (PPI) and modular analysis
STRING and Cytoscape analysis showed that a total of 138 DEGs (121 up-regulated and 17 down-regulated genes) of the 147 DEGs were filtered into the DEGs PPI network complex, containing 1391 edges (Fig. 2A). After MCODE analysis, 33 central nodes were identified among the 138 nodes (Fig. 2B).

Discussion
To identify more potential prognostic biomarkers in CRC liver metastasis, three GEO datasets (GSE50760, GSE41568 and GSE14297) associated with CRC liver metastasis were analyzed by bioinformatical methods. One hundred and thirty liver metastases and seventy-five primary CRC specimens were included in the present research. Via GEO2R and Venn software, we identified a total of 147 common DEGs (|logFC| > 1 and adjusted P value < 0.05), including 123 up-regulated (logFC > 1) and 24 down-regulated (logFC < -1) DEGs. DAVID was used for gene ontology and pathway enrichment analysis, and the results showed that 1) for biological processes (BP), DEGs were particularly enriched in acute-phase response, negative regulation of endopeptidase activity, platelet degranulation, fibrinolysis, regulation of complement activation, triglyceride metabolic process, receptor-mediated endocytosis, lipoprotein metabolic process, phospholipid efflux, positive regulation of cholesterol esterification, complement activation-classical pathway, etc.; 2) for molecular function (MF), DEGs were enriched in serine-type endopeptidase inhibitor activity, serine-type endopeptidase activity, heparin binding, phosphatidylcholine-sterol O-acyltransferase activator activity, lipase inhibitor activity, receptor binding, etc.; 3) for GO cell component (CC), DEGs were significantly enriched in extracellular region, blood microparticle, extracellular space, extracellular exosome, platelet alpha granule lumen, chylomicron, etc.
For pathway analysis, DEGs were particularly enriched in complement and coagulation cascades, drug metabolism-cytochrome P450, metabolism of xenobiotics by cytochrome P450, prion diseases, chemical carcinogenesis, Staphylococcus aureus infection and linoleic acid metabolism (P < 0.05). Next, a DEGs PPI network complex of 138 nodes and 1391 edges was constructed via the STRING online database and Cytoscape software. Then, 39 vital genes were screened from the PPI network complex by Cytoscape MCODE analysis. In addition, through Kaplan–Meier plotter analysis, we found that 4 of the 39 genes (SERPING1, ITIH2, CDH2 and APOE) were associated with significantly worse survival. Finally, we reviewed the literature on these four genes.
Complement and coagulation cascades was the most significant pathway in our results. Complement is considered the first line of defense against non-self or unwanted host elements. Researchers have therefore designed different strategies to treat tumors by increasing complement activity (Kolev, Towner et al. 2011). However, as early as 1975, Shearer et al. reported that complement is able to stimulate tumor growth when tumors are treated with low concentrations of antibody (Shearer 1975), and more recent studies have confirmed that complement can promote tumor growth in mouse models (Markiewski and Lambris 2009). Moreover, in vivo and in vitro data show that tumor cells can activate complement (Bjorge, Hakulinen et al. 2005, Corrales, Ajona et al. 2012), and Shi et al. (Shi, Fang et al. 2017) reported that complement component 1, q subcomponent binding protein (C1QBP) in lipid rafts mediates liver metastasis of pancreatic cancer by regulating IGF-1/IGF1R signaling. They reported that the expression of C1QBP is higher in many human cancers than in normal tissues, including thyroid, lung, esophageal, gastric and colon cancer, and referred to previous reports that C1QBP mediates EGF-induced chemotaxis and distant metastasis by activating receptor tyrosine kinase. It is accepted that the complement and coagulation cascades are a key pathway promoting the growth and metastasis of different cancers. However, the mechanisms and effects of specific complement deregulation on the tumor microenvironment are not clear. More work is needed to elucidate how cancer cells control complement activation and how complement affects tumor progression.
SERPING1 is also known as C1IH or CINH. SERPING1 belongs to the serpin family of serine protease inhibitors, which regulate the proteases involved in fibrinolysis, coagulation, inflammation, cell migration, cell differentiation and apoptosis (Hayashi, Ushizawa et al. 2011). Many studies have reported that SERPING1 expression is up-regulated in tumors, including glioblastoma (Fornvik, Maddahi et al. 2017), breast cancer and gastric cancer (Wojtukiewicz 1998). Fornvik et al. (Fornvik, Maddahi et al. 2017) found that, in glioblastoma, high SERPING1 expression is correlated with dismal prognosis and that survival can be significantly prolonged after SERPING1 is blocked by antibody. The same is true for colorectal cancer, in which SERPING1 can be used as an independent predictor of cancer mortality (Kocsis, Meszaros et al. 2011).
ITIH2 is also known as SHAP. The complex of ITIH2 and hyaluronan plays a significant role in the inflammatory response, and high expression of ITIH2 is associated with poor outcome in NSCLC (Wu, Wang et al. 2013). In addition, a strong link between ITIH2 expression and estrogen levels has been demonstrated: ITIH2 contains an estrogen-binding domain, which might be critical for metastasis and tumor growth because estrogen has a profound effect on extracellular matrix integrity (Kopylov, Stepanov et al. 2020). At present, there is little research on the role of ITIH2 in tumor growth and metastasis, which makes it a potential direction for future study of the mechanism of tumor metastasis.
CDH2 is a member of the cadherin family, which regulates many cellular processes including apoptosis, angiogenesis and chemoresistance (Miao, Wang et al. 2018). CDH2 is up-regulated in many kinds of cancer, including colorectal cancer, prostate cancer (Zhang, Shen et al. 2014), gastric cancer, bladder cancer (van der Horst, Bos et al. 2014), lung cancer (Zhu 2015) and glioma (Chen, Cai et al. 2018). Moreover, CDH2 is reported to play an important role in epithelial-mesenchymal transition (Yang, Wang et al. 2015). In lung cancer, CDH2 expression was found to be associated with gefitinib resistance (Yamauchi 2011) and with brain metastasis of lung cancer cells (Grinberg-Rashi, Ofek et al. 2009). In glioma, CDH2 is recognized as an independent prognostic factor, and its high expression is correlated with high-grade glioma and worse outcome (Chen, Cai et al. 2018).
APOE has been shown to be up-regulated in patients with colorectal cancer (Borgquist, Butt et al. 2016). Overexpression of APOE has been reported to be related to tumor growth in the advanced stage and to increase the risk of tumor metastasis and invasion (Kopylov, Stepanov et al. 2020), which may be related to the key role of APOE in cell proliferation and DNA synthesis (Niemi and Yla-Herttuala 2002). Moreover, APOE is one of the regulators of the PI3K/Akt/mTOR pathway, which plays an important role in cell migration and proliferation; APOE can therefore promote tumor progression by enhancing cell polarity (Vergadi, Ieronymaki et al. 2017). The value of APOE as a prognostic indicator in CRC has also been reported (Borgquist, Butt et al. 2016).
The relationship between colorectal cancer and each of these four genes has been reported, but mostly in single-gene studies. The use of these four genes as a panel to predict the metastasis and prognosis of colorectal cancer has not been reported. Our data may therefore provide a more comprehensive and accurate method to predict the metastasis and prognosis of colorectal cancer, and a direction for the future study of CRC liver metastasis.

Funding
No funding was received.

Availability of data and materials
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Authors' contributions
Junsheng Chen and Hongzhou Liu conceived and designed the study, collected and analyzed the data, and drafted the paper. Both authors read and approved the final manuscript.