Identification of Significant Genes With Poor Colorectal Cancer Prognosis Via Bioinformatical Analysis

Purpose: To identify significant genes associated with poor colorectal cancer (CRC) prognosis via bioinformatical analysis. Method: Gene expression profiles of GSE74602, GSE110223, GSE113513 and GSE141174 were obtained from the GEO database. The four profile datasets comprise 65 CRC tissues and 65 normal tissues. Differentially expressed genes (DEGs) between CRC tissues and normal tissues were identified with the GEO2R tool and Venn diagram software. Next, we used the Database for Annotation, Visualization and Integrated Discovery (DAVID) to analyze Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways and gene ontology (GO) terms. The protein-protein interaction (PPI) network of these DEGs was then constructed with the Search Tool for the Retrieval of Interacting Genes (STRING). Results: A total of 171 consistently expressed genes were found in the four datasets, including 148 up-regulated and 23 down-regulated genes. Up-regulated DEGs were particularly enriched in oxidation-reduction process, extracellular exosome, zinc ion binding, Metabolic pathways and Mineral absorption; down-regulated DEGs were enriched in positive regulation of cell proliferation, cytosol and One carbon pool by folate. In the Kaplan–Meier analysis of overall survival, 30 of 88 genes were associated with a significantly worse prognosis. In validation with Gene Expression Profiling Interactive Analysis (GEPIA), 13 of the 30 genes were highly expressed in CRC tissues compared to normal tissues. Furthermore, MYC and FGFR3 were markedly enriched in the Bladder cancer pathway. Conclusion: We identified two significantly up-regulated DEGs with poor prognosis in CRC, which could be potential therapeutic targets for CRC patients.


Introduction
Colorectal cancer (CRC) is a common malignant tumor of the digestive tract; its morbidity and mortality rank third and fourth worldwide, respectively. In China, the morbidity and mortality of CRC are also very high and tend to increase, seriously threatening physical and mental health [1][2]. Therefore, more reliable prognostic biomarkers should be explored as targets for improving treatment and for better understanding the underlying mechanisms [3][4][5][6].
Bioinformatics and gene chips, which have been in use for more than ten years, can quickly detect differentially expressed genes and have proved to be reliable techniques; they have generated large amounts of chip data that are stored in public databases. A large number of valuable clues for new research can therefore be explored on the basis of these data. Furthermore, many bioinformatical studies on CRC have been published in recent years [7], showing that integrated bioinformatical methods can help us to further study and better explore the underlying mechanisms.
In this study, we first chose GSE74602, GSE110223, GSE113513 and GSE141174 from Gene Expression Omnibus (GEO). Second, we applied the GEO2R online tool and Venn diagram software to obtain the common differentially expressed genes (DEGs) in the four datasets above. Third, the Database for Annotation, Visualization and Integrated Discovery (DAVID) was used to analyze these DEGs, including molecular function (MF), cellular component (CC), biological process (BP) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Fourth, we established a protein-protein interaction (PPI) network for additional analysis of the DEGs in order to identify core genes. Moreover, these core DEGs were imported into the Kaplan Meier plotter online database for significant prognostic information (P < 0.05).

Data processing of DEGs
DEGs between colorectal cancer and normal specimens were identified via the GEO2R online tool [8] with |logFC| > 1 and adjusted P value < 0.05. The raw data in TXT format were then checked in online Venn software to detect the common DEGs among the four datasets. DEGs with logFC < 0 were considered down-regulated genes, while DEGs with logFC > 0 were considered up-regulated genes.
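The filtering and overlap step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dataset names, gene symbols and values are invented, and requiring a gene to change in the same direction in every dataset is our assumption about how the Venn overlap was taken, not something the text specifies.

```python
# Toy GEO2R-style rows: (gene symbol, logFC, adjusted P value).
# All dataset names, gene names and numbers are invented for illustration.
datasets = {
    "set_A": [("G1", 2.3, 0.001), ("G2", -1.8, 0.010), ("G3", 0.4, 0.200), ("G4", 1.5, 0.030)],
    "set_B": [("G1", 1.9, 0.004), ("G2", -2.1, 0.002), ("G4", 1.2, 0.080), ("G5", 3.0, 0.001)],
    "set_C": [("G1", 1.4, 0.020), ("G2", -1.2, 0.040), ("G4", 2.2, 0.010)],
}


def select_degs(rows, lfc_cutoff=1.0, p_cutoff=0.05):
    """Return (up, down) gene sets passing |logFC| > lfc_cutoff and adj. P < p_cutoff."""
    up, down = set(), set()
    for gene, logfc, adj_p in rows:
        if adj_p < p_cutoff and abs(logfc) > lfc_cutoff:
            # logFC > 0 is treated as up-regulated, logFC < 0 as down-regulated.
            (up if logfc > 0 else down).add(gene)
    return up, down


def common_degs(all_datasets, **cutoffs):
    """Intersect the up and down sets across every dataset (the Venn overlap)."""
    ups, downs = [], []
    for rows in all_datasets.values():
        up, down = select_degs(rows, **cutoffs)
        ups.append(up)
        downs.append(down)
    return set.intersection(*ups), set.intersection(*downs)


common_up, common_down = common_degs(datasets)
print(sorted(common_up), sorted(common_down))
```

In this toy input, G4 fails the adjusted P value cutoff in one dataset and G5 appears in only one dataset, so neither survives the intersection.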

Gene ontology and pathway enrichment analysis
Gene ontology (GO) analysis is a commonly used approach for defining genes and their RNA or protein products, and for identifying unique biological properties of high-throughput transcriptome or genome data [9].
KEGG is a collection of databases dealing with genomes, diseases, biological pathways, drugs, and chemical materials [10]. DAVID is an online bioinformatic tool designed to identify the functions of a large number of genes or proteins [11]. We used DAVID to visualize the enrichment of the DEGs in BP, MF, CC and pathways (P < 0.05).

PPI network
PPI information can be evaluated with an online tool, STRING (Search Tool for the Retrieval of Interacting Genes) [12]. We used STRING to visualize the potential correlations between these DEGs (PPI enrichment p-value < 0.05).

Survival analysis and RNA sequencing expression of core genes
Kaplan Meier plotter is a commonly used web tool for assessing the effect of a large number of genes on survival based on the EGA, TCGA and GEO (Affymetrix microarrays only) databases [13]. The log-rank P value and hazard ratio (HR) with 95% confidence intervals were computed and shown on the plot. To validate these DEGs, we applied the GEPIA website to analyze RNA sequencing expression data on the basis of thousands of samples from the GTEx project and TCGA [14].
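Whether a GO term or pathway is over-represented in a gene list is commonly judged with a hypergeometric (one-sided Fisher) test. The sketch below shows that calculation as a plain upper-tail probability. It is not DAVID's implementation: DAVID reports a modified Fisher exact P value (the EASE score), so its numbers will differ, and the counts used here are invented.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N : genes in the background
    K : background genes annotated with the term
    n : genes in the submitted list (e.g. the DEGs)
    k : listed genes annotated with the term
    """
    total = comb(N, n)
    upper = min(n, K)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Invented example: 20 background genes, 5 annotated, 4 DEGs, 3 of them annotated.
p_value = hypergeom_upper_tail(k=3, n=4, K=5, N=20)
print(round(p_value, 4))
```

A term would then be reported as enriched when this probability falls below the chosen cutoff (P < 0.05 in this study), usually after some multiple-testing adjustment.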
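For readers unfamiliar with the survival curves behind such plots, the following is a minimal sketch of the Kaplan–Meier product-limit estimator. It is not the code used by the Kaplan Meier plotter website; the follow-up times and event flags are invented, and the log-rank test and hazard ratio are deliberately left out.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one observed event.

    times  : follow-up time for each patient
    events : 1 if the event (death) was observed, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Collect every patient whose follow-up ends exactly at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after time t.
        at_risk -= removed
    return curve


# Invented follow-up data (months) for a hypothetical high-expression group.
high_group = kaplan_meier(times=[2, 3, 3, 5, 8], events=[1, 1, 0, 1, 0])
for t, s in high_group:
    print(t, round(s, 3))
```

Computing one such curve for the high-expression group and one for the low-expression group of a gene gives the two lines that a log-rank test then compares.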

Results
Identification of DEGs in colorectal cancer
There were 65 CRC tissues and 65 normal tissues in the present study. Via the GEO2R online tool, we extracted 1643, 1269, 3497 and 2180 DEGs from GSE74602, GSE110223, GSE113513 and GSE141174, respectively. We then used Venn diagram software to identify the common DEGs in the four datasets. A total of 171 common DEGs were detected, including 148 up-regulated genes (logFC > 0) and 23 down-regulated genes (logFC < 0) in the CRC tissues (Table 1 & Fig. 1).

Table 1 All 171 common differentially expressed genes (DEGs) detected from the four profile datasets in CRC tissues compared to normal tissues (columns: DEGs, Gene names)

A total of 171 DEGs were imported into the DEGs PPI network complex, which included 170 nodes and 204 edges, including 23 down-regulated and 143 up-regulated genes (Fig. 2a). We then applied STRING analysis, and the results showed that 87 central nodes were identified among the 160 nodes (Fig. 2b).

Analysis of core genes by the Kaplan Meier plotter and GEPIA
Kaplan Meier plotter (http://kmplot.com/analysis) was used to obtain survival data for the 88 core genes. Thirty genes were associated with significantly worse survival, while 58 showed no significant association (P < 0.05, Table 4 & Fig. 3). GEPIA was then used to compare the expression levels of the 30 genes between cancerous and normal samples. Thirteen of the 30 genes were highly expressed in CRC samples compared to normal samples (P < 0.05, Table 5 & Fig. 4).

Discussion
To identify more useful prognostic biomarkers in CRC, this study applied bioinformatical methods to four profile datasets (GSE74602, GSE110223, GSE113513 and GSE141174). Sixty-five CRC specimens and 65 normal specimens were enrolled in the present research. Via GEO2R and Venn software, we revealed a total of 171 commonly changed DEGs (|logFC| > 2 and adjusted P value < 0.05), including 148 up-regulated (logFC > 0) and 23 down-regulated DEGs (logFC < 0). Gene ontology and pathway enrichment analysis using DAVID then showed that up-regulated DEGs were particularly enriched in oxidation-reduction process, extracellular exosome, zinc ion binding, Metabolic pathways and Mineral absorption, and that down-regulated DEGs were enriched in positive regulation of cell proliferation, cytosol and One carbon pool by folate. Next, a DEGs PPI network complex of 170 nodes and 204 edges was constructed via the STRING online database. In the Kaplan-Meier analysis of overall survival, 30 of 88 genes were associated with a significantly worse prognosis. In validation with Gene Expression Profiling Interactive Analysis (GEPIA), 13 of the 30 genes were highly expressed in CRC tissues compared to normal tissues. Furthermore, two genes (MYC, FGFR3) were markedly enriched in the Bladder cancer pathway.
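Node and edge counts such as those reported for the PPI network can be derived from an interaction edge list, and one simple way to flag candidate central nodes is by degree. The sketch below assumes an invented edge list with placeholder protein names and an arbitrary degree threshold; the text does not state which criterion was used to define central nodes, so the threshold is purely illustrative.

```python
from collections import defaultdict

# Hypothetical undirected interactions; a real list would come from a STRING export.
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P3", "P4"), ("P5", "P6")]


def degree_table(edge_list):
    """Map each node to its number of distinct interaction partners."""
    neighbours = defaultdict(set)
    for a, b in edge_list:
        if a != b:  # ignore self-loops
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {node: len(partners) for node, partners in neighbours.items()}


degrees = degree_table(edges)
n_nodes = len(degrees)
# frozenset makes (a, b) and (b, a) count as the same undirected edge.
n_edges = len({frozenset(e) for e in edges if e[0] != e[1]})

# Assumed cutoff: treat nodes with at least 2 partners as candidate central nodes.
hubs = sorted(node for node, d in degrees.items() if d >= 2)
print(n_nodes, n_edges, hubs)
```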
In conclusion, we identified two significantly up-regulated DEGs with poor prognosis in CRC on the basis of integrated bioinformatical methods, which could be potential therapeutic targets for CRC patients.
MYC (an oncogene) is highly expressed in colorectal cancer tissues and is widely activated in colorectal cancer. It is involved in the regulation of growth, invasion and metastasis of colorectal cancer, and has been a research target for anti-tumor therapy for many years. Guo [15] found that polyamine biosynthesis is often dysregulated in colorectal cancer, and that there is a close relationship between the polyamine metabolism pathway and oncogenic signaling pathways during tumor development. Simultaneous inhibition of SMS and MYC has synergistic effects, and combined inhibition of SMS and MYC expression may provide a new therapeutic target and strategy for the treatment of colorectal cancer. Bian [16] showed that MYC family oncogenes and the DNA damage repair protein PARP1 play an important role in the occurrence and development of small cell lung cancer. MYC family genes are amplified or highly expressed in some small cell lung cancer cell lines and patient tissues. Combined inhibition of PARP and BET can effectively inhibit the proliferation and survival of MYC family gene-dependent small cell lung cancer tumor tissue.
Fibroblast growth factor receptor (FGFR) is a receptor protein that binds fibroblast growth factor (FGF) family ligands. As a member of the tyrosine kinase family, it is involved in cell proliferation, stem cell differentiation, embryo development, migration, survival, angiogenesis and organogenesis through multiple signaling pathways. Studies have shown that activation of the FGFR signaling pathway is related to the occurrence and development of a variety of cancers. FGFR2 can also be phosphorylated, but the manner and sequence of its phosphorylation and activation remain unclear. The phosphorylation of FGFR3 and FGFR4 differs from that of FGFR1.
Zhao [17] showed that effective components from the rhizome of R. sinicum inhibit the proliferation of colorectal cancer cells by targeting FGFR1 and down-regulating the expression of p-JAK, p-STAT3 and p-MEK1/2, and inhibit the growth of human colorectal cancer xenograft tumors with high FGFR1 expression. Regarding fibroblast growth factor receptor 3 (FGFR3), Wang [18] showed that miR-99a-5p has binding sites on its downstream target gene FGFR3. Overexpression of miR-99a-5p mimics significantly down-regulated FGFR3; miR-99a-5p can thus negatively regulate FGFR3, and the proliferation, metastasis and invasion of cancer cells all decreased after targeted knockdown of FGFR3. Numerous studies have shown that MYC and FGFR genes are related to the progression of various types of cancer. In this study, MYC and FGFR genes were also enriched in the Bladder cancer pathway, and the mechanism of action of the two genes was also studied. MYC and FGFR genes, as important indicators for detection, could be used as effective targets for the diagnosis and treatment of digestive tract tumors.

Conclusions
Taken together, our bioinformatics analysis identified MYC and FGFR genes as differentially expressed between CRC tissues and normal tissues on the basis of four different microarray datasets. The results suggest that MYC and FGFR genes could play key roles in the progression of CRC. However, these predictions should be verified by a series of experiments in the future. Nevertheless, these data may provide useful information and direction regarding potential biomarkers and biological mechanisms of CRC and other cancers.

Declarations
Funding: No funding was received to assist with the preparation of this manuscript.
Conflicts of interest: The authors have no relevant financial or non-financial interests to disclose.
Availability of data and material: All data generated or analysed during this study are included in this published article [and its supplementary information files].

Figure 4
Thirteen genes significantly expressed in CRC patients compared to healthy people. To further compare the genes' expression levels between CRC and normal samples, the 30 genes related to poor prognosis were analyzed on the GEPIA website. Thirteen of the 30 genes had significantly higher expression in CRC specimens than in normal specimens (*P < 0.05). Red indicates tumor tissues and grey indicates normal tissues.
Re-analysis of 13 selected genes by KEGG pathway enrichment. Two highly expressed genes in CRC tissues with poor prognosis were re-analyzed by KEGG pathway enrichment.