Identification of Key Genes and Pathways in Ulcerative Colitis Through Bioinformatics Analysis


Background Ulcerative colitis (UC) is a chronic inflammatory disease whose therapy remains largely uncertain because its etiology is poorly understood. UC also increases the risk of colon cancer. Aims This study was designed to identify key genes correlated with intestinal epithelial repair, and their associated signaling pathways, in the pathogenesis and malignant transformation of UC. Methods Using the online database pubmed2ensembl, we identified genes associated with both UC and its self-healing, and then imported them into Cytoscape for enrichment analysis. A protein-protein interaction (PPI) network was established, and the most significant gene module and the top ten hub genes were selected for further enrichment analysis. Potential drugs targeting the corresponding genes were then identified with the online tool DGIdb, and a map linking genes, drugs, and their associated pathways was constructed. Finally, we searched the TCGA and CPTAC databases to test the expression of the identified UC-associated genes in colon cancer. Results Based on functional enrichment analysis, FN1, GRB2, EGF, CXCL8, VEGFA, STAT3, IL6, IL4, IL10, TNF, CSF2, IL13, IL1A, CCL2, ICAM1 and IL18 were most closely associated with epithelial function in UC patients. In addition, 11 of them were validated in colon cancer. Conclusions The signaling pathways altered in UC provide a new clue for understanding the mechanisms underlying UC pathogenesis and malignant transformation. Key signals in this process and the drugs that target them might serve as new treatment strategies for UC.


Introduction
Ulcerative colitis (UC) and Crohn's disease are the two main forms of inflammatory bowel disease (IBD), whose main clinical manifestations are diarrhea and bloody stools. UC is an inflammatory condition affecting the colon, with an annual incidence of approximately 10 to 20 per 100,000 people. UC is associated with genetic susceptibility, the composition and function of the intestinal microbiota, environmental factors, and mucosal and systemic immune responses [1,2]. These factors and their interactions damage the intestinal mucosal barrier and disturb the immune system, finally resulting in persistent intestinal inflammation. However, the specific pathogenic mechanism of UC has not been clarified. To improve patients' quality of life, and ultimately to cure this chronic disease, it is necessary to identify the key genes and pathways involved in UC pathogenesis.
The mammalian gastrointestinal mucosa is a rapidly self-renewing tissue lined by a continuous layer of epithelial cells. Epithelial cells maintain physical and functional homeostasis under both physiological and pathophysiological conditions [3]. Composed of intestinal epithelial cells, goblet cells, Paneth cells, endocrine cells and others, the intestinal epithelium performs multiple functions and is metabolically active, allowing it to turn over and regenerate rapidly [4][5][6][7]. These characteristics may underlie the intestine's capacity for self-healing.
Population-based studies have shown that the risk of colorectal neoplasia in patients with IBD is 2- to 5-fold higher than in age-matched controls [8][9][10]. Inflammation is an important risk factor for interval colorectal cancer in IBD patients [11]. Inflammation exerts a field effect on constantly re-epithelializing, chronically inflamed tissue, leading to dysplasia. The dysplastic tissue then progresses to colorectal cancer through various molecular pathways, such as chromosomal instability and microsatellite instability [12]. However, the factors associated with the transformation of UC to colon cancer remain unclear.
With advances in bioinformatics, more public databases are available to help biologists find inspiration in biological and medical research. To make it easier to find literature relevant to specific genomic regions or sets of functionally related genes, pubmed2ensembl offers text-based queries against PubMed and PubMed Central documents combined with constraints on genomic features [13]. Although differentially expressed gene analysis has become the mainstream approach in bioinformatics, text mining remains significant and can yield valuable information, especially for non-neoplastic diseases.
In this study, we established a UC gene dataset from pubmed2ensembl and obtained 137 common genes that are possibly correlated with UC and intestinal self-healing. We then used RStudio with several R packages for Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis. Additional analyses, such as protein-protein interaction (PPI) network construction and gene cluster identification, were also applied. Moreover, drug-gene interactions for the identified gene clusters were explored using DGIdb, and these drug-gene connections were then extended to three elements (drug, gene and pathway) according to the KEGG pathway results. Finally, the expression levels of the 16 genes identified by the above methods were evaluated in the TCGA and CPTAC databases using the UALCAN tool. In summary, our findings identify clusters of genes and pathways closely associated with the pathogenesis and malignant transformation of UC, and provide a clue for choosing possible drugs for UC treatment based on the signaling interactions of the identified genes.

Gene sets collection
Two gene lists, one for UC and one for self-healing, were downloaded from the pubmed2ensembl website (http://www.pubmed2ensembl.org).
GO enrichment and KEGG pathway analysis
The GO knowledgebase is a source of information on gene functions and is composed of three categories: biological process, molecular function and cellular component [14]. KEGG is a collection of manually drawn pathway maps representing knowledge of molecular interaction, reaction and relation networks, linking genomic information to higher-level functional information [15]. These analyses were carried out with the clusterProfiler package in RStudio, and a p value < 0.05 was considered significant.
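The over-representation test behind this kind of GO/KEGG analysis can be sketched as follows. clusterProfiler's enrichment functions use a one-sided hypergeometric test and, by default, report Benjamini-Hochberg adjusted p values; the text does not state whether the 0.05 cutoff was applied to raw or adjusted values, so both steps are shown. This is a minimal Python illustration, not the package's own code, and the gene counts in the example are hypothetical.

```python
from math import comb
from typing import List


def hypergeometric_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided over-representation p value, P(X >= k), for one GO/KEGG term.

    k: input genes annotated to the term
    n: size of the input gene list
    K: background genes annotated to the term
    N: size of the background (universe) gene set
    """
    # Sum the hypergeometric probability mass from k up to the largest possible overlap.
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity and capping at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical term: 3 of 4 input genes annotated, 5 of 20 background genes annotated.
p = hypergeometric_p(k=3, n=4, K=5, N=20)  # equals 155 / 4845, about 0.032
```

In practice the background N would be the annotated gene universe used by the enrichment tool, and the adjusted values would be compared against the chosen cutoff.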

Drug-Gene-Pathway Interactions
The Drug Gene Interaction Database (DGIdb, www.dgidb.org) is a web resource that consolidates disparate data sources describing drug-gene interactions and gene druggability. It is a user-friendly tool for mining the druggable genome to generate precision-medicine hypotheses [18]. DGIdb 3.0 substantially expanded the resource by adding new sources and updating existing ones [19]. A Sankey diagram of drug-gene-pathway relationships was drawn with the ggplot2 and ggalluvial packages in RStudio to reflect the connections among the three elements.
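A Sankey (alluvial) plot of this kind is typically fed one row per drug-gene-pathway path. The sketch below, in Python rather than the R used in the study, joins a drug-gene table with a gene-pathway table on the gene column to produce those rows, and lists drugs linked to more than one gene. The drug names (drug_A and so on) and all pairs are hypothetical placeholders, not DGIdb output; only the gene symbols and pathway names are borrowed from the text.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical inputs: placeholder drugs; gene symbols and pathway names borrowed from the text.
drug_gene: List[Tuple[str, str]] = [
    ("drug_A", "TNF"),
    ("drug_A", "IL6"),
    ("drug_B", "TNF"),
    ("drug_C", "CSF2"),
]
gene_pathway: List[Tuple[str, str]] = [
    ("TNF", "Cytokine-cytokine receptor interaction"),
    ("TNF", "Rheumatoid arthritis"),
    ("IL6", "Cytokine-cytokine receptor interaction"),
]


def build_triples(drug_gene, gene_pathway) -> List[Tuple[str, str, str]]:
    """Join on gene: one (drug, gene, pathway) row per path in the Sankey plot."""
    pathways_by_gene: Dict[str, List[str]] = defaultdict(list)
    for gene, pathway in gene_pathway:
        pathways_by_gene[gene].append(pathway)
    # Genes with no pathway among the selected ones contribute no rows.
    return [
        (drug, gene, pathway)
        for drug, gene in drug_gene
        for pathway in pathways_by_gene.get(gene, [])
    ]


def multi_gene_drugs(drug_gene) -> List[str]:
    """Drugs that interact with more than one distinct gene."""
    genes_by_drug: Dict[str, Set[str]] = defaultdict(set)
    for drug, gene in drug_gene:
        genes_by_drug[drug].add(gene)
    return sorted(d for d, genes in genes_by_drug.items() if len(genes) > 1)


triples = build_triples(drug_gene, gene_pathway)
```

Each triple would become one row of the data frame passed to ggalluvial; drug_C drops out here because CSF2 has no pathway in this toy table.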
Validation of gene expression in TCGA
UALCAN, a tool for in-depth analyses of TCGA data, is a publicly available web platform that helps researchers study gene expression variation across different tumors [20]. To further screen potential targets in the malignant transformation of UC into colon adenocarcinoma (COAD), the candidate genes were evaluated in UALCAN. In total, 286 tumor samples and 41 normal samples were evaluated at both the RNA and protein levels, and a p value < 0.05 was considered statistically significant.

Identification of related genes
In the current study, we obtained two gene lists from pubmed2ensembl using the keywords UC and self-healing. The two lists included 1104 and 232 genes, respectively. Using the Venny online tool (https://bioinfogp.cnb.csic.es/tools/venny/), we identified 137 genes shared by the UC and self-healing lists (Supplement Table 1 and Fig. 1A).
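The overlap step amounts to a set intersection. A minimal Python sketch, assuming the two exported lists are plain gene symbols; the short lists below are hypothetical stand-ins, and symbols are upper-cased and de-duplicated before comparison. Applied to the reported counts, the three Venn regions would be 1104 - 137 = 967 UC-only genes, 137 shared genes, and 232 - 137 = 95 self-healing-only genes.

```python
from typing import Iterable, List, Set, Tuple


def normalize(genes: Iterable[str]) -> Set[str]:
    """Strip whitespace, upper-case symbols and drop duplicates or blanks."""
    return {g.strip().upper() for g in genes if g.strip()}


def venn_regions(list_a, list_b) -> Tuple[List[str], List[str], List[str]]:
    """Return (only in A, shared, only in B), each sorted alphabetically."""
    a, b = normalize(list_a), normalize(list_b)
    return sorted(a - b), sorted(a & b), sorted(b - a)


# Hypothetical excerpts of the two pubmed2ensembl lists.
uc_genes = ["IL6", "TNF", "STAT3", "NOD2", "il10 "]
self_healing_genes = ["IL6", "TNF", "EGF", "VEGFA", "IL10"]

only_uc, shared, only_healing = venn_regions(uc_genes, self_healing_genes)
```

Normalizing first matters because a stray lowercase symbol or trailing space would otherwise hide a genuinely shared gene.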

Function enrichment analysis
For the shared genes, we analyzed functional and signaling pathway enrichment using the clusterProfiler package in R, with a cutoff of p < 0.05. Fig. 1 shows the top 10 enriched GO terms in the biological process, cellular component, and molecular function categories.
In the KEGG signaling pathway analysis, these common genes were mainly enriched in cytokine-cytokine receptor interaction, the JAK-STAT signaling pathway, and proteoglycans in cancer (Fig. 1B).
In the biological process category, genes were mainly enriched in T cell activation, leukocyte migration and positive regulation of cytokine production (Fig. 1C). In the cellular component category, genes were mainly enriched in the external side of the plasma membrane, membrane region, and membrane raft (Fig. 1D). In the molecular function category, genes were mainly enriched in receptor ligand activity, cytokine receptor binding, and cytokine activity (Fig. 1E).

PPI network and module analysis
To build an interaction network, the 137 common genes were analyzed in Cytoscape using the STRING online database (http://string-db.org). The resulting network contained 105 nodes and 399 edges (Fig. 2A); the remaining 32 genes had no interactions in the PPI network. Using MCODE, the most significant module, containing IL6, IL13, IL1A, IL10, CSF2, IL18, IL4, CCL2, TNF, ICAM1, and STAT3, was selected from the PPI network (Fig. 2B). This cluster had an MCODE score of 8.8, with 11 nodes and 44 edges, and showed the closest relationship with the two keywords selected in this study. Using the cytoHubba plugin, 10 genes with the highest degree of connectivity (GRB2, IL6, CXCL8, IL10, TNF, STAT3, IL4, FN1, EGF, and VEGFA) were identified as hub genes (Fig. 2C). Networks and sub-networks identified by topological analysis with cytoHubba may provide new insights into essential regulatory networks and protein drug targets for experimental biologists [21]. Notably, 50% of the hub genes (5/10) were also included in the most significant gene cluster, which further supports the critical roles of these genes and suggests that the methods are consistent.
Interestingly, among the 16 highlighted genes in the PPI network, 11 are involved in inflammation. ICAM-1 is an adhesion molecule that plays a critical role in maintaining epithelial integrity. GRB2 is involved in EGF receptor-mediated signal transduction, which mainly promotes epithelial cell proliferation. FN1 is involved in cell adhesion, cell motility, opsonization, wound healing and maintenance of cell shape. VEGFA, an angiogenic growth factor, regulates angiogenesis and vasculogenesis in both normal and cancerous tissues, suggesting a possible link between the pathological changes of UC and tumorigenesis.
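The reported MCODE score can be checked by hand. In the original MCODE description, a cluster's score is its density multiplied by its number of nodes; with density defined as E / (V(V - 1)/2) for a graph without self-loops, 11 nodes and 44 edges give 44/55 × 11 = 8.8, matching the value above. The sketch also ranks nodes by degree, which the phrase "degree of connectivity" suggests; treating cytoHubba's Degree method as the one used is an assumption, and the edge list is a hypothetical toy network, not the STRING output.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def mcode_cluster_score(n_nodes: int, n_edges: int) -> float:
    """Density (no self-loops) times node count, equivalent to 2E / (V - 1)."""
    max_edges = n_nodes * (n_nodes - 1) / 2
    return n_edges / max_edges * n_nodes


def top_by_degree(edges: Iterable[Tuple[str, str]], k: int) -> List[str]:
    """Rank nodes by number of distinct neighbours; ties broken alphabetically."""
    # Drop duplicate (undirected) edges and self-loops before counting.
    unique = {frozenset(e) for e in edges if e[0] != e[1]}
    degree = Counter()
    for pair in unique:
        for node in pair:
            degree[node] += 1
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return [node for node, _ in ranked[:k]]


print(round(mcode_cluster_score(11, 44), 2))  # 8.8, the score reported for the selected module

# Hypothetical toy network; ("TNF", "IL6") duplicates ("IL6", "TNF").
toy_edges = [("IL6", "TNF"), ("IL6", "STAT3"), ("IL6", "IL10"),
             ("TNF", "IL10"), ("STAT3", "EGF"), ("TNF", "IL6")]
print(top_by_degree(toy_edges, 2))  # ['IL6', 'IL10']
```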
KEGG signaling pathway analysis showed that the 16 genes were mainly enriched in cytokine-cytokine receptor interaction, rheumatoid arthritis, and the AGE-RAGE signaling pathway in diabetic complications (Fig. 2D). Because MCODE detects densely connected regions of the PPI network from connectivity data, and some of these regions have been shown to form protein complexes, we analyzed cellular component enrichment of the 11 cluster genes (IL13, IL1A, IL10, CSF2, IL18, IL4, CCL2, TNF, ICAM1, STAT3, and IL6) in RStudio. These genes were mainly enriched in the external side of the plasma membrane (Fig. 2E). For the hub genes (IL6, CXCL8, IL10, TNF, STAT3, IL4, FN1, EGF, VEGFA and GRB2), molecular function terms were mainly enriched in cytokine receptor binding, receptor ligand activity, and growth factor receptor binding (Fig. 2F). In the biological process analysis, the hub genes were mainly enriched in positive regulation of cell adhesion, leukocyte migration, and regulation of the JAK-STAT cascade (Fig. 2G).
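The 16 genes referred to here are the union of the 11-gene MCODE cluster and the 10 hub genes, which share five members (the 5/10 overlap noted above). A short Python check using only the gene symbols listed in the text:

```python
# Gene symbols exactly as listed in the text (Fig. 2B and Fig. 2C).
cluster_genes = {"IL6", "IL13", "IL1A", "IL10", "CSF2", "IL18",
                 "IL4", "CCL2", "TNF", "ICAM1", "STAT3"}
hub_genes = {"GRB2", "IL6", "CXCL8", "IL10", "TNF", "STAT3",
             "IL4", "FN1", "EGF", "VEGFA"}

overlap = cluster_genes & hub_genes          # genes present in both sets
highlighted = cluster_genes | hub_genes      # combined list used downstream
hub_fraction_in_cluster = len(overlap) / len(hub_genes)

print(sorted(overlap))            # ['IL10', 'IL4', 'IL6', 'STAT3', 'TNF']
print(len(highlighted))           # 16
print(hub_fraction_in_cluster)    # 0.5
```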

Gene-Drug-Pathway analysis
Pharmacogenomics is an interdisciplinary field that emerged in the 1990s from genetics, genomics and genetic pharmacology. It mainly focuses on the relationship between human genomic information and drug response. DGIdb mines existing resources to generate hypotheses about how mutated genes might be targeted therapeutically or prioritized for drug development [22]. It provides a platform for searching lists of genes or drugs against a compendium of drug-gene interactions.
Using the two gene sets obtained above, containing 11 and 10 genes respectively (Fig. 2B, 2C), we searched DGIdb and obtained a list of related drugs.
Among these 16 genes, IL4, EGF, FN1, IL13, CCL2 and STAT3 are not currently considered drug targets for UC according to DGIdb. The final list comprised only drugs approved by the US Food and Drug Administration (FDA) and used as immunotherapy drugs (Supplement Table 2). Four of these drugs (infliximab, certolizumab pegol, adalimumab, and golimumab) have previously been used in UC and/or Crohn's disease. Most of the other drugs identified have not previously been considered anti-UC agents.
As shown in the Sankey diagram, 32 drugs and 8 genes were related to the top 10 KEGG pathways (Supplement Table 3 and Fig. 3), with 768 lines indicating connections among the three elements. IL6 and TNF were linked to multiple drugs and pathways, indicating that they might be important therapeutic targets in UC. Moreover, etanercept, interferon alfa-2b, mycophenolic acid and procarbazine are each associated with more than one gene. Etanercept (Enbrel) is used to treat rheumatoid arthritis and axial spondyloarthritis, two diseases with inflammation as the main phenotype [23][24][25]. Interferon alfa-2b is an immune-enhancing agent used in the treatment of mucosal melanoma of the head and neck [26], recurrent conjunctival papillomatosis [27], ocular surface squamous neoplasia [28], and several other types of cancer. Mycophenolic acid is an immunosuppressive drug used to prevent rejection in solid organ transplantation and to treat some immune diseases [29]. Procarbazine is an antitumor drug used to treat Hodgkin's disease, usually in combination with mechlorethamine, vincristine, and prednisone [30]. Together, these data provide new clues for targeted therapy in UC patients.

Association between UC and colon cancer
UC is associated with an increased prevalence of colorectal cancer, although the mechanisms underlying neoplastic transformation in UC are poorly understood [31][32][33]. To explore the relationship between the two conditions, we examined the expression of the 16 genes identified above in the TCGA database using the UALCAN online tool. The expression of IL10, IL18, STAT3, GRB2 and EGF was downregulated in COAD samples, which possibly reflects a protective mechanism against tumorigenesis (Fig. 4A). In contrast, IL1A, ICAM1, CSF2, IL6, FN1 and VEGFA were upregulated at the RNA level in COAD compared with normal samples (Fig. 4B). Interestingly, the TPM values of IL13 and IL4 were extremely low (TPM < 1). CCL2 and TNF expression did not differ significantly between COAD and normal samples.
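A per-gene tumor-versus-normal call of this kind can be sketched as below. The TPM < 1 rule mirrors the low-expression note above; the log2(TPM + 1) transform and Welch's unequal-variance t statistic are assumptions for illustration, since this section does not specify UALCAN's exact test, and the sample values are hypothetical. The p value itself would come from the t distribution (for example scipy.stats.ttest_ind with equal_var=False) and is left out to keep the sketch dependency-free, so the returned call reports direction only, not significance.

```python
import math
import statistics
from typing import Dict, List, Tuple


def welch_t(a: List[float], b: List[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def compare_expression(tumor_tpm: List[float], normal_tpm: List[float],
                       low_cutoff: float = 1.0) -> Dict[str, object]:
    """Flag low expression, otherwise report direction on the log2(TPM + 1) scale."""
    if max(statistics.median(tumor_tpm), statistics.median(normal_tpm)) < low_cutoff:
        return {"call": "low expression"}
    tumor_log = [math.log2(x + 1) for x in tumor_tpm]
    normal_log = [math.log2(x + 1) for x in normal_tpm]
    t, df = welch_t(tumor_log, normal_log)
    if t > 0:
        call = "higher in tumor"
    elif t < 0:
        call = "lower in tumor"
    else:
        call = "no difference"
    return {"call": call, "t": t, "df": df}


# Hypothetical TPM values, not TCGA data.
print(compare_expression([10.0, 12.0, 14.0], [2.0, 3.0, 4.0])["call"])
print(compare_expression([0.1, 0.2, 0.3], [0.5, 0.4, 0.2])["call"])
```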
Moreover, using data from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) Confirmatory/Discovery dataset, UALCAN also provides protein expression analysis for colon cancer. Z-values represent standard deviations from the median across samples for the given cancer type. Log2 spectral count ratio values from CPTAC were first normalized within each sample profile and then normalized across samples. Among the proteins encoded by the cluster and hub genes, ICAM1 and CXCL8 were significantly upregulated in colon cancer compared with normal samples (p < 0.01) (Fig. 4C). In contrast, GRB2 showed lower expression in colon cancer samples (Fig. 4D). Statistical results for all gene expression comparisons are shown in Supplement Table 4.
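The Z-value definition above can be written as z = (x - median) / SD, computed across the samples of one cancer type. A minimal Python sketch of that final step, assuming the sample standard deviation and already normalized log2 spectral count ratios as input; the values are hypothetical, and UALCAN's full normalization pipeline may differ.

```python
import statistics
from typing import List


def median_centered_z(values: List[float]) -> List[float]:
    """Express each sample as standard deviations from the median across samples."""
    med = statistics.median(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumption)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - med) / sd for v in values]


# Hypothetical normalized log2 spectral count ratios for one protein.
ratios = [-0.4, 0.1, 0.3, 0.9, 1.2]
z = median_centered_z(ratios)
print([round(v, 2) for v in z])
```

Centering on the median rather than the mean keeps a few extreme samples from shifting the reference point.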
The primary purpose of this study was to predict key genes in the malignant transformation of UC through data mining and analysis; these results need to be confirmed by further molecular and cellular experiments.

Discussion
In this study, our purpose was to identify potential genes and pathways involved in the pathogenesis and malignant transformation of UC. UC has a progressive, remitting and relapsing course, and epithelial regeneration is a crucial pathophysiological feature of the disease [34]. It is therefore essential to explore the key factors involved in these pathological processes.
The current study suggests that impaired intestinal epithelial self-healing is related to immune responses in UC. With the rising global prevalence of UC, drugs have been developed for its treatment, including 5-aminosalicylic acid, steroids, immunosuppressants and some active compounds extracted from Chinese medicine. Their anti-UC activities are mainly associated with increased levels of anti-inflammatory cytokines (IL-4 and IL-10) and decreased levels of pro-inflammatory mediators (TNF-α, IL-1β, IL-6, IL-8, IL-23 and NF-κB) [35]. However, whether these drugs target the function of intestinal epithelial cells has not been studied. Our results suggest that therapies targeting the self-healing ability of the intestinal epithelium might be a new therapeutic strategy for UC.
Moreover, the changes in these potential targets in colon cancer patients suggest that intestinal epithelial self-repair may act as a bridge between UC and colitis-associated cancer. Previous studies have characterized factors that share some functional overlap with the targets we identified. Deng et al. [36] showed that Yes-associated protein plays a critical role in intestinal epithelial cell self-renewal, regeneration and tumorigenesis. Zundler et al. [35] reported that JAK/STAT signaling controls cytokine secretion and the transition of inflammatory lesions to tumors, leading to colitis-associated cancer via the IL6/STAT3 axis.
In our study, 11 immune-related genes (IL6, IL13, IL1A, IL10, CSF2, IL18, IL4, CCL2, TNF, STAT3 and CXCL8) were identified as crucial targets in intestinal epithelial damage and repair in UC patients, suggesting that the immune system plays an important role throughout the course of UC. Cellular component analysis showed that the products of these genes might regulate UC pathogenesis at the external side of the plasma membrane, which might also be where the candidate drugs act. Moreover, clarifying the link between UC and UC-related colon cancer will provide new clues to therapeutic strategies for UC, help reduce the chance of malignant transformation, and improve the quality of life of UC patients. However, more direct experimental evidence is needed to confirm these functions.

Declarations
COMPLIANCE WITH ETHICAL STANDARDS
This article does not contain any studies with human participants or animals performed by any of the authors.

CONFLICTS OF INTEREST
All authors have declared no conflicts of interest.

FUNDING
This article was supported by the Graduate Innovation Fund of Jilin University.

AUTHOR CONTRIBUTIONS
ML wrote the first draft of the manuscript and performed most of the comparative analysis. All authors participated in the data analysis and discussion. All authors revised the final version of the manuscript.