Identification of related genes
In the current study, we obtained two gene lists from pubmed2ensembl using the keywords "UC" and "self-healing", containing 1104 and 232 genes, respectively. Using the Venny online tool (https://bioinfogp.cnb.csic.es/tools/venny/), we identified 137 genes shared by the two lists (Supplement Table 1 and Fig. 1A).
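The Venny step reduces to a set intersection. The following is a minimal Python sketch, assuming each pubmed2ensembl list has been exported as de-duplicated gene symbols; the gene names in the example are hypothetical placeholders, not the study's lists.

```python
def venn_counts(set_a: set[str], set_b: set[str]) -> dict[str, int]:
    """Return the sizes of the three regions of a two-set Venn diagram."""
    shared = set_a & set_b
    return {
        "only_a": len(set_a - shared),
        "shared": len(shared),
        "only_b": len(set_b - shared),
    }


# Hypothetical placeholder lists, not the study's pubmed2ensembl output.
uc_genes = {"IL6", "TNF", "STAT3", "GENE_X"}
healing_genes = {"IL6", "TNF", "VEGFA"}
print(venn_counts(uc_genes, healing_genes))  # {'only_a': 2, 'shared': 2, 'only_b': 1}
```

With the list sizes reported above, the same region arithmetic gives 1104 - 137 = 967 genes found only in the UC list and 232 - 137 = 95 found only in the self-healing list, provided both lists contain unique symbols.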
Function enrichment analysis
For these shared genes, we performed functional and signaling pathway enrichment analysis using the clusterProfiler package in R, with a cutoff of p < 0.05. As shown in Fig. 1, the top 10 enriched GO terms covered the biological process, cellular component, and molecular function categories.
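Over-representation analysis of the kind run by clusterProfiler scores each term with a one-sided hypergeometric test. The Python sketch below illustrates that test and is not clusterProfiler's code; the counts in the example are hypothetical. The Benjamini-Hochberg helper is included only because clusterProfiler adjusts p values with that method by default, whereas the cutoff above is stated on p values.

```python
from math import comb


def hypergeom_pvalue(k: int, n: int, K: int, N: int) -> float:
    """One-sided over-representation p value, P(X >= k).

    k: query genes annotated to the term
    n: query genes in the analysis
    K: background genes annotated to the term
    N: background genes in total
    """
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity and a cap of 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical term: 12 of 137 query genes vs. 150 of 18,000 background genes.
p = hypergeom_pvalue(k=12, n=137, K=150, N=18000)
print(p < 0.05)  # True
```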
In the KEGG pathway analysis, these common genes were mainly enriched in cytokine-cytokine receptor interaction, the JAK-STAT signaling pathway, and proteoglycans in cancer (Fig. 1B).
In the biological process category, the genes were mainly enriched in T cell activation, leukocyte migration, and positive regulation of cytokine production (Fig. 1C). In the cellular component category, they were mainly enriched in the external side of the plasma membrane, membrane region, and membrane raft (Fig. 1D). In the molecular function category, they were mainly enriched in receptor ligand activity, cytokine receptor binding, and cytokine activity (Fig. 1E).
PPI network and module analysis
To construct a protein-protein interaction (PPI) network, the 137 common genes were imported into Cytoscape using the STRING online database (http://string-db.org). The resulting network contained 105 nodes and 399 edges (Fig. 2A); the remaining 32 genes had no interactions in the PPI network. Using MCODE, the most significant module, containing IL6, IL13, IL1A, IL10, CSF2, IL18, IL4, CCL2, TNF, ICAM1, and STAT3, was selected from the PPI network (Fig. 2B). This gene cluster had an MCODE score of 8.8, with 11 nodes and 44 edges, and showed the closest relationship with the two keywords selected in this study. Using the cytoHubba plugin, 10 genes with the highest degree of connectivity (GRB2, IL6, CXCL8, IL10, TNF, STAT3, IL4, FN1, EGF, and VEGFA) were identified as hub genes (Fig. 2C). Networks and sub-networks identified by topological analysis with cytoHubba may provide new insights into essential regulatory networks and protein drug targets for experimental biologists [21]. Notably, half of the hub genes (5/10) were also included in the most significant gene cluster, which supports the critical roles of these genes and suggests that the two methods give consistent results.

Among the 16 genes highlighted in the PPI network (the union of the cluster and hub genes), 11 are involved in inflammation. ICAM-1 is an adhesion molecule that plays a critical role in maintaining epithelial integrity. GRB2 is involved in EGF receptor-mediated signal transduction, which mainly promotes epithelial cell proliferation. FN1 is involved in cell adhesion, cell motility, opsonization, wound healing, and maintenance of cell shape. VEGFA, an angiogenic growth factor, regulates angiogenesis and vasculogenesis in both normal and tumor tissue, which may imply a link between the pathological changes of UC and tumorigenesis.
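The reported MCODE score can be checked by hand. MCODE scores a cluster as its density (edges divided by the maximum possible number of edges, n(n - 1)/2 without self-loops) multiplied by its number of nodes, so 11 nodes and 44 edges give 44/55 × 11 = 8.8, consistent with the value above. The Python sketch below reproduces that arithmetic and, for the hub-gene step, ranks nodes by degree, assuming the Degree method was the cytoHubba ranking used (the text does not name the method). The edge list is a hypothetical toy network, not the STRING output.

```python
from collections import Counter


def mcode_cluster_score(n_nodes: int, n_edges: int) -> float:
    """Cluster density (no self-loops) multiplied by the number of nodes."""
    max_edges = n_nodes * (n_nodes - 1) / 2
    return (n_edges / max_edges) * n_nodes


def top_by_degree(edges: list[tuple[str, str]], k: int) -> list[str]:
    """Return the k nodes with the most incident edges (undirected, no duplicate edges)."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return [node for node, _ in degree.most_common(k)]


print(round(mcode_cluster_score(11, 44), 2))  # 8.8

# Hypothetical toy network (not the study's STRING edges).
toy_edges = [("GENE_A", "GENE_B"), ("GENE_A", "GENE_C"),
             ("GENE_A", "GENE_D"), ("GENE_B", "GENE_C")]
print(top_by_degree(toy_edges, 1))  # ['GENE_A']
```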
In the KEGG pathway analysis, the 16 genes were mainly enriched in cytokine-cytokine receptor interaction, rheumatoid arthritis, and the AGE-RAGE signaling pathway in diabetic complications (Fig. 2D). Because MCODE identifies densely connected regions of the PPI network based on connectivity data, and some of these regions have been shown to correspond to protein complexes, we analyzed the cellular component enrichment of the 11 cluster genes (IL13, IL1A, IL10, CSF2, IL18, IL4, CCL2, TNF, ICAM1, STAT3, and IL6) in RStudio. These genes were mainly enriched in the external side of the plasma membrane (Fig. 2E). For the hub genes (IL6, CXCL8, IL10, TNF, STAT3, IL4, FN1, EGF, VEGFA, and GRB2), molecular function terms were mainly enriched in cytokine receptor binding, receptor ligand activity, and growth factor receptor binding (Fig. 2F). In the biological process analysis, the hub genes were mainly enriched in positive regulation of cell adhesion, leukocyte migration, and regulation of the JAK-STAT cascade (Fig. 2G).
Gene-Drug-Pathway analysis
Pharmacogenomics is an interdisciplinary field that emerged in the 1990s from genetics, genomics, and genetic pharmacology. It focuses mainly on the relationship between human genomic information and drug response. DGIdb mines existing resources to generate hypotheses about how mutated genes might be targeted therapeutically or prioritized for drug development [22], and provides a platform for searching lists of genes or drugs against a compendium of drug-gene interactions.
Using the two gene sets obtained above, containing 11 and 10 genes, respectively (Fig. 2B, 2C), we searched DGIdb and obtained a list of related drugs.
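The 16-gene query list is the union of the cluster and hub gene sets. A short Python check, using the gene symbols exactly as listed in the text above, confirms both the 5-gene overlap and the 16-gene total; the DGIdb query itself is not reproduced here.

```python
# Gene symbols as listed above for the MCODE cluster (Fig. 2B) and hub genes (Fig. 2C).
cluster_genes = {"IL6", "IL13", "IL1A", "IL10", "CSF2", "IL18",
                 "IL4", "CCL2", "TNF", "ICAM1", "STAT3"}
hub_genes = {"GRB2", "IL6", "CXCL8", "IL10", "TNF",
             "STAT3", "IL4", "FN1", "EGF", "VEGFA"}

shared = cluster_genes & hub_genes        # genes selected by both methods
query_genes = cluster_genes | hub_genes   # combined list for the drug-gene search

print(sorted(shared))     # ['IL10', 'IL4', 'IL6', 'STAT3', 'TNF']
print(len(query_genes))   # 16
```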
Among these 16 genes, IL4, EGF, FN1, IL13, CCL2, and STAT3 are not currently listed as drug targets for UC in DGIdb. The final list was restricted to drugs approved by the US Food and Drug Administration (FDA) and used as immunotherapy drugs (Supplement Table 2). Of these, four drugs (infliximab, certolizumab pegol, adalimumab, and golimumab) have previously been used in UC and/or Crohn's disease. Most of the other drugs identified have not previously been considered as anti-UC drugs.
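The filtering step can be sketched as follows, assuming the DGIdb results have been exported into simple records carrying an approval flag and an immunotherapy flag. The field names, drug names, and records are hypothetical and reflect neither DGIdb's export format nor the contents of Supplement Table 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    """One hypothetical drug-gene record after export."""
    gene: str
    drug: str
    fda_approved: bool
    immunotherapy: bool


def keep_approved_immunotherapies(records: list[Interaction]) -> list[tuple[str, str]]:
    """Keep (gene, drug) pairs for FDA-approved immunotherapy drugs only."""
    return sorted({(r.gene, r.drug) for r in records
                   if r.fda_approved and r.immunotherapy})


def genes_without_drugs(query: set[str], kept: list[tuple[str, str]]) -> list[str]:
    """Query genes left with no drug after filtering."""
    covered = {gene for gene, _ in kept}
    return sorted(query - covered)


records = [
    Interaction("TNF", "drug_A", True, True),
    Interaction("IL6", "drug_B", True, True),
    Interaction("STAT3", "drug_C", False, True),  # not approved, so dropped
]
kept = keep_approved_immunotherapies(records)
print(kept)  # [('IL6', 'drug_B'), ('TNF', 'drug_A')]
print(genes_without_drugs({"TNF", "IL6", "STAT3", "EGF"}, kept))  # ['EGF', 'STAT3']
```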
As shown in the Sankey diagram, 32 drugs and 8 genes were related to the top 10 KEGG pathways (Supplement Table 3 and Fig. 3), with 768 lines indicating the correlations among the three elements. IL6 and TNF were linked to multiple drugs and pathways, suggesting that they may be important therapeutic targets in UC. Moreover, etanercept, interferon alfa-2b, mycophenolic acid, and procarbazine are each associated with more than one gene. Etanercept (Enbrel) is used to treat rheumatoid arthritis and axial spondyloarthritis, two diseases in which inflammation is the main phenotype [23–25]. Interferon alfa-2b is an immune-enhancing agent used in the treatment of mucosal melanoma of the head and neck [26], recurrent conjunctival papillomatosis [27], ocular surface squamous neoplasia [28], and several types of cancer. Mycophenolic acid is an immunosuppressive drug used to prevent rejection in solid organ transplantation and to treat some immune diseases [29]. Procarbazine is an antitumor drug used to treat Hodgkin's disease, usually in combination with mechlorethamine, vincristine, and prednisone [30]. Together, these data may provide new clues for targeted therapy in patients with UC.
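Statements such as "associated with more than one gene" come from counting distinct genes per drug in the drug-gene-pathway table summarized in Fig. 3. A minimal Python sketch, assuming that table is available as (drug, gene, pathway) triples; the triples shown are hypothetical placeholders, not Supplement Table 3.

```python
from collections import defaultdict


def genes_per_drug(triples: list[tuple[str, str, str]]) -> dict[str, set[str]]:
    """Map each drug to the distinct genes it is linked to."""
    mapping = defaultdict(set)
    for drug, gene, _pathway in triples:
        mapping[drug].add(gene)
    return dict(mapping)


def multi_gene_drugs(triples: list[tuple[str, str, str]]) -> list[str]:
    """Drugs linked to more than one distinct gene."""
    return sorted(d for d, genes in genes_per_drug(triples).items() if len(genes) > 1)


# Hypothetical triples; a drug hitting the same gene in two pathways still counts once.
triples = [
    ("drug_A", "TNF", "pathway_1"),
    ("drug_A", "IL6", "pathway_2"),
    ("drug_B", "TNF", "pathway_1"),
    ("drug_B", "TNF", "pathway_3"),
]
print(multi_gene_drugs(triples))  # ['drug_A']
```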
Association between UC and colon cancer
UC is associated with an increased risk of colorectal cancer, although the mechanisms underlying neoplastic transformation in UC remain poorly understood [31–33]. To explore the relationship between these two pathological conditions, we examined the expression of the 16 genes identified above in the TCGA database using the online tool UALCAN. The expression of IL10, IL18, STAT3, GRB2, and EGF was downregulated in COAD (colon adenocarcinoma) samples, which may reflect a protective mechanism related to tumorigenesis (Fig. 4A). In contrast, IL1A, ICAM1, CSF2, IL6, FN1, and VEGFA were upregulated at the RNA level in COAD compared with normal samples (Fig. 4B). Notably, the TPM values of IL13 and IL4 were extremely low (TPM < 1). No statistically significant difference was observed for CCL2 or TNF between COAD and normal samples.
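The tumor-versus-normal comparison and the TPM < 1 flag can be sketched in Python as below. This illustrates the idea under stated assumptions and is not UALCAN's pipeline: the use of group medians, the pseudocount of 1 in the log2 fold change, and the Welch t statistic are choices made here, and the TPM values are hypothetical. Turning the statistic into a p value would additionally need the t distribution (for example from SciPy), which is omitted to keep the sketch dependency-free.

```python
from math import log2, sqrt
from statistics import mean, median, variance


def direction(tumor_tpm: list[float], normal_tpm: list[float], min_tpm: float = 1.0) -> str:
    """Label a gene as low expression, up, down, or unchanged from group medians."""
    t, n = median(tumor_tpm), median(normal_tpm)
    if max(t, n) < min_tpm:
        return "low expression"          # both groups below the TPM cutoff
    lfc = log2((t + 1) / (n + 1))        # pseudocount of 1 avoids log2(0)
    if lfc > 0:
        return "up"
    if lfc < 0:
        return "down"
    return "unchanged"


def welch_t(a: list[float], b: list[float]) -> float:
    """Welch's t statistic for two groups with unequal variances."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


# Hypothetical TPM values for one gene.
tumor, normal = [10.0, 12.0, 14.0], [4.0, 5.0, 6.0]
print(direction(tumor, normal))          # up
print(round(welch_t(tumor, normal), 2))  # 5.42
```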
In addition, UALCAN provides protein expression analysis for colon cancer using data from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) Confirmatory/Discovery dataset. Z-values represent standard deviations from the median across samples for the given cancer type. Log2 spectral count ratio values from CPTAC were first normalized within each sample profile and then normalized across samples. Among the proteins encoded by the cluster and hub genes, ICAM1 and CXCL8 were significantly upregulated in colon cancer compared with normal samples (p < 0.01; Fig. 4C). In contrast, GRB2 showed lower expression in colon cancer samples (Fig. 4D). All statistical significance data for gene expression are shown in Supplement Table 4.
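The two-step normalization and the Z-value definition above can be sketched in Python. The within-sample step is assumed here to be median-centering of each sample's log2 ratios, and the Z-value is taken as the distance from the across-sample median in units of the sample standard deviation; both are illustrative assumptions, and the actual CPTAC/UALCAN processing may differ. All values are hypothetical.

```python
from statistics import median, stdev


def center_within_sample(profile: dict[str, float]) -> dict[str, float]:
    """Subtract the sample's median log2 ratio from every protein in that sample."""
    m = median(profile.values())
    return {protein: value - m for protein, value in profile.items()}


def z_values(values: list[float]) -> list[float]:
    """Standard deviations from the across-sample median for one protein."""
    center = median(values)
    sd = stdev(values)
    return [(v - center) / sd for v in values]


sample = {"PROT_A": 1.0, "PROT_B": 3.0, "PROT_C": 2.0}  # hypothetical sample profile
print(center_within_sample(sample))  # {'PROT_A': -1.0, 'PROT_B': 1.0, 'PROT_C': 0.0}

protein_across_samples = [1.0, 2.0, 3.0]  # hypothetical centered values for one protein
print(z_values(protein_across_samples))   # [-1.0, 0.0, 1.0]
```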
The primary purpose of this study was to predict key genes potentially involved in the malignant transformation of UC through data mining and analysis; these results need to be confirmed by further molecular and cellular experiments.