MicroRNA expression analysis for identification of biomarkers in inflammatory bowel disease subtypes

The potential biomarkers in inflammatory bowel diseases (IBDs) were analyzed from GSE53867 dataset. Differentially expressed microRNAs (DEMs)-genes and protein-protein interaction networks were constructed, and hub genes selected using Cytoscape. Differentially expressed genes were analyzed for GO and Reactome-pathway. Seven DEMs were upregulated in Crohn's disease (CD), 4 downregulated in ulcerative colitis (UC), 8 upregulated and 2 downregulated in IBD. A 620, 2377, and 1821 target-genes were in CD, UC, and IBD, respectively. SOCS3, upregulated by miR-650, was hub gene in CD, induced by cytokines, through NFKB-signalling pathway to mediate ubiquitin-proteasomal degradation. CIRH1A, downregulated by miR-16, was hub gene of UC, acted by impairing ribosome-biogenesis. SKP2 and ASB1, up- and downregulated, by miR-142 and miR-665, respectively, were hub genes of IBD, induced cytokines through activation of TLR- and TNF-signalling pathways to mediate ubiquitin-proteasomal degradation. SOCS3, CIRH1A, SKP2 and ASB1 genes might serve as valuable biomarkers to differentiate CD, UC and IBD.


Introduction
The inflammatory bowel diseases (IBDs), including ulcerative colitis (UC) and Crohn's disease (CD), are chronic, idiopathic inflammation of the gastrointestinal tract (GIT), with 6·8 million cases worldwide, as reported in 2017 [1]; 1.64 million cases in the USA, and 1.4 million in India, reported in 2010 [2]. The potential causes of IBD are genetic susceptibility, environmental factors, and dysregulation of immune system [3]. CD can affect any area of the GIT, including the small intestine and colon, has patchy transmural inflammation; while UC affects only the colon, has continuous superficial inflammation [4]. The overlapping symptoms and complications of the disease along with limited treatment options having side effects, highlights the significance of developing specific diagnostic and novel therapeutic strategies in IBD. Recent studies have indicated the role of microRNA (miRNA) in IBD pathogenesis with therapeutic prospects. MiRNAs are small (18)(19)(20)(21)(22)(23)(24) nucleotides), single stranded, conserved, noncoding RNAs that bind to complementary 3′-UTRs (untranslated regions) of mRNA (messenger RNA ) target causing instability and inhibition of translation, rendering miRNAs as important epigenetic modifiers of gene expression [5]. For example, 3 miR-21, miR-31, and miR-141, have been implicated in IBD pathogenesis by targeting respectively PDCD4 and Rho GTPase RhoB, IL-25, and CXCL5 and CXCL12β genes mediated inflammatory responses [6]. However, the pivotal molecular interactions of IBD based on differentially expressed genes (DEGs) have not been extensively investigated. Therefore, the current study is an attempt to identify the potential biomarkers that distinguish IBD and its subtypes, investigate underlying molecular interactions based on DEGs profiling, functional enrichment and significant pathways associated with the DEGs, protein-protein interaction (PPI) network of common DEGs. To achieve this, the GSE53867 dataset was downloaded from the Gene Expression Omnibus (GEO) database (www.ncbi.nlm.nih.gov/geo/). Differentially expressed miRNAs (DEMs) were identified between and unique to UC and CD patients, compared with controls. The DEMs-gene network and protein-protein interaction (PPI) network were constructed to identify potential biomarkers and major regulated genes. GO (Gene Ontology) Process and Reactome pathway analysis was performed with modules of PPI network to determine the functional enrichment and significant pathways associated with the DEGs, to gain further insight into the pathogenesis of UC and CD at the molecular level, which may help in the differential diagnosis of UC and CD and determine prospective molecular targets for the IBD subtypes.

Microarray data acquisition
The miRNA expression dataset GSE53867 was obtained from the gene expression omnibus database (www.ncbi.nlm.nih.gov/geo/), containing two CD (CD1 and CD2), two UC (UC1 and UC2) compared to a single non-IBD control (C) human colon biopsy samples.

Identification of DEMs
The DEMswere identified between UC and C (defined as UC group), CD and C (defined as CD group), UC and CD (defined as IBD group) groups, using GEO2R (http://www.ncbi.nlm.nih.gov/geo/geo2r/) for performing comparisons on GSE53867 raw data applying GEOquery, limma R, and Biobase packages from the Bioconductor project [7,8]. The raw microarray data was preprocessed to obtain "NA" filtered expression data and normalized using "normalizeBetweenArrays" in limma, with quantile method. The normalized microarray expression data was fit to a linear model and compared between 4 groups using Empirical Bayes moderated t test, also in limma, to obtain log2FoldChange (log2FC) differential expression of CD, UC, IBD, Average Expression (AveExpr), F-statistics (F), P value (P.Value) and FDR (False Discovery Rate) adjusted P value (adj.P.Value). The threshold for DEMs were set as P value <0.05 and |log 2 FC| >1.

Hub genes selection and identification of significant modules
The top 10 genes in the PPI network were selected using the Cytoscape plugin CytoHubba, version 0.1, ranked by Maximal Clique Centrality (MCC) method [17]. Genes possessing higher scores in the MCC analysis were considered hub genes.
The Cytoscape plugin MCODE (Molecular Complex Detection) version 1.5.1 was applied to obtain significant modules associated with the PPI network with degree cutoff 2, node score cutoff 0.2, k-core 2, and maximum depth 100 [18].

GO and Reactome pathway enrichment analyses
The DEGs were analyzed at the functional level for GO and Reactome pathway enrichment using the STRING Enrichment App of the Cytoscape software, with P value<0.05 as the cutoff criterion. The miRNA microarray signal intensity downloaded from NCBI GEO dataset GSE53867 before and after normalization have been indicated in Fig. 1a and Fig. 1b.

Target genes of DEMs
The number of target genes obtained from the databases were 725 (478 validated and 247 predicted), 3313 (3003 validated and 310 predicted), and 2201 (1654 validated and 547 predicted), in CD, UC, and IBD, group respectively; after removal of duplicate target genes resulted into 620, 2377, and 1821 number of genes in the CD, UC, and IBD, group respectively, an amount of which were further used for construction of regulatory network between the miRNAs and their targets. The miRNAs, hsa-miR-886-3p, hsa-miR-886-5p, and hsa-miR-1979 were excluded from the DEMs-Gene network, due to non-availability of records in the miRNA database, miRBase and withdrawn status from Hugo Gene Nomenclature Committee (HGNC) (https://www.genenames.org); hsa-miR-519d was excluded from the network due to lack of mature sequence miRNA in the dataset. Fig. 1d represents   6 the Venn diagram indicating a total of 620 DEGs in the CD group, 2377 DEGs in the UC group, and 1821 DEGs in the IBD group, out of which a total of 110 DEGs fitted all the three groups, while 149, 659, and 501 DEGs overlapped CD and UC, UC and IBD, CD and IBD respectively.

DEMs-Gene Network
The DEMs-Gene regulatory network of the CD, UC, and IBD, group constructed using Cytoscape, is depicted in Suppliment Figures (Fig. 1S1, Fig. 1S2, and Fig. 1S3), respectively (confidence score cutoff 0.7).

GO and pathway enrichment analyses of DEGs
A functional enrichment of the screened DEGs in the CD and IBD groups (PPI enrichment value 1.0E- 16), indicated that most of the DEGs were closely associated with protein ubiquitination GO Process 7 (GO.0016567) and class I MHC mediated antigen processing Reactome pathway (HSA-983168) related to ubiquitination & proteasome degradation (Fig. 3). Among UC genes, GO Process (GO.0006364) and Reactome pathway (HSA-6790901) enrichment were significantly related to rRNA processing and modification respectively (PPI enrichment value 3.28E-10) (Fig. 3). The representative KEGG pathways of statistically enriched DEGs in CD, IBD, and UC are represented in Fig. 4a, 4b, and 4c respectively. In the GO Process and pathway analysis, the top enriched pathways (PPI enrichment 1.0E-16) in CD included respectively protein ubiquitination and class I MHC mediated antigen processing related to ubiquitination and proteasome degradation (FDR 9.69E-1, Reactome pathways) (Fig. 3). However, other GO terms and pathways included post-translational protein modification, negative regulation of intracellular signal transduction, neddylation, synthesis of active ubiquitin including roles of E1 and E2 enzymes in CD exhibited significant expression.
In the GO and pathway analysis, the top enriched pathways in UC included rRNA processing and modification in the nucleus and cytosol. In addition, the GO terms in the category biological process significantly enriched by these genes were chromatin organization and remodeling, ribosomal small

Discussion
Idiopathic IBD, which includes CD and UC, creates several diagnostic impediments owing to common clinical, radiographic, endoscopic, and histologic characteristics. Thus it is important to recognize the subtypes of IBD for developing specific diagnostic and novel therapeutic intervention in IBD. The miRNAs are potential candidates for therapeutic application owing to well-defined mechanism of action during dysregulation in human IBD and simple oligonucleotide design [19]. This paper investigated the expression pattern of miRNAs in IBD in comparison to the other IBD subtypes, namely CD and UC, in order to identify miRNAs as potential diagnostic and prognostic biomarkers to better delineate the IBD subtypes.
In the current study, genes which concurrently fitted 5 databases (miRWalk, miRDB, miRanda, RNA22, and TargetScan) were chosen as predicted target genes, and validated target genes from miRWalk were taken, amounting to 1673 upregulated and 3145 downregulated genes. The miRWalk database contains information about predicted interactions based on integration of several algorithms and experimentally validated interactions as well (http://mirwalk.umm.uni-heidelberg.de) [10].Selecting interactions predicted by multiple tools ensures enhanced recovery of identical miRNA-gene interactions. Cytoscape tool was used to visualize the interaction data on the predicted and validated miRNA and their putative target gene regulatory network structure [15]. There were 620 DEGs  Fig. 1S3). The UC and CD groups had 149 DEGs in common, while 110 DEGs fitted all the three groups (Fig. 1d), implying overlapping genetic constituent within CD, UC and IBD. The PPI network was visualized using Cytoscape among 571, 300, 947, 762 proteins in the CD, UC, upregulated and downregulated IBD, groups respectively, followed by analysis of the most connected nodes, considered as key regulators of pathways and biological functions (Suppliment Fig. 2S). Among the eleven topological analysis methods in CytoHubba, the MCC method, which was based on the degree of connectivity of each protein and the size of the interactions connected to the rest of the network, was proposed to have superior accuracy of identifying hub genes encoding proteins in the network [17]. Hub genes, being central element of the PPI network, serve as prospective biomarkers, therapeutic targets and novel tool for analyzing crucial mechanisms regulating disease processes.
The top ranking hub gene of the CD group was SOCS3 (Suppressor Of Cytokine Signaling 3) upregulated by miR-650 (log 2 FC = 1.745, P value = 0.018561). Cheng et al reported the role of miR-19b in inhibiting inflammatory response by downregulateing SOCS3 to alter chemokine production in CD [25]. PPI from cytoscape showed cytoplasmic expression of SOCS3 in most tissues and additionally in the plasma membrane; among blood cell types, it was specific to neutrophils. SOCS3 was expressed in the cells of epithelium and lamina propria in the colon in IBD, UC and CD cases [26].
SOCS3 as a member of the SSI (STAT-induced STAT inhibitor) family act to inhibit cytokine signal transduction through the JAK/STAT pathway [27]. SOCS3 also acted as a substrate recognition component of a SCF-like ECS (Elongin B/C-Cul2/Cul5-SOCS-box protein) E3 ubiquitin ligase complex to mediate ubiquitination and subsequent proteasomal degradation of cytokine receptors in inflammation [28].
Among the top hub genes of each group, the top 7 upregulated hub genes of the CD group namely, SOCS3, WSB1, ASB6, UBE2D2, UBE2D3, CDC27, UBE2B were also expressed as the upregulated hub genes of the IBD group, while the WSB1 gene from the CD group was concomitantly present in the downregulated hub genes of the IBD group (Suppliment Fig. 2S). The DEGs in both CD and IBD groups were significantly enriched (PPI enrichment 1.0E-16) in pathways such as class I MHC mediated antigen processing related to ubiquitination & proteasome degradation (Fig. 3). Analysis of biological processes showed that the DEGs in both CD and IBD groups were significantly enriched in protein ubiquitination (Fig. 3). Intracellular antigens and cytokines induce proteasome, immunoproteasome containing PA 28 and other accessory particles, for further degradation of the ubiquitinated substrates by an ATP-dependent mechanism (Fig. 4a). Peptides in association with HSP chaperones are translocated by TAP to the endoplasmic reticulum (ER) lumen, where they are placed on the MHC class I complex, made of heavy chain and β2m (Fig. 4a). MHC I fold and assemble in ER, assisted by several ER chaperones. MHC class I complexes present peptides on the cell surface, for recognition by CD8+ T cells and NK cells. The SOCS3 and SKP2 genes, in the present study were up-regulated during the course of CD and IBD respectively concomitant with upregulation of pro-inflammatory cytokine TLR6 in both CD and IBD; upregulation of pro-inflammatory cytokines such as IL6, CCL22, IL-1A in IBD only, through activation of TLR-and TNF-signalling pathways via activation of NFKB (nuclear factor kappa B) pathway; JAK signaling pathway and cytokine-cytokine receptor interaction all leading to ubiquitin mediated proteolysis, by K63-linked polyubiquitin chains (Fig. 4b). The ubiquitin proteasome pathway, is essential for protein degradation carried out by three classes of enzymes E1, E2, and E3; E1 and E2 prepare ubiquitin chains that are then linked to proteins by the E3 [30]. The SOCS3, SKP2, and ASB1 are components of the SOCSbox in ECS complex of E3 ubiquitin ligase containing CUL5, RNF7/RBX2, Elongin BC complex and SOCSbox [28].
CIRH1A, encode WD40-repeat-containing protein, involved in nucleolar processing of small subunit pre-18S rRNA. In the GO and pathway analysis, the top enriched pathways in UC included rRNA processing and modification in the nucleus and cytosol (Fig. 3). The CIRH1A gene, specific to the UC group, was down regulated concomitant with downregulation of U3 snoRNP complex, 90S preribosome components, pre-40S, pre-60S, and export factors components (Fig. 4c). Impairment of 13 ribosome biogenesis induce ER stress leading to proinflammatory cytokine production via NFKB activation with IKK (IKBA kinase ) activity maintained at basal level via IRE1 (inositol-requiring ER-tonucleus signal kinase 1), an ER stress sensor [31]. The significantly downregulated genes in UC, in the present study, included receptor families such as TLR6 via NFKB signaling pathway; downregulation of pro-inflammatory cytokines such as IL6, IL-1A, IFNG through activation of JAK signaling pathway and cytokine-cytokine receptor interaction (Fig. 4b). IL-6 enhanced upregulation of rRNA transcription stimulated the MDM2 (mouse double minute 2 homolog)-mediated p53 proteasomal digestion, by reducing the availability of ribosome proteins for MDM2 binding [32].

Conclusions
It was found that the SOCS3, CIRH1A, SKP2, and ASB1 genes might play predominant role in the pathogenesis of CD, UC, IBD during upregulation, and downregulation resepectively. The SOCS3 was upregulated by miR-650 in CD, induced by various cytokines, through the NFKB signalling pathway to mediate ubiquitination and subsequent proteasomal degradation of cytokine receptors in inflammation. CIRH1A gene downregulated by miR-16 was specific to the UC group, acted by impairing ribosome biogenesis to induce ER stress via NFKB signaling pathway, JAK signaling pathway and cytokine-cytokine receptor interaction. SKP2 was upregulated by miR-142 and ASB1 was downregulated by miR-665, to induce pro-inflammatory cytokines such as IL6, CCL22, IL-1A in IBD through activation of TLR-and TNF-signalling pathways via activation of NFKB pathway; JAK signaling pathway and cytokine-cytokine receptor interaction all leading to ubiquitin mediated proteolysis. The SOCS3, CIRH1A, SKP2 and ASB1 genes might serve as valuable biomarkers to differentiate CD, UC and upregulated IBD and downregulated IBD respectively.

Supplementary Files
This is a list of supplementary files associated with this preprint. Click to download.