Genome-wide transcriptional patterns of candidate genes linked to distinctive pathogenesis in Crohn’s disease

Background: Crohn’s disease (CD) is an inflammatory bowel disease (IBD) with variable and complex pathophysiology. The objective of the present study was to identify genome-wide gene expression profiles underlying the progression of CD. Methods: Surgical biopsies (n=48) were analyzed using the Illumina cDNA-mediated annealing, selection, extension, and ligation (DASL) microarray process, which is specifically designed for formalin-fixed, paraffin-embedded (FFPE) clinical samples. Paired tissue samples were collected from actively involved and uninvolved sites in the same CD patient. The CD-involved and CD-uninvolved samples were compared with non-IBD healthy controls. Results: Uninvolved sites in CD patients showed gene expression intermediate between CD-involved sites and healthy controls, suggesting that the CD-uninvolved site does not recover the full healthy control phenotype. In addition, peroxisome proliferator-activated receptor (PPAR) signaling-associated genes were altered in both CD-involved and CD-uninvolved sites, with a stepwise decrease in gene and protein expression of PPAR and downregulation of 3-hydroxy-3-methylglutaryl-CoA synthase 2 (HMGCS2), as confirmed by gene expression analysis and immunohistochemistry. Conclusions: The results of the present study provide evidence for clear differentiation of the two states, and in some cases repeated flares appeared chronically at the previously uninvolved locus.

treatment are important to reduce the risk of complications and the need for surgery [7-9].
Since early treatment relies on early diagnosis, discovering a CD-specific molecular signature involved in disease initiation and progression is key both to better understanding the pathogenesis of CD and to enabling early diagnosis.
Recent genome-wide microarray studies of gene expression in intestinal mucosal biopsies have shown that immune dysregulation, autophagy, and proinflammatory pathways play important roles in the pathogenesis of CD [10,11]. Interestingly, several studies have demonstrated that, at the molecular level, the CD-uninvolved site does not recover a full healthy control phenotype. Distinctive genes including interleukin 8 (IL-8), SSA1, regenerating family member (REGL), nucleotide-binding oligomerization domain-containing protein 2 (NOD2), and autophagy-related 16 like 1 (ATG16L1) have been shown to be differentially regulated in inflamed or uninflamed CD biopsies compared with normal controls [11]. Therefore, involved and uninvolved biopsies were collected from the same patients with CD to further study the early pathogenic process of CD.
The principal objective of the present study was to confirm that the CD-uninvolved phase does not have a gene expression profile consistent with that of healthy tissue. This study identifies a biologically relevant gene expression gradient extending from healthy controls through the CD-uninvolved and CD-involved phases, as well as important pathways differentiating CD-involved from CD-uninvolved states, thus providing a better understanding of the molecular processes leading to colon tissue dysplasia and colorectal cancer (CRC).

Expression profiling differentiated the CD-involved phase from the CD-uninvolved phase and controls
Surgical biopsies collected at the same time from actively involved and uninvolved sites in individual CD patients were analyzed by genome-wide gene expression profiling. These CD-involved and CD-uninvolved samples were compared with each other as well as with non-IBD healthy controls. Principal component analysis (PCA) was conducted using the entire gene expression data from all specimens to yield a three-dimensional plot (Fig. 1). Both CD-involved (red) and CD-uninvolved (blue) sites were clearly separated from healthy controls (green) and formed two distinct clusters. The results from the full dataset (covering a large proportion of genes) strongly suggested differentiation between the CD-involved and CD-uninvolved disease states, both of which were separate from healthy controls.
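The PCA step described above can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' pipeline: the function name `pca_scores`, the group labels, and the simulated expression values are all assumptions made for the example, and the real analysis is described in the Supplementary Methods.

```python
import numpy as np

def pca_scores(expr, n_components=3):
    """Project samples (rows) onto the top principal components.

    expr: array of shape (n_samples, n_genes), e.g. log-scale expression.
    Returns (scores, explained_variance_ratio).
    """
    X = np.asarray(expr, dtype=float)
    X = X - X.mean(axis=0)  # center each gene across samples
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    variance = S ** 2
    ratio = variance[:n_components] / variance.sum()
    return scores, ratio

# Synthetic stand-in for real data (hypothetical): 3 groups x 4 samples, 50 genes.
rng = np.random.default_rng(0)
shift = np.zeros(50)
shift[:10] = 1.0  # ten hypothetical disease-responsive genes
groups = ["control"] * 4 + ["uninvolved"] * 4 + ["involved"] * 4
level = {"control": 0.0, "uninvolved": 2.0, "involved": 4.0}
expr = np.array([rng.normal(0.0, 0.3, 50) + level[g] * shift for g in groups])

scores, ratio = pca_scores(expr)  # scores has one row per sample, 3 columns
```

Plotting the three columns of `scores`, colored by group, would give a three-dimensional view of the kind shown in Fig. 1. In this simulated case the uninvolved samples fall between the other two groups along the first component only because the data were constructed that way.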
Gene expression profiles of healthy controls and diseased tissues (Fig. 2A) were compared directly on a gene-by-gene basis. A total of 5,329 genes were significantly perturbed (3,191 upregulated; 2,138 downregulated) in CD-involved sites relative to controls (Fig. 2B), and 3,528 genes were significantly perturbed (1,976 upregulated; 1,552 downregulated) in CD-involved sites relative to CD-uninvolved sites (Fig. 2B).
Among these genes, a subset overlapped between the two comparisons: 1,339 genes were expressed at a higher level (Fig. 2C) and 894 genes at a lower level (Fig. 2D) both in the CD-involved group relative to controls and in the CD-involved group relative to the CD-uninvolved group. Comparing CD-uninvolved samples directly with controls identified 2,948 significantly perturbed genes (1,618 upregulated; 1,330 downregulated) (Fig. 2B).
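The overlap counts reported for Fig. 2C and 2D amount to set intersections between the per-comparison gene lists. A minimal sketch, assuming each comparison is available as a mapping from gene to log fold change (the gene names `GENE_A` to `GENE_F` and their values are hypothetical placeholders, not study data):

```python
def split_by_direction(log_fold_changes):
    """Split {gene: log fold change} into sets of up- and downregulated genes."""
    up = {g for g, lfc in log_fold_changes.items() if lfc > 0}
    down = {g for g, lfc in log_fold_changes.items() if lfc < 0}
    return up, down

def shared_genes(comparison_a, comparison_b):
    """Genes regulated in the same direction in both comparisons."""
    up_a, down_a = split_by_direction(comparison_a)
    up_b, down_b = split_by_direction(comparison_b)
    return up_a & up_b, down_a & down_b

# Hypothetical significant genes for two comparisons (illustrative only).
involved_vs_control = {"GENE_A": 3.1, "GENE_B": 2.4, "GENE_C": 1.9,
                       "GENE_D": -1.8, "GENE_E": -2.2}
involved_vs_uninvolved = {"GENE_B": 2.0, "GENE_A": 0.9,
                          "GENE_E": -1.1, "GENE_F": 1.5}

shared_up, shared_down = shared_genes(involved_vs_control, involved_vs_uninvolved)

# Consistency check on the counts reported in the text: (up, down, total).
reported = {
    "involved_vs_control": (3191, 2138, 5329),
    "involved_vs_uninvolved": (1976, 1552, 3528),
    "uninvolved_vs_control": (1618, 1330, 2948),
}
totals_consistent = all(up + down == total for up, down, total in reported.values())
```

The final check confirms only that the reported up- and downregulated counts sum to the stated totals for each comparison.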

Signature identification confirmed that the CD-uninvolved site did not recover a healthy phenotype
A group of statistically significantly regulated genes was selected for functional enrichment analysis (FEA) to study the biological functions distinguishing CD-involved sites from CD-uninvolved sites as well as from healthy controls (Fig. 2B). These selected genes were enriched in multiple biological pathways, including inflammatory and immune responses, cell adhesion, apoptosis, EMT, and PPAR signaling (Table 2). An increase in biological functions associated with inflammation and immune reactions was found in both CD-involved and CD-uninvolved sites, demonstrating strong inflammation in the remission stage as well as in active CD.
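Functional enrichment of a gene list in a pathway is commonly scored with a one-sided hypergeometric test. The sketch below illustrates that general idea only; it is an assumption that a test of this kind resembles the FEA used here, whose actual procedure is given in the Supplementary Methods. The function name and the example counts are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(n_background, n_pathway, n_selected, n_overlap):
    """P(X >= n_overlap) when n_selected genes are drawn without replacement
    from n_background genes, of which n_pathway belong to the pathway."""
    upper = min(n_pathway, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical counts: 10 background genes, 3 in the pathway,
# 3 selected genes, all 3 of which are pathway members.
p_small = hypergeom_enrichment_p(10, 3, 3, 3)  # equals 1/120
```

In practice the p-values from many pathways would also be corrected for multiple testing before pathways are reported as enriched.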
The expression of nine selected genes was analyzed using QRT-PCR in an independent cohort of patients (cohort 2, Table 1). The inflammation- and chemokine-related genes IL8, colony-stimulating factor 3 (CSF3), and prostaglandin D2 synthase (PTGDS) showed pronounced upregulation in both CD-involved and CD-uninvolved samples compared with controls. The cell adhesion genes matrix metallopeptidase 1 (MMP1), epidermal growth factor-like protein 6 (EGFL6), insulin-like growth factor 2 (IGF2), and leptin (LEP) exhibited significant upregulation only in the CD-involved samples, compared with CD-uninvolved samples and controls. Expression of the colorectal cancer biomarker gene chordin-like 2 (CHRDL2) was also confirmed by QRT-PCR in the independent cohort 2, showing noticeable upregulation in both CD-involved and CD-uninvolved samples. Conversely, WNK4, a gene related to cell junctions, was prominently downregulated in CD-involved and CD-uninvolved samples compared with controls (Fig. 5).

EMT marker genes distinguished CD-involved samples from CD-uninvolved samples and control tissues
The expression of genes with biological functions related to the epithelial-mesenchymal transition (EMT) pathway was found to be upregulated in CD-involved samples, but not in CD-uninvolved samples (Table 2). FEA and gene set enrichment analysis (GSEA) confirmed both up- and downregulation of EMT pathway genes in CD-involved samples relative to CD-uninvolved samples and controls, supported by a distinct enrichment pattern, as shown in Figure 3.
EMT is a process of cellular dedifferentiation in which epithelial cells undergo multiple biochemical changes to acquire a mesenchymal phenotype; it is involved in tissue repair and in pathological processes. However, EMT gene expression can also initiate metastasis in the progression of disseminated cancers [12]. The distinct difference in EMT gene expression between CD-involved and CD-uninvolved samples points to metastatic potential at active sites of ulceration in CD patients, which may increase the cancer risk in patients with active CD. The fundamental event in EMT is the loss of E-cadherin from epithelial cells [13]. In turn, vimentin can activate both the initiation of cell movement and the suppression of cytokeratin expression that characterize the final stages of the epithelial-to-mesenchymal transition [14].
In the present study, QRT-PCR in the independent cohort 2 confirmed that the expression of the E-cadherin and vimentin genes was significantly deregulated in CD-involved sites compared with CD-uninvolved sites and non-IBD controls (Fig. 3, p<0.05).

CD-involved and CD-uninvolved sites showed gradually decreasing expression of PPAR signaling genes
The expression of genes related to biological functions of the peroxisome proliferator-activated receptor (PPAR) signaling pathway was found to be downregulated in CD-involved and CD-uninvolved samples (Table 2). Figure  inflamed tissue, and compared with healthy control mice, indicated significantly lower expression in tumor tissues and adjacent inflamed tissues than in controls (Fig. 4E).

Discussion
In this study, the CD-uninvolved samples displayed a level of gene expression intermediate between controls and CD-involved samples. Gene and protein expression of the PPAR markers HMGCS2 and PPAR-γ was significantly reduced in both CD-involved and CD-uninvolved samples, as well as in tumor samples, whereas the EMT pathway was specifically upregulated in CD-involved samples.
These results provide new evidence for understanding the increased cancer risk in patients with severe and chronic CD [15,16]. Chronic inflammation in CD increases tissue dysplasia and metastatic potential in dysregulated colonic mucosal epithelial tissue.
Gene expression associated with the PPAR signaling pathway was downregulated in both CD-involved and CD-uninvolved samples. PPAR-γ is a nuclear receptor, originally discovered in adipose tissue, that together with its family members PPAR-α and PPAR-β/δ governs the expression of many regulatory genes in lipid metabolism and insulin sensitization. Previous studies using animal models of colitis and IBD patients have also identified a role for PPAR-γ in regulating inflammation and the immune response in the colon through epithelial cells [17-19]. Furthermore, recent studies have revealed that PPAR-γ expression is decreased in many types of tumors, including cancers of the breast, lung, pancreas, and colon. PPAR-γ is therefore now regarded as a growth-limiting and differentiation-promoting factor that suppresses tumor development [20,21]. Moreover, naturally occurring and synthetic PPAR-γ agonists promote growth inhibition and apoptosis, which is consistent with our findings of a significant decrease in gene and protein expression of the PPAR markers HMGCS2 and PPAR-γ in CD-involved and CD-uninvolved samples as well as in tumor samples.
The presence of a persistent inflammatory state in CD-uninvolved samples is confirmed in the present study as well as in a previous study [11], in which the expression of inflammation-related genes including IL8, CSF3, and PTGDS did not return to normal levels but remained elevated in CD-uninvolved samples. An intermediate inflammatory state in CD-uninvolved samples was also identified by other researchers [10]. Furthermore, beyond inflammatory pathways, a set of important genes dysregulated in CD-involved samples belonged to multiple pathways that were similarly perturbed in both CD-involved and CD-uninvolved samples, including pathways responsible for cellular proliferation, angiogenesis, and cell junctions, as well as cancer-related pathways.
A set of EMT-mediated genes was dysregulated only in active CD, as supported by our finding that genes in the EMT pathway showed significant upregulation and downregulation in CD-involved samples relative to CD-uninvolved samples and controls. EMT is a form of epithelial plasticity in which cells change morphology from an epithelioid to a mesenchymal-like phenotype, such as that of myofibroblasts [22]. Epithelial cells express E-cadherin, while mesenchymal cells display N-cadherin, fibronectin, and vimentin [19,23]. The selective loss of E-cadherin and increase in vimentin that characterize EMT were validated in CD with QRT-PCR (Fig. 3) in the present study, indicating that the EMT process occurs in CD-involved samples and distinguishes them from both CD-uninvolved samples and controls. Since EMT plays an important role in the pathogenesis and invasion of colon cancer [24,25], the CD-involved site, which exhibits an active EMT process, may carry an increased risk of colon cancer.

Conclusions
The present study provides rigorously defined whole-genome expression profiles from the same patients in different phases of CD, in comparison with controls, and confirms that particular changes in gene expression distinguish healthy controls, CD-uninvolved samples, and CD-involved samples. The gene expression of multiple pathways is persistently dysregulated in CD patients, allowing CD samples to be readily distinguished from healthy samples.

Patients and biopsy samples
All protocols were approved by the Nanfang Hospital Medical Ethics Committee, and all participating subjects provided written informed consent. All methods were performed in accordance with relevant guidelines and regulations. Experienced pathologists and physicians followed WHO diagnostic criteria as well as the Crohn's disease activity index (CDAI) in diagnosing CD and measuring disease activity [26,27]. CD-involved and CD-uninvolved samples were collected from inflamed and uninflamed areas, respectively, in the same CD patients at the same time. Healthy control specimens were collected from the normal colon tissue of healthy individuals.

RNA extraction
Three 5-μm sections per FFPE block were used for RNA extraction and purification with an AllPrep DNA/RNA FFPE Kit (Qiagen, Gaithersburg, MD, USA).

DASL cDNA microarray gene expression profiling and data statistical analysis
Whole-genome DASL (Illumina, USA) was conducted as previously described by Reddy et al. (2016) [28], and data were generated on Illumina array platforms using Genome Studio software. Raw and normalized data are available from the Gene Expression Omnibus (GEO) database (GSE95095). Detailed data analysis information is described in the Supplementary Methods.

Quantitative reverse transcription PCR (QRT-PCR)
cDNA was synthesized from total RNA using a cDNA Archive Kit (Applied Biosystems, Foster City, CA) and used for QRT-PCR. Probes and primers were designed to yield short amplicons (average size of 95 nucleotides) suited to the partially degraded RNA obtained from FFPE specimens, and were synthesized by Integrated DNA Technologies, Inc. (San Diego, CA, USA). QRT-PCR was conducted in triplicate with the SYBR Green system on an HT7300 RT-PCR instrument (Applied Biosystems). Relative levels of gene expression were analyzed using the 2^-ΔΔCt method as described in a previous study [25]. The ΔCt value of each sample was calculated using GAPDH as the endogenous control gene.
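As a worked illustration of the 2^-ΔΔCt calculation, the sketch below averages triplicate Ct values, normalizes the target gene to GAPDH, and compares a sample with a control. The function names and all Ct values are made up for the example and are not measurements from this study.

```python
def mean(values):
    """Arithmetic mean of replicate Ct values."""
    return sum(values) / len(values)

def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^-ΔΔCt method.

    Each argument is a list of replicate Ct values; the reference gene
    (GAPDH in the text) serves as the endogenous control.
    """
    delta_ct_sample = mean(ct_target_sample) - mean(ct_ref_sample)
    delta_ct_control = mean(ct_target_control) - mean(ct_ref_control)
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)

# Hypothetical triplicate Ct values (illustrative only).
fold_change = relative_expression(
    ct_target_sample=[24.0, 24.2, 23.8],   # target gene, diseased sample
    ct_ref_sample=[18.0, 18.1, 17.9],      # GAPDH, diseased sample
    ct_target_control=[26.0, 26.1, 25.9],  # target gene, control
    ct_ref_control=[18.0, 18.0, 18.0],     # GAPDH, control
)
# Here ΔCt(sample) is about 6, ΔCt(control) is about 8, and ΔΔCt is about -2,
# so the target is roughly 4-fold upregulated relative to the control.
```

A fold change above 1 indicates upregulation relative to the control and a value below 1 indicates downregulation. The method assumes roughly equal amplification efficiency for the target and reference genes.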