Bioinformatics Analysis of Differentially Expressed Genes Involved in Irritable Bowel Syndrome With Diarrhea

Purpose: Irritable bowel syndrome with diarrhea (IBS-D) is a common functional gastrointestinal disorder worldwide. However, the molecular mechanisms of IBS-D are still not well understood. This study was designed to identify key biomarkers and immune infiltration in the rectal mucosa of IBS-D patients by bioinformatics analysis. Methods: The gene expression profiles of GSE36701 were downloaded from the GEO database. The differentially expressed genes (DEGs) were identified, and functional enrichment and pathway analyses were performed. Using STRING and Cytoscape, protein-protein interaction (PPI) networks were constructed and core genes were identified. Subsequently, 22 immune cell types in IBS-D tissues were explored with Cell type Identification by Estimating Relative Subsets of RNA Transcripts (CIBERSORT). Finally, the co-expression network of DEGs was estimated by weighted gene co-expression network analysis (WGCNA) to identify IBS-D-related modules and hub genes. Results: A total of 224 up-regulated and 171 down-regulated genes were identified in IBS-D patients. Our analysis indicated that several DEGs might play crucial roles in IBS-D, such as CDC20, UBE2C, AURKA, CDC26, CKS1B and PSMB3. We also found that immune infiltrating cells such as T cells CD4 memory resting and M2 macrophages are crucial in IBS-D progression. Finally, a total of 9 co-expression gene modules were calculated and the black module was found to have the highest correlation. 15 hub genes were identified both in DEGs and the black module. Conclusions: This study identified molecular mechanisms and a series of candidate genes as well as significant pathways from the bioinformatics network, which may provide a diagnostic method and therapeutic targets for IBS-D.


Introduction
Irritable bowel syndrome (IBS) is a common functional bowel disorder characterized by abdominal discomfort and changes in bowel habits [1]. IBS is one of the most common gastrointestinal diseases; it reduces quality of life and is a frequent reason for patients to seek medical advice. Compared with the general population, the physical functioning, sense of well-being and social functioning of IBS patients are significantly limited, accompanied by obvious discomfort, pain and fatigue. It imposes an enormous economic burden and has a considerably high incidence worldwide [2][3][4]. The ultimate cause of IBS is not yet completely understood. It seems to be multifactorial, and many pathogenic factors can play a significant role [1]. IBS can be classified into three predominant subtypes: IBS with constipation, IBS with diarrhea (IBS-D) and mixed IBS. It was reported that IBS-D accounts for 31%-48% of IBS cases, a higher prevalence than the other two subtypes [5].
Studies have shown that the richness of the intestinal microbiota in the luminal niche is reduced in IBS-D patients [6,7]. Altered gastrointestinal motility, visceral hypersensitivity, post-infectious reactivity, brain-gut interactions, alteration in fecal microflora, bacterial overgrowth, food sensitivity, carbohydrate malabsorption and intestinal inflammation have all been implicated in the pathogenesis of IBS [8]. In addition, IBS-D may be associated with increased platelet-depleted plasma 5-HT concentrations [9,10]. However, the mechanisms of gene and protein expression in IBS-D are still unclear.
Microarray analyses have been increasingly used to explore the pathogenic processes of diseases and to identify disease-associated genes and pathways, making them an important technology for the prevention, diagnosis and treatment of diseases [11]. Therefore, the IBS-D-related gene expression data downloaded from the GEO database provide genetic support for future research on and recognition of IBS-D.

Material And Methods
Microarray data screening
Gene expression profile dataset GSE36701 [12] was downloaded from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/). The data were produced using the GPL570 [HG-U133 Plus 2] Affymetrix Human Genome U133 Plus 2.0 Array. The GSE36701 dataset contained data from 93 volunteers, including 40 healthy controls (HC) and 53 IBS-D patients.

Data processing and DEG screening
The downloaded platform and series matrix files were preprocessed using the Affymetrix package [13] in R version 4.0.3 (cran.at.r-project.org). The limma package in Bioconductor [14] (http://www.bioconductor.org/) was used for differential gene expression analysis [15]. The P-value of each gene symbol was adjusted in R using the limma package, and the results were saved as a TXT file. Only genes with adjusted P < 0.05 and |log FC (fold change)| > 1 were considered statistically significant DEGs.
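The screening rule (adjusted P < 0.05 and |log FC| > 1) can be illustrated with a minimal Python sketch. This is not the authors' R/limma code: the Benjamini-Hochberg adjustment is an assumption (the text does not name the adjustment method), and the function names `bh_adjust` and `select_degs` and the toy records are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(records, p_cutoff=0.05, lfc_cutoff=1.0):
    """Split (gene, logFC, raw P) records into up- and down-regulated DEGs."""
    adjusted = bh_adjust([p for _, _, p in records])
    up, down = [], []
    for (gene, log_fc, _), adj_p in zip(records, adjusted):
        if adj_p < p_cutoff and abs(log_fc) > lfc_cutoff:
            (up if log_fc > 0 else down).append(gene)
    return up, down


# Hypothetical toy input, not values from GSE36701.
toy = [("G1", 1.5, 0.001), ("G2", -1.2, 0.01), ("G3", 0.4, 0.03), ("G4", -2.0, 0.5)]
up_genes, down_genes = select_degs(toy)
print(up_genes, down_genes)
```

In this toy input, G3 passes the P-value cutoff but not the fold-change cutoff, and G4 passes the fold-change cutoff but not the P-value cutoff, so neither is reported as a DEG.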

GO and KEGG pathway analysis of DEGs
Gene Ontology (GO) [16] (www.geneontology.org) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) [17] (www.genome.ad.jp/kegg) are essential tools for systematically extracting pathway information from molecular interaction networks and for characterizing the function of the differentially expressed genes (DEGs). R was used to perform GO and KEGG pathway analysis to determine the biological significance of the DEGs. An adjusted P < 0.05 was considered statistically significant.
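The idea behind GO and KEGG enrichment can be sketched with a hypergeometric over-representation test, which is the test many enrichment tools apply. Whether the R workflow used here applied exactly this test is an assumption, and the function name `enrichment_pvalue` and the numbers are hypothetical. The raw P-values from such a test would then be adjusted for multiple testing before applying the 0.05 cutoff.

```python
from math import comb


def enrichment_pvalue(n_background, n_term, n_degs, n_overlap):
    """P(X >= n_overlap) for X ~ Hypergeometric(n_background, n_term, n_degs).

    n_background: genes in the background set
    n_term:       background genes annotated to the GO term or pathway
    n_degs:       DEGs tested
    n_overlap:    DEGs annotated to the term
    """
    total = comb(n_background, n_degs)
    upper = min(n_term, n_degs)
    tail = sum(
        comb(n_term, i) * comb(n_background - n_term, n_degs - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical toy numbers: 10 background genes, 3 in the term,
# 4 DEGs, 2 of which fall in the term.
p_toy = enrichment_pvalue(10, 3, 4, 2)
print(round(p_toy, 4))
```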
PPI network and functional modules screening
PPI networks were built from known and predicted protein interactions using the Search Tool for the Retrieval of Interacting Genes (STRING) database [18] (http://string-db.org/), retaining interactions with a combined score > 0.4. The PPI networks were then visualized using Cytoscape software [19] (http://www.cytoscape.org), and the cytoHubba plugin in Cytoscape was used to screen the hub genes. Genes were ranked by Maximal Clique Centrality (MCC), and the top 10 genes were marked in color in the figures.
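The score filter and MCC ranking can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the cytoHubba code: MCC for a node is taken as the sum of (|C| - 1)! over the maximal cliques C that contain it, scores are on a 0 to 1 scale, and the edge list and function names are hypothetical.

```python
from math import factorial


def build_graph(scored_edges, min_score=0.4):
    """Keep edges whose combined score exceeds min_score; return adjacency sets."""
    adj = {}
    for a, b, score in scored_edges:
        if score > min_score:
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    found = []

    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            found.append(clique)
            return
        for node in list(candidates):
            expand(clique | {node}, candidates & adj[node], excluded & adj[node])
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand(set(), set(adj), set())
    return found


def mcc_scores(adj):
    """MCC(v): sum of (|C| - 1)! over maximal cliques C containing v."""
    if not adj:
        return {}
    scores = {node: 0 for node in adj}
    for clique in maximal_cliques(adj):
        weight = factorial(len(clique) - 1)
        for node in clique:
            scores[node] += weight
    return scores


# Hypothetical toy interactions (protein A, protein B, combined score).
toy_edges = [("A", "B", 0.9), ("A", "C", 0.8), ("B", "C", 0.7),
             ("C", "D", 0.5), ("D", "E", 0.3)]
scores = mcc_scores(build_graph(toy_edges))
ranked = sorted(scores, key=lambda n: (-scores[n], n))
print(ranked, scores)
```

The D-E edge is dropped by the score filter, so E never enters the network, and C ranks first because it belongs to both the A-B-C triangle and the C-D edge.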

Results
The microarray data information and identification of DEGs in IBS-D
The IBS-D expression microarray dataset GSE36701 was downloaded from GEO and normalized with the limma package; the results are shown in Figures 1a and 1b. A total of 395 DEGs were obtained using the limma package (adjusted-P<0.05, |log FC| > 1), including 224 up-regulated genes and 171 down-regulated genes. The top 50 up-regulated and down-regulated genes are shown in Table 1. The volcano plot of all DEGs between the HC and IBS-D samples in GSE36701 is shown in Figure 2. The cluster heatmap of the top 100 DEGs is shown in Figure 3. The top 10 up-regulated and down-regulated DEGs identified between HC and IBS-D tissue data are shown in Table 2.

Gene Ontology (GO) enrichment
To identify the biological functions of the DEGs, R software was used to perform GO term enrichment on the microarray data, separately for up-regulated and down-regulated genes, with an adjusted P-value < 0.05. In general, GO enrichment includes three functional groups: biological process (BP), molecular function (MF) and cellular component (CC). The enriched biological processes are shown in Figures 4a and 4b. The significant enrichment results for the DEGs are listed in Table 3. In the biological process group, the up-regulated genes were mainly involved in RNA and mRNA splicing, rRNA and ncRNA processing, protein-containing complex localization, regulation of cellular amide metabolic process and RNA localization. The down-regulated genes mainly played roles in the protein catabolic process, regulation of chromatid segregation, protein-DNA complex assembly, regulation of mitotic nuclear division and positive regulation of ubiquitin protein ligase activity. In the cellular component group, the up-regulated genes were mainly associated with the nuclear speck, Cajal body and spliceosomal complex. The down-regulated genes were mainly enriched in the ubiquitin ligase complex, transfer complex, protein-DNA complex and RNA polymerase complex. In the molecular function group, the up-regulated genes were mainly associated with threonine-type activity, general transcription initiation factor activity and TBP-class protein binding. The down-regulated genes mainly played roles in small nucleolar RNA (snoRNA) binding, transferase activity, protein serine/threonine kinase activity and glycosyltransferase activity. These results indicated that the DEGs were significantly enriched in cell division, gene expression, transcription and translation activity as well as protein catabolism.

KEGG pathway analysis
The pathways of these DEGs were identified by KEGG pathway analysis. The significantly enriched pathways associated with IBS-D were 'Spliceosome' (22%), 'Human T-cell leukemia virus 1 infection' (26%), 'Progesterone-mediated oocyte maturation' (16%), 'Oocyte meiosis' (18%) and 'Ubiquitin mediated proteolysis' (18%); the results are shown in Figures 5a and 5b and Table 4.
The analysis of immune infiltrating cells (IICs) found that M0 macrophages were not expressed in the DEGs. Therefore, only the remaining 21 types of immune infiltrating cells were analyzed for their correlation. The quantified contrast of the distribution of IIC subsets between HC and IBS-D tissues is shown in Figure 8a. The results indicate that the proportions of T cells CD8 (P=0.03) and Mast cells activated (P=0.04) are relatively high in HC tissues compared to IBS-D tissues. The results are shown in Figure 8b. Based on the above results, the anomalous immune infiltration in IBS-D tissues, together with its nonuniformity and heterogeneity, indicates that it is a tightly regulated process that may have vital significance for guiding clinical practice.

Construction and analysis of WGCNA associated with IBS-D and HC
To find the hub genes, WGCNA was used to identify co-expressed sets of genes and modules. WGCNA analyzed the 395 DEGs to explore the co-expression network. Pearson's correlation coefficient was used to perform the cluster analysis and draw the clustering tree shown in Figure 9a. A power of 3, at which the scale-free R² reached 0.9, was chosen as the soft threshold for further analysis. Then, cluster dendrograms of HC and IBS-D tissues were constructed to detect gene modules. The results are shown in Figures 9b and 9c. Nine distinct co-expression modules were detected: brown, blue, red, black, pink, green, magenta, yellow and turquoise. After correlation analysis, the black module (cor = 0.35, p = 7 × 10⁻⁴) was found to be highly related to the pathological process of IBS-D; the module-trait relationships are shown in Figure 10a. The correlation between each gene in the black module was identified by the scatter plot; the results are shown in Figure 10b. Then, the 395 DEGs and the 172 genes in the black module were superimposed to identify the hub genes; the results are shown in Figure 10c. Finally, 15 hub genes were identified both in DEGs and the black module, including DFNB59, FLJ45513, GOLGA8A, HIST1H2AE, HIST1H3C, LINC00893, LOC100506114, LOC101927391, LOC101928068, LOC101929988, LOC286367, PFDN2, RP11-395I6.3, RP11-676J12.4 and SEC31B.

Discussion
IBS is a chronic condition and the largest gastrointestinal clinical subgroup, affecting between 9% and 23% of the general population. It has a significant impact on patients' work, life and health care, and on society. An estimated 15%-43% of patients pay treatment expenses in pursuit of an immediate cure [8, 24, 25]. So far, research on IBS-D has focused on the multiple types of immune cells that infiltrate the intestinal mucosa and release inflammatory mediators, disrupting the intestinal epithelial barrier and nervous system signaling [26]. Several studies showed that low-grade inflammation was highly associated with the pathophysiology of IBS-D [27, 28].
However, the pathogenic mechanism of IBS-D at the micro level remains uncertain because IBS-D has no structural or metabolic abnormalities that account for its symptoms [29,30]. Therefore, it is important to explore the genetic and molecular mechanisms and the development of IBS-D to identify the underlying cause of the disease. The differentially expressed genes (DEGs) of IBS-D have become a research hot spot since the establishment of databases such as TCGA, SEER and GEO. GEO is a comprehensive gene database covering both tumor and non-tumor diseases, whereas TCGA and SEER only contain data on tumor diseases. Therefore, this study aimed to explore the DEGs, the protein-protein interaction (PPI) network, immune infiltrating cells and the gene co-expression network to support further statistical analysis of IBS-D using the GEO database. GO and KEGG effectively cluster functional genes into different biological processes to systematically analyze gene function in biological pathways [31,32]. The PPI network plays a crucial part in predicting the function of interacting genes or proteins as well as providing evidence for the evolutionary conservation of gene interactions [33].
A total of 395 DEGs were obtained using the robust multiarray averaging algorithm, including 224 up-regulated genes and 171 down-regulated genes. We constructed a DEGs-encoded protein network and explored 10 up-regulated hub genes: SRSF1 (serine and arginine rich splicing factor 1), SNRNP70 (small nuclear ribonucleoprotein U1 subunit 70), SRSF6 (serine and arginine rich splicing factor 6), HNRNPA2B1 (heterogeneous nuclear ribonucleoprotein A2/B1), HNRNPR (heterogeneous nuclear ribonucleoprotein R), SRRM2 (serine/arginine repetitive matrix 2), CCAR1 (cell division cycle and apoptosis regulator 1), FUS (FUS RNA binding protein), HNRNPU (heterogeneous nuclear ribonucleoprotein U) and PRPF3 (pre-mRNA processing factor 3). We also explored 10 down-regulated hub genes: CDC20 (cell division cycle 20), UBE2C (ubiquitin conjugating enzyme E2 C), AURKA (aurora kinase A), CDC26 (cell division cycle 26), CKS1B (CDC28 protein kinase regulatory subunit 1B), PSMB3 (proteasome 20S subunit beta 3), PTTG1 (regulator of sister chromatid separation, securin), CCNB (cyclin B), PSMA2 (proteasome 20S subunit alpha 2) and PSMB9 (proteasome 20S subunit beta 9). Research has shown that HNRNPs are a specific set of autoantigens that stimulate T cells by activating antigen-presenting cells through Toll-like receptors to initiate inflammation [34,35]. Kalva S et al. [36] showed that, among the thousands of DEGs from the PPI network, FUS had an impact on the occurrence of IBS. Our research indicates that the biological process of RNA splicing, the spliceosomal complex and the spliceosome pathway play major roles in the pathogenic mechanism of IBS-D. Wohlfarth C et al. [37] reported that the reduction of miR-16 and miR-10 weakens the function of 5-HT4. Research has shown that IBS-D patients have increased faecal serine protease activity, which plays a significant part in visceral hypersensitivity and gives rise to increased colonic paracellular permeability, causing allodynia and diarrhea [39,40].
Thus, the down-regulated hub genes PSMB3, PSMA2 and PSMB9 and the gene ontology enrichment of the proteasomal protein catabolic process may inhibit the degradation of serine protease and may be promising candidates in IBS-D pathophysiology. In addition, the down-regulated genes CDC20, CDC26, CKS1B and PTTG1, their biological processes such as regulation of sister chromatid segregation and regulation of mitotic sister chromatid segregation, and the pathways of progesterone-mediated oocyte maturation and oocyte meiosis indicate that a reduction in cell mitosis and meiosis may contribute to IBS-D. El-Salhy M et al. [41] showed that the numbers of cells expressing Musashi 1 (Msi-1) and neurogenin 3 (NEUROG3) were decreased in patients with IBS. Human T-cell leukemia virus type 1 (HTLV-1) is the first demonstrable retrovirus found to cause T-cell leukemia/lymphoma or other lymphocyte-mediated disorders [42,43]. In addition, CD4 T cells are susceptible to HTLV-1 infection, which deregulates their differentiation, function and homeostasis. This points to the pathogenic mechanisms of HTLV-1, including the induction of CD4 T cell transformation and chronic inflammatory disease [44]. Kirsch R et al. [45] showed that chronic, low-level, subclinical inflammation was involved in the progression of IBS-D and was the cause of the persistence of IBS-D symptoms.
CIBERSORT provides a new method to explore immune biomarkers for diagnosis and prognosis, which can accurately determine the diversity and proportion of immune infiltrating cells in disease [46]. Our research showed that T cells CD4 memory resting, M2 macrophages, B cells, plasma cells and mast cells resting played an important role in IBS-D disease progression. Many studies have shown that immune infiltrating cells such as T cells and mast cells are present in the intestinal mucosa of patients with IBS-D [47]. Cremon C et al. [48] showed that the proportion of CD4 T cells and mast cells in IBS-D patients is relatively high compared with healthy controls. The elevated T cell activation corresponds with the low-level immune activation hypothesis of IBS-D and may be associated with the development of symptoms of IBS-D [49]. Visceral hypersensitivity refers to abnormal pain in the gut caused by stimuli below the pain threshold and an increased response to painful stimuli [50]. Most IBS-D patients showed increased visceral sensitivity, but the biological process associated with visceral hypersensitivity was unclear [51]. Mujagic Z et al. [52] showed that inflammatory factors activate COX-2 to prompt abnormal synthesis of PGE2 by mast cells and cause visceral hypersensitivity in patients with IBS-D. Boyer J et al. [53] reported that the number of macrophages was higher in IBS-D than in HC. Vicario M et al. [54] indicated the unique feature of WGCNA: it clusters genes based on expression patterns, giving insight into the relationship between gene modules and traits, and is a method for identifying candidate biomarkers or therapeutic genes [55,56]. A total of 9 co-expression gene modules were calculated by WGCNA and the black module was found to have the highest correlation.
The 15 hub genes found both in the DEGs and the black module were screened, including DFNB59 (deafness, autosomal recessive 59), FLJ45513, GOLGA8A (golgin A8 family member A), HIST1H2AE (histone cluster 1, H2ae), HIST1H3C (H3 clustered histone 3), LINC00893 (long intergenic non-protein coding RNA 893), LOC100506114, LOC101927391, LOC101928068, LOC101929988, LOC286367, PFDN2 (prefoldin subunit 2), RP11-395I6.3, RP11-676J12.4 and SEC31B (SEC31 homolog B, COP coat complex component). DFNB59 is a member of the gasdermin family, which has pore-forming activity and mediates homeostasis as well as inflammation in the gastrointestinal tract and various immune cells, leading to inflammatory cell death associated with damaged cell membrane integrity and increased inflammatory mediators [57,58]. Heyd F et al. [59] showed that the reduced expression of HIST1H2AE resulted in increased T cell apoptosis and decreased cell number. Li S et al. [60] showed that the over-expression of LINC00893 increased the expression of PTEN. At the same time, our research showed that PTEN was a crucial gene of the Human T-cell leukemia virus 1 infection pathway associated with IBS-D. PFDN2 was found to regulate cytoskeleton organization [61, 46]. Kuznetsova IM [62] hypothesized that proteins in their natural states are characterized by instability, but that the kinetically stable state of proteins is achieved through the intracellular folding mechanism of PFDN2. Meanwhile, protein instability was associated with colonic transit in IBS-D [63]. The relationship between IBS-D and the other hub genes has not yet been reported and needs to be explored.
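The soft-thresholding step behind the WGCNA modules discussed above (power 3) can be illustrated with a minimal Python sketch. It assumes an unsigned network, in which the adjacency between two genes is the absolute Pearson correlation raised to the soft-threshold power; the network type actually used is not stated in the text, and the gene names, expression values and function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def soft_threshold_adjacency(expr, power=3):
    """Unsigned adjacency: a_ij = |cor(gene_i, gene_j)| ** power."""
    genes = list(expr)
    adjacency = {gene: {} for gene in genes}
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            weight = abs(pearson(expr[g1], expr[g2])) ** power
            adjacency[g1][g2] = weight
            adjacency[g2][g1] = weight
    return adjacency


def connectivity(adjacency):
    """Sum of adjacencies to all other genes, per gene."""
    return {gene: sum(nbrs.values()) for gene, nbrs in adjacency.items()}


# Hypothetical expression values for three genes across four samples.
toy_expr = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [1.0, 3.0, 2.0, 4.0],
}
toy_adj = soft_threshold_adjacency(toy_expr, power=3)
toy_k = connectivity(toy_adj)
print(toy_adj["g1"], toy_k)
```

Raising the correlation to a power shrinks weak correlations far more than strong ones: here a correlation of 0.8 becomes an adjacency of about 0.51, while a correlation of 1 stays at 1.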

Conclusion
In this study, we identified the differentially expressed genes of IBS-D using the GEO database. We then analyzed these DEGs and identified the biological functions, pathways, IICs and WGCNA hub genes to indicate the function of interacting genes or proteins, which might be potential targets and biomarkers for IBS-D treatment.

Declarations
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Code available
Not applicable.

Authors' contributions
Yuan-Mei Lou and Yan-Zhi Ge designed and supervised the study; Yan-Zhi Ge and Wen Chen performed the data processing; Lin Su, Jia-Qi Zhang, Gui-Yue Wang and Ya-Qin Qi contributed to the data analysis; Jin-Ying Yang and Zu-Xiang Chen performed data mining and downloading; Hong Song organized and revised the paper. All authors reviewed the final manuscript.

Ethics approval and consent to participate
The data of this research was downloaded from the GEO database, a public website. All institutional and national guidelines for the care and use of participants were followed.
Figure 1 Normalization of IBS-D expression microarray datasets GSE36701. a GSE36701 datasets before normalization; b GSE36701 datasets after normalization. Abbreviation: IBS-D, irritable bowel syndrome with diarrhea.
Figure 2 Volcano plot of DEGs. Notes: Red spots represent up-regulated genes, green spots represent down-regulated genes and black spots represent genes with no significant difference. All data are based on adjusted-P<0.05 and |LogFC| > 1. Abbreviations: DEGs, differentially expressed genes; LogFC, log fold change.