GBP4 is an Accurate Diagnostic Biomarker and a Potential Treatment Target for Crohn’s Disease

Background: Extensive evidence has shown that immune cell inltration is associated with the pathogenesis of Crohn’s disease (CD). In the present study, we explored the potential mechanism underlying the pathogenesis biomarkers for CD. Methods: The GSE179285 dataset containing sequence data for intestinal mucosal was downloaded from the Gene Expression Omnibus (GEO) database. Differentially expressed genes (DEGs) in the intestinal mucosa of CD patients and healthy individuals were then identied. The inltration pattern of 22 immune cell types was assessed using the CIBERSORT algorithm. The DEGs and 22 immune cell types were combined to nd the key gene network using weighted gene co-expression network analysis (WGCNA), and pathway enrichment analyzes were performed on the hub module in the WGCNA. A linear regression model for the relationship between the expression of the hub genes in CD patients and inltration of immune cells were also developed. The utility and accuracy of the hub genes for CD diagnosis were assessed using receiver operating characteristic (ROC) analysis. The accuracy of the model was validated using GSE20881 dataset. Results: There were 1135 DEGs between the intestinal mucosal tissue of CD patients and healthy individuals. Of these DEGs, 711 genes were upregulated, whereas 424 of them were downregulated. There was also a signicant difference in the inltration of immune cells to the intestinal mucosal between the CD patients and healthy individuals. WGCNA revealed that the turquoise module genes were strongly correlated with the inltration of M1 macrophages (cor=0.68, p=10 -16 ). Pathway enrichment analysis further showed the genes in the turquoise module mainly regulated the secretion of interferon-gamma and other immune effector molecules. Finally, the expression of GBP4, the identied hub gene, strongly correlated with the inltration of M1 macrophages (adjusted r-squared=0.661, p<2x10 -16 ), and is a relatively good marker for CD diagnostic prediction (AUC=0.736). The relationship between GBP4 expression and inltration of M1 macrophages (adjusted r-squared=0.435, p<2x10 -16 ) and prognostic value of the gene (AUC=0.702) were veried using the GSE20881 validation dataset. Conclusion: GBP4 is a potential biomarker for accurate CD diagnosis. The expression of GBP4 promotes the inltration of M1 macrophages to the intestinal mucosa of CD patients.


Introduction
Crohn's disease (CD) is a chronic in ammatory bowel disease caused by genetic and environmental factors, and alteration in the composition and abundance of gut microbiota. Research also shows the disease can lead to severely debilitating and dysregulated immune response [1,2]. The incidence of CD is increasing worldwide, but it is highest in North America and Northern Europe [3][4][5]. In China, the economic growth in the country has paralleled an increase in the incidence of CD [6,7]. Although the precise etiology of CD remains unclear, dysregulated and excessive immune responses against pathogenic gut microbiota have been implicated in the development of CD [8]. Obviously, immune responses, especially the immune cells, play an important role in CD. Traditional techniques such as immunohistochemistry and ow cytometry, do not explicitly reveal the immune landscape in the intestinal mucosa of CD patients. Among the more well-studied genes, such as NOD2 [9,10], CARD15 [11,12] and PRKCQ [13], have been implicated in CD occurrence and development. However, these genes are not entirely related to immune response and therefore are not ideal targets for immunotherapy. As the immunotherapy has been recommended by clinical guideline of CD treatment [14,15], it is imperative to identify reliable targets in CD patients for immunotherapy. CIBERSORT is a gene expression-based algorithm that accurately reveals the in ltration pattern of immune cells based on gene expression pro les [16]. We investigated in ltration of 22 immune cell types to the intestinal mucosa of CD patients and healthy individuals. Weighted gene co-expression network analysis (WGCNA) is a bioinformatics analytical method for accurate exploration of the relationships between genes and phenotypes [17]. The distinct advantage of WGCNA is that genes can be clustered into co-expression modules, which connect the phenotypic characteristics and the changes in gene expression. The diagnostic value of hub genes can be assessed using receiver operating characteristic (ROC) curve analysis [18]. In the present study, the gene-sequence data for the in ltration of immune cells to the intestinal mucosa of CD patients and healthy individuals were downloaded from the Gene Expression Omnibus (GEO) database. The nding of this study will unpack the complex activities in the immune microenvironment of intestinal mucosa of CD patients, which may reveal new therapeutic targets for the treatment of the disease.

CD microarray datasets
The diagrammatic ow of this study was shown in Figure 1. GSE179285 [22] and GSE20881 [23] datasets were used in this study. GSE179285 was the training set, whereas GSE20881 was the validation set. Data on GSE number, numbers of samples, gender, sites of mucosal collection, platform, and in ammation are shown in Table 1. There was no statistically signi cant difference (p > 0.05) between the training dataset and the validation dataset.

DEGs between CD patients and healthy individuals
Based on the GSE179285, there were 1135 DEGs between CD patients and healthy individuals, in which 711 genes were upregulated whereas 424 genes were downregulated (Figure 2A). The expression pro le of the top 50 most upregulated genes and the top 50 most downregulated genes (Additional le 1) were displayed using a heatmap ( Figure 2B). The upregulated genes occurred in the ileum, whereas the downregulated genes occurred in colon.

Immune cell in ltration
The proportion of immune cells varied between the intestinal mucosa tissues of CD patients and normal individuals ( Figure 3A-3B, Table 2). Compared with normal tissue, the proportion of CD8 T cells, activated CD4 T cells memory, M1 Macrophages, and neutrophils were signi cantly higher in the intestinal mucosa of CD patients. Contrarily, a reverse trend was observed for T regulatory cells (Tregs), gamma delta T cells, activated NK cells, M2 Macrophages, and resting Mast cells ( Figure 4A). The proportions of plasma cells, CD4 naïve T cells, activated dendritic cells were almost insigni cant. There was a strong positive correlation between in ltration of M1 Macrophages and neutrophils (Pearson correlation = 0.519, p < 0.0001), but a strong negative correlation between in ltration of resting Mast cells and activated Mast cells (Pearson correlation = -0.523, p < 0.001) ( Figure 4B) Overall, these ndings demonstrated the complex, intricate network of immune response in the intestinal mucosa of CD patients.

WGCNA and identi cation of hub genes
The soft thresholding power β was set at 18 in the subsequent analysis, because the scale independence reached 0.85 and had a relatively high-average connectivity ( Figure 5A). A total of 23 outliner samples were detected, and the height cut-off value was set at 680 ( Figure 5B). Four coexpression modules of DEGs were constructed by WGCNA ( Figure 6A), and the relationship between modules and in ltration of the immune cells was performed. We found the most signi cant correlation between the turquoise module and in ltration of Macrophages M1 (cor=0.68, p=1x10 -25 ) ( Figure 6B). The immune-related gene in the turquoise module (GBP4) was then identi ed based on MM > 0.9 and GS > 0.7 ( Figure 7A). The expression level of the hub gene is shown in Figure 7B. Compared with healthy individuals, GBP4 was signi cantly upregulated in the colon and ileum of CD patients.

Functional enrichment analysis
The GO analysis showed that the brown module mainly regulated vesicle coating, vesicle targeting, Golgi vesicle budding, positive regulation of lipid biosynthetic process, and lipoprotein particle assembly, the grey module mainly regulated antigen processing and presentation, adaptive immune response, reactive oxygen species responses, interferon-gamma responses, immune effector process regulation, and the turquoise module mainly regulated interferon-gamma responses, immune effector process regulation, regulation of response to biotic stimulus, positive regulation of cytokine production, and leukocyte cellcell adhesion ( Figure 7C, Additional le 2). The KEGG analysis further revealed that the grey module mainly regulated antigen processing and presentation, allograft rejection, viral myocarditis, graft-versushost disease, and Type I diabetes mellitus, and the turquoise module mainly regulated antigen processing and presentation, allograft rejection, viral myocarditis, staphylococcus aureus infection, pertussis, cytokine-cytokine receptor interaction, leishmaniasis, and viral protein interaction with cytokine pathways ( Figure 7D, Additional le 3).

Linear model and ROC curve analysis
There was a positive linear correlation between the expression of GBP4 and in ltration of M1 Macrophages to the intestinal mucosa of CD patients (Macrophage M1=0.0359382+0.0061959*GBP4, adjust r-squared=0.661, p < 2x10 -16 ) ( Figure 8A). The AUC for the diagnostic value of GBP4 for CD was 0.736 ( Figure 8B). The strong correlation between the expression of GBP4 and in ltration of Macrophages M1(Macrophage M1=0.0009155+0.1334921*GBP4, adjust r-squared=0.435, p < 2x10 -16 ), as well as the good diagnostic value of the gene for CD (AUC=0.702) ( Figure 8D) was con rmed using the validation set.

Discussion
CD is a relapsing in ammatory disease, mainly affecting the gastrointestinal tract, and frequently presents with abdominal pain, fever, bowel obstruction or as well as bloody or mucoid diarrhea [24]. The precise pathogenesis of CD remains unclear, but it has been linked to excessive immune response [25][26][27]. Unraveling the complex immune network underlying CD pathogenesis can uncover new targets for the treatment of the disease.
In the present study, we identi ed 1135 DEGs between CD patients and healthy individuals, some of which have been previously reported. OLFM4, which was the most upregulated gene, negatively regulates H. pylori-speci c immune responses [28] and mucosal defense responses during in ammatory bowel disease [29]. The downregulated gene, FABP1, is a validated biomarker of CD diagnosis [30]. The function of other notable in CD such as CHP2 is not well understood. Furthermore, the upregulated gene expression was observed in the ileum, which is the most common site for the disease [31].
CIBERSORT revealed a signi cant difference in proportion of immune cells in the intestinal mucosa of CD and healthy individuals. Macrophage and CD4+ T cells accounted for the largest proportion of the in ltrating immune cells. So far, it had already been reported that macrophage and CD4+ T cells played an important role in CD [32,33]. Intestinal macrophages are a heterogeneous population of cells thought to be derived from classical blood monocytes, mediated by CCR2 [34]. During in ammation, the recruited monocytes differentiate into in ammatory macrophages sensitive to stimulation by Toll-like receptors.
The macrophages also secret proin ammatory cytokines, further promoting in ammation [35][36][37]. In CD patients, the CD14+ macrophages, which secret abundant TNF-α, are the largest proportion of immune cells on the in amed mucosa [38,39]. The proportion of in ltrating macrophages in the intestinal mucosa of CD patients is in line with our analysis by CIBERSORT. CD4+ T cells can also release a large amount of proin ammatory cytokines such as IFN-γ and IL-17/IL-22, and these cytokines contribute to the progression of CD [40]. We observed a signi cant difference in the proportion of resting NK cells, activated NK cells, monocytes, resting mast cells, and neutrophils in the intestinal mucosa of CD patients and normal individuals. Monocytes regulate the phagocytosis of pathogens, digesting processing and presentation of antigens, and releases of effector molecules such as chemokines and cytokines. Moreover, monocytes are thought to be the only source of intestinal macrophages, and changes in the composition of peripheral blood monocytes in CD patients have been reported [41,42]. NK cells provide a rapid innate immune response, killing target cells without priming. Mast cells, which predominate at mucosal surfaces, are also crucial for early host defense. Mast cells selectively recruit and positively modulate the function of NK cells through soluble mediators such as interferons [43].
WGCNA of the GSE179285 dataset identi ed a strong link between the turquoise module and in ltration of macrophages M1. GO analysis revealed the genes in the turquoise module mainly regulate interferongamma response, regulation of immune effector process, regulation of response to biotic stimulus positive, regulation of cytokine production, and leukocyte cell-cell adhesion. Interferon-gamma can induce transcription of metal transporter, which contributes to CD pathogenesis [44]. Interferon-gamma-target therapy can be used in treating active CD [45]. In ammation is closely related to regulating the immune effector process, response to biotic factors, production of cytokine, and adhesion of leukocytes to endothelial cells [46]. KEGG analyses demonstrated that staphylococcus aureus infection, pertussis, cytokine-cytokine receptor interaction, leishmaniasis, and interaction of viral protein with cytokine and cytokine receptor were important pathways in our study. Staphylococcus aureus [47], pertussis [48], and leishmaniasis [49] are some of the opportunistic infections in CD patients due to the immunomodulation and immunosuppressive therapies.
Herein, we found a strong linear relationship between the expression of GBP4 and the in ltration of M1 macrophages in CD patients. Guanylate Binding Protein 4 (GBP4) regulates innate immune response via interferon gamma. GO annotations revealed GBP4 regulates several biological processes, including GTP binding and GTPase activity. Little is known about the GBP families. In mice, GBPs protect against lethal bacterial infections [50] through the GBP4 in ammasome-dependent production of prostaglandins [51]. Moreover, GBP4 is an immune-related signature biomarker for predicting prognoses and immunotherapeutic responses in patients with muscle-invasive bladder cancer [52], and an immune microenvironment biomarker for the prognosis of ovarian cancer [53]. Also, GBP4 takes part in the type-I interferon response and displays a positive correlation with macrophages [54]. However, there is no report about CD with GBP4.
Regarding limitations, rst, the results are based on the computational algorithm. Although the accuracy of this technique has been validated, the nding of this study should be veri ed using in vivo experiments in the future. Second, given the small sample size, the nding of this study may have been exaggerated.

Conclusion
In conclusion, there is a signi cant difference in the in ltration of immune cells to intestinal mucosa tissues of CD patients and healthy individuals. Given that GBP4 is a differently expressed gene between healthy individuals and CD patients and is a driver gene of macrophages, the gene is a potential biomarker for the CD diagnosis and prognosis as well as an immunotherapeutic target for CD treatment.

Source of data
Gene expression data of CD patients and healthy individuals was downloaded from GEO database (http://www.ncbi.nlm.nih.gov/geo/). The screening criteria for the gene expression datasets were as follows: (1) the study type was limited to expression pro ling by array; (2) gene expression data in the intestinal mucosa of CD patients and normal individuals; (3) Each dataset contained for at least 100 samples; (4) analyzable processed data or raw data. Data preprocessing and differential gene analysis Data were preprocessed and analyzed using the R software (https://www.r-project.org/) through the following steps: (1) The probe names of each gene were converted to gene symbols, moreover, when a target gene corresponded to multiple probes, the average expression values of the probes was used to represent the expression level of the gene; (2) genes were excluded if the gene expression level was zero in more than half of the samples; (3) genes lacking expression level data for over 30% of the samples were also removed. Differential expression analysis was performed using the "limma" R package [19]. Adjusted p value < 0.05 and fold change >1.2 or fold change <-1.2 were set as the threshold for signi cant differential expression.

Immune in ltration analysis
The composition and proportion of 22 immune cells in the intestinal mucosa of CD patients and healthy individuals were estimated using the Cell-type Identi cation By Estimating Relative Subsets Of RNA Transcripts (CIBERSORT) tool in combination with leukocyte signature matrix (LM22) based on gene expression pro les of the cells [16]. The permutations (perm) of the deconvolution algorithm were set at 1000.

Construction of network and identi cation of hub genes
The coexpression network of DEGs and the in ltration of immune cells was performed as previously described [17]. First, the soft thresholding power β, to which coexpression similarity was raised to calculate adjacency, was calculated using the pickSoftThreshold function in the "WGCNA" R package.
Second, the samples were clustered to identify any obvious outliers. Third, the coexpression network was then constructed. Fourth, key gene modules were identi ed using hierarchical clustering and the dynamic tree cut function. Gene signi cance (GS) and module membership (MM) were then calculated to match modules to speci c immune cells. According to the correlation between the immune cells and ME and p value, and the module with the highest correlation coe cient and the smallest p value was selected as the most relevant module for the immune cells. Finally, the hub genes in the relevant module for the immune cells were identi ed based on MM > 0.9 and GS > 0.7.

Functional enrichment analysis
Biological process and pathway regulated by the genes in the modules were identi ed using Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway and Gene Ontology (GO) enrichment analysis via the "clusterPro ler" R package [20]. The cutoff of the q-value was set at 0.05.

Linear model and ROC curve analysis
The Best linear model for immune cell and hub genes was derived using a stepwise forward linear regression analysis. The following statistic model was developed: y=β0+β1x1+β2x2+…+βixi, where y is the proportion of immune cell, xi is the expression value of hub genes, β0 was the intercept of the regression equation, and βi is the regression coe cients.
The utility and accuracy of the hub genes for CD diagnosis were assessed by receiver operating characteristic (ROC) analysis using the "ROCR" R package [21]. The area under curve (AUC) was then calculated and screened for genes with AUC greater than 0.7.

Statistical analysis
Data were analyzed using R software (Rx64 4.0.3). Differences between two groups were analyzed using the Wilcoxon test, whereas the Kruskal-Wallis test used for multiple groups. The correlation between different immune cell subtypes to the intestinal mucosa of CD patients was performed using the Pearson correlation coe cient. Statistical signi cance was set at p < 0.05.  Tables   Table 1 Characteristics of training and validation Figure 1 The diagrammatic work ow of the preset study.    The node size re ects the gene count, and the node color re ects the P-value [−log10 (P value)]. Figure 8