3.1 Screening of DEGs in schizophrenia and Crohn's disease
Using the Limma package, a total of 2681 DEGs were identified in the schizophrenia data set (GSE92538), of which 1299 were up-regulated and 1382 were down-regulated (Fig. 2A, 2B). In the Crohn's disease data set (GSE36807), 3235 DEGs were identified, of which 1464 were up-regulated and 1771 were down-regulated (Fig. 2C, 2D).
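The split into up- and down-regulated genes can be illustrated with a minimal sketch. The cutoffs used here (|log2 fold change| >= 1 and adjusted p < 0.05) are assumptions chosen for illustration, since this section does not state the thresholds applied. The `GeneStat` type, the `split_degs` helper, and the toy values are hypothetical and are not taken from GSE92538 or GSE36807.

```python
from typing import NamedTuple


class GeneStat(NamedTuple):
    gene: str
    log2_fc: float  # log2 fold change, disease vs. control
    adj_p: float    # multiple-testing-adjusted p-value


def split_degs(stats, lfc_cutoff=1.0, p_cutoff=0.05):
    """Return (up, down) gene lists. Both cutoffs are illustrative assumptions."""
    significant = [s for s in stats if s.adj_p < p_cutoff]
    up = [s.gene for s in significant if s.log2_fc >= lfc_cutoff]
    down = [s.gene for s in significant if s.log2_fc <= -lfc_cutoff]
    return up, down


# Toy table with made-up values (hypothetical genes, not real data).
toy_stats = [
    GeneStat("gene_a", 1.8, 0.001),
    GeneStat("gene_b", -1.4, 0.010),
    GeneStat("gene_c", 0.3, 0.002),  # significant, but effect too small
    GeneStat("gene_d", 2.1, 0.200),  # large effect, but not significant
    GeneStat("gene_e", -2.5, 0.004),
]

up_genes, down_genes = split_degs(toy_stats)
print("up:", up_genes, "down:", down_genes)

# Consistency check on the counts reported in the text.
assert 1299 + 1382 == 2681  # schizophrenia (GSE92538)
assert 1464 + 1771 == 3235  # Crohn's disease (GSE36807)
```

The two closing assertions only confirm that the reported up- and down-regulated counts sum to the reported totals for each data set.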
3.2 Module gene selection and weighted gene co-expression network analysis
A cluster dendrogram of the Crohn's disease and control samples was created using a soft threshold of β = 8 (Fig. 3A). Based on this, 30 gene co-expression modules (GCMs) were constructed (Fig. 3B, 3C), and the association between each module and Crohn's disease was assessed (Fig. 3D). The sienna3 module (359 genes) showed the strongest correlation with Crohn's disease (correlation coefficient = 0.65, p = 1.8e-3). The correlation between module membership and gene significance in the sienna3 module of schizophrenia was also calculated, and a considerable positive correlation was observed (r = 0.61) (Fig. 3E).
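To make the role of the soft threshold concrete, the sketch below raises absolute pairwise Pearson correlations to the power β = 8, which is the usual form of an unsigned co-expression adjacency. Whether the study used a signed or unsigned network is not stated, so the unsigned form is an assumption. The `pearson` and `soft_adjacency` helpers and the toy expression values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def soft_adjacency(expr, beta=8):
    """Unsigned adjacency a_ij = |cor(x_i, x_j)| ** beta (assumed network type)."""
    genes = list(expr)
    return {
        (g, h): abs(pearson(expr[g], expr[h])) ** beta
        for g in genes
        for h in genes
        if g != h
    }


# Toy expression profiles across five samples (made-up values).
toy_expr = {
    "gene_a": [1, 2, 3, 4, 5],
    "gene_b": [2, 4, 6, 8, 10],  # perfectly correlated with gene_a
    "gene_c": [5, 3, 4, 1, 2],   # correlation of -0.8 with gene_a
}

adjacency = soft_adjacency(toy_expr, beta=8)
for pair, weight in sorted(adjacency.items()):
    print(pair, round(weight, 4))
```

Raising to a high power suppresses weak correlations much more than strong ones: a correlation of magnitude 0.8 keeps a weight of about 0.17, while one of magnitude 0.5 would drop below 0.004.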
3.3 Functional enrichment analysis of Crohn's disease
The functional enrichment of the sienna3 module genes in Crohn's disease was analyzed. KEGG analysis revealed that the CGs were enriched mostly in "meta pathways", "Proteasome", and other pathways (Fig. 4A). GO analysis revealed that, in terms of cell components (CC), the CGs were mainly found in "vascular", "endomembrane system", "extracellular space", and "extracellular region" (Fig. 4B). The key biological processes (BP) of the CGs included the small molecule metabolic process and the small molecule biological process (Fig. 4C). In terms of molecular function (MF), the key terms for the CGs were "catalytic activity" and "identity protein binding" (Fig. 4D). These results indicate that Crohn's disease is mainly related to metabolism and the immune system, which is similar to schizophrenia.
3.4 Functional enrichment analysis and protein-protein interaction network construction for the intersection genes of schizophrenia and Crohn's disease
First, 210 major genes related to Crohn's disease were obtained by intersecting the Crohn's disease module genes with DEGs. These Crohn's disease-related genes were then intersected with the schizophrenia DEGs using a Venn diagram, which yielded 35 shared genes (Fig. 5A). Functional enrichment analysis of these candidate genes revealed that the CGs were mainly enriched in "meta pathways" and "Rap1 signaling pathways" (Fig. 5B). GO analysis revealed that, in terms of CC, the CGs were mainly found in the "endomembrane system" and "organelle subset" (Fig. 5C). The key BPs of the CGs included "vascular mediated transport" and "small molecule metabolic process" (Fig. 5D). In terms of molecular function (MF), "cell adhesion molecule binding" and "cadherin binding" were the most important terms for the CGs (Fig. 5E). Comparison of these results indicated a strong association between schizophrenia and Crohn's disease, both of which were related to metabolism and the immune system. The 35 candidate genes were then analyzed using the STRING database, and 20 of these genes were found to be related (Fig. 5F).
3.5 Screening candidate genes through machine learning and construction of an artificial neural network
LASSO regression was used to screen the candidate genes, and 22 potential biomarkers were identified (Fig. 6A, 6B). Random forest (RF) analysis was also performed to screen the candidate genes, and 17 potential biomarkers were identified (Fig. 6C). The results of these two machine learning methods were intersected with the 20 candidate genes in the PPI network, which yielded 7 candidate genes (CAP1, INSIG1, MSMO1, PHLDA2, PSMB6, TBC1D2, UBA5) (Fig. 6D). These seven genes were used to construct a neural network, and the results showed that they could differentiate schizophrenia samples from control samples well (Fig. 6E). Expression profile analysis of the seven candidate genes indicated considerable differences in their expression between the schizophrenia and control groups (Fig. 6F).
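The three-way intersection described above amounts to a set operation, sketched below. Only the seven final gene symbols come from the text. The remaining members of each list are not given in this section, so they are represented by clearly hypothetical placeholder names, and the list sizes in the sketch do not match the reported 22, 17, and 20.

```python
# The seven genes reported in the text as common to all three lists.
FINAL_SEVEN = {"CAP1", "INSIG1", "MSMO1", "PHLDA2", "PSMB6", "TBC1D2", "UBA5"}

# Placeholder members stand in for genes that are not named in the text.
lasso_genes = FINAL_SEVEN | {"lasso_only_1", "lasso_only_2"}
rf_genes = FINAL_SEVEN | {"rf_only_1"}
ppi_genes = FINAL_SEVEN | {"ppi_only_1", "ppi_only_2", "ppi_only_3"}


def intersect_all(*gene_sets):
    """Return the genes present in every one of the given collections."""
    return set.intersection(*map(set, gene_sets))


shared_genes = sorted(intersect_all(lasso_genes, rf_genes, ppi_genes))
print(len(shared_genes), shared_genes)
```

Requiring agreement across all three lists is a conservative choice: a gene selected by only one or two of the methods is dropped.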
3.6 Construction and verification of diagnostic model
A forest map of the seven potential candidate genes was established (Fig. 7A), and its AUC and 95% CI were calculated (AUC 0.84, 95% CI 0.78-0.90). ROC curves were plotted to assess specificity and sensitivity (Fig. 7B). To further verify the model, a forest map of the candidate genes was established in the test group (GSE21935), and the corresponding ROC curves were plotted (AUC 0.78, 95% CI 0.64-0.93) (Fig. 7C, 7D). The results in the test group showed that the model had some value for the diagnosis of schizophrenia. GSEA of the seven candidate genes was also carried out, and the results showed an association with metabolism and immunity (Fig. 8A-F).
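As an illustration of how an AUC and its confidence interval can be obtained from model scores, the sketch below computes the AUC as the fraction of case-control pairs ranked correctly and estimates a 95% interval by bootstrap resampling. The bootstrap percentile method is an assumption, since this section does not say how its intervals were derived. The scores are made up and the helper names are hypothetical.

```python
import random


def auc(case_scores, control_scores):
    """AUC as the share of (case, control) pairs where the case scores higher.

    Ties count as half a correct pair.
    """
    correct = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                correct += 1.0
            elif case == control:
                correct += 0.5
    return correct / (len(case_scores) * len(control_scores))


def bootstrap_ci(case_scores, control_scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        cases = rng.choices(case_scores, k=len(case_scores))
        controls = rng.choices(control_scores, k=len(control_scores))
        estimates.append(auc(cases, controls))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Made-up model scores (hypothetical, not from GSE92538 or GSE21935).
toy_cases = [0.9, 0.8, 0.7, 0.55, 0.4]
toy_controls = [0.6, 0.5, 0.3, 0.2, 0.1]

toy_auc = auc(toy_cases, toy_controls)  # 22 of 25 pairs ranked correctly
ci_low, ci_high = bootstrap_ci(toy_cases, toy_controls)
print(f"AUC {toy_auc:.2f}, 95% CI {ci_low:.2f}-{ci_high:.2f}")
```

With only five samples per group the interval is wide, which mirrors the wider interval reported for the smaller test group.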
3.7 Drug prediction
The nine drugs with the highest combined scores for the candidate genes (valproic acid, minocycline, tetrandrine, norcyclobenzaprine, loperamide, protriptyline, rescinnamine, maprotiline, and trifluoperazine) were selected from the Enrichr database (Table 2), and the intersection of the candidate genes and drugs was visualized for subsequent analysis (Fig. 9).
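For readers unfamiliar with the ranking criterion, the sketch below shows how a combined score of this kind is commonly described: the natural log of the enrichment p-value multiplied by a z-score for the deviation from the expected rank. This formula is stated here as an assumption about how Enrichr derives the score, not as something confirmed by this section. The drug names, p-values, and z-scores are hypothetical.

```python
import math


def combined_score(p_value, z_score):
    """Combined score c = ln(p) * z (assumed form of the Enrichr score)."""
    return math.log(p_value) * z_score


# Hypothetical terms: name -> (p-value, z-score). Not values from Table 2.
toy_terms = {
    "drug_a": (1e-4, -1.9),
    "drug_b": (5e-3, -2.2),
    "drug_c": (2e-2, -1.1),
}

# Rank terms from highest to lowest combined score.
ranked_drugs = sorted(
    toy_terms,
    key=lambda name: combined_score(*toy_terms[name]),
    reverse=True,
)
for name in ranked_drugs:
    print(name, round(combined_score(*toy_terms[name]), 2))
```

Because both ln(p) and the z-score are negative for enriched terms in this toy example, their product is positive, and a smaller p-value or a more extreme z-score raises the rank.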