Differential expression genes identification in AS and CD
Based on the AS dataset GSE43292, 727 differential genes (DEGs) were found using the limma R package. The volcano plot depicts the detected DEGs, of which 418 are upregulated and 309 are downregulated (Fig. 2A). The heatmap demonstrates DEGs (Fig. 2C). Besides, the CD dataset GSE186582 obtained a total of 1294 DEGs, with 651 genes identified to be upregulated and 643 identified to be downregulated (Fig. 2B). Heatmaps of DEGs are shown in Fig. 2D. A total of 24 common upregulated DEGs and 10 common downregulated DEGs of GSE43292 and GSE186582 were observed (Fig. 2E-F).
Functional enrichment analysis
To further clarify the biological mechanisms and signaling pathways associated with common genes, GO and KEGG enrichment analyses were conducted. According to the GO analysis (Fig. 3A), common DEGs were considerably enriched in the following biological processes (BP): leukocyte chemotaxis(GO:0030595), positive regulation of vasculature development(GO:1904018), and leukocyte migration(GO:0050900). The cell component (CC) analysis revealed that DEGs were predominantly located in the collagen-containing extracellular matrix(GO:0062023), clathrin-coated vesicle membrane(GO:0030665), and glycoprotein complex(GO:0090665). In the molecular function(MF) category, the DEGs were mainly enriched in cytokine receptor binding(GO:0005126), signaling receptor activator activity(GO:0030546), and cytokine activity(GO:0005125). KEGG analysis(Fig. 3B) results revealed that common genes were mainly associated with cytokine-cytokine receptor interaction(hsa04060), IL-17 signaling pathway(hsa04657), Wnt signaling pathway(hsa04310), and PI3K-Akt signaling pathway(hsa04151).
PPI network construction and candidate genes selection
Using the string database, PPI analysis was conducted on these 34 common DEGs. After the removal of DEGs with poor interaction (n = 17), 17 DEGs were maintained (Fig. 4A). In addition, two important modules were selected through MCODE(Fig. 4B). CXCL8, AQP9, CSF3R, S100A8, TREM1, IL1RN, and S100A9 were the hub nodes in module A. TNFSF13B, CD69, PRDM1 ,and IL7R were the hub nodes in module B. Finally, 11 DEGs were selected using MCODE as candidate genes for further machine-learning analysis.
Identification of potentially common hub genes
Two distinct machine learning algorithms were used to select DEGs for ROC assessment. Figure 5A-B depicts the Lasso regression results. From the 11 DEGs, 3 DEGs with the lowest binomial deviation were identified. According to the SVM-RFE results, the top 7 DEGs had the best accuracy and lowest error in choosing potential biomarkers (Fig. 5C-D). Then, a Venn plot showed that 3 hub DEGs (IL1RN, PRDM1, TNFSF13B) were intersected(Fig. 5E). The details of hub genes are shown in Table 2. Next, ROC curves were prepared for 3 hub genes to evaluate the capability of individual genes to distinguish AS samples from controls. As indicated in Fig. 5F, the AUC of all hub genes was ༞0.7. Accordingly, a logistic regression model was established based on the 3 hub genes, and subsequent ROC curve analysis results showed that the AUC of the model was 0.873 (Fig. 5G).
Table 2
Gene symbol | Official full name | Function |
IL1RN | Interleukin 1 Receptor Antagonist | The protein encoded by this gene is a member of the interleukin 1 cytokine family. This protein inhibits the activities of IL1α and IL1β and modulates a variety of IL1-related immune and inflammatory responses, particularly in the acute phase of infection and inflammation. |
TNFSF13B | TNF Superfamily Member 13B | The protein encoded by this gene is a cytokine that belongs to the tumor necrosis factor (TNF) ligand family. This cytokine is expressed in B cell lineage cells and acts as a potent B cell activator. It has been also shown to play an important role in the proliferation and differentiation of B cells. |
PRDM1 | PR/SET Domain 1 | This gene encodes a protein that acts as a repressor of beta-interferon gene expression. The protein binds specifically to the PRDI (positive regulatory domain I element) of the beta-IFN gene promoter. |
Single-gene GSEA of hub genes
A single-gene GSEA analysis was carried out to further probe the possible mechanism of the hub genes. Figure 6A-C depicts the top 6 pathways that were enriched for each hub gene. The GSEA analysis showed that the 3 hub genes were found most significantly enriched in “CYTOKINE CYTOKINE RECEPTOR INTERACTION”, “T CELL RECEPTOR SIGNALING PATHWAY”, “TOLL LIKE RECEPTOR SIGNALING PATHWAY”, and “CHEMOKINE SIGNALING PATHWAY” in AS. These signature gene sets were identified as being close to the occurrence and development of CD. This showed that the 3 hub genes were linked to AS in CD patients.
Immune infiltration analysis and correlation with hub genes
As the enrichment analysis suggested that immunity plays an important role in the development of AS and CD, we evaluated whether unique patterns of immune infiltration could be detected using the CIBERSORT approach based on the 22 categories of immune cells. First, we examined the immune cell infiltration composition in the AS dataset (GSE43292) and the CD dataset (GSE186582). The Violin diagram revealed substantial changes in AS and control samples in plasma cells, CD8+ T cells, activated NK cells, M0 macrophages, and M2 macrophages (Fig. 7A). In the AS sample, CD8+ T cells and activated NK cells were much lower than in the normal sample, but plasma cells, M0 macrophages, and M2 macrophages were significantly higher. In addition, we conducted CIBERSORT in the CD dataset. Memory B cells, follicular helper T cells, M0 macrophages, and M2 macrophages were found to be decreased in patients, whereas resting memory CD4+ T cells, activated dendritic cells, and neutrophils were found to be increased (Fig. 7B).
However, changes in immune cell composition proportions are only one part of AS and CD's shared pathogenesis. We still need to see if these 3 hub genes are linked to immunity. As a result, Pearson correlation analysis was employed to evaluate the relationships between hub genes and immune cells in AS(Fig. 8A). We observed that IL1RN had a significant positive correlation with M0 macrophages and activated mast cells and a significant negative correlation with activated dendritic cells, resting dendritic cells, resting mast cells, monocytes, resting memory CD4 + T cells, and CD8+ T cells. PRDM1 had a significant positive correlation with M0 macrophages and a significant negative correlation with resting dendritic cells, and activated NK cells. TNFSF13B had a significant positive correlation with resting memory CD4 + T cells and gamma delta T cells and a significant negative correlation with regulatory T cells.
We also discovered some links between hub genes and immune cells in CD (Fig. 8B), indicating that immune cells are involved in the pathogenesis of CD.
Validation of hub genes in external datasets
Finally, we validated the expressions of the hub genes in the GSE28829 and GSE102133 datasets. The results showed that the expression tendencies of 3 hub genes were consistent with the GSE28829 dataset and GSE102133 datasets in AS and CD(Fig. 9).