Profiles of DEGs
Online R Three datasets [GSE46960 [14] , GSE69948 and GSE122340 [15]] were downloaded from the GEO database. GSE46960 was used as a training dataset, while GSE69948 and GSE122340 were used for validation. 150 DEGs were identified by "limma" R package, satisfying the criterion of |log2FC| ≥ 1.0 and FDR <0.05 from GSE46960 with 64 BA liver tissue samples and 14 other cholestatic diseases (Fig. 1A). The volcano map showed the DEGs with high and low expression (Fig. 1B). PCA suggests that DEGs can strongly distinguish the BA and DC (Fig. 1C-D).
Function and Pathway Enrichment Analyses
The new Studies have indicated that the immune microenvironment plays a crucial role in the progression of BA. DEGs were significantly enriched in immune-related GO terms, including leukocyte chemotaxis, cell chemotaxis, granulocyte chemotaxis, and cytokine activity (Fig. 2A). Moreover, according to the KEGG pathway enrichment results, these genes were enriched in the IL-17 signaling pathway, and TGF-beta signaling pathway (Fig. 2B). Besides, GSEA enrichment suggests that Interleukin-10 signaling, Interleukin-4, and Interleukin-13 signaling, and chemokine receptors bind chemokines were covered (Fig. 2C). Overall, these results suggested that the DEGs were significantly enriched in immune-related functions, implying that there are differences in the immune microenvironment during the progression of BA and DC. Similarly, GSVA enrichment analysis also revealed significant differences in BA and DC immune-related pathways (Fig. 2D).
Construction and Validation of Risk Model
Due to the apparent enrichment of DEGs to the immune pathway, we used correlation analysis to identify 133 highly expressed DEGs that were strongly associated with immune genes based on P <0.01 and Cor ≥0.8 (Fig. 3A). PCA showed that these 133 immune-related genes could be well distinguished between BA and DC (Fig. 3B). Next, we performed LASSO regression analysis to screen covariates, then applied 10-fold cross-validation with minimum criteria to obtain an optimal λ. The final value of λ yielded a minimum cross-validation error. Consequently, a total of 8 DEGs, namely EDN1, HAMP, SAA1, SPP1, ANKRD1, MMP7, TACSTD2, and UCA1, were identified (Fig. 3C-D). Risk scores = 0.06572961×EDN1+ 0.21265551×HAMP+ 0.32723253×SAA1+ 0.15779739×SPP1+ 0.82666376×ANKRD1+ 0.09991172×MMP7+ 0.11985278×TACSTD2+ 0.08895487×UCA1.
Establishment of PPI and ceRNA networks
Construction of PPI networks with 133 immune-related genes for probing gene to gene interactions (Fig. 4A). Through the HMDD database and PubMed search, we identified 25 miRNAs associated with biliary atresia, including hsa-miR-92a-3p, hsa-miR-222-3p, hsa-miR-34a-5p, hsa-miR-142-5p, hsa-miR-150-3p and others (Online Resource 3). The mirTarBase database predicted the mRNAs controlled by these 25 miRNAs. After that, we hybridized the target genes of each miRNA with 133 immune-related genes (Online Resource 3). We identified two lncRNAs, H19 and MEG3, associated with biliary atresia in the MNDR v3.1 database. We searched for the relationship between lncRNAs and miRNAs associated with biliary atresia using the online tool LncBase Experimental v.2. Finally, we created the ceRNA regulatory network using Cytoscape software (Fig. 4B). The mRNA, miRNA, and lncRNA were represented by blue circle nodes, red triangle nodes, and pink diamond nodes, respectively.
Hub Gene of immune-related model
The 8 genes of the immune-related model were analyzed using Cytoscape, and hub genes were extracted through the plugin CytoHubba with the DEGREE scored system. Located at the center of the network, EDN1 received the highest score (Fig. 4C).
The ROC curve validation of the model
The risk scores demonstrate a strong ability to distinguish BA from DC with AUC values were 0.975 in the training group and 0.775 in the external validation group (Fig. 5A-D). In the GSE46960, the AUC values of the eight genes in the model were 0.881, 0.741, 0.764, 0.857, 0.940, 0.880, 0.884, 0.897 (Fig. 5B-C). To further demonstrate that model scores differentiate BA and DC more strongly than other indicators, our ROC curves incorporated scores, age at diagnosis, bilirubin, ALT, GGT, and sex, with AUC values of 0.984, 0.533, 0.659, 0.573, 0.784, 0.655 (Fig. 5E). Similarly, the violin plot demonstrates the difference in the expression of BA and DC of the eight genes and score in the GSE46960 dataset (Fig. 6A-I).
Immune infiltration analysis
Among CIBERSORTx was used to determine the composition of immune cells of BA and DC livers from the GSE46960 dataset. Among the 22 immune cell types, seven were significantly different between BA and DC (Fig. 7A). Macrophages M0, Neutrophils, T cells CD4 memory resting were elevated in BA liver compared with DC. Dendritic cells resting, NK cells activated, T cells CD8, and T cells regulatory were decreased. We found that NK cell activation and T cells regulatory were significantly inhibited in BA, while macrophage was significantly activated, which was similar to the original studies [17].
Relationship of the score and hub gene with immune cells
To verify whether the scoring model can fully explain the results of immune infiltration, we performed a correlation analysis of the scores and genes with immune cells (Fig. 7B). Score and T cells CD4 memory resting showed a positive correlation with r=-0.567 (p<0.001), and Neutrophils also showed the same positive correlation with r=-0.505 (p<0.001). Score and T cells CD8 showed a negative correlation with r= -0.573 (p<0.001), and T cells regulatory also showed the same negative correlation with r= -0.419 (p<0.001). However, the score did not show a correlation with macrophages and dendritic cells, and we suggest that the score may be more influenced by the amount of ANKRD1 gene expression. EDN1 and Macrophages M0 showed a strong positive correlation with r=0.309 (p<0.01) (Fig. 7C). EDN1 and T cells regulatory showed a strong negative correlation with r=-0.262 (p<0.05) (Fig. 7D). This suggests that EDN1 gene activation may promote more macrophage infiltration, as well as inhibit T cells regulatory infiltration. Overall, our model was able to explain well the difference between BA and DC immune infiltration.
Exploration of EDN1 downstream pathways and validation
The GSVA results were queried through the MsigDB website indicating that the TNFA_SIGNALING_VIA_NFKB pathway activated the expression of the EDN1 gene. We next explored the downstream pathway for this gene. GSE46960 had a total of 31 differential genes, all of which were highly expressed, between the EDN1 high and low expression groups (Fig. 8A). GSEA enrichment analysis showed that EDN1 was mainly involved in biological processes such as ECM-receptor interaction and Mucin type O-glycan biosynthesis (Fig. 8B). GSE122340 had a total of 28 different genes between the EDN1 high and low expression groups, and these had 25 genes with high and 3 genes with low expression (Fig. 8C). GSEA enrichment analysis showed that EDN1 was mainly involved in biological processes such as ECM-receptor interaction and Focal adhesion (Fig. 8D). This suggests that EDN1 is involved in the fibrosis process and cell-cell interactions.