Differential expression analysis between normal tissue and IPF tissue
In the present study, the gene expression datasets GSE32537 and GSE110147 were used as the training dataset and the external dataset, respectively. To identify differentially expressed genes (DEGs) between IPF tissue and normal tissue, we performed a limma analysis with the screening criteria |log2FC| > 1 and adjusted p value < 0.05. A total of 365 DEGs were identified, comprising 119 down-regulated and 246 up-regulated genes (Figure 1A-B). These included several genes previously reported to be associated with IPF, such as KRT5, BPIFB1 and AGER. GO enrichment analysis revealed that the DEGs were involved in response to growth factor, blood vessel development, cell-substrate adhesion, epithelial cilium movement and cell adhesion molecule binding. Pathway enrichment analysis showed that the DEGs were mainly involved in ECM-receptor interaction, cytokine-cytokine receptor interaction, and protein digestion and absorption.
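The screening step can be illustrated with a minimal Python sketch. It only applies the stated cutoffs (|log2FC| > 1 and adjusted p value < 0.05) to a table of per-gene statistics. It does not reproduce limma itself, which is an R package. The `GeneStat` record, the function name and the toy gene names and values are hypothetical and are not taken from the study.

```python
from typing import Iterable, List, NamedTuple, Tuple


class GeneStat(NamedTuple):
    """Per-gene result row (hypothetical layout, e.g. exported from a limma table)."""
    gene: str
    log2fc: float
    adj_p: float


def classify_degs(
    stats: Iterable[GeneStat],
    lfc_cutoff: float = 1.0,
    p_cutoff: float = 0.05,
) -> Tuple[List[str], List[str]]:
    """Split genes into up- and down-regulated DEGs using strict cutoffs."""
    up, down = [], []
    for s in stats:
        # A gene is a DEG only if it passes BOTH the significance and the effect-size filter.
        if s.adj_p < p_cutoff and abs(s.log2fc) > lfc_cutoff:
            (up if s.log2fc > 0 else down).append(s.gene)
    return up, down


# Toy values, invented for illustration only.
toy = [
    GeneStat("GENE_A", 2.3, 0.001),   # passes both filters, up-regulated
    GeneStat("GENE_B", -1.8, 0.010),  # passes both filters, down-regulated
    GeneStat("GENE_C", 0.4, 0.001),   # significant but effect size too small
    GeneStat("GENE_D", 1.5, 0.200),   # large effect but not significant
]
up, down = classify_degs(toy)
print(len(up), "up-regulated;", len(down), "down-regulated")
```

Both comparisons are strict, so a gene with |log2FC| exactly equal to 1 is not counted as a DEG in this sketch.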
Protein-protein interaction network construction
To construct a protein-protein interaction network, we uploaded all DEGs to the STRING database and set the minimum interaction score to medium confidence (0.400). The resulting network contained 308 nodes and 1037 edges (Figure 2A). Hub modules were identified using the MCODE algorithm with a degree cutoff of more than 2. As a result, we selected the top two modules, which together contained 37 genes: DNAH3, HYDIN, RSPH1, DRC1, SERPIND1, MATN3, TNC, CDH2, GOLM1, LTBP1, DNAH11, CP, DNAH10, DNAH9, DNAH12, DNAH6, DNAAF1, RSPH4A, DNAH7, WDR78, ARMC4, WDR63, DNAH5, DNAI1, FGA, FGG, THBS1, COL1A1, VCAM1, COL1A2, POSTN, COMP, CXCL12, MMP13, CTSK, SERPINE1 and COL3A1 (Figure 2B).
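As a rough illustration of how such a network is assembled from a scored edge list, the sketch below keeps edges at or above the 0.400 confidence threshold, counts nodes and edges, and flags nodes with a degree greater than 2. It is not an implementation of MCODE, which additionally scores local neighborhood density. The function names, the 0 to 1 score scale and the toy edges are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str, float]  # (protein A, protein B, confidence score in [0, 1])


def build_network(scored_edges: Iterable[Edge], min_score: float = 0.400) -> Dict[str, Set[str]]:
    """Return an undirected adjacency map containing only sufficiently confident edges."""
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for a, b, score in scored_edges:
        if a != b and score >= min_score:  # drop self-loops and low-confidence edges
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def summarize(adjacency: Dict[str, Set[str]]) -> Tuple[int, int]:
    """Return (number of nodes, number of edges) of the undirected network."""
    n_nodes = len(adjacency)
    # Each undirected edge appears in two neighbor sets, so halve the total.
    n_edges = sum(len(neighbors) for neighbors in adjacency.values()) // 2
    return n_nodes, n_edges


# Toy edges with invented scores; P1..P5 are placeholder protein names.
toy_edges = [
    ("P1", "P2", 0.90),
    ("P2", "P3", 0.70),
    ("P1", "P3", 0.45),
    ("P3", "P4", 0.20),  # below the threshold, discarded
    ("P4", "P5", 0.60),
    ("P1", "P4", 0.80),
]
network = build_network(toy_edges)
high_degree = sorted(node for node, neighbors in network.items() if len(neighbors) > 2)
print(summarize(network), high_degree)
```

Proteins whose edges all fall below the threshold never enter the adjacency map, which is one way a network can end up with fewer nodes than the number of genes submitted.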
Selection of the key genes
To narrow down the genes from the two hub modules and obtain more reliable key genes, we first performed LASSO analysis, which identified 13 genes for IPF (Figure 3A-B). The SVM-RFE algorithm was also applied and identified 29 genes for IPF (Figure 3C). The candidate gene markers identified by the two algorithms were intersected, yielding 13 key genes: SERPIND1, CDH2, CP, WDR63, DNAH5, FGG, THBS1, VCAM1, COL1A2, POSTN, CXCL12, MMP13 and SERPINE1 (Figure 3D). The expression of these genes was further validated in the external dataset GSE110147. As shown in Figure 4, most of the key genes showed significant differential expression, including SERPIND1, CDH2, CP, WDR63, DNAH5, VCAM1, COL1A2, POSTN and MMP13, supporting the robustness and reliability of the selected genes.
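The overlap step amounts to a set intersection of the two selected gene lists. The sketch below shows only that step, under the assumption that each algorithm has already returned its list of selected genes. It does not fit LASSO or SVM-RFE models. The function name and the placeholder gene identifiers are hypothetical.

```python
from typing import List, Sequence


def overlap_in_order(primary: Sequence[str], secondary: Sequence[str]) -> List[str]:
    """Return genes present in both lists, keeping the order of the primary list."""
    secondary_set = set(secondary)  # set membership tests are O(1) on average
    return [gene for gene in primary if gene in secondary_set]


# Placeholder selections, invented for illustration only.
lasso_selected = ["G1", "G4", "G7"]
svm_rfe_selected = ["G7", "G2", "G1", "G4", "G9"]

key_genes = overlap_in_order(lasso_selected, svm_rfe_selected)
print(key_genes)
```

When the smaller list is fully contained in the larger one, as in this toy example, the overlap has the same size as the smaller list.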
Immune cell infiltration analysis
The infiltration levels of 22 immune cell types were calculated using the CIBERSORTx algorithm, and only samples with a p value < 0.05 were retained. The correlation heatmap of the 22 immune cell types revealed that activated NK cells were most positively correlated with plasma cells, whereas resting NK cells were most negatively correlated with activated NK cells (Figure 5A). The violin plot of differences in immune cell infiltration showed that plasma cells, resting memory CD4 T cells, regulatory T cells (Tregs) and resting mast cells were significantly more abundant in IPF tissue, whereas naive CD4 T cells, monocytes and neutrophils were significantly more abundant in normal tissue (Figure 5B). We also evaluated the relationship between the key genes and immune cells. Figure 6 shows that resting mast cells, monocytes and neutrophils were the cell types most strongly correlated with the key genes, suggesting that these key genes may regulate the function of resting mast cells, monocytes and neutrophils.
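A gene-by-cell-type correlation table of this kind can be sketched as follows. The text does not state which correlation coefficient was used, so the Pearson coefficient here is an assumption, as are the function names and the toy expression and cell-fraction values.

```python
import math
from typing import Dict, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant sample
    return cov / math.sqrt(var_x * var_y)


def correlate_genes_with_cells(
    gene_expr: Dict[str, Sequence[float]],
    cell_frac: Dict[str, Sequence[float]],
) -> Dict[Tuple[str, str], float]:
    """Correlate every gene with every cell type across the same ordered samples."""
    return {
        (gene, cell): pearson(expr, frac)
        for gene, expr in gene_expr.items()
        for cell, frac in cell_frac.items()
    }


# Toy data for four samples, invented for illustration only.
gene_expr = {"GENE_X": [1.0, 2.0, 3.0, 4.0]}
cell_frac = {
    "cell_type_up": [0.1, 0.2, 0.3, 0.4],
    "cell_type_down": [0.4, 0.3, 0.2, 0.1],
}
r = correlate_genes_with_cells(gene_expr, cell_frac)
for pair, value in r.items():
    print(pair, round(value, 3))
```

The sample order must be identical in the expression and cell-fraction vectors, and only samples that passed the p value filter should be included.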
CMAP analysis
To identify potential small-molecule drugs for IPF, the DEGs were uploaded to the Connectivity Map (CMAP) database and matched against small-molecule treatments. Table 1 lists the candidate drugs with an enrichment value < -0.6, together with their enrichment values. Clorsulon, thiamine, AH-6809 and altizide were the most enriched drugs in the analysis. Because negatively enriched small molecules could potentially reverse the gene expression changes induced by IPF, the drugs with an enrichment value < -0.6 were carried forward to downstream analysis. Analysis of the mechanisms of action of these candidate drugs showed that they were mainly enriched in ACE inhibitor, rilmenidine, adrenergic receptor agonist, adrenergic receptor antagonist, antiarrhythmic, cyclooxygenase inhibitor, dopamine receptor antagonist, GABA receptor antagonist, glutamate receptor modulator and thiazide diuretic (Figure 7).
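The filtering and tallying described here can be sketched in a few lines. The example assumes the CMAP results have been exported as records with a name, an enrichment value and a mechanism-of-action label. That record layout, the function names and all toy entries are hypothetical and do not correspond to Table 1.

```python
from collections import Counter
from typing import Dict, List

Record = Dict[str, object]  # keys assumed here: "name", "enrichment", "moa"


def select_reversing_compounds(results: List[Record], cutoff: float = -0.6) -> List[Record]:
    """Keep compounds with enrichment strictly below the cutoff, most negative first."""
    hits = [r for r in results if r["enrichment"] < cutoff]
    return sorted(hits, key=lambda r: r["enrichment"])


def tally_mechanisms(hits: List[Record]) -> Counter:
    """Count how many selected compounds share each mechanism of action."""
    return Counter(r["moa"] for r in hits)


# Toy results with invented names and values, for illustration only.
toy_results = [
    {"name": "compound_1", "enrichment": -0.82, "moa": "mechanism_A"},
    {"name": "compound_2", "enrichment": -0.65, "moa": "mechanism_B"},
    {"name": "compound_3", "enrichment": -0.40, "moa": "mechanism_A"},  # not negative enough
    {"name": "compound_4", "enrichment": 0.71, "moa": "mechanism_C"},   # positively enriched
    {"name": "compound_5", "enrichment": -0.91, "moa": "mechanism_A"},
]
hits = select_reversing_compounds(toy_results)
moa_counts = tally_mechanisms(hits)
print([h["name"] for h in hits], dict(moa_counts))
```

Sorting by enrichment places the most strongly negative compounds first, which matches the idea of prioritizing molecules most likely to reverse the disease signature.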