3.1 Single-Cell Sequencing Analysis Reveals Cellular Composition Changes in IPF Lung Tissue
To explore the changes in cellular composition in IPF lung tissue, we collected and analyzed single-cell sequencing data from normal and IPF lung tissues. After quality control and batch effect correction, we obtained 66,500 qualified cells expressing 25,765 genes. Through unsupervised dimensionality reduction and clustering algorithms, we annotated and integrated cell types. Ultimately, we identified 16 cell clusters and visualized them using t-SNE (Figure 1C), including monocytes (CD14 and CD16), mast cells (MS4A2 and CPA3), endothelial cells (VWF), macrophages (MACRO and MRC1), NK cells (KLRD1 and CD3E), ciliated cells (FOXJ1), fibroblasts (COL1A1), myofibroblasts (COL1A1, ACTA2, and PDGFRA), type II alveolar epithelial cells (SFTPB and SFTPC), dendritic cells (MHC II), AT-1 cells (AGER), mesenchymal stem cells (CD44 and ENG), goblet cells (MUC5B), plasma cells (CD79A), T cells (CD8 and CD4), and club cells (CYP2F2). Next, we investigated the changes in cellular composition in IPF lung tissue compared to normal. We found a significant increase in monocytes and macrophages, while ciliated cells, fibroblasts, myofibroblasts, and AT-1 cells showed a significant decrease (Figure 1E). Furthermore, we analyzed differentially expressed genes for each cell type (Figure 1D), with the top 3 upregulated and downregulated genes in macrophages being C1QA/C1QB/AP0C1 and MGP/SCGB3A1/SFPTC, respectively.
3.2 Single-Cell Analysis Reveals Macrophage Heterogeneity in IPF Lung Tissues
To further explore specific changes in macrophages, we extracted macrophages and further classified them into 13 subclusters based on macrophage-specific highly variable genes (Figure 2A, B). We observed that Clusters 0, 2, 7, 8, 10, and 12 were increased in IPF lung tissues compared to normal lung tissues, with Cluster 0 identified as IPF-related macrophages (IPF-MΦ). In contrast, Clusters 1, 3, 4, 5, 9, and 13 were decreased, and interestingly, Cluster 11 was exclusively present in IPF lung tissues. We conducted a separate analysis of Cluster 11 (Supplementary Figure 1). In this cluster, we found high expression of the ATP5 gene family, including ATP5F1E, ATP5MG, ATP5MC3, ATP5MC2, SMIM25, ATP5MPL, ATP5MD, ATP5MF, ATP5F1D, etc. (Supplementary Table 1). Based on the expression characteristics of the ATP5 gene family, we named Cluster 11 as ATP5-MΦ. We analyzed the differentially expressed genes in ATP5-MΦ and generated a volcano plot (Supplementary Figure 1A). Next, we performed enrichment analysis on these differentially expressed genes. GO enrichment analysis revealed (refer to Supplementary Figure 1B) that these differentially expressed genes were mainly involved in biological processes such as proton transmembrane transport, proton-transporting two-sector ATPase complex, and proton transmembrane transporter activity. GSEA analysis showed enrichment of upregulated oxidative phosphorylation and energy production gene sets in the entire gene set (Supplementary Figure 1C), while the IL-17 signaling pathway gene set was downregulated.
To explore the gene expression data of each macrophage subtype at the single-cell level, we calculated the relative expression levels of each gene in each individual cell and assigned a score to each gene. Subsequently, we performed unsupervised clustering of these genes, forming different gene clusters (Figure 1C). We grouped genes with similar expression trends, obtaining a consensus of 13 different expression trends related to various biological functions, as shown in the clustering results in Figure 1C. We observed that genes in Cluster 1 of IPF-MΦ exhibited significantly high expression levels, and these genes were highly enriched in biological functions such as Th1 and Th2 cell differentiation, PPAR signaling pathway, pertussis, complement, and coagulation cascades, renin-angiotensin system, and regulation of fat breakdown in adipocytes, among others. The differential gene heatmap of the 13 macrophage subclusters is displayed in Figure 1D, where FOXM1 and MYC were highly expressed in IPF-MΦ, and the expression levels of Cluster 8, 9, 10, and 11 differential genes were similar. Through GSEA analysis (Figure 1E), we found an upregulated enrichment of accessory factor synthesis and p53 signaling pathway gene sets in IPF-MΦ, while oxidative phosphorylation and tyrosine metabolism gene sets showed a downregulated enrichment.
3.3 Pseudo-time trajectory analysis and cell communication analysis of IPF-associated macrophages.
We delved into the developmental trajectory of macrophages and the communication network between macrophages and other cell types in IPF through pseudo-time trajectory analysis and CellChat analysis. Pseudo-time trajectory analysis revealed the unique position of IPF macrophages on the developmental trajectory (Figure 3A). The formation of IPF macrophage cluster 10 was observed in the initial stages of the trajectory. IPF-MΦ was discovered at trajectory branches 3, 4, and 5, suggesting a specific developmental process for IPF macrophages. Subsequently, CellChat analysis was employed to further understand the intricate communication network between IPF-MΦ and surrounding cell clusters. The results indicated that IPF-MΦ exhibited outstanding communication capabilities with dense interactions with various cell types (Figure 3B). Particularly noteworthy was the intense interaction of IPF-MΦ with other macrophages through various ligand receptors, such as GRN-SORT1 and MIF-(CD74+CD44) (Figure 3C). The strength of both incoming and outgoing interactions of IPF-MΦ was higher compared to other macrophages (Figure 3D). Signal pathways like MIF, ANNEXIN, GALECTIN, and VISFATIN showed similar patterns in both incoming and outgoing signals, representing the utilization of similar signaling transduction pathways for incoming or outgoing signals. The pathways with the highest incoming and outgoing signal strengths for IPF-MΦ were MK and cxcl signaling pathways.
Further differential analysis revealed that, compared to other macrophages, IPF-MΦ exhibited significantly enhanced communication with myofibroblasts, ciliated cells, AT2, epithelial cells, and fibroblasts (Figure 3E). This enhanced communication might be closely related to the emergence of the ANXA1-FPR2 and NAMPT-(ITGA5+ITGB1) signaling pathways in IPF-associated macrophages (Figure 3C). The activation of these signaling pathways might play a crucial role in the pathogenesis of IPF, influencing interactions and communication between different cell types, and thereby affecting inflammation, immune responses, and potential fibrotic processes. Additionally, we analyzed the metabolic levels of macrophage subtypes (Figure 3F-G). Compared to other macrophage subtypes, Cluster 9 and 11 showed lower levels of nitrogen metabolism and inositol phosphate metabolism, while Cluster 10 exhibited lower levels of glycolysis and gluconeogenesis. These differences in metabolic levels may be related to the functional distinctions among macrophage subtypes in IPF.
3.4 hdWGCNA Reveals Module Hub Genes in IPF-MΦ
To unravel the potential functions of the IPF macrophage subtype, we employed high-dimensional weighted gene co-expression network analysis (hdWGCNA). We chose a power of 8 to construct a scale-free network, resulting in 12 gene modules (Supplementary Figure A, Figure 4A-C). Interestingly, the purple, green, and yellow modules exhibited substantial expression in cluster 2 and cluster 5 macrophages, while the blue, magenta, and yellow-green modules were highly expressed in IPF-MΦ and cluster 2 macrophages. We visualized protein-protein interaction (PPI) networks for each module based on the top 10 hub genes within the 12 gene modules (Supplementary Figure 1B, Figure 4D).
Further exploration of the inter-module relationships (Figure 4F) revealed a robust positive correlation between the red module and the blue and green-yellow modules. Moreover, the brown, red, and tan modules exhibited a preference for expression in IPF-MΦ and displayed significant correlations (Figure 4B and 5E). We identified the genes within the brown, red, and tan modules as characteristic genes of IPF-MΦ and constructed PPI networks using the hub genes from each module (Supplementary Figure 1C).
3.5 Various Machine Learning Algorithms Reveal Signature Genes for IPF-MΦ
To further explore the relationship between hub genes of IPF-MΦ gene modules and the onset and progression of IPF, we constructed predictive models using the hub genes from the brown, red, and tan modules. Employing various machine learning methods to reduce the false-positive rate of screening results, we conducted a detailed analysis of the RNA dataset (GSE32537). Initially, through the LASSO regression algorithm, we successfully identified 26 key genes closely associated with the prognosis of IPF patients (Figure 5A-B). Subsequently, these genes were ranked using random forest analysis (Figure 5C), and the top 30 genes in importance were extracted. Using SVM-RFE methodology, we evaluated all genes through 10-fold cross-validation and obtained the top 30 important genes based on their average rankings (Figure 5D). Finally, we generated a Venn diagram to identify overlapping genes from the three machine learning methods (Figure 5E), resulting in nine significant genes, including FAM174B, PMP22, ATF4, DLD, ELOB, CTDP1, SV2B, USP10, and PHACTR1, all closely associated with the prognosis of IPF patients.
3.6 Relationship Between IRMG and the Immune Microenvironment
To elucidate the relationship between the gene characteristics of IPF-MΦ and the immune microenvironment, we employed CDF curve analysis to determine the effectiveness of this algorithm in patient grouping (Figure 6A). The apparent inflection points on the curve at different values suggested optimal algorithm performance when patients were divided into two groups. Subsequently, for a more comprehensive understanding of the biological characteristics of these two subtypes and their roles in the development of IPF, we utilized the differentially expressed IPF-MΦ characteristic genes (IRMG) and the NMF algorithm to cluster IPF patients into two subtypes (see Figure 6B). We then assessed IRMG scores and the Diffusion Capacity of the Lungs for Carbon Monoxide (DLCO) index for the two subtypes (see Figure 6C-D). It was observed that compared to Type 1 patients, Type 2 patients exhibited significantly higher IRMG scores and a markedly lower DLCO index, indicating poorer lung function in patients with higher IRMG scores. Additionally, we analyzed the immune microenvironment of the two subtypes (see Figure 6E-G). In comparison to Type 1 IPF patients, Type 2 IPF patients showed a significant increase in the proportions of macrophages, γδ T cells, plasmacytoid dendritic cells, central memory CD8 T cells, type 17 T cells, effector memory CD4 T cells, and monocytes. Conversely, the proportions of activated B cells, activated CD4 T cells, and activated CD8 T cells were significantly elevated. These findings suggest distinct immune response patterns between Type 1 and Type 2 IPF patients, with Type 2 IPF patients potentially exhibiting a more activated and inflammatory immune state.
Furthermore, these nine genes exhibited a strong correlation with immune cells. It is noteworthy that different genes showed varying correlations with immune cells, and even opposite results were observed. For example, FAM174B exhibited a significant negative correlation with most immune cells, while UPP1 showed a significant positive correlation with most immune cells. We posit that this may be related to the different roles these genes play in various immune regulatory pathways.
3.7 Machine Learning to Build a Predictive Model
Based on the identified IFMG through machine learning, we constructed a predictive model for predicting the onset and progression of IPF in patients. To find the most optimal algorithm, we employed seven machine learning algorithms from the mlr3 package. When evaluating the performance of these models, using the Area Under the Curve (AUC) as the metric, the Support Vector Machine (SVM) machine learning algorithm exhibited the highest accuracy, sensitivity, and specificity (Figure 7A-B). Subsequently, we validated the performance of the predictive model using the GSE110147 and GSE70866 datasets, yielding AUC values of 0.893 (Figure 7C) and 0.951 (Figure 7D), indicating that the model can effectively distinguish between high-risk and low-risk IPF patients. This series of work has yielded more effective clinical diagnostic markers and provided important clues for future diagnostic and therapeutic research.
3.8 FAM174B, PHACTR1, DLD, and ATF4 identified as genes influencing the prognosis of IPF patients
To confirm the association of these nine genes with the prognosis of IPF patients, we subjected these significant genes to univariate Cox regression analysis and Kaplan-Meier (KM) survival analysis. The Cox regression analysis results (Figure 8A) indicated that FAM174B, PMP22, ATF4, DLD, ELOB, SV2B, USP10, and PHACTR1 were associated with a higher risk of prognosis, while CTDP1 was linked to a lower risk. In KM survival analysis, we observed that DLD, PHACTR1, ATF4, FAM174B, PMP22, and USP10 were correlated with patient prognosis (Figures 9B-G). Finally, ROC curves illustrated (Figures 9H-K) that FAM174B (AUC=0.7762, P=0.0221), PHACTR1 (AUC=0.9021, P=0.0009), DLD (AUC=0.8741, P=0.0019), and ATF4 (AUC=0.7622, P=0.0298) might serve as valuable biomarkers.