Screening the proteome for ILD causal proteins
After Bonferroni correction (P < 2.64 × 10⁻⁵), our primary MR analysis identified four plasma proteins significantly associated with ILD, and the associations with IPF were largely similar. Specifically, BRSK2 (serine/threonine-protein kinase BRSK2), LRRC37A2 (leucine-rich repeat-containing protein 37A2), and ADAM15 (a disintegrin and metalloproteinase 15) were associated with both ILD and IPF, whereas CDH15 (cadherin-15) was not significantly associated with IPF. In contrast, after Benjamini-Hochberg correction, the sarcoidosis analysis identified four candidate proteins entirely distinct from those associated with ILD and IPF: VEGFB (vascular endothelial growth factor B), ANXA11 (annexin A11), LTBR (lymphotoxin-beta receptor), and ANGPTL4 (angiopoietin-related protein 4). Neither SAD-ILD nor allergic bronchopulmonary aspergillosis (ABPA) was significantly associated with any of the proteins studied.
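The two multiple-testing corrections above control different error rates. Bonferroni bounds the family-wise error rate by testing each protein at α/m. Benjamini-Hochberg controls the false discovery rate and is therefore less strict. The sketch below assumes α = 0.05. Under that assumption, the reported threshold of 2.64 × 10⁻⁵ would correspond to roughly 0.05 / 2.64 × 10⁻⁵ ≈ 1,894 tests, but the actual number of proteins tested should be taken from the Methods. The function names and example P values are hypothetical.

```python
def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test P-value threshold that controls the family-wise error rate."""
    return alpha / n_tests


def benjamini_hochberg(pvalues: list[float], alpha: float = 0.05) -> list[bool]:
    """Return one rejection flag per P value under the Benjamini-Hochberg FDR procedure."""
    m = len(pvalues)
    # Rank P values in ascending order while remembering their original positions.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis whose rank is at most max_k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical P values for five proteins (not results from this study).
    pvals = [1e-7, 3e-5, 0.004, 0.02, 0.6]
    cutoff = bonferroni_threshold(len(pvals))
    print([p < cutoff for p in pvals])   # [True, True, True, False, False]
    print(benjamini_hochberg(pvals))     # [True, True, True, True, False]
```

In this toy example, the fourth P value (0.02) fails the Bonferroni cutoff of 0.01 but passes the Benjamini-Hochberg step-up threshold. That is the practical difference between the two corrections.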
Specifically, the Wald ratio indicated that a genetically predicted increase in CDH15 was associated with an elevated risk of ILD (odds ratio [OR] 1.32, 95% CI 1.16–1.49; P = 1.09 × 10⁻⁵). Likewise, higher genetically predicted levels of BRSK2 (OR = 1.30; P = 1.70 × 10⁻¹³) increased the risk of ILD, whereas higher levels of ADAM15 (OR = 0.86, 95% CI 0.81–0.92; P = 1.59 × 10⁻⁶) and LRRC37A2 (OR = 0.82; P = 7.40 × 10⁻⁸) decreased it. For IPF, the OR (95% CI) per standard deviation increase in genetically predicted protein level was 1.40 (1.26–1.55) for BRSK2, 0.81 (0.75–0.89) for ADAM15, and 0.74 (0.66–0.82) for LRRC37A2. For sarcoidosis, another ILD subtype, higher plasma levels of LTBR were associated with a significantly increased risk, corresponding to about 39% higher odds (OR = 1.39; P = 9.38 × 10⁻⁶). Conversely, higher levels of ANGPTL4 (OR = 0.72; P = 1.68 × 10⁻⁵), VEGFB (OR = 0.21; P = 3.49 × 10⁻¹³), and ANXA11 (OR = 0.16; P = 1.09 × 10⁻⁷) were associated with decreased risk (Table 1 and Fig. 2).
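With a single cis-pQTL as the instrument, the Wald ratio divides the SNP-outcome effect by the SNP-protein effect. Exponentiating this log-odds estimate gives the OR per unit change in the protein (per standard deviation, if protein levels are standardized). The sketch below uses the common first-order standard error SE(β_Y)/|β_X|, which ignores uncertainty in the SNP-protein effect. The function names and input numbers are hypothetical and are not taken from this study.

```python
import math


def wald_ratio(beta_exposure: float, beta_outcome: float, se_outcome: float) -> tuple[float, float]:
    """Wald ratio estimate (log OR per unit exposure) and its first-order standard error."""
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se


def to_odds_ratio(log_or: float, se: float, z: float = 1.96) -> tuple[float, float, float]:
    """Convert a log-OR estimate to an OR with a normal-approximation 95% CI."""
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


def two_sided_p(estimate: float, se: float) -> float:
    """Two-sided P value from the normal approximation: 2 * (1 - Phi(|z|))."""
    z = abs(estimate / se)
    return math.erfc(z / math.sqrt(2))


if __name__ == "__main__":
    # Hypothetical summary statistics for one cis-pQTL.
    b, se = wald_ratio(beta_exposure=0.50, beta_outcome=0.14, se_outcome=0.03)
    or_, lo, hi = to_odds_ratio(b, se)
    print(f"OR={or_:.2f} (95% CI {lo:.2f}-{hi:.2f}), P={two_sided_p(b, se):.2e}")
```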
To explore circulating proteins specific to the non-IPF F-ILD and non-IPF ILD subtypes, we also obtained datasets labeled "Other interstitial pulmonary diseases with fibrosis" and "Other interstitial pulmonary diseases" from the UK Biobank and GWAS Catalog databases for MR analysis. KLRF1 showed a relatively low P value with the non-IPF F-ILD dataset from the UKB (P = 3.10 × 10⁻⁵), although it did not reach the Bonferroni threshold (P < 2.64 × 10⁻⁵). We nevertheless considered it worth reporting. No other significant results were found. In addition, some previously described IPF biomarkers (8), specifically FUT3_FUT5, were also evaluated in this MR study; the protein quantification method in the UKB-PPP study assigns FUT3 and FUT5 to the same cis-pQTL (Table 1a).
Table 1a-1b can be placed here.
External validation of potential drug targets
To replicate the preliminary findings across other outcome GWAS datasets, we adopted both the same-variant and the significant-variant approaches (Additional file 1: Table S3). BRSK2 showed significant associations in both the ILD and IPF GWAS Catalog cohorts under both strategies. Specifically, when the independent significant pQTLs reported by Zhang et al. (15) were used as genetic instruments, higher BRSK2 levels were associated with an increased risk of ILD (OR = 2.43; 95% CI 1.59–3.72; P = 4.37 × 10⁻⁵). For ADAM15 and CDH15, significant associations (P < 0.05) were observed only in the ILD GWAS Catalog cohort with the same-variant strategy. Because no other independent significant pQTLs were available for LRRC37A2, the original pQTL was retained, and it remained significantly associated with ILD and IPF in both external cohorts. LTBR showed significant associations in the sarcoidosis UK Biobank cohort with the significant-variant strategy, whereas VEGFB and ANXA11 showed associations with the same-variant strategy. ANGPTL4, however, was not replicated in external validation (Fig. 3).
The figures show how the OR and 95% CI for specific proteins differ between the primary and replication analyses. To quantify these differences, we tested the heterogeneity of the MR estimates for each protein separately and found high heterogeneity (I² > 75%) for almost all of them (Additional file 2: Table S1). Because the primary analysis had a larger sample size and provided more credible evidence, we relied on it for the effect estimates of the proteins on disease risk.
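Heterogeneity between the primary and replication estimates for a protein can be summarized with Cochran's Q and I² = max(0, (Q − df)/Q), where df is the number of estimates minus one. The sketch below applies these standard inverse-variance formulas to log ORs. The function name and the example estimates are hypothetical, and the exact procedure behind Additional file 2: Table S1 may differ.

```python
def cochran_q_i2(log_ors: list[float], ses: list[float]) -> tuple[float, float]:
    """Cochran's Q and I^2 (as a fraction between 0 and 1) for k independent estimates."""
    weights = [1.0 / se**2 for se in ses]
    # Fixed-effect (inverse-variance) pooled log OR.
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    # Weighted squared deviations of each estimate from the pooled value.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_ors))
    df = len(log_ors) - 1
    # I^2: share of total variation attributed to between-estimate heterogeneity.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2


if __name__ == "__main__":
    # Hypothetical primary and replication log ORs for one protein, with made-up SEs.
    q, i2 = cochran_q_i2([0.25, 0.85], [0.04, 0.20])
    print(f"Q = {q:.2f}, I2 = {100 * i2:.0f}%")
```

With only two estimates (df = 1), even a moderate Q produces a large I². This is one reason why the high I² values above mainly signal that the replication cohorts disagree in magnitude with the primary analysis.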
Sensitivity analysis for causal proteins
First, in the bidirectional MR analysis, we observed significant genetically predicted reverse causal effects of ILD and IPF on BRSK2 (ILD: β_IVW = 0.145, P = 2.28 × 10⁻⁴; IPF: β_IVW = 0.083, P = 2.56 × 10⁻³). Q tests and pleiotropy tests further revealed evidence of heterogeneity and pleiotropy (IPF: P_Q = 1.15 × 10⁻²⁷, P_pleiotropy = 0.01; ILD: P_Q = 1.15 × 10⁻¹⁸, P_pleiotropy = 0.03). Leave-one-out analysis indicated that rs35705950 drove this effect. After this instrument was removed, neither a reverse causal association nor heterogeneity was observed. Besides BRSK2, sarcoidosis also showed a reverse causal association with ANGPTL4 (β_IVW = −0.03, P = 0.016). The absence of heterogeneity and pleiotropy (P_Q = 0.12, P_pleiotropy = 0.86) reduced the likelihood of a false-positive result and suggested that this reverse effect was not driven by an individual SNP. ILD and its subtypes showed no causal effects on the remaining six proteins. Steiger filtering further supported the direction of the associations (Table 1, Additional file 1: Table S4 and Additional file 2: Fig. S2–3).
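The reverse-direction analysis above combines several SNP instruments with the inverse-variance weighted (IVW) estimator. It then drops one SNP at a time to check whether a single variant, here rs35705950, drives the result. The sketch below uses the standard first-order IVW formula, β_IVW = Σ β_X β_Y σ_Y⁻² / Σ β_X² σ_Y⁻², together with Cochran's Q. A fixed-effect standard error is shown for simplicity. The SNP labels and summary statistics are hypothetical.

```python
import math


def ivw(bx: list[float], by: list[float], se_y: list[float]) -> tuple[float, float, float]:
    """Fixed-effect IVW estimate, its standard error, and Cochran's Q."""
    # First-order weights of the per-SNP Wald ratios: bx^2 / se_y^2.
    w = [x * x / s**2 for x, s in zip(bx, se_y)]
    ratios = [y / x for x, y in zip(bx, by)]
    beta = sum(wi * r for wi, r in zip(w, ratios)) / sum(w)
    se = 1.0 / math.sqrt(sum(w))
    # Cochran's Q: weighted dispersion of per-SNP ratios around the IVW estimate.
    q = sum(wi * (r - beta) ** 2 for wi, r in zip(w, ratios))
    return beta, se, q


def leave_one_out(snps: list[str], bx: list[float], by: list[float], se_y: list[float]) -> dict:
    """Re-run IVW after dropping each SNP in turn; keys are the dropped SNP."""
    results = {}
    for i, snp in enumerate(snps):
        keep = [j for j in range(len(snps)) if j != i]
        results[snp] = ivw([bx[j] for j in keep], [by[j] for j in keep], [se_y[j] for j in keep])
    return results


if __name__ == "__main__":
    # Hypothetical instruments; "snp4" has an outlying ratio.
    snps = ["snp1", "snp2", "snp3", "snp4"]
    bx = [0.30, 0.25, 0.40, 0.35]
    by = [0.030, 0.025, 0.040, 0.300]
    se_y = [0.01] * 4
    print("all SNPs:", ivw(bx, by, se_y))
    for dropped, (b, s, q) in leave_one_out(snps, bx, by, se_y).items():
        print(f"without {dropped}: beta={b:.3f}, Q={q:.1f}")
```

If both the estimate and Q collapse when a single SNP is left out, as in this toy example, that SNP alone carries the signal. This is the pattern the text reports for rs35705950.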
Second, Bayesian colocalization of pQTLs strongly indicated that ADAM15 (coloc.abf PPH4 = 0.997) and CDH15 (coloc.abf PPH4 = 0.863) share causal variants with ILD, and that ADAM15 also shares a causal variant with IPF (coloc.abf PPH4 = 0.966). In addition, under standard priors and a ±1 Mb window, colocalization analysis robustly supported (PPH4 > 80%) three of the four proteins preliminarily identified for sarcoidosis: VEGFB (coloc.abf PPH4 = 0.907), LTBR (coloc.abf PPH4 = 0.947), and ANGPTL4 (coloc.abf PPH4 = 0.957). By contrast, BRSK2, LRRC37A2, and ANXA11 did not share causal variants with ILD or its subtypes. This suggests that their associations were likely driven by distinct SNPs within their respective genomic regions (Table 1, Fig. 4 and Additional file 2: Fig. S4), and it highlights the possibility of confounding by linkage disequilibrium (LD) even when cis-pQTLs are used.
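The PPH4 values above come from coloc.abf-style Bayesian colocalization. This method converts each SNP's association statistics into Wakefield approximate Bayes factors and weighs five hypotheses (H0 to H4) about causal variants in a region. H3 means distinct causal variants for the two traits; H4 means a single shared causal variant. The sketch below re-implements that calculation in plain Python under stated assumptions:

- coloc's default priors, p1 = p2 = 1 × 10⁻⁴ and p12 = 1 × 10⁻⁵;
- prior effect standard deviations of 0.15 for a standardized protein and 0.2 for a log-OR disease trait;
- at most one causal variant per trait.

It is not the coloc R package itself, and the example z-scores are hypothetical.

```python
import math


def log_abf(beta: float, se: float, prior_sd: float) -> float:
    """Wakefield log approximate Bayes factor for one SNP."""
    v = se**2
    w = prior_sd**2
    r = w / (v + w)
    z = beta / se
    return 0.5 * (math.log(1 - r) + r * z * z)


def logsumexp(values: list[float]) -> float:
    """Numerically stable log(sum(exp(values))); -inf for an empty list."""
    if not values:
        return float("-inf")
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def coloc_posteriors(l1, l2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities [PPH0..PPH4] from per-SNP log ABFs of two traits."""
    n = len(l1)
    lh = [
        0.0,                               # H0: neither trait associated
        math.log(p1) + logsumexp(l1),      # H1: trait 1 only
        math.log(p2) + logsumexp(l2),      # H2: trait 2 only
        # H3: both traits, different causal SNPs (all ordered pairs i != j)
        math.log(p1) + math.log(p2)
        + logsumexp([l1[i] + l2[j] for i in range(n) for j in range(n) if i != j]),
        # H4: both traits, one shared causal SNP
        math.log(p12) + logsumexp([a + b for a, b in zip(l1, l2)]),
    ]
    denom = logsumexp(lh)
    return [math.exp(x - denom) for x in lh]


if __name__ == "__main__":
    se = 0.02
    # Hypothetical z-scores at five SNPs; both traits peak at the second SNP.
    z_protein = [0.5, 10.0, 1.0, 0.0, -0.3]
    z_disease = [0.2, 8.0, 0.8, 0.1, 0.0]
    l1 = [log_abf(z * se, se, 0.15) for z in z_protein]
    l2 = [log_abf(z * se, se, 0.20) for z in z_disease]
    print("PPH0-PPH4:", [round(p, 3) for p in coloc_posteriors(l1, l2)])
```

The H3 term is summed directly over all pairs of distinct SNPs. This avoids subtracting two nearly equal log-sums, which can underflow when one variant dominates both traits.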
Finally, to mitigate bias from horizontal pleiotropy, we queried PhenoScanner and LDlink. The instruments for VEGFB (rs660442) and ANGPTL4 (rs2278236) were significantly associated (P < 5 × 10⁻⁸) with suspected risk factors for sarcoidosis. Specifically, rs660442 was associated with rheumatoid arthritis (RA), and previous observational clinical studies have reported that a considerable proportion of sarcoidosis cases are accompanied by RA (33). However, Mendelian randomization studies have not confirmed this association. Because both conditions may involve immune-related mechanisms, we strongly suspect that RA is a risk factor for sarcoidosis and may confound the observed association between VEGFB and sarcoidosis. In addition, rs2278236 was associated with high-density lipoprotein cholesterol levels, and previous epidemiological and genetic studies have reported substantial genetic overlap between lipid metabolism and immune-related diseases (34). Therefore, we cannot exclude false-positive results arising from these two confounders. Apart from these findings, none of the other instruments (or their proxies), such as ADAM15 (rs11589479), was associated with known risk factors for ILD or its subtypes (Table 1b).
Based on the evidence above, we classified the preliminarily identified proteins into priority and sub-priority groups. Three proteins (ADAM15, CDH15, and LTBR) passed all tests, making them priority candidates for further investigation as plasma protein markers and potential drug targets. Two groups were classified as sub-priority: proteins that passed colocalization but failed other sensitivity analyses (ANGPTL4 and VEGFB), and proteins that failed colocalization (LRRC37A2, BRSK2, and ANXA11).
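The prioritization rule above can be written as a small decision function. This is an illustrative encoding of the text, not the authors' code, and the class and field names are hypothetical. The pass/fail flags are transcribed from the paragraph above rather than from the underlying tables.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProteinEvidence:
    name: str
    coloc_passed: bool                  # colocalization support (PPH4 > 0.8)
    other_tests_passed: Optional[bool]  # remaining sensitivity analyses; None = not needed


def priority_tier(e: ProteinEvidence) -> str:
    """Return 'priority' only when colocalization and every other test passed."""
    if e.coloc_passed and e.other_tests_passed is True:
        return "priority"
    return "sub-priority"


# Flags transcribed from the text; None marks proteins already excluded by colocalization.
EVIDENCE = [
    ProteinEvidence("ADAM15", True, True),
    ProteinEvidence("CDH15", True, True),
    ProteinEvidence("LTBR", True, True),
    ProteinEvidence("ANGPTL4", True, False),
    ProteinEvidence("VEGFB", True, False),
    ProteinEvidence("LRRC37A2", False, None),
    ProteinEvidence("BRSK2", False, None),
    ProteinEvidence("ANXA11", False, None),
]

if __name__ == "__main__":
    for e in EVIDENCE:
        print(f"{e.name}: {priority_tier(e)}")
```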
PPI and druggability evaluation of potential therapeutic targets
The PPI network revealed potential interaction partners of the priority proteins (Additional file 1: Table S5). Specifically, we used STRING (version 12.0), a tool that scores protein–protein associations based on text mining, co-expression, experimental evidence, and other sources. We found that CDH1, associated with ADAM15, and CDH2, associated with CDH15, have been closely linked to pulmonary fibrosis in previous studies.
We then tested for a causal association between ADAM15 and CDH15 by single-sample MR analysis using the GWAS data for these proteins from the UKB-PPP dataset. ADAM15 showed a causal effect on CDH15 (IVW: P = 0.043), but the reverse effect was not significant. The MR-Egger intercept did not deviate significantly from zero (P_pleiotropy = 0.444) (Fig. 5), indicating no evidence of horizontal pleiotropy. ADAM15 also showed bidirectional causal associations with CDH1 (IVW: P < 0.05). These findings suggest causal relationships among these proteins at the plasma level; the unidirectional effect of ADAM15 on CDH15 is of particular interest. Because heterogeneity was present (P_Q < 0.05), we used the random-effects IVW model to limit its influence on the results (Additional file 1: Table S6 and Additional file 2: Fig. S5–6).
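The pleiotropy and heterogeneity checks in this paragraph rest on two standard summary-data estimators:

- MR-Egger, a weighted regression of SNP-outcome on SNP-exposure effects whose intercept captures directional pleiotropy;
- random-effects IVW, which inflates the IVW standard error when Cochran's Q exceeds its degrees of freedom.

The sketch below implements the common multiplicative random-effects IVW variant and a weighted least-squares MR-Egger fit with weights 1/SE_Y², flooring the residual variance at 1. It assumes SNPs are oriented so that the SNP-exposure effects are positive, and its inputs are hypothetical. The text does not state which random-effects variant was used, so treat this as one plausible choice. A P value for the intercept would additionally need a t distribution with k − 2 degrees of freedom, which is omitted here.

```python
import math


def ivw_random_effects(bx, by, se_y):
    """IVW estimate with a multiplicative random-effects SE, plus Cochran's Q."""
    w = [1.0 / s**2 for s in se_y]
    sxx = sum(wi * x * x for wi, x in zip(w, bx))
    beta = sum(wi * x * y for wi, x, y in zip(w, bx, by)) / sxx
    # Cochran's Q from residuals of the no-intercept weighted regression.
    q = sum(wi * (y - beta * x) ** 2 for wi, x, y in zip(w, bx, by))
    phi = max(1.0, q / (len(bx) - 1))  # over-dispersion factor, floored at 1
    return beta, math.sqrt(phi / sxx), q


def mr_egger(bx, by, se_y):
    """Weighted least-squares fit of by = b0 + b1 * bx; returns (b0, b1, se_b0)."""
    # Orient instruments so that every SNP-exposure effect is positive.
    sign = [1.0 if x >= 0 else -1.0 for x in bx]
    x = [s * v for s, v in zip(sign, bx)]
    y = [s * v for s, v in zip(sign, by)]
    w = [1.0 / s**2 for s in se_y]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    # Residual variance (k - 2 df), floored at 1 as in common MR-Egger implementations.
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    phi = max(1.0, sum(wi * r * r for wi, r in zip(w, resid)) / (len(x) - 2))
    se_intercept = math.sqrt(phi * (1.0 / sw + xbar**2 / sxx))
    return intercept, slope, se_intercept


if __name__ == "__main__":
    # Hypothetical SNP-exposure and SNP-outcome effects for five instruments.
    bx = [0.10, 0.20, 0.30, 0.40, 0.50]
    by = [0.03, 0.04, 0.07, 0.09, 0.10]
    se_y = [0.01, 0.02, 0.015, 0.01, 0.02]
    print("IVW (random effects):", ivw_random_effects(bx, by, se_y))
    print("MR-Egger (b0, b1, se_b0):", mr_egger(bx, by, se_y))
```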
We also identified high-confidence interactions (minimum required interaction score > 0.7) of ADAM15 with Src tyrosine kinase (SRC) and HCK, and of CDH15 with CTNNB1. Some of these proteins have previously been identified as targets of current ILD and IPF treatments. Specifically, SRC is one of the targets of nintedanib (35), a drug recommended in current IPF treatment guidelines. Furthermore, the mechanism by which increased CDH1 expression delays the progression of pulmonary fibrosis may involve miR-494-3p targeting YTHDF2, which delays epithelial-mesenchymal transition (EMT) (36). Studies have also reported that the anti-fibrotic function of CDH15 in combination with the NRF2 activator sulforaphane (SFN) may be mediated by inhibition of CDH2 expression (37). Together, these findings support a potential mechanistic link between ADAM15, CDH15, and ILD, as well as the feasibility of targeting them in drug development. For LTBR, the strongest association was with LTB.