Study design overview
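The design of the MR study is shown in Figure 1. In brief, we first estimated the causal effects of allergic disease (AD) on CVD and then evaluated the causal effects of CVD on AD. Genetic variants used as instrumental variables (IVs) must strictly satisfy the three core MR assumptions[12]: they are associated with the exposure, they are independent of confounders, and they affect the outcome only through the exposure. We used summary statistics from recent genome-wide association studies (GWAS) of AD and CVD.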
Data sources and SNP selection of genetic instruments for AD
AD comprises asthma, hay fever, and eczema, which often coexist in the same individuals and share many genetic variants that dysregulate the expression of immune-related genes. The summary statistics for AD came from the GWAS conducted by Ferreira MA et al.[13] and were obtained from the GWAS catalog (https://www.ebi.ac.uk/gwas/downloads/summary-statistics) or the IEU OpenGWAS project (https://gwas.mrcieu.ac.uk/). That study included 360,838 European individuals (180,129 cases vs 180,709 controls). Because many observational studies focused on asthma[5-7] rather than on all three AD, and because the three conditions only partly coexist in the same individuals, we also estimated the causal effects of asthma on CVD. The summary statistics for asthma came from the GWAS conducted by Valette K et al., likewise obtained from the GWAS catalog or the IEU OpenGWAS project, which included 408,442 individuals of European ancestry (56,167 cases vs 352,255 controls)[14]. Detailed information on study procedures and diagnostic criteria is given in the original publications. The sample characteristics of the study populations are described in Additional file 1 Table S1.
We took the genetic variants that the GWAS reported as associated with AD and asthma, respectively, at genome-wide significance (P<5×10⁻⁸). To ensure independence, the SNPs were clumped by linkage disequilibrium (LD): SNPs with r²>0.001 within a physical distance of 10,000 kb were considered to be in LD, and among these only the SNP with the lowest P value was retained. We then searched all remaining SNPs (74 for AD and 80 for asthma) in the Phenoscanner database (http://www.phenoscanner.medschl.cam.ac.uk/) to identify whether any of them were associated with confounding factors or directly affected the outcome[15] (P<5×10⁻⁸). Ten SNPs for AD were associated with confounders (rs56375023 and rs7406234 with coronary artery disease; rs11168245 with hypertension; rs6011033 with diastolic blood pressure; rs1419675 with type 1 diabetes; rs301802 and rs5758343 with body fat percentage; rs2854001 and rs1689510 with body mass index; and rs7224129 with HDL cholesterol). Seven SNPs for asthma were associated with confounders (rs10477741 and rs56375023 with coronary artery disease; rs112267124 and rs2412099 with self-reported hypertension; rs11042902 with blood pressure; rs4480384 with body fat percentage; and rs1689510 with body mass index). To avoid possible pleiotropic effects, we removed these 10 and 7 SNPs from the AD and asthma instrument sets, respectively, and used the remaining SNPs as the IVs in the bidirectional MR analysis (see Additional file 1 Tables S2 and S3). After removing the SNPs for which no proxy SNP was available in the outcome GWAS, we harmonized the exposure and outcome data before estimating the association between AD or asthma and CVD.
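The selection steps described above (significance filter, LD clumping, and removal of confounder-associated SNPs) can be illustrated with a minimal Python sketch. This is not the pipeline used in the study: the SNP identifiers, positions, P values, and the pairwise r² lookup are hypothetical toy values, and the function names `clump` and `select_instruments` are introduced here for illustration only.

```python
# Toy illustration of instrument selection; all rsIDs and numbers are hypothetical.
P_THRESHOLD = 5e-8      # genome-wide significance threshold
R2_THRESHOLD = 0.001    # SNP pairs with r2 above this are treated as being in LD
WINDOW_KB = 10_000      # clumping window in kb


def clump(snps, r2, r2_threshold=R2_THRESHOLD, window_kb=WINDOW_KB):
    """Greedy clumping: visit SNPs from lowest to highest P value and drop any
    SNP that is in LD with an already retained SNP inside the window."""
    kept = []
    for snp in sorted(snps, key=lambda s: s["p"]):
        in_ld = any(
            snp["chrom"] == k["chrom"]
            and abs(snp["pos"] - k["pos"]) <= window_kb * 1000
            and r2.get(frozenset((snp["rsid"], k["rsid"])), 0.0) > r2_threshold
            for k in kept
        )
        if not in_ld:
            kept.append(snp)
    return kept


def select_instruments(snps, r2, confounder_rsids):
    """Significance filter, then clumping, then removal of flagged SNPs."""
    significant = [s for s in snps if s["p"] < P_THRESHOLD]
    independent = clump(significant, r2)
    return [s for s in independent if s["rsid"] not in confounder_rsids]


snps = [
    {"rsid": "rs_a", "chrom": 1, "pos": 1_000_000, "p": 1e-12},
    {"rsid": "rs_b", "chrom": 1, "pos": 1_500_000, "p": 3e-9},   # in LD with rs_a
    {"rsid": "rs_c", "chrom": 2, "pos": 2_000_000, "p": 4e-10},  # flagged as confounder-associated
    {"rsid": "rs_d", "chrom": 3, "pos": 5_000_000, "p": 2e-6},   # not genome-wide significant
]
r2 = {frozenset(("rs_a", "rs_b")): 0.4}  # pairs not listed are treated as r2 = 0

instruments = select_instruments(snps, r2, confounder_rsids={"rs_c"})
print([s["rsid"] for s in instruments])  # ['rs_a']
```

In practice the pairwise r² values come from a reference panel rather than a hand-written lookup, and the confounder flags come from the database search.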
Data sources and SNP selection of genetic instruments for CVD
Summary statistics on CVD, including heart arrhythmia, atrial fibrillation, supraventricular tachycardia, atherosclerosis, aortic aneurysm and dissection, stroke, peripheral vascular disease, cardiomyopathy, heart valve problem, heart failure, cardiac arrest, and essential hypertension, were obtained from the second round of GWAS results from the UK Biobank (UKB) (http://www.nealelab.is/uk-biobank). The UKB is a prospective cohort study that collected genetic and other information on more than 500,000 participants living in the UK[16]. In the UKB, these diseases were defined by self-reported status or ICD-10 codes. Additional file 1 Table S1 summarizes the detailed information on these traits, including sample size, number of SNPs, phenotype code, and others.
We obtained the variants associated with each CVD at a significance threshold of P<5×10⁻⁶ and clumped them for independence (r²<0.001 within a 10,000 kb window). Similarly, we checked all SNPs in the Phenoscanner database, and none was associated with confounding factors or directly affected the outcome. Details of the SNPs used as IVs are listed in Additional file 1 Tables S4-S15. SNPs not available in the outcome GWAS were then removed before estimating the causal effects of the 12 types of CVD on AD or asthma, respectively.
MR analysis
We conducted the bidirectional MR analysis using the TwoSampleMR package[17] in R (version 4.2.2). The inverse variance weighted (IVW) method, the most efficient and widely used method, served as our primary analysis. The Wald ratio (beta outcome/beta exposure) was used to calculate the effect of each SNP, and IVW combined these per-SNP effects by meta-analysis into the overall beta estimate. Since the outcome was binary[18], we converted the estimate into an odds ratio (OR). To detect possible violations of the IV assumptions caused by directional horizontal pleiotropy, namely the third assumption, we applied MR-Egger, which can test the null causal hypothesis and served as the basic method for assessing horizontal pleiotropy in our analysis[19]. Because the IVW method relies on the assumption that all SNPs are valid IVs, we complemented it with the weighted median (WM) model, which provides a consistent estimate even when up to half of the SNPs are pleiotropic[18].
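To make the Wald ratio and IVW steps concrete, the following Python sketch computes per-SNP ratios, combines them with a fixed-effect inverse-variance meta-analysis, and converts the pooled beta into an OR. It is an illustration under stated assumptions, not the TwoSampleMR implementation: the input values are hypothetical, the function names are introduced here, and the standard error of each ratio uses a first-order approximation (outcome SE divided by the absolute exposure beta).

```python
import math


def wald_ratios(beta_exp, beta_out, se_out):
    """Per-SNP Wald ratio (beta outcome / beta exposure) and its
    first-order standard error (SE outcome / |beta exposure|)."""
    ratios = [bo / be for be, bo in zip(beta_exp, beta_out)]
    ses = [so / abs(be) for be, so in zip(beta_exp, se_out)]
    return ratios, ses


def ivw(beta_exp, beta_out, se_out):
    """Fixed-effect inverse-variance weighted average of the Wald ratios."""
    ratios, ses = wald_ratios(beta_exp, beta_out, se_out)
    weights = [1.0 / s ** 2 for s in ses]
    estimate = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return estimate, se


# Hypothetical SNP-exposure and SNP-outcome associations (log-odds scale).
beta_exp = [0.10, 0.20, 0.15]
beta_out = [0.02, 0.04, 0.03]
se_out = [0.01, 0.01, 0.01]

beta_ivw, se_ivw = ivw(beta_exp, beta_out, se_out)
odds_ratio = math.exp(beta_ivw)
ci_low = math.exp(beta_ivw - 1.96 * se_ivw)
ci_high = math.exp(beta_ivw + 1.96 * se_ivw)
print(f"beta={beta_ivw:.3f}, OR={odds_ratio:.3f}, 95% CI=({ci_low:.3f}, {ci_high:.3f})")
```

In this toy input every SNP has the same ratio (0.2), so the pooled beta equals 0.2 and the OR equals exp(0.2). Software packages may additionally apply random-effects adjustments that this sketch omits.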
Cochran's Q and I² statistics were used to quantify heterogeneity for the IVW and MR-Egger methods[20]. We used the MR pleiotropy residual sum and outlier (MR-PRESSO) test[21], implemented in the package “MR-PRESSO”, to detect outlier SNPs (Nb Distribution = 10000, Significant Threshold = 0.05); the results are shown in Additional file 1 Table S16. In addition, a “leave-one-out” analysis was performed to check whether any single SNP was disproportionately responsible for the result. Finally, F statistics were used to evaluate the strength of the SNPs with respect to the first assumption[17]. We calculated the F statistic with the formula:
$$F=\frac{N-k-1}{k}\times \frac{{R}^{2}}{1-{R}^{2}}$$
(N = sample size of the exposure GWAS, k = the number of selected SNPs, and R² = the proportion of phenotypic variance explained by the SNPs.) When R² was not available, we used the formula:
$${R}^{2}=2\times MAF\times (1-MAF)\times {(\beta /SD)}^{2}$$
(β = the effect estimate of the genetic variant on the exposure, MAF = the effect allele frequency of the selected SNP, \(SD=SE\times \sqrt{N}\), SE = the standard error of β, N = sample size of the exposure GWAS). All instruments had F statistics above the standard cutoff (>10), indicating sufficiently strong instruments[22]. The R² and F values are shown in Additional file 1 Table S16.
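As a worked illustration of the instrument-strength calculation, the Python sketch below evaluates R² from summary statistics and then the F statistic. Two points are assumptions of this sketch rather than statements from the study: the F statistic is written in its conventional form with k in the denominator, F = (N − k − 1)/k × R²/(1 − R²), and all numerical inputs are hypothetical.

```python
import math


def r2_from_summary(beta, se, maf, n):
    """R2 = 2 * MAF * (1 - MAF) * (beta / SD)^2, with SD = SE * sqrt(N)."""
    sd = se * math.sqrt(n)
    return 2.0 * maf * (1.0 - maf) * (beta / sd) ** 2


def f_statistic(r2, n, k):
    """Conventional form: F = (N - k - 1) / k * R2 / (1 - R2)."""
    return (n - k - 1) / k * r2 / (1.0 - r2)


# Hypothetical single-SNP example.
r2_example = r2_from_summary(beta=0.1, se=0.01, maf=0.5, n=10_000)
print(round(r2_example, 6))  # 0.005

# Hypothetical instrument explaining 1% of the variance in 10,001 samples.
f_example = f_statistic(r2=0.01, n=10_001, k=1)
print(round(f_example, 3))  # 101.0
print(f_example > 10)       # True: above the conventional weak-instrument cutoff
```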
To account for multiple testing, a P value below 0.002 (0.05/24) after Bonferroni correction was considered robustly significant. A P value between 0.002 and 0.05 was considered suggestively significant, and a P value above 0.05 was considered not significant. All statistical tests were two-sided.
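The three-level significance rule can be written as a small Python helper. This is a sketch only: the function name and labels are illustrative, the handling of values exactly on a boundary is an assumption, and the code uses the unrounded threshold 0.05/24 (about 0.00208) rather than the rounded 0.002.

```python
def classify_p(p, n_tests=24, alpha=0.05):
    """Classify a two-sided P value against the Bonferroni-corrected threshold."""
    bonferroni = alpha / n_tests  # 0.05 / 24, about 0.00208
    if p < bonferroni:
        return "robust"
    if p <= alpha:  # boundary handling is an assumption of this sketch
        return "suggestive"
    return "not significant"


for p in (0.001, 0.01, 0.2):
    print(p, classify_p(p))
```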