MR design
The study design is depicted in Fig. 1. Using single nucleotide polymorphisms (SNPs) as instrumental variables (IVs), we analyzed a dataset from the Genome-Wide Association Study (GWAS) comprising 731 immune cells (7 groups) [19]. Later, a separate GWAS confirmed the correlation between these SNPs and MAFLD [20]. Subsequently, we corroborated these initial findings by employing summary-level data from an independent dataset, focusing on the noteworthy immune traits. Finally, we assessed the interactions between immune traits and risk factors for MAFLD, encompassing coronary artery disease (CAD), type 2 diabetes (T2D), body mass index (BMI), blood lipids, and systolic blood pressure. Additionally, we investigated the degree to which these risk factors mediate the impact of immune traits on MAFLD.
In the field of causal inference, the utilization of instrumental variables to represent risk factors is contingent upon the introduction of genetic variation. The validity of this approach depends on three essential assumptions: (i) the existence of a direct link between genetic variation and the exposure factor; (ii) the genetic variation being independent of both the exposure factor and potential confounding variables related to the outcomes; and (iii) the impact of genetic variation on the final outcome being solely mediated by the action of the exposure factor [21] (Supplementary Figure S1).
Moreover, the current investigation followed the STROBE-MR recommendations [22] (Supplementary Table S1). During the initial study, ethical approval was obtained and written informed consent was obtained for all publicly accessible datasets used in this investigation.
Data sources for MAFLD in GWAS
Two independent datasets were retrieved from recently conducted GWASs of MAFLD [20, 23]. The data for discovery analysis were acquired from an extensive GWAS performed by Ghodsian et al [20]. The study included four European groups, specifically the Finn Gen, the eMERGE network, the UK Biobank, and the Estonian Biobank. These cohorts consisted of 8,434 individuals with MAFLD and 770,180 individuals without MAFLD. After accounting for sex, age, ten principal components derived from ancestry, and genotyping [20], the associations were evaluated in this meta-analysis of GWASs. Using the electronic health records, individuals of European ancestry with MAFLD, which includes hepatic steatosis, non-alcoholic steatohepatitis, or liver fibrosis, were identified. In addition, electronic health record codes (ICD9 571.5, ICD9 571.8, ICD9 571.9, ICD10 K75.81, ICD10 K76.0, and ICD10 K76.9) were used to establish the diagnosis of MAFLD. Furthermore, replication analysis involved the acquisition of data from a comprehensive GWAS of MAFLD in individuals of European descent, comprising 1483 cases and 17,781 controls [23]. MAFLD was diagnosed using liver biopsy in this study, which is considered the gold standard method for diagnosing MAFLD (Supplementary Table S2).
Immunity-wide GWAS data sources
This study combined information from a recent GWAS on immune characteristics that included a population of 3757 individuals of European origin [19]. In this specific GWAS, the influence of approximately 22 million genetic variations on 731 attributes associated with immune cells was examined in a group of 3757 individuals from Sardinia. The examination considered variables that could influence the results, including factors such as age, sex, and age2. The GWAS included a wide variety of 731 immunophenotypes and different measurements such as total cell numbers (n = 118), median fluorescence intensity (MFI) representing surface antigen levels (n = 389), morphological parameters (MP) (n = 32), and proportions of cell counts (n = 192). MFI, AC, and RC exhibit different types of immune cells, including B cells, CDC cells, T-cell maturation stage cells, monocytes, myeloid cells, TBNK cells (consisting of T cells, B cells, and natural killer cells), and Tregs. Conversely, CDC and TBNK were included among the MP features.
Risk factors GWAS data sources
The respective GWASs were utilized to extract summary statistics of various risk factors, such as coronary artery disease (CAD) [24], T2D [25], BMI [26], blood lipids (high-density lipoprotein cholesterol [HDL-C], low-density lipoprotein cholesterol [LDL-C], triglycerides, apolipoprotein A-1 [ApoA1], apolipoprotein B [ApoB]) [27], and systolic blood pressure (SBP) [28] (Supplementary Table S1).
Choosing instrumental variables (IVs)
According to previous studies [29, 30], a small number of SNPs were shown to be associated with immune cells at a level of significance below P < 5 × 10− 8. To guarantee a sufficient number of SNPs for future bidirectional Mendelian analyses, the significance threshold for each immune trait was decreased to below 1 × 10− 5. Furthermore, we conducted a screening process to eliminate SNPs located on chromosome 23, as well as SNPs harboring more than two alleles to mitigate any potential confounding influences on our MR analysis outcomes. Afterward, SNPs with minor allele frequencies (MAF) less than 0.01 were excluded. Following this, samples from the 1000 Genomes European Project were utilized as a point of reference to evaluate the linkage disequilibrium (LD) among independent variables (IVs), in accordance with the stipulations of r2 < 0.001 and a window size surpassing 10,000 kb. Furthermore, we utilized PhenoScanner [31] to identify SNPs that exhibited a significant association with confounding factors, as a means to initially mitigate the influence of horizontal pleiotropy
Moreover, it is crucial that the instrumental variables demonstrate a robust correlation with the exposure factors to meet the initial hypothesis. Otherwise, the inclusion of inadequate instrumental variables could introduce bias to the findings. Furthermore, to confirm the initial assumption, it is crucial for the instrumental variables to demonstrate a robust correlation with the exposure factors; otherwise, the inclusion of inadequate instrumental variables could lead to biased outcomes. For evaluation purposes, we utilized the R2-statistic(R2 = 2×EAF×(1-EAF)×beta^2/(2×EAF×(1-EAF)×beta^2) + 2×EAF×(1-EAF)×se×N×beta^2)) [32, 33] and F-statistic (F=(N-K-1) ×R2/(1-R2) ×K) to determine the level of association between each instrumental variable and the exposure factors. Instrumental variables with F values less than 10 were considered weak, and only those with F values exceeding 10 were kept [34].
Statistical analysis
In our study, we employed the Wald ratio for MR analysis to investigate immune traits associated with a single instrumental variable (IV) in both the discovery and replication stages.
Conversely, for immune traits comprising multiple IVs, we employed the inverse-variance weighted (IVW) approach as the primary analytical method to investigate the association. The IVW method is widely utilized to obtain variant-specific causal estimates, enabling the amalgamation of effect values from multiple IVs into a single estimate, thereby facilitating a more precise analysis of the causal relationships between variables[35]. Furthermore, we incorporated additional analytical approaches such as the weighted median method, MR-Egger[36], simple median method, and MR-PRESSO as supplementary techniques. To improve the accuracy of the causal estimates, a fixed-effect meta-analysis was performed using the meta R package[37] to combine the estimates from both the discovery and replication stages.
The MR-PRESSO global test was utilized in our study to identify horizontal pleiotropy. This test involves conducting a weighted regression analysis of all genetic variants and subsequently calculating the residual sum of squares (RSS). Each IV was systematically removed, and the corresponding RSS value was computed. A significant decrease in the RSS compared to the previous iteration, accompanied by statistical significance (p < 0.05), would indicate the presence of horizontal pleiotropy for the respective SNP. The MR-PRESSO outlier test was employed to identify outlier SNPs, and subsequent recalculations of estimates were conducted after excluding these outliers, thereby mitigating potential pleiotropic effects on our MR analysis. [38].
A series of sensitivity analyses was employed to assess the robustness of the findings. Heterogeneity was measured using Cochran's Q statistic, which is considered to indicate heterogeneity if the p value is less than 0.05. Additionally, the I2 statistic was used to quantify the extent of heterogeneity, calculated as I2 = (Q − Q_df )/Q. If I2 exceeded 25%, heterogeneity was inferred to exist [35]. The results obtained from the analysis, conducted using the random effects model of the IVW method, may be deemed more dependable if a substantial level of heterogeneity is observed among SNPs.
Two-step MR analysis was also conducted to investigate the mediating role of risk factors in the relationship between immune traits and MAFLD [39]. To estimate the proportion of the total effect mediated by risk factors, we divided the indirect effect (β1 × β2/β3) by the total effect using β1, β2, and β3 to represent the respective effects of immune traits on risk factors, risk factors on MAFLD, and immune traits on MAFLD. We calculated standard errors using bootstrapping and effect estimates using MR analysis on two samples. We used R version 4.2.1 (R Foundation for Statistical Computing, Vienna, Austria ).