2.1.Exposure sources of GM taxa
The exposure data for GM taxa were obtained from a large-scale GWAS conducted by the MiBioGen consortium, comprising 18,340 samples from 24 cohorts in 11 countries[26, 27]. The GWAS covered 211 GM taxa and identified genetic variants associated with nine phyla, 16 classes, 20 orders, 35 families, and 131 genera. The GM data are available for download at https://mibiogen.gcc.rug.nl/[28-30].
2.2.Outcome sources of hematological malignancies
GWAS summary statistics for the outcomes were extracted from the FinnGen_R8 GWAS results and the IEU Open GWAS database, a publicly available repository of summary statistics. The FinnGen project is a joint public-private research effort that combines legacy samples from Finnish biobanks and digital health records from Finnish health registries with imputed genotype data. As of August 2020, 412,737 samples had been collected and 224,737 samples had been analyzed[26]. GWAS summary datasets for ML (582 cases and 271,463 controls), LL (1,299 cases and 271,463 controls), FL (955 cases and 271,463 controls), HL (690 cases and 271,463 controls), MM and related plasma cell neoplasms (1,085 cases and 271,463 controls), and MPN (1,660 cases and 340,638 controls) were obtained from the FinnGen_R8 database (https://r8.finngen.fi/), while the dataset for DLBCL (209 cases and 218,583 controls) was obtained from the IEU Open GWAS database (https://gwas.mrcieu.ac.uk/). All participants were of European ancestry.
2.3.Selection of the Genetic Instruments
In this study, the 211 GM taxa were categorized into five taxonomic levels: phylum, class, order, family, and genus. After excluding 15 unknown GM taxa, a total of 196 GM taxa were retained for further analysis. A series of quality control steps was used to select valid genetic instrumental variables (IVs) that satisfied the three key MR assumptions[31]. Firstly, we selected single nucleotide polymorphisms (SNPs) as IVs using a more lenient threshold (P < 1×10−5), because very few SNPs reached genome-wide significance (P < 5×10−8). Secondly, to ensure the independence of each IV, we performed linkage disequilibrium (LD) clumping (R² < 0.001, clumping distance = 10 Mb) and removed any SNPs that failed to meet these criteria. Thirdly, SNPs with a minor allele frequency (MAF) of less than 0.01 were excluded. Fourthly, we harmonized the exposure and outcome datasets to remove palindromic SNPs and SNPs absent from the outcome data. Finally, we calculated F-statistics to guard against bias arising from weak instruments[32]. Instruments with an F-statistic below 10 were considered weak and were excluded from the MR analyses[33].
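The MAF and F-statistic filters above can be illustrated with a minimal sketch. The text does not state which formula was used for the F-statistic; the code assumes the common per-SNP approximation F = (beta / SE)², and the `Snp` record, rsIDs, and numeric values are hypothetical. LD clumping and harmonization are not shown because they require an external reference panel and the outcome data.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    rsid: str    # hypothetical identifier, for illustration only
    beta: float  # SNP-exposure effect estimate
    se: float    # standard error of beta
    maf: float   # minor allele frequency


def f_statistic(beta: float, se: float) -> float:
    """Per-SNP F-statistic, ASSUMED here to be approximated by (beta / se)^2."""
    return (beta / se) ** 2


def select_instruments(snps, maf_min=0.01, f_min=10.0):
    """Keep SNPs with MAF >= maf_min and F-statistic >= f_min."""
    kept = []
    for snp in snps:
        if snp.maf < maf_min:
            continue  # rare variant: excluded by the MAF filter
        if f_statistic(snp.beta, snp.se) < f_min:
            continue  # weak instrument: excluded by the F filter
        kept.append(snp)
    return kept


# Made-up candidate SNPs (not taken from the study data).
candidates = [
    Snp("rs_a", beta=0.12, se=0.025, maf=0.20),
    Snp("rs_b", beta=0.05, se=0.020, maf=0.30),   # F below 10
    Snp("rs_c", beta=0.15, se=0.030, maf=0.005),  # MAF below 0.01
]
ivs = select_instruments(candidates)
print([snp.rsid for snp in ivs])
```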
2.4.Statistical analysis
The inverse-variance weighted (IVW) method was used as the primary analysis. When IVW indicated a causal relationship for a GM taxon, additional MR methods (MR-Egger, weighted median, and weighted mode) were used to supplement the IVW result[31, 34]. Specifically, the random-effects IVW method is the most widely used in MR analysis and provides robust causal estimates in the absence of directional pleiotropy[35]. The weighted median method gives consistent estimates of the causal effect if more than 50% of the weight comes from valid SNPs, although its statistical power is lower than that of the IVW method[34]. MR-Egger regression is typically used to assess directional pleiotropy; however, it can have low precision and be sensitive to outlying genetic variants[36]. All results are expressed as odds ratios (ORs) with 95% confidence intervals. Statistical power was calculated using the mRnd website (https://shiny.cnsgenomics.com/mRnd/)[37].
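As an illustration of how an IVW estimate is turned into an OR with a 95% confidence interval, the sketch below combines per-SNP Wald ratios with inverse-variance weights. It is not the "TwoSampleMR" implementation: it assumes first-order weights (bx² / se_y²) and a multiplicative random-effects model in which the standard error is inflated only when Cochran's Q exceeds its degrees of freedom. The function names and summary statistics are hypothetical.

```python
import math


def ivw(bx, by, se_y):
    """IVW estimate from SNP-exposure (bx) and SNP-outcome (by, se_y) effects.

    Returns (beta, se, q), where se is the multiplicative random-effects
    standard error and q is Cochran's Q around the IVW estimate.
    """
    weights = [(x / s) ** 2 for x, s in zip(bx, se_y)]  # first-order weights
    ratios = [y / x for x, y in zip(bx, by)]            # per-SNP Wald ratios
    total = sum(weights)
    beta = sum(w * r for w, r in zip(weights, ratios)) / total
    se_fixed = math.sqrt(1.0 / total)
    q = sum(w * (r - beta) ** 2 for w, r in zip(weights, ratios))
    df = len(bx) - 1
    # Inflate the SE for over-dispersion, but never shrink it below se_fixed.
    scale = max(1.0, math.sqrt(q / df)) if df > 0 else 1.0
    return beta, se_fixed * scale, q


def odds_ratio_ci(beta, se, z=1.96):
    """Convert a log-odds estimate into (OR, lower 95% bound, upper 95% bound)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Made-up summary statistics for four instruments.
bx = [0.10, 0.08, 0.12, 0.09]
by = [0.020, 0.012, 0.030, 0.016]
se_y = [0.010, 0.009, 0.012, 0.010]

beta, se, q = ivw(bx, by, se_y)
or_, lower, upper = odds_ratio_ci(beta, se)
print(f"OR = {or_:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

The weighted median, weighted mode, and MR-Egger estimators use the same per-SNP inputs but are omitted here for brevity.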
Bonferroni correction was applied to the primary MR results to account for multiple testing at each taxonomic level (p<0.05/n, where n is the number of GM taxa included at that level). The resulting significance thresholds for phylum, order, family, class, and genus were 5.56×10−3, 2.5×10−3, 1.56×10−3, 3.13×10−3, and 4.2×10−4, respectively. In addition, MR estimates with p<0.05 were regarded as nominally significant.
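The thresholds can be reproduced with a few lines of arithmetic. In the sketch below, the counts for phylum, class, and order are those reported for the GWAS; the counts of 32 families and 119 genera are not stated in the text and are an assumption, inferred from the reported thresholds and the total of 196 retained taxa. The `classify` helper is hypothetical.

```python
ALPHA = 0.05

# Taxa retained per level. The family and genus counts are INFERRED
# (assumption): they reproduce the reported thresholds and sum to 196.
taxa_per_level = {
    "phylum": 9,
    "class": 16,
    "order": 20,
    "family": 32,
    "genus": 119,
}

# Bonferroni threshold per level: alpha / number of taxa tested at that level.
thresholds = {level: ALPHA / n for level, n in taxa_per_level.items()}


def classify(p_value: float, level: str) -> str:
    """Label an MR p-value against the level-specific and nominal thresholds."""
    if p_value < thresholds[level]:
        return "significant"
    if p_value < ALPHA:
        return "nominal"
    return "not significant"


for level, threshold in thresholds.items():
    print(f"{level}: {threshold:.2e}")
```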
2.5.Pleiotropy and Sensitivity Analysis
Only exposure-outcome pairs whose effect estimates had the same direction across all MR methods were considered causal. To evaluate the reliability of the causal associations, we conducted further sensitivity analyses. Firstly, the intercept term of the MR-Egger regression model was used to test for horizontal pleiotropy[36]. Secondly, Cochran's Q statistic was used to evaluate heterogeneity[38]. Thirdly, the MR-PRESSO test was conducted to detect and correct for outliers in the IVW linear regression. It has three functions: (1) identify horizontal pleiotropy, (2) remove outliers to adjust for horizontal pleiotropy, and (3) determine whether the causal effects differ significantly before and after outlier removal[39]. We also performed a “leave-one-out” sensitivity analysis, re-estimating the causal effect after removing one SNP at a time, to identify whether the results were driven by any single SNP.
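Two of these checks, Cochran's Q and the leave-one-out analysis, can be sketched in a few lines. The example assumes a fixed-effect IVW estimate built from per-SNP Wald ratios and uses made-up summary statistics in which the last SNP is a deliberate outlier. The function names are hypothetical, and the MR-Egger intercept and MR-PRESSO tests are not reproduced here.

```python
def ivw_fixed(bx, by, se_y):
    """Fixed-effect IVW estimate plus the per-SNP weights and Wald ratios."""
    weights = [(x / s) ** 2 for x, s in zip(bx, se_y)]
    ratios = [y / x for x, y in zip(bx, by)]
    beta = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    return beta, weights, ratios


def cochran_q(bx, by, se_y):
    """Cochran's Q and its degrees of freedom (number of SNPs minus one)."""
    beta, weights, ratios = ivw_fixed(bx, by, se_y)
    q = sum(w * (r - beta) ** 2 for w, r in zip(weights, ratios))
    return q, len(bx) - 1


def leave_one_out(bx, by, se_y):
    """IVW estimate recomputed with each SNP removed in turn."""
    estimates = []
    for i in range(len(bx)):
        keep = [j for j in range(len(bx)) if j != i]
        beta, _, _ = ivw_fixed(
            [bx[j] for j in keep],
            [by[j] for j in keep],
            [se_y[j] for j in keep],
        )
        estimates.append(beta)
    return estimates


# Made-up data: four SNPs with a Wald ratio of 0.2 and one outlier with 0.6.
bx = [0.10, 0.08, 0.12, 0.09, 0.11]
by = [0.020, 0.016, 0.024, 0.018, 0.066]
se_y = [0.01, 0.01, 0.01, 0.01, 0.01]

full_beta, _, _ = ivw_fixed(bx, by, se_y)
q, df = cochran_q(bx, by, se_y)
loo = leave_one_out(bx, by, se_y)

# The SNP whose removal shifts the estimate most is the most influential one.
most_influential = max(range(len(loo)), key=lambda i: abs(loo[i] - full_beta))
print(f"Q = {q:.2f} on {df} df; most influential SNP index = {most_influential}")
```

A p-value for Q would be obtained from a chi-square distribution with `df` degrees of freedom; that step is left out to keep the sketch free of third-party dependencies.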
2.6.Procedures of MR analysis
The MR analysis was carried out step by step, as outlined in the flow chart in Figure 1. We first ran the MR analysis using all of the SNPs previously selected as IVs. If the MR-PRESSO analysis revealed significant horizontal pleiotropy, the outlier variants (i.e., SNPs with P < 0.05) were removed. If heterogeneity persisted after the MR-PRESSO outliers were removed (P value for Cochran's Q statistic < 0.05), we removed the SNPs with P < 1.00 and repeated the MR analysis. Furthermore, conclusions were drawn with caution if potentially influential SNPs were identified in the "leave-one-out" sensitivity analysis.
All statistical analyses and visualizations were performed using R statistical software (version 4.2.2, https://www.R-project.org) with the "TwoSampleMR", "LDlinkR", "forestplot", and "MRPRESSO" packages. A P-value < 0.05 was considered statistically significant for the results of the MR analyses and sensitivity analyses.