Research Structure
To eliminate the interference of confounders and reverse causation, we performed a bi-directional MR analysis by teasing apart the causal effect of RA on the risks of IBD, UC and CD and the reverse causal effect, respectively. The MR analysis was first conducted in one direction (RA to IBD, UC and CD) and then conducted in the opposite direction (IBD, UC and CD to RA) with the genetic variants robustly associated with each disease in the separate GWAS. The MR approach should be based on three core assumptions: genetic variants used as IVs are associated with the exposure (assumption 1); genetic variants are not associated with any confounders (assumption 2); genetic variants influence the risk of outcome only through exposure, not through any alternative pathways (assumption 3) (Figure 1).
Data Sources of RA
Summary statistics for RA were extracted from a genome-wide association study (GWAS) meta-analysis in a total of >100,000 subjects of European and Asian ancestries (29,880 cases and 73,758 controls).10 This GWAS meta-analysis based on European and Asian ancestries contains data from two consortia including the Rheumatoid Arthritis Consortium International for Immunochip (RACI) consortium and the Genetics and Allied research in Rheumatic diseases Networking (GARNET) consortium(Table 1). All RA cases fulfilled the 1987 criteria of the American College of Rheumatology or were diagnosed by a professional rheumatologist. The GWAS identified 102 single nucleotide polymorphisms (SNPs) genome-wide significantly associated with RA (identified as P < 5×10-8). However, 11 of the RA-related SNPs was unavailable in the International IBD Genetics Consortium, leaving only 91 SNPs as IVs. The details of instrumental SNPs could be seen in eTable 1.
Data Sources of IBD, UC and CD
Summary-level data were extracted from a recent GWAS replication and meta-analysis based on the European GWAS and Immunochip in the International IBD Genetics Consortium, involving 213,658 participants including 38,155 IBD cases and 48,485 controls, 17,647 UC cases and 47,179 controls, and 20,550 CD cases and 41,642 controls (Table 1).11 The diagnosis of IBD was based on the accepted clinical criteria of radiologic, endoscopic, and histopathologic evaluation. This GWAS meta-analysis identified 27 SNPs for IBD, 4 SNPs for UC and 7 SNPs for CD (identified as P < 5 × 10−8 in European ancestry or log10 Bayes factor > 6 in the combined trans-ancestry association analysis) and 2 SNPs associated with IBD was unavailable in the RACI and GARNET consortium, leaving 25 SNPs for IBD, 4 SNPs for UC and 7 SNPs for CD is as IVs (eTable 3).
Statistical Analysis
Assessment of Linkage Disequilibrium
To meet assumption 1 and 2, the IV must be associated with RA or IBD, UC and CD, and not with other confounders. We assessed the linkage disequilibrium (LD) correlation among SNPs of IBD in order to select independent genetic variants (r2<0.1). We chose the one with the lowest P value associated with IBD, UC and CD, if genetic variants are in LD (r2>0.8). F statistic was used to test the strength of genetic variants, and F > 10 was identified as meeting assumption 1.
Mendelian Randomization and sensitivity analysis
We used the inverse variance weighted method in the MR analysis as the main analysis method, which presents a combined estimate of the causal estimate from each SNP.12 To conduct MR analysis with this method, we had to assume that there was no SNP had horizontal pleiotropy. Then we performed sensitivity analyses for the IVs significantly related to exposure with the simple median, weighted median and MR Egger regression methods. Given that not all IVs were valid, the simple median was used to provide with a consistent effect estimate when at least 50% of IVs were valid.13 Compared to the simple median, the weighted median could provide a consistent effect estimate when at least 50% of the weight was from valid IVs, presenting the robustness in the causal estimate of the exposure-outcome effect.13 As for MR Egger regression method, the zero intercept result was considered nearly no pleiotropy effect;14 thus showing the pleiotropy-corrected effect.15 To deeper examine the effect of influential or pleiotropy variants ,we performed leave-one-out analysis, where one variant was neglected at one time.16 Power calculations in the MR analysis were conducted based on the website: mRnd (http://cnsgenomics.com/shiny/mRnd/).
All analyses above were performed in the R version 3.6.1 computing environment (http://www.r-project.org) using the TwoSampleMR package (R project for Statistical Computing). This package harmonized effect of exposure and outcome data sets including combined information on SNPs, effect alleles, non-effect alleles, effect estimates, standard errors for instrumental SNPs. By the way, we assumed that all alleles are presented on the forward strand in harmonization. Therefore, the bi-directional results took the full set of instrumental SNPs into account. In addition, we applied a conservative approach considering multiple testing and set a significant threshold as 0.017 (0.05 divided by 3) after Bonferroni’s correction. P≤0.05 but above the Bonferroni corrected significance threshold was considered as a suggestive significant association.
LD Score regression
To evaluate the genetic correlation between RA and IBD, UC or CD, we performed LDSC by using GWAS summary statistics from two previous trait-specific GWASs. LDSC estimated genetic correlations between the true causal effects of RA and IBD, UC or CD (ranging from -1 to 1) using GWAS summary statistics of Hapmap3 SNPs. SNPs with high LD region would have higher χ2 statistics than SNPs with a lower LD region for each disease, and a similar relationship held if single-study test statistics were replaced with the product of the z scores from two studies of traits with some correlation.17