Assessment of Causal Relationship Between Rheumatoid Arthritis and Inflammatory Bowel Disease: A Bi-Directional Mendelian Randomization Study

Objective: Observational studies have shown that rheumatoid arthritis (RA) is associated with a higher risk of inflammatory bowel disease (IBD), and vice versa. However, whether such associations reflect causality remains unclear. We aimed to examine the bidirectional causal associations of RA with IBD, ulcerative colitis (UC), and Crohn’s disease (CD). Method: Bi-directional Mendelian randomization (MR) analysis and linkage disequilibrium score regression (LDSC) were used to investigate the causality between RA and IBD subtypes. Summary data were extracted from the Rheumatoid Arthritis Consortium International for Immunochip (RACI) consortium and the Genetics and Allied research in Rheumatic diseases Networking (GARNET) consortium (29,880 cases of RA and 73,758 controls); and the International IBD Genetics Consortium (38,155 cases of IBD and 48,485 controls; 17,647 cases of UC and 47,179 controls; 20,550 cases of CD and 41,642 controls). Results: Genetically predicted RA was associated with higher risks of IBD (per 1-unit higher log odds: odds ratio: 1.13; 95% confidence interval: 1.04-1.21; P=0.002) and UC (1.16; 1.07-1.25; P=1.5 × 10⁻⁴) after a Bonferroni correction (P<0.05/3). In addition, the weighted median method showed a suggestive association of RA with a higher risk of CD (1.16; 1.05-1.28; P=0.003). However, there was no evidence of a causal relation of IBD, UC or CD with RA risk. Conclusion: Our findings provide novel evidence supporting a significant causal effect of RA on IBD and UC, whereas IBD is unlikely to increase the risk of RA, indicating the importance of maintaining a healthy gut microbiota composition in patients with RA to prevent IBD.


Introduction
Rheumatoid arthritis (RA) and inflammatory bowel disease (IBD) are two inflammatory diseases involving innate and adaptive immune components that change the metabolic state of their respective target tissues. 1 Accumulating evidence has suggested that an abnormal overall composition of intestinal microbiota plays an important role in the pathogenesis of RA and IBD, 2,3 indicating overlaps between pathogenic pathways. Therefore, clarifying the bidirectional causal relationship between RA and IBD may facilitate early diagnosis and prevention of complications.
Evidence from observational studies, such as case-control studies [4][5][6] and a large cross-sectional study, 7 has consistently shown that RA is significantly associated with a higher risk of IBD, ulcerative colitis (UC) and Crohn's disease (CD). However, these observational studies rely largely on self-reported information and are susceptible to measured or unmeasured confounders and reverse causation bias, which makes it difficult to infer causality between RA and IBD, UC and CD.
Mendelian randomization (MR) analysis is a kind of natural randomized controlled trial at the genetic level: randomization occurs during meiosis and conception, and genotype is highly unlikely to be affected after birth. 8 Thus, genetic instrumental variables (IVs) are relatively independent of environmental factors and of other diseases that develop later, so that the MR method can largely avoid confounding and reverse causation bias by using IVs. 9 To our knowledge, no MR analysis of the association between RA and IBD has been conducted thus far.
Hence, in the present study, we performed for the first time a bi-directional MR analysis using summary-level statistics from large genome-wide association study (GWAS) consortia to assess the bidirectional causal associations between RA and IBD, UC or CD.

Research Structure
To limit the interference of confounders and reverse causation, we performed a bi-directional MR analysis that teases apart the causal effect of RA on the risks of IBD, UC and CD from the reverse causal effect. The MR analysis was first conducted in one direction (RA to IBD, UC and CD) and then in the opposite direction (IBD, UC and CD to RA), using the genetic variants robustly associated with each disease in the separate GWASs. The MR approach rests on three core assumptions: genetic variants used as IVs are associated with the exposure (assumption 1); genetic variants are not associated with any confounders (assumption 2); genetic variants influence the risk of the outcome only through the exposure, not through any alternative pathway (assumption 3) (Figure 1).

Data Sources of RA
Summary statistics for RA were extracted from a genome-wide association study (GWAS) meta-analysis in a total of >100,000 subjects of European and Asian ancestries (29,880 cases and 73,758 controls). 10 This GWAS meta-analysis contains data from two consortia: the Rheumatoid Arthritis Consortium International for Immunochip (RACI) consortium and the Genetics and Allied research in Rheumatic diseases Networking (GARNET) consortium (Table 1). All RA cases fulfilled the 1987 criteria of the American College of Rheumatology or were diagnosed by a professional rheumatologist. The GWAS identified 102 single nucleotide polymorphisms (SNPs) associated with RA at genome-wide significance (defined as P < 5 × 10⁻⁸). However, 11 of the RA-related SNPs were unavailable in the International IBD Genetics Consortium, leaving 91 SNPs as IVs. The details of the instrumental SNPs can be seen in eTable 1.

Data Sources of IBD, UC and CD
Summary-level data were extracted from a recent GWAS replication and meta-analysis based on the European GWAS and Immunochip in the International IBD Genetics Consortium, involving 213,658 participants including 38,155 IBD cases and 48,485 controls, 17,647 UC cases and 47,179 controls, and 20,550 CD cases and 41,642 controls (Table 1). 11 The diagnosis of IBD was based on the accepted clinical criteria of radiologic, endoscopic, and histopathologic evaluation. This GWAS meta-analysis identified 27 SNPs for IBD, 4 SNPs for UC and 7 SNPs for CD (defined as P < 5 × 10⁻⁸ in European ancestry or log10 Bayes factor > 6 in the combined trans-ancestry association analysis). Two SNPs associated with IBD were unavailable in the RACI and GARNET consortia, leaving 25 SNPs for IBD, 4 SNPs for UC and 7 SNPs for CD as IVs (eTable 3).

Assessment of Linkage Disequilibrium
To meet assumptions 1 and 2, the IVs must be associated with RA or with IBD, UC and CD, and not with other confounders. We assessed the linkage disequilibrium (LD) correlation among the SNPs of IBD in order to select independent genetic variants (r² < 0.1). If genetic variants were in LD (r² > 0.8), we chose the one with the lowest P value for association with IBD, UC and CD. The F statistic was used to test the strength of the genetic variants, and F > 10 was taken as meeting assumption 1.
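The F > 10 screen can be illustrated with a minimal Python sketch. The text does not state which formula was used for the F statistic, so the code below assumes one common per-SNP approximation, the squared z score of the SNP-exposure association. The `Instrument` class, the rsIDs and all numbers are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass


@dataclass
class Instrument:
    """One SNP-exposure association (all values used below are hypothetical)."""
    rsid: str
    beta: float  # log odds ratio of the exposure per effect allele
    se: float    # standard error of beta


def f_statistic(snp: Instrument) -> float:
    """Approximate per-SNP F statistic as the squared z score (an assumption)."""
    return (snp.beta / snp.se) ** 2


def strong_instruments(snps, threshold=10.0):
    """Keep SNPs whose F statistic exceeds the threshold (F > 10 in the text)."""
    return [s for s in snps if f_statistic(s) > threshold]


# Hypothetical summary statistics, for illustration only.
example_snps = [
    Instrument("rs_demo_1", beta=0.12, se=0.02),  # z = 6, so F is about 36
    Instrument("rs_demo_2", beta=0.05, se=0.02),  # z = 2.5, so F is about 6.25
]
kept = strong_instruments(example_snps)
print([s.rsid for s in kept])
```

Under this approximation a variant passes the screen when its absolute z score exceeds roughly 3.16, which is far weaker than the genome-wide significance threshold, so instruments selected at P < 5 × 10⁻⁸ would be expected to pass.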

Mendelian Randomization and sensitivity analysis
We used the inverse variance weighted (IVW) method as the main analysis method in the MR analysis, which presents a combined estimate of the causal estimates from each SNP. 12 This method assumes that no SNP has horizontal pleiotropy. We then performed sensitivity analyses for the IVs significantly related to the exposure with the simple median, weighted median and MR Egger regression methods. Because not all IVs may be valid, the simple median was used to provide a consistent effect estimate when at least 50% of IVs were valid. 13 Compared to the simple median, the weighted median could provide a consistent effect estimate when at least 50% of the weight came from valid IVs, supporting the robustness of the causal estimate of the exposure-outcome effect. 13 For the MR Egger regression method, an intercept of zero was taken to indicate nearly no pleiotropic effect, 14 and the slope gives the pleiotropy-corrected effect. 15 To further examine the effect of influential or pleiotropic variants, we performed a leave-one-out analysis, in which one variant was omitted at a time. 16 Power calculations in the MR analysis were conducted with the mRnd website (http://cnsgenomics.com/shiny/mRnd/). All analyses above were performed in the R version 3.6.1 computing environment (http://www.rproject.org) using the TwoSampleMR package (R project for Statistical Computing). This package harmonized the effects of the exposure and outcome data sets, including combined information on SNPs, effect alleles, non-effect alleles, effect estimates and standard errors for instrumental SNPs. We assumed that all alleles were presented on the forward strand in harmonization. Therefore, the bidirectional results took the full set of instrumental SNPs into account. In addition, we applied a conservative approach to multiple testing and set the significance threshold at 0.017 (0.05 divided by 3) after Bonferroni's correction. A P value ≤0.05 but above the Bonferroni-corrected significance threshold was considered a suggestive association.
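To make the estimators named above concrete, the following Python sketch computes fixed-effect IVW, simple median, weighted median and MR Egger point estimates from per-SNP summary statistics. It is not the TwoSampleMR implementation used in the analysis: the function names and the toy data are hypothetical, standard errors of the exposure effects are ignored (first-order weights), and only the IVW standard error is computed.

```python
import math
from statistics import median


def wald_ratios(bx, by, sy):
    """Per-SNP causal estimates by/bx with first-order standard errors."""
    ratios = [y / x for x, y in zip(bx, by)]
    ses = [s / abs(x) for x, s in zip(bx, sy)]
    return ratios, ses


def ivw(bx, by, sy):
    """Fixed-effect inverse variance weighted estimate and its standard error."""
    ratios, ses = wald_ratios(bx, by, sy)
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    return estimate, math.sqrt(1.0 / total)


def simple_median(bx, by, sy):
    """Unweighted median of the per-SNP ratio estimates."""
    return median(wald_ratios(bx, by, sy)[0])


def weighted_median(bx, by, sy):
    """Interpolated weighted median of the ratio estimates (needs >= 2 SNPs)."""
    ratios, ses = wald_ratios(bx, by, sy)
    pairs = sorted(zip(ratios, [1.0 / se ** 2 for se in ses]))
    total = sum(w for _, w in pairs)
    running, mids = 0.0, []
    for _, w in pairs:
        running += w
        mids.append((running - 0.5 * w) / total)  # mid-point cumulative weight
    below = max(i for i, m in enumerate(mids) if m < 0.5)
    r0, r1 = pairs[below][0], pairs[below + 1][0]
    return r0 + (r1 - r0) * (0.5 - mids[below]) / (mids[below + 1] - mids[below])


def mr_egger(bx, by, sy):
    """Weighted regression of by on bx with a free intercept.

    Returns (intercept, slope). SNPs are first oriented so that bx >= 0.
    """
    signs = [1.0 if x >= 0 else -1.0 for x in bx]
    x = [s * v for s, v in zip(signs, bx)]
    y = [s * v for s, v in zip(signs, by)]
    w = [1.0 / s ** 2 for s in sy]
    total = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical data in which every SNP implies the same causal effect of 0.5.
bx = [0.10, 0.20, 0.30, 0.40]   # SNP-exposure effects (log odds)
by = [0.05, 0.10, 0.15, 0.20]   # SNP-outcome effects (log odds)
sy = [0.01, 0.01, 0.01, 0.01]   # standard errors of the SNP-outcome effects

print("IVW:", ivw(bx, by, sy))
print("Simple median:", simple_median(bx, by, sy))
print("Weighted median:", weighted_median(bx, by, sy))
print("MR Egger (intercept, slope):", mr_egger(bx, by, sy))
```

In this toy data set all four estimators agree because no SNP is pleiotropic. With real instruments, disagreement between the IVW estimate and the median or Egger estimates is what the sensitivity analyses are designed to reveal.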

LD Score regression
To evaluate the genetic correlation between RA and IBD, UC or CD, we performed LDSC using GWAS summary statistics from the two previous trait-specific GWASs. LDSC estimated genetic correlations between the true causal effects of RA and IBD, UC or CD (ranging from -1 to 1) using GWAS summary statistics of HapMap3 SNPs. For each disease, SNPs in high-LD regions would have higher χ² statistics than SNPs in low-LD regions, and a similar relationship holds if single-study test statistics are replaced with the product of the z scores from two studies of traits with some correlation. 17
Table 1 shows the sample size, population and publication year of the GWASs for RA, IBD, UC and CD. We discarded variants from the association analysis that were not available and had no proxy in the outcome GWASs. We tested whether the selected SNPs were affected by LD, and ultimately chose the [...] of RA on IBD, UC and CD, and found that the powers were strong enough (more than 80%). In the other direction, the MR power calculation showed that we had 98% power to detect a significant (P=0.05/3) causal relationship (OR=1.3) of IBD on RA. However, for the causal relationship of UC or CD on RA, the powers were both less than 80%.

Characteristics of instrumental variables
Causal effect of RA on IBD, UC or CD
We found that genetically predicted higher RA was significantly associated with higher odds of IBD and UC. Per unit higher log odds of RA, the odds ratio (OR) of IBD was 1.13 (95% confidence interval (CI), 1.04-1.21; P=0.002) and that of UC was 1.16 (1.07-1.25; P=1.5 × 10⁻⁴) (Figure 2). The intercepts of the MR Egger regression for each outcome, which were centered at the origin with a CI including the null, further suggested that horizontal pleiotropy did not inflate the effects of those variants (Figure 2). Regarding CD, the weighted median method (1.16; 1.05-1.28; P=0.003) and the simple median method (1.18; 1.07-1.30; P=0.001) indicated a causal association of RA with CD (Figure 2). The IVW method (1.08; 0.97-1.20; P=0.154) showed a similar trend, but not a significant association (Figure 2). In the leave-one-out analysis, we did not identify any influential variants or outliers in the associations of RA with IBD or UC (eTable 5). We plotted the SNP effect sizes on IBD and UC against those on RA to present the estimates from the different MR methods (eFigure 1 and eFigure 2). Moreover, the MR estimate of each IV was plotted against instrument precision to examine evidence of directional pleiotropy (eFigure 1 and eFigure 2).

Causal effect of IBD, UC or CD on RA
In the reverse-direction MR analysis, genetically predicted higher IBD, UC and CD were not associated with the risk of RA. The ORs were 0.86 (0.72-1.03; P=0.10) per unit higher log odds of IBD, 0.91 (0.74-1.12; P=0.37) per unit higher log odds of UC, and 1.12 (0.88-1.43; P=0.37) per unit higher log odds of CD (Figure 3). Likewise, in the sensitivity analyses, both the weighted median method and the simple median method showed similar results for IBD, UC and CD (Figure 3). In addition, the leave-one-out analysis showed that the OR ranged from 0.83 (0.70-0.98; P=0.029) when we excluded the genetic variant rs4703855 to 0.83 (0.70-0.99; P=0.036) when we excluded the variant rs4692386 (eTable 6).
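As a rough consistency check on the reported estimates, the Python sketch below recovers the two-sided P value implied by an odds ratio and its 95% CI and applies the Bonferroni rule described in the methods. It assumes a Wald interval that is symmetric on the log-odds scale; the function names are hypothetical, and because the published ORs and CIs are rounded, the recomputed P values can only be expected to agree approximately with the reported ones.

```python
import math


def p_from_or_ci(odds_ratio, lower, upper, z_crit=1.959964):
    """Two-sided P value implied by an odds ratio and its 95% CI.

    Assumes the CI is symmetric on the log-odds scale (a Wald interval).
    """
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = beta / se
    return math.erfc(abs(z) / math.sqrt(2.0))


def classify(p, n_tests=3, alpha=0.05):
    """Apply the thresholds from the methods: 0.05/3 and 0.05."""
    if p < alpha / n_tests:
        return "significant"
    if p <= alpha:
        return "suggestive"
    return "not significant"


# Reverse-direction IVW estimates as reported in the text (OR, lower, upper).
reported = {
    "IBD on RA": (0.86, 0.72, 1.03),
    "UC on RA": (0.91, 0.74, 1.12),
    "CD on RA": (1.12, 0.88, 1.43),
}
for label, (or_, lo, hi) in reported.items():
    p = p_from_or_ci(or_, lo, hi)
    print(f"{label}: P = {p:.3f} ({classify(p)})")
```

All three recomputed P values fall above 0.05, which matches the conclusion that the reverse-direction analysis gives no evidence of a causal effect on RA.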

Discussion
In the present study, we systematically assessed the causal relationships and genetic correlation between RA and IBD subtypes. We found strong evidence supporting causal associations of genetically predicted RA with higher risks of IBD and UC, and a significant genetic correlation between RA and UC, whereas IBD is unlikely to increase the risk of RA, suggesting that RA patients may have a higher risk of developing IBD. Our findings indicate that maintaining a healthy gut microbiota composition might have meaningful clinical benefits for RA patients in preventing the development of IBD.
Growing evidence from observational studies has suggested that RA is associated with a higher risk of IBD. 3,7 For example, IBD was observed with greater frequency among patients with RA in a previous observational study (n = 63,005). 7 Our results were also in line with the evidence from a recent small cohort study (n = 102) presenting a potential link in gut microbiota composition between RA and UC. 18 Furthermore, a large cross-sectional study with a total of 47,325 IBD cases and 92,839 healthy controls suggested that IBD patients tended to have co-occurring RA. 3 However, Vanessa et al 6 conducted a case-control study (n = 3,284) showing that IBD was associated with RA at any time but even more so in the period before RA diagnosis. In addition, another case-control study (n = 4,532) using paediatric-specific data to evaluate the prevalence of RA among children with IBD showed no significant association between RA and UC, but observed a strong association between RA and CD. 19 These small observational studies (sample sizes ranging from 102 to 4,532) are difficult to interpret because of small sample size and bias from unmeasured confounding factors, short-term effects and/or reverse causation. Our bi-directional MR study, for the first time, assessed the bidirectional causal associations between RA and IBD, UC and CD and revealed the genetic basis of RA and IBD by leveraging summary-level statistics from large GWAS consortia. It provided strong evidence for an effect of RA on IBD and UC, in line with the direction of association observed in the epidemiological studies above, whereas IBD is unlikely to increase the risk of RA.
Many biologically plausible pathways linking RA to a higher risk of IBD have been identified. One possible pathway is through gut dysbiosis, which occurs in the inflammatory process and plays a crucial role in the pathogenesis and development of both RA and IBD. 2,20−24 The alterations of intestinal microbiota composition in RA have a considerable overlap with those in IBD, 25,26 among which some bacterial taxa, such as Pseudomonas, have shown consistent trends of change. 27 Interestingly, a comparative study suggested that the gut microbial communities in RA and UC were the most similar among immune-mediated inflammatory diseases, while the gut microbiota of CD was the most different from those of other immune-mediated inflammatory diseases. 18 Taken together, these findings might at least partially support our results. In addition, previous studies revealed that gut flora might be involved in the development of RA and immune disorders by activating Th17 cells. 28-30 Correspondingly, intestinal microbiota-reactive Th17 cells have also been identified as mediating the pathogenesis of IBD, 31 further supporting the potential causal pathway linking RA to IBD. However, further investigations are required to elucidate the underlying mechanism of gut microbiota-host crosstalk in both RA and IBD.
There are several strengths in our study. To the best of our knowledge, we assessed for the first time the causal relationship between RA and IBD using summary-level data from large GWASs, which are not generally susceptible to reverse causation and confounding. Importantly, the results from four different MR methods are consistent, showing the robustness of this study. In addition, the power calculated from summary-level data was sufficient, supporting the reliability of the causal estimates. However, several limitations merit consideration. First, although we exploited large genome-wide summary-level data and carefully selected SNPs, the possibility of weak instrument bias could not be ruled out entirely. To address this, we used the F statistic, which can reliably detect the strength of IVs, 32 and the results supported the strength of our instruments. Second, it is a great challenge for all MR analyses to exclude pleiotropy and other alternative direct causal pathways, especially for diseases determined by genetic variants, 33 although we applied additional sensitivity analyses to validate the robustness of our results. Finally, the association between RA and UC could not distinguish a shared genetic basis from a causal relationship, because the human genome influences both RA and IBD.
In summary, our MR analysis provides strong evidence supporting a direction of association from genetically predicted RA to higher risks of IBD and UC, suggesting that RA patients might have intestinal microbiome variation and a higher risk of developing IBD. However, IBD is unlikely to increase the risk of RA. Our findings have clinical significance for preventing IBD in patients with RA, and underscore the importance of maintaining a healthy intestinal microbial composition. Further studies assessing the clinical significance of intestinal microbiome variation in RA patients are required.

Declarations
Contributions
XZ, ZZ and TH designed the research. XZ and TH had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. XZ, ZZ and TH wrote the paper and performed the data analysis. All authors contributed to the statistical analysis, critically reviewed the manuscript during the writing process, and approved the final version to be published. XZ and TH are the guarantors for the study.
Data Availability.
All data used in the present study were obtained from genome wide association study summary statistics which were publicly released by genetic consortia.

Ethical approval and consent to participate
Contributing studies received ethical approval from their respective institutional review boards. Informed consent was obtained from all participants of contributing studies.

Funding
The study was supported by the Peking University Start-up Grant (BMU2018YJ002) and the High-performance Computing Platform of Peking University. The funding organization had no role in the preparation of the manuscript.

Disclosure Statement
All authors declare: no support from companies for the submitted work; no relationships with companies that might have an interest in the submitted work in the previous three years; no spouses, partners, or children with financial relationships that may be relevant to the submitted work; no non-financial interests that may be relevant to the submitted work.

Figure 2
Odds ratio for association of genetically predicted RA with IBD, UC and CD, using four different Mendelian randomization methods. OR: odds ratio; CI: confidence interval; RA, rheumatoid arthritis; IBD, inflammatory bowel disease; UC, ulcerative colitis; CD, Crohn's disease. OR (95% CI) means risk of IBD, UC and CD per each 1-unit higher log odds in genetically predicted rheumatoid arthritis.

Figure 3
Odds ratio for association of genetically predicted IBD, UC and CD with RA, using four different Mendelian randomization methods. OR: odds ratio; CI: confidence interval; RA, rheumatoid arthritis; IBD, inflammatory bowel disease; UC, ulcerative colitis; CD, Crohn's disease. OR (95% CI) means risk of RA per each 1-unit higher log odds in genetically predicted IBD, UC and CD.
