MR design
In this study, we aimed to investigate the causal relationship between higher glycated hemoglobin (HbA1c) levels and the risk of rheumatoid arthritis (RA) using a two-sample Mendelian randomization (MR) approach. The key assumptions of MR design are as follows:1. Genetic variants are strongly associated with the exposure of interest (HbA1c levels) at a genome-wide significance threshold (p < 5 × 10^(-8)).2. Genetic variants are not associated with any potential confounders including smoking, BMI, and osteoarthritis that may affect the relationship between HbA1c levels and RA risk. 3. Genetic variants influence the outcome (RA risk) only through the exposure of interest (HbA1c levels). The Mendelian randomization study design model is depicted (Fig. 1).To ensure the validity of our MR analysis and address potential violations of these assumptions, we employed complementary statistical methods, including pleiotropy analysis and sensitivity analysis.
Data Source
In this study, we utilized summary-level single-nucleotide polymorphism (SNP)-phenotype association data for glycated hemoglobin (HbA1c) levels from the largest within-sibship genome-wide association study (GWAS) to date, involving 178,076 siblings from 19 studies and covering 25 phenotypes[14]. This within-sibship GWAS accounts for potential demographic and indirect genetic effects, providing more accurate genetic associations for phenotypes affected by assortative mating, population stratification, and indirect genetic effects. Rheumatoid arthritis (RA) data were sourced from the FinnGen study's data freeze 5 (DF5), released on May 11, 2021. This dataset consists of 218,792 participants, with 16,962,023 analyzed variants and 2,803 disease endpoints[15]. HbA1c levels were measured using fasting or nonfasting whole blood samples and analyzed with National Glycohemoglobin Standardization Program (NGSP)-certified methods[16]. Professional rheumatologists made diagnoses of and overall RA following the 1987 RA diagnosis criteria of the American College of Rheumatology[17]. Furthermore, RA was further classified into seropositive rheumatoid arthritis (SPRA) and seronegative rheumatoid arthritis (SNRA), defined by the presence or absence of autoantibodies including rheumatoid factor (RF) or anti-citrullinated protein antibodies (ACPAs)[18]. Based on previous studies, we identified BMI[19], smoking initiation[20, 21], and years of education[22] as potential risk variables. All data mentioned above were obtained from genome-wide association studies (GWAS) in the Integrative Epidemiology Unit (IEU) OpenGWAS database [23]. SNPs associated with exposure (HbA1c), outcome (RA), and confounders (BMI, tobacco smoking and years of education) are summarized in Table 1.No specific ethical approval or written informed consent was necessary.
Table 1
Summary of GWAS Variables, Traits, and Populations for Mendelian Randomization Analysis
Type of variable | GWAS ID | Year | Traits | Consortium | Population | cases Size | controalsize |
Exposure | ieu-b-4841 | 2022 | HbA1c | Within family GWAS | European | 17,724 | 178,086 |
Outcome | finn-b-RHEUMA_SERONEG | 2021 | SNRA | Finn | European | 1,937 | 172,834 |
finn-b-RHEUMA_SEROPOS | 2021 | SPRA | Finn | European | 4,596 | 172,834 |
finn-b-M13_RHEUMA | 2021 | Overall RA | Finn | European | 6,236 | 147,221 |
Confounders | ieu-a-835 | 2015 | BMI | GIANT | European | 322,154 | 322,154 |
ebi-aGCST005814 | 2018 | Overall OA | MRC-IEU | European | 50,508 | 15,845,511 |
ieu-a-962 | 2018 | Ever versus never smoked | TAG | European | 32,066 | 74,035 |
Abbreviations:GWAS: Genome-wide association study, HbA1c: Glycated hemoglobin,OA: Osteoarthritis, GIANT: Genetic Investigation of Anthropometric Traits, MRC-IEU: Medical Research Council Integrative Epidemiology Unit, TAG: Tobacco and Genetics consortium |
Instrument Variables selection
To identify suitable instrumental variables (IVs) that meet the three assumptions of MR analysis, we implemented a series of rigorous quality control procedures. Firstly, we selected independent SNPs associated with HbA1c levels at genome-wide significance (P-value < 5×10^-8) as potential IVs. Secondly, we employed a clumping strategy with a threshold of r^2 < 0.001 and a window size of 10,000 kb to account for linkage disequilibrium (LD)[24].
We then used the PhenoScanner tool to detect pleiotropic SNPs associated with known RA risk factors, such as BMI, osteoarthritis, and smoking and removed these SNPs if present[25]. To evaluate the strength of the selected SNPs, we calculated the F statistics using the formula F = R^2 (N–k–1)/[(1–R^2)k], where R^2 is the proportion of variability explained by each SNP, N is the sample size of the GWAS, and k is the number of SNPs[26]. If the F-statistic is < 10, it indicates that the IV is weak and potentially unsuitable for our analysis[27].
Subsequently, these independent instrumental SNPs were harmonized with each outcome summary statistics. We excluded SNPs with a minor allele frequency (MAF) < 0.01, palindromic SNPs, and those potentially associated with the outcome[P < 0.05].
Statistical Analysis
The "TwoSampleMR" R package (version 0.5.6, Stephen Burgess, Chicago, IL, USA) was utilized for two-sample MR analysis to investigate the relationship between HbA1c levels and three outcomes: seropositive rheumatoid arthritis (SPRA), seronegative rheumatoid arthritis (SNRA), and overall RA. The inverse-variance weighted (IVW, random effects) method served as the primary analysis approach, while supplementary analysis methods included MR-Egger, weighted median, simple mode, and weighted mode.
The IVW method assumes all included SNPs can be used as effective IVs, providing accurate estimates[27]. When pleiotropy is present, MR-Egger regression can detect and adjust for it, albeit with potentially lower estimation accuracy[28]. The weighted median method is accurate under the assumption that at least 50% of the IVs are valid[29]. The simple mode method offers robustness against pleiotropy, even though it might be less powerful than the IVW method. Conversely, the weighted mode method is sensitive to the challenging bandwidth selection for mode estimation[30]. The results are expressed as odds ratios (ORs) and 95% confidence intervals (CIs) for RA risk corresponding to a unit standard deviation (SD) change in HbA1c levels.
Pleiotropy and heterogeneity Analysis
The mRnd tool was employed to calculate the power to detect causal effects[31]. Scatter plots of MR analysis were generated to visualize the relationship between genetic associations with exposure and outcomes for each SNP, illustrating the consistency of effects across different SNPs and providing insight into potential pleiotropy or other biases.
Cochran's Q test was used to assess the heterogeneity of IVs in both the IVW method and MR-Egger regression. MR-Egger regression identifies potential pleiotropy and evaluates its impact on risk estimation in the intercept test. Additionally, the MR pleiotropy residual sum and outlier (MR-PRESSO) test was applied to detect possible horizontal pleiotropy and mitigate pleiotropy impacts by removing outliers[32].
To further explore the source of heterogeneity and assess the robustness of the results, a leave-one-out method was implemented, which involves sequentially removing one SNP at a time and recalculating the MR estimates. Funnel plots were created to visually evaluate any asymmetry in MR estimates, which could suggest potential pleiotropic effects, publication bias, or other issues. In a symmetric funnel plot, the distribution of MR estimates should be evenly dispersed around the overall MR estimate, indicating no major biases or pleiotropic effects.