Research design
In this study, we first evaluated the observational linear and non-linear associations between BMI and IGF-1. We then used MR analyses to test whether the association was causal or not. For linear MR analysis, the two-stage least-square (TSLS) analysis was used, and for non-linear MR analysis, the piecewise linear MR method was used. MR design was conducted according to three basic assumptions: the genetic variants selected as IVs should be robustly associated with the exposure; the selected variants should not be associated with any confounders; and the variants should not be associated with the outcome, except by way of exposure. Finally, we stratified the sample by sex and age to refine the causal association. The flow diagram of the research design is shown in Fig. 1.
Study participants
The UKB cohort is a large biomedical database containing genetic and health information from more than 500 000 UK participants aged 37–73 years recruited from 22 assessment centers between 2006 and 201014. Participants provided basic information about their physical condition and lifestyle through questionnaires at baseline. Biological samples and genetic data were collected and generated by the UKB. All participants signed the written informed consent. This study (application number 41542) was covered by general ethical approval for the UKB cohort.
We restricted our analysis to unrelated participants of the European population. To select unrelated participants, we applied KING software to genome-wide genotype data15. Individuals with diabetes and all kinds of cancer were excluded to mitigate the possibility of associations being driven by the diagnosis or management of this disease. Samples with missing values are also not included.
Phenotypes
In this study, we explored the causal association of BMI with IGF-1. BMI was taken as exposure. It was calculated from height and weight measured by the Tanita BC418MA body composition analyzer. The outcome trait was serum IGF-1. IGF-1 concentration (nmol/L) was measured in a serum sample collected at the initial assessment visit (2006–2010), using the DiaSorin Liaison XL, a chemiluminescent immune assay.
Outliers were determined by the inter-quartile range (IQR) approach and were removed if present. After quality control, a total of 244 991 participants were eligible for analysis.
Covariates
The covariates we considered included age, sex, assessment center (factor with 22 values), smoking status, alcohol intake frequency, physical activity, Townsend depressive index (TDI), and the top 10 genetic principal components.
For smoking status, three levels were included: never, previous, and current. For alcohol intake frequency, it was categorized as never, special occasions only, one to three times a month, once or twice a week, three or four times a week, daily, or almost daily. Physical activity was estimated by summed metabolic equivalent task minutes per week (MET, min/week) for all activities, including walking, moderate activity, and vigorous activity. TDI was calculated immediately before the participant joined UKB and aims to estimate area deprivation. Principal components were used only in the MR analysis.
Choice of instrumental variables
We constructed the BMI polygenic risk score (PRS) from 97 instrumental variables according to the GIANT study, which included 322 154 individuals of European descent and 17 072 individuals of non-European descent16. All these IVs were independent of each other (r2 ≤ 0.1, > 250kb apart). One IV (rs12016871) was not available in the UKB and was disregarded, resulting in 96 left IVs for PRS calculation. For genotype data, we excluded samples with ambiguous sex and those with a high rate of genotype missingness. The PRS for BMI was constructed using PRSice2 software, by computing the weighted average of the number of BMI-increasing alleles for an individual17. The weights used were the effect values given in the GWAS published by the GIANT consortium.
Observational association analyses
The linear observational association between BMI and IGF-1 was examined by fitting a linear regression model. The non-linear observational association was examined by the restricted cubic spline regression model. We adjusted these analyses for covariates including age, sex, smoking status, alcohol drinking status, MET, assessment center, and TDI.
Linear Mendelian randomization
Linear genetic association between BMI and IGF-1 was examined by the two-stage least-squares method. In the first stage, the BMI was regressed on the PRS and relevant covariates (including age, sex, assessment center, and the top 10 genetic principal components). In the second stage, the IGF-1 was regressed on the predicted BMI from the first stage and the same covariates.
Non-linear Mendelian randomization
We used the piecewise linear method to capture the potential non-linear association between the genetically predicted BMI and serum IGF-1, adjusted for covariates including age, sex, assessment center, and the top 10 genetic principal components. Specifically, we regressed BMI on its PRS and generated BMI residuals, which were supposed to be free of the influence of IV. We then divided the total sample into three stratum according to BMI residuals. The reason for using BMI residuals instead of raw BMI was that the latter may lead to misleading results18. Within each stratum, we calculated the local average causal effect (LACE) as the ratio of the effect of PRS to IGF-1 to the effect of PRS to BMI. P values were reported for two non-linear tests: Cochran's Q test, which assesses heterogeneity in LACE estimates, and the quadratic test, which assesses whether the non-linear model was better suited for LACE estimates than the linear model. Note that each LACE estimated the causal effect of BMI on IGF-1 within each particular stratum defined by BMI adjusted by PRS19. Because PRS explained less than 2% of BMI variation, the LACE could be roughly interpreted as within each BMI stratum.
Stratified analysis by sex and age
To assess whether the association between BMI and IGF-1 was modified by sex and age, the total sample was stratified according to sex and the following three age groups: 39–50, 50–60, and 60–73. Association analyses were repeated within each stratification. The linear and non-linear MR analyses in each sex and age stratification were performed as previously described.
Sensitivity analyses
To avoid the potential violation of MR assumptions, we further assessed the validity of the genetic variants by testing the associations of potential confounders (smoking status, alcohol drinking status, TDI, and MET) with the BMI PRS. For the potential confounders with significant correlation, we added it to the covariates of MR analysis. Then, we calculated the SNP-exposure and SNP-outcome associations and conducted radial MR analyses to identify outliers20. Once an outlier was identified (P < 0.05/96), the PRS were reconstructed after removing the outlier. The association analyses were then repeated with new covariates and PRS.
Robustness verification
As previously mentioned, we used the residual stratification method to divide the sample into three strata for nonlinear causality testing. However, this method relies on the strong assumption of the constant genetic effect assumption19. To verify the robustness of the nonlinear MR results and to ensure that the results are not biased by violating the assumption, we performed the following analysis for further verification. Firstly, we applied the new doubly-ranked stratification method to the dataset by repeating the nonlinear MR analysis and the stratified analysis. This new method is thought to relax the parametric assumption21. Secondly, we applied the most recent recommendation from Burgess to reduce heterogeneity in the genetic effect on the exposure22. We log-transformed and normalized the exposure and outcome variables, and then repeated the nonlinear MR analysis.
All analyses were conducted in R (version 4.2.2), using the "rms" package for restricted cubic spline regression, the "ivreg" package for linear MR, the "nlmr" package for non-linear MR, and the "SUMnlmr" package for robustness verification. P < 0.05 was considered to indicate a statistically significant association. The relevant code can be publicly available.