2.1 Data
In this study, we used a combination of survey data and registry-based data. The five primary data sources used are presented in Figure 1.
Data on BMI, sociodemographic factors, and genetic information
The HUNT study has been described in detail elsewhere [30, 31]. In brief, the HUNT study is a population-based cohort study. All inhabitants of Nord-Trøndelag (a rural geographical region in Central Norway) who turned 20 years of age during the study period were invited to participate. So far, four waves have been conducted, and we have used data from the second (HUNT 2) and third (HUNT 3) waves. The first wave did not contain genetic information, and the results from the fourth wave were unavailable at study initiation. Of those that were invited to take part in the study, 69.5% (n=65 261) of adults in HUNT 2, and 54.1% (n=50 809) of the adults in HUNT 3 participated. Data were collected between August 1995 and June 1997 (HUNT 2), and between October 2006 and June 2008 (HUNT 3). The studies consisted of questionnaires, clinical measurements, and collection of biological materials for DNA genotyping (performed using Illumina HumanCoreExome arrays). We used information about participants’ measured height and weight, socioeconomic variables (including age, sex, marital status, urbanity, and smoking status), genetic variants related to obesity, and genetic principal components. If the same individuals had participated in both HUNT 2 and HUNT 3, we used data from HUNT 3 (Figure 1) since this was closer to the time at which healthcare costs were measured.
Data on healthcare costs
The Norwegian healthcare system is divided into primary- and specialist care. All registered inhabitants have a right to a regular GP, and approximately 99% of inhabitants are registered within the GP system [32]. GP services are part of the primary care sector, and GPs have a gatekeeping function so patients need a referral from primary care to access specialist care. Norway has a universal and publicly financed healthcare system, which is free for all children up to age 16 (18 for mental health services). For adults outpatient health services are subsidized and inpatient care is free. Nearly all specialist care and GP costs are included in the Norwegian Patient Register (NPR) and The Norwegian Control and Distribution of Health Reimbursement Database (KUHR). The NPR also contains information about utilization of private institutions and specialists that are contracted to the public healthcare system [33]. Some healthcare costs are not included in our data, and these include for example: physiotherapy, nursing home costs, and dental care.
We used a de-identified key to link the data from HUNT with data from the NPR and the KUHR database. Norwegian hospitals are partly reimbursed via activity-based financing, and these reimbursements are based on the Diagnosis Related Group (DRG)-system. Our specialist care data included data on somatic, psychiatric, and substance abuse-related inpatient and outpatient contacts that occurred from 2009 to 2016. For episodes of care for which DRG-weights were available, we multiplied the DRG-weight pertaining to each particular episode by the average price of a DRG [34-41] (i.e. average patient cost) for the year during which the DRG was registered. Where DRG-weights were not available (such as for psychiatric contacts) we used data on the average cost of similar contacts for that particular year (details are available in the Additional information). We then summed the costs for each year per patient for the years that the patient was alive and living in Norway, and adjusted the yearly costs to 2016 price levels [42, 43]. Finally, we calculated the average yearly specialist cost per patient during the study period.
The KUHR database contains all electronic patient claims made by general practitioners, and includes information about reimbursed amounts, and patient co-payments for all consultations. To compute GP costs we summed reimbursed amounts and co-payments for each patient for each year between 2009 and 2016 that the patient was alive and living in Norway. We then adjusted the costs to 2016 price levels, and calculated the average yearly cost per patient. All costs were converted from Norwegian Kroner (NOK) to 2016 Euros (1.00 € = NOK 9.29) [44].
Data on education, registration status, and relatedness
Next, we linked our dataset with data on educational level from the NUDB database. Age, sex, marital status, urbanity, smoking status, and BMI was measured during the HUNT studies, and were therefore registered at approximately the same time as BMI was measured. For consistency we therefore used education in 1996 for those with a BMI registered in HUNT 2 and educational level registered in 2007 for those whose BMI was reported in HUNT3. The dataset was then linked with data from the Norwegian Population Register on registration status, and individuals were included only during the years that they were alive and living in Norway. Finally, our data were combined with data on relatedness, which allowed us to identify individuals with the same parents.
2.2 Regression models
We investigated the effect of BMI on average yearly GP cost, specialist costs, and total costs using a naïve OLS model, and using a 2SLS IV-regression using genetic variants as instruments. In the first-stage of the 2SLS model, we regressed BMI on the genetic instrument, and in the second-stage, we used the predicted values from the first-stage to estimate the association between BMI and healthcare costs and adjusted the standard errors accordingly.
Since genetic variants are randomly allocated at conception conditional on parental genes, genetic instruments should not be confounded by other variables. However, the relationship between genetic factors and phenotype expression is not fully understood [45], and if adjusting for potential confounders alters the findings then this could indicate that there is a problem with the instrument, and could warrant further investigation. Also, adjusting for confounders might reduce the residual variability of the dependent variable. We explored adjusting for: i) study period (HUNT 2 or HUNT 3), data years (years of data each participant was residing in Norway) (categorical: 1 – 8 years), birth year (categorical: years 1906 – 1989), and sex (categorical), and ii) study period, data years, birth year, sex, educational level (categorical), marital status (categorical), smoking status (categorical), and urbanity (categorical). In the MR analyses, we also explored the effect of adjusting for the first 10 principal components, to adjust for potential population stratification. The statistical power to detect an effect of BMI on GP-, and specialist- cost given our sample of 60 786 individuals was estimated using the mRnd power calculator [46]. Details of the power analyses are available in the additional information and the results are depicted in Figure S1.
2.3 The genetic instruments
AE Locke, B Kahali, SI Berndt, AE Justice, TH Pers, FR Day, C Powell, S Vedantam, ML Buchkovich and J Yang [29] present 97 genetic variants that have been found to be associated with BMI in genome wide association studies (GWAS), and report the strength of the association between each of these variants and BMI. Only one of these variants (rs12016871) were unavailable in our dataset, and for this variant we followed M Brandkvist, JH Bjørngaard, RA Ødegård, BO Åsvold, ER Sund and GÅ Vie [47], and used variant rs4771122 as a proxy. IV-analyses were conducted with three instruments: Instrument 1: An unweighted genetic risk score (GRS) based on the sum of the number of BMI-increasing alleles, out of the 97 BMI-increasing alleles, for each participant. Instrument 2: A weighted GRS which was computed by multiplying the number of BMI-increasing alleles by the respective beta-coefficients from the study by AE Locke, B Kahali, SI Berndt, AE Justice, TH Pers, FR Day, C Powell, S Vedantam, ML Buchkovich and J Yang [29], and then summing the product for each participant. For the sex-stratified analyses we developed sex-specific weighted GRSs using the sex-specific beta-coefficients reported by AE Locke, B Kahali, SI Berndt, AE Justice, TH Pers, FR Day, C Powell, S Vedantam, ML Buchkovich and J Yang [29]. Instrument 3: Including the two genetic variants with the strongest association with BMI (FTO (rs1558902) and MC4R (rs6567160) as dummy variables.
2.4 Instrument validity
Three conditions must be satisfied for an instrument to be regarded as valid.
1) The instrument must be highly correlated with the variables being instrumented, conditional on the other variables in the model (the relevance assumption). In the GWAS that we used to select genetic variants for our instrument, each of the genetic variants were found to be associated with BMI [29], and this has been confirmed in a more recent GWAS [48]. However, the strength of the correlation between each genetic variant and BMI varied between the genetic variants. An instrument can be considered weak if the first-stage F-statistic in the IV-regressions is smaller than 10 (F<10) [49]. A weak instrument will bias the 2SLS estimates towards the OLS estimate [50]. Combining genetic variants into a GRS, as we have done in this study, leads to a higher first-stage F-statistic, and reduced bias [51].
2) There should not be any omitted variables (measured or unmeasured) on the pathway between the instrument and BMI (the independence assumption). The allocation of genes at conception can be regarded as random, and therefore we assume that the independence assumption largely holds. However, there are some potential problems, for example: i) mating can be non-random since individuals with similar phenotypes are more likely to mate (assortative mating), ii) genes are conditional on parents genes (dynastic effects), iii) one or more non-confounding variables can modify the effect of the genetic variants on healthcare costs (effect modification), iv) some allele variants are more likely to be inherited together than one would expect from chance (linkage disequilibrium) and if these overrepresented alleles lead to increased healthcare costs through pathways that are unrelated to BMI, then the study is likely to be biased [51]. And v) there are differences in the allele-frequencies of particular population sub-groups due to differential ancestry (population stratification) [52]. The independence assumption is not fully testable, but some methods, that will be described later, have been developed to assess some of the factors that can lead to violations of the independence assumption.
3) Each genetic variant should only affect healthcare costs via BMI (the exclusion restriction). This condition cannot be verified and will be violated if genetic variants that have been shown to be associated with BMI are simultaneously associated with other phenotypic traits that are unrelated to BMI (horizontal pleiotropic effects), but related to healthcare costs. Since the biological understanding of each of the genetic variants used in our study is limited, the extent of horizontal pleiotropy is unknown. Other possible violations of the exclusion restriction include, but are not limited to: linkage disequilibrium, population stratification, and effect modification. In the next section, we will describe analyses that we have done to investigate the potential violations of the IV-assumptions.
2.5 Sensitivity analyses
To investigate the possibility of bias in our estimates we conducted a series of sensitivity analyses, including two-sample methods, non-linear analyses, sex-, healthcare provider-, and age-specific analyses, within-family analyses, and outlier removal.
2.5.1 Two sample methods
The two-sample MR methods involved combining summary data on the genetic variant-BMI association from AE Locke, B Kahali, SI Berndt, AE Justice, TH Pers, FR Day, C Powell, S Vedantam, ML Buchkovich and J Yang [29] with data on the association between the genetic variants and GP-, specialist-, and total- costs, estimated from our data. The overlap between our sample and the data used by AE Locke, B Kahali, SI Berndt, AE Justice, TH Pers, FR Day, C Powell, S Vedantam, ML Buchkovich and J Yang [29] was minimal, and both studies provide results from samples with similar ancestries (European descent), and adjust for age and sex [29]. When the samples used in two-sample analyses are independent, the two-sample estimates will be biased towards zero, rather than the observational estimate [53].
We began by testing for heterogeneity of the genetic instruments using Cochran’s Q test. If there is more heterogeneity than one would expect from chance, then this might indicate violation of the IV-assumptions, for instance due to horizontal pleiotropy [54].
We further evaluate heterogeneity by applying the following two-sample methods: Inverse variance weighted estimation (IVW), MR-Egger regression (including MR-Egger with a SIMEX (Simulation Extrapolation) correction, weighted median estimation, and weighted mode-based estimation). These methods primarily examine violations of the exclusion restriction, particularly horizontal pleiotropy. The methods rely on different assumptions that can only be partially tested. Thus, comparing findings from the different methods is advantageous as this can reveal various potential threats to the IV-assumptions.
The Inverse variance weighted (IVW) method is a weighted linear regression of the summary SNP-healthcare cost association on the summary SNP-BMI associations, with the intercept constrained to zero [55, 56]. The IVW estimate is similar to the 2SLS estimate, and will be a poor estimate if there is bias due to pleiotropy. The IVW method also requires the InSIDE assumption (Instrument strength independent of direct effects), that any pleiotropic effects on the outcome are independent of the SNP-BMI associations, and the NOME assumption, that there is no measurement error in the SNPs exposure association, to hold.
The MR-Egger method is similar to the IVW method, but the intercept is not constrained to pass through zero [54]. As with the IVW method, MR-Egger also requires that the InSIDE and NOME assumptions hold [55]. Bias due to measurement error was estimated using the regression dilution statistic ( ). If was <90%, we used the SIMEX (Simulation Extrapolation) correction [55]. When this correction is applied the SNP-BMI associations are estimated in repeated simulations before they are combined with the SNP-healthcare cost associations [55]. It is not possible to test the InSIDE assumption, but if it holds the slope estimated from MR-Egger regression can be interpreted as the true estimate under pleiotropy [54], and the intercept can be interpreted as an estimate of the average pleiotropic effect across the instruments [55]. The MR-Egger estimates are less precise and have low power [57].
The weighted median approach uses the median of the inverse-variance weighted ratio estimates [57, 58]. This method is more robust to outliers than IVW and MR-Egger, and provides a consistent estimate if at least 50% of the weight comes from genetic variants that are valid instruments [57].
The weighted mode approach uses the mode of the inverse-variance weighted ratio estimates [58]. This approach is more powerful than the MR-Egger and less powerful than the IVW and weighted-median approaches, and will give a consistent estimate if the largest weights originate from valid genetic variants [59]. This method requires that the zero modal pleiotropy assumption (ZEMPA) (i.e. that the most common effect is a consistent estimate of the true causal effect) holds [59]. This assumption was assessed by constructing density plots and inspecting these fore multiple peaks [59].
To ease interpretation of our findings, we followed A Budu-Aggrey, B Brumpton, J Tyrrell, S Watkins, EH Modalsli, C Celis-Morales, LD Ferguson, GÅ Vie, T Palmer and LG Fritsche [60], and P Dixon, W Hollingworth, S Harrison, NM Davies and GD Smith [27], and transformed the two-sample estimates to natural BMI-units by dividing the estimates by the median standard deviation (4.6) of BMI reported by AE Locke, B Kahali, SI Berndt, AE Justice, TH Pers, FR Day, C Powell, S Vedantam, ML Buchkovich and J Yang [29]. As a result, the estimates from the two-sample methods can be interpreted as the marginal effect of a one-unit increase in BMI on healthcare costs.
2.5.2 Non-linear analyses
The associations between BMI and healthcare outcomes may be nonlinear [25, 61, 62]. Genetic variants explain a relatively small proportion of variance in BMI, and therefore non-linear effects might be challenging to detect. We used a method proposed by JR Staley and S Burgess [63] to assess non-linearity in studies using genetic variants as instruments. The method includes two tests for nonlinearity: a quadratic and a fractional polynomial test. The sample was divided into 10 strata using residual BMI, and then linear IV-regression estimates were calculated for each stratum by dividing the association between the GRS and each healthcare cost outcome by the association between the GRS and BMI. Next, a meta-regression was performed, where the estimated values for each stratum are regressed against the mean of BMI in each stratum using a flexible semiparametric framework.
2.5.3 Stratified analyses
We performed analyses stratified by sex, and specialist healthcare provider (somatic hospital care, psychiatric hospital care, providers of somatic and psychiatric care that were contracted to specialist care, and contacts related to interdisciplinary specialized drug treatment).
2.5.4 Within family analyses
These analyses can provide information about the effect of possible violations of the IV-assumptions due to assortative mating, dynastic effects and population stratification [64]. There are several proposed methodological variations of within-family analyses [64]. We conducted a 2SLS regression with family-fixed effects. These analyses require more power than population-based methods.
2.5.5 Outlier removal
We identified outlying genetic variants from forest plots of the effect of each of the genetic variants on GP- and specialist costs. Next, we used the the PhenoScanner database [65], to check if these genetic variants had been found to be associated with non-BMI related phenotypes. Then we explored the effect of removing the outermost outliers from the analysis. If removal of an outlier substantially alters the estimators, and the removed variant is an invalid instrument, then including this variant in our instrument may have biased the results.
2.6 Ethical approval
The study was approved by the Regional Committee for Ethics in medical research (2016/537/REK midt).
2.7 Software
STATA 15 was used for the regression analyses and R version 3.4.1 for data processing. The MR robust package [66] was used for two-sample MR analyses.