Sample. Data for this project come from the UK Biobank (UKB Application 57923). The full UKB dataset includes approximately 500K UK individuals35–37. This sample is substantially larger and more representative of the UK general population than any other publicly available, genetically informed dataset. For the current analyses, we restrict the sample to individuals of European ancestry for two reasons. First, allele frequencies vary across ancestral populations, so if samples from multiple ancestral populations are analyzed jointly, population stratification could undermine the validity of the genetic associations38,39. Including ancestry principal components (PCs) reduces, but does not entirely remove, the impact of population stratification40. Second, existing non-European sample sizes, both in our sample and in other available datasets, do not provide sufficient power for genome-wide analyses41,42. The resulting European-ancestry sample nonetheless provides ample power to detect genomic moderation.
Outcome Variables. Our analyses focus on three lipid traits: LDL-c, HDL-c, and TG. All lipid assays were performed on blood samples in accordance with internationally recognized testing and calibration procedures. Lipids were measured in mmol/L and converted to mg/dL to simplify interpretation for American audiences. Outcomes were treated as quantitative traits in the moderated GWAS analyses.
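The unit conversion is a per-lipid multiplication. The minimal sketch below assumes the conventional factors (38.67 mg/dL per mmol/L for cholesterol fractions and 88.57 for triglycerides); the text does not state which factors were used, and the helper name `to_mg_dl` is hypothetical.

```python
# Assumed conventional conversion factors (not stated in the text):
# cholesterol fractions use 38.67, triglycerides use 88.57 (mg/dL per mmol/L).
MG_DL_PER_MMOL_L = {"LDL-c": 38.67, "HDL-c": 38.67, "TG": 88.57}


def to_mg_dl(lipid: str, value_mmol_l: float) -> float:
    """Convert a lipid concentration from mmol/L to mg/dL."""
    return value_mmol_l * MG_DL_PER_MMOL_L[lipid]


ldl_mg_dl = to_mg_dl("LDL-c", 3.0)   # about 116 mg/dL
tg_mg_dl = to_mg_dl("TG", 1.5)       # about 133 mg/dL
```

Because the conversion is linear, it rescales regression coefficients but does not change test statistics or p-values.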
Moderator. Statins are a class of medications that were developed in the 1990s to lower cholesterol, specifically LDL-c. Numerous statins are commonly prescribed for hyperlipidemia. We focus on patients who self-reported taking atorvastatin, fluvastatin, pravastatin, rosuvastatin, or simvastatin (or their brand-name equivalents). Patients were coded as statin users if they took any statin, regardless of whether they also took another cholesterol-lowering medication (e.g., PCSK9 inhibitors) or used vitamin/dietary supplements to reduce cholesterol (e.g., niacin). Patients who used only vitamin/dietary supplements were coded as non-statin users. Patients who used only PCSK9 inhibitors to reduce their cholesterol were excluded from all analyses.
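The coding rules above can be written as a small classifier. This is a sketch under stated assumptions: self-reports are taken to be already mapped to lowercase generic names, `pcsk9_inhibitor` and `supplement` are hypothetical category labels, and participants reporting no lipid-lowering treatment are assumed to be non-users. The text does not specify how a PCSK9 inhibitor combined with supplements (but no statin) was handled; here that case falls through to non-user.

```python
from typing import Iterable, Optional

STATINS = frozenset({"atorvastatin", "fluvastatin", "pravastatin",
                     "rosuvastatin", "simvastatin"})
PCSK9 = "pcsk9_inhibitor"    # hypothetical category label
SUPPLEMENT = "supplement"    # hypothetical label, e.g. niacin


def code_statin_use(reported: Iterable[str]) -> Optional[int]:
    """Return 1 (statin user), 0 (non-statin user), or None (excluded)."""
    items = {r.strip().lower() for r in reported}
    if items & STATINS:
        return 1      # any statin, regardless of other lipid-lowering treatment
    if items == {PCSK9}:
        return None   # PCSK9 inhibitors as the only treatment: excluded
    # Supplements only, or (assumed) no lipid-lowering treatment at all.
    # PCSK9 inhibitor plus supplements also lands here (unspecified in text).
    return 0
```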
Genotypes. DNA was extracted from blood samples provided by all consenting participants. DNA preparation and extraction were performed at UKB, genotyping was conducted by Affymetrix, and initial quality control of the data was conducted by the Wellcome Trust Centre for Human Genetics. Genotype imputation was conducted in IMPUTE2 using UK10K as a reference panel. Imputed genotype data were used in all moderated GWAS analyses. Analyses were restricted to individuals of European ancestry to reduce the impact of population stratification on the results. When an individual's genotype at a given SNP did not pass quality control thresholds, that individual was excluded from the analysis of that SNP. This resulted in a maximum sample size of 389,440.
Covariates. To minimize the impact of known predictors of lipid variability, all moderated GWAS analyses controlled for biological sex, age, and the first 10 ancestry principal components (PCs), as well as the interactions between each PC and statin use. The statin-PC interactions further account for the possibility of population stratification that is specific to the SNP-statin interaction coefficient and not shared with the genetic main effects.
Genome-Wide Gene-Statin Interaction Analysis. All moderated GWAS analyses of the raw data were conducted in R43 using the moderated GWAS functions in GW-SEM44,45. Moderated GWAS is an extension of standard GWAS, in which a series of regression analyses is conducted, one per SNP in a genomic assay: the phenotype is regressed on that SNP while adjusting for a set of predefined covariates, such as age, sex, and the first 10 ancestry principal components to account for unobserved population stratification46. The standard GWAS model for each variant is thus:
$$Y_i = \widehat{\beta}_0 + \widehat{\beta}_1\,\mathrm{SNP}_{ij} + \widehat{\gamma}\,\mathrm{Covariates}_i + \epsilon_i$$
where, in the current case, \(Y_i\) is the level of LDL-c, HDL-c, or TG for the ith person, \(\mathrm{SNP}_{ij}\) is the genotype of the ith person at the jth genetic variant, and \(\mathrm{Covariates}_i\) are the covariates for the ith person (e.g., sex, age, and PCs). The estimate of the genetic association is \(\widehat{\beta}_1\), the parameter of primary interest in most GWAS. The other estimates, \(\widehat{\beta}_0\) and \(\widehat{\gamma}\), are the intercept and the regression coefficients for the covariates.
Genome-wide moderation analyses expand the standard GWAS model by regressing a phenotype on three independent variables, in addition to the covariates: (a) each SNP in the genomic assay, (b) each individual's statin use (the moderator), and (c) the interaction between each SNP and the moderator. To further rule out the possibility that population stratification biases the interaction coefficients for individual variants, interactions between the moderator and each of the first 10 ancestry principal components are also included as covariates13. Accordingly, the model for the genome-wide moderation analyses for each variant is:
$$Y_i = \widehat{\beta}_0 + \widehat{\beta}_1\,\mathrm{SNP}_{ij} + \widehat{\beta}_2\,\mathrm{Statin}_i + \widehat{\beta}_3\,\mathrm{SNP}_{ij}\,\mathrm{Statin}_i + \widehat{\gamma}\,\mathrm{Covariates}_i + \epsilon_i$$
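To make the per-SNP model concrete, the sketch below fits it by ordinary least squares for a single simulated SNP, including the statin-PC interaction covariates. It is a stand-alone illustration, not GW-SEM's implementation (GW-SEM is an R package); the column order, the helper names `design_row`, `invert`, and `ols`, the use of 2 rather than 10 PCs, and all simulated effect sizes are hypothetical. Dropping the Statin, SNP × Statin, and PC × Statin columns would reduce it to the standard GWAS model.

```python
import random


def design_row(snp, statin, covariates, pcs):
    """One design-matrix row: intercept, SNP, Statin, SNP x Statin,
    covariates, PCs, then PC x Statin interactions (assumed order)."""
    return ([1.0, snp, statin, snp * statin]
            + list(covariates) + list(pcs)
            + [pc * statin for pc in pcs])


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols(X, y):
    """Least-squares coefficients and their variance-covariance matrix."""
    n, k = len(X), len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    xtx_inv = invert(xtx)
    beta = [sum(xtx_inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [yi - sum(r * b for r, b in zip(row, beta)) for row, yi in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)  # residual variance
    vcov = [[sigma2 * xtx_inv[a][b] for b in range(k)] for a in range(k)]
    return beta, vcov


# Simulated data for ONE SNP (illustrative values, not UKB data).
random.seed(42)
X, y = [], []
for _ in range(2000):
    snp = sum(random.random() < 0.3 for _ in range(2))  # allele count 0/1/2
    statin = 1.0 if random.random() < 0.3 else 0.0
    sex = float(random.random() < 0.5)
    age = random.gauss(55.0, 8.0)
    pcs = [random.gauss(0.0, 1.0) for _ in range(2)]     # 2 PCs instead of 10
    ldl = (130 + 4 * snp - 35 * statin - 3 * snp * statin
           + 2 * sex + 0.2 * age + random.gauss(0.0, 2.0))
    X.append(design_row(snp, statin, [sex, age], pcs))
    y.append(ldl)

beta, vcov = ols(X, y)
b1, b3 = beta[1], beta[3]  # SNP main effect and SNP x Statin interaction
var_b1, var_b3, cov_b1_b3 = vcov[1][1], vcov[3][3], vcov[1][3]
```

In a genome-wide run the same fit would be repeated for every SNP; `b1`, `b3`, `var_b1`, `var_b3`, and `cov_b1_b3` are exactly the quantities the marginal-effect calculations require.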
In the moderated GWAS model, both main effects, \(\widehat{\beta}_1\) and \(\widehat{\beta}_2\), are conditional on the interaction term \(\widehat{\beta}_3\): \(\widehat{\beta}_1\), for example, is the effect of the SNP when Statin = 0. Importantly, \(\widehat{\beta}_3\) provides a test of whether the effect of a SNP on the phenotype changes across levels of the environment. Because this parameter is difficult to interpret directly, it is advantageous to calculate genetic marginal effects to examine how genetic associations vary across levels of an environment46.
Finally, the interaction coefficient provides a formal test of the difference in the marginal genetic effects (for each locus) between statin users and non-users; that is, it tells us whether the genetic associations differ between the statin and non-statin groups. This coefficient must be interpreted with caution, because the interaction may amplify or diminish the main effect of the SNP on the phenotype (lipid level), potentially leading to nonsensical conclusions. For example, if the interaction effect (\(\widehat{\beta}_3\)) is in the opposite direction of the main effect (\(\widehat{\beta}_1\)), it is possible to obtain a significant interaction coefficient without a significant genetic association in either the statin or the non-statin group. Accordingly, we restrict our discussion of significant interaction coefficients to those with a meaningful substantive interpretation, that is, those for which the marginal genetic effect is significant in at least one group (statin users or non-users).
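The reporting rule above is a two-part filter. A minimal sketch, assuming the conventional genome-wide threshold of 5 × 10⁻⁸ for both the interaction and the marginal effects (the text does not state the threshold); the function name, SNP IDs, and p-values are hypothetical.

```python
GENOME_WIDE_ALPHA = 5e-8  # assumed threshold; not stated in the text


def substantively_interpretable(p_interaction, p_me_nonusers, p_me_users,
                                alpha=GENOME_WIDE_ALPHA):
    """Keep a SNP only if its SNP x Statin interaction is significant AND
    the marginal genetic effect is significant in at least one group."""
    return p_interaction < alpha and min(p_me_nonusers, p_me_users) < alpha


# Hypothetical SNP IDs and p-values: (interaction, non-users, users).
results = {
    "snp_1": (1e-12, 1e-30, 1e-9),  # interaction and both groups significant
    "snp_2": (1e-9, 0.02, 0.03),    # interaction significant, neither group
    "snp_3": (0.4, 1e-20, 1e-18),   # no interaction
}
kept = [snp for snp, ps in results.items() if substantively_interpretable(*ps)]
```

Here `snp_2` is the problematic case described in the text: its interaction passes the threshold, but neither group shows a significant genetic association, so it is set aside.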
Marginal Genetic Effects. Summary statistics from the moderated GWAS of each lipid were used to calculate the marginal genetic effects and their standard errors for statin users and non-users. All calculations were performed automatically in GW-SEM45. A genetic marginal effect is the association between a SNP and a phenotype at a specific level of an environment. In a standard GWAS model, the genetic marginal effect of the SNP is the regression coefficient (\(\widehat{\beta}_1\)); effectively, it is the effect of the SNP on the phenotype at the mean level of the moderating environment. In a moderated GWAS, the genetic marginal effect is a function of both the genetic and the moderating factors46. To calculate the genetic marginal effect (\(\widehat{\beta}_{ME}\)), we take the first derivative of the moderated (GxE) GWAS model with respect to the SNP, which yields:
$$\widehat{\beta}_{ME} = \widehat{\beta}_1 + \widehat{\beta}_3\,\mathrm{Statin}$$
Because respondents either took a statin (Statin = 1) or did not (Statin = 0), the marginal effect for the no-statin group is simply \(\widehat{\beta}_1\), and the marginal effect for the statin group is \(\widehat{\beta}_1 + \widehat{\beta}_3\). We can then calculate the standard error of each genetic marginal effect (\(SE_{ME}\)) from the variance-covariance (vcov) matrix of the moderated GWAS model for each SNP:
$$SE_{ME} = \sqrt{\sigma^{2}_{\beta_1} + \sigma^{2}_{\beta_3}\,\mathrm{Statin}^{2} + 2\,\sigma_{(\beta_1,\beta_3)}\,\mathrm{Statin}}$$
inserting the value of Statin (0 or 1) for the group whose genetic marginal effect is being calculated. Note that for the no-statin group, the SE reduces to \(\sqrt{\sigma^{2}_{\beta_1}}\), which is simply the standard error of \(\widehat{\beta}_1\). The additional terms in the statin group's SE account for the sampling variance of the interaction effect and its covariance with the main effect. After the genetic marginal effect and its standard error are calculated, the z-statistic and p-value follow directly for use in subsequent analyses. This process is repeated for each SNP analyzed. It is automated in GW-SEM45, which is the only software platform that currently stores the \(\sigma_{(\beta_1,\beta_3)}\) statistic needed to calculate the standard error of the marginal effects.
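The SE formula follows from the variance of a linear combination, \(\mathrm{Var}(\widehat{\beta}_1 + \widehat{\beta}_3 S) = \sigma^{2}_{\beta_1} + S^{2}\sigma^{2}_{\beta_3} + 2S\,\sigma_{(\beta_1,\beta_3)}\). The sketch below computes the marginal effect, its SE, the z-statistic, and a two-sided normal p-value for one SNP in each group. It is an illustration rather than GW-SEM's code: the function name `marginal_effect` is hypothetical and the input estimates are made up.

```python
import math


def marginal_effect(b1, b3, var_b1, var_b3, cov_b1_b3, statin):
    """Genetic marginal effect, its SE, z, and two-sided p at Statin = 0 or 1."""
    beta_me = b1 + b3 * statin
    se_me = math.sqrt(var_b1 + var_b3 * statin ** 2 + 2 * cov_b1_b3 * statin)
    z = beta_me / se_me
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return beta_me, se_me, z, p


# Made-up estimates for one SNP: b1, b3, var(b1), var(b3), cov(b1, b3).
est = (2.0, -1.5, 0.04, 0.09, -0.04)
non_users = marginal_effect(*est, statin=0)
users = marginal_effect(*est, statin=1)
```

With these numbers the non-user effect is 2.0 (SE 0.2, the plain SE of \(\widehat{\beta}_1\)) and the user effect is 0.5 (SE ≈ 0.224); the negative covariance noticeably shrinks the statin-group SE relative to ignoring it.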
Pharmacogenomic Analyses. The aim of the pharmacogenomic analyses was to collate our moderated GWAS results with pharmaceutical information on the genetic targets of existing drugs, using the pharmGKB.org database (the most extensive pharmacogenomic database available)19. First, we extracted the known gene targets of the five statins used as moderators in our analyses: atorvastatin, fluvastatin, pravastatin, rosuvastatin, and simvastatin. The bottom triad of Fig. 4 presents the gene targets shared by two or more of the statins; a full list of gene targets is available in SI 7. Second, we queried the pharmGKB.org database for drugs known to target the five genes with significant interaction effects: APOB, FADS2, LDLR, SUGP1, and APOE. The corresponding gene-drug combinations for the genes that interacted with statin use are presented in the left triad of Fig. 4. Finally, we queried the pharmGKB.org database for drugs that target the 15 genes that remained significant for people taking statins. The corresponding gene-drug combinations for these residual genetic associations in the statin group are presented in the right triad of Fig. 4.
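Once drug-to-gene target lists have been exported, the collation steps reduce to set operations. The sketch below assumes such an export is available as a dictionary; it does not show how PharmGKB itself is queried, and every drug and gene name in it is a placeholder, not a real PharmGKB record.

```python
from collections import Counter


def shared_targets(targets_by_drug, min_drugs=2):
    """Genes listed as targets of at least `min_drugs` different drugs."""
    counts = Counter(g for genes in targets_by_drug.values() for g in set(genes))
    return sorted(g for g, c in counts.items() if c >= min_drugs)


def drugs_for_genes(query_genes, targets_by_drug):
    """Map each query gene to the sorted list of drugs that target it."""
    hits = {gene: [] for gene in query_genes}
    for drug, genes in sorted(targets_by_drug.items()):
        for gene in set(genes) & set(hits):
            hits[gene].append(drug)
    return hits


# Placeholder mapping (hypothetical names only).
targets = {
    "statin_A": ["GENE_1", "GENE_2"],
    "statin_B": ["GENE_2", "GENE_3"],
    "statin_C": ["GENE_1", "GENE_2", "GENE_4"],
}
shared = shared_targets(targets)                       # targets of 2+ statins
hits = drugs_for_genes(["GENE_2", "GENE_5"], targets)  # gene -> drugs lookup
```

`shared_targets` mirrors the "two or more statins" selection, and `drugs_for_genes` mirrors the gene-to-drug lookups for the interacting and residual genes.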