Mendelian randomization informs shared genetic etiology underlying exposure and outcome by interrogating correlated horizontal pleiotropy


 Mendelian randomization (MR) harnesses genetic variants as instrumental variables (IVs) to study the causal effect of exposure on trait/disease using summary statistics from genome-wide association studies. Classic MR assumptions are violated when IVs are associated with unmeasured confounders, i.e., when correlated horizontal pleiotropy (CHP) arises. Such confounders could be a shared gene or inter-connected pathways underlying exposure and outcome. We propose MR-CUE (MR with Correlated horizontal pleiotropy Unraveling shared Etiology and confounding), for estimating causal effect while identifying IVs with CHP and accounting for estimation uncertainty. For those IVs, we map their cis-associated genes and enriched pathways to inform shared genetic etiology underlying exposure and outcome. We apply MR-CUE to study the effects of interleukin 6 on multiple traits/diseases and identify several S100 genes involved in cross-traits etiology. We assess the effects of multiple exposures on type 2 diabetes across European and East Asian populations.


22
In the post-genome-wide association study (GWAS) era, many efforts were made to step beyond IVs with CHP have bi-directional effects and the effects are balanced with a mean of zero, Γ k = β 1 γ k + θ k + α k , k = 1, . . . , p. (1) where β 1 is the causal effect of exposure on outcome; θ k is the UHP effect, and α k is the 112 CHP effect of the k-th IV; and both the IV-to-outcome and IV-to-exposure effects, Γ k and 113 γ k , respectively, can be obtained from GWASs. We assume that all IVs may have UHP 114 effects, θ k , while only a proportion of IVs may also have CHP effects. As illustrated in Fig. 1d 115 right panel, we also assume that for an IV with CHP, it affects the exposure and confounder 116 proportionally, and γ k is the sum of IV-to-exposure effect via and not via confounders. We 117 rescale the IV-to-confounder effect to be 1 and the effect of confounders on exposure is then 118 γ k . In Fig. 1d, the line representing the direct effect from IV to exposure is omitted to avoid 119 over-parameterization since it is assumed to change proportionally with IV-to-confounder effect. 120 By definition, the CHP effect, α k , is correlated with IV-to-exposure effect γ k . Different than 121 existing method 9 assuming CHP effect being always proportional to γ k , here we argue that 122 different IVs may perturb the confounders (U , a gene, a gene set, or multiple pathways) in 123 slightly different ways and may lead to different CHP effects. Therefore, for an IV k with a 124 non-zero CHP effect, we decompose the effect into two components, α k = δγ k + α k . The first 125 component (δγ k ) is the IV-shared confounding effect, and is proportional to IV-strength, γ k .

126
The second component, α k , describes how the IV-specific perturbation on confounders (i.e., 127 the unique effect of the IV on the confounders) may affect outcome, and is independent of γ k . 128 We reparametraize our model as 129 Γ k = β 1 γ k + θ k , if k ∈ IV Set 1 with no CHP β 2 γ k + θ k + α k , if k ∈ IV Set 2 with CHP, where β 1 is the causal effect of interest and can be estimated using IVs without CHP. The 130 parameter β 2 = β 1 +δ is a nuisance parameter capturing both β 1 and δ, where δ is the IV-shared 131 confounding parameter due to CHP. MR-CUE is built on a Bayesian hierarchical model that 132 estimates the parameters from the above model and obtains inference via Gibbs sampling.

133
In Fig. 1b, we illustrate our model using a real data example to assess the causal effect of 134 BMI on TG. When plotting IV-to-BMI effects against IV-to-TG effects in Fig. 1b and L, the number of LD blocks) and the non-informative prior, Beta(1,1). Here, we considered 190 h 2 θ = 0.02 or 0.05, h 2 α = 0.05 or 0.1, and the correlation between α k and γ k being ρ αγ = 0.2.

191
Note that when ρ αγ = 0, only UHP is present. We also considered moderate and strong LD Additionally, we evaluated the performance of MR-CUE and other methods using real data 235 with negative and positive controls 17 , with varying IV selection thresholds. In the analyses of 236 negative control outcomes, we used self-reported tanning ability and hair color as outcome, 237 since both traits were largely determined at birth and were unlikely to be affected by other 238 traits we considered 18 . We considered 16 complex traits and diseases (  shown in the very right column. In Fig. 3a  e.g., COVID19 severity and susceptibility; any stroke (AS), any ischemic stroke (AIS) and 281 cardio-embolic stroke (CES). Those outcomes also presented similar patterns of CHP effects.

282
On the other hand, traits in mild-to-moderate genetic correlations, e.g., bone mass density 283 (BMD), blood urea nitrogen (BUN), major depressive disorder (MDD), bipolar disorder (BIP), 284 and schizophrenia (SCZ), may not share causal effect estimates but could still share CHP effect 285 patterns. CHP effects could be present when there are no causal effects. 286 We further identified the IVs with significant CHP effects, Pr(η l = 1|data) > 0.8, and 287 examined the genes in cis (1MB distance) and being associated with those IVs (p-value < 0.05).

288
The identified genes and gene sets may shed light on the shared pathways between IL-6 and 289 the examined complex outcomes. In Fig. 3b, we plotted the heatmap of selected cis-genes identified 13 S100 genes encoding S100 proteins located in the chromosome 1q21 region. The 296 S100 proteins belong to a family of calcium-binding cytosolic proteins and have a broad range 297 of intracellular and extracellular functions. The extracellular S100 proteins play a crucial role 298 in the regulation of immune homeostasis, post-traumatic injury, and inflammation 22 . S100 299 proteins trigger inflammation through their interactions with receptors for RAGE and TLR4 23 . 300 S100A12 has been shown to induce the production of pro-inflammatory cytokine IL-6 and IL-8 301 in both a dose-dependent and time-dependent manner 22 . Additionally, S100 proteins play a 302 significant role to the development of chronic inflammatory and auto-inflammatory diseases 24, 25 .

303
MR-CUE also identified some genes in cornified envelope pathway, SPRR family and IVL.

304
These genes together with S100 genes constituted the epidermal differentiation complex that are 305 essential for epidermal differentiation, building the first-line defense against external assaults 306 and protecting our bodies from dehydration 26 . Genes in ATPase complex were identified to 307 play a shared role as well. Existing literature 27 reported that the overexpression of KAT5 gene 308 potentiated transcription of downstream antiviral genes including IL-6. Other works 28 reported 309 that histone methyltransferase ASH1L suppresses TLR-induced IL-6 production.

310
The above analysis also showed that different IVs with CHP effects may be involved in identifies IVs with CHP at the desired confidence levels. 396 We studied the causal effects of IL-6 on multiple outcomes. By further examining the IVs 397 with significant CHP effects and their cis-associated genes, we highlighted multiple genes that 398 may be shared (also served as confounders) between IL-6 and some examined traits/diseases.

399
Those suggested genes included multiple S100 genes and genes in the cornified envelope pathway, The proposed MR-CUE models the IV-to-outcome effect as a function of IV-to-exposure, and uncorrelated and correlated horizontal pleiotropic effects: where θ k and α k capture UHP and CHP effects, respectively. The UHP effect is i.i.d. as 439 θ k ∼ N (0, σ 2 θ ). The IV-to-exposure effect (γ k ) and the CHP effect (α k ) are correlated, and 440 i.i.d. with a bivariate normal distribution: where ρ αγ is the correlation between γ k and α k .

460
Accounting for LD 461 We expand the MR-CUE model to allow for correlated IVs by modeling their LD structure. We 462 model the estimated effect sizes in both exposure and outcome diseases/traits with approximated 463 multivariate normal distributions 53 as follows, where R (l) emp ∈ R p l ×p l was the empirical correlation matrix for the l-th block in the panel data 490 and λ ≥ 0 was a shrinkage parameter. By obtaining all R (l) s, l = 1, . . . , L, we could further ob- Here we fixed the shrinkage parameter λ at 0.85 8 . group-level spike-slab prior as follows: where η l = 1 implies the IVs within the l-th block having nonzero projected CHP effects and 501 η l = 0 means the projected CHP effects being all zero for IVs in the block. Here, η l is a 502 Bernoulli random variable with probability ω being 1, η l ∼ ω η l (1 − ω) 1−η l .

503
Considering IVs in LD, we have the following mixture distribution for Γ lk that is similar to 504 Eqn. (8):
Since the estimated LD matrix is block-diagonal, the resulting Gibbs sampler can be performed Materials.

520
Generation of summary statistics in the simulation studies 521 We generated the summary statistics using simulated individual-level data. We first simulated 522 genotypes G x ∈ R nx×p , G y ∈ R ny×p and G r ∈ R nr×p for both exposure and outcome as well as 523 for an independent reference data, respectively, where n x , n y , and n r were the corresponding 524 sample sizes and p was the total number of IVs. We set the number of blocks L to be 100 or 525 200, and the number of IVs within a block to be 10, respectively. Correspondingly, the number 526 of IVs was either 1,000 or 2,000. For all simulations, we considered n x = 50, 000, n y = 50, 000 527 and n r = 4, 000.

528
We then generated a data matrix from a multivariate normal distribution N (0, Σ(r)), where U x ∈ R nx×q and U y ∈ R ny×q are the matrices for q confounders for exposure and 534 outcome, respectively; ψ x ∈ R q×1 and ψ y ∈ R q×1 are the corresponding vector of coefficients; 535 ǫ x ∈ R nx×1 and ǫ y ∈ R ny×1 are the random errors; and β 1 is the causal effect of interest. In all 536 simulations, we considered q = 50 and each column of U x and U y was sampled from a standard 537 normal distribution.The coefficients of these confounders, ψ x and ψ y , were sampled from a 538 bivariate normal distribution N (0, Σ ψ ), where Σ ψ was a two-by-two matrix with diagonal 539 elements of 1 and off-diagonal elements of 0.8. For CHP effects, we assumed γ k and α k following 540 a bivariate normal distribution N (0, Σ(ρ αγ )). We considered α k to be sparse, i.e., only 10% of 541 α k was sampled from the bivariate normal distribution and the others were zero. For UHP, we 542 assumed θ k to be dense and follow an independent normal distribution, N (0, σ 2 θ ). 543 We further performed the single-variant analysis to obtain summary statistics, { γ k , s γ k } 544 and { Γ k , s Γ k }, ∀k = 1, . . . , p, for both exposure and outcome, respectively. In the simulation 545 study, we controlled the magnitudes for γ, α and θ using h 2 γ = var(β 1 G 1 γ) var(y) , h 2 α = var(G 2 α) var(y) 546 and h 2 θ = var(G 2 θ) var(y) , respectively. We considered h 2 γ = 0.1 and varied h 2 θ ∈ {0.02, 0.05} and h 2 α ∈ {0.05, 0.1} to evaluate the performance of MR-CUE in selecting/identifying IVs with 548 CHP effects and in the control of type I error rates. To further examine the power, we varied 549 h 2 γ in a sequence of values from 0 to 0.1 while fixing other parameters.

550
Generation of summary statistics for reverse causation analysis 551 We considered the following structural model to generate individual-level data that is similar 552 to existing work 9 : where γ and θ are from two independent normal distributions.   The reverse causation estimation of TG on BMI confounded by CHP. When estimating the effect of BMI on TG, some IVs (red) are affected by CHP. Adjusting those IVs would lead to a significant effect estimate of BMI on TG,β 1(BMI→TG) = 0.262, and an insignificant reverse causal effect estimate from TG to BMI, β 1(TG→BMI) = 0.008. In this example, CHP would induce genetic confounding and introduce a significant and negative bias. Using estimated IVs with CHP, one may obtain significant causal and reverse causal effect estimates,β 2(BMI→TG) = −0.655, andβ 2(BMI→TG) = −0.443. (d) An illustration of the MR-CUE model. MR-CUE decomposes IVs into two sets, those not affected by CHP (left, η k = 0) and those affected by CHP (right, η k = 1). MR-CUE allows all IVs to have potential non-zero UHP effect, θ k . In (d) right panel, we assume that the IV affects the exposure and confounder proportionally, with a sum of IV-to-exposure effect of γ k . We rescale the IV-to-confounder effect to be 1 and the effect of confounders on exposure is then γ k (yellow line). The line representing the direct effect from IV to exposure is omitted since it is assumed to change proportionally with IV-to-confounder effect. The red line represents the decomposed and projected confounder-to-outcome effect and is also proportional to IV-strength, γ k . The IV-specific perturbation of confounders may induce an IV-specific bias, α k , which has a mean of zero.  The heatmap of a partial list of cis-genes that were significantly associated with at least one IV affected by CHP across multiple outcomes, with color indicating the strength of the most significant association for each gene. Cis-associations were assessed using blood tissue samples from the Genotype-Tissue Expression (GTEx) project for IVs with estimated CHP effect.