Genome Wide Pleiotropic Analysis to Identify Novel Variants and Improve Genetic Risk Score Construction

Systolic and diastolic blood pressure (S/DBP) are highly correlated and modiable risk factors for cardiovascular disease (CVD). We report here a bidirectional Mendelian Randomization (MR) and GWAS pleiotropy analysis of S/DBP summary statistics from large published BP GWAS and construct a composite genetic risk score (GRS), capturing respectively 21%, 11%, and 227% more of SBP, DBP and PP heritability than achieved with the traditional GRS. The composite GRS improves the prediction of hypertension and CVD in persons of European as well as African and Asian descent. We identied and conrmed 120 novel BP pleiotropic variants that are not in linkage disequilibrium with known variants, including 17 novel BP loci. We further observed signicant age-modulated genetic effects on BP, hypertension and CVD in both Europeans and Asians. Our study provides further insight into BP regulation and provides a novel way to construct a GRS for correlated traits. and pleiotropic Genetic risk scores (PGRS). We constructed a traditional genetic risk score using independent 1,616 genome wide signicant BP variants from UKB-ICBP. We rst constructed SBP and DBP weighted GRSs and then derived a single BP GRS as the average of SBP and DBP GRSs. This approach was used in the literature 6 to estimate the combined effect of the BP variants on BP and risks of hypertension and CVD. In addition, we constructed a pleiotropic genetic risk score PGRS using the 906 pleiotropic variants in a similar way. We rst constructed BP pleio1 and BP pleio2 weighted PGRSs and next derived a single PGRS as the difference of SBP and DBP GRSs. The composite GRS was constructed by joint model of GRS and PGRS in a linear regression. We also performed a linear regression by jointly modeling GRS and PGRS, as well as the interaction of age*GRS and age*PGRS on blood pressure in the UKB data. Similarly, we performed logistic regression of GRS, PGRS, age*GRS and age*PGRS on risk of hypertension and cardiovascular events at baseline in the UKB data. We examined whether PGRS is able to predict additional variations of BP, hypertension and CVD after accounting for GRS. We also examined the age-varying effects of GRS and PGRS by testing the interaction effects. Our analysis included 386,752 unrelated individuals of European ancestry with phenotypes measured at baseline, respectively. To assess the association of GRS, PGRS and their interactions with age on BP, risk of hypertension and CVD, we performed the regression analysis, with adjustment for sex, age, BMI, geographical region and 10 genetic principal components. CVD was dened in unrelated participants in UKB data on the basis of self-reported medical history and linkage to hospitalization and mortality data 6 . We assessed the association of the GRS, PGRS and their interactions with age on blood pressure in unrelated Africans (n = 7,904) and South Asians (n = 8,509) from the UKB to see whether blood pressure–associated SNPs identied from GWAS predominantly in Europeans are also associated with blood pressure in populations of


Introduction
Poorly controlled blood pressure (BP) accounts for a large portion of the excess risk for cardiovascular disease (CVD), stroke, and heart failure 1 . Understanding biological mechanisms for BP regulation could potentially help improve BP control and lead to a reduction in the burden of CVD. BP is characterized as systolic and diastolic blood pressure (SBP/DBP) and these measurements are long-standing risk predictors for CVD. Genome-wide association studies (GWAS) have to date been performed on BP traits separately by focusing on main effects. In studies that included subjects of diverse ancestry over 1,000 BP-associated loci have been identi ed [2][3][4][5][6][7][8][9][10][11][12][13][14] . Genome-wide search of gene-environment interactions on BP traits have been recently conducted, however limited new associations have been identi ed, in part because of low statistical power 13; 14 . Although many variants are shared among SBP and DBP as correlated traits, a portion of the variants are associated only with SBP or DBP separately, suggesting among the true positive associations there may be evidence of trait-speci c biological mechanisms. It has also been reported that a joint analysis of SBP and DBP leads to the identi cation of BP variants that are missed by analyzing SBP or DBP separately 9 . However prior studies have not addressed the mechanism underlying the relationship of SBP and DBP, which may re ect arterial stiffness or arterial compliance 15 . Dissecting the causal relationships of BP variants on SBP and DBP, in particular, whether the variants affect SBP and DBP through the same (mediation) or different (pleiotropic) paths, and how many pleiotropic variants contribute jointly to these highly correlated traits, may advance our understanding of the biology of BP regulation.
Genetic risk scores (GRS) are usually constructed as linear combinations of individual variant effects estimated from GWAS to predict individual-level risk of common disease. For BP, a nal GRS is the average of SBP-and DBP-speci c GRS 3; 6 and the GRS predicts the risk of CVD. However, published BP GRS's have explained no more than 6% of the heritability, and have limited predictive power for HTN and CVD. GWAS of gene-age interaction analysis have identi ed genetic variants with age-dependent effect sizes, including BP 16; 17 , lipid levels 18 and BMI 19 . A recent study based on a proportional hazards model reported age-varying risk pro les in nine diseases, including HTN 20 . However, these studies were under powered because the interactive contribution of a variant by age is often weak.
In this study we address the underlying mechanistic relationship between SBP and DBP by performing a bidirectional Mendelian Randomization (MR) 21 and GWAS pleiotropy analysis using the summary statistics from >750,000 subjects of European ancestry from the UKB and ICBP consortium 6 , followed by the summary statistics of 318,891 subjects of European ancestry from the Million Veteran Program (MVP) 12 . We searched for novel BP variants with pleiotropic effects and constructed a composite GRS using variants with and without pleiotropic effects, and studied the age-varying effects of GRS on prediction of BP, HTN and CVD in groups of European, African and Asian descent.

Results
We present a bi-directional MR analysis of SBP and DBP using the 1,125 and 1,183 independent genome-wide signi cant variants for SBP and DBP (P< 5×10 -8 ) as genetic instrumental variables (IVs) obtained from the UKB-ICBP GWAS 6 and software IMRP 22 and MRmix 23 (Online Methods). We standardized SBP and DBP and obtained an identical causal effect of SBP on DBP and DBP on SBP (0.864 ± 0.005 and 0.862 ± 0.005 by IMRP, respectively, Supplementary We next extended the pleiotropic effect analysis to search for variants with evidence of pleiotropy for SBP and DBP across the whole genome (OnlineMethods). The Manhattan and QQ plots for testing pleiotropy are presented in Fig 1. LD score regression analysis 24 estimated that 8.7% of the heritability was due to pleiotropic variants. We observed 906 independent variants (r 2 <0.05) reaching genome-wide signi cance level in pleiotropy tests (P<5×10 -8 ), 234 of which had not been detected by the univariate GWAS analysis of SBP, DBP or PP in the original UK Biobank+ICBP consortium 6 (Supplementary Table 2). Among the 234 variants, 201 variants were novel and were not reported in any previous BP GWAS. In the set of associations, 163 variants in 124 loci were within a 1Mb region of previous reported known BP loci but were not in LD with known BP variants (r 2 <0.05); the remaining 38 variants were 1Mb away from the previous reported known BP loci, and resided in 35 loci; the corresponding locus zoom plots are presented in Supplementary   Table 2), suggesting the identi ed novel pleiotropic signals are not false positives. Functional analysis. We performed expression quantitative trait locus (eQTL) analysis using GTEx data. Among the 35 novel loci, we identi ed 26 novel loci with expression quantitative trait locus (eQTL) (Supplementary Table 4) and 11 novel loci with Splicing Quantitative Trait Loci (sQTLs) (Supplementary Table   5). The eQTLs were most often enriched in arterial tissues, followed by adipose, heart and nerve tibial tissues. SNP rs17713879 is an eQTL affecting expression of the SH3YL1 and ACP1 genes in 34 tissues and is also a sQTL affecting splicing of these two genes in 50 tissues. SNP rs112500920 is an eQTL affecting expression of several genes, including EFL1 and AB3B2, in multiple tissues, notably adipose and arterial tissues. SNP rs12478520 is an eQTL affecting expression of multiple genes, including C2orf72, HTR2B, ARMC9 and PSMD1.
We de ned mediation variants as those variants associated with SBP or DBP with P< 5 × 10 -8 but not signi cant in a test for pleiotropy, and pleiotropic variants as the variants reaching genome wide signi cance (P< 5 × 10 -8 ) in a pleiotropy test using the UKB-ICBP data. We assessed tissue enrichment of BP loci using DEPICT 27 at a false discovery rate (FDR) < 5% but separated mediation from pleiotropic variants in the analysis. There were 1,324 independent mediation variants and 906 independent pleiotropic variants. DEPICT analysis identi ed enrichment across 42 and 72 tissues and cells using mediation and pleiotropic variants, respectively (Supplementary Table 6). The enriched tissues are highly similar (correlation=0.78) but notable differences were also observed (Supplementary Figure 2a). Enrichment was greatest for arteries in the cardiovascular system for both mediation and pleiotropic variants (P= 2.8 × 10 -4 and 2.41 × 10 -11 , respectively). In general, enrichments observed for mediation variants were also observed for pleiotropic variants, but not vice versa. For example, heart ventricle, heart valves, heart atria and the atrial appendage were enriched for pleiotropic variants (P< 1.23 × 10 -3 ) but not for mediation variants (Supplementary Table 6). Pathway enrichments for mediation variants and pleiotropic variants were less well correlated (correlation=0.478, Supplementary  Figure 2b). Pleiotropic variants were enriched in many molecular pathways that were missed by mediation variants, including oxygen levels, basement membrane, and renal system development (P<2.23×10 -7 for pleiotropic variants, Supplementary Table 7). Evaluation of enriched mouse knockout phenotype terms pointed to the importance of abnormal kidney morphology, impaired wound healing and increased body weight, etc (P< 3.28 × 10 -8 ) for pleiotropic variants, although these were absent for mediation variants. Protein-protein interaction subnetwork enrichments were also substantially different; using the pleiotropic variants we identi ed the HSPG2, DDR1, MMP9, COL4A4, TGM2 and COL4A5 subnetworks (P<1.90× 10 -7 ) which were missed relying solely on mediation variants (Supplementary Table 7).
Improved prediction of BP, hypertension and CVD by including pleiotropic variants. It is known that polygenic scores derived from multiple related traits can improve prediction of outcomes 28-31 but these approaches do not use the pleiotropic variants directly. Here we constructed a traditional BP GRS using all independent 1,616 BP variants from UKB-ICBP. We further constructed a pleiotropic genetic risk score (PGRS) using the 906 pleiotropic variants (Online Methods). We jointly modeled GRS and PGRS adjusting for age, gender, BMI and 10 principal components, and observed PGRS signi cantly predicted BP traits, as well as risk of HTN and CVD conditional on GRS in all the models (Table 2A and Figure 2) in UKB European ancestry subjects. The variance explained by GRS only was 5.91%, 6.09% and 2.23% for SBP, DBP, and PP, increased to 7.13%, 6.75% and 7.27% when including PGRS, respectively, suggesting independent additive predictive power for the PGRS. Interestingly the PGRS had opposite directional effects for SBP and DBP. We observed odds ratios (ORs) 1.64 and 1.15 for individual GRS and PGRS on risk of HTN (P < 1 × 10 -300 ), respectively. The observed ORs of individual GRS and PGRS for CVD were 1.21 and 1.06 (P = 1.71 × 10 -231 and P = 2.74 × 10 -25 ), respectively, and increased to 1.66 and 1.19 (P = 9.83 × 10 -97 and P = 1.28 × 10 -13 ) when comparing between the upper and lower quantiles of the GRS and PGRS, respectively. However, the odds ratios were further increased to 6.78 and 2.44 (P < 1 × 10 -300 and P = 1.15 × 10 -39 ) for HTN and CAD, respectively, when comparing the top decile and quintile with bottom decile and quintile of GRS and PGRS ( Figure 2). We observed a clear advantage of including PGRS over GRS only ( Figure 2). After including the interactions of age and GRS and PGRS the main effects for GRS and PGRS on BP, HTN and CVD were unchanged. However, we observed a signi cant interaction effect of age and GRS for all BP traits and HTN, but not for CVD. The interaction of age and PGRS signi cantly contributed to all BP traits, HTN and CVD (P value between 3.0×10 -2 and 7.74×10 -29 , Table 2A).
Extension to other ancestries. We examined associations with BP and CVD of the above de ned GRS and PGRS in unrelated African (N=7,904) and South Asian (N=8,509) subjects in the UKB (Table 2 B, C). Although sample sizes were much smaller than among UKB European subjects, the GRS was signi cantly associated with SBP, DBP, PP, HTN and CVD in both UKB African-and Asian-descent subjects. PGRS was associated with a 1.21, -0.35 and 1.56 mmHg higher for SBP, DBP and PP (P =8.34 × 10 -8 , 1.09 × 10 -2 and 2.05 × 10 -27 ) in Africans, a 1.25, -0.64 and 1.89 mmHg higher (P =3.21 × 10 -10 , 1.69 × 10 -8 and 1.50 × 10 -46 ) in Asians, respectively. The ORs of PGRS for HTN and CVD were signi cant (P = 9.27 × 10 -5 and P = 2.81 × 10 -2 ) in Asians but not in Africans. The effects of PGRS on DBP were also negative as observed in UKB Europeans. Signi cant interactions of age and PGRS were again observed for BP traits and CVD in UKB Asians (Table 2 C).
Mendelian Randomization of BP on CAD, MI and stroke. We downloaded the published GWAS summary statistics for CAD, MI and stroke and performed MR analysis of BP on CAD, MI and Stroke, using mediation and pleiotropic variants as IVs separately (Online Methods). Both SBP and DBP causally contributed to CAD, MI and stroke using mediation as the IVs. The new trait -BP pleio -also causally contributed to CAD, MI and stroke using pleiotropic variants (Online Methods, Table 3). The estimated ORs ranged from 1.66 to 1.85 per SD unit increase in BP on the three outcomes using mediation variants, and ranged from 1.13 to 1.48 using pleiotropic variants. Our analysis identi ed 10 to 22% IVs demonstrating pleiotropic effects for BP and the three clinical outcomes (Table 3).

Discussion
Our analysis of the summary statistics derived from over 1 million subjects describe important aspects of the genetic architecture of the two principle highly correlated BP traits. The bi-directional MR analysis of SBP and DBP demonstrated that 1 SD unit increase of SBP leads to 0.86 SD unit increase of DBP, and vice versa, indicating SBP and DBP share 74.8% of the variation, presumably attributing to common factors and common biological mechanisms. This shared causal contribution is substantially higher than 55.6% estimated by phenotype correlation analysis between SBP and DBP. Most of the BP variants identi ed through SBP or DBP univariate association were mediation variants and these variants contribute to the shared causal contribution and would be expected to be found in either the SBP or DBP GWAS. The pleiotropic variants are associated with the rest of the BP variation, which is conferred through different biological pathways for SBP and DBP. The pleiotropic variants often demonstrated an effect size that was opposite in direction for SBP and DBP and contributed 8.71% heritability of the new BP trait. The pleiotropy de ned trait BP pleio was highly correlated with PP (ρ ≥0.66) in UKB whites, taken as an indicator of arterial stiffness and considered as an independent risk factor for CVD 32 . Thus, the BP pleio can be an independent risk factor for CVD. This is consistent with the nding that SBP, DBP and BP pleio all contributed to risk estimation of CAD, Stroke and MI in the MR analysis (Table 3).
In addition to a large number of known genetic variants that contribute to SBP and DBP reported by the original UK Biobank+ICBP consortium 6 , we identi ed 906 independent variants demonstrating pleiotropy evidence (P<5×10 -8 ), 201 were undetected by the univariate GWAS analysis of SBP, DBP or PP. Replication analysis in the MVP con rmed 120 of the 201 novel variants, including the 17 novel loci.
In addition to the traditional BP GRS calculated in literature 3; 6 , we de ned a composite genetic risk score consisting of GRS and PGRS. The joint model of GRS and PGRS suggested both genetic risk scores independently predict BP, HTN and CVD outcomes (Table 2) in the UK Biobank European ancestry subjects. Additionally, including the PGRS led to substantial increments in heritability for BP traits (Table 2 and Figure 2). Although we observed consistent opposite directional effects of PGRS for SBP and DBP in UK Biobank Europeans, Africans and Asians, the prediction of HTN and CVD risk was signi cantly improved by including PGRS (Figure 2 C and D). The GRS and PGRS de ned on European participants both consistently and signi cantly predict BP, HTN and CVD in UK Biobank Africans and Asians, suggesting that PGRS is able to improve prediction accuracy across ethnic populations. Recent studies have suggested that current GRS models alone have modest improvement of predictive accuracy for CAD [33][34][35] . The principle outcome of this set of analyses, therefore, demonstrated that adding PGRS signi cantly improves the prediction model over the GRS alone. This approach of constructing polygenic risk score is conceptually different from the existing approaches using multiple related traits [28][29][30][31] and could be generalized to other diseases by incorporating multiple disease-related traits through pleiotropy analysis.
Our analysis avoided an examination of the interaction of individual variants and age because of insu cient power. We were able to observe the interaction effects of both GRS and PGRS with age in UKB Europeans for SBP, DBP, PP and HTN although the interaction for CVD was only signi cant for PGRS ( Table 2). The age-PGRS interactions were also replicated in Asians despite a substantially smaller sample size. We observed that the interaction contribution to phenotype variation was consistently small (range from 0.1% to 0.3% BP heritability in both UKB Europeans and Asians). The negative interaction contributions of both GRS and PGRS on DBP may partially explain the decline of DBP after 60 years older 36 . In comparison, the interaction of age and GRS was positive for SBP, suggesting genetic effect on SBP increases in older individuals. As noted, the GRS interaction effects in UKB Europeans could not be observed in UKB Africans or Asians, likely as a result of small sample sizes. In comparison, the age-modulated interaction of PGRS were observed in both UKB Europeans and Asians, indicating stronger PGRS interaction effects than for the GRS. In our functional analysis, we observed a wider range of BP related tissues and biological pathways for the BP pleiotropic variants than mediation variants, which may imply that the pleiotropic variants are in uenced by a wider range of environmental factors and therefore continue to make an evolving genetic contribution over the age span. Our study also supports an omnigenic model for complex traits [37][38][39] . In fact, it could be inferred that a pleiotropic variant acts on multiple peripheral genes to impact the expression of a core gene in different biological pathways. As a result, the pleiotropic variants have weak effects on a phenotype and are more di cult to detect in a traditional BP GWAS that focuses on single trait analysis, as we observed in the current analysis. In comparison, mediation variants may be more likely to occur in core genes, although it is clearly important to detect the putative set of core genes as well. We acknowledge that the data presented here can only provide suggestive evidence for that hypothesis.
In conclusion, the new ndings we bring forward here include the 906 independent BP pleiotropic variants -201 of which were previously not identi ed in traditional BP GWAS -and a novel way to construct polygenic risk score represent a substantial advance in understanding the genetic architecture of highly  12 The MVP summary statistics were used for replication analysis.
Mendelian Randomization analysis.We applied the iterative Mendelian randomization and pleiotropy analysis (IMRP) 22 and MR mixture model (MRmix) 23 for bi-directional MR analysis of SBP and DBP, as well as to estimate the causal contributions ofBP on CAD, MI and STROK. IMRP is an iterative approach by combining the pleiotropy test and the MR analysis. The iteration starts by performing MR-Egger analysis 40 to estimate causal effect of an exposure to outcome, following by inverse variance weighted (IVW) 41; 42 analysis until the causal effect estimation converges.At each iteration step, IMRP perform pleiotropy test to update which genetic instrument variants showing pleiotropy evidence (P<0.05) by performing the test , where and are the estimated effect sizes of an IV on the exposure and the outcome, respectively, and is the causal estimate which is updated at each iterative step. IMRP takes the advantages of MR-Egger, which is less bias, and IVW, which is e cient. IMRP can be applied to GWAS summary statistics of an exposure and an outcome obtained with overlapped or non-overlapped samples. To ensure the causal estimate is robust, we also applied a substantially different MR approach MRmix 23 , which is an estimating equation approach that assumes follow a normal mixture model.MRmix usually shows a good trade-off between bias and variance even with more than 50% invalid IVs 23 .
GWAS of pleiotropy analysis for SBP and DBP. We rst perform a bi-directional MR analysis by IMRP to estimate the causal effect using genome wide signi cant independent variants associated with SBP and DBP separately in UKB-ICBP summary statistics. We then perform the pleiotropy test to all the 7.1M SNPs by xing the causal effect estimated in the IMRP analysis. This is equivalent to performing GWASs for two new traits: BP pleio1 =DBP SBPand BP pleio2 =SBP DBP, where and are the estimated causal effects of SBP on DBP and DBP on SBP, respectively. We noted BP pleio1 and BP pleio2 were highly correlated and represented essentially one phenotype BP pleio . We declared a variant is signi cant when its P-value is less than 5×10 8 . In the replication analysis in using MVP summary statistics, we applied the causal effect estimate obtained in UKB-ICBP data. Thus, the replication analysis had the same analytic model as in the UKB-ICBP.
Genomic in ation and confounding. We applied the LD score regression method 24 to test for genomic in ation in the GWAS pleiotropy analysis. It is expected that BP pleio will have large genomic control in ation coe cient because of large sample sizes and dense genetic variants in high LD 43 . The GC lambda was 1.533 and LDSR intercept was 1.057 (0.013), with in ation ratio 4.23%, suggesting little in ation in the pleiotropic analysis.
Novel locus de nition. Novel loci were de ned as the variants reaching genome wide signi cance and are 1Mb away from known BP variants as well as LD r 2 <0.05 with any known BP variants. Novel signals at known loci are variants within 1 Mb region of known BP variants and reach genome-wide signi cance level, as well as not being in LD with any known BP variants (r 2 < 0.05). The 1000G European ancestry data was used as the reference genetic data for LD calculation.
Functional analyses. We evaluated all sentinel SNPs at novel loci for evidence of mediation of expression quantitative trait loci (eQTL) and alternative splicing quantitative trait loci (sQTL) in all 44 tissues using the Genotype-Tissue Expression (GTEx) database. Following the method in Evangelou et al. 6 , a locus is annotated with a given eGene(sGene) only if the most signi cant eQTL(sQTL) SNP for the given eGene(sGene) is in high LD (r 2 ≥ 0.8) with the sentinel SNP.
We performed overall enrichment testing but using the mediation and pleiotropic variants separately. We used DEPICT 27 (Data-driven Expression Prioritized Integration for Complex Traits) to identify tissues and cells that are highly expressed at genes within the BP mediation loci, as well as BP pleiotropic loci. We also used DEPICT to test for enrichment in gene sets associated with biological annotations including GO ontology, mouse knockout phenotype studies and protein-protein interaction subnetworks. We reported signi cant enrichments with a false discovery rate < 0.05. Analysis was done using the platform: Complex-Traits Genetics Virtual Lab 44 .
Genetic risk score (GRS) and pleiotropic Genetic risk scores (PGRS). We constructed a traditional genetic risk score using independent 1,616 genome wide signi cant BP variants from UKB-ICBP. We rst constructed SBP and DBP weighted GRSs and then derived a single BP GRS as the average of SBP and DBP GRSs. This approach was used in the literature 6 to estimate the combined effect of the BP variants on BP and risks of hypertension and CVD. In addition, we constructed a pleiotropic genetic risk score PGRS using the 906 pleiotropic variants in a similar way. We rst constructed BP pleio1 and BP pleio2 weighted PGRSs and next derived a single PGRS as the difference of SBP and DBP GRSs. The composite GRS was constructed by joint model of GRS and PGRS in a linear regression. We also performed a linear regression by jointly modeling GRS and PGRS, as well as the interaction of age*GRS and age*PGRS on blood pressure in the UKB data. Similarly, we performed logistic regression of GRS, PGRS, age*GRS and age*PGRS on risk of hypertension and cardiovascular events at baseline in the UKB data. We examined whether PGRS is able to predict additional variations of BP, hypertension and CVD after accounting for GRS. We also examined the age-varying effects of GRS and PGRS by testing the interaction effects. Our analysis included 386,752 unrelated individuals of European ancestry with phenotypes measured at baseline, respectively. To assess the association of GRS, PGRS and their interactions with age on BP, risk of hypertension and CVD, we performed the regression analysis, with adjustment for sex, age, BMI, geographical region and 10 genetic principal components.
CVD was de ned in unrelated participants in UKB data on the basis of self-reported medical history and linkage to hospitalization and mortality data 6 .
We assessed the association of the GRS, PGRS and their interactions with age on blood pressure in unrelated Africans (n = 7,904) and South Asians (n = 8,509) from the UKB to see whether blood pressure-associated SNPs identi ed from GWAS predominantly in Europeans are also associated with blood pressure in populations of non-European ancestry. All the analyses were performed using the residuals after adjusting for sex, age, BMI, geographical region and 10 genetic principal components.
Cross-trait lookups of novel loci: We supplied the index SNPs at the novel loci observed in UK Biobank-ICBP pleiotropic analyses to FUMA 26