Incorporating functional annotation with bilevel continuous shrinkage for polygenic risk prediction

Background: Genetic variants can contribute differently to trait heritability by their functional categories, and recent studies have shown that incorporating functional annotation can improve the predictive performance of polygenic risk scores (PRSs). In addition, when only a small proportion of variants are causal variants, PRS methods that employ a Bayesian framework with shrinkage can account for such sparsity. It is possible that the annotation group level effect is also sparse. However, the number of PRS methods that incorporate both annotation information and shrinkage on effect sizes is limited. We propose a PRS method, PRSbils, which utilizes the functional annotation information with a bilevel continuous shrinkage prior to accommodate the varying genetic architectures both on the variant-specific level and on the functional annotation level. Results: We conducted simulation studies and investigated the predictive performance in settings with different genetic architectures. Results indicated that when there was a relatively large variability of group-wise heritability contribution, the gain in prediction performance from the proposed method was on average 8.0% higher AUC compared to the benchmark method PRS-CS. The proposed method also yielded higher predictive performance compared to PRS-CS in settings with different overlapping patterns of annotation groups and obtained on average 6.4% higher AUC. We applied PRSbils to binary and quantitative traits in three real world data sources (the UK Biobank, the Michigan Genomics Initiative (MGI), and the Korean Genome and Epidemiology Study (KoGES)), and two sources of annotations: ANNOVAR, and pathway information from the Kyoto Encyclopedia of Genes and Genomes (KEGG), and demonstrated that the proposed method holds the potential for improving predictive performance by incorporating functional annotations. Conclusions: By utilizing a bilevel shrinkage framework, PRSbils enables the incorporation of both overlapping and non-overlapping annotations into PRS construction to improve the performance of genetic risk prediction. The software is available at https://github.com/styvon/PRSbils

to the heritability of complex traits, and recent studies have shown that the incorporation of 48 functional annotation can improve the predictive performance of PRS (Hu et al., 2017;49 Marquez-Luna et al., 2020). In addition, for certain phenotypes where only a small 50 proportion of variants are causal, PRS methods with Bayesian continuous shrinkage 51 framework have been proposed to account for such sparsity (Ge et al., 2019). It is possible 52 that the annotation group level effect is also sparse. Although existing PRS methods improve 53 prediction accuracy through utilizing GWAS summary statistics and accounting for the 54 potential sparsity of the genetic architectures, the number of PRS methods that incorporate annotation information and apply continuous shrinkage on effect sizes is limited. For 56 example, the sparsity assumption for the underlying genetic architectures is generally made 57 either on a global level (Ge et al., 2019) or by partitioning variants into bins with similar sum 58 of squared posterior mean effect sizes (Chun et al., 2020;, and 59 there is a lack of study on addressing sparsity across the annotation groups. where ∼ (0, 2 ), ( 2 ) ∝ −2 , i.e., | , , 2 follows a multivariate Normal 78 distribution with mean and covariance matrix 2 . 79 For high dimensional genetic data, the number of genetic variants is much larger than the 80 number of individuals , and it is often assumed that the genetic effect vector is sparse, 81 meaning that only a small amount of the variants are associated with the outcome 82 phenotype. Under this sparsity assumption, the prior distribution of can be chosen to be 83 either a discrete or a continuous mixture of Normal distributions. The discrete mixture type 84 of prior is also known as the spike-and-slab prior (George and McCulloch, 1993), and is a 85 combination of a point mass at 0 and a density for the non-zero part. 86 The continuous mixture type of prior assigns with a continuous distribution centered at 0. 87 One commonly used set of priors is the global-local shrinkage priors (Polson and Scott, 2010), 88 which utilizes both a global shrinkage parameter 2 and local marker-specific parameters 89 standard half-Cauchy distribution can be decomposed into a scale mixture of inverse 132 Gamma distributions (Makalic and Schmidt, 2015 = ∑ { : = }̃ is the group-wise score, and is the group-specific weight for 155 group . We obtain the estimates for 1 , . . . , through 10-fold cross-validation using a 156 separate validation data, which includes individual-level genotype and phenotype data. We 157 use this group-wise combination approach based on the observation that signal-enriched 158 annotations are more informative for prediction, and weighting each partition differently 159 can further improve PRS performance (Chun et al., 2020). 160 We also investigated a hybrid method that is a combination of PRSbils and conventional 161 PRS methods. In this case, we construct the PRS score by 162 where ̃′ denotes the genetic effect size generated from conventional PRS methods. 164 PRSbils with overlapping annotation assignment Under the situation where a variant 165 belongs to multiple annotation groups, for example, one variant can be involved in multiple pathways, we account for such a variant separately in each of the annotation groups it 167 belongs to. The assumption underlying this is that variants with more annotations have 168 potentially larger contribution to the heritability and therefore will be shrunk less and have 169 larger posterior effect sizes in general (Chen et al., 2020). The total number of variants in the 170 overlapping setting will become ′ = ∑ ∑ =1 =1 ( ∈ ) , and the local shrinkage 171 parameters will be assigned to each of the ′ variants: 172 For all genetic data in the UK Biobank, MGI and KoGES, NCBI Build 37/UCSC hg19 was used 222 for genomic coordinates. We further restricted our analysis to HapMap3 SNPs with minimum 223 MAF > 0.01, HWE p-value > 1 × 10 −6 , variant call rate > 95%, individual missing rate < 1%, 224 and LD-pruning R2<0.99. LD information from 503 European samples in the 1000 Genomes 225 reference panel for the UK Biobank and MGI data, and 1KG East Asian reference panel was 227 used for the KoGES data. 228

Simulation studies 229
We conducted simulation studies to compare the performance of PRSbils to PRS-CS. We also 230 evaluated a hybrid method of PRSbils with PRS-CS, in which the scores from PRS-CS was 231 combined with the one from the proposed method. We also compared the predictive 232 performance with LDpred-funct for non-overlapping annotation groups. A total of = 233 125,000 SNPs were sampled from the UK Biobank data with above-mentioned quality 234 control filters, with 1KG as LD reference panel. The sampled variants were then assigned to 235 different annotation groups, which explain 1 , . . . , % of the total heritability ℎ 2 236 respectively. For each annotation group , the proportion of causal variants is denoted as . 237 Genetic effect sizes were generated from a mixture of point-Normal models specified as: 238 We investigated five simulation settings with different , and with non-overlapping 240 annotation groups, i.e., each variant is mapped to one and only one annotation ( Table 1). For 241 settings 1-4, we fixed the number of annotation groups to 4 ( = 4), the proportion of causal 242 variants in each group to 0.5%, 1%, 1.5%, 2% respectively, and vary the proportion of the 243 total heritability explained by each group from a relatively sparse scenario ( = 244 (0,0,10%, 90%)) to a more balanced scenario ( = (25%, 25%, 25%, 25%)). For setting 5, we changed the number of annotation groups to = 10, and considered a situation with 246 more group-wise sparsity where only two of the groups contribute to the total heritability. 247 In addition, we simulated two settings (settings 6 and 7 in Table 1) with different 248 overlapping patterns (Supplementary Figure 1). For both settings, we used four annotation 249 groups contributing 0,0,10%,90% to the total heritability. Overlapping pattern I was used in 250 setting 6, where the intersection over union metric (IOU) was higher among the annotation 251 groups with low heritability contribution. For setting 7, overlapping pattern II was used, 252 where IOU was higher among the annotation groups with high heritability contribution. 253 We then simulated the phenotypes using the sum of all SNPs weighted by their 254 corresponding genetic effect sizes, together with a Normal random error term to fix the 255 heritability at ℎ 2 = 0.7. 256 To obtain the summary statistics, we performed GWAS to calculate the marginal genetic 257 effect size estimates ̂ using SAIGE 20 version 0.44.3, which is a computationally efficient 258 method that controls for case-control imbalance as well as potential sample relatedness, 259 among = 50,000 simulated individuals. The summary statistics were used as input 260 for PRSbils and PRS-CS. 261 The prediction performance was evaluated for both methods on a separate test set consisting 262 of = 24,000 simulated individuals. AUC and 2 were used to measure the prediction 263 accuracy. To obtain AUC, we binarize the phenotypes assuming those with top 10% highest 264 phenotype values as the true at-risk population.
We analyzed both binary and quantitative traits for three data sources, i.e., the UK Biobank 267 data, the MGI data, and the KoGES data. For the UK Biobank and the MGI data, we studied 268 type II diabetes as a binary trait, and BMI and LDL as quantitative traits. For the KoGES  Biobank. Two types of group information were used for PRSbils. The first was 186 pathway 290 annotations from the Kyoto Encyclopedia of Genes and Genomes (KEGG), which includes the 291 networks for metabolism, genetic information processing, environmental information 292 processing, cellular processes, organismal systems, human diseases and drug 293 development (Kanehisa and Goto, 2000). Variants were first mapped to Ensembl genes by 294 position, and then from genes to KEGG pathways. The mapping of genetic variants to KEGG 295 annotations is not unique, which means each variant can have multiple KEGG annotations. 296 The second one was six non-overlapping Refseq gene-based functional annotations (i.e. 297 exonic/splicing; ncRNA; UTR5/ UTR3; intronic; upstream/ downstream; intergenic) 298 obtained using ANNOVAR (Wang et al., 2010). For variants without any available 299 annotations, PRSbils assigned them to a separate group for no annotations. 300 For each trait, we adopted a 10-fold cross-validation approach where the parameters were 301 estimated using a random sample of 9/10 of the test set, and the performance was validated 302 on the rest 1/10 of the samples in terms of and 2 . For the binary trait, we used Efron's 303 pseudo 2 (Efron, 1978) instead of the ordinary least square 2 . For the continuous traits 304 BMI and LDL, we calculated the AUC by binarizing them with thresholds of 25 (threshold for 305 overweight) and 4.1 mmol/L (or 160mg/dL, threshold for high LDL) respectively. 306 307

Simulation study results 309
We evaluated prediction performance of PRSbils, PRS-CS and the hybrid method in seven 310 simulation settings (Table 1). For the settings with non-overlapping annotation groups 311 (Settings 1-5), we also evaluated the performance of LDpred-funct. Since the estimated per-312 SNP heritability might not be stable due to the relatively small sample size, we used the true 313 stratified heritability values for LDpred-funct. 314 Simulation Settings 1-4 consider a fixed number of total annotation groups = 4. In Setting 1 and Setting 5, two groups contribute 10% and 90% to the total heritability, but 331 the total number of annotation groups differ ( = 4 in Setting 1 and = 10 in Setting 5). 332 PRSbils yields similar performance in these two settings, but has a slightly larger variability 333 in Setting 5, which yielded with an average AUC of 0.574 (95%CI: [0.552,0.595]) (Figure 1). 334 This is likely because the proposed method only shrinks groups with no heritability 335 contribution to a small value but not exactly to zero, and therefore the large proportion of 336 no heritability annotation groups, the noisier the PRS value will be, making the performance 337 to fluctuate more. 338 In Settings 6 and 7, we investigated the influence of using different overlapping patterns of 339 annotation groups on the performance (Figure 2). PRSbils yielded higher predictive 340 performance compared to the benchmark method in both settings, with an average 8.4% 341 gain in AUC for Setting 6 and 0.8% for Setting 7. An explanation for the difference in the 342 performance gain is that the high IOU between annotation group 3 and group 4 in Setting 7 343 resulted in a higher correlation in the group-wise shrinkage parameters, which reduced 344 PRSbils's ability to differentiate between these groups with different heritability 345 contribution. These results suggest that under the overlapping annotation scenario, 346 choosing an annotation mapping which better separates the potential heritability-347 contributing sets may improve the prediction accuracy. 348

Biobank data analysis 349
We used the summary statistics as the training data to obtain the posterior genetic effect 350 estimates, and evaluated the performance of both the proposed and existing PRS methods 351 using the UK Biobank data (Figure 3, Supplementary Figure 2), the MGI data (Figure 4, Supplementary Figure 3) and the KoGES data (Supplementary Figure 4) as the test data. 353 Each box plot contains the 10 results from the 10-fold cross validation using the test data. 354 For the UK Biobank data, PRSbils with KEGG annotation outperformed PRS-CS for both type 355 II diabetes and BMI. PRSbils yielded a 9.9% improvement in AUC over PRS-CS for type II 356 diabetes on average, and a 1.5% average improvement for BMI. When gene-based 357 annotations derived from ANNOVAR were used, from PRSbils was on average 3.8% 358 higher than PRS-CS for BMI, yet the predictive performance for type II diabetes was similar 359 among the methods. For only one scenario with LDL trait using gene-based annotation, 360 PRSbils yielded lower prediction performance than PRS-CS. We also note that the prediction 361 performance varies across different populations for different annotation information. For 362 the MGI and KoGES data, PRSbils did not yield better prediction performance than the 363 benchmark method for the traits analyzed. One potential factor that can lead to this 364 difference in performance is the cohort difference: The UKB is population-based and consists PRSbils can be applied to both non-overlapping and overlapping annotations. When the 391 annotation categories overlap with each other (i.e., one SNP can belong to multiple 392 annotation categories), the posterior effect size is calculated and incorporated into the PRS 393 separately for each category a variant belongs to. The underlying assumption for this framework is that SNPs belonging to more annotation categories are prioritized for genetic 395 risk calculation as they are more likely to be causal. It is similar to the idea of penalizing the 396 SNPs with multiple annotations less than those with only one annotation category in a 397 penalized regression framework (Chen et al., 2020). As has been illustrated in the simulation 398 studies, sparsity of the underlying heritability enrichment from each annotation group is a 399 key factor for the predictive performance of PRSbils. When multiple groups are included in 400 an annotation categorization, PRSbils is expected to yield a larger performance improvement In addition, the application of the proposed method using the KEGG annotations explores the 412 pathway-level knowledge of polygenic risk for complex diseases, and provides an alternative 413 way to stratify genetic liability in addition to the commonly used functional annotation, 414 which adds to existing literature's ongoing investigation into the use of pathway PRS as a 415 more informative way for patient stratification and treatment response prediction (Choi et 416 al., 2021). The average shrinkage for each KEGG annotation group can provide biological or clinical interpretations such as how different pathways weigh in terms of their relative 418 importance for disease risk prediction. We illustrate this with the group-wise average 419 shrinkage results from the UKB analysis (Supplementary Figure 7). 420 The shrinkage parameters in PRSbils control the degree of shrinkage across annotation 421 categories, and are automatically learned from the summary statistics in this study. An 422 alternative way to specify the values for is to fix them using prior knowledge about the 423 annotation-level sparseness of the genetic architecture. If the sample size of the training set 424 to train is small, we expect the latter approach to yield higher predictive performance, 425 because the current fully Bayesian approach would generate less stable estimates for the 426 shrinkage parameters under this situation. Indeed, with additional simulations we 427 confirmed that the performance of the current PRSbils approach would have a higher 428 variance when the sample size for the training set was small (Supplementary Figure 8). 429 We also explored the influence of annotation misclassification on prediction performance 430 through additional simulation studies. Genotype and phenotype data were generated using 431 the same settings as in Settings 1-5 (Table 1) with "true" corresponding annotation group 432 assignment. Then, the variants were assigned a random "observed" annotation group with 433 equal probability to train and test for prediction performance. The results showed that 434 misclassification of annotation groups have negative impact on the predictive performance 435 of PRSbils, especially when there is larger group sparsity in each annotation group's 436 contribution to heritability (Supplementary Figure 9). Thus, to achieve good performance, 437 it is important to ensure that the group annotations well depicts the underlying heritability 438 structure. 439 We note several points that can be further improved in future studies for PRSbils. Firstly, 440 from the predictive performance of PRSbils evaluated in UK Biobank, MGI, and KoGES data, 441 we noted that the results were not consistent across different data sources, which indicates 442 the influence of cohort difference over predictive performance when annotation information 443 is incorporated. It is thus critical to investigate the difference in the underlying architecture 444 of the annotation groups across populations and make the method more robust for   Table 1. Summary of parameter settings in the simulation study. A total of = 125,000 538 SNPs were sampled from the UK Biobank data with quality control filters, with 1KG as LD 539 reference panel. The sampled variants were then assigned to different annotation groups, 540 which explain 1 , . . . , % of the total heritability ℎ 2 respectively. For each annotation group 541 , the proportion of causal variants is denoted as . Overlapping pattern I was used in 542 setting 6, where the intersection over union metric (IOU) was higher among the annotation 543 groups with low heritability contribution. For setting 7, overlapping pattern II was used, 544 where IOU was higher among the annotation groups with high heritability contribution. 545