Background: Technological advances and decreasing costs have led to increasingly dense genotyping data, making the identification of potential causal or candidate markers feasible. Custom genotyping chips, which offer a cost-effective way to combine medium-density genotypes with a custom genotype panel, can capitalize on these candidates to potentially improve accuracy and interpretability in genomic prediction. A particularly promising model to this end is BayesR, which divides markers into four effect size classes (null, small, medium, and large). The flexibility of BayesR has been shown to yield accurate predictions and to hold promise for quantitative trait loci (QTL) mapping in real data applications, but extensive benchmarking on simulated data is currently lacking.
Results: Based on a set of real genotypes, we generated simulated data under a variety of genetic architectures, phenotype heritabilities, and polygenic variances, and we evaluated the impact of excluding (50k genotype data) or including (50k custom genotype data) causal markers among the genotypes. We defined several statistical criteria for QTL mapping using BayesR output (maximum a posteriori rule, non-null maximum a posteriori rule, posterior variance, weighted cumulative inclusion probability), including several based on sliding windows rather than individual markers to account for linkage disequilibrium. We compared these statistics on their ability to accurately prioritize known causal markers. Overall, we confirm the strong predictive performance of BayesR for moderately to highly heritable traits, particularly for 50k custom data; in cases of low heritability or weak linkage disequilibrium with the causal marker in 50k genotypes, QTL mapping is a challenge, regardless of the criterion used.
Conclusion: BayesR is a promising approach to simultaneously obtain accurate predictions and interpretable classifications of SNPs into effect size classes. Although QTL mapping is unsurprisingly easiest for highly heritable phenotypes and large-effect QTLs, we illustrated the performance of BayesR across a variety of simulation scenarios and compared the advantages and limitations of each mapping criterion. Among the criteria considered, the weighted cumulative inclusion probability appears to provide the best mapping results, even under less favorable conditions. Finally, we quantified the advantage that can be gained by incorporating causal mutations on a custom genotyping chip.
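
As an informal illustration of the marker-level criteria named in the Results, the sketch below shows one plausible reading of the maximum a posteriori rule and its non-null variant, applied to a matrix of posterior class membership probabilities (one row per SNP, columns ordered null, small, medium, large). The function names, the toy probabilities, and the simple sliding-window average are assumptions made for illustration only. In particular, the windowed average is not the weighted cumulative inclusion probability, whose exact definition is not given here.

```python
import numpy as np

def map_class(post):
    """Most probable effect size class per SNP (0 = null, ..., 3 = large)."""
    return np.argmax(post, axis=1)

def non_null_map_class(post):
    """Most probable class among the non-null classes only (1, 2, or 3).

    Assumed reading of a 'non-null MAP rule': the null column is ignored.
    """
    return np.argmax(post[:, 1:], axis=1) + 1

def inclusion_probability(post):
    """Posterior probability that a SNP falls in any non-null class."""
    return 1.0 - post[:, 0]

def sliding_window_mean(values, width):
    """Unweighted mean over consecutive windows of `width` SNPs (hypothetical)."""
    kernel = np.ones(width) / width
    return np.convolve(values, kernel, mode="valid")

# Toy posterior class probabilities for four SNPs (made-up values; rows sum to 1).
post = np.array([
    [0.90, 0.08, 0.01, 0.01],
    [0.50, 0.30, 0.15, 0.05],
    [0.10, 0.20, 0.30, 0.40],
    [0.85, 0.05, 0.07, 0.03],
])

map_labels = map_class(post)
non_null_labels = non_null_map_class(post)
pip = inclusion_probability(post)
window_scores = sliding_window_mean(pip, width=2)
```

In this toy input, only the third SNP is assigned a non-null class by the plain rule, while the non-null variant always reports which non-null class is most likely, which is why the two rules can rank markers differently.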