Multi-omics-data-assisted genomic feature preselection improves the accuracy of genomic prediction

DOI: https://doi.org/10.21203/rs.3.rs-26522/v1

Abstract

Background: Multi-omics data (e.g., genomics, transcriptomics, proteomics, and metabolomics) are now available for genomic prediction. Omics data not only offer new data layers for genomic prediction but also provide a bridge between organismal phenotypes and genome variation that cannot be readily captured at the genome sequence level. Therefore, using multi-omics data to select feature markers is a feasible strategy to improve the accuracy of genomic prediction. In this study, four strategies for single-nucleotide polymorphism (SNP) preselection that combine whole-genome sequencing (WGS) and gene expression level data were investigated for genomic prediction in the Drosophila Genetic Reference Panel.

Results: Using genomic best linear unbiased prediction (GBLUP) with complete WGS data, the prediction accuracy values were 0.208±0.020 (0.181±0.022) for the startle response and 0.272±0.017 (0.307±0.015) for starvation resistance in the female (male) lines. Compared with GBLUP using complete WGS data, neither GBLUP nor the genomic feature BLUP (GFBLUP) improved the prediction accuracy when using SNPs preselected from the complete WGS data based on the results of genome-wide association studies (GWASs) or transcriptome-wide association studies (TWASs). Furthermore, with SNPs preselected from the WGS data based on the results of the expression quantitative trait locus (eQTL) mapping of all genes, accuracy exceeded that of GBLUP with the complete WGS data only for the startle response, with best values of 0.243±0.020 and 0.220±0.022 in the female and male lines, respectively. Importantly, by using SNPs preselected based on the results of the eQTL mapping of significant genes from TWAS, both GBLUP and GFBLUP resulted in a greater accuracy and smaller bias of genomic prediction. For the startle response, the best accuracy values were 0.258±0.019 (0.237±0.019) for GBLUP and 0.265±0.018 (0.245±0.020) for GFBLUP in the female (male) lines. For starvation resistance, the best accuracy values were 0.437±0.015 (0.427±0.015) for GBLUP and 0.419±0.016 (0.390±0.014) for GFBLUP in the female (male) lines. Compared to the GBLUP with complete WGS data, the best accuracy values represented increases of 27.40% and 35.36% for the startle response and 60.66% and 39.09% for starvation resistance in the female and male lines, respectively.
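
The relative gains can be recomputed from the accuracy values reported above. The following is a minimal sketch, not the authors' code: it assumes that the "best" accuracy for each trait and sex is the larger of the GBLUP and GFBLUP values obtained with TWAS-guided eQTL preselection, and that the gain is expressed relative to GBLUP with complete WGS data. The names `baseline`, `best`, and `relative_gain` are illustrative only.

```python
# Baseline accuracies: GBLUP with complete WGS data (values from the abstract).
baseline = {
    ("startle response", "female"): 0.208,
    ("startle response", "male"): 0.181,
    ("starvation resistance", "female"): 0.272,
    ("starvation resistance", "male"): 0.307,
}

# Assumption: "best" is the larger of the reported GBLUP and GFBLUP accuracies
# obtained with SNPs preselected via eQTL mapping of TWAS-significant genes.
best = {
    ("startle response", "female"): max(0.258, 0.265),
    ("startle response", "male"): max(0.237, 0.245),
    ("starvation resistance", "female"): max(0.437, 0.419),
    ("starvation resistance", "male"): max(0.427, 0.390),
}


def relative_gain(base: float, new: float) -> float:
    """Return the percentage increase of `new` over `base`."""
    return 100.0 * (new - base) / base


# Percentage gain per (trait, sex), rounded to two decimals.
gains = {key: round(relative_gain(baseline[key], best[key]), 2) for key in baseline}

for (trait, sex), gain in gains.items():
    print(f"{trait:22s} {sex:7s} +{gain:.2f}%")
```

Taking the maximum over the two models means the best startle response values come from GFBLUP, whereas the best starvation resistance values come from GBLUP.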

Conclusions: Overall, multi-omics data can assist genomic feature preselection and improve the performance of genomic prediction. The new knowledge gained from this study will enrich the use of multi-omics in genomic prediction.
