An empirical comparison between polygenic risk scores and machine learning for case/control classification

doi:10.21203/rs.3.rs-1298372/v1


Research Article



This work is licensed under a CC BY 4.0 License

Version 1


Background

We compared the procedures for calculating polygenic risk scores (PRS) and for training machine learning models on simulated data, devised a way to compare machine learning results with PRS, and highlighted the file formats required for PRS calculation and machine learning model training. For PRS calculation, we used three tools: Plink, PRSice, and Lassosum. For machine learning, we used artificial neural networks.
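As a point of reference, the additive form that underlies most PRS calculations can be sketched in a few lines of Python. This is a minimal illustration, not the procedure implemented by Plink, PRSice, or Lassosum: the function name `polygenic_risk_score`, the toy genotype matrix, and the effect sizes are all hypothetical, and steps such as clumping, thresholding, or shrinkage are deliberately left out.

```python
from typing import Sequence


def polygenic_risk_score(dosages: Sequence[float],
                         effect_sizes: Sequence[float]) -> float:
    """Additive score: sum over variants of effect size times effect-allele count."""
    if len(dosages) != len(effect_sizes):
        raise ValueError("dosages and effect_sizes must have the same length")
    return sum(beta * dosage for beta, dosage in zip(effect_sizes, dosages))


# Hypothetical toy data: 3 individuals x 4 variants, effect-allele counts in {0, 1, 2}.
genotypes = [
    [0, 1, 2, 0],
    [2, 0, 1, 1],
    [1, 1, 0, 2],
]
# Illustrative per-variant effect sizes (in practice these come from GWAS summary statistics).
betas = [0.5, -0.25, 0.125, 1.0]

scores = [polygenic_risk_score(row, betas) for row in genotypes]
print(scores)
```

Each individual receives one score, which can then be used to rank individuals or be thresholded to predict case/control status.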

Results

Based on our survey, we cannot say whether machine learning or polygenic risk scores perform better, because the answer depends on the phenotype under consideration. On simulated data, the average classification AUCs of PRSice, Plink, Lassosum, and machine learning were 0.27, 0.3, 0.35, and 0.87, respectively.
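To make the AUC comparison concrete, the sketch below computes AUC as the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half. The same function applies to a PRS and to the output probability of a neural network, which is one simple way to put both on a common scale. The function name `roc_auc` and all labels and scores are hypothetical illustrations, not data or code from the manuscript.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (case, control) pairs ranked correctly; ties count 0.5."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical toy data: 1 = case, 0 = control.
labels = [1, 1, 0, 0, 1, 0]
prs_scores = [0.2, 0.9, 0.4, 0.1, 0.3, 0.8]  # illustrative PRS values
ml_probs = [0.7, 0.9, 0.2, 0.1, 0.6, 0.4]    # illustrative network outputs

auc_prs = roc_auc(labels, prs_scores)
auc_ml = roc_auc(labels, ml_probs)
print(f"PRS AUC: {auc_prs:.3f}, ML AUC: {auc_ml:.3f}")
```

Under this definition, an AUC of 0.5 corresponds to random ranking, and a value below 0.5 means the score ranks controls above cases more often than the reverse. Negating such a score yields an AUC of one minus the original value.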

Conclusion

This article presents the comparison method in an automated form that can support a range of analyses. For instance, datasets with different heritability or genetic variation can be generated, and the effect on the accuracy of machine learning algorithms and of PRS can be studied. Such analyses may require generating multiple datasets, calculating PRS, and training machine learning models, all of which can be done quickly using the code segments and scripts provided in this manuscript. We also compared the steps of PRS calculation with those of machine learning and found that some steps are optional in machine learning.
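One way to picture how datasets with different heritability might be generated is a liability-threshold simulation, sketched below. This is an assumption made for illustration and is not the simulation pipeline used in the manuscript: the function names `simulate_case_control` and `rescale`, the allele-frequency range, and the normally distributed effect sizes are all hypothetical choices.

```python
import math
import random
import statistics
from typing import List, Tuple


def rescale(values: List[float], target_variance: float) -> List[float]:
    """Center the values and scale them to the requested (population) variance."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd * math.sqrt(target_variance) for v in values]


def simulate_case_control(
    n_individuals: int, n_variants: int, heritability: float,
    prevalence: float = 0.5, seed: int = 0,
) -> Tuple[List[List[int]], List[int], List[float], List[float]]:
    """Simulate genotypes and case/control labels under a liability-threshold model."""
    if not 0.0 <= heritability <= 1.0 or not 0.0 < prevalence < 1.0:
        raise ValueError("heritability must be in [0, 1] and prevalence in (0, 1)")
    rng = random.Random(seed)
    # Assumed allele frequencies and effect sizes (illustrative distributions).
    freqs = [rng.uniform(0.05, 0.5) for _ in range(n_variants)]
    betas = [rng.gauss(0.0, 1.0) for _ in range(n_variants)]
    # Each genotype is the count of effect alleles across two independent draws.
    genotypes = [
        [int(rng.random() < p) + int(rng.random() < p) for p in freqs]
        for _ in range(n_individuals)
    ]
    raw_genetic = [sum(b * g for b, g in zip(betas, row)) for row in genotypes]
    raw_noise = [rng.gauss(0.0, 1.0) for _ in range(n_individuals)]
    # Genetic variance is set to h2 and environmental variance to 1 - h2.
    genetic = rescale(raw_genetic, heritability)
    noise = rescale(raw_noise, 1.0 - heritability)
    liability = [g + e for g, e in zip(genetic, noise)]
    # Individuals at or above the liability threshold are labeled as cases.
    threshold = sorted(liability)[int((1.0 - prevalence) * n_individuals)]
    labels = [int(value >= threshold) for value in liability]
    return genotypes, labels, genetic, noise


genotypes, labels, genetic, noise = simulate_case_control(200, 50, heritability=0.3, seed=1)
print(sum(labels), "cases out of", len(labels))
```

Calling the function repeatedly with different `heritability` values yields a family of datasets on which PRS and machine learning accuracy can be compared.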

polygenic risk scores

genotype-phenotype prediction

genetics

bioinformatics

applied machine learning

No competing interests reported.

SupplementaryInfo.pdf
