Impact of Pre and Post Variant Filtration Strategies on Imputation
Quality control methods for genome-wide association studies and fine mapping are commonly used for imputation; however, they result in the loss of many single nucleotide polymorphisms (SNPs). To investigate the consequences of filtration on imputation, we studied its direct effects on the number of markers, their allele frequencies, imputation quality scores, and post-filtration events. We pre-phased 1,031 genotyped individuals from diverse ethnicities and compared the imputed variants to 1,089 NCBI recorded individuals for additional validation.
Without variant pre-filtration based on quality control (QC), we observed no impairment in the imputation of SNPs that failed QC, whereas with pre-filtration there was an overall loss of information. Significant differences between allele frequencies with and without pre-filtration were found only in the ranges of very rare (5E-04 to 1E-03) and rare (1E-03 to 5E-03) variants (p < 1E-04). Increasing the post-filtration imputation quality score threshold from 0.3 to 0.8 reduced the number of single nucleotide variants (SNVs) with frequency <0.001 by 2.5-fold, with or without QC pre-filtration, and halved the number of very rare variants (5E-04). To maintain confidence while retaining enough SNVs, we propose a 2-step post-filtration approach that increases the number of very rare and rare variants compared to conservative post-filtration methods.
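The effect of the quality score threshold on variant counts can be illustrated with a minimal sketch. This is not the authors' pipeline and it does not implement the proposed 2-step approach, whose details are not given here. It only counts how many variants in each frequency range survive a given threshold. The function names, the half-open bin boundaries, and the example (frequency, quality score) pairs are all assumptions made for illustration.

```python
from collections import Counter

# Frequency ranges named in the abstract. Treating them as half-open
# intervals [low, high) is an assumption of this sketch.
BINS = [
    ("very rare", 5e-4, 1e-3),
    ("rare", 1e-3, 5e-3),
]


def frequency_bin(freq):
    """Return the name of the frequency range that contains freq."""
    for name, low, high in BINS:
        if low <= freq < high:
            return name
    return "other"


def count_retained(variants, min_quality):
    """Count variants per frequency range with quality >= min_quality.

    variants is an iterable of (allele_frequency, quality_score) pairs.
    """
    counts = Counter()
    for freq, quality in variants:
        if quality >= min_quality:
            counts[frequency_bin(freq)] += 1
    return counts


# Made-up example data, not taken from the study.
variants = [
    (0.0006, 0.35),
    (0.0008, 0.85),
    (0.002, 0.50),
    (0.004, 0.90),
    (0.2, 0.99),
]

lenient = count_retained(variants, 0.3)  # threshold of 0.3
strict = count_retained(variants, 0.8)   # threshold of 0.8
```

Comparing `lenient` and `strict` for a real set of imputed variants would show, per frequency range, how many variants the stricter threshold discards.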
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Due to technical limitations, Tables 1 and 2 are only available as a download in the Supplemental Files section.
This is a list of supplementary files associated with this preprint.
Supplementary document
Posted 21 Dec, 2020