SNP Characteristics and Validation Success in Genome Wide Association Studies

doi:10.21203/rs.3.rs-704899/v1

Research Article

This work is licensed under a CC BY 4.0 License

Journal Publication

published 03 Jan, 2022

Read the published version in Human Genetics →

Genome-wide association studies (GWASs) have identified tens of thousands of single nucleotide polymorphisms (SNPs) associated with human diseases and characteristics. A significant fraction of GWAS findings can be false positives. The gold standard for true positives is an independent validation. The goal of this study was to identify SNP features associated with validation success. Summary statistics from the Catalog of Published GWASs were used in the analysis. Since our goal was an analysis of reproducibility, we focused on the diseases/phenotypes targeted by at least 10 GWASs. GWASs were arranged in discovery-validation pairs based on the time of publication, with the discovery GWAS published before the validation GWAS. We used four definitions of validation success that differ in stringency. Associations of SNP features with validation success were consistent across the definitions. The strongest predictor of SNP validation was the level of statistical significance in the discovery GWAS. The magnitude of the effect size was associated with validation success in a non-linear manner. SNPs with risk allele frequencies in the range 30-70% showed a higher validation success rate compared to rarer or more common SNPs. Missense, 5’UTR, and stop gained SNPs, and SNPs located in transcription factor binding sites, had a higher validation success rate compared to intergenic, intronic, or synonymous SNPs. There was a positive association between validation success and the level of evolutionary conservation of the sites. In addition, validation success was higher when discovery and validation GWASs targeted the same ethnicity. All predictors of validation success remained significant in a multivariable logistic regression model, indicating their independent contribution. To conclude, we identified SNP features predicting validation success of GWAS hits. These features can be used to select SNPs for validation and downstream functional studies.

Molecular Genetics

SNP characteristic

GWAS

SNP validation

reproducibility

Genome-wide association studies (GWASs) revolutionized the study of genetic control of human phenotypes and diseases (Tam et al., 2019; Visscher, Brown, McCarthy, & Yang, 2012). GWASs test millions of SNPs in phenotypically different individuals to identify genotype-phenotype associations. Thousands of associations between SNPs and diseases/traits have been detected (Bosse & Amos, 2018; Gallagher & Chen-Plotkin, 2018; Horwitz, Lam, Chen, Xia, & Liu, 2019; Liang, Ding, Huang, Luo, & Zhu, 2020). Despite the strict genome-wide threshold for statistical significance (p < 5x10^-8 or, equivalently, -log(p) > 7.3), a considerable number of detected SNP-phenotype associations fail independent validation (Brzyski et al., 2017; Marigorta, Rodriguez, Gibson, & Navarro, 2018). Identifying SNP characteristics that predict validation success (true positives) is important for prioritizing SNPs for targeted validation and downstream functional studies. We and others have identified a number of SNP characteristics associated with validation success (Gorlov et al., 2014; Merelli et al., 2013; Xu & Taylor, 2009).

Here we present results of an updated analysis of associations between SNP characteristics and validation success.

Data used

We used data from the Catalog of Published GWASs (CPG) (Buniello et al., 2019). The catalog was accessed on May 12, 2021. We retrieved summary statistics for SNPs with the genome-wide level of statistical significance (p < 5x10^-8) and for gray zone SNPs (5x10^-8 < p < 10^-5). The latter were included to test whether they are enriched for true positives. We focused on diseases/traits that were targeted by at least 10 studies. A total of 40 diseases/traits were analyzed in the study (Table 1). Disease/trait labels were used exactly as they were reported in the CPG.

Validation attempts

For each disease/trait, GWASs were arranged into pairs according to the publication date. Each pair was considered to be a validation attempt, with the earlier GWAS considered the discovery and the later, validation. The complete list of discovery-validation pairs can be found in Supplementary Table S1. The supplementary table also includes pairwise linkage disequilibrium (LD) for three major ethnic groups: Europeans, Africans and Asians.
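The pairing step can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: it assumes that every earlier/later combination of studies for a disease/trait counts as one validation attempt and that studies published on the same date are not paired. The study identifiers and dates are made up.

```python
from datetime import date
from itertools import combinations


def discovery_validation_pairs(studies):
    """Return (discovery_id, validation_id) tuples for one disease/trait.

    `studies` is a list of (study_id, publication_date) tuples. Assumption:
    every pair with distinct publication dates is one validation attempt,
    and the earlier study is the discovery GWAS.
    """
    ordered = sorted(studies, key=lambda s: s[1])
    return [(a[0], b[0]) for a, b in combinations(ordered, 2) if a[1] < b[1]]


# Hypothetical studies of a single trait.
studies = [
    ("GWAS_C", date(2015, 3, 1)),
    ("GWAS_A", date(2009, 6, 15)),
    ("GWAS_B", date(2012, 1, 10)),
]
pairs = discovery_validation_pairs(studies)
```

Under this assumption a trait with n studies on distinct dates yields n(n-1)/2 validation attempts.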

Table 1

Diseases/traits with the corresponding numbers of conducted GWASs.
N	DISEASE/TRAIT	Number of studies
1	Type 2 diabetes	69
2	Body mass index	51
3	Breast cancer	39
4	HDL cholesterol levels	38
5	Schizophrenia	36
6	Colorectal cancer	33
7	Prostate cancer	32
8	Height	31
9	Diastolic blood pressure	29
10	Alzheimer's disease	25
11	Asthma	25
12	Rheumatoid arthritis	24
13	Parkinson's disease	23
14	Crohn's disease	22
15	Systemic lupus erythematosus	22
16	Bipolar disorder	21
17	Multiple sclerosis	20
18	Amyotrophic lateral sclerosis	19
19	Major depressive disorder	19
20	Ulcerative colitis	19
21	Hypertension	18
22	Coronary heart disease	16
23	Alcohol dependence	15
24	Glaucoma (primary open-angle)	15
25	Psoriasis	15
26	Type 1 diabetes	15
27	Bone mineral density	14
28	Intraocular pressure	14
29	Lung cancer	14
30	Telomere length	13
31	Adiponectin levels	12
32	Attention deficit hyperactivity disorder	12
33	Fasting plasma glucose	12
34	Glycated hemoglobin levels	12
35	Age-related macular degeneration	11
36	Atrial fibrillation	11
37	Bilirubin levels	11
38	QT interval	11
39	Venous thromboembolism	11
40	Pancreatic cancer	10

Definitions of successful validation

We used four definitions of successful validation that differ in stringency. Under the strict definition, a SNP was considered validated when the validation GWAS detected the same SNP at the genome-wide level of significance (p < 5x10^-8). Under the relaxed definition, a SNP was considered validated when the same or a linked SNP (r² > 0.8 in the validation population) was detected at the genome-wide level of significance. LD data were downloaded from the LDlink database (Myers, Chanock, & Machiela, 2020). Under the soft definition, a SNP was considered validated if the original SNP or a SNP in tight LD with it was detected in the validation GWAS at the liberal level of significance of p < 10^-5. Finally, under the ultra-soft definition, a SNP was considered validated if the original SNP or a tightly linked SNP reached the GWAS level of significance in at least one out of at least three subsequent GWASs (attempts). The principal difference is therefore that the strict, relaxed, and soft definitions score validation success per single attempt, whereas the ultra-soft definition requires at least three validation attempts and considers the SNP validated if at least one of them is successful.
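The four definitions can be expressed as a small classifier. The sketch below is illustrative only: the function names and data layout are hypothetical, and it assumes that "tight LD" in the soft and ultra-soft definitions means the same r² > 0.8 cut-off given for the relaxed definition.

```python
GENOME_WIDE = 5e-8  # genome-wide significance threshold
LIBERAL = 1e-5      # liberal threshold used by the soft definition
LD_R2 = 0.8         # r^2 cut-off for a linked SNP (assumed for all LD-based definitions)


def validated_single(snp, hits, ld, definition):
    """Score one validation attempt under 'strict', 'relaxed' or 'soft'.

    snp:  rsID reported by the discovery GWAS
    hits: dict mapping rsID -> p-value reported by the validation GWAS
    ld:   dict mapping (rsID, rsID) -> r^2 in the validation population
    """
    if definition not in ("strict", "relaxed", "soft"):
        raise ValueError("unknown definition: " + definition)
    threshold = LIBERAL if definition == "soft" else GENOME_WIDE
    for other, p in hits.items():
        if p >= threshold:
            continue
        if other == snp:
            return True
        if definition != "strict":
            r2 = ld.get((snp, other), ld.get((other, snp), 0.0))
            if r2 > LD_R2:
                return True
    return False


def validated_ultra_soft(snp, attempts, ld):
    """Score a SNP over all subsequent GWASs.

    Returns None when fewer than three attempts exist (not evaluable),
    otherwise True if any attempt detects the SNP or a linked SNP at the
    genome-wide level.
    """
    if len(attempts) < 3:
        return None
    return any(validated_single(snp, hits, ld, "relaxed") for hits in attempts)


# Hypothetical validation GWAS: rs1 is the discovery SNP, rs2 is linked to it.
hits = {"rs2": 1e-9, "rs1": 3e-6}
ld = {("rs1", "rs2"): 0.9}
results = {d: validated_single("rs1", hits, ld, d) for d in ("strict", "relaxed", "soft")}
```

In this toy case the discovery SNP itself misses the genome-wide threshold, so it fails the strict definition but passes the relaxed and soft ones through its linked neighbor.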

Predictors of the validation success

The following SNP characteristics were used as predictors of validation success: (1) the level of statistical significance in the discovery GWAS, expressed as -log(p), where p is the p-value; (2) the effect size (the original odds ratio (OR) when OR ≥ 1, or 1/OR when OR < 1, to keep all effects on the same scale); (3) risk allele frequency; (4) the type of the SNP (see below); (5) the level of evolutionary conservation of the site estimated by the PhyloP method (Pollard, Hubisz, Rosenbloom, & Siepel, 2010).

PhyloP uses the distribution of nucleotide substitutions in an evolutionary tree of 44 vertebrate species to estimate the expected number of substitutions per site under the assumption of neutral evolution. The observed number of substitutions at the site is then compared with this expectation. A higher PhyloP score means a higher level of evolutionary conservation.
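The first two predictors are simple transformations of reported summary statistics. A minimal sketch follows; the function names are ours, and the base-10 logarithm is inferred from the stated equivalence between p < 5x10^-8 and -log(p) > 7.3.

```python
import math


def neg_log10_p(p):
    """Statistical significance expressed as -log10(p)."""
    return -math.log10(p)


def fold_odds_ratio(odds_ratio):
    """Map protective effects (OR < 1) onto the same scale as risk effects."""
    return 1.0 / odds_ratio if odds_ratio < 1.0 else odds_ratio


genome_wide_cutoff = neg_log10_p(5e-8)  # about 7.3
folded = [fold_odds_ratio(x) for x in (0.8, 1.0, 1.25)]
```

After folding, an OR of 0.8 and an OR of 1.25 are treated as effects of equal magnitude.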

Statistical analysis and visualization

To visualize the associations of quantitative features, e.g. -log(p), with validation success, we stratified predictors by deciles. First we ranked SNPs by the corresponding characteristic and then stratified them into ten groups. Validation success rate was estimated for each group separately. To estimate and compare different types of SNPs by validation success we used SNP types reported by CPG. The list of the most frequent SNP types used in the analysis is as follows: “intron variant”, “intergenic variant”, “missense variant”, “non-coding transcript exon variant”, “3’UTR variant”, “TF binding site variant”, “5’UTR variant”, and “synonymous variant”. To estimate the effect of the same/different ethnicities in the discovery and the validation GWASs we used the CPG data. The most frequently reported ethnicities are Europeans, East Asians, African American, Hispanic/Latino and Ashkenazi Jews.
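The decile stratification described above can be sketched as follows. The code is illustrative, with made-up data; how ties and uneven group sizes were handled in the original analysis is not stated, so the slicing rule here is an assumption.

```python
def validation_rate_by_decile(values, validated, n_groups=10):
    """Rank SNPs by `values`, split them into `n_groups` nearly equal groups,
    and return the validation success rate of each group (lowest values first).

    values:    one quantitative characteristic per SNP, e.g. -log10(p)
    validated: 1 if the SNP was validated, 0 otherwise
    """
    n = len(values)
    if n < n_groups:
        raise ValueError("need at least one SNP per group")
    order = sorted(range(n), key=lambda i: values[i])
    rates = []
    for g in range(n_groups):
        group = order[g * n // n_groups:(g + 1) * n // n_groups]
        rates.append(sum(validated[i] for i in group) / len(group))
    return rates


# Toy data: 20 SNPs, only those with the larger values are validated.
values = list(range(20))
validated = [1 if v >= 10 else 0 for v in values]
rates = validation_rate_by_decile(values, validated)
```

Plotting `rates` against the decile index gives the kind of curve used to inspect non-linear relationships.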

Initially, associations were estimated using univariate analyses. Features significant in the univariate analyses were included in a multivariable logistic regression. Validation status was treated as the binary outcome (validated/not validated). All significant predictors were included in the model to evaluate their independent effects. We present the two extreme definitions of validation success: ultra-soft and strict. The results for the two other definitions were similar to those for the strict and ultra-soft definitions. All statistical analyses were performed using STATISTICA (TIBCO Software Inc.) and Origin (OriginLab Corporation, Northampton, MA, USA).
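For orientation, a model of this kind can be written out as below. The notation is ours and is only a sketch: the predictor set follows the rows of Table 2, D is an indicator that the discovery and validation populations differ, and the exact coding used in the original analysis is not specified beyond what the table shows.

```latex
\operatorname{logit} P(\text{validated}) =
  \beta_0
  + \beta_1\,\mathrm{MAF}
  + \beta_2\,\bigl(-\log_{10} p\bigr)
  + \beta_3\,D
  + \beta_4\,\mathrm{PhyloP}
  + \sum_{k=2}^{10} \gamma_k\, I(\text{OR decile} = k)
  + \sum_{c \in \{\text{other},\ \text{functional}\}} \delta_c\, I(\text{SNP type} = c),
\qquad
\mathrm{OR}_j = e^{\beta_j}
```

Each reported odds ratio is the exponentiated coefficient of the corresponding term, with OR decile 1 and the likely non-functional SNP types as reference categories.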

Diseases/traits differ by the average validation success

Figure 1 shows validation success rates across diseases/traits. Validation success rates varied by more than an order of magnitude among the phenotypes. Those with the lowest validation success rates included “Major depressive disorder”, “Attention deficit hyperactivity disorder”, “Bone mineral density”, “Alcohol dependence”, “Coronary heart disease” and “Bipolar disorder”. Diseases/traits with the highest validation success rates included “Breast cancer”, “Asthma”, “Venous thromboembolism”, “QT interval”, “Atrial fibrillation”, and “Age-related macular degeneration”.

Validation success rates for different definitions

The overall average validation success rate for SNPs across all phenotypes varied depending on the definition of validation success: 6.42±0.07% under the strict definition, 6.66±0.07% under the relaxed definition, 7.87±0.08% under the soft definition, and 50.87±0.16% under the ultra-soft definition.

The level of statistical significance in the discovery GWAS is positively associated with validation success

We observed a strong positive association of -log(p) in the discovery GWAS with validation success under all four definitions of validation success. Figure 2 shows the mean validation success rate for SNPs categorized by deciles of -log(p) in the discovery study. The proportion of validated SNPs increases with -log(p) in the range of 5-7.5; for higher -log(p) deciles the slope is less steep. Similar shapes were observed for all definitions of validation success, including the ultra-soft definition, whose validation rate dwarfs that of a single validation attempt (Fig. 2b).

Odds ratios in the discovery GWAS and validation success

Overall negative correlations of the OR with SNP validation success were detected under strict, relaxed, soft, and ultra-soft definitions of the validation success (corresponding Spearman rank order correlations were ρ=-0.03, N=60,166, p=7.1x10^-15, ρ=-0.01, N=60,166, p=2.8x10^-3, ρ=-0.02, N=60,166, p=5.1x10^-5, ρ=-0.1, N=57,352, p=5.6x10^-25). The association between OR and validation success using decile stratification shows a more complex relationship. Highest validation success was for the SNPs with ORs in the range of 1.06-1.3, while the SNPs with reported ORs <1.06 or >1.3 had lower validation success.
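For readers who want to reproduce this kind of rank correlation, the following is a minimal pure-Python sketch of Spearman's ρ. It is not the software used in the study (the analyses were run in STATISTICA and Origin). Because validation status is binary, ties are heavy, so the sketch assigns average ranks and then computes a Pearson correlation on the ranks. The data are invented.

```python
def average_ranks(xs):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Toy example: four SNPs, the two with the smaller ORs were validated.
odds_ratios = [1.1, 1.2, 1.5, 2.0]
was_validated = [1, 1, 0, 0]
rho = spearman_rho(odds_ratios, was_validated)
```

A negative `rho` here means that larger reported ORs go with lower validation status, which is the direction of the overall correlations reported above.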

Association between the risk allele frequency and validation success

Under the strict, relaxed, and soft definitions of validation success there is a tendency for common risk-associated alleles (allelic frequency close to 0.5) to have a higher validation success rate (Fig. 4a). The association is more evident under the ultra-soft definition (Fig. 4b). When we used the minor allele frequency (MAF) of the reported SNP (Fig. 5), we found that SNPs with MAF 0.3-0.5 tended to be validated more often than SNPs with lower MAF.

Different types of SNPs differ by validation success

We compared different types of SNPs by validation success (Fig. 6). Intergenic and intron variants had the lowest validation success rates. Validation success rates were highest for missense and stop gained SNPs and for SNPs located in TF binding sites or in the 5’UTR.

Validation success is higher for SNPs located in evolutionarily conserved sites

We found positive correlations between the level of evolutionary conservation (PhyloP score) and validation success under the strict, relaxed, soft, and ultra-soft definitions of validation success (corresponding Spearman rank order correlations were ρ=0.04, N=125,087, p<10^-25; ρ=0.04, N=125,087, p<10^-25; ρ=0.04, N=125,087, p<10^-25; ρ=0.07, N=117,643, p<10^-25). Figure 7 shows that SNPs with evidence of evolutionary conservation are more likely to be validated.

SNPs are more likely to be validated when the same race/ethnicity is targeted by discovery and validation GWASs

When the discovery and the validation GWASs target the same race/ethnicity, the validation success rate is higher compared to the situation when the ethnicities in the discovery and validation GWASs are different. This is true regardless of the definition of the validation success (Figure 8).

Multivariable logistic regression analysis

We analyzed the predictors simultaneously using a binary logistic regression model with validation status as the outcome. All predictors remained significant for both the strict and ultra-soft definitions of validation (Tables 2a and 2b).

Table 2a. Multivariable prediction of SNP validation success in GWASs (strict definition of validation).

Predictor	P-value	OR	95% CI lower	95% CI upper
SNP MAF	1.233E-02	1.494	1.091	2.046
-log p-value at discovery	1.670E-60	1.013	1.012	1.015
Different population in validation and discovery*	1.061E-93	0.390	0.356	0.427
PhyloP score	5.647E-41	1.202	1.170	1.235
OR groups stratified by deciles (decile 1, reference)
2	4.699E-04	0.745	0.632	0.879
3	1.359E-02	0.833	0.721	0.963
4	3.456E-03	0.778	0.657	0.921
5	8.721E-12	0.558	0.472	0.660
6	8.752E-05	0.718	0.608	0.847
7	2.305E-07	0.665	0.569	0.776
8	3.642E-08	0.627	0.531	0.741
9	6.647E-07	0.652	0.551	0.772
10	2.022E-15	0.474	0.395	0.570
SNP type categories** (likely non-functional, reference)
Other	1.435E-01	1.089	0.971	1.222
Likely functional	4.233E-22	1.854	1.636	2.101

*Reference, the same population in the discovery and validation

**Non-functional: intergenic, synonymous, intronic; functional: 5’ UTR, missense, nonsense, located in transcription factor binding sites; other – non-coding exonic, 3’ UTR

Table 2b. Multivariable prediction of SNP validation success in GWASs (ultra-soft definition of validation).

Predictor	P-value	OR	95% CI lower	95% CI upper
SNP MAF	1.76E-20	2.415	2.005	2.91
-log p-value at discovery	1.81E-188	1.078	1.073	1.082
Different population in validation and discovery*	9.34E-11	0.808	0.757	0.862
PhyloP score	3.00E-54	1.164	1.142	1.187
OR groups stratified by deciles (decile 1, reference)
2	0.001	1.172	1.063	1.292
3	1.74E-17	1.491	1.36	1.635
4	0.011	1.146	1.032	1.273
5	3.82E-08	0.771	0.702	0.846
6	0.748	1.017	0.92	1.123
7	3.317E-10	0.741	0.674	0.813
8	6.999E-51	0.481	0.437	0.529
9	8.602E-118	0.294	0.265	0.326
10	1.848E-43	0.488	0.441	0.541
SNP type categories** (likely non-functional, reference)
Other	1.170E-13	0.776	0.726	0.83
Likely functional	2.165E-54	2.204	1.994	2.435

*Reference, the same population in the discovery and validation

**Non-functional: intergenic, synonymous, intronic; functional: 5’ UTR, missense, nonsense, located in transcription factor binding sites; other – non-coding exonic, 3’ UTR
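When reading the odds ratios for continuous predictors in Tables 2a and 2b, it helps to remember that a logistic-regression OR is multiplicative per unit of the predictor. The short sketch below assumes, as is conventional but not stated explicitly in the tables, that the OR for -log p is per one-unit increase; the helper name is ours.

```python
def odds_ratio_for_increase(per_unit_or, delta):
    """Odds ratio implied by a `delta`-unit increase in a continuous predictor,
    given the per-unit odds ratio from a logistic regression."""
    return per_unit_or ** delta


# Ten-unit increase in -log p at discovery, using the ORs reported in Table 2.
strict_10 = odds_ratio_for_increase(1.013, 10)      # strict definition (Table 2a)
ultra_soft_10 = odds_ratio_for_increase(1.078, 10)  # ultra-soft definition (Table 2b)
```

A per-unit OR close to 1 can therefore still correspond to a sizeable change in odds across the wide range of -log p values seen in GWAS hits.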

Compared to our previous study (Gorlov et al., 2014), the current analysis is based on a larger sample size and includes more predictors. We confirmed the previous associations and added new ones. The validation success rate per single validation attempt was similar for the strict, relaxed and soft definitions, in the range of 6–8%. One possible reason for the low validation rate could be that our analysis included gray zone SNPs. However, when such SNPs were excluded from the analysis, the validation success increased only marginally for the strict definition, from 6.42 ± 0.07% to 6.46 ± 0.08%. A similarly slight increase in validation success after removing gray zone SNPs was observed for the relaxed and soft definitions of validation. It is also unlikely that differences in genotyping platforms are a major contributor to the low validation success. By definition, validation GWASs are performed later than discovery GWASs, and later GWASs tend to use denser genotyping platforms. Besides, investigators usually impute SNPs that were detected as significant earlier if they were not on the genotyping platform (Li, Willer, Sanna, & Abecasis, 2009; S. Shi et al., 2018). We found that having the same genotyping platform increased the chances of validation by only 1.1%. Targeting the same ethnicity in the discovery and validation GWASs has a more profound effect on validation success (about two-fold, Fig. 8). This indicates that targeting genetically similar populations is important for successful validation. Note that we used only major population categories: Europeans, East Asians, African Americans, Hispanic/Latino and Ashkenazi Jews. The major population groups are genetically heterogeneous. There are, for example, significant genetic differences among European subpopulations, which can also impact reproducibility (Lao et al., 2008).

Not surprisingly, the level of statistical significance in the discovery GWAS was the strongest predictor of validation success. The association between validation success and OR was markedly nonlinear. The highest validation success rate was in the group of SNPs with ORs in the range from 1.1 to 1.3, suggesting that “real” ORs tend to be within this range. Compared to these, SNPs with ORs > 2 in the discovery GWAS are 40% less likely to be validated. This may be because initial discoveries tend to overestimate effect sizes, a phenomenon known as the “winner's curse” (Lohmueller, Pearce, Pike, Lander, & Hirschhorn, 2003; J. Shi et al., 2016; Xiao & Boehnke, 2011). The validation success rate was highest for the most polymorphic SNPs, likely because statistical power is highest for SNPs with a frequency close to 0.5 (Hong & Park, 2012; Sham & Purcell, 2014).

Intronic, intergenic and synonymous SNPs showed lower validation rates compared to missense SNPs and SNPs located in TF binding sites or in 5’UTR regions. The most likely explanation is that some GWAS-detected SNPs are causal (Caballero, Tenesa, & Keightley, 2015; Schaid, Chen, & Larson, 2018; Wang et al., 2020). Functional SNPs affect the level of expression and/or protein function, including protein folding. Missense SNPs and SNPs located in TF binding sites or 5’UTR regions (often loaded with regulatory elements) are likely to be functional (Buroker, 2014; Huo, Li, Liu, Li, & Luo, 2019; Lou et al., 2017).

It is accepted that the level of evolutionary conservation of the site reflects its functional importance (O'Connor et al., 2019; Zeng et al., 2018) suggesting that the positive association between the level of evolutionary conservation of the site and replication success that we found is due to the presence of functional causal SNPs among GWAS top hits.

All predictors of validation success detected in the univariate analyses remained significant in the multivariable logistic regression analysis. The most significant predictors in the multivariable analysis were the level of statistical significance in the discovery GWAS, followed by SNP type and PhyloP score (Table 2).

The results of this study suggest that SNP features may help to select the SNPs with the highest chance of being validated. Indeed, we selected SNPs based on the five major predictors of validation success: (1) the SNP is genome-wide significant in the discovery GWAS; (2) the risk allele frequency is between 0.1 and 0.9; (3) the SNP is missense, or is located in a TF binding site or in a 5’UTR region; (4) the SNP has a high level of evolutionary conservation; and (5) the discovery and validation GWASs target the same ethnicity. The resulting SNPs showed a validation success rate of 32.6 ± 5.8% under the strict definition, which is much higher than the overall average.
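The five selection criteria translate directly into a filter. The sketch below is hypothetical: the field names and SNP type labels are illustrative, the frequency bounds are treated as inclusive, and the PhyloP cut-off is left as a required argument because the text does not state what counts as a high level of conservation.

```python
GENOME_WIDE = 5e-8
LIKELY_FUNCTIONAL = {"missense variant", "TF binding site variant", "5'UTR variant"}


def is_priority_snp(snp, phylop_cutoff):
    """Return True if a SNP record meets all five selection criteria.

    `snp` is a dict with hypothetical keys: p_discovery, risk_allele_freq,
    snp_type, phylop, same_ethnicity. `phylop_cutoff` must be chosen by the
    user; no value is given in the text.
    """
    return (
        snp["p_discovery"] < GENOME_WIDE
        and 0.1 <= snp["risk_allele_freq"] <= 0.9
        and snp["snp_type"] in LIKELY_FUNCTIONAL
        and snp["phylop"] >= phylop_cutoff
        and bool(snp["same_ethnicity"])
    )


# Invented records for illustration.
candidate = {
    "p_discovery": 2e-12,
    "risk_allele_freq": 0.42,
    "snp_type": "missense variant",
    "phylop": 3.1,
    "same_ethnicity": True,
}
intronic = dict(candidate, snp_type="intron variant")
selected = [is_priority_snp(s, phylop_cutoff=2.0) for s in (candidate, intronic)]
```

Relaxing any single criterion enlarges the candidate set, so the cut-offs trade off the number of SNPs carried forward against the expected validation rate.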

Surprisingly, we found that the validation success rate of the gray zone SNPs (5x10^-8 < p < 10^-5) was lower than, but still comparable to, that of SNPs with the genome-wide level of statistical significance in the discovery GWAS: 4.19 ± 0.09% versus 12.26 ± 0.08% under the strict definition of validation success (Fig. 2, first 4 points versus the other points). This indicates that gray zone SNPs are enriched for true positives.

Limitations of the study

Subsequent GWASs targeting the same phenotype were considered in this study as independent validations. That is not always the case. In some cases the subsequent GWASs include a subset of samples already used in an earlier GWAS, which is likely to inflate the validation success rate. We do not think, however, that this issue substantially affects the findings on associations between SNP characteristics and validation rate. Besides, based on our experience with lung cancer GWASs and a limited review of published GWASs, we believe that a typical overlap (if any) does not exceed 20%. We found that the associations were very similar across different definitions of validation success. Another limitation is that we did not handle meta-analysis studies any differently from standard two-phase GWASs. We formally followed the classification adopted by the Catalog of Published GWASs because it reflects the current state of knowledge of disease etiology. We acknowledge that disease classification is a moving target, and a disease once considered genetically homogeneous may later be reclassified into several distinct diseases as it becomes better studied.

Funding

Partial financial support was received from National Institutes of Health grants U19CA203654, U19CA203654S1, R01CA231141, and P01 CA206980-01A1, Cancer Prevention and Research Institute of Texas grant RR170048. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Conflicts of interest/Competing interests

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Availability of data and material

Data from the following sources, all in the public domain, were used in this project: A Catalog of Published Genome-Wide Association Studies https://www.genome.gov/catalog-of-published-genomewide-association-studies, UCSC Human Genome Browser https://genome.ucsc.edu, The Ensembl Regulatory Build http://useast.ensembl.org/info/genome/funcgen/regulatory_build.html, and ENCODE https://www.encodeproject.org/.

Code availability

Not applicable

Ethics approval

Not applicable: the study used aggregate statistics from datasets in the public domain

Consent to participate

Not applicable

Consent for publication

Not applicable

Bosse, Y., & Amos, C. I. (2018). A Decade of GWAS Results in Lung Cancer. Cancer Epidemiol Biomarkers Prev, 27(4), 363-379. doi:10.1158/1055-9965.EPI-16-0794
Brzyski, D., Peterson, C. B., Sobczyk, P., Candes, E. J., Bogdan, M., & Sabatti, C. (2017). Controlling the Rate of GWAS False Discoveries. Genetics, 205(1), 61-75. doi:10.1534/genetics.116.193987
Buniello, A., MacArthur, J. A. L., Cerezo, M., Harris, L. W., Hayhurst, J., Malangone, C., . . . Parkinson, H. (2019). The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res, 47(D1), D1005-D1012. doi:10.1093/nar/gky1120
Buroker, N. E. (2014). Regulatory SNPs and transcriptional factor binding sites in ADRBK1, AKT3, ATF3, DIO2, TBXA2R and VEGFA. Transcription, 5(4), e964559. doi:10.4161/21541264.2014.964559
Caballero, A., Tenesa, A., & Keightley, P. D. (2015). The Nature of Genetic Variation for Complex Traits Revealed by GWAS and Regional Heritability Mapping Analyses. Genetics, 201(4), 1601-1613. doi:10.1534/genetics.115.177220
Gallagher, M. D., & Chen-Plotkin, A. S. (2018). The Post-GWAS Era: From Association to Function. Am J Hum Genet, 102(5), 717-730. doi:10.1016/j.ajhg.2018.04.002
Gorlov, I. P., Moore, J. H., Peng, B., Jin, J. L., Gorlova, O. Y., & Amos, C. I. (2014). SNP characteristics predict replication success in association studies. Hum Genet, 133(12), 1477-1486. doi:10.1007/s00439-014-1493-6
Hong, E. P., & Park, J. W. (2012). Sample size and statistical power calculation in genetic association studies. Genomics Inform, 10(2), 117-122. doi:10.5808/GI.2012.10.2.117
Horwitz, T., Lam, K., Chen, Y., Xia, Y., & Liu, C. (2019). A decade in psychiatric GWAS research. Mol Psychiatry, 24(3), 378-389. doi:10.1038/s41380-018-0055-z
Huo, Y., Li, S., Liu, J., Li, X., & Luo, X. J. (2019). Functional genomics reveal gene regulatory mechanisms underlying schizophrenia risk. Nat Commun, 10(1), 670. doi:10.1038/s41467-019-08666-4
Lao, O., Lu, T. T., Nothnagel, M., Junge, O., Freitag-Wolf, S., Caliebe, A., . . . Kayser, M. (2008). Correlation between genetic and geographic structure in Europe. Curr Biol, 18(16), 1241-1248. doi:10.1016/j.cub.2008.07.049
Li, Y., Willer, C., Sanna, S., & Abecasis, G. (2009). Genotype imputation. Annu Rev Genomics Hum Genet, 10, 387-406. doi:10.1146/annurev.genom.9.081307.164242
Liang, B., Ding, H., Huang, L., Luo, H., & Zhu, X. (2020). GWAS in cancer: progress and challenges. Mol Genet Genomics, 295(3), 537-561. doi:10.1007/s00438-020-01647-z
Lohmueller, K. E., Pearce, C. L., Pike, M., Lander, E. S., & Hirschhorn, J. N. (2003). Meta-analysis of genetic association studies supports a contribution of common variants to susceptibility to common disease. Nat Genet, 33(2), 177-182. doi:10.1038/ng1071
Lou, J., Gong, J., Ke, J., Tian, J., Zhang, Y., Li, J., . . . Miao, X. (2017). A functional polymorphism located at transcription factor binding sites, rs6695837 near LAMC1 gene, confers risk of colorectal cancer in Chinese populations. Carcinogenesis, 38(2), 177-183. doi:10.1093/carcin/bgw204
Marigorta, U. M., Rodriguez, J. A., Gibson, G., & Navarro, A. (2018). Replicability and Prediction: Lessons and Challenges from GWAS. Trends Genet, 34(7), 504-517. doi:10.1016/j.tig.2018.03.005
Merelli, I., Calabria, A., Cozzi, P., Viti, F., Mosca, E., & Milanesi, L. (2013). SNPranker 2.0: a gene-centric data mining tool for diseases associated SNP prioritization in GWAS. BMC Bioinformatics, 14 Suppl 1, S9. doi:10.1186/1471-2105-14-S1-S9
Myers, T. A., Chanock, S. J., & Machiela, M. J. (2020). LDlinkR: An R Package for Rapidly Calculating Linkage Disequilibrium Statistics in Diverse Populations. Front Genet, 11, 157. doi:10.3389/fgene.2020.00157
O'Connor, L. J., Schoech, A. P., Hormozdiari, F., Gazal, S., Patterson, N., & Price, A. L. (2019). Extreme Polygenicity of Complex Traits Is Explained by Negative Selection. Am J Hum Genet, 105(3), 456-476. doi:10.1016/j.ajhg.2019.07.003
Pollard, K. S., Hubisz, M. J., Rosenbloom, K. R., & Siepel, A. (2010). Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res, 20(1), 110-121. doi:10.1101/gr.097857.109
Schaid, D. J., Chen, W., & Larson, N. B. (2018). From genome-wide associations to candidate causal variants by statistical fine-mapping. Nat Rev Genet, 19(8), 491-504. doi:10.1038/s41576-018-0016-z
Sham, P. C., & Purcell, S. M. (2014). Statistical power and significance testing in large-scale genetic studies. Nat Rev Genet, 15(5), 335-346. doi:10.1038/nrg3706
Shi, J., Park, J. H., Duan, J., Berndt, S. T., Moy, W., Yu, K., . . . Chatterjee, N. (2016). Winner's Curse Correction and Variable Thresholding Improve Performance of Polygenic Risk Modeling Based on Genome-Wide Association Study Summary-Level Data. PLoS Genet, 12(12), e1006493. doi:10.1371/journal.pgen.1006493
Shi, S., Yuan, N., Yang, M., Du, Z., Wang, J., Sheng, X., . . . Xiao, J. (2018). Comprehensive Assessment of Genotype Imputation Performance. Hum Hered, 83(3), 107-116. doi:10.1159/000489758
Tam, V., Patel, N., Turcotte, M., Bosse, Y., Pare, G., & Meyre, D. (2019). Benefits and limitations of genome-wide association studies. Nat Rev Genet, 20(8), 467-484. doi:10.1038/s41576-019-0127-1
Visscher, P. M., Brown, M. A., McCarthy, M. I., & Yang, J. (2012). Five years of GWAS discovery. Am J Hum Genet, 90(1), 7-24. doi:10.1016/j.ajhg.2011.11.029
Wang, J., Huang, D., Zhou, Y., Yao, H., Liu, H., Zhai, S., . . . Li, M. J. (2020). CAUSALdb: a database for disease/trait causal variants identified using summary statistics of genome-wide association studies. Nucleic Acids Res, 48(D1), D807-D816. doi:10.1093/nar/gkz1026
Xiao, R., & Boehnke, M. (2011). Quantifying and correcting for the winner's curse in quantitative-trait association studies. Genet Epidemiol, 35(3), 133-138. doi:10.1002/gepi.20551
Xu, Z., & Taylor, J. A. (2009). SNPinfo: integrating GWAS and candidate gene information into functional SNP selection for genetic association studies. Nucleic Acids Res, 37(Web Server issue), W600-605. doi:10.1093/nar/gkp290
Zeng, J., de Vlaming, R., Wu, Y., Robinson, M. R., Lloyd-Jones, L. R., Yengo, L., . . . Yang, J. (2018). Signatures of negative selection in the genetic architecture of human complex traits. Nat Genet, 50(5), 746-753. doi:10.1038/s41588-018-0101-4

SupplementaryTable1.txt


Editorial decision: Major revisions
16 Oct, 2021
Reviewers invited by journal
12 Aug, 2021
Editor assigned by journal
10 Jul, 2021
First submitted to journal
09 Jul, 2021
