We analysed 136,318,924 SNPs from 4,397,962 participants across nine phenotypes (18 GWAS). Of these SNPs, 6,289 reached genome-wide significance in their respective discovery GWAS, of which 5,343 were replicated in the corresponding replication GWAS (85.0%, 95% confidence interval (CI): 84.1–85.8%). The replication rate differed substantially by phenotype type and was much lower for binary than for quantitative phenotypes. It also varied with the P-value and odds ratio (OR) of the SNP in the discovery GWAS. In addition, SNP ORs decreased between discovery and replication GWAS for binary phenotypes but increased for quantitative phenotypes. Lastly, we developed and validated a model to predict SNP replication and found it to be accurate (0.90 (95% CI: 0.89 to 0.91)).
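As an arithmetic check on the headline figure, the replication rate and its interval can be recomputed from the two counts reported above. This is a minimal sketch that assumes a normal-approximation (Wald) interval for a binomial proportion; the method actually used to derive the interval is not stated here, and the function name `wald_ci` is purely illustrative.

```python
from math import sqrt
from statistics import NormalDist


def wald_ci(successes, total, level=0.95):
    """Return (proportion, lower, upper) using a Wald interval.

    Assumption: a normal approximation to the binomial is adequate,
    which is reasonable for counts of this size.
    """
    p = successes / total
    # Two-sided critical value, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * sqrt(p * (1.0 - p) / total)
    return p, p - half_width, p + half_width


# Counts taken from the text: 5,343 replicated out of 6,289 significant SNPs.
rate, lower, upper = wald_ci(5343, 6289)
print(f"{100 * rate:.1f}% (95% CI: {100 * lower:.1f} to {100 * upper:.1f}%)")
```

Under this assumption the result agrees with the reported 85.0% (84.1–85.8%) to one decimal place.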
Implications
Our results have implications for the potential validity and utility of GWAS results. First, the SNP replication rate for quantitative phenotypes is very high, implying that quantitative GWAS in the UKBB had likely reached sufficient power to accurately detect all SNPs that were truly associated with a phenotype and that had been discovered by earlier GWAS efforts. The high replication rate observed for quantitative traits may also reflect the precision and relative ease with which quantitative traits can be measured. Conversely, the likely measurement error and heterogeneity in the definition of binary phenotypes may be one explanation for their relatively low rate of replication. For instance, binary phenotypes often represent complex clinical diseases that a) can have broad diagnostic criteria (e.g. angina and myocardial infarction are often captured under “Coronary Artery Disease”) and b) are defined via an array of data sources of varying quality. The UKBB, for instance, defines its phenotypes with ICD codes based on linked electronic health records (EHR) 6. While this probably represents the best current method to define phenotypes in large cohorts, EHR data are “messy” and likely to include some “administrative and clinical error” 11. Improved phenotyping in the data used for GWAS of binary phenotypes is likely to result in improved SNP replication. This may be even more crucial for phenotypes where we saw low replication rates, e.g. eczema.
While the quality of phenotyping will eventually improve, in the meantime the modest replication rate we observed poses questions about the best way to utilize current binary phenotype GWAS results. On the one hand, it is encouraging that much scientific progress has been accomplished with current binary GWAS. For instance, polygenic risk scores based on current binary GWAS have been shown to accurately predict complex, common phenotypes 12–14. With improved phenotyping, it seems plausible that these scores may continue to improve. Nevertheless, in the meantime there may be other ways to enhance current binary GWAS results for polygenic risk scores. First, our results clearly show a superior replication rate with quantitative phenotypes. These quantitative phenotypes are often more in line with physiological processes (e.g. systolic blood pressure) than clinical diseases (e.g. coronary artery disease). As such, future GWAS that directly use metabolomic data as outcomes (such as protein expression) are similarly likely to yield more accurate results than GWAS of clinical disease phenotypes. Future research merging metabolomic outcomes and GWAS may be a useful addition to our scientific knowledge. Second, almost all SNPs for binary traits with an OR ≥ 1.2 were replicated, whereas the majority of SNPs with an OR below 1.2 were not. This may reflect a lack of power in the replication dataset. Of note, many of the replication UKBB datasets that we considered here did not use the full UKBB data, and power is likely to improve as complete biobank data are used and as multiple biobanks are combined.
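The link between effect size and power suggested above can be illustrated with a rough calculation. The sketch below approximates the power of a two-sided allelic test for a single SNP in a case-control design, using the large-sample variance of the log OR from a 2×2 allele-count table. Everything in it is an assumption made for illustration: the function name `approx_power`, the allele frequency, the sample sizes, and the use of the conventional genome-wide threshold of 5e-8 as the significance level. None of these values are taken from the GWAS analysed here, and the replication criterion used in this study may differ.

```python
from math import log, sqrt
from statistics import NormalDist


def approx_power(odds_ratio, maf, n_cases, n_controls, alpha=5e-8):
    """Approximate power to detect a SNP with the given odds ratio.

    Assumptions (illustrative only): an allelic test, the same allele
    frequency `maf` in cases and controls when computing the variance,
    and a normal approximation for the log odds ratio estimate.
    """
    norm = NormalDist()
    # Var(log OR) = 1/a + 1/b + 1/c + 1/d for allele counts, which
    # simplifies to 1 / (2 * N * maf * (1 - maf)) within each group.
    per_allele = 1.0 / (maf * (1.0 - maf))
    se = sqrt(per_allele / (2 * n_cases) + per_allele / (2 * n_controls))
    ncp = abs(log(odds_ratio)) / se           # expected |z| statistic
    z_crit = norm.inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    return norm.cdf(ncp - z_crit) + norm.cdf(-ncp - z_crit)


# Hypothetical study: 10,000 cases, 10,000 controls, allele frequency 0.3.
for odds_ratio in (1.05, 1.10, 1.20):
    power = approx_power(odds_ratio, maf=0.3, n_cases=10_000, n_controls=10_000)
    print(f"OR {odds_ratio:.2f}: approximate power {power:.3f}")
```

With these hypothetical inputs, power rises steeply as the OR moves from 1.05 towards 1.2, which is consistent with the interpretation that small-effect SNPs fail to replicate partly because of limited power. Larger sample sizes shift the whole curve upwards.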
Limitations and comparison with previous literature
We were surprised to find only nine phenotypes where two GWAS had been conducted in truly independent participants and where inclusion or not of UKBB data was a distinguishing feature. It is plausible that further independent GWAS on the same traits exist, although this seems unlikely given the thorough and systematic search we performed of the GWAS Atlas 8. It is, however, likely that more GWAS are available but that they contain overlapping samples (i.e. two GWAS of the same phenotype are not truly independent because they contain similar cohorts of participants), are not of sufficient quality to be included in the GWAS Atlas, were conducted in a non-European population, or have not made their summary statistics available. An earlier study 15 reports building a model for SNP replication using GWAS for over 50 phenotypes, although it is unclear what measures, if any, were taken to determine whether these numerous GWAS were truly independent, i.e. did not include overlapping participants. That study also validated its model in two small GWAS of one trait. Furthermore, it did not quantify a SNP replication rate, nor did it stratify results by binary and quantitative phenotypes. A further limitation of our study is that we did not include other SNP features. Ideally, we would have liked to include, for instance, minor allele frequency as a predictor in our model. However, these data were sparsely available in the replication (non-UKBB) GWAS. Lastly, it should be acknowledged that large disease-specific consortia generally describe the replication of SNPs qualitatively as their consortium grows. Our study quantifies this formally and, importantly, quantifies replication across more than one phenotype.
Future research
We have identified a number of future research priorities. First, improving the phenotyping of binary phenotypes seems to be a priority for GWAS. Second, further independent cohorts are likely to be required to facilitate an assessment of SNP replication. Many efforts to establish such cohorts are already underway (e.g. the All of Us cohort and the Million Veteran Program).