False discovery rates for genome-wide association tests in biobanks with thousands of phenotypes

Aubrey Annis (  acannis@umich.edu ) University of Michigan Department of Biostatistics https://orcid.org/0000-0002-4095-9051 Anita Pandit University of Michigan Jonathon LeFaive Department of Biostatistics and Center for Statistical Genetics, University of Michigan School of Public Health https://orcid.org/0000-0003-3668-6086 Sarah Gagliano Taliun University of Michigan Department of Biostatistics Lars Fritsche University of Michigan-Ann Arbor https://orcid.org/0000-0002-2110-1690 Peter VandeHaar University of Michigan-Ann Arbor https://orcid.org/0000-0002-8072-9461 Michael Boehnke University of Michigan https://orcid.org/0000-0002-6442-7754 Matthew Zawistowski University of Michigan Department of Biostatistics https://orcid.org/0000-0002-3005-083X Gonçalo Abecasis University of Michigan https://orcid.org/0000-0003-1509-1825 Sebastian Zöllner University of Michigan

phenome-wide biobank data is a substantially larger and more complex endeavor than a classical genome-wide association study.

The established genome-wide significance threshold of p≤5×10⁻⁸ arose from Bonferroni correction accounting for the equivalent of ~1,000,000 independent tests across the genome [11-15], and it is easy to imagine extending this approach to account for multiple testing across the phenome as well. However, given the strong correlations among phenotypes in biobank data, Bonferroni correction for phenotypes is likely unnecessarily conservative. Moreover, as common-variant testing is often more powerful than rare-variant testing, it may be unsuitable to apply the same significance threshold to both variant types [16]. Finally, differences among biobanks in sample size, density of genetic variants, phenotypes and phenotype correlation structure, and analytical choices suggest a one-size-fits-all significance threshold may be inadequate. Permutation-based methods are a common alternative to Bonferroni correction, but typically require thousands or even millions of replicates for each association test. A permutation analysis on this scale is computationally impractical even on a high performance cluster. Here we propose a computationally feasible single-iteration permutation analysis that works well despite potential variability among permutations and provides significance criteria tailored to individual datasets.

Our key insight is to exploit the simultaneous analysis of many phenotypes across the biobank. When a large number of phenotypes are analyzed in parallel, a single permutation across all phenotypes followed by genetic association analyses of the permuted data enables us to understand false discovery rates (FDRs) across the phenome. Our FDR estimates in turn help us to interpret genetic associations in a biobank context.
The single-iteration permutation analysis is straightforward. We begin by separating each participant's data vector, denoted as primary data, into two subvectors: one containing phenotypes and the other containing genotypes. We then break any true phenotype-genotype associations in the primary data by shuffling the phenotype subvectors among individuals. The shuffling process randomly merges phenotype and genotype subvectors into new data vectors, which constitute the permuted data. Any association identified in the permuted data is consequently due to chance and represents a false discovery. We include age in the phenotype subvector to avoid nonsensical data combinations in our permuted data (e.g. diagnosing a young person with Alzheimer's disease) and to ensure that we properly control for age-specific effects by incorporating age in our regression model. Similarly, we control for sex-specific effects by only permuting within sexes, thereby preventing nonsensical data combinations related to sex-specific phenotypes (e.g. diagnosing a male with ovarian cancer). When there is substantial relatedness in the data, we recommend grouping vectors into blocks (for example, nuclear families or pairs of individuals with similar kinship) and then randomly swapping vectors between blocks (Appendix A). After permuting the data, we perform genetic association studies in both the primary and permuted data and identify independent associations in each dataset by doing p-value clumping in a 1 MB region around each association signal. We estimate the FDR as the ratio of the number of independent associations across all phenotypes in the permuted-data genetic association studies (presumed false due to the phenotype-genotype dissociation) to the number of independent associations across all phenotypes in the primary-data genetic association studies (Figure 1a; Online Methods).
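The counting and ratio steps described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: `clump_positional`, `count_independent`, and `estimate_fdr` are hypothetical helper names, the clumping is purely positional (it ignores linkage disequilibrium), and capping the estimate at 1 is an assumption made here so that the result reads as a proportion.

```python
from typing import Dict, List, Tuple

Hit = Tuple[str, int, float]  # (chromosome, base-pair position, p-value)


def clump_positional(hits: List[Hit], p_max: float = 5e-8,
                     flank_bp: int = 500_000) -> List[Hit]:
    """Greedy positional clumping (hypothetical helper, not PLINK itself).

    Walk through significant hits from most to least significant and keep a
    hit only if no already-kept hit lies within flank_bp on the same chromosome.
    """
    significant = sorted((h for h in hits if h[2] <= p_max), key=lambda h: h[2])
    index_hits: List[Hit] = []
    for chrom, pos, p in significant:
        near_index = any(c == chrom and abs(pos - q) <= flank_bp
                         for c, q, _ in index_hits)
        if not near_index:
            index_hits.append((chrom, pos, p))
    return index_hits


def count_independent(results: Dict[str, List[Hit]], p_max: float = 5e-8) -> int:
    """Sum the number of independent associations over all phenotypes."""
    return sum(len(clump_positional(h, p_max)) for h in results.values())


def estimate_fdr(n_permuted: int, n_primary: int) -> float:
    """Permuted-to-primary ratio; the cap at 1.0 is an assumption of this sketch."""
    if n_primary == 0:
        return float("nan")  # undefined when the primary data has no signals
    return min(1.0, n_permuted / n_primary)


# Toy summary statistics for two made-up phenotypes.
primary = {
    "phenoA": [("1", 1_000_000, 1e-12), ("1", 1_200_000, 1e-9), ("1", 2_000_000, 3e-8)],
    "phenoB": [("2", 1_000_000, 1e-10), ("2", 5_000_000, 1e-3)],
}
permuted = {"phenoA": [("3", 100, 2e-8)], "phenoB": []}

n_primary = count_independent(primary)
n_permuted = count_independent(permuted)
fdr = estimate_fdr(n_permuted, n_primary)
```

In the toy data, the second chromosome 1 hit for `phenoA` lies 200 kb from a stronger hit and is absorbed into its clump, so the primary data contribute three independent associations and the permuted data one.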
We applied our permutation method to individuals in two biobanks: ~408,000 "White British" participants from the UK Biobank [1] (UKB) and ~42,000 European-ancestry participants from the Michigan Genomics Initiative (MGI). The phenotypes in both datasets were derived from ICD diagnosis codes in patient electronic health records (EHRs) [17]. We analyzed 1,418 binary EHR-derived UKB phenotypes (https://pheweb.org/UKB-TOPMed/) [18] and 1,659 binary EHR-derived MGI phenotypes (http://pheweb.org/MGI-freeze2/) with case counts >50; 1,365 of these phenotypes were common to both datasets (Online Methods). To obtain FDR estimates, we performed association testing in both datasets on the primary and permuted data, in which all associations are generated by chance through shuffling the phenotype vectors. We used SAIGE [10] for the UKB analysis and both SAIGE and fastGWA [19] for the MGI analysis, which allowed for a comparison of different genetic association software (Appendix B). In addition, to assess the precision of a single-permutation FDR estimation, we repeated our SAIGE analysis across five independent permutations of the MGI data (Figure 1b). Although our analysis focuses on binary traits analyzed primarily with SAIGE software, our method is also compatible with quantitative data and with any association software that is well-calibrated for the data being analyzed (Appendix B).

To understand FDRs in biobank-scale association studies, we first evaluated the number of signals with p≤5×10⁻⁸ in the primary and permuted data of both biobanks. The association study of the primary UKB data yielded 5,279 independent associations with p≤5×10⁻⁸ while the permuted [...] rare variants: 0.0001≤MAF<0.001).
We found that FDRs for significant associations were substantially lower among common variants than among low-frequency and rare variants, with FDR_UKB ranging from 2% to 83% and FDR_MGI ranging from 21% to 100% (Supplementary Table 1). Concerningly, large FDRs for low-frequency and rare variants persisted even among associations with p-values we would typically consider conclusive (e.g. ~70% FDR at p≤5×10⁻⁹ and 0.0001≤MAF<0.001) (Figure 3a, 3b; Supplementary Figure 2). Overall FDR estimates and the FDR estimates in each MAF partition also differed meaningfully between UKB and MGI (e.g. at p≤5×10⁻⁹ FDR_UKB=5% and average FDR_MGI=24%), demonstrating the variability that can exist among datasets due to their specific genotype and phenotype compositions and sample sizes (Supplementary Table 2). We believe the majority of FDR variation observed between UKB and MGI is due to greater power in UKB because of its larger sample size; increased power is expected to increase the number of true signals at any significance threshold even while the number of false signals remains constant, thus decreasing FDR. The large FDR among rare variants likely reflects the combination of decreased power among these variants and increased multiple testing burden (since the number of independent rare variants in these imputed datasets greatly exceeds the number of common variants after accounting for linkage disequilibrium). The variability among FDR estimates by dataset emphasizes the value of developing significance criteria tailored to the individual dataset.

Each FDR estimate provides an individualized level of confidence for a result by giving an approximate probability of the association being false; consequently we expect a negative correlation between FDRs and replication rates, though naturally this will depend on having sufficient power for replication as well.
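Partitioning the FDR estimate by allele frequency, as in the comparison above, can be sketched as follows. This is an illustration under stated assumptions rather than the study's code: `maf_bin` and `fdr_by_maf_bin` are hypothetical names, only the rare-variant range (0.0001≤MAF<0.001) is taken from the text, and the low-frequency and common boundaries (0.001 and 0.01) are conventional values assumed here. The inputs are taken to be independent (already clumped) associations given as (MAF, p-value) pairs.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

Assoc = Tuple[float, float]  # (minor allele frequency, p-value)


def maf_bin(maf: float) -> Optional[str]:
    """Assign a MAF partition; the 0.01 and 0.001 cut points are assumptions."""
    if maf >= 0.01:
        return "common"
    if maf >= 0.001:
        return "low-frequency"
    if maf >= 0.0001:
        return "rare"
    return None  # below the MAF filter, not analyzed


def fdr_by_maf_bin(primary: Iterable[Assoc], permuted: Iterable[Assoc],
                   p_max: float = 5e-8) -> Dict[str, Optional[float]]:
    """Permuted-to-primary ratio of independent associations within each bin."""
    def counts(assocs: Iterable[Assoc]) -> Counter:
        return Counter(maf_bin(maf) for maf, p in assocs if p <= p_max)

    n_primary, n_permuted = counts(primary), counts(permuted)
    result: Dict[str, Optional[float]] = {}
    for name in ("common", "low-frequency", "rare"):
        if n_primary[name] == 0:
            result[name] = None  # no primary signals, ratio undefined
        else:
            result[name] = min(1.0, n_permuted[name] / n_primary[name])
    return result


# Toy independent associations (made-up values).
primary_assocs = [(0.2, 1e-9), (0.3, 1e-12), (0.05, 4e-8), (0.15, 1e-20),
                  (0.0005, 2e-8), (0.0002, 1e-9)]
permuted_assocs = [(0.25, 3e-8), (0.0003, 1e-8), (0.0007, 4e-9), (0.4, 1e-3)]

fdr_bins = fdr_by_maf_bin(primary_assocs, permuted_assocs)
```

In this toy example the rare bin has as many permuted as primary signals, so its estimate is 1.0, while the common bin has one permuted signal against four primary signals.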
To assess the correlation between FDRs and replication rates phenome-wide, we performed reciprocal replication analyses of significant independent associations in UKB and MGI. In total, we evaluated 3,285 UKB and 1,010 MGI associations for replication in the other biobank across the 1,365 phenotypes common to both studies (Online Methods). As expected, we observed a gradual decrease in true replication (p≤0.05 and same direction of effect) for signals with higher FDRs (Figure 4a). In both replication analyses, most associations (~80%) with FDRs 0-50% replicated in direction of effect regardless of p-value (Supplementary Figure 3). Interestingly, in low-FDR regions UKB replicated MGI at a much higher rate than MGI replicated UKB, most likely due to a power differential between the datasets that enabled UKB to replicate marginal MGI [...] context, but also when viewing Manhattan plots in which association signals look equally valid. The solitary signals for the MGI phenotypes "corneal opacity and other disorders of cornea" and "acquired hemolytic anemias" look almost identical, with well-formed peaks clearly exceeding the 5×10⁻⁸ threshold. At first glance the associations seem to have comparable chances of denoting a true signal; however, after considering the FDR estimate for the top hit in each peak (corneal opacity and other disorders of cornea: FDR=4%; acquired hemolytic anemias: FDR=72%), we concluded that while we have high confidence in the corneal phenotype association with rs11659764 (TCF4) on chromosome 18 (p_MGI=1.9×10⁻⁹), our confidence in the hemolytic anemias association with rs760692431 (HS3ST4) on chromosome 16 is attenuated despite having a similar p-value (p_MGI=4.3×10⁻⁹) that would often be considered sufficient evidence for association.
A replication analysis of these two signals in the UKB confirmed the conclusions suggested by the FDRs: the association with "corneal opacity and other disorders of cornea" replicated in UKB (p_UKB=2.4×10⁻³⁰) while the association with "acquired hemolytic anemias" did not replicate (p_UKB=0.71). These results are also in keeping with the definition of the two phenotypes: while disorders of the cornea are well-recognized as having a genetic component, acquired hemolytic anemias are less heritable. When comparing the two signals, a notable difference between them was the MAF of the top variant in each peak, indicating either a common-variant (MAF_rs11659764=0.05 for corneal opacity) or rare-variant (MAF_rs760692431=0.0002 for acquired hemolytic anemias) association.

To address the accuracy of a single-iteration permutation, we performed five permutations of our MGI data and compared the number of independent associations yielded in each permutation. The results indicated that the number of independent associations was similar across all permutations, with the most variability occurring among low-frequency and rare variants (Figure 6; Supplementary [...] the interpretation of each FDR category (21-24%: likely to be a true association; ~50%: association could be true or false; ≥80%: likely to be a false association), we can easily see that this amount of variability in the FDR estimation achieves our goal of detecting likely reliable vs. unreliable associations and that a single permutation is adequate for estimating FDRs for associations with p≤5×10⁻⁸.

We also assessed the total computation time and cost for a single-iteration permutation analysis of the UKB and MGI data. Computation times for the permuted genetic association studies of 1,418 UKB and 1,659 MGI phenotypes using SAIGE were 1,752,160 and 48,221 CPU hours, respectively.
Estimated computation costs for the UKB analysis ranged from ~$35,000 on Google Cloud Platform n1-standard machines to ~$47,000 for in-house computing; costs for the MGI analysis were ~$1,000 for both computing options (Table 2). It should be noted that our analysis included only European-ancestry individuals and that more computationally intensive analyses employed to incorporate multi-ancestry data would increase computation time overall. A large number of permutations would be prohibitively expensive and inefficient for analyzing single- or multi-ancestry data in any large biobank, but a single permutation analysis has the same computation time and cost as the primary data analysis and should therefore be feasible. Consequently, we suggest that a single-iteration permutation analysis be performed alongside genetic association studies in a biobank context and that the resulting FDR estimates will be a valuable resource for the proper interpretation of association results.

[...] between the frequency of the variant and the FDR, yielding FDR_UKB ranging from 4% to 82% and an average FDR_MGI ranging from 19% to 85% (Supplementary Table 3). Despite increased consistency across MAC categories, we still observed noticeable variation between datasets (e.g. at p≤5×10⁻⁸ and 1,000≤MAC<5,000 FDR_UKB=79% and average FDR_MGI=57%), once again emphasizing the necessity for calculating FDRs for the specific dataset under investigation (Supplementary Figure 5).

Our analysis demonstrates that the current significance threshold (p≤5×10⁻⁸) results in an unacceptable number of false positives when testing biobanks with thousands of phenotypes. A better calibrated significance criterion is needed to account for increased testing, genetic and phenotypic variation among datasets, and differing variant frequencies.
Our analysis showed that FDRs for low-frequency and rare variants were very high in both UKB and MGI at a p-value threshold of 5×10⁻⁸, whereas at lower p-value thresholds (5×10⁻¹⁰ or 5×10⁻¹¹) the FDRs decreased substantially (Figure 3a, 3b; Supplementary Figure 2). These results suggest that for these two datasets a more appropriate cutoff for statistical significance among low-frequency and rare variants would be around 5×10⁻¹⁰ or 5×10⁻¹¹ rather than 5×10⁻⁸, which is generally used as the significance threshold for common variants. As shown in the differences between the UKB and MGI FDR estimates, FDRs will likely vary across datasets depending on the variant frequencies, sample size, and correlation structure of each dataset. Instead of recommending a universal significance threshold for biobank studies that does not take into account differences in biobank features, we suggest using FDRs to provide a customized level of confidence for each association given its specific discovery dataset, MAF, and p-value. Since only one permutation is required to achieve a stable FDR estimate, our permutation analysis can be run alongside a primary genetic association study with manageable additional computation time and cost. Moreover, our method is applicable to both binary and quantitative traits, any association software properly calibrated for the data being analyzed, datasets with related individuals, and multi-ancestry datasets, making it useful on a broad spectrum. We believe that publications of genetic association study findings should include the estimated probability of success suggested by FDR estimates along with the primary association study results whenever possible. This process will contextualize genetic association study results for any dataset regardless of its multiple testing context, correlation structures, or proportion of rare variants.
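To make the dependence of the FDR on the p-value threshold concrete, the following sketch scans the thresholds discussed above and reports the most lenient one whose estimated FDR falls at or below a chosen target. The function names are hypothetical, and the 5% target is an arbitrary assumption of this example, not a recommendation from the analysis; the inputs are taken to be p-values of independent associations within a single MAF category.

```python
from typing import Optional, Sequence


def fdr_at(primary_p: Sequence[float], permuted_p: Sequence[float],
           threshold: float) -> Optional[float]:
    """Permuted-to-primary ratio of independent associations with p <= threshold."""
    n_primary = sum(p <= threshold for p in primary_p)
    n_permuted = sum(p <= threshold for p in permuted_p)
    if n_primary == 0:
        return None  # ratio undefined without primary signals
    return min(1.0, n_permuted / n_primary)


def most_lenient_threshold(primary_p: Sequence[float], permuted_p: Sequence[float],
                           target_fdr: float = 0.05,  # assumed target, for illustration
                           candidates: Sequence[float] = (5e-8, 5e-9, 5e-10, 5e-11),
                           ) -> Optional[float]:
    """Return the largest candidate threshold whose estimated FDR meets the target."""
    for threshold in sorted(candidates, reverse=True):
        fdr = fdr_at(primary_p, permuted_p, threshold)
        if fdr is not None and fdr <= target_fdr:
            return threshold
    return None


# Toy p-values of independent associations (made-up values).
primary_p = [1e-8, 2e-9, 1e-10, 1e-12, 1e-13, 1e-15]
permuted_p = [2e-8, 3e-9]

chosen = most_lenient_threshold(primary_p, permuted_p)
```

With these toy values the estimated FDR is about 33% at 5×10⁻⁸, 20% at 5×10⁻⁹, and 0% at 5×10⁻¹⁰, so the scan stops at 5×10⁻¹⁰. A per-association FDR, as suggested above, carries more information than any single cutoff; the scan is only a convenient summary.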

UKB

Our analysis included data from 407,753 "White British" participants drawn from the full UKB cohort released in July 2017 [1]. Participants were genotyped on an Affymetrix Axiom array with 820,967 variants. Non-genotyped variants were imputed using the TOPMed imputation reference panel and filtered to remove variants with R²≤0.3 and/or MAF≤0.01% for a total of ~37,000,000 variants analyzed across each phenotype [20-22]. We specified individuals of "White British" ancestry using the original definitions provided by UKB [1].

[...] and/or MAF≤0.01% for a total of ~32,000,000 variants analyzed across each phenotype [20, 23]. We [...]

Permutation and Association Analyses

Overview

Both the UKB and MGI analyses utilized genotype and EHR-derived phenotype data for n participants (n_UKB=407,753, n_MGI=42,167) and p phenotypes having case count >50 (p_UKB=1,418, p_MGI=1,659). Because the allele frequency filters applied in the association analyses depend on individuals labeled as cases and controls for each phenotype, every phenotype was analyzed with a slightly different set of variants (~37,000,000 for UKB phenotypes and ~32,000,000 for MGI phenotypes). Our permutation method stratifies by inferred genetic sex and then shuffles the phenotype data, along with any phenotypic covariates, to break the association with the genotype vectors and any genotypic covariates. In our analyses, we included only age as a phenotypic covariate and sex, PCs, and chip version as genotypic covariates, but it is possible that specific phenotypes could have additional phenotypic or genotypic covariates (e.g. specific clinical risk factor, batch effects). Our notation allows for refinement of the model to accommodate this scenario.
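The sex-stratified shuffle described in this overview can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the analyses: `permute_within_sex` is a hypothetical name, each element of `pheno_rows` stands for one participant's complete phenotype vector together with the phenotypic covariates (here, age), and the genotypes and genotypic covariates are assumed to stay in their original row order.

```python
import random
from typing import List, Sequence, Tuple


def permute_within_sex(pheno_rows: Sequence[Tuple], sex: Sequence[str],
                       seed: int = 0) -> List[Tuple]:
    """Shuffle complete phenotype rows among participants of the same sex.

    Row i of the result is the phenotype row that gets paired with the
    (unmoved) genotype row of participant i.
    """
    rng = random.Random(seed)
    source = list(range(len(pheno_rows)))  # source[i]: whose phenotypes i receives
    for s in sorted(set(sex)):
        idx = [i for i, value in enumerate(sex) if value == s]
        shuffled = idx[:]
        rng.shuffle(shuffled)
        for dst, src in zip(idx, shuffled):
            source[dst] = src
    return [pheno_rows[src] for src in source]


# Toy data: each row is (original owner, owner's sex, age); made-up values.
sex = ["F", "M", "F", "M", "F", "M"]
pheno_rows = [(f"id{i}", s, 40 + i) for i, s in enumerate(sex)]

permuted_rows = permute_within_sex(pheno_rows, sex, seed=1)
```

Because whole rows move together, the correlation structure among phenotypes and between phenotypes and age is preserved, while any link to the genotypes is broken.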

Notation

Let n be the number of participants in the dataset, m be the number of genotyped and imputed variants, and p be the number of phenotypes. Let Y_ij be the observation for the jth phenotype in individual i where Y is an n × p matrix. Participant outcome data for the jth phenotype is stored in an n-element phenotype vector Y_*j, and phenotype data for the ith individual is stored in a p-element individual vector Y_i*. Let participant genotype data be stored in G, an n × m genotype matrix. Finally, let covariate data for the jth phenotype be contained in matrices Q_j and W_j, where Q_j is an n × r_j matrix with r_j genotypic covariates (e.g. sex, PCs, genotyping batch) and W_j is an n × l_j matrix with l_j phenotypic covariates (e.g. age, phenotyping batch). [...]

To obtain the permuted data, we shuffle the participants' complete phenotype vectors, thereby permuting case-control status while preserving the correlation structure among phenotypes. Our first step in this process is to separate (Y, W) by sex into (Y, W)_M and (Y, W)_F to ensure permutation only among individuals of the same sex, which accommodates sex-specific phenotypes such as prostate or ovarian cancer. We then randomly permute the complete phenotype vectors by shuffling rows among participants in each group to obtain permuted complete phenotype matrices for males and females. We recombine the permuted data to make (Y, W)_P, a permuted complete phenotype matrix containing both males and females that comprises permuted phenotype matrix Y_P and permuted phenotypic covariate matrix W_P.

Using appropriate association software (SAIGE, fastGWA, etc.), we test for association between genetic markers and case-control status for each phenotype in both the primary and permuted data. When using SAIGE, we employed a logistic mixed model; when using fastGWA, we employed a linear mixed model:

SAIGE

Primary: [...] where α, β, and γ are l_j-length, m-length, and r_j-length vectors of the unknown effects of the phenotypic covariates, genotypes, and genotypic covariates respectively, and v_j is the n-length random effects vector for the jth phenotype.
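As an illustrative sketch only, one logistic mixed model that is dimensionally consistent with the definitions above can be written as follows. The symbol \pi_j (the n-vector of case probabilities for the jth phenotype) and the placement of the label P on the permuted quantities are assumptions introduced here, and association software typically fits the genotype term one variant at a time rather than jointly.

```latex
\begin{align*}
\text{Primary:}\quad  & \operatorname{logit}(\pi_j) = W_j\,\alpha + G\,\beta + Q_j\,\gamma + v_j \\
\text{Permuted:}\quad & \operatorname{logit}(\pi_j^{P}) = W_j^{P}\,\alpha + G\,\beta + Q_j\,\gamma + v_j
\end{align*}
```

Under this reading, only the phenotype and the phenotypic covariates carry the permutation, while G, Q_j, and the random effect v_j remain attached to the original genotype rows. A linear mixed model of the kind mentioned for fastGWA would replace the logit link with the identity and add a residual error term.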

Clumping and FDRs

After completing the association analyses for all phenotypes in both the primary and permuted data, we perform clumping of the summary statistics (using PLINK 1.9 [27]) with 500 kb flanks around the most significant signal in that region (--clump-kb 500). This clumping yields independent associations in 1 MB windows for both primary and permuted data [28,29]. We use these results to calculate the FDR for the phenome at a specified significance level L (e.g. p≤5×10⁻⁸) with [...]

Finally, let covariate data for the jth phenotype be contained in matrices Q_j and W_j, where Q_j is an n × r_j matrix with r_j genotypic covariates (e.g. sex, PCs, genotyping batch) and W_j is an n × l_j matrix with l_j phenotypic covariates (e.g. age, phenotyping batch). [...]

To obtain the permuted data, we find genetically related groups of individuals of the same sex within our dataset and then shuffle the participants' complete grouped phenotype vectors, thereby permuting case-control status among groups while preserving the correlation structure among phenotypes. Our first step in this process is to separate (Y, W) by sex into (Y, W)_M and (Y, W)_F to ensure permutation only among individuals of the same sex, which accommodates sex-specific phenotypes such as prostate or ovarian cancer. We then use a genetic-relatedness software, such as PLINK [27] or KING [31], to group participants within each sex with their closest relatives; this process will produce g complete phenotype grouped matrices denoted (Y, W)_k each with dimension g_k × (p + l), where g is the total number of groups, g_k is the number of participants in the kth group, p is the length of the phenotype vector, and l is the length of the phenotypic covariate vector. We then randomly permute the phenotype data by shuffling the complete phenotype grouped matrices within sex to obtain permuted complete phenotype matrices for males and females (N.B. there must be a sufficient number of groups within each sex containing the same number of individuals to accomplish random shuffling between groups). We recombine the permuted data to make (Y, W)_P, a permuted complete phenotype matrix containing both males and females that comprises permuted phenotype matrix Y_P and permuted phenotypic covariate matrix W_P.

The association analysis then proceeds in the same manner as the analysis for unrelated individuals (Online Methods).
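The block shuffle for related individuals can be sketched as follows. This is a hedged illustration, not the authors' implementation: `permute_blocks` is a hypothetical name, the relatedness groups are assumed to have been produced beforehand (for example from PLINK or KING output), and pairing members of swapped groups in list order is a simplifying assumption. Unrelated participants would enter as groups of size one.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def permute_blocks(pheno_rows: Sequence[Tuple], groups: Sequence[Sequence[int]],
                   group_sex: Sequence[str], seed: int = 0) -> List[Tuple]:
    """Swap whole phenotype blocks between same-sex groups of equal size.

    groups[k] lists the participant indices in the kth relatedness group and
    group_sex[k] is the sex shared by its members.
    """
    rng = random.Random(seed)
    strata: Dict[Tuple[str, int], List[int]] = defaultdict(list)
    for k, members in enumerate(groups):
        strata[(group_sex[k], len(members))].append(k)

    permuted = list(pheno_rows)  # participants outside every group keep their row
    for key in sorted(strata):
        dst_groups = strata[key]
        src_groups = dst_groups[:]
        rng.shuffle(src_groups)
        for dst, src in zip(dst_groups, src_groups):
            # Member-by-member pairing in list order (assumption of this sketch).
            for d, s in zip(groups[dst], groups[src]):
                permuted[d] = pheno_rows[s]
    return permuted


# Toy data: each row records (source group, position within group); made-up values.
groups = [[0, 1], [2, 3], [4, 5], [6], [7]]
group_sex = ["F", "F", "M", "M", "M"]
rows = [(k, pos) for k, members in enumerate(groups) for pos, _ in enumerate(members)]

block_permuted = permute_blocks(rows, groups, group_sex, seed=3)
```

In the toy data the single male pair has no same-size partner and therefore keeps its own phenotypes, which illustrates why a sufficient number of equally sized groups within each sex is needed.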

Appendix B: fastGWA Analysis

Researchers may wish to use faster and less computationally intensive association analysis software, such as fastGWA [19], to lessen the computational burden of analyzing two datasets (primary and permuted) phenome-wide; however, they must take care to employ software that is appropriately calibrated to the data being analyzed because improper software choices may yield inaccurate FDR estimates. To illustrate the importance of utilizing software suited to the data being analyzed when calculating FDRs, we repeated our MGI analysis using fastGWA and compared it to our SAIGE results. SAIGE is calibrated to account for binary data and case-control imbalances while fastGWA performs best in datasets with quantitative data or binary data with balanced numbers of cases and controls. Many MGI phenotypes have large case-control imbalances (case-control ratio: mean=0.048, median=0.019), which caused the number of independent associations found using fastGWA for the primary and permuted genetic association studies to be highly inflated (at p≤5×10 [...] Figure 7; Figure 3b). Thus, to get accurate FDR estimates it is important to pair our permutation method with software that works well for the data being analyzed, and careful consideration should be given to data with binary outcomes, case-control imbalances, and a large proportion of rare variants.

Appendix C: Lack of Replication

In our reciprocal replication analyses of all significant independent associations in UKB and MGI, we evaluated 3,285 UKB and 1,010 MGI associations for replication in the other biobank (Online Methods). In both replication analyses most associations (~80%) with FDRs 0-50% replicated in direction of effect, regardless of p-value; a large proportion of these associations (UKB=44%, MGI=71%) also replicated at nominal significance (p≤0.05) (Supplementary Figure 3).

Interestingly, in both analyses the associations with FDRs 51-100% replicated in direction of effect (regardless of p-value) less than 50% of the time, the proportion we would expect purely by chance (Supplementary Figure 3). This 51-100% FDR category primarily contains rare variants (0.0001≤MAF<0.001: UKB=65%, MGI=63%). Since most traits have a much lower number of cases than controls (case-control ratio: mean_UKB=0.007, mean_MGI=0.048; median_UKB=0.002, median_MGI=0.019), any rare alleles that have no effect on the disease are expected to occur primarily in controls. Under this null hypothesis, the only way rare variants can show a highly significant association is by being over-abundant in cases. Thus, when we condition on rare variants having a significant association (i.e. when we attempt to replicate rare variants that have significant p-values in the discovery dataset), we implicitly condition on an increased frequency among cases, which [...] we do not condition on a highly significant p-value when observing these associations in the replication datasets. Thus in the replication datasets, variants that do not affect the trait of interest will follow the null hypothesis and the rare allele will occur primarily in controls, resulting in a negative effect size.
This combination of an excess of rare-variant tests in noncausal regions and a case-control imbalance in most phenotypes leads to many significant associations having positive effect sizes and many nonsignificant associations having negative effect sizes, resulting in fewer than 50% of the replication-dataset associations having the same direction of effect as the discovery-dataset associations.

We also observed a lower replication (p≤0.05) rate than expected among UKB associations with FDR_UKB=0, which we would expect to replicate nearly 100% of the time (Figure 4a) [...] Figure 10); UKB also showed a similar signal near HFE for "disorders of mineral metabolism," but MGI had no significant associations for "disorders of mineral metabolism" and only replicated (p≤0.05) 8% (1/13) of UKB associations with FDR_UKB=0 near this signal (Supplementary Figure 11). The phenotypes appear to be closely related in UKB, with associations occurring in the same region of chromosome 6. This signal is also picked up in the association analysis of MGI's "disorders of iron metabolism," but not in the analysis of MGI's "disorders of mineral metabolism." The lack of signal in MGI's "disorders of mineral metabolism" suggests that this association study is unexpectedly capturing a group of participants with an underlying phenotype that is dissimilar to the other three association studies despite the similarity of their phecodes. This discrepancy among phecodes would make replication using MGI's "disorders of mineral metabolism" phenotype virtually impossible.

Another lack of replication occurred for the phenotype "benign neoplasm of colon" (phecode 208; cases_UKB=20,121, controls_UKB=384,292; cases_MGI=8,083, controls_MGI=33,652), which may be due in part to different treatment techniques in the UK and Michigan.
The United States has traditionally taken a more aggressive stance towards colon screening than the UK, with the focus in the former being on cancer prevention and in the latter on cancer detection [32]. The preventative colon cancer treatments common in the US would result in not only including people who manifest concerning symptoms in the screening for and removal of benign neoplasms, but also including individuals who, though having no identifying symptoms or genetic predisposition for colon cancer, meet the US criteria for preventative care. These asymptomatic people who had surgery to remove a benign neoplasm of the colon would be included as cases in the MGI analysis. In contrast, the UK's focus on cancer detection would result in largely symptomatic people being included in the UKB analysis. The overall quality of the UKB data, therefore, would be superior to the MGI data for detecting a genetic predisposition towards neoplasms of the colon since the MGI sample is diluted by people who are undergoing routine preventative care. In light of the potentially different populations composing the studies, along with a lack of power arising from unequal sample sizes, it makes sense that only 22% (2/9) of UKB associations with FDR_UKB=0 replicated in MGI (Supplementary Figure 12).

Finally, many failures to replicate UKB signals are most likely due to a lack of power, which is unsurprising when the replication dataset is ~1/10 the size of the discovery dataset. Using the phenotype prevalences manifested in UKB and the corresponding allele frequencies, case and control numbers, and effect sizes in MGI, we calculated the power of replicating the UKB associations in [...]

Figure 1. Single-iteration permutation method and study design.
a) Single-iteration permutation method including the permutation process, association studies of primary and permuted data, 1 MB positional clumping, and calculation of the FDR.
b) Study design for the single-iteration permutation method including analysis of primary and permuted UKB and MGI data and utilization of the two association software packages SAIGE and fastGWA.

[...]

Table 1. Replication of selected significant (p≤5×10⁻⁸) independent associations in the UKB and MGI association analyses. Each dataset alternately acts as a discovery and replication dataset. The rsIDs, case and control counts, MAFs, FDR, and p-values are given for the most significant association within a 1 MB window in the discovery dataset.
a) Replication of selected UKB associations in MGI.
b) Replication of selected MGI associations in UKB.

Table 2. Estimated CPU hours and cost for permutation analyses of 1,418 UKB and 1,659 MGI phenotypes. The in-house computing cluster is located at the University of Michigan. The web-based computing cluster is the Google Cloud Platform. Estimates for both clusters are given for n1-standard machines.