Study Cohorts
The PAGE consortium includes several studies. In brief, PAGE consists of multiple populations grouped by self-identified race and ethnicity: European Americans (EA), African Americans (AA), Hispanic Americans (HA), Native Americans (NAm), East Asians (ASN), and Native Hawaiians/Pacific Islanders (NH) (CALiCo-SOL and Fernandez-Rhodes; Manolio 2009; Matise et al. 2011). All participating sites in PAGE ascertained both males and females, except for the females-only Women’s Health Initiative (WHI). Analyses were performed in all populations combined via meta-analyses.
The UKBB is a population-based study of citizens of the United Kingdom (Sudlow et al. 2015). Approximately half a million individuals, primarily of European ancestry, were recruited between 2006 and 2010. Genetic and phenotypic characteristics of UKBB individuals used for GWAS were defined previously (Bycroft et al. 2018). We randomly selected a subset of unrelated European ancestry individuals for GWAS, sized to be comparable to our PAGE sample, thereby removing sample size as a factor contributing to differential results (Sup. Table 1).
Phenotyping And Quality Control
We studied two anthropometric phenotypes: BMI (kg/m²; a measure of overall adiposity) and waist-to-hip ratio adjusted for BMI (WHRBMI-adj), a proxy of central obesity. In 16 of the 17 studies that contribute to PAGE, height and weight were measured by study staff at enrollment and used to calculate BMI (weight/height²) (Sup. Table 1). In the remaining study, the Multi-ethnic Cohort (MEC), BMI was based on self-reported height and weight at enrollment. Pilot analyses of BMI in MEC showed a distribution comparable to national surveys (Gorber and Tremblay 2010). Waist circumference was measured at the level of the natural waist in the horizontal plane to the nearest 0.5 cm (Carty et al. 2014); no waist or hip circumference measurement was available for the BioMe sub-cohort.
Recruitment and data collection in the UKBB sample have been previously described (Biobank 2007; Bycroft et al. 2017). UKBB participants were randomly selected from those study participants who self-identified as European and who clustered within the 1000 Genomes European (EUR) ancestry population when a k-means clustering approach was applied to genotype data (n = 451,337). As we grouped study participants by self-reported race/ethnicity, we additionally excluded those UKBB participants who did not self-report as European (n = 32). We excluded women who were pregnant or unsure whether they were pregnant. We removed any BMI or WHR measure more than 6 SD above or below the sex-specific mean.
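The sex-specific outlier exclusion described above can be sketched as follows. This is a minimal illustration and not the study's actual pipeline: the record layout (dictionaries with `sex` and a trait key) and the function name are hypothetical, and the SD is assumed to be the sample SD computed within each sex before any exclusion.

```python
from statistics import mean, stdev


def remove_outliers_by_sex(records, trait, n_sd=6.0):
    """Keep records whose trait value lies within n_sd SDs of the sex-specific mean.

    `records` is a list of dicts with a "sex" key and a numeric `trait` key
    (hypothetical layout). Each sex group needs at least two records so that
    a sample SD can be computed.
    """
    values_by_sex = {}
    for rec in records:
        values_by_sex.setdefault(rec["sex"], []).append(rec[trait])

    # Mean and sample SD per sex, computed on the unfiltered data.
    stats = {sex: (mean(vals), stdev(vals)) for sex, vals in values_by_sex.items()}

    kept = []
    for rec in records:
        mu, sd = stats[rec["sex"]]
        if abs(rec[trait] - mu) <= n_sd * sd:
            kept.append(rec)
    return kept
```

Note that a single extreme value inflates the SD of its own group, so with a 6 SD threshold an outlier can only be detected in groups of roughly 40 or more individuals; this is not a concern at biobank scale.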
Genotyping And Quality Control
In PAGE, approximately 50,000 individuals were genotyped using the Multiethnic Genotyping Array (MEGA) panel, as previously described (Bien et al. 2016). The remaining PAGE individuals were genotyped with Affymetrix arrays. After quality control, genotype imputation was performed using the 1000 Genomes Phase 3 reference panel; details are provided elsewhere (Hu et al. 2021).
In the UKBB, genotyping was performed using either the Applied Biosystems UKBB Lung Exome Variant Evaluation (UK BiLEVE) Axiom Array or the UKB Axiom Array (Bycroft et al. 2018). The genotypes were imputed using IMPUTE4 with a combination of reference panels: i) the Haplotype Reference Consortium and ii) UK10K and the 1000 Genomes Phase 3 (Bycroft et al. 2017). For this study, we excluded non-autosomal genetic variants and those with poor imputation quality (R² < 0.4), effective sample size < 30, or MAF < 0.05. Criteria used for calculating the effective sample size for each variant are defined in the Appendix. Approximately 32 million of the 60 million variants had MAF < 0.01 and were removed from analyses.
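The variant-level exclusions can be expressed as a simple filter. The sketch below is illustrative only: the field names (`chrom`, `info`, `eff_n`, `af`) are hypothetical, the effective sample size is assumed to be precomputed according to the Appendix definition (not reproduced here), and the MAF threshold is left as a parameter.

```python
# Autosomes encoded as strings "1".."22" (assumed encoding).
AUTOSOMES = {str(c) for c in range(1, 23)}


def passes_variant_qc(variant, min_info=0.4, min_eff_n=30, min_maf=0.05):
    """Return True if a variant survives the exclusion criteria.

    `variant` is a dict with hypothetical keys:
      chrom  - chromosome label
      info   - imputation quality (R^2)
      eff_n  - precomputed effective sample size
      af     - allele frequency of the coded allele
    """
    maf = min(variant["af"], 1.0 - variant["af"])  # minor allele frequency
    return (
        variant["chrom"] in AUTOSOMES
        and variant["info"] >= min_info
        and variant["eff_n"] >= min_eff_n
        and maf >= min_maf
    )
```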
GWAS And Meta-analyses
In PAGE, GWAS were performed with SUGEN (Lin et al. 2014). In UKBB, GWAS were performed using SAIGE (Zhou et al. 2018). Age, sex (BMI only), study center (PAGE only), and the first 10 principal components (PCs) of ancestry were included as covariates. In PAGE, GWAS were initially performed separately within each self-identified race/ethnicity group, either sex-combined (BMI only) or sex-stratified, and subsequently meta-analyzed across sex (WHRBMI-adj) and across self-identified race/ethnicity groups (both BMI and WHRBMI-adj) using METAL (Willer et al. 2010). In UKBB, GWAS were similarly performed in a subset whose sample size approximately matched that of PAGE (Sup. Table 1). For WHR, we additionally adjusted associations for BMI (WHRBMI-adj).
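To illustrate how per-group results are combined, the sketch below implements a fixed-effects inverse-variance-weighted meta-analysis for a single variant. This is an assumption made for illustration: METAL also offers a sample-size-weighted scheme, and the text does not state which scheme was used. The function name and input layout are hypothetical.

```python
import math


def ivw_meta(estimates):
    """Fixed-effects inverse-variance-weighted meta-analysis for one variant.

    `estimates` is a list of (beta, se) pairs, one per stratum, all aligned
    to the same effect allele. Returns (beta, se, z, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # Two-sided p-value under a standard normal null distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p
```

Strata with smaller standard errors (typically the larger groups) receive more weight, so the pooled estimate is dominated by the best-powered strata.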
Selection And Configuration Of Genomic Regions For Fine-mapping
For each trait/group combination, variants exceeding the GWAS significance level (p < 5 × 10⁻⁸) were extracted from PAGE and UKBB summary statistics and compiled into a single list (Sup. Table 2). GWAS variants were then LD pruned (threshold R² > 0.1) and grouped into independent clusters, each representing a potential functional region. The variant with the lowest p-value in each cluster was selected as the index variant for that cluster (see Box 1), irrespective of whether this association was observed in the UKBB or PAGE results. We further restricted analyses to loci where the overlapping region harbored at least one variant exceeding the GWAS significance level in both PAGE and UKBB. The minimum base pair (bp) distance between each pair of adjacent index variants was ≥ 300 kbp. Therefore, each functional locus was defined as the set of variants located ≤ 150 kbp from the index variant for that region (Sup. Figure 1). This method was applied to each phenotype and ancestry group combination. Hence, each genomic region probed in the UKBB GWAS was fine-mapped in PAGE in the same overlapping genomic region (i.e., in pairs).
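The construction of fine-mapping regions can be sketched as follows. Note the substitution: the study groups variants by LD pruning, whereas this sketch uses a greedy distance-based selection as a stand-in, because LD pruning requires a reference panel. All names and the variant layout (`chrom`, `pos`, `p`) are hypothetical.

```python
def select_index_variants(variants, min_gap=300_000):
    """Greedily pick index variants by ascending p-value, keeping a variant
    only if it lies at least `min_gap` bp from every index variant already
    chosen on the same chromosome (distance-based stand-in for LD pruning)."""
    chosen = []
    for v in sorted(variants, key=lambda v: v["p"]):
        if all(
            v["chrom"] != c["chrom"] or abs(v["pos"] - c["pos"]) >= min_gap
            for c in chosen
        ):
            chosen.append(v)
    return chosen


def locus_window(index, flank=150_000):
    """Return (chrom, start, end) covering `flank` bp on each side of the index."""
    return index["chrom"], max(1, index["pos"] - flank), index["pos"] + flank


def variants_in_locus(variants, index, flank=150_000):
    """Return all variants that fall inside the window around `index`."""
    chrom, start, end = locus_window(index, flank)
    return [v for v in variants if v["chrom"] == chrom and start <= v["pos"] <= end]


def is_shared_locus(index, page_hits, ukbb_hits, flank=150_000):
    """True if both cohorts have at least one significant variant in the window."""
    return bool(variants_in_locus(page_hits, index, flank)) and bool(
        variants_in_locus(ukbb_hits, index, flank)
    )
```

Because adjacent index variants are at least 300 kbp apart, the 150 kbp flanks of neighbouring loci meet at most at a shared boundary and do not otherwise overlap.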
Fine-mapping And Sensitivity Analyses
For both fine-mapping and sensitivity analyses, we used summary statistics and SLALOM (Kanai et al. 2022). This method incorporates an approximate Bayes factor (ABF) (Wakefield 2007, 2009) for fine-mapping, which estimates a posterior inclusion probability (PIP) for each variant and derives the smallest possible 95% and 99% causal sets (CS) (i.e., the sets of variants whose cumulative posterior probability reaches 95% or 99%) based on p-values and LD, assuming one functional variant per locus. GWAS statistics from PAGE come from a meta-analysis of populations with distinct patterns of ancestry and admixture, genotyped on distinct platforms and imputed separately. To assess whether the heterogeneous characteristics of the contributing cohorts (e.g., differences in patterns of admixture and ancestry, sample size, genotyping, or imputation) may have affected fine-mapping outputs in PAGE, SLALOM performed the DENTIST-Simplified (DENTIST-S) test, which is based on DENTIST (Chen et al. 2021), to flag loci with suspect GWAS results. Under DENTIST-S, the observed statistical significance of a variant is expected to be proportional to its LD with the lead variant (the variant with the highest posterior probability), assuming both belong to the set representing the same signal. A variant in tight LD with the lead variant but with a higher-than-expected p-value suggests an association outlier, and the quality of fine-mapping at that locus would therefore be questionable. For fine-mapping of PAGE loci, SLALOM inferred LD from the gnomAD (Koch 2020) reference African, admixed American, East Asian, and European populations, averaged in proportion to each population’s study sample size; for the same regions in UKBB, it used a UKBB-specific reference. It also obtained functional annotation for each fine-mapped variant from the Variant Effect Predictor (VEP), a toolset for the prioritization and functional annotation of genomic variants in coding and non-coding regions (McLaren et al. 2016).
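The single-causal-variant ABF calculation can be sketched in a few lines. This is a minimal illustration of Wakefield's approximate Bayes factor with a flat prior across variants, not SLALOM's own code; the prior effect-size variance of 0.04 is an assumed value, and the function names are hypothetical.

```python
import math


def log_abf(beta, se, prior_var=0.04):
    """Log of Wakefield's approximate Bayes factor in favour of association.

    With V = se^2, W = prior_var, and z = beta / se:
        ABF = sqrt(V / (V + W)) * exp(z^2 / 2 * W / (V + W))
    """
    v = se ** 2
    r = prior_var / (v + prior_var)
    z = beta / se
    return 0.5 * math.log(1.0 - r) + 0.5 * z * z * r


def posterior_inclusion_probabilities(log_abfs):
    """PIPs under one causal variant per locus and equal prior weights."""
    m = max(log_abfs)  # subtract the maximum for numerical stability
    weights = [math.exp(l - m) for l in log_abfs]
    total = sum(weights)
    return [w / total for w in weights]


def credible_set(pips, coverage=0.95):
    """Indices of the smallest set of variants whose cumulative PIP
    reaches `coverage`, taken in order of decreasing PIP."""
    order = sorted(range(len(pips)), key=lambda i: pips[i], reverse=True)
    chosen, cumulative = [], 0.0
    for i in order:
        chosen.append(i)
        cumulative += pips[i]
        if cumulative >= coverage:
            break
    return chosen
```

A locus with one dominant association yields a small causal set, whereas a locus with many variants in strong LD spreads the posterior mass and yields a larger one.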
Assessing Fine-mapping Efficiency
We conducted two stages of analyses as follows:
Stage 1. We extracted the variant with the highest posterior probability in each region (termed the lead fine-mapped variant, see Box 1) within the 99% CS from both PAGE and UKBB. Using the 1000 Genomes multi-ancestry and EUR reference populations, we then estimated pairwise LD between the lead fine-mapped variants in the 99% CS and their LD proxies from both PAGE and UKBB, to determine whether the UKBB and PAGE data represented the same signal (threshold R² > 0.1). We also assessed whether the lead fine-mapped variants were previously reported in the literature for obesity-related traits using the GWAS Catalog (Welter et al. 2014) and PhenoScanner (Staley et al. 2016). To further assess the efficiency of fine-mapping, we performed functional annotation of all variants in the 99% CS in each region using the Combined Annotation Dependent Depletion (CADD) tool for scoring the deleteriousness of both coding and noncoding variants (Kircher et al. 2014). Negative log CADD score values > 10 suggest a high probability of functionality, and values > 20 also have experimental evidence for functionality. For each region, we defined a best non-lead fine-mapped variant as the variant in the 99% CS, other than the lead fine-mapped variant, with the highest CADD score (see Box 1). The purpose of this exercise was to compare the SLALOM-generated functional annotations from VEP with those we generated using LD proxies and CADD scores.
Stage 2. We performed variant annotation using expression quantitative trait loci (eQTL) evidence from obesity-relevant tissues (whole blood, adipose, brain, liver, and skeletal muscle) from GTEx version 8 (gtexportal.org) (Consortium 2020). Complementary to fine-mapping, CADD scores, and VEP annotations, eQTLs were used to narrow and characterize likely causal variants or their close proxies. We extracted eQTL summary statistics for the lead fine-mapped and index variants in each region and assessed whether they were included in the 99% CS in the PAGE or UKBB fine-mapping results. In regions where no significant eQTL existed in obesity-relevant tissues, we instead searched for splice QTLs (sQTL), eQTLs in other tissues, and eQTLs from the literature.
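The Stage 1 selections can be sketched as below. The variant records (dictionaries with `id`, `pip`, and `cadd` keys) and the function names are hypothetical, and the pairwise R² between the PAGE and UKBB lead variants is assumed to be supplied from an external LD reference.

```python
def lead_variant(causal_set):
    """Variant with the highest posterior inclusion probability in the set."""
    return max(causal_set, key=lambda v: v["pip"])


def best_non_lead_variant(causal_set):
    """Variant with the highest CADD score among the non-lead variants in
    the set, or None if the set contains only the lead variant."""
    lead = lead_variant(causal_set)
    others = [v for v in causal_set if v["id"] != lead["id"]]
    return max(others, key=lambda v: v["cadd"]) if others else None


def same_signal(r2_between_leads, threshold=0.1):
    """True if the PAGE and UKBB lead variants are in LD above the threshold."""
    return r2_between_leads > threshold
```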