Heterogeneous genetic architectures and evolutionary genomics of prostate cancer in Sub-Saharan Africa

Abstract Men of African descent have the highest prostate cancer (CaP) incidence and mortality rates, yet the genetic basis of CaP in African men has been understudied. We used genomic data from 3,963 CaP cases and 3,509 controls recruited in Ghana, Nigeria, Senegal, South Africa, and Uganda to infer ancestry-specific genetic architectures and fine-map disease associations. Fifteen independent associations at 8q24.21, 6q22.1, and 11q13.3 reached genome-wide significance, including four novel associations. Intriguingly, multiple lead SNPs are private alleles, a pattern arising from recent mutations and the out-of-Africa bottleneck. These African-specific alleles contribute to haplotypes with odds ratios above 2.4. We found that the genetic architecture of CaP differs across Africa, with effect size differences contributing more to this heterogeneity than allele frequency differences. Population genetic analyses reveal that African CaP associations are largely governed by neutral evolution. Collectively, our findings emphasize the utility of conducting genetic studies that use diverse populations.


Introduction
3][4] In the United States, age-standardized incidence rates of CaP per 100,000 men are 67.3 for Asian Americans, 88.3 for Native Americans, 123.0 for European Americans, and 203.2 for African Americans (AA) [5]. CaP mortality rates also vary substantially among global populations of African descent. Age-standardized CaP mortality rates per 100,000 men are 20.2 for West Africa, 16.3 for East Africa, 22.0 for South Africa, and 27.9 for the Caribbean [1]. KhoeSan ancestry has also been associated with higher CaP risks [6]. Genetics may explain some of the differences in CaP incidence and mortality rates [7]. CaP has the highest heritability of all common cancers (51%-63%) [8] and few modifiable risk factors [9]. Germline genetic variation is associated with CaP incidence, aggressiveness, and prognosis [10]. This variation can include private alleles, i.e., population-specific genetic variants. For example, high penetrance founder mutations in BRCA2 impact CaP risks and responses to treatment in African populations [11,12]. 9][20] However, most genetic studies have focused on individuals of European descent [21,22], and existing polygenic risk scores (PRS) for CaP perform less well when applied to populations of African descent [23,24]. It is critical to study disease etiology using data derived from diverse populations, as genetic findings often generalize poorly across populations [25,26]. Clinical errors (i.e., genetic misdiagnoses) can also occur if findings from one population are applied in a different population [27]. These issues are compounded by the substantial genetic diversity found in Africa: what is true for one part of the continent may not be true for other parts of Africa [28]. Population-specific genetic architectures of CaP can arise via locus and allelic heterogeneity [29]. Effect sizes in Africa can vary by ancestry, either due to genotype-by-environment interactions or epistatic interactions.31][32] Thus, there is a clear need to study the genetics of CaP at scale in multiple African populations, while also leveraging the emerging genomics capacity within Africa.
To uncover aspects of CaP genetics that are specific to men of African descent, we conducted a pooled analysis of 3,963 CaP cases and 3,509 controls from Senegal, Nigeria, Ghana, Uganda, and South Africa (Fig. S1). This marks the largest African study of its kind, and the majority of the samples studied were genotyped on African soil. On a continental scale, we tested for genetic associations with case/control status as well as CaP aggressiveness and inferred functional genomic characteristics of these hits. To identify regional differences in the genetic architecture of CaP, we compared findings from West, East, and South Africa. We then examined whether allele frequency or effect size differences drive regional heterogeneity in the genetic architecture of CaP, before inferring the evolutionary history of African CaP-associated variants.

African datasets and population genetics
To maximize statistical power, we combined novel data from the Men of African Descent and Carcinoma of the Prostate (MADCaP) Network with existing data from Uganda and Ghana (Table 1). The MADCaP Network dataset encompasses 2,505 age-matched cases and 2,222 non-cancer controls from urban and suburban locales in Senegal, Nigeria, Ghana, and South Africa [33]. MADCaP samples were genotyped using a novel genotyping array optimized for fine-mapping and detecting cancer associations in sub-Saharan Africa [34]. An additional 835 cases and 667 controls from the Uganda Prostate Cancer Study (UGPCS) [30] were included, as were 623 cases and 620 controls from the Ghana Prostate Cancer Study [32]. While no exclusion criteria were applied based on disease severity, 35% of the cases analyzed were diagnosed with aggressive forms of CaP (Gleason score ≥ 8). After inferring the optimal imputation panel for each dataset (Fig. S2), a total of 19,858,044 variants were tested for genetic associations with CaP. Additional details about sample accrual, data harmonization, and QC can be found in the Methods and Fig. S1.
The individuals studied here comprise a wide range of ancestries (Table 1). To characterize the genetic diversity found in our pan-African dataset, we visualized samples in principal component analysis (PCA) space (Fig. 1a). Individuals clustered into broad groups according to geography. Although some distinction can be made between samples from Senegal, Ghana, and Nigeria, West African samples were revealed by an ADMIXTURE plot. At K = 5 (Fig. 1b), ancestry components were largely stratified by sampling location: Senegal (blue), Ghana and Nigeria (purple), Uganda (dark and light green), and South Africa (orange). Note that every individual analyzed here contains mixtures of different African ancestries.

Pan-African GWAS
A pooled meta-analysis that combined cases and controls from Ghana, Senegal, Nigeria, Uganda, and South Africa yielded 278 genome-wide significant associations with CaP (p-value < 5×10⁻⁸). After correcting for population structure, study site, and genotype array technology, genomic inflation was negligible (inflation factor = 1.006, Fig. S4), and there were no significant batch effects. Our pooled meta-analysis yielded 15 independent fine-mapped variants in three loci: 8q24.21, 6q22.1, and 11q13.3 (Fig. 2a and Table 2). Six of these fine-mapped CaP associations featured alleles that are private to Africa (i.e., they are effectively monomorphic in non-African populations in the 1000 Genomes Project; 1KGP) [35]. For example, rs59825493 at 8q24.21 has a frequency of 0% for the risk-increasing T allele in Europe, East Asia, and South Asia, but a frequency of approximately 26% within Africa. Allele frequencies at rs59825493 vary within the African continent and between cases and controls (Fig. 2b). Four of the fifteen fine-mapped associations (p-value < 5×10⁻⁸) are not in linkage disequilibrium with any previously known CaP associations: rs114705582, rs7833560, rs61732842, and rs16902043 (Table 2). We also performed a pooled mega-analysis of 3,963 cases and 3,509 controls that incorporated a uniform minor allele frequency filter, which yielded further support for African CaP associations at 8q24.21, 6q22.1, and 11q13.3 (Methods and Fig. S5a). The SNP heritability explained by this pan-African study was 16%.
The 8q24.21 locus contains 214 genome-wide significant variants for case/control status (Fig. 3a). These variants span a 331 kb region that contains three recombination hotspots in genomes of African ancestry (chr8:126887955-127219700) [36]. The six SNPs with the strongest p-values in our study (rs72725854, rs59825493, rs116719898, rs16902003, rs71520637, and rs114705582) are Africa-specific polymorphisms. Fine-mapping of the 8q24.21 region revealed thirteen independent CaP-associated SNPs, while conditional analysis using COJO (conditional and joint analysis) implicated rs72725854 and rs72725879. These differences are due to complex patterns of linkage disequilibrium (LD) at 8q24.21, including pairs of SNPs that have low values of r² and high values of D′ (Fig. S6). When we analyzed the lead SNPs at this locus using a haplotype network framework, we found that individuals who inherited haplotype XI (Fig. 3b) had a relatively high risk of CaP (OR = 2.44, 95% CI: 1.90-3.14). Three of the genome-wide significant variants at 8q24.21 are not in LD with known GWAS hits (rs114705582, rs7833560, and rs61732842), while a fourth (rs1948915) is in LD with a SNP that has previously been associated with multiple myeloma [37]. The nine remaining top hits at 8q24.21 are either exact matches or in LD with previously published CaP GWAS hits.[41]
The 6q22.1 locus contains 60 genome-wide significant associations, including multiple SNPs that are in high LD with prostate eQTLs (Fig. 3c). The credible interval for this region spans chr6:116772177-116920858. Fine-mapping at this locus implicated a single independent SNP that has previously been associated with CaP (rs339321). Twenty of the top 60 SNPs at 6q22.1 are in the intronic region of RFX6, a gene that is correlated with tumor progression, metastasis, and biochemical relapse of CaP when upregulated by HOXB13 [42]. Similarly, two of the top 60 SNPs at 6q22.1 (rs6901971 and rs2274911) are in the exons of GPRC6A, a gene that has been shown to accelerate CaP tumor proliferation [43].
The 11q13.3 locus contains four genome-wide significant associations for case/control status (Fig. 3d). This locus is proximal to MYEOV, a hominid-specific oncogene previously implicated in multiple cancers [44]. All the associations at 11q13.3 are between MYEOV and RP11-554A11.8, and fine-mapping of the region reveals one independent SNP (rs11228580), which is a known CaP-associated variant.
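The contrast between low r² and high D′ noted for 8q24.21 can be illustrated with a minimal sketch. It assumes phased haplotype frequencies are available; the function name and the example frequencies are hypothetical and are not taken from the study data.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (r2, d_prime) for two biallelic SNPs.

    p_ab: frequency of haplotypes carrying allele A at SNP 1 and allele B at SNP 2
    p_a, p_b: marginal frequencies of alleles A and B
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    # D' rescales |D| by its maximum possible value given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return r2, abs(d) / d_max


# Hypothetical case: a rare allele (5%) that always sits on a common background (50%)
r2, d_prime = ld_stats(p_ab=0.05, p_a=0.05, p_b=0.5)
```

In this toy case D′ is 1 (the rare allele never occurs off the common background) while r² is only about 0.05, so methods that threshold on r² can treat the two SNPs as nearly independent even though their haplotypes are fully nested.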

Functional genetics and tests of replication
Relaxing the pooled meta-analysis p-value cutoff to 10⁻⁵ yielded 604 variants associated with case/control status, of which 90 were independent after LD-pruning (r² < 0.2). This set of 90 LD-pruned hits includes some lead SNPs that reach genome-wide significance (Table S1). SNPs associated with CaP susceptibility in Africa include a mix of novel ancestry-specific associations and variants that replicate previous findings; 23 out of 90 LD-pruned associations are in linkage disequilibrium with previously reported CaP hits (r² > 0.2, Table S1). Noteworthy marginal associations include intronic variants in genes related to male infertility (rs4323394 in GALNTL5) [45], breast cancer (rs116541708 in ATP2B2) [46], and prostate cancer cell invasion (rs142311960 in ECE1) [47]. Reactome [48] pathways showing the greatest enrichment for CaP-associated variants included collagen chain trimerization, vitamin C metabolism, amine ligand-binding receptors, and reduction of cytosolic Ca²⁺ levels (Table S2). Many African CaP-associated loci colocalize with ChIP-seq peaks from DNA binding experiments for transcription factors. Five out of 90 independent CaP associations overlapped with HOXB13 binding regions, 12 overlapped with MYC binding regions, and 11 overlapped with AR binding regions (Table S3).
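The LD-pruning step (r² < 0.2) can be sketched as a greedy procedure. This is an assumption about how such pruning is commonly done, not a description of the exact software settings used in the study; the SNP names and r² values below are hypothetical.

```python
def ld_prune(pvalues, r2, threshold=0.2):
    """Greedy pruning: visit SNPs from strongest to weakest p-value and keep a SNP
    only if its r2 with every SNP already kept is below the threshold.

    pvalues: dict mapping SNP id -> association p-value
    r2: dict mapping frozenset({snp1, snp2}) -> pairwise r2 (missing pairs count as 0)
    """
    kept = []
    for snp in sorted(pvalues, key=pvalues.get):
        if all(r2.get(frozenset((snp, other)), 0.0) < threshold for other in kept):
            kept.append(snp)
    return kept


# Hypothetical toy input: snpB is tagged by the stronger snpA and is removed
toy_pvalues = {"snpA": 1e-9, "snpB": 1e-7, "snpC": 1e-6}
toy_r2 = {frozenset(("snpA", "snpB")): 0.8, frozenset(("snpB", "snpC")): 0.5}
independent = ld_prune(toy_pvalues, toy_r2)
```

Here snpC is retained because the only SNP it is correlated with (snpB) was already pruned, which is the usual behavior of greedy pruning.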
We also conducted both a Regulome-Wide Association Study (RWAS) and a Transcriptome-Wide Association Study (TWAS) to identify regulatory elements that are genetically correlated with CaP risk. RWAS analysis yielded summary statistics for 54,410 features that overlapped variants in our study. Two of these features reached genome-wide significance (Fig. S7a). The first feature is associated with the regulation of prostate adenocarcinoma: PRAD_53245 at 8q24.21 (RWAS p-value = 5.76×10⁻¹⁴). The second feature is associated with regulation of pheochromocytoma and paraganglioma cancer: PCPG_30379 at 6q22.1 (RWAS p-value = 1.81×10⁻¹¹). Our TWAS identified two genome-wide significant SNPs: rs339321 and rs2274911 (TWAS p-values = 5.17×10⁻⁹ and 6.28×10⁻⁹, Fig. S7b). These TWAS hits are located at 6q22.1 and associated with the expression of ZUFSP in the kidney cortex and FAM162B in lung squamous cell carcinoma. We note that some SNPs that are significantly associated with regulatory activities need not be significantly associated with gene expression, a phenomenon that is at least partially due to the underrepresentation of African ancestry samples in GTEx. The lack of Africa-specific eQTLs in GTEx, especially at 8q24.21, also likely contributes to why we did not observe any genome-wide significant prostate TWAS hits (Fig. S7c). Nevertheless, there is an overlap between RWAS and TWAS hits generated from different continental datasets. Specifically, European RWAS and TWAS associations with elevated test statistics in Europe were enriched for elevated test statistics in Africa (Fig. S8). Although RWAS and TWAS replication analyses do not consider the direction of effect, this trans-ancestry enrichment is indicative of shared genetic effects between Europe and Africa at the same functional variants.
Using results from our pooled meta-analysis of cases and controls, we tested how well genetic variants from a leading PRS replicated in Africa. The PRS for CaP by Wang et al. [49] contains 451 CaP-associated variants that were ascertained in a multi-ethnic cohort (12.4% of the cases and 7.8% of the controls used to generate this PRS were of African descent). Under a null hypothesis of no replication, PRS variants are expected to have p-values that are uniformly distributed. However, we observed substantial enrichment for low p-values in our pooled meta-analysis of African cases and controls (Fig. 4a, p-value < 2.37×10⁻¹¹). Genetic variants from the Wang et al. [49] PRS were 6.05 times more likely to have a p-value < 0.05 than expected by chance. Further examining the characteristics of CaP associations that replicate across continental ancestries, we found that variants with large PRS weights (i.e., effect sizes) were enriched for low p-values in our pooled meta-analysis of African cases and controls (Fig. 4b). However, LD score differences, minor allele frequencies, F_ST between Africa and Europe, and allele age did not have a large effect on whether PRS variants were associated with CaP in sub-Saharan Africa (Fig. 4c-f).
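The logic of the replication test can be sketched as follows: under the null, 5% of the 451 PRS variants should have p < 0.05, so the fold enrichment is the observed fraction divided by 0.05. The count of 136 variants used below is an illustrative value back-calculated from the reported 6.05-fold figure, not a number stated in the text, and the one-sided binomial test is an assumption that may differ from the test behind Fig. 4a.

```python
import math


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with each term computed in log space."""
    total = 0.0
    for i in range(k, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log(1 - p))
        total += math.exp(log_term)
    return min(total, 1.0)


def fold_enrichment(n_hits, n_variants, alpha=0.05):
    """Observed fraction of variants with p < alpha, relative to the null expectation."""
    return (n_hits / n_variants) / alpha


n_variants = 451   # variants in the Wang et al. PRS (from the text)
n_hits = 136       # hypothetical count with p < 0.05, for illustration only
enrichment = fold_enrichment(n_hits, n_variants)
tail_probability = binom_sf(n_hits, n_variants, 0.05)
```

With these illustrative inputs the enrichment is about 6-fold and the binomial tail probability is far below any conventional significance threshold.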

CaP aggressiveness
We performed a case-only meta-analysis of CaP aggressiveness using Grade Group [50] as a classifier of CaP aggressiveness. CaP was classified as non-aggressive if cases were in Grade Group 1 (Gleason score ≤ 6, n = 712), mildly aggressive if cases were in Grade Groups 2 or 3 (Gleason score = 7, n = 1390), and severely aggressive if cases were in Grade Groups 4 or 5 (Gleason score ≥ 8, n = 1399). Although no associations with CaP aggressiveness reached genome-wide significance, several peaks of marginal significance (p-value < 10⁻⁵) were found (Fig. 2b). These marginally significant hits included rs149639001 and rs78479840, which are adjacent to pseudogenes. A third marginally significant aggressiveness hit, rs190761537, is in the intronic region of SNTG1, a gene that is involved in cell communication. An additional GWAS contrasting severe cases (Grade Groups 4 to 5) with controls yielded no additional genome-wide significant associations apart from the ones already implicated in the main case/control analysis (Fig. S5b). Finally, we note that the marginal peaks associated with CaP aggressiveness are in different genomic regions than the peaks associated with case/control status (Fig. 2). This suggests that CaP aggressiveness has a different genetic architecture than case/control status. Subsequent analyses focused on variants that were associated with case/control status.
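The three-level aggressiveness coding described above can be written as a small helper. This is only a sketch of the classification rule as stated in the text; the function name and label strings are hypothetical.

```python
def aggressiveness_class(gleason_score):
    """Map a Gleason score to the three aggressiveness categories used in the text."""
    if gleason_score <= 6:
        return "non-aggressive"       # Grade Group 1
    if gleason_score == 7:
        return "mildly aggressive"    # Grade Groups 2 or 3
    return "severely aggressive"      # Grade Groups 4 or 5 (Gleason score >= 8)


# Case counts per category as reported in the text
case_counts = {"non-aggressive": 712, "mildly aggressive": 1390, "severely aggressive": 1399}
graded_cases = sum(case_counts.values())
severe_share_of_all_cases = case_counts["severely aggressive"] / 3963
```

The 1,399 severely aggressive cases correspond to roughly 35% of all 3,963 cases, which matches the proportion quoted in the dataset description.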

Regional differences across Africa
To compare the genetic architectures of CaP across Africa, we juxtaposed case/control GWAS results from West (1,780 cases; 1,739 controls), East (835 cases; 667 controls), and South Africa (1,348 cases; 1,103 controls). A strong GWAS peak was observed at 8q24.21 for all three regional GWAS (Fig. S5c-e). However, the continent-wide peak at 6q22.1 is largely driven by a West African signal, and the continent-wide peak at 11q13.3 is due to a combination of marginal associations in each region. The West African GWAS also yielded an additional peak at the 6q21 locus (regional p-value = 3.35×10⁻⁸), with variants in the intronic region of MTRES1, which is a mitochondrial transcription regulator (lead SNPs: rs77426886 and rs147055316). Comparisons of Manhattan plots for West (Fig. S5c), East (Fig. S5d), and South Africa (Fig. S5e) reveal differences in the genomic loci that are marginally significant in each region. This suggests that locus heterogeneity may contribute to differences in the genetic architecture of CaP across Africa. Notable marginally significant (p-value < 1×10⁻⁵) region-specific associations include rs150430268 at the RGS6 gene (an essential tumor suppressor) [51] in West Africa, rs144499050 in the gene PTPN2 (a known regulator of inflammation and cancer) [52] in East Africa, and rs7905960 at the ADAM12 gene (a predictor of chemoresistance and metastasis in ER-negative breast cancer) [53] in South Africa.

Heterogeneity in CaP genetic architecture across Africa
To further explore differences in the genetic architecture of CaP across West, East, and South Africa, we compared the relative importance of different genomic loci by calculating the contribution of the top 90 independent associations (p-value < 1×10⁻⁵) to the genetic variance of CaP. For each region, this yielded a set of 90 genetic variance proportions (gvp). Visualizing gvp statistics in a heatmap reveals both similarities and differences in the genetic architecture of CaP across Africa (Fig. 5a). While the 8q24.21 locus is a key driver of CaP risk in all three regions of Africa, its relative importance is larger in West and East Africa than in South Africa. Other marginal CaP associations exhibit population-specificity in their gvp, including X-linked variants that contribute more to CaP risk in East Africa than in other regions (Table S1). To formally test whether the genetic architecture of CaP differs across Africa, we incorporated uncertainty in allele frequencies and effect sizes into our gvp statistics. Using this framework, the genetic architectures of CaP in West, East, and South Africa were represented as three clouds of points in a multidimensional space (Fig. S9). Notably, there was no overlap between any of the clouds of points, i.e., each region of Africa has a distinct genetic architecture (ANOSIM R = 0.8865, p-value = 0.001).
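A minimal sketch of how genetic variance proportions could be computed is given below. It assumes the standard additive approximation in which a variant with effect allele frequency p and effect size β (log odds ratio) contributes 2p(1-p)β² to the genetic variance; the exact gvp formula used in the study is given in its Methods and may differ, and the input values are hypothetical.

```python
def genetic_variance_proportions(freqs, betas):
    """Share of additive genetic variance attributable to each variant.

    freqs: effect allele frequencies in one region
    betas: per-allele effect sizes (log odds ratios) in the same region
    Assumes independent variants, each contributing 2p(1-p)*beta^2.
    """
    contributions = [2 * p * (1 - p) * b * b for p, b in zip(freqs, betas)]
    total = sum(contributions)
    return [c / total for c in contributions]


# Hypothetical two-variant example: same frequency, twofold difference in effect size
gvp_example = genetic_variance_proportions([0.5, 0.5], [1.0, 2.0])
```

Because β enters quadratically, doubling an effect size quadruples a variant's contribution, which is one reason regional differences in effect sizes can reshape the gvp profile more than modest differences in allele frequency.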
We further explored regional differences in the genetic architectures of CaP by comparing allele frequency and effect size estimates across Africa. Pairwise comparisons reveal that allele frequencies of CaP-associated variants are broadly similar in West, East, and South Africa (Fig. 5b-d). By contrast, substantial differences in effect sizes are observed across the continent (Fig. 5e-g). However, effect size estimates tend to be much noisier than allele frequency estimates. Incorporating noise in these estimates, we isolated the impact of effect sizes and allele frequencies on the distances between regional genetic architectures (Fig. S10). This sensitivity analysis confirmed that differences in effect sizes, rather than allele frequencies, drive differences in the genetic architecture of CaP across Africa (t-test p-values < 2×10 - sub-studies, i.e., large I² statistics, include rs339321 and rs72725834 (Table S1).
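The I² statistic mentioned above quantifies how much of the variation in effect size estimates across sub-studies exceeds what sampling noise alone would produce. A minimal sketch using the standard Cochran's Q definition follows; the effect sizes and standard errors in the example are hypothetical.

```python
def i_squared(betas, ses):
    """I^2 heterogeneity statistic from per-study effect sizes and standard errors."""
    weights = [1.0 / (se * se) for se in ses]                 # inverse-variance weights
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))  # Cochran's Q
    df = len(betas) - 1
    if q <= df:
        return 0.0                                            # no excess heterogeneity
    return (q - df) / q


# Hypothetical example: two sub-studies with very different effect sizes
heterogeneous = i_squared([0.1, 0.9], [0.1, 0.1])
homogeneous = i_squared([0.5, 0.5, 0.5], [0.1, 0.2, 0.3])
```

Values near 1 indicate that most of the observed spread reflects real between-study differences in effect size, while values near 0 indicate that the estimates are consistent given their standard errors.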

Population and evolutionary genetics of CaP
Evolutionary history informs our understanding of the genetic architecture of CaP. Focusing on independent marginal associations (p-value < 1×10⁻⁵), we compared allele frequencies in Europe and Africa. CaP-associated variants ascertained in our pan-African GWAS include polymorphisms that have intermediate allele frequencies across the globe as well as polymorphisms where non-African populations are near monomorphic for either the protective or risk-increasing allele (Fig. 6a). Using GEVA [54], we found that allele ages of CaP-associated variants in Africa have a wide range (Fig. 6b), with the youngest having a median age of ~275 generations (rs116785870) and the oldest a median age of ~59,000 generations in African populations (rs1405425). Allele age and frequency information can be combined to explain why many CaP-associated alleles are private to Africa. Some CaP-associated variants are due to recent mutations that have not had enough time to diffuse into Europe and other continents (Fig. 6b). Other CaP-associated variants predate the out-of-Africa migration while having negligible allele frequencies in Europe (gray circles with allele ages > 4,000 generations in Fig. 6b). Population bottlenecks and founder effects likely contribute to continental differences in the distribution of these CaP-associated alleles.
We also examined whether natural selection contributes to the heterogeneous genetic architectures of CaP. On a continental scale, this involved testing for recent positive selection via integrated haplotype scores (iHS) [55]. Normalized iHS values are z-statistics, and SNPs are hypothesized to follow a standard normal distribution if they are not governed by selection. For the 1KGP populations analyzed here (Fig. 6c-e), we were unable to reject the null hypothesis of neutral evolution for the set of African-ascertained CaP variants (Shapiro-Wilk test of normality, p-values = 0.893, 0.555, and 0.893 for YRI, CEU, and CHB, respectively). Given this absence of strong selection, the out-of-Africa bottleneck may be one key reason why private African alleles are observed. Within Africa, we calculated population-branch statistics (PBS) for each CaP-associated variant. These statistics identify SNPs with particularly large allele frequency differences across African populations. However, none of the top 90 CaP-associated variants were outliers when compared to genome-wide distributions of PBS statistics (Table S1). Thus, population-level heterogeneity in the genetic architecture of CaP appears to be largely due to genetic drift, as opposed to natural selection. This is consistent with the late age of CaP onset (i.e., past typical reproductive ages) and previous studies that focused on European-ascertained CaP variants [7,23].
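The population-branch statistic can be sketched from pairwise F_ST values using its standard definition: each F_ST is converted to a branch length T = -ln(1 - F_ST), and the PBS for a focal population A relative to populations B and C is (T_AB + T_AC - T_BC) / 2. The F_ST inputs below are hypothetical, and which populations were used as the comparison pair in the study is not specified here.

```python
import math


def branch_length(fst):
    """Convert pairwise F_ST into a divergence-time-like branch length."""
    return -math.log(1.0 - fst)


def pbs(fst_ab, fst_ac, fst_bc):
    """Population branch statistic for focal population A relative to B and C."""
    return (branch_length(fst_ab) + branch_length(fst_ac) - branch_length(fst_bc)) / 2.0


# Hypothetical SNP: population A is differentiated from both B and C,
# while B and C are identical to each other at this site
pbs_focal = pbs(fst_ab=0.2, fst_ac=0.2, fst_bc=0.0)
```

A SNP would be flagged as a candidate for local adaptation only if its PBS falls in the extreme tail of the genome-wide PBS distribution, which is the comparison described in the text.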

Discussion
Our findings underscore the critical importance of conducting genetic studies of disease etiology in diverse populations. By examining the genetics of CaP across Africa, we identified novel associations that could not have been detected in a non-African GWAS. Although the top hits implicated in our study overlap genomic regions that contain known cancer loci (8q24.21, 6q22.1, and 11q13.3), we find evidence of substantial allelic heterogeneity, as many of the lead variants associated with CaP in Africa differ from those found in other continents. Furthermore, the relative importance of each independent CaP association varies across African populations. These differences in genetic architecture are due to multiple evolutionary phenomena, including recent mutations, genetic drift, and population bottlenecks. Our data also revealed novel CaP-associated haplotypes with large effect sizes that are unique to African populations. The existence of private alleles and population-specific effect sizes therefore necessitates studies that include individuals from a broad range of ancestries.
Until recently there has been limited evidence of germline variants that predispose to aggressive prostate cancer. Few SNPs from GWAS have been associated with higher grade or stage disease. However, men of African ancestry in the top decile of a multi-ancestry PRS including 278 risk variants had a significantly higher risk of aggressive CaP (OR = 1.23) [56]. We similarly did not identify significant single variant associations with aggressive disease. This observation is important because our cases had not undergone screening for prostate cancer and were mostly Grade Group 2 or higher. This suggests that prior studies of aggressive CaP did not fail to detect associations because of the nature of their sampling design. Additionally, we note that rs62113212, an intronic variant in KLK3 that has previously been associated with aggressive CaP [57], is near monomorphic in Africa, which explains why it did not come up in our GWAS of different grade groups.
Our results also have implications for the genetic architecture of CaP (and possibly other diseases) in AA. Due to the historical legacy of the trans-Atlantic slave trade, AA are genetically most similar to the West African subset of our data. However, one key difference between African and AA populations is that the latter contain admixed genomes with European and Native American ancestry [58]. AA also experience substantively different lifestyles and environmental exposures than individuals living in sub-Saharan Africa. These differences may explain why some of our fine-mapped GWAS hits were not implicated in prior studies. While this may in part be due to limited power to detect small effect sizes, other explanations are possible, including epistatic interactions with different genetic backgrounds and genotype-by-environment interactions. Rare variants are also thought to play a large role in the genetic etiology of CaP [59]. Because these variants tend to have a recent evolutionary origin and limited geographic breadth, future studies focusing on diverse African populations are likely to benefit from increased sample sizes and/or family-based approaches.
The results presented here advance our understanding of the genetic etiology of a complex disease which has an uneven burden across the globe and within the African continent. It is important to study disease associations in populations that have the highest disease burden, as they may harbor alleles that are missing from other, lower-risk, populations. Because CaP screening is essentially non-existent in Africa, the natural history of CaP (and genetic associations of CaP) can be studied in the absence of early detection. The presence of private alleles helps explain why existing polygenic scores for CaP are less portable to men of African descent. Evolutionary analyses explain why some CaP variants are not seen in European populations, and thus in part explain the higher incidence of CaP in unscreened high-risk African populations. Our study demonstrates the benefits of conducting genetic studies in diverse understudied populations. Future studies will benefit from polygenic predictions that utilize ancestry-specific information, helping to remedy existing disparities in genomic medicine.

Declarations Inclusion and Ethics
This manuscript is the result of a multinational collaboration including investigators from the US and Africa. Financial support was obtained from the National Institutes of Health (U01-CA184374). This research was conducted under local IRB approvals at each of the study centers. The authors declare no competing interests.
Fig. 1a. Clustering of individuals in PCA space was indifferent to genotype array technology: Ghanaian samples genotyped on the Illumina Omni 5 Array and MADCaP Array clustered together, as did Ugandan

Figures

Figure 1 Population

Figure 5 Regional

Table 2. Lead GWAS SNPs associated with CaP in African men
Fifteen independent fine-mapped SNPs reached genome-wide significance in our pan-African meta-analysis of CaP cases and controls. Odds ratios include 95% confidence intervals and allele frequencies indicate the frequency of effect alleles in controls from West Africa, East Africa, South Africa, and Europe.