Association of rare, recurrent nonsynonymous variants in the germline of prostate cancer patients of African ancestry

Although men of African ancestry (AA) have the highest mortality rate from prostate cancer (PCa), relatively little is known about the germline variants that are associated with PCa risk in AA men. The goal of this study is to systematically evaluate rare, recurrent nonsynonymous variants across the exome for their association with PCa in AA men.


| INTRODUCTION
Prostate cancer (PCa) is one of the most common types of cancer. 1,2 Men of African ancestry (AA) have the highest risk among all racial and ethnic groups and have a 1.7-fold and 2.1-fold higher risk than men of European ancestry (EA) for developing and dying of the disease, respectively. 2 While the etiology of PCa is still largely unknown, inherited risk is one of the strongest known risk factors for PCa, with an estimated heritability of 57%. 3 Major advances in understanding risk conferred by specific inherited variants for PCa were made in the last two decades. 4 Specifically, array-based genome-wide association studies (GWAS) in multiple racial populations have identified more than 269 independent risk-associated single nucleotide polymorphisms (SNPs) for PCa. 5 These risk-associated SNPs are typically common in the population (minor allele frequency >1%) but are associated with only modest to moderate PCa risk (<1.5-fold). In contrast, sequencing-based studies for identifying rarer mutations with larger effects on risk have had limited success. Most mutations implicated as major PCa risk alleles to date were identified from candidate gene studies and are located in genes involved in hereditary breast and ovarian cancer syndrome (HBOC) such as BRCA2, ATM, and CHEK2. 6-10 These mutations confer stronger risk but are extremely rare. The only rare but recurrent PCa-specific mutation (G84E in HOXB13) was discovered from a genetic linkage study by our collaborative group a decade ago. 11,12 This mutation is present in approximately 0.3% of EA subjects and confers approximately 5-fold risk. 9 Recently, a different mutation (X285K) of this gene was found in approximately 0.3% of AA men and was associated with a 2.4-fold PCa risk. 13,14 Identification of novel pathogenic mutations for PCa requires systematic evaluation of variants across the exome in a large number of PCa cases and controls.
However, due to the relatively high cost of whole exome sequencing (WES), the sample sizes of published PCa WES studies in EA and AA men have remained relatively small. 15-17 An alternative approach to increase statistical power, at least for control populations, is to leverage the large volume of WES data available from population databases. In this study, we combined WES data from two independent AA PCa patient cohorts sequenced by our group with WES data from African/African American (AFR) control subjects in the Genome Aggregation Database (gnomAD). We adopted several procedures to reduce heterogeneity among the different sources of WES data. Our goal is to identify novel rare, recurrent nonsynonymous variants (missense and nonsense variants with a carrier rate of 0.5%-1%) that have suggestive evidence for association with PCa risk in AA men.

| Study strategy
The primary analyses were association tests of rare, recurrent nonsynonymous variants in the exome with PCa risk, comparing WES data between AA PCa cases and population controls (Figure 1). We adopted a strategy intended to maximize statistical power. Briefly, we first identified all nonsynonymous variants that passed quality control (QC) measures and were present in both case cohorts, with a carrier rate between 0.5% and 1%. We then applied several procedures to further ensure the quality of variants for association tests by selecting those that 1) were observed and passed QC measures in gnomAD AFR subjects, and 2) had similar carrier rate estimates from WES and whole genome sequencing (WGS) in gnomAD AFR subjects (p > 0.05 after Bonferroni correction). Finally, we performed association tests for the selected variants between cases and controls using Fisher's exact test and identified candidate PCa risk-associated variants as those with a false discovery rate (FDR) adjusted p-value ≤ 0.05.
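The two selection steps described above can be illustrated with a minimal Python sketch. It is not the study's actual pipeline: the variant names, counts, cohort sizes, and p-values are hypothetical, and the sketch assumes that the 0.5%-1% carrier rate is computed on the two case cohorts pooled together, which the text does not specify.

```python
def select_rare_recurrent(counts, low=0.005, high=0.01):
    """Return variants seen in both cohorts with a pooled carrier rate in [low, high].

    counts maps a variant name to (carriers_1, n_1, carriers_2, n_2).
    Pooling the two cohorts is an assumption of this sketch.
    """
    kept = []
    for name, (c1, n1, c2, n2) in counts.items():
        pooled_rate = (c1 + c2) / (n1 + n2)
        if c1 > 0 and c2 > 0 and low <= pooled_rate <= high:
            kept.append(name)
    return kept


def consistent_wes_wgs(pvalues, alpha=0.05):
    """Keep variants whose WES-vs-WGS p-value exceeds alpha after Bonferroni correction.

    pvalues maps a variant name to the unadjusted p-value of a test comparing
    WES and WGS carrier rates; the Bonferroni-adjusted value is min(1, p * m).
    """
    m = len(pvalues)
    return [name for name, p in pvalues.items() if min(1.0, p * m) > alpha]


# Hypothetical counts: (carriers in cohort 1, size of cohort 1,
#                       carriers in cohort 2, size of cohort 2)
example_counts = {
    "var_A": (6, 1000, 5, 700),    # pooled 11/1700, inside 0.5%-1%
    "var_B": (12, 1000, 0, 700),   # absent from cohort 2
    "var_C": (2, 1000, 1, 700),    # pooled 3/1700, too rare
    "var_D": (20, 1000, 15, 700),  # pooled 35/1700, too common
}
# Hypothetical unadjusted WES-vs-WGS p-values
example_pvalues = {"var_A": 0.40, "var_E": 0.001}

print(select_rare_recurrent(example_counts))
print(consistent_wes_wgs(example_pvalues))
```

Requiring p > 0.05 after correction is a lenient filter: it removes only variants whose WES and WGS carrier rates differ strongly enough to survive a multiple-testing correction.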

| Study subjects
Two self-reported AA PCa patient cohorts were studied. The first was a hospital-based cohort from the Johns Hopkins Hospital; the second was from Detroit. Subjects from gnomAD (v2.1) were used as population controls.
WES data from 8128 AFR control subjects were used for the primary association test in this study because WES was also the sequencing method used for the PCa cases. WGS data from 4359 AFR subjects were used only for selecting high-quality variants (those with similar carrier frequencies between WGS and WES).
Several QC criteria were applied to ensure the quality of called nonsynonymous variants in PCa patients, including call rate ≥90%, mean depth ≥20, and proportion of alternative alleles ≥30%. For variants in gnomAD, a true positive probability ≥0.5 was required in addition to the above criteria.
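As an illustration of the QC thresholds listed above, the following Python sketch applies them to per-variant summary values. The class and field names are hypothetical, and the way each metric is derived from the sequencing data is not specified in the text, so the values are simply taken as given.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VariantQC:
    """Per-variant QC summary (field names are hypothetical)."""
    call_rate: float                 # fraction of samples with a genotype call
    mean_depth: float                # mean sequencing depth at the site
    alt_allele_proportion: float     # proportion of alternative alleles
    true_positive_prob: Optional[float] = None  # only used for gnomAD variants


def passes_qc(v: VariantQC, from_gnomad: bool = False) -> bool:
    """Apply the thresholds stated in the text: >=90%, >=20, >=30%, and >=0.5 for gnomAD."""
    ok = (
        v.call_rate >= 0.90
        and v.mean_depth >= 20
        and v.alt_allele_proportion >= 0.30
    )
    if from_gnomad:
        # A gnomAD variant without a true positive probability is treated as failing.
        ok = ok and v.true_positive_prob is not None and v.true_positive_prob >= 0.5
    return ok


good = VariantQC(call_rate=0.98, mean_depth=45.0, alt_allele_proportion=0.48,
                 true_positive_prob=0.9)
shallow = VariantQC(call_rate=0.98, mean_depth=12.0, alt_allele_proportion=0.48)

print(passes_qc(good), passes_qc(good, from_gnomad=True), passes_qc(shallow))
```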

| Statistical analysis
Considering the rarity of nonsynonymous variants, the association of each variant with PCa risk was tested using Fisher's exact test. The FDR-adjusted p-value (q-value), as implemented in the R stats package, was used to control for multiple tests in the study. 20 The FDR procedure has greater statistical power than familywise error rate controlling procedures (such as the Bonferroni correction), at the cost of an increased number of Type I errors. A q-value threshold of 0.05 was used to declare statistical significance; that is, the expected proportion of false discoveries among variants with q-value ≤ 0.05 is at most 5%.
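The per-variant test and the FDR adjustment can be sketched in Python using only the standard library. This is an illustration, not the code used in the study: the sketch assumes a two-sided Fisher's exact test and the Benjamini-Hochberg adjustment (the method that R's `p.adjust` applies for `method = "BH"`), neither of which the text states explicitly, and the variant names and carrier counts are hypothetical. Only the cohort sizes (1707 cases, 8128 WES controls) come from the text.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    a, b = carriers and non-carriers among cases;
    c, d = carriers and non-carriers among controls.
    Sums the probabilities of all tables (with the same margins) that are
    no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = {x: comb(row1, x) * comb(row2, col1 - x) / denom
             for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # Small relative tolerance guards against floating-point ties.
    return min(1.0, sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7)))


def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


# Cohort sizes from the text; carrier counts below are hypothetical.
N_CASES, N_CONTROLS = 1707, 8128
carriers = {"var_A": (14, 30), "var_B": (10, 50), "var_C": (9, 41)}

pvals = [fisher_exact_two_sided(a, N_CASES - a, c, N_CONTROLS - c)
         for a, c in carriers.values()]
for name, p, q in zip(carriers, pvals, bh_qvalues(pvals)):
    print(f"{name}: p = {p:.3g}, q = {q:.3g}")
```

Variants with q ≤ 0.05 would then be declared significant under the threshold described above.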
We calculated principal components (genetic eigenvectors) based on 209 ancestry informative markers using PLINK to infer the genetic background of each self-reported AA PCa patient. 21 In addition, the percentage of African ancestry was estimated for each PCa patient using Admixture (Version 1.3).

| RESULTS
Key demographic and clinical information for the AA PCa patients from the two case cohorts is presented in Table 1. The percentage of African ancestry was similar between the two case cohorts (81% and 83% in patients from Hopkins and Detroit, respectively). We further evaluated other pathogenic/likely pathogenic (P/LP) and protein-truncating mutations within these two implicated genes and compared their frequencies between our PCa cases and gnomAD controls (Table 3).
For GPRC5C, the P/LP mutation carrier rate was significantly higher in the 1707 PCa cases (5 carriers, 0.29%) than in the 8128 controls (6 carriers, 0.07%); the OR for PCa was 3.76, p = 0.03. For IGF1R, no P/LP mutation was found in PCa cases.
To confirm these observations in an independent AA population, we evaluated these two missense changes in AA subjects of UKB (132 PCa cases and 1,184 controls). GPRC5C R14Q was found once each in cases (0.76%) and controls (0.08%); the OR for PCa was 9.00 (95% CI 0.56-145.23), p = 0.19 (Table 2). IGF1R R511Q, however, was found in neither cases nor controls. Among P/LP mutations in these two genes, only one, in GPRC5C (p.Q15X), was observed; it was found in two AA men among the UKB controls (0.17%).
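The odds ratio and confidence interval for GPRC5C R14Q in UKB can be approximated from the carrier counts given above (1 of 132 cases, 1 of 1,184 controls). The text does not state how the interval was computed; the Python sketch below assumes the standard log-based (Woolf) method, under which the result comes out close to the reported values.

```python
from math import exp, log, sqrt


def odds_ratio_woolf_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-based) confidence interval for [[a, b], [c, d]].

    a, b = carriers and non-carriers among cases;
    c, d = carriers and non-carriers among controls.
    z = 1.96 gives an approximate 95% interval. All four cells must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Counts from the text: 1 carrier among 132 cases, 1 carrier among 1,184 controls.
case_carriers, n_cases = 1, 132
control_carriers, n_controls = 1, 1184

or_, lo, hi = odds_ratio_woolf_ci(
    case_carriers, n_cases - case_carriers,
    control_carriers, n_controls - control_carriers,
)
print(f"carrier rate: cases {case_carriers / n_cases:.2%}, "
      f"controls {control_carriers / n_controls:.2%}")
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

With a single carrier in each group the interval is very wide and spans 1, which is consistent with the non-significant p-value reported for this replication.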
These two mutations were also found in subjects of European were significantly more common in cases (24.4%) than controls
The FDR procedure has greater statistical power than other methods for controlling multiple tests (such as the Bonferroni correction). 20 Because an FDR q-value of ≤ 0.05 was used to declare statistical significance, our findings remain susceptible to false positives, but the expected proportion of false positives among the significant variants is at most 5%. The low false-positive rate of this FDR procedure is demonstrated by our analyses of all P/LP mutations in a large number of European subjects from the UKB. Among 39,302 P/LP mutations analyzed in 5895 PCa cases and 78,145 controls, only HOXB13 G84E was significantly associated with PCa risk at an FDR q-value < 0.05.
The two significant variants found in this study had low carrier rates in AA men. However, these carrier rates were similar to that of the recently established pathogenic mutation HOXB13 X285K in AA men. 13,14,25,26 In terminus that seems to be crucial for ligand binding and activation. 29 The mutation (R to Q) is located at the 14th amino acid of the longer isoform. Although the region of the protein containing the mutation is highly basic, its function is unknown.
Whether R14Q alters the processing of GPRC5C as a transmembrane protein or its binding to putative ligands will require further study.

CONFLICTS OF INTEREST
The authors declare no conflicts of interest.