Differences in somatic TP53 mutation type in breast tumors by race and receptor status

Somatic driver mutations in TP53 are associated with triple-negative breast cancer (TNBC) and poorer outcomes. Breast cancers in women of African ancestry (AA) are more likely to be TNBC and to have somatic TP53 mutations than cancers in non-Hispanic White (NHW) women. Missense driver mutations in TP53 have varied functional impact, including loss-of-function (LOF) or gain-of-function (GOF) activity and dominant negative (DNE) effects. We aimed to determine if there were differences in somatic TP53 mutation types by patient ancestry or TNBC status. We identified breast cancer datasets with somatic TP53 mutation data, ancestry, age, and hormone receptor status. Mutations were classified for functional impact using published data and type of mutation. We assessed differences using Fisher’s exact test. From 96 breast cancer studies, we identified 2964 women with somatic TP53 mutations: 715 (24.1%) Asian, 258 (8.7%) AA, 1931 (65.2%) NHW, and 60 (2%) Latina. The distribution of TP53 mutation type was similar by ancestry. However, 35.8% of tumors from NHW individuals had GOF mutations compared to 29% from AA individuals (p = 0.04). Mutations without DNE activity were positively associated with TNBC (OR 1.37, p = 0.03) and estrogen receptor (ER) negative status (OR 1.38; p = 0.005). Somatic TP53 mutation types did not differ by ancestry overall, but GOF mutations were more common in NHW women than AA women. ER-negative and TNBC tumors are less likely to have DNE+ TP53 mutations, which could reflect biological processes. Larger cohorts and functional studies are needed to further elucidate these findings.


Introduction
Tumor protein 53, encoded by TP53, is a transcription factor with tumor suppressive activity that regulates genes in response to cellular stress. Pathways regulated by TP53 include cell-cycle checkpoint, senescence, DNA repair, cell metabolism, and apoptosis. Somatic mutations in TP53 are the most common genetic abnormality in multiple cancers. TP53 is mutated in 40-60% of breast cancers [1][2][3]. Mutated TP53 is a negative prognostic factor and is associated with aggressive triple-negative breast cancers (TNBCs) and basal-like breast cancers [4,5].
Over 80% of TP53 mutations are missense mutations with consequences that differ depending on mutation position and amino acid change [6]. Pathogenic somatic mutations in TP53 often disrupt DNA-binding capability, impair transcriptional activity, and result in other loss-of-function (LOF) effects. However, a subset of missense somatic variants demonstrate new gain-of-function (GOF) activities. GOF activity is typically mediated by the mutant protein binding to other tumor suppressive or oncogenic proteins or to novel regulatory regions [7]. GOF mutations result in accelerated tumor onset, metastasis, drug resistance, and poorer survival outcomes [8,9]. TP53 missense mutations can also display dominant negative (DNE) activity, in which a mutant TP53 protein disrupts the activity of non-mutant protein partners including TP63 and TP73 during tetramerization [10]. DNE is more common among hotspot mutations, sites where approximately 30% of somatic TP53 mutations occur, and may contribute to accelerated loss of heterozygosity and tumor progression [11]. Because the importance of TP53 mutations has been well established for decades, there are abundant functional studies identifying LOF, GOF, and DNE activity for specific TP53 mutations.
As TNBCs are more common in breast tumors from AA women than NHW women, and TP53 mutations are more frequently observed in TNBC than other subtypes, it is not surprising that the proportion of all breast tumors with TP53 mutations is 1.5- to 1.6-fold higher in AA than NHW women [22][23][24]. While there has been extensive research about overall TP53 somatic mutation frequency by race, there has been little investigation to determine if there are differences by TP53 mutation type. Given that TP53 mutation effects can impact prognosis, mutation type is an important consideration [7,9,25]. Because of the differences in clinical outcomes between AA women and NHW women, even after accounting for subtype differences, and the literature supporting different outcomes for GOF versus LOF TP53 mutations, we hypothesized that there would be frequency differences in types of TP53 mutations across racial and ethnic groups. To test this hypothesis, we compared the racial distribution of TP53 mutation type in breast cancer using existing published and unpublished datasets.

Summary of data
This study was approved by the Ohio State Cancer Institutional Review Board. Data for this study were ascertained from multiple sources including The Ohio State University Total Cancer Care repository, existing data in publicly accessible databases, existing data in publications, and unpublished data contributed by study authors. A description of all included studies is detailed in Supplementary Table S1.

Study inclusion and exclusion criteria
We included data from women with breast tumors with somatic TP53 mutations and available ancestry information. For inclusion, all studies must have sequenced tumor DNA for TP53 using any method (Sanger sequencing or next-generation targeted, exome, or whole genome sequencing) and at least included exons 5-8 which contain the majority of TP53 mutations [26].
We excluded studies that used immunohistochemistry or other non-DNA-sequencing based methods to infer TP53 mutation genotype. All likely invasive stages, grades, and morphologies of primary breast tumor were considered. Non-invasive ductal carcinoma in-situ tumors were excluded because only ~40% of these lesions progress to invasive cancer, a rate that could vary by TP53 mutation status [27]. Data were annotated with race and ethnicity by the original authors or were from homogeneous populations. Studies were excluded if they lacked ancestry data or represented a unique population that was underpowered to detect differences. If available, patient age, tumor grade, stage, receptor status, and morphology data were collected. We considered studies of any design that fit the criteria, including population and clinical-based studies.

Data from publicly accessible databases
We identified individuals and studies in databases with somatic mutation information that met inclusion criteria. From the International Agency for Research on Cancer (IARC) [28], 1254 individuals from 78 studies met inclusion criteria, as well as 333 from The Cancer Genome Atlas (TCGA), and 637 from the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) [29,30] (Supplementary Table S1).

Data from literature review
To obtain additional data from previously published work, we conducted a literature search for studies in which individual level TP53 data and race/ethnicity information were available. A PubMed search using "TP53 and race" identified 277 articles. We excluded articles: (1) with samples already captured from database search; (2) of cancers other than breast; (3) on germline variants or polymorphisms in TP53; and (4) where study inclusion criteria were not met. For studies without individual level race and/or ethnicity information, we contacted authors by e-mail to request this information. All studies included are listed in Supplementary Table S1.

Categorizing TP53 missense variants
We used a standardized approach to evaluate findings from IARC and other TP53 literature. All missense variant annotations were based on existing functional studies with cell culture, yeast, or animal experiments; we did not consider in-silico testing alone for inclusion. However, in cases of mutations with uncharacterized function, we utilized PHANTM (Broad Institute) to exclude variants with a predicted function close to wildtype TP53 (maximum PHANTM score < 0, ~50 variants) [11]. We excluded well-established germline polymorphisms, such as p.P72R, and mutations with activity comparable to wildtype TP53. TP53 mutations were categorized by function (GOF or LOF) and dominant negative activity (DNE+ or DNE−) as separate criteria.
We described function as GOF or LOF. GOF mutations resulted in significantly different activities from both TP53 null and TP53 WT proteins such as novel transcript activity, TP73 interference, growth advantage, and facilitation of oncogene activity. LOF mutations had evidence of protein truncation, loss of tetramerization, or activity comparable to TP53 null. When we found reports of both LOF and GOF activities, but not direct contradictions for the same TP53 function, we categorized variants as GOF. Variants with limited functional data available and PHANTM prediction scores that differed from wild type were annotated as unknown. DNE+ variants were those with published evidence that the TP53-mutant protein interfered with TP53 WT function in heterozygous cells. DNE+ mutants formed heterotetramers with TP53 WT units and changed TP53 WT function, causing a dominant GOF or LOF effect. DNE+ mutations may also interfere with TP63 and TP73, and therefore, may have unique biological impact beyond GOF [10]. Transcript-truncating mutations, such as nonsense, splicing, frameshifts, and large deletions, were assumed to be LOF without DNE because these mutations will activate nonsense-mediated decay and result in loss of a functional protein. Hotspot codons in the DNA-binding domain may have different functional properties than missense mutations elsewhere [26,31]. These sites were at positions 175, 245, 248, 249, 273, and 282, and CpG hotspot mutations were defined as C to T transitions at those codons plus R158H and P152L as described [7]. CpG hotspot mutations were studied separately because they may be part of a mutational signature [26,32].
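As an illustration of the hotspot definition above, the following minimal Python sketch flags a missense variant as a hotspot when its codon is one of the six listed positions. The function names and the accepted notation (for example `p.R175H`) are assumptions made for this example and do not come from the study pipeline. CpG hotspot status is not covered here because it also requires the nucleotide change (a C to T transition), which protein-level notation does not carry.

```python
import re

# Hotspot codons in the DNA-binding domain, as listed in the text.
HOTSPOT_CODONS = {175, 245, 248, 249, 273, 282}


def codon_position(protein_change):
    """Return the codon number from notation such as 'p.R175H' or 'R175H'.

    Returns None when the string is not a simple single-residue substitution
    (hypothetical parser; real annotations may use other formats).
    """
    match = re.fullmatch(r"(?:p\.)?[A-Z](\d+)[A-Z*]", protein_change.strip())
    return int(match.group(1)) if match else None


def is_hotspot(protein_change):
    """True if the variant falls at one of the six hotspot codons."""
    return codon_position(protein_change) in HOTSPOT_CODONS
```

For example, `is_hotspot("p.R175H")` is true, while `is_hotspot("p.P72R")` is false.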
For tumors with multiple TP53 somatic mutations, we considered the sum of multiple predicted effects. If any mutations were DNE+, the tumor was considered DNE+. GOF and LOF mutations were prioritized over unknown or functional mutations. Tumors with both GOF and LOF mutations were called GOF/LOF.
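The tumor-level rules above can be sketched as a small Python function. This is an illustrative reading of the stated rules, not the authors' code: the dictionary keys, the label strings, and the handling of cases the text does not specify (the ordering of unknown versus functional, and tumors with no DNE+ mutation but incomplete DNE annotation) are assumptions.

```python
def classify_tumor(mutations):
    """Combine per-mutation annotations into one tumor-level call.

    Each mutation is a dict with hypothetical keys:
      "function": "GOF", "LOF", "unknown", or "functional"
      "dne": True (DNE+), False (DNE-), or None (not classified)
    Returns a (function_call, dne_call) tuple.
    """
    functions = {m["function"] for m in mutations}

    # GOF and LOF take priority over unknown or functional mutations;
    # tumors carrying both are called GOF/LOF.
    if "GOF" in functions and "LOF" in functions:
        function_call = "GOF/LOF"
    elif "GOF" in functions:
        function_call = "GOF"
    elif "LOF" in functions:
        function_call = "LOF"
    elif "unknown" in functions:
        function_call = "unknown"      # assumption: unknown outranks functional
    else:
        function_call = "functional"

    # Any DNE+ mutation makes the tumor DNE+.
    dne_values = [m["dne"] for m in mutations]
    if any(v is True for v in dne_values):
        dne_call = "DNE+"
    elif all(v is False for v in dne_values):
        dne_call = "DNE-"
    else:
        dne_call = "unknown"           # assumption: incomplete annotation

    return function_call, dne_call
```

A tumor with one DNE− GOF mutation and one DNE+ LOF mutation would be returned as `("GOF/LOF", "DNE+")` under these assumptions.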

Statistical analyses
A Fisher's exact test for count data was used for comparisons between mutation categories (GOF/LOF, DNE/not DNE, or CpG hotspot/not hotspot) and race, TNBC status, and ER status. For comparisons of mutation categories and age, a Welch Two Sample t test was used. Analyses were run in R version 3.6.3 (2020-02-29) [33]. A comparison-wise p value of 0.05 was considered significant.
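For readers who want to see the mechanics of the test on a 2 × 2 table, the sketch below implements a two-sided Fisher's exact test using only the Python standard library. It is an illustration, not the analysis code: the study's analyses were run in R, the function name is hypothetical, and the odds ratio returned is the simple sample odds ratio (ad/bc), whereas R's `fisher.test` reports a conditional maximum likelihood estimate that can differ slightly.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns (sample_odds_ratio, p_value). The p-value sums the hypergeometric
    probabilities of every table with the same margins that is no more
    probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Probability of observing x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance so ties with the observed table are included.
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(p_value, 1.0)


# Toy counts for illustration only; these are not study data.
toy_odds_ratio, toy_p = fisher_exact_2x2(3, 1, 1, 3)
```

In practice, rows would hold a mutation category (for example GOF versus LOF) and columns a grouping such as ancestry or receptor status, with counts taken from the assembled dataset.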

Characteristics of study population
The study population is summarized in Table 1. We included somatic TP53 mutation data from 2964 breast cancers from 96 studies for analysis (Supplementary Table S1). Patients were categorized into 4 racial/ethnic groups. The study population was 65.2% NHW (n = 1931), 24.1% Asian (n = 715), and 8.7% AA (n = 258). Two percent (n = 60) of patients had Hispanic or Latina ethnicity with European or undefined race (n = 47 [1.6%] and n = 13 [0.4%], respectively). Populations excluded from analysis due to low representation included Pacific Islander, Ashkenazi Jewish, Southwest Asian/North African, Indian Asian, and Latina AA women.

Ages at diagnosis were available for 1969 patients. Across the study population, ages ranged from 21 to 96 years, with a median age of 54 years and an average age of 55 years. By racial/ethnic group, median ages were 49 for AAs, 47 for Asians, 56 for NHW, and 52 years for Latina women.
Only a subset of tumors had receptor data available. ER status was available for 1481 tumors, with 47.5% ER+ (n = 704) and 52.5% ER− (n = 777). A smaller subset had additional tumor information. TNBC status was available for 1221 tumors with 36% classified as TNBC (n = 439) and 64% as non-TNBC (n = 782). Data were collected for morphology, grade, and stage but were not used due to low availability across the datasets and the high number of categories.

Association of race, tumor characteristics, and age with mutation type
To determine if there were associations between race and type of mutation, we conducted Fisher's exact test for racial and ethnic ancestry by mutation categories. No significant associations with race were identified overall for GOF/LOF status (p = 0.15), DNE (p = 0.62), mutation hotspots (p = 0.32), or CpG sites (p = 0.52) (Table 2). However, the association of GOF/LOF status and race was significant when comparing GOF versus LOF in NHW and AA patients only, with NHW patients more likely to have GOF mutations (35.8% versus 29.2%, respectively, p = 0.04). We additionally tested association between ER or TNBC status and mutation type (Table 3). We identified a significant association between DNE and TNBC (p = 0.03) and related ER status (p = 0.005). ER− tumors and TNBCs were less likely to have TP53 somatic mutations that were DNE+. We did not identify associations between ER status and GOF/LOF (p = 0.51), mutation hotspots (p = 0.15), or CpG hotspots (p = 0.24). Patients with hotspot mutations were slightly younger, with a mean age of 53.6 years versus 55.0 years for patients with non-hotspot mutations, at a level approaching significance (p = 0.065) (Fig. 2). We did not identify significant associations between age and GOF/LOF, with a mean age of 54.5 years for GOF and 55.0 years for LOF (p = 0.52). We also found no significant association between age and DNE; the mean age was 54.5 years for DNE+ and 55.0 years for DNE− (p = 0.49).

Discussion
The goal of our study was to determine if the type of TP53 somatic mutation (GOF or LOF, DNE− or DNE+, hotspot status, and CpG nucleotide position) varied in frequency between patients of different ancestry. Considering that the overall rate of somatic TP53 mutations in breast cancer differs by race, this is an important concern for study of TP53-mutant breast tumors and differences in outcomes and treatment response by race [14][15][16][17][18]. We identified a modest difference between AA and NHW individuals, with NHWs slightly more likely to have GOF mutations. Our finding that TP53 mutations without DNE activity were associated with TNBC (p = 0.03) and ER-negative status (p = 0.005) is novel. This association could be due to complex interactions and shared regulation of apoptotic genes between TP53 and ER. In ER+ tumors, estrogen receptor-beta (ESR2) activity has a pro-proliferative effect on TP53-wildtype tumors, but an anti-proliferative effect on TP53-mutant tumors [34]. DNE+ TP53 mutations may have unique interactions with ESR2 in ER+ tumors that drive higher DNE+ frequency. In this study, which only includes TP53-mutant tumors, we observed a higher proportion of ER− and TNBC tumors overall compared to unselected populations. This is consistent with previous studies that identified TP53 somatic mutations in ~85% of TNBC versus 40-60% of unselected breast tumors [1][2][3]5]. There has been some debate about the significance of mutant TP53 DNE versus GOF activity, as many common somatic mutations, including hotspot mutations, are both DNE+ and GOF [35]. It is, thus, of great interest that the association with receptor status was only significant for DNE, not GOF/LOF, though functional studies are needed to better understand this phenomenon. Our cohort included somatic TP53 mutation data from TCGA, METABRIC, and IARC databases, studies identified for inclusion from literature, and 351 previously unpublished cases (Supplementary Table S1).
The frequency of hotspot mutations observed in our study (20%) was slightly lower than in previous studies, which found that 28% of TP53 mutations occurred at mutation hotspots [26]. We observed that 36% of tumors from NHW individuals had GOF mutations compared to 29% in AA individuals (p = 0.04). This is the opposite of what we expected to find, as GOF variants have been associated with poorer prognosis or worse outcomes and breast cancers in AA women have worse outcomes [7]. We considered that this effect may be an artifact of more NHW patients sequenced with earlier technology, such as Sanger, which could bias the TP53 mutations detected to the exons more likely to have GOF mutations. However, there was no difference in use of Sanger vs NGS between these population groups, with 43.6% of NHW patients sequenced with Sanger, compared to 43.8% of AA patients. There also was no difference in the number of exons sequenced; 67.3% of NHW patients had at least exons 2-11 sequenced, compared to 68.6% of AA patients. Additionally, there was no difference in the percentage of unclassified variants between groups (7% in AA versus 9.9% in NHW for GOF/LOF, 14.3% in AA versus 17.3% in NHW for DNE). Thus, this difference is not likely due to mutation detection or classification. This paradox could be due to factors other than the TP53 mutations, or to factors that influence aggressiveness in addition to the TP53 mutations and vary between ancestral groups, such as differences in somatic mutation of other key driver genes or methylation pattern differences [23,36]. Further studies of larger numbers of AA and NHW women are warranted to confirm this finding.
Participants with hotspot mutations were younger than those with non-hotspot mutations, with a mean age of 53.61 years in hotspots versus 55.04 years in non-hotspots, but this was not statistically significant (p = 0.065). Age did not correlate with DNE or GOF/LOF. This finding is somewhat unexpected. Susceptibility to hotspot mutations is likely due to properties of the genetic sequence being vulnerable to mutation, rather than purely selective growth advantage of tumor cells [26]. A high proportion of hotspot mutations are at CpG sites, a feature of mutation signature 1, which correlates with age, so it would seem more likely for somatic hotspot mutations at CpG sites to be associated with later age at diagnosis [32,37]. However, to our knowledge, such a correlation has not yet been reported for breast cancer. Studies of TP53 hotspot mutations, mutational signatures, and breast cancer age of onset may reveal additional insight.
Strengths of this study include the large number of women included from multiple sources, including previously unpublished data. Previous studies characterizing TP53 mutation types have not focused on race or ancestry. We limited the dataset to only include tumors with TP53 somatic mutations and only included participants with race or ancestry data. There are a number of limitations to this study. Many of the studies used self-reported race and ethnicity information, which may not reflect genetic ancestry, and may have been categorized differently by study, such as distinguishing NHW and Ashkenazi Jewish ethnicity. There may be differences in TP53 mutation types between ethnic groups within a racial group, such as between NHW individuals from Greece versus Finland. For studies in countries that are predominantly one racial group and for which detailed racial information was not available, we assumed that the individuals were of that racial group (e.g., Norway and European ancestry; China and Asian ancestry). A few studies analyzed only exons 4 through 8, which could miss more LOF variants that occur in other exons compared to GOF or DNE-associated missense variants that predominantly map to these exons. Because of the mixed data sources, this study did not include large TP53 copy number changes or loss of heterozygosity data. There may be undetected effects of gene copy number or loss of heterozygosity, either acting alone or modifying point mutation effects, which could impact our findings. Finally, classifications of variants as GOF/LOF and DNE were made based on literature. For some variants, there was discordant information; we used the classifications from studies that tested more variants or included a larger number of assays. It is possible that some of the rarer missense variants were misclassified or have different effects in humans than in the system tested (e.g., yeast).

Summary
In this study, we found that somatic TP53 mutation types did not differ by race overall, but GOF mutations were more common in NHW women when compared to AA women. We uncovered a modest association between DNE− status and tumor receptor status. Functional studies are needed to understand this phenomenon. Additional tumor sequencing data from more racially diverse cohorts are needed to follow up on these findings.