Somatic copy number alterations modulate tumor suppressor and oncogene expression to drive early onset of breast cancer

Female breast cancer (BC) is the leading cause of cancer-related deaths among women worldwide, and higher mortality rates are observed in developing countries. Data from African American women are often used to draw inferences about BC in Africa, and little is currently known about BC traits in indigenous African (IA) women. This study prospectively enrolled 472 female patients with BC from Cameroon and Congo Brazzaville between 2007 and 2018. Patient demographics and clinical phenotypes were documented. Additional BC molecular data were downloaded from The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO). Demographic and clinical data from the different IA patient cohorts were analyzed to identify unique BC traits in African women. Molecular data from the TCGA were then analyzed to understand key BC traits identified in IA women. [...] p = 0.008, respectively).
Conclusion: Early onset of BC in IA women in this study might reflect yet unidentified genomic alterations. Dedicated studies are required to establish the genetic underpinning of BC in IA women to support tailored management strategies.


Introduction
Breast cancer (BC) has become the leading contributor to the global cancer burden. In 2020, some 2.3 million new cases were diagnosed, representing more than 11% of all cancers worldwide [1]. It is the most prevalent female cancer in 159 countries and the leading cause of cancer-related deaths in women in 110 countries [1]. Female breast cancer incidence largely exceeds the incidence of other malignancies [1], especially in developing countries. Although incidence rates are higher in industrialized countries, mortality rates are 17% higher in developing countries [1]. BC has therefore become a major public health challenge in these countries, especially in sub-Saharan Africa. In most sub-Saharan African countries, poverty, poor health infrastructure, lack of adequate training and lack of awareness constitute a lethal cocktail for BC patients. Unfortunately, very few studies have attempted to understand female breast cancer traits in these populations. The fight against BC in these low-income countries will inevitably require in-depth knowledge of the clinicopathological and molecular traits of breast cancer in these populations.
Studies in African American (AA) women have revealed higher (42%) BC-related death rates than in white Americans, despite advances in diagnosis and treatment [2]. Additionally, about 33% of AA women were reported to be diagnosed with BC before the age of 50 years, against 22% of white women [3]. Furthermore, AA women are twofold more likely to be diagnosed with BC before the age of 35, and with relatively larger tumors, when compared with white women [4,5]. Recent BC updates, however, have revealed similar mean ages at BC diagnosis in AA and white American women, highlighting significant improvement in awareness and management. Most AA women diagnosed with BC were reported to be current or former tobacco users, while white American (WA) women with BC reported alcohol use. Time to medical consultation and time to treatment initiation were significantly longer in AA women than in WA women. AA women were predominantly diagnosed with advanced-stage and triple-negative BC [2]. Whether similar clinical phenotypes and risk factors are operational among IA women is not known.
Studies in indigenous African (IA) women have revealed increasing incidence with age up to the age of 45 years, after which a decline is observed [6]. The age-adjusted 5-year overall survival rates are reported to be lower in sub-Saharan Africa than in North Africa [7]. Although differences in development index might explain the poor outcome, the bottom line remains the early age at diagnosis of breast cancer (EOBRCA) in black African women. It is unlikely that socioeconomic factors influence age at BC onset to such an extent, and family history of breast cancer has been shown not to be associated with early onset of breast cancer [8]. The role of intrinsic molecular traits has so far not been explored, nor has possible exposure to risk factors.
Nonetheless, in low-income settings, where prevention is the most achievable combat strategy, understanding the molecular basis of early onset of breast cancer is an unmet need, as it would allow the development of population screening tools and the monitoring of high-risk populations.
Studies addressing the etiology of EOBRCA have suggested the involvement of a toxic environment, disruption of the internal hormonal milieu or genetic susceptibility [9]. The underlying genetic susceptibility loci have, however, remained unelucidated. Similarly, studies on occupational, dietary and environmental risk factors have yielded very little insight. Furthermore, although mutations in high-penetrance genes such as BRCA1 and BRCA2 are associated with a very high risk of contralateral breast cancer, these mutations are only seen in a small fraction of patients with EOBRCA [10,11]. Using BC stem cell populations, a role for the osteoclast differentiation factor RANKL has been alluded to [12]. Dedicated studies addressing the molecular basis of EOBRCA are still missing in both industrialized and developing countries. Most importantly, there is little, if any, information on EOBRCA in indigenous black African populations. Molecular data from these populations are equally lacking, though needed to design tailored screening and monitoring strategies. There is therefore a crucial need to identify the molecular and lifestyle risk factors driving early onset of breast cancer, especially in black African women.
In light of the aforementioned, we set out to characterize the clinical features of breast cancer in black IA women from Congo and Cameroon and to identify potential molecular traits supporting early onset of breast cancer using publicly available data sets from the TCGA and the Gene Expression Omnibus (GEO). We show that, compared with women of other ethnicities, breast cancer development occurs significantly earlier in black African women, and that T3 and T4 tumors are the most predominant in black IA women. Furthermore, most women were diagnosed with high-grade tumors, and molecular analysis reveals somatic copy number amplification of oncogenes and deletion of tumor suppressor genes in patients with breast cancer onset before or at age 45 years.

Patient cohorts
Females with histologically confirmed BC and aged >18 years were recruited at the Yaoundé General Hospital (Cameroon), the Douala General Hospital (Cameroon) and the University Teaching Hospital (CHU, Brazzaville) in Congo. Administrative authorizations were obtained from all local sampling sites and all participants gave written informed consent to the study. Participants were prospectively recruited between 2007 and 2018 during routine medical visits and were monitored during the entire study period or until they were disease free, died or opted out of the study. Lifestyle data were collected using a structured internal questionnaire, while clinical phenotypes were retrieved from the individual patient files at each sampling site. Patient data for the TCGA cohort (>1000 cases) were downloaded from the Genomic Data Commons (GDC) repository using the TCGAbiolinks Bioconductor package. Additional patient cohort data (>300 cases), which included age, subtype and TP53 gene mutation status, were downloaded from the Gene Expression Omnibus (GSE3494). All patients with missing age or gender information, or who participated for less than six months, were excluded from further analysis. Therefore, 115 patients were retained from the Congo cohort and 65 patients from the Yaoundé cohort, and the rest were recruited at the Douala General Hospital.

Database mining and in silico analyses
Gene expression data were downloaded from the Gene Expression Omnibus (GEO) for breast cancer patients (GSE3494) and from The Cancer Genome Atlas (TCGA). Copy number variation data were likewise downloaded from the TCGA for the breast cancer project (TCGA-BRCA). All data were downloaded from the GDC legacy database (genome assembly hg19). For all TCGA data, only data from primary tumors were used, by setting "Sample.type" to "Primary tumor". For copy number variation data, mean copy number segment data from the Affymetrix SNP 6.0 platform were used, while Illumina HiSeq gene expression quantification data were downloaded for gene expression analysis using the TCGAbiolinks package. Somatic mutation files for TCGA-BRCA were also downloaded as MAF files and processed using the maftools Bioconductor package. All non-mutated samples were excluded from the presented oncoplots (but included in the summary statistics). Both gene expression and copy number variation data were analyzed following the TCGAbiolinks user guide. Briefly, copy number segment data were downloaded and filtered on a cut-off threshold of 0.3 for gain or -0.3 for loss. The filtered CNV segment file was then used to create a CNV object. A probe meta file serving as marker matrix was obtained from the Broad Institute (ftp://ftp.broadinstitute.org/pub/GISTIC2.0/hg19_support/) and filtered for common CNVs. The filtered marker matrix was then used to create a marker object, and recurrent somatic copy number aberrations were identified using the gaia Bioconductor package.
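The segment filtering step can be illustrated with a minimal Python sketch. It is not the authors' pipeline, which was run in R with TCGAbiolinks and gaia; the `Segment` class, the function names and the toy values are hypothetical, and treating the thresholds as strict inequalities is an assumption, since the text does not say how values of exactly ±0.3 were handled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Thresholds on the segment mean, taken from the text above.
GAIN_THRESHOLD = 0.3
LOSS_THRESHOLD = -0.3


@dataclass(frozen=True)
class Segment:
    """One copy number segment (hypothetical minimal representation)."""
    sample: str
    chrom: str
    start: int
    end: int
    seg_mean: float


def classify(seg: Segment) -> Optional[str]:
    """Return 'gain', 'loss', or None for a segment within the neutral band.

    Strict inequalities are an assumption; the source only gives the cut-offs.
    """
    if seg.seg_mean > GAIN_THRESHOLD:
        return "gain"
    if seg.seg_mean < LOSS_THRESHOLD:
        return "loss"
    return None


def filter_segments(segments: List[Segment]) -> List[Tuple[Segment, str]]:
    """Keep only segments called as gain or loss, paired with their call."""
    kept = []
    for seg in segments:
        call = classify(seg)
        if call is not None:
            kept.append((seg, call))
    return kept


# Toy input with made-up values, for illustration only.
toy_segments = [
    Segment("sample_1", "1", 1_000, 50_000, 0.62),
    Segment("sample_1", "1", 60_000, 90_000, -0.45),
    Segment("sample_1", "2", 1_000, 20_000, 0.05),
]

calls = filter_segments(toy_segments)
for seg, call in calls:
    print(seg.sample, seg.chrom, seg.start, seg.end, call)
```

Only the first two toy segments pass the filter; the third lies inside the neutral band and is dropped before recurrence analysis.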
CNVs were annotated using the biomaRt and GenomicRanges Bioconductor packages, while Circos plots were made with the circlize package. TCGA gene expression data were normalized and filtered using the EDASeq package, while edgeR was used for differential gene expression analyses. Genes with a false discovery rate (FDR) < 0.05 and an absolute log2 fold change of at least 1 were considered differentially expressed. The affy package was used for processing CEL files from the GSE3494 cohort. Genes whose expression was affected by CNV changes were obtained by filtering the list of recurrent somatic CNVs from the EOBRCA group to remove all CNVs that were also present in the LOBRCA group. The resulting list was then intersected with the list of differentially expressed genes. Gene set enrichment analysis was performed using the desktop GSEA application with 1000 permutations. Only the hallmark gene sets were analyzed, and gene sets with an FDR < 0.05 and a normalized enrichment score > 1.5 were considered significantly enriched.
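The gene selection logic described above (differential expression cut-offs, CNVs exclusive to EOBRCA, then intersection) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' R code: the function names and gene labels are hypothetical, the log2 fold change is assumed to be oriented as EOBRCA versus LOBRCA, and the requirement that the expression change match the CNA direction is taken from the "consistency in CNA pattern and gene expression" wording used in the statistical analyses.

```python
from typing import Dict, Set, Tuple

FDR_CUTOFF = 0.05      # from the text
LOG2FC_CUTOFF = 1.0    # absolute log2 fold change, from the text


def differentially_expressed(
    results: Dict[str, Tuple[float, float]]
) -> Dict[str, float]:
    """Keep genes with FDR < 0.05 and |log2FC| >= 1.

    `results` maps gene -> (log2 fold change, FDR); the orientation
    EOBRCA vs LOBRCA is an assumption.
    """
    return {
        gene: lfc
        for gene, (lfc, fdr) in results.items()
        if fdr < FDR_CUTOFF and abs(lfc) >= LOG2FC_CUTOFF
    }


def cna_consistent_genes(
    eo_cna: Dict[str, str],
    lo_cna: Set[str],
    deg: Dict[str, float],
) -> Dict[str, Tuple[str, float]]:
    """Intersect EOBRCA-exclusive CNA genes with the DEG list.

    `eo_cna` maps gene -> 'gain' or 'loss' in EOBRCA; `lo_cna` holds genes
    with a recurrent CNA in LOBRCA. A gene is kept only if a gain is
    upregulated or a loss is downregulated.
    """
    selected = {}
    for gene, kind in eo_cna.items():
        if gene in lo_cna or gene not in deg:
            continue
        lfc = deg[gene]
        if (kind == "gain" and lfc > 0) or (kind == "loss" and lfc < 0):
            selected[gene] = (kind, lfc)
    return selected


# Toy inputs with placeholder gene labels and made-up values.
toy_results = {
    "GENE_A": (2.1, 0.001),   # passes both cut-offs
    "GENE_B": (-1.6, 0.010),  # passes both cut-offs
    "GENE_C": (0.4, 0.001),   # fold change too small
    "GENE_D": (1.8, 0.200),   # FDR too high
    "GENE_E": (1.5, 0.010),   # passes, but direction conflicts with its CNA
}
toy_eo_cna = {"GENE_A": "gain", "GENE_B": "loss", "GENE_C": "gain",
              "GENE_E": "loss", "GENE_F": "gain"}
toy_lo_cna = {"GENE_F"}

deg = differentially_expressed(toy_results)
candidates = cna_consistent_genes(toy_eo_cna, toy_lo_cna, deg)
print(sorted(candidates))
```

In the toy data, only GENE_A (gained and upregulated) and GENE_B (lost and downregulated) survive all three filters.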

Statistical analyses
The TCGA cohort served as a reference to stratify patients into early and late onset of BC. Patients were considered to have EOBRCA if they were first diagnosed before the age of 45 years (mean - 1 SD of the TCGA cohort), or LOBRCA for those diagnosed after the age of 50 years. The mean age at diagnosis of BC was determined by computing the column statistics in GraphPad Prism. The same cut-off was also used for all other cohorts and each cohort was dichotomized into two groups: EOBRCA and LOBRCA. The survminer package was used to determine the gene expression cut-off for survival analyses, while the survival analyses themselves were performed using the survival package. Genes with consistency in CNA pattern and gene expression profiles were used in a multivariate Cox proportional hazards model to find associations with patient survival in the LOBRCA group. The EOBRCA group was not used for the survival analysis, because the relatively young age of these patients would confound the entire analysis. Relationships between molecular and lifestyle factors and breast cancer phenotypes were assessed by logistic regression. The Kaplan-Meier method was used to compare survival differences and significance was tested using the log-rank test. Several groups were compared using the Kruskal-Wallis test and proportions were compared using Fisher's exact test, with the significance threshold set to a p value < 0.05. All analyses were performed with the R environment, SPSS or GraphPad Prism software 8.0.0 for Windows (GraphPad Software, San Diego, California USA).
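As a worked illustration of the stratification rule, the short Python sketch below derives the cut-off from the TCGA mean and SD reported in the Results (58.5 ± 13.2 years) and recomputes the group proportions reported there. The function names are hypothetical. Two points are assumptions: that a patient diagnosed at exactly the cut-off age falls into the late-onset group (the text says "before the age of 45 years" in one place and "before or at age 45 years" in another), and that every patient not classified as early onset is treated as LOBRCA, as in the dichotomy used in the Results.

```python
# Values reported in the Results section of this manuscript.
TCGA_MEAN_AGE = 58.5
TCGA_SD_AGE = 13.2

# Mean minus one standard deviation, then rounded down to whole years.
raw_cutoff = round(TCGA_MEAN_AGE - TCGA_SD_AGE, 1)  # 45.3
CUTOFF_YEARS = int(raw_cutoff)                      # 45


def onset_group(age_at_diagnosis: float, cutoff: int = CUTOFF_YEARS) -> str:
    """Assign a patient to EOBRCA or LOBRCA.

    Assumption: strictly younger than the cut-off means early onset.
    """
    return "EOBRCA" if age_at_diagnosis < cutoff else "LOBRCA"


def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Counts reported in the Results section.
ia_early = percent(244, 472)     # IA cohort, early onset
ia_late = percent(228, 472)      # IA cohort, late onset
tcga_early = percent(159, 1018)  # TCGA cohort, early onset

print(raw_cutoff, CUTOFF_YEARS)
print(ia_early, ia_late, tcga_early)
print(onset_group(38), onset_group(61))
```

The recomputed percentages agree with the reported 51.7%, 48.3% and 15.6%.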

Results
Demographic and clinical data were analyzed from 472 females with BC from different care centers in Cameroon and Congo (indigenous Africans). Gene expression and copy number variation data were obtained for western populations from the TCGA and the Gene Expression Omnibus (GEO). The mean age at BC diagnosis within the TCGA cohort was 58.5 ± 13.2 years, while the mean age at BC diagnosis for the IA cohort was 46.5 ± 12.9 years. Early onset of breast cancer was defined for those diagnosed before the lower limit of the TCGA cohort (58.5 - 13.2 years). Based on this classification, 244/472 (51.7%) of African women with BC were diagnosed with early onset of BC, while 228/472 (48.3%) were diagnosed after the age of 45 years. Within the TCGA cases, early onset of BC was found in 159/1018 (15.6%) of all BC cases. Among IA women, as shown in table 1, there were only very slight differences in tumor size and BC subtype between patients diagnosed before 45 years and those diagnosed later. Similarly, there was no difference in the SBR grade between the two groups. However, women who developed BC after the age of 45 years were more likely to have had menarche after the age of 12 years (21% vs 10%, respectively) (OR: 0.41, 95% CI: 0.10-1.42). Furthermore, most of these women had their first pregnancies before the age of 20 years (47.37% vs 58.42%), most of them did not use contraceptive pills (30.25% vs 41.41%), and most breastfed their children for more than 12 months (64.52% vs 43%).

Early onset of large and high-grade tumors in African patients
Within the TCGA cohort, there was a slight but non-significant difference in the mean age at diagnosis for Asian and Black American women compared with white American women. However, the mean age at diagnosis within the IA cohort was significantly lower than that of any other ethnic origin within the TCGA cohort, even when compared with African American women (figure 1a). We observed an inverse pattern in the tumor T stage within the two cohorts. Whereas more than 80% of tumors from the TCGA cohort were T1 and T2 tumors, 80% of tumors in African women were T3 and T4 (UICC 7th edition) (figure 1b). There was no tumor grade information for the TCGA cohort. In the African cohort, however, about 60% of all tumors were grade III (figure 1c), while luminal A and triple-negative breast cancer were equally distributed (about 30% each) within the African cohort. HER2+ and luminal B tumors accounted for about 15% each (figure 1d). To identify lifestyle and molecular features associated with early onset of BC, we performed a multivariate logistic regression on data from the African cohort. Early onset of menarche (≤ 12 years) as well as the use of contraceptive pills were associated (but not statistically significantly) with higher odds of developing BC before the age of 45 years (EOBRCA). There was a significant association between EOBRCA and late age at first full pregnancy (first pregnancy at age ≥ 20 years) (OR: 0.12, 95% CI: 0.04-0.34, p value < 0.001). Family history of breast cancer was significantly associated with the development of BC after the age of 45 years (LOBRCA) (OR: 4.08, 95% CI: 1.34-13.51, p = 0.016). Alcohol use and obesity were not significantly correlated with development of LOBRCA (figure 2a). Using molecular data from a publicly available breast cancer study (GSE3494), we analyzed the relationship between different molecular features and early onset of BC.
Of all features analyzed, only TP53 mutational status was significantly associated with age-dependent breast cancer development. Wild-type TP53 status was significantly associated with late onset of breast cancer (figure 2b). Both low-grade tumors and estrogen receptor (ER) positive tumors were associated with late onset of breast cancer, but this association did not reach statistical significance. Similarly, progesterone receptor positivity correlated with early onset of breast cancer, although not significantly.

Higher rates of TP53 mutations are present in EOBRCA
In order to further dissect the genetic basis of early breast cancer onset, we used molecular data from the TCGA. The samples were dichotomized into EOBRCA and LOBRCA based on the criteria described above. Analyses of somatic mutations revealed that TP53 and PIK3CA were the most predominantly mutated genes in both EOBRCA and LOBRCA. However, a higher rate of TP53 mutations was observed in patients with EOBRCA than in LOBRCA patients (29% vs 26%, respectively; figure 3a). PIK3CA mutations were observed at higher rates in LOBRCA than in EOBRCA (30% vs 25%, respectively; figure 3b). Given that these mutations accounted for less than 50% of all EOBRCA cases, we further analyzed copy number alterations in both patient cohorts. As shown in the Circos plots in figures 3c and 3d, EOBRCA and LOBRCA showed broadly similar CNA patterns, with more CNA events in LOBRCA than in EOBRCA. However, remarkable differences were observed in certain chromosomes, especially chromosomes 3, 5 and 14 in EOBRCA. In this group, there was a remarkable genomic amplification, which was almost absent in LOBRCA. Similarly, on chromosomes 5 and 14 in EOBRCA, there was an obvious deletion, which was not observable in LOBRCA.

EOBRCA is associated with CNA in oncogenes and tumor suppressors
We investigated whether genomic CNA affected gene expression and whether the affected genes were associated with cancer development. We specifically focused on genes with CNA exclusively in EOBRCA. To this end, we filtered out all genes with CNA in both EOBRCA and LOBRCA and generated a list of EOBRCA CNAs (supplementary table 1). We performed differential gene expression analysis for all patients in both groups and selected all genes with an FDR < 0.05 and an absolute log2 fold change greater than 1 to constitute a list of differentially expressed genes (figure 4a, and supplementary [...] genes, 17 genes with copy number amplifications were upregulated, while 17 genes with copy number deletions were downregulated (figure 4b). Of note, several cancer-associated genes such as CDH6, FOXM1, NAAA and CXCL10 [13][14][15][16][17] were amplified and upregulated. Candidate tumor suppressor genes such as TGM3, MYO18B, SH3GL2, DMBT1, SEZ6L and SLIT1 were deleted [18][19][20][21][22][23][24][25]. Further investigation of CNA events in EOBRCA and LOBRCA indicated that between 30-40 Mb on chromosome 5, in the region where CDH6 is located, there was a strong copy number amplification in EOBRCA (figure 4c) compared with LOBRCA (figure 4d). There was also a very pronounced deletion between 60-100 Mb on chromosome 5 in EOBRCA, which was barely seen in LOBRCA (figures 4c and 4d). These observations indicate that the observed upregulation of CDH6 in EOBRCA was associated with gene amplification rather than epigenetic regulation. In order to understand the hallmarks of EOBRCA, gene set enrichment analysis was performed on gene expression data of both groups. Hallmarks of MYC targets as well as early estrogen response were highly enriched in EOBRCA (figure 4e). Other well-known hallmarks of cancer such as TGFβ and mTORC1 signaling were also highly enriched in EOBRCA. In LOBRCA, KRAS, WNT/β-catenin and hallmarks of inflammatory response were enriched (figure 4f).
Comparing the enriched hallmark gene sets in both groups, we observed that more cancer hallmark gene sets were enriched in EOBRCA, whereas only 4 gene sets (with FDR < 0.05) were enriched in LOBRCA. Most importantly, gene sets associated with highly aggressive tumors, such as the hallmarks of MYC targets, TGFβ and mTORC1, were strongly enriched in EOBRCA [26,27].

Altered genes are associated with stemness and tumor suppression
Gene expression analyses revealed that some amplified genes were upregulated in EOBRCA, while some deleted genes were downregulated in EOBRCA. We then investigated whether the amplified genes had any tumorigenic properties and whether the deleted genes had tumor suppressor activities, at least in solid tumors. As seen in figure 5, the cancer hallmark gene FOXM1 was upregulated in almost all EOBRCA cases and not in LOBRCA. FOXM1 is a master transcription factor regulating tumor cell proliferation, self-renewal and tumorigenesis in several human cancers [28]. Similarly, CDH6, another EMT-promoting gene, was upregulated in most of the EOBRCA cases. PPP1R9B, a gene that has been associated with tumor progression and stemness in human tumors [29], was also amplified and upregulated. Several of the deleted and downregulated genes have been reported to have tumor suppressor properties in some solid tumors. In lung cancer, the deleted gene MYO18B was reported as a tumor suppressor [24]. Another deleted gene, TGM3, is also known to play tumor suppressor roles in colorectal cancer by repressing EMT and PI3K/AKT signaling [30]. In urothelial carcinoma, deletion of the gene SH3GL2 is known to promote malignant behavior [22], while DMBT1 has been proposed as a tumor suppressor in brain cancers [23].
Low expression of CXCL10 was, in contrast, significantly associated with poor survival (HR: 2.88, 95% CI: 1.38-6.1, p value = 0.005), whereas low expression of the deleted tumor suppressor gene DMBT1 was not significantly associated with poor survival (HR: 1.29, 95% CI: 0.44-3.74, p = 0.64). Low expression of GPR26, another gene deleted in EOBRCA, was associated with poor survival. The expression of other genes deleted and downregulated in EOBRCA was not significantly associated with survival. Kaplan-Meier survival analysis revealed that patients with low expression of CDH6 lived significantly longer than patients with higher expression (log-rank p = 0.01; figure 6b). Similarly, patients with low expression of PPP1R9B lived significantly longer than those with higher expression (log-rank p = 0.00035; figure 6c), while low expression of SLC1A3 was also associated with better overall survival (log-rank p = 0.015; figure 6d).
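The reported hazard ratios, confidence intervals and p values can be checked for internal consistency with a short calculation. The Python sketch below assumes the intervals are Wald-type 95% intervals on the log hazard scale, which the text does not state; the function name is hypothetical. Under that assumption, the standard error is recovered from the interval width and the two-sided p value follows from the normal distribution.

```python
import math

Z_95 = 1.959964  # two-sided 95% critical value of the standard normal


def wald_p_from_ci(hr: float, lower: float, upper: float) -> float:
    """Approximate two-sided Wald p value from a hazard ratio and its 95% CI.

    Assumes the interval is symmetric on the log scale:
    log(HR) +/- Z_95 * SE.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    z = math.log(hr) / se
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


# Values reported in the text above.
p_cxcl10 = wald_p_from_ci(2.88, 1.38, 6.1)
p_dmbt1 = wald_p_from_ci(1.29, 0.44, 3.74)

print(f"CXCL10: p = {p_cxcl10:.3f}")
print(f"DMBT1:  p = {p_dmbt1:.2f}")
```

Both recomputed values round to the reported p values (0.005 and 0.64), which suggests that the printed estimates, intervals and p values are mutually consistent under the Wald assumption.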

Discussion
We investigated unique breast cancer traits in IA women by comparing their demographic and clinical data with data from The Cancer Genome Atlas. The mean age at diagnosis of BC in African women was more than 10 years earlier (46 years vs 58 years) than that of the TCGA cases. Only about 15% of BC cases were diagnosed before the age of 45 years within the TCGA cases, while more than 50% of African women with BC in our study were diagnosed before the age of 45 years. While poor health infrastructure and low socioeconomic standards in Africa might explain poor outcome, none of these factors is likely to explain the relatively higher rates of early disease onset. Analysis of BC risk factors revealed that even family history of BC was not related to early onset of BC, as reported previously [8]. About 60% of BC in African women were T4 stage tumors compared with less than 5% within the TCGA. The absence of comprehensive health systems and poverty might deter African women from seeking medical help at early stages of the disease, leading to larger tumors at diagnosis. Such large tumors might also reflect the highly aggressive nature of the disease, as indicated by the high number of grade III tumors in African women, thereby implicating molecular determinants. The latter is highly plausible, as we observed higher rates of TP53 mutation in BC patients with early disease onset. Indeed, higher rates of TP53 mutations have been reported in highly aggressive breast tumors [31,32], and several studies have revealed highly aggressive tumors in African women [33]. Analysis of hallmarks of cancer in patients with early disease onset within the TCGA cases revealed the enrichment of gene sets associated with tumor aggressiveness, such as the hallmarks of MYC targets. Higher MYC expression has been reported in aggressive breast tumors and may explain the molecular features of aggressive breast cancer subtypes [34,35].
Interestingly, while the aggressive TNBC subtype represents a minor subtype in other populations, accounting for 10-15% of all BC tumors [36], we observed about 2-fold higher rates of TNBC among African women. These observations further support the implication of underlying genetic traits governing the clinical phenotypes of the disease. Among African women with breast cancer diagnosed before 45 years of age, lifestyle risk factors such as shorter breastfeeding duration, higher rates of contraceptive use, late age at childbirth and early onset of menarche, which are all BC risk factors, might in part explain cancer onset [37]. These factors, however, cannot account for the low rates of T2 tumors observed within this group. Oncogene amplification and inactivation of tumor suppressors are hallmarks of cancer development [38,39]. The transcription factor FOXM1 is an established master regulator of tumorigenesis across several human cancers [40]. While the role of this gene in different aspects of tumor development and progression is beyond doubt, it was until now not clear whether this gene is ubiquitously expressed in all tumors of the same cancer entity. Our results paint a clear picture of the age-associated expression of FOXM1 in more than 350 primary breast tumors, with almost exclusive expression in younger patients. These findings are of key relevance to the field, as FOXM1 has long been a molecular target of interest for the development of antineoplastic drugs [41]. It is therefore compelling to analyze age-associated discrepancies in cancer development and to adapt clinical trials to take such details into account where relevant. Similarly, CDH6 is a well-known oncogene that has been linked with poor outcome in other cancer entities [14]. CDH6 is also known to promote EMT and metastasis in cancer [13], and is responsible for cellular adhesion and invasion in renal and ovarian cancers [42].
It is therefore very likely that CDH6 amplification contributes to early onset of BC. Additionally, DMBT1, a tumor suppressor involved in immune defense and epithelial differentiation, was deleted and downregulated in EOBRCA. This gene was previously shown to be downregulated in BC, although the underlying mechanism remained unclear [43]. We now show that downregulation of this gene occurs predominantly in EOBRCA and is mediated by copy number deletion. TGM3, a gene that functions as a tumor suppressor and is repressed in several cancer entities [18,30], was also deleted and downregulated. This gene has been proposed to act as a tumor suppressor by repressing EMT and the PI3K/AKT pathway in colorectal cancer [30]. Other genes such as SLIT1, SEZ6L and MYO18B, all of which have been reported to be downregulated in other solid cancers, were deleted in EOBRCA and consequently downregulated at the gene expression level [21,25,44]. The deletion of these genes in patients with early onset of BC might render patients more susceptible to cancer development upon exposure to other BC risk factors. While much still needs to be done to understand the molecular underpinnings of early breast cancer onset in African women, the present study provides a firm background for potential areas that might be exploited and reveals genomic alterations that might be explored for the development of biomarkers for early detection of BC in African women. Furthermore, our data reveal the possible implication of age-associated molecular differences in treatment outcome and open new horizons that might help in fine-tuning the design of clinical trials.

Conclusion
Early onset of breast cancer is predominant in black indigenous African women and is associated with late age at childbearing, early onset of menarche and reduced duration of breastfeeding. Molecular analyses of early onset of breast cancer reveal higher rates of TP53 gene mutation, copy number amplification of key oncogenes such as FOXM1 and CDH6, and deletion of candidate tumor suppressors such as MYO18B, DMBT1 and TGM3 in EOBRCA. Dedicated molecular analysis of tumor material from IA women will further improve our understanding of EOBRCA in these populations and help fine-tune combat strategies. The observed age-specific molecular alterations of key tumor suppressors and oncogenes argue in favor of age-related considerations in subsequent studies and clinical trials, as these might significantly affect both patient outcome and study endpoints.
All participants provided informed consent for the collection and use of clinical data and, where necessary, molecular data for research purposes. When patients were not able to provide consent directly, their legal representatives (caretakers) provided consent. All informed consent forms included consent for publication.

Consent for publication
Not applicable

Availability of data and materials
Molecular data used in this study are publicly available from the Gene Expression Omnibus (GEO) under the accession number GSE3494 or from the TCGA under the TCGA-BRCA project. The data supporting this manuscript are provided in supplementary tables 1-4.

Competing interests
The authors declare that they have no competing interests.

Funding
Not applicable

Authors' contributions
PDTW: data collection, curation and analysis, and manuscript drafting. EHMDB: patient recruitment and data curation. AATZ: data collection and curation. EN: patient recruitment and data curation. SNA: patient recruitment and data collection. EA: patient recruitment and data collection. GS: provided resources for the study. AJN: data curation and study organization. SSL: study design, data curation and analysis, bioinformatics analysis and manuscript writing.

Table 1 Patient baseline characteristics (tumor size categories: T1, T2, T3, T4, Tx)

The expression of oncogenes and tumor suppressors with somatic copy number alterations in EOBRCA has prognostic value in BRCA.
A) A forest plot showing multivariate Cox proportional hazards ratios for the association of genes with somatic copy number alteration and breast cancer patient survival. Data are shown for all patients with LOBRCA from the TCGA. B) Kaplan-Meier overall survival curve for patients with LOBRCA from the TCGA cohort. Patients were stratified for the expression of the EOBRCA amplified gene CDH6. The stratification cut-off was statistically determined using the survminer package. C) Kaplan-Meier overall survival curve for patients with LOBRCA from the TCGA cohort. Patients were stratified for the expression of the EOBRCA deleted gene KLRB1. The stratification cut-off was statistically determined using the survminer package.
