Demographic and clinical data were analyzed from 472 females with BC from different care centers in Cameroon and Congo (indigenous Africans, IA). Gene expression and copy number variation data for Western populations were obtained from the TCGA and the Gene Expression Omnibus (GEO). The mean age at BC diagnosis was 58.5 ± 13.2 years in the TCGA cohort and 46.5 ± 12.9 years in the IA cohort. Early onset of breast cancer was defined as diagnosis before the lower limit of the TCGA cohort (58.5 − 13.2 years, i.e. approximately 45 years). Based on this classification, 244/472 (51.7%) of African women with BC were diagnosed with early-onset BC, while 228/472 (48.3%) were diagnosed after the age of 45 years. Within the TCGA cohort, early-onset BC was found in 159/1018 (15.6%) of all BC cases. Among IA women, as shown in table 1, there were only very slight differences in tumor size and BC subtype between patients diagnosed before 45 years and those diagnosed later. Similarly, there was no difference in SBR grade between the two groups. However, women who developed BC after the age of 45 years were more likely to have had menarche after the age of 12 years (21% vs 10%, respectively; OR: 0.41, 95% CI: 0.10-1.42). Furthermore, most of these women had their first pregnancies before the age of 20 years (47.37% vs 58.42%), most of them did not use contraceptive pills (30.25% vs 41.41%), and most breastfed their children for more than 12 months (64.52% vs 43%).
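The early-onset threshold and the proportions quoted above can be reproduced with a few lines of arithmetic. The following is a minimal sketch that uses only the counts and summary statistics reported in the text; the function and variable names are illustrative and do not come from the original analysis.

```python
# Summary statistics reported in the text for the TCGA cohort.
TCGA_MEAN_AGE = 58.5
TCGA_SD_AGE = 13.2


def early_onset_cutoff(mean_age, sd_age):
    """Return the lower limit (mean minus one SD) used as the early-onset threshold."""
    return mean_age - sd_age


def proportion(count, total):
    """Return the percentage of cases, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


cutoff = early_onset_cutoff(TCGA_MEAN_AGE, TCGA_SD_AGE)  # about 45.3 years
ia_early = proportion(244, 472)     # early onset, indigenous African cohort
ia_late = proportion(228, 472)      # later onset, indigenous African cohort
tcga_early = proportion(159, 1018)  # early onset, TCGA cohort
```

The cutoff evaluates to about 45.3 years, which matches the 45-year threshold used throughout the section.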
Early onset of large and high-grade tumors in African patients
Within the TCGA cohort, there was a slight but non-significant difference in the mean age at diagnosis for Asian and Black American women compared with white American women. However, the mean age at diagnosis within the IA cohort was significantly lower than that of any other ethnic group within the TCGA cohort, even when compared with Black American women (figure 1a). We observed an inverse pattern in tumor T stage between the two cohorts. Whereas more than 80% of tumors in the TCGA cohort were T1 and T2, 80% of tumors in African women were T3 and T4 (UICC 7th edition) (figure 1b). No tumor grade information was available for the TCGA cohort. In the African cohort, however, about 60% of all tumors were grade III (figure 1c). Luminal A and triple-negative breast cancer were equally distributed (about 30% each) within the African cohort, while HER2+ and luminal B tumors accounted for about 15% each (figure 1d). To identify lifestyle and molecular features associated with early onset of BC, we performed a multivariate logistic regression on data from the African cohort. Early onset of menarche (≤ 12 years) and the use of contraceptive pills were associated, although not statistically significantly, with higher odds of developing BC before the age of 45 years (EOBRCA). There was a significant association between EOBRCA and late age at first full pregnancy (first pregnancy at age ≥ 20 years) (OR: 0.12, 95% CI: 0.04-0.34, p < 0.001). Family history of breast cancer was significantly associated with the development of BC after the age of 45 years (LOBRCA) (OR: 4.08, 95% CI: 1.34-13.51, p = 0.016). Alcohol use and obesity were not significantly correlated with the development of LOBRCA (figure 2a). Using molecular data from a publicly available breast cancer study (GSE3494), we analyzed the relationship between different molecular features and early onset of BC.
Of all features analyzed, only TP53 mutational status was significantly associated with age-dependent breast cancer development. Wild-type TP53 status was significantly associated with late onset of breast cancer (figure 2b). Both low-grade tumors and estrogen receptor (ER) positive tumors were associated with late onset of breast cancer, but this association did not reach statistical significance. Similarly, progesterone receptor positivity correlated with early onset of breast cancer, although not significantly.
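For readers less familiar with how an odds ratio and its 95% confidence interval are read, the sketch below computes an unadjusted odds ratio from a 2x2 table using the standard log (Woolf) method. This is a simplification and not the multivariate logistic regression used in this study. The counts in the example call are hypothetical and are not taken from the cohort.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald-type 95% CI (Woolf method).

    a: exposed cases,   b: exposed controls,
    c: unexposed cases, d: unexposed controls.
    All four counts must be greater than zero.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
example_or, example_lo, example_hi = odds_ratio_ci(12, 30, 25, 20)
```

An interval that contains 1 corresponds to an association that is not significant at the 5% level, which is how the non-significant associations above should be read.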
Higher rates of TP53 mutations are present in EOBRCA
To further dissect the genetic basis of early breast cancer onset, we used molecular data from the TCGA. The samples were dichotomized into EOBRCA and LOBRCA based on the criteria described above. Analyses of somatic mutations revealed that TP53 and PIK3CA were the most frequently mutated genes in both EOBRCA and LOBRCA. However, a higher rate of TP53 mutations was observed in patients with EOBRCA compared with LOBRCA patients (29% vs 26%, respectively; figure 3a). PIK3CA mutations were observed at higher rates in LOBRCA compared with EOBRCA (30% vs 25%, respectively; figure 3b). Given that these mutations accounted for less than 50% of all EOBRCA cases, we further analyzed copy number alterations (CNA) in both patient cohorts. As shown in the Circos plots in figures 3c and 3d, EOBRCA and LOBRCA showed similar CNA patterns overall, with more CNA events in LOBRCA than in EOBRCA. However, remarkable differences were observed on certain chromosomes, especially chromosomes 3, 5 and 14 in EOBRCA. In this group, there was a remarkable genomic amplification, which was almost absent in LOBRCA. Similarly, on chromosomes 5 and 14 in EOBRCA, there was an obvious deletion, which was not observable in LOBRCA.
EOBRCA is associated with CNA in oncogenes and tumor suppressors
We investigated whether genomic CNA affected gene expression and whether the affected genes were associated with cancer development. We focused specifically on genes with CNA exclusively in EOBRCA. To this end, we filtered out all genes with CNA in both EOBRCA and LOBRCA and generated a list of EOBRCA CNAs (supplementary table 1). We performed differential gene expression analysis for all patients in both groups and selected all genes with an FDR < 0.05 and an absolute log2 fold change greater than 1 to constitute a list of differentially expressed genes (figure 4a and supplementary table 2). Our list of differentially expressed genes was then intersected with the EOBRCA CNA list, which resulted in 61 genes (supplementary table 3). From this table, 34 genes showed concordant trends in CNA type (amplification or deletion, supplementary table 4) and gene expression pattern (upregulation or downregulation). Of these 34 genes, 17 genes with copy number amplifications were upregulated, while 17 genes with copy number deletions were downregulated (figure 4b). Of note, several cancer-associated genes such as CDH6, FOXM1, NAAA and CXCL10 [13-17] were amplified and upregulated. Candidate tumor suppressor genes such as TGM3, MYO18B, SH3GL2, DMBT1, SEZ6L and SLIT1 were likewise deleted [18-25]. Further investigation of CNA events in EOBRCA and LOBRCA indicated that between 30 and 40 Mb on chromosome 5, in the region where CDH6 is located, there was a strong copy number amplification in EOBRCA (figure 4c) compared with LOBRCA (figure 4d). There was also a very pronounced genetic deletion between 60 and 100 Mb on chromosome 5 in EOBRCA, which was barely seen in LOBRCA (figures 4c and 4d). These observations confirmed that the observed upregulation of CDH6 in EOBRCA was not epigenetically regulated but associated with gene amplification. To understand the hallmarks of EOBRCA, gene set enrichment analysis was performed on gene expression data from both groups.
Hallmarks of MYC targets as well as early estrogen response were highly enriched in EOBRCA (figure 4e). Other well-known hallmarks of cancer such as TGFβ and mTORC1 signaling were also highly enriched in EOBRCA. In LOBRCA, KRAS, WNT/β-catenin and hallmarks of inflammatory response were enriched (figure 4f). Comparing the enriched hallmark gene sets in both groups, we observed that more cancer hallmark gene sets were enriched in EOBRCA, compared with only 4 gene sets (with FDR < 0.05) enriched in LOBRCA. Most importantly, gene sets associated with highly aggressive tumors, such as the hallmarks of MYC targets, TGFβ and mTORC1, were strongly enriched in EOBRCA [26, 27].
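The gene selection steps described in this section (FDR and fold-change filter, intersection with the EOBRCA-exclusive CNA list, and the check for concordant direction) can be sketched as follows. This is an illustrative outline under stated assumptions, not the pipeline used in this study. The gene names, values, dictionary layout and function names are all hypothetical placeholders.

```python
def select_degs(results, fdr_max=0.05, min_abs_log2fc=1.0):
    """Keep genes with FDR < fdr_max and |log2 fold change| > min_abs_log2fc."""
    return {
        gene: stats["log2fc"]
        for gene, stats in results.items()
        if stats["fdr"] < fdr_max and abs(stats["log2fc"]) > min_abs_log2fc
    }


def concordant_genes(degs, cna_calls):
    """Intersect DEGs with CNA calls and keep genes whose direction agrees.

    Amplified genes must be upregulated; deleted genes must be downregulated.
    """
    concordant = {}
    for gene, log2fc in degs.items():
        cna_type = cna_calls.get(gene)  # None if the gene has no exclusive CNA
        if cna_type == "amplification" and log2fc > 0:
            concordant[gene] = "amplified_up"
        elif cna_type == "deletion" and log2fc < 0:
            concordant[gene] = "deleted_down"
    return concordant


# Hypothetical differential expression results (placeholder genes and values).
de_results = {
    "GENE_A": {"log2fc": 2.1, "fdr": 0.001},
    "GENE_B": {"log2fc": -1.8, "fdr": 0.01},
    "GENE_C": {"log2fc": 0.4, "fdr": 0.001},  # fails the fold-change filter
    "GENE_D": {"log2fc": 3.0, "fdr": 0.20},   # fails the FDR filter
    "GENE_E": {"log2fc": -2.5, "fdr": 0.02},  # passes, but discordant with CNA
}
# Hypothetical CNA calls found only in the early-onset group.
eobrca_only_cna = {
    "GENE_A": "amplification",
    "GENE_B": "deletion",
    "GENE_C": "amplification",
    "GENE_E": "amplification",
}

degs = select_degs(de_results)
concordant = concordant_genes(degs, eobrca_only_cna)
```

In this toy input, GENE_E passes the expression filter but is dropped at the concordance step because it is amplified yet downregulated.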
Altered genes are associated with stemness and tumor suppression
Gene expression analyses revealed that some amplified genes were upregulated in EOBRCA, while some deleted genes were downregulated in EOBRCA. We then investigated whether the amplified genes had any tumorigenic properties and whether the deleted genes had tumor suppressor activities, at least in solid tumors. As seen in figure 5, the cancer hallmark gene FOXM1 was upregulated in almost all EOBRCA cases but not in LOBRCA. FOXM1 is a master transcription factor that regulates tumor cell proliferation, self-renewal and tumorigenesis in several human cancers. Similarly, CDH6, another EMT-promoting gene, was upregulated in most of the EOBRCA cases. PPP1R9B, a gene that has been associated with tumor progression and stemness in human tumors, was also amplified and upregulated. Several of the deleted and downregulated genes have been reported to have tumor suppressor properties in some solid tumors. In lung cancer, the deleted gene MYO18B was reported as a tumor suppressor. Another deleted gene, TGM3, is also known to play a tumor suppressor role in colorectal cancer by repressing EMT and PI3K/AKT signaling. In urothelial carcinoma, deletion of the gene SH3GL2 is known to promote malignant behavior, while DMBT1 has been proposed as a tumor suppressor in brain cancers.
EOBRCA CNA gene expression is prognostic
We then investigated possible associations between patient survival and EOBRCA CNA gene expression in the LOBRCA group. As shown in figure 6a, in a multivariate Cox regression, low expression of CDH6 was associated with better survival, but this association did not reach statistical significance (HR: 0.51, 95% CI: 0.23-1.13, p = 0.096). Low expression of PPP1R9B and SLC1A3 was significantly associated with better patient survival (HR: 0.36, 95% CI: 0.18-0.68, p = 0.002 and HR: 0.14, 95% CI: 0.033-0.6, p = 0.008, respectively). In contrast, low expression of CXCL10 was significantly associated with poor survival (HR: 2.88, 95% CI: 1.38-6.1, p = 0.005), while low expression of the deleted tumor suppressor gene DMBT1 was not significantly associated with poor survival (HR: 1.29, 95% CI: 0.44-3.74, p = 0.64). Low expression of GPR26, another EOBRCA deleted gene, was associated with poor survival. The expression of other EOBRCA deleted and downregulated genes was not significantly associated with survival. Kaplan-Meier survival analysis revealed that patients with low expression of CDH6 lived significantly longer than patients with higher expression (log-rank p = 0.01, figure 6b). Similarly, patients with low expression of PPP1R9B lived significantly longer than those with higher expression (log-rank p = 0.00035, figure 6c), while low expression of SLC1A3 was also associated with better overall survival (log-rank p = 0.015, figure 6d).
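As a reading aid for the Cox regression results, a hazard ratio is conventionally regarded as significant at the 5% level when its 95% confidence interval excludes 1. The short sketch below applies that rule to the hazard ratios and intervals reported above. The helper name and data layout are illustrative, and the rule is a convention that approximates the reported p values rather than a re-analysis of the data.

```python
# (hazard ratio, lower 95% CI bound, upper 95% CI bound) as reported in the text.
reported_hr = {
    "CDH6": (0.51, 0.23, 1.13),
    "PPP1R9B": (0.36, 0.18, 0.68),
    "SLC1A3": (0.14, 0.033, 0.6),
    "CXCL10": (2.88, 1.38, 6.1),
    "DMBT1": (1.29, 0.44, 3.74),
}


def ci_excludes_null(lower, upper, null_value=1.0):
    """Return True when the interval lies entirely above or below the null value."""
    return lower > null_value or upper < null_value


# Map each gene to whether its interval excludes HR = 1.
significant = {
    gene: ci_excludes_null(lower, upper)
    for gene, (_hr, lower, upper) in reported_hr.items()
}
```

Under this rule the intervals for PPP1R9B, SLC1A3 and CXCL10 exclude 1, whereas those for CDH6 and DMBT1 do not, in line with the reported p values.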