Breast cancer risk factors in relation to molecular subtypes in breast cancer patients from Kenya

Background: Few studies have investigated risk factor heterogeneity by molecular subtypes in indigenous African populations where prevalence of traditional breast cancer (BC) risk factors, genetic background, and environmental exposures show marked differences compared to European ancestry populations.
Methods: We conducted a case-only analysis of 838 pathologically confirmed BC cases recruited from 5 groups of public, faith-based, and private institutions across Kenya between March 2012 and May 2015. Centralized pathology review and immunohistochemistry (IHC) for key markers (ER, PR, HER2, EGFR, CK5/6, and Ki67) were performed to define subtypes. Risk factor data were collected at the time of diagnosis through a questionnaire. Multivariable polytomous logistic regression models were used to determine associations between BC risk factors and tumor molecular subtypes, adjusted for clinical characteristics and risk factors.
Results: The median age at menarche and first pregnancy were 14 and 21 years, median number of children was 3, and breastfeeding duration was 62 months per child. Distribution of molecular subtypes for luminal A, luminal B, HER2-enriched, and triple negative (TN) breast cancers was 34.8%, 35.8%, 10.7%, and 18.6%, respectively. After adjusting for covariates, compared to patients with ER-positive tumors, ER-negative patients were more likely to have higher parity (OR = 2.03, 95% CI = 1.11, 3.72, p = 0.021, comparing ≥ 5 to ≤ 2 children). Compared to patients with luminal A tumors, luminal B patients were more likely to have lower parity (OR = 0.45, 95% CI = 0.23, 0.87, p = 0.018, comparing ≥ 5 to ≤ 2 children); HER2-enriched patients were less likely to be obese (OR = 0.36, 95% CI = 0.16, 0.81, p = 0.013) or to have an older age at menopause (OR = 0.38, 95% CI = 0.15, 0.997, p = 0.049). Body mass index (BMI), either overall or by menopausal status, did not vary significantly by ER status. Overall, cumulative or average breastfeeding duration did not vary significantly across subtypes.
Conclusions: In Kenya, we found associations between parity-related risk factors and ER status consistent with observations in European ancestry populations, but differing associations with BMI and breastfeeding. Inclusion of diverse populations in cancer etiology studies is needed to develop population and subtype-specific risk prediction/prevention strategies.
Supplementary Information: The online version contains supplementary material available at 10.1186/s13058-021-01446-3.

Keywords: Breast cancer, Molecular subtypes, Risk factors, Kenya, Sub-Saharan Africa

Background
Women in Africa have lower incidence rates of breast cancer (BC) than women in developed countries (age-standardized rates (ASR) per 100,000 of 36 vs. 74), but higher mortality rates (ASR of 17 vs. 15) [1]. Furthermore, relative survival (RS) from BC varies by stage and country-level human development index (HDI) in sub-Saharan Africa (SSA): the 5-year RS after breast cancer diagnosis is highest in Mauritius at 83.2% and lowest in Uganda at 12.1%, while it ranges between 40.1 and 64% in Kenya according to data abstracted from the Eldoret and Nairobi Cancer Registries, respectively [2]. Survival differences in SSA also remain for any given breast cancer stage, with the lowest 3-year breast cancer-specific survival observed in Nigeria at 38% compared with 68% in Black women from Namibia, pointing to as yet unexplained factors affecting survival [3]. In Kenya, country figures indicate that BC is the most frequently diagnosed cancer among women, representing 20.8% of all cancer cases, and the second most common cause of cancer mortality [4].
Although advanced stage at presentation, lack of awareness about BC, and limited access to available screening and treatment options [5] contribute to disparate mortality rates, whether the incidence of more aggressive breast cancers is higher in African women remains controversial. Women of African descent present with BC a decade earlier than their Caucasian counterparts [6,3], and even after correcting for risk factor distribution, their tumors still tend to be estrogen receptor (ER) negative [7], suggesting the interplay of other biologic and genetic differences that remain largely unexplored.
Breast cancer can be divided into several molecular subtypes based on gene expression profiling analysis, which are subsequently corroborated by a panel of immunohistochemical (IHC) markers including ER, progesterone receptor (PR), human epidermal growth factor receptor 2 (HER2), proliferation marker Ki-67, cytokeratin (CK) 5/6, and epidermal growth factor receptor (EGFR). Epidemiologic studies have demonstrated that BC risk associated with established risk factors, including genetic and environmental/lifestyle factors, differs across breast cancer subtypes [8], which highlights the importance of developing subtype-specific risk prediction and prevention strategies [9]. Overwhelmingly, these breast cancer prediction models have been derived from European ancestry women, and some studies have noted poor performance in African women [10]. This is likely explained by the differential associations of risk factors such as parity and obesity with ER-positive and ER-negative cancers and by higher frequencies of ER-negative cancers among African women. In addition, the prevalence of breast cancer risk factors, including genetic background and environmental exposures, shows marked differences between indigenous African and European, and even African American, women. Notably, women in African countries are more likely to have high exposures to infectious agents (malaria and other parasites), and a low prevalence of traditional BC risk factors (including low or late parity, lack of breastfeeding, obesity, and exogenous hormone use), which may contribute to differences in the risk of different BC subtypes. Furthermore, there are great variations in genetic structure and exposures as well as breast cancer subtype distributions across different African populations [11,7,12].
Therefore, studies in diverse indigenous African populations will allow for a broader capture of associations between risk factors and tumor subtypes, particularly for exposures and subtypes that are in general very rare but are prevalent in African populations. Findings from these studies will improve our understanding of risk factor heterogeneity and our ability to develop risk prediction models that are better tailored for specific African populations.
In this study, using carefully annotated risk factor and pathology data collected from 838 BC patients enrolled from multiple hospitals across Kenya, we aimed to evaluate distributions of established BC risk factors across BC subtypes.

Study population and risk factor data
The study has been previously described; in brief, 838 pathologically confirmed BC cases were recruited across Kenya between March 2012 and May 2015 [13]. Patients came from 15 hospitals/health facilities, which we grouped into 5 network/regional groups: Aga Khan University (AKU) hospitals (including AKU hospitals at Kisumu, Mombasa, and Nairobi), AIC Kijabe Hospital, Nyeri Provincial General Hospital (PGH), St Mary's Mission Hospital (Nairobi), and others (Supplementary Table 1). The grouping was based on whether the institutions were public, faith-based, or private. Institutional ethics approval was obtained. Socio-demographic, clinical, reproductive, and known breast cancer risk factor data were collected using a standardized questionnaire.

Pathology, immunohistochemical data, and molecular subtypes
Pathologic characteristics including histologic grade, histologic tumor type, tumor size, lymph node stage, lymphovascular invasion, and ER/PR/HER2 status were extracted from the clinical database. Central pathology review and IHC for ER/PR/HER2 of all breast carcinoma tissue were done at AKU Hospital, Nairobi, and interpreted by SS and ZM. The AKU Pathology department is a College of American Pathologists accredited laboratory and as such enrolls in proficiency testing schemes for breast biomarkers. Additional slides were cut at 5 μm and subjected to IHC stains for EGFR, CK5/6, and Ki-67 (Dako monoclonal mouse anti-human antibodies were used; wild type EGFR polyclonal antibody in a dilution of 1:200, CK5/6 clone D5/16 B4 ready to use, Ki-67 clone MIB-1, ready to use) according to the manufacturer specifications as previously described [13], with appropriate control tissues included, and stained on the Dako Autostainer Link instrument.
ER and PR tumor expression were considered positive by IHC with ≥ 1% nuclear staining. HER2 expression was determined by IHC and fluorescence in situ hybridization (FISH), the latter in case of an equivocal HER2 IHC result. An IHC score of 3+ or a FISH-positive test result was defined as HER2-positive [14]. Ki-67 was considered high if 20% or more of the cells showed nuclear staining, based on the St Gallen recommendation [15].
We used Ki-67 status (low/high) to discriminate luminal A and B and used tumor grade as a surrogate for patients with missing Ki-67 [16]. For EGFR and CK5/6, a result was considered positive for any amount of cytoplasmic or membranous staining in any percentage of tumor cells as per the recommendations from the British Columbia study for defining the Basal subtype of breast cancer [17].
Molecular subtypes were defined based on previous clinically validated guidelines [18] (Fig. 1). Due to the small sample size, in the primary subtype analysis, we grouped the two luminal B subtypes into a single subtype for risk factor associations. For patients with EGFR and CK5/6 data available, we further stratified TN patients into core basal-like (CK5/6+ and/or EGFR+) and five-negative (CK5/6− and EGFR−).
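
The marker cut-offs and subtype rules described in this section can be sketched in code. The Python below is illustrative only and is not the study's implementation: the function names, the string labels, the coding of grade as 1 to 3, the treatment of hormone receptor positivity as ER+ and/or PR+, and the handling of missing values are all assumptions made here for the sake of a concrete example.

```python
from typing import Optional


def her2_positive(ihc_score: Optional[int],
                  fish_positive: Optional[bool]) -> Optional[bool]:
    """HER2 call: IHC 3+ or a FISH-positive result counts as positive.

    Assumption: IHC 0/1+ is negative, and an equivocal 2+ result without
    FISH (or no IHC at all) is returned as None (status missing).
    """
    if ihc_score == 3 or fish_positive is True:
        return True
    if ihc_score in (0, 1) or fish_positive is False:
        return False
    return None


def high_proliferation(ki67_pct: Optional[float],
                       grade: Optional[int]) -> Optional[bool]:
    """Ki-67 >= 20% is high; grade 3 is the surrogate when Ki-67 is missing."""
    if ki67_pct is not None:
        return ki67_pct >= 20
    if grade is not None:
        return grade == 3  # assumed coding: 1 = low, 2 = intermediate, 3 = high
    return None


def assign_subtype(er_pct: float, pr_pct: float,
                   her2_ihc: Optional[int],
                   fish_positive: Optional[bool] = None,
                   ki67_pct: Optional[float] = None,
                   grade: Optional[int] = None) -> str:
    """Assign an IHC-based subtype label from raw marker values."""
    # Assumption: hormone receptor positive means ER and/or PR >= 1% staining.
    hr_pos = er_pct >= 1 or pr_pct >= 1
    her2 = her2_positive(her2_ihc, fish_positive)
    if her2 is None:
        return "unclassified (HER2 missing)"
    if hr_pos:
        if her2:
            return "Luminal B HER2+"
        prolif = high_proliferation(ki67_pct, grade)
        if prolif is None:
            return "unclassified (Ki-67 and grade missing)"
        return "Luminal B HER2-" if prolif else "Luminal A"
    return "HER2-enriched" if her2 else "Triple negative"


def split_triple_negative(ck56_positive: Optional[bool],
                          egfr_positive: Optional[bool]) -> Optional[str]:
    """Split TN tumors into core basal-like vs. five-negative."""
    if ck56_positive is True or egfr_positive is True:
        return "core basal-like"
    if ck56_positive is False and egfr_positive is False:
        return "five-negative"
    return None  # at least one marker missing and none positive
```

For example, `assign_subtype(90, 0, 1, grade=3)` falls back on grade because Ki-67 is absent and is labelled as luminal B HER2-negative under these assumed rules.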

Statistical analysis
Distributions of breast cancer risk factors, including sociodemographic, reproductive, and tumor pathologic characteristics in the overall study population and by hospital groups, were assessed using the chi-squared test or Fisher's exact test. Multivariable polytomous logistic regression models were used to determine associations between BC risk factors and tumor molecular subtypes (ER status or luminal A-like as the reference).

Fig. 1 Breast tumor subtype definition in Kenyan breast cancer patients (N=838). *Tumor grade was used to determine tumor subtypes in the absence of Ki-67: if tumor grade is low or intermediate, define tumor subtype as "Luminal A"; if tumor grade is high, define tumor subtype as "Luminal B HER2−". †Seventeen cases are not included due to their missing HER2 status. ‡Forty-five cases are not included due to their missing CK5/6 and EGFR status. CK5/6, cytokeratin 5/6; EGFR, epidermal growth factor receptor; ER, estrogen receptor; HER2, human epidermal growth factor receptor-2; PR, progesterone receptor

All regression models were fully adjusted for the same covariates (except for where noted): age at diagnosis, BMI, age at menarche, age at first pregnancy, number of children, averaged breastfeeding duration, age at menopause, family history of breast cancer in 1st degree female relatives, highest education level, and occupation. A two-tailed P value less than 0.05 was considered statistically significant. All analyses were performed with SAS v9.4 statistical software (SAS Institute Inc.).
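
For readers less familiar with the method, the polytomous (multinomial) logistic model can be written in its standard textbook form as below. This is a generic sketch rather than the exact SAS specification used in the study, and the symbols $Y$, $k$, $\mathbf{x}$, $\alpha_k$, and $\boldsymbol{\beta}_k$ are notation introduced here for illustration.

```latex
\[
\log \frac{\Pr(Y = k \mid \mathbf{x})}{\Pr(Y = \mathrm{ref} \mid \mathbf{x})}
  = \alpha_k + \boldsymbol{\beta}_k^{\top} \mathbf{x},
\qquad
\mathrm{OR}_{k,j} = \exp\left(\beta_{k,j}\right)
\]
```

Here $Y$ is the tumor subtype, $\mathrm{ref}$ is the reference category (for example luminal A-like), and $\mathbf{x}$ holds the risk factor of interest together with the adjustment covariates. Because every subject is a case, $\mathrm{OR}_{k,j}$ compares the odds of having subtype $k$ rather than the reference subtype across levels of risk factor $j$; it is not a relative risk of developing breast cancer.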

Descriptive analysis of sociodemographic and reproductive characteristics
There were 838 invasive breast cancer cases with complete data on ER and PR status after exclusion of DCIS cases (n=21) and cases without any data for tumor subtype (n=8). Fifty-four percent of patients were diagnosed under 50 years of age, 69% had BMI ≥ 25 kg/m² at diagnosis, and 61% lived in rural areas. Our study population was also characterized by late age at menarche (≥ 13 years, 92%), young age at first pregnancy (< 25 years, 70%), having 3 or more children (68%), high prevalence of breastfeeding (95%), and long breastfeeding duration (≥ 1 year per child, 80%) (Table 1).
Compared to patients admitted to the other 4 hospital groups, AKU patients were more likely to be overweight or obese (79%), have a tertiary education level (45%), have their first pregnancy at ≥ 25 years (35%), have < 3 children (39%), and have shorter breastfeeding duration per child. This is as expected given that AKU is a private health facility whose patients are generally of higher socioeconomic status than those at the other facilities.

Distributions of tumor subtypes and pathologic characteristics in the overall study population and by hospitals
The distribution of tumor subtypes defined by IHC markers is presented in Fig. 1. Sixty-one percent of tumors showed lymphovascular invasion. Nearly half of patients received definitive surgery, either lumpectomy or mastectomy, among which 91% had stage II or higher disease and for those cases with lymph node metastases, 39.5% were positive for extra-nodal extension. AKU patients were more likely to have small (≤ 2 cm) and early-stage tumors (P < 0.01). Patients admitted to Kijabe and Nyeri hospitals had higher proportions of tumors with lymphovascular invasion: 71.4% and 69.1%, respectively. There was no statistical difference in distributions of patient molecular subtypes (defined by ER, PR, and HER2) across hospitals (P = 0.08).

Associations between breast cancer risk factors and tumor subtypes
ER, PR, and HER2
Results of adjusted associations between risk factors and ER status are shown in Table 3. Compared to ER-positive patients, ER-negative patients were more likely to have higher parity (OR = 2.03, 95% CI = 1.11, 3.72, P trend = 0.021, comparing ≥ 5 to ≤ 2 children). ER-negative patients were also more likely to have longer cumulative breastfeeding duration (OR = 2.38, 95% CI = 1.33, 4.24; comparing ≥ 62 to < 39 months); however, this positive association was no longer significant after adjusting for number of children. In fact, analyzing parity and breastfeeding variables together showed that the association was driven by parity (Table 3). In addition, the average duration of breastfeeding per child did not vary significantly by ER status. Overall, we observed similar associations for PR to those for ER (Supplementary Table 2). BMI, either overall or by menopausal status, did not significantly vary by ER or PR status. When stratified by HER2 status, we found that, compared to HER2-negative patients, HER2-positive patients were less likely to be obese (OR = 0.58, 95% CI = 0.34, 0.97, P trend = 0.038), especially among postmenopausal women (OR = 0.26, 95% CI = 0.10, 0.62, P trend = 0.0026) (Supplementary Table 2). Similar results were observed when we restricted the analysis to early-stage patients (OR = 0.76, 95% CI = 0.59, 0.98, P trend = 0.038), suggesting that the association was unlikely to be due to reverse causation.
Given that several risk factors and clinical variables varied by hospital groups (Tables 1 and 2), we next tested whether the observed associations varied among patients admitted to different hospital groups. In this analysis, we selected five key risk factors (BMI, age at first pregnancy, number of children, mean breastfeeding duration per child, and combined number of children and cumulative breastfeeding duration) and stratified their associations with ER or HER2 (for BMI) status by the five hospital groups (Table 3 and 4). With the exception of Nyeri, the associations with ER were fairly consistent across hospitals for age at first birth, parity, and breastfeeding (Fig. 2). In contrast, the association between BMI and HER2 appeared to be driven by AKU patients (Supplementary Figure 1), among whom obesity was significantly more prevalent than among patients in other hospitals; however, this pattern was also observed among patients at Kijabe Hospital. We further evaluated the associations between the risk factors and ER in younger (< 50 years) and older (≥ 50 years) women separately. In general, the associations with most risk factors were similar in younger and older women, except that we observed an association between older age at menarche and ER-negative disease in older women (OR = 2.25, 95% CI = 1.04, 4.84, P = 0.038, comparing ≥ 15 to ≤ 13 years) but not in younger women (OR = 0.98, 95% CI = 0.52, 1.87, P = 0.96, comparing ≥ 15 to ≤ 13 years) (Supplementary Table 5).

Table 4 shows the associations between BC risk factors and molecular subtypes defined by joint receptor status. Compared to luminal A patients, luminal B patients (combining luminal B-HER2+ and luminal B-high proliferative) were more likely to have lower parity (patients with 3 or 4 children, OR = 0.47, 95% CI = 0.28, 0.79, p = 0.005; with 5 or more children, OR = 0.45, 95% CI = 0.23, 0.87, p = 0.018, compared to patients with 1 or 2 children).
HER2-enriched patients were less likely to be obese (OR = 0.36, 95% CI = 0.16, 0.81, p = 0.013, comparing ≥ 30 to < 25 kg/m²) or to have older age at menopause (OR = 0.38, 95% CI = 0.15, 0.997, p = 0.049, comparing ≥ 50 to < 50 years). The HER2-BMI association appeared to be stronger among postmenopausal women (OR = 0.24, 95% CI = 0.07, 0.81, p = 0.022) than among premenopausal women. Overall, cumulative or average breastfeeding duration did not vary significantly across subtypes. When number of children was considered jointly with breastfeeding or age at first birth, luminal B patients with four or more children appeared to have shorter cumulative breastfeeding duration and later age at first birth than luminal A patients (Table 4). Further stratifying luminal B and TN subtypes did not reveal additional associations (Supplementary Table 6).
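
As a reading aid, a reported odds ratio, its 95% confidence interval, and its p value can be cross-checked against one another under a Wald (normal) approximation on the log-odds scale. The Python sketch below is not part of the study's analysis, which was run in SAS; the function names are hypothetical, and because published values are rounded, agreement can only be approximate.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def geometric_midpoint(ci_low: float, ci_high: float) -> float:
    """A Wald interval is symmetric on the log scale, so the point
    estimate should sit near the geometric mean of the two bounds."""
    return math.sqrt(ci_low * ci_high)


def wald_summary(odds_ratio: float, ci_low: float, ci_high: float):
    """Recover the standard error, z statistic, and two-sided p value
    implied by an odds ratio and its 95% confidence interval."""
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    z = log_or / se
    # Two-sided normal tail probability: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p_value


# Luminal B vs. luminal A, >= 5 vs. 1 or 2 children, as reported in the text.
se, z, p = wald_summary(0.45, 0.23, 0.87)
print(f"SE(log OR) = {se:.3f}, z = {z:.2f}, p = {p:.3f}")
print(f"geometric midpoint of CI = {geometric_midpoint(0.23, 0.87):.3f}")
```

For this estimate the implied p value comes out close to the reported 0.018, and the geometric midpoint of the interval lies close to the reported odds ratio of 0.45. A point estimate that falls outside its own interval, or far from the geometric midpoint, would point to a transcription error rather than a statistical finding.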

Associations between breast cancer risk factors and molecular subtypes
We also conducted a number of sensitivity analyses to evaluate how our main conclusions were affected by using grade to define subtypes when Ki-67 was missing and by removing nulliparous women from analyses of age at first birth. Overall, the results were similar to those from the original analyses (Supplementary Tables 7, 8, 9).

Discussion
Knowledge of the etiology of early-onset breast cancers is particularly lacking across populations given their rarity. Studying African populations, where risk factors differ and where onset is almost a decade earlier, could provide new insights on breast cancer etiology given the etiologic and molecular subtype heterogeneity in diverse populations.

* P values were computed from chi-square tests except where noted. P values less than 0.05 are shown in bold font. a Nulliparous women and parous women who never breastfed were grouped together in chi-square test. b Chi-square test was performed restricted to postmenopausal women. † Only 3.58% (n=30) of study participants reported ever having smoked or used smokeless tobacco. Exposure to smoking is summarized here as exposed/never exposed, where exposed is defined as personal use of tobacco as well as exposure to smoke at the workplace or home during child or adulthood. ‡ Nulliparous cases were women who reported never pregnant, never given birth, and had no children (N=37, 4.4%). AKU, Aga Khan University; BMI, body mass index; IQR, interquartile range; Q, quartile; SD, standard deviation.

There are limited data from Africa, where some of the breast cancer-associated risk/protective factors such as parity and breastfeeding have extremely different distributions. The overall risk factor distribution for BC patients in our study is similar to that of a large case-control study from Ghana [19], but is strikingly different from that of other populations including African Americans [20-22]. As an example, among BC patients in Ghana and Kenya, > 60% of women had ≥ 3 children, > 80% of women had their first child before age 25 years, and > 90% of women had breastfed, with the average breastfeeding duration per child near two years. In contrast, among African American BC patients in the African American Breast Cancer Epidemiology and Risk (AMBER) consortium, only 35% had ≥ 3 children and > 40% had never breastfed [21]. Similarly, the prevalence of obesity (BMI > 30 kg/m², 41.7% in AMBER vs. 29.4% in Kenya) and early age at menarche (< 13 years, 52.3% in AMBER vs. 8.5% in Kenya) was much higher in AMBER [22,23].

Parity has been reported to have a dual effect on breast cancer risk; it is protective against ER+ disease while increasing the risk of ER− disease, especially among younger women [24,21]. Despite the heterogeneity in parity-related exposures, the differential effect of parity by ER has been consistently reported across different populations [25,21,19,26]. Although we were not able to compare relative risks associated with parity in different molecular subtypes due to the case-only design, our finding of higher parity in ER-negative than in ER-positive patients is consistent with results from previous case-control studies [19,26]. In particular, taking advantage of the much higher parity among patients in Kenya, we observed that the association of parity with ER followed a dose-dependent pattern, with the highest variation by ER observed among women with five or more children.
Similarly, in a population where the vast majority of women had their first children before the age of 30 years, we found an association between younger age at first birth and ER-negative breast cancer, consistent with previous studies [27,26,28], supporting increased parity as a risk factor for ER-negative breast cancers across multiple populations. We observed that luminal B patients, both luminal B/high proliferative and luminal B/HER2+, had fewer children compared to luminal A patients. These results are in line with data from the Nurses' Health Study reporting greater reduced risks associated with parity in luminal B than luminal A patients [25], suggesting that parity may have a stronger protective effect for luminal B as compared to luminal A patients. However, using data based on a Malaysian case-series, we found that luminal B patients were more likely to be parous and to have breastfed compared to luminal A patients [26]. These inconsistent results warrant further investigations, especially in diverse populations.

Investigations of associations between breastfeeding and breast cancer risk by receptor status have resulted in inconsistent findings, with some showing a similar protective effect for all subtypes [29], and others showing a stronger protection against ER-negative cancers, especially TNBC [30]. In the Ghana study, in which the frequency of ER-negative breast cancer, especially TNBC, was higher (28% vs 18% of tumors) than in the Kenya study, the increased risk associated with parity was offset by more extended breastfeeding, which was only seen among patients < 50 years of age in ER-negative but not in ER-positive patients, while in older women, extended breastfeeding showed an inverse association regardless of ER status yet a stronger association for ER-positive patients [19]. We did not observe significant differences in breastfeeding by ER or by intrinsic subtype, either in all women or by age.
The inconsistent findings between different African populations with similar parity and breastfeeding characteristics highlight the complexity of subtype-specific risk associations and the importance of conducting large molecular epidemiologic studies in diverse African populations.

† Point estimates and 95% confidence intervals were from multivariable models, adjusting for the same series of covariates (except where noted): age at diagnosis, BMI, age at menarche, age at first pregnancy, number of children, mean breastfeeding duration per child, age at menopause, family history of breast cancer in first-degree female relative, occupation, education level, and location of facility. Estimates of numbers of children, cumulative and averaged breastfeeding duration, and combined age at first pregnancy and number of children were computed among parous women. ‡ Results were from the trend analysis using the categorical risk factor as a trend. a Multivariable modeling analysis without adjusting for age at menopause. b Women who reported never pregnant, never gave birth, and had no child were grouped as "Nulliparous" in modeling analyses. c Multivariable modeling analysis without adjusting for mean breastfeeding duration per child. d Multivariable modeling analysis was restricted to postmenopausal women. BMI, body mass index; CI, confidence interval; ER, estrogen receptor; OR, odds ratio; Q, quartile

Obesity is a known risk factor for breast cancer in post-menopausal women but protective in premenopausal women [31]. Obesity can disrupt some biological pathways, resulting in insulin resistance and altered synthesis of endogenous sex hormones [32,33]. When we examined the association of obesity with molecular subtypes, we found that patients with HER2-enriched BC were less likely to have a high BMI.
Although we cannot completely rule out the possibility of reverse causality due to weight loss associated with breast cancer, it is unlikely that the association we observed is entirely driven by reverse causation since BMI did not vary significantly by tumor stage in our study. Our findings are consistent with a Polish breast cancer case-control study, which found that in premenopausal women, HER2 expression was inversely associated with BMI adjusted for the 4 markers (adjusted p-trend = 0.01) [34]. In addition, the association was stronger among AKU patients, who were more likely to have early-stage disease as compared to patients from other hospitals. Our findings are similar to a study conducted in Malaysia, which showed that women with HER2-enriched and TNBC tumors were significantly less likely to be obese than those with the luminal A subtype [26]. Our results are also in line with the analysis based on African Americans in the AMBER consortium [22] and a pooled analysis of nine studies of the National Cancer Institute cohort consortium [27] showing that, among postmenopausal women, higher recent BMI was associated with increased risk of ER-positive cancer, but was either associated with decreased risk of ER-negative tumors in AMBER or was not associated with ER-negative BC in the NCI cohort consortium. Notably, the association with BMI observed in our study was mostly driven by HER2 status rather than by TNBC, which is more similar to the findings in the Malaysian study [26].

Fig. 2 Associations between key breast cancer risk factors and ER status by hospitals. Odds ratios (OR) and 95% confidence intervals (CI) were calculated from multivariable logistic regression models with ER status as the outcome variable (ER+ as reference) adjusting for categorized age at diagnosis and BMI

The strengths of our study include representation of BC cases from multiple hospitals in Kenya, a well-annotated risk factor questionnaire and clinical data, and centralized high-quality biomarker assessment in a unique East African population.
This study was limited by the retrospective collection of risk factor data and possible reverse causation, as well as the case-only design, which prohibited us from estimating relative risks associated with each risk factor. Further, despite being the largest BC study of this type conducted in Kenya, the sample size was still relatively small to evaluate risk factors in rare tumor subtypes, especially in age-stratified analyses.

Conclusion
In summary, our findings, based on data from an indigenous African population with unique risk factor profiles, add to the growing body of knowledge regarding the etiologic heterogeneity of breast cancer molecular subtypes among geographically diverse ethnic groups. Further investigations of genetic and environmental factors that modify breast cancer risk in African populations are recommended. Inclusion of diverse regional population groups from sub-Saharan Africa in global breast cancer studies may help provide a better understanding of the subtype-specific breast cancer risk etiology, which will be critical for the development of risk prediction models in African populations.

*Seventeen cases were excluded from analyses because of their missing data for HER2 status. † Point estimates and 95% confidence intervals were from multivariable models, adjusting for the same series of covariates (except where noted): age at diagnosis, BMI, age at menarche, age at first pregnancy, number of children, mean breastfeeding duration per child, age at menopause, family history of breast cancer in first-degree female relative, occupation, education level, and location of the facility. Estimates of numbers of children, cumulative and mean breastfeeding duration, and combined age at first pregnancy and number of children were computed among parous women. ‡ Results were from the trend analysis using the categorical risk factor as a trend. a Multivariable modeling analysis without adjusting for age at menopause. b Women who reported never pregnant, never gave birth, and had no child were grouped as "Nulliparous" in modeling analyses. c Multivariable modeling analysis without adjusting for mean breastfeeding duration per child. BMI, body mass index; CI, confidence interval; HER2, human epidermal growth factor receptor-2; OR, odds ratio; Q, quartile