Reproductive history differs by molecular subtypes of breast cancer among women aged ≤ 50 years in Scotland in 2009-16: A population-based retrospective cohort study

The aetiology of breast cancers diagnosed ≤ 50 years of age remains unclear. We aimed to compare reproductive risk factors between molecular subtypes of breast cancer, thereby suggesting possible aetiologic clues, using routinely collected cancer registry and maternity data in Scotland. Methods: We conducted a population-based retrospective cohort study of 4,108 women aged ≤ 50 years with primary breast cancer diagnosed between 2009 and 2016, linked to maternity data. Molecular subtypes of breast cancer were defined using immunohistochemistry (IHC) tumour markers, namely oestrogen receptor (ER), progesterone receptor (PR) and human epidermal growth factor receptor-2 (HER2), together with tumour grade. Age-adjusted logistic regression models were used to estimate odds ratios (OR) and 95% confidence intervals (CI) for the association of number of births, age at first birth and time since last birth with IHC-defined breast cancer subtypes: luminal A-like, luminal B-like (HER2-), luminal B-like (HER2+), HER2-overexpressed and triple negative breast cancer (TNBC), among women aged ≤ 50 years. Analyses of routine electronic medical records linked to molecularly defined tumour pathology data can be used to investigate the aetiology and prognosis of cancer.


Introduction
Breast cancer is the most common malignancy worldwide [1]. In Scotland, it constitutes 28.1% of all cancers, with a lifetime risk of 1 in 8 among women [2].
Breast cancer has been classified into 'intrinsic' or molecular subtypes, based on mRNA expression profiling, that have different treatment and survival outcomes [3]. The characteristics of these molecular subtypes are largely distinguished by expression of various combinations of tumour markers such as oestrogen receptor (ER), progesterone receptor (PR), human epidermal growth factor receptor-2 (HER2) and the Ki67 tumour proliferation marker. Although gene profiling is considered the gold standard for classification of molecular subtypes, given the cost and lack of genetic profiling in clinical practice, a similar classification defined by immunohistochemistry (IHC) staining is a well-accepted surrogate [4,5].
The St. Gallen Expert Panel recommends using ER, PR and HER2, along with tumour grade as a proxy for Ki67 index when the latter is unknown, to define the subtypes [5]. Based on IHC characterization, the molecular subtypes are: luminal A-like, luminal B-like (HER2-), luminal B-like (HER2+), HER2-enriched and triple negative breast cancer (TNBC). As the luminal-like cancers (ER/PR+) express hormone receptors, they can be effectively treated with molecularly targeted hormone therapy and generally have a better prognosis. Because TNBC, the most aggressive subtype, lacks therapeutic targets (i.e., ER/PR or HER2), chemotherapy is the only current treatment option [6,7].
Reproductive factors have been well documented as key breast cancer risk factors, with direct associations observed with early age at menarche, nulliparity, late age at menopause and first birth, and limited breastfeeding [8,9]. Data also suggest that there is a temporal relationship with time since last birth, where a short-term increase in breast cancer risk is observed 3-5 years after last birth [10,11], before a long-term protective effect of parity is observed compared to nulliparity.
Within Scotland's renowned, high-quality routine electronic health records, the Scottish Cancer Registry (SMR06) is an excellent resource to investigate risk factors for cancer incidence. In Scotland, ER status data collection began in 1997, and PR and HER2 data collection started in 2009, almost a decade earlier than other registries in the UK. We have recently reported the high quality of these data, shown distinct temporal trends by molecular subtype, and observed increasing incidence of ER+ subtypes among women of screening age (50-70 years), among whom about half of all cases are diagnosed [12].
In this study, we aimed to assess whether there are differences in reproductive risk factors among invasive breast cancer cases diagnosed in Scotland using a 'case-case' approach. A case-case analysis compares risk factor distributions between cases of one molecular subtype and cases of another subtype, without also describing risk factor patterns in women without breast cancer [13].

Data sources and study population
The Information Services Division (ISD) of Public Health Scotland holds population-level National Health Service (NHS) data for Scotland which can be deterministically linked using the Community Health Index (CHI) number, a unique patient identifier, with additional probabilistic linkage providing < 4% false positive and < 2% false negative linkage [14]. Incident primary breast cancer cases were identified using data from the Scottish Cancer Registry [15], which attains an average of 95.4% breast cancer case ascertainment and is over 99% complete [16,17]. All tumours diagnosed in women aged 20 years or older, with a primary invasive breast cancer (defined on the basis of the International Classification of Diseases, 10th revision code of C50) between 1997 and 2016 were ascertained [17,18]. Approval for the analysis was obtained from the Public Benefit and Privacy Panel (PBPP) of NHS Scotland, and analyses were conducted in the Scottish National Safe Haven (PBPP reference number 1718-0057).
Maternity data
CHI number and probabilistic matching were used to link cancer registry data (SMR06) to Scottish Morbidity Records maternity inpatient and day case records (SMR02), which were available from 1981. To improve completeness of maternity data, the study excluded women who were ≥ 16 years (i.e., already in their reproductive years) in 1981, resulting in a cohort of women born in 1966 or thereafter. Data on number of births, age at first birth and time since last birth, including both live births and stillbirths, were calculated. The number of births was derived from the number of maternity records each woman held in SMR02. The maternal age from the first maternity record for a parous woman was considered as her age at first birth. Time since last birth was calculated as the time from the most recent birth preceding a cancer diagnosis.
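As a minimal sketch of how these three reproductive variables could be derived for one woman from her linked maternity records, consider the Python function below. The function name, argument names, the 365.25-day year and the rule of counting only births that precede the diagnosis are illustrative assumptions and do not reproduce the study's own code, which was written in R.

```python
from datetime import date


def reproductive_history(birth_dates, date_of_birth, diagnosis_date):
    """Summarise maternity records that precede a breast cancer diagnosis.

    birth_dates: delivery dates, one per maternity record (hypothetical input).
    Returns number of births, age at first birth (years) and
    time since last birth (years).
    """
    # Assumption: only births before the diagnosis count as exposure.
    prior = sorted(d for d in birth_dates if d < diagnosis_date)
    if not prior:
        # No linked record before diagnosis: assumed nulliparous.
        return {
            "n_births": 0,
            "age_first_birth": None,
            "years_since_last_birth": None,
        }
    return {
        "n_births": len(prior),
        # Age at the earliest recorded birth.
        "age_first_birth": (prior[0] - date_of_birth).days / 365.25,
        # Interval from the most recent birth to diagnosis.
        "years_since_last_birth": (diagnosis_date - prior[-1]).days / 365.25,
    }


# Hypothetical example: two births before a diagnosis in 2010.
example = reproductive_history(
    [date(1995, 3, 1), date(2000, 6, 15)],
    date_of_birth=date(1970, 1, 1),
    diagnosis_date=date(2010, 6, 15),
)
print(example)
```

The continuous outputs would then be grouped into the categories used in the analysis (for example, time since last birth of less than six years versus six or more years).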

Molecular subtypes definition
The Scottish Cancer Registry (SMR06) records the receptor status for breast cancers using immunohistochemistry (IHC) staining for ER, PR and HER2 and, for borderline IHC HER2 results, the status based on fluorescence in situ hybridization [19]. While ER status for breast cancer became available in SMR06 in 1997, recording of information on PR and HER2 status commenced only in 2009 [19]. As we aimed to evaluate the subtypes based on ER, PR and HER2 status, we focused on cases diagnosed from 2009 onwards. Because data on the Ki67 labelling index were not available, tumour grade was employed as a proxy for distinguishing the luminal subtypes [5]. The outcome variable, breast cancer subtype, was derived from four variables in SMR06: ER status, PR status, HER2 status and histological grade of the tumour.
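A derivation of the five subtypes from these four variables might look like the Python sketch below. The exact rules used in the study are not reproduced in this section, so the sketch rests on two assumptions drawn from common St. Gallen-style surrogate definitions: hormone receptor positivity means ER or PR positive, and grades 1-2 versus grade 3 separate luminal A-like from luminal B-like (HER2-) tumours. The function name and argument encoding are hypothetical.

```python
def classify_subtype(er, pr, her2, grade):
    """Assign an IHC-defined subtype from receptor status and grade.

    er, pr, her2: True (positive), False (negative) or None (missing).
    grade: histological grade 1, 2 or 3, or None if missing.
    Returns the subtype label, or None when required data are missing.
    """
    if any(marker is None for marker in (er, pr, her2)):
        return None
    if er or pr:  # assumption: hormone receptor positive = ER+ or PR+
        if her2:
            return "luminal B-like (HER2+)"
        if grade is None:
            return None  # grade is needed to split the HER2- luminal tumours
        # Assumption: grade 3 stands in for a high Ki67 index.
        return "luminal A-like" if grade in (1, 2) else "luminal B-like (HER2-)"
    # Hormone receptor negative tumours.
    return "HER2-overexpressed" if her2 else "TNBC"


# Hypothetical examples.
print(classify_subtype(True, True, False, 2))
print(classify_subtype(False, False, False, 3))
```

In this sketch grade is only consulted for hormone receptor positive, HER2-negative tumours; a stricter version could require grade for every case, in line with the exclusion of cases with missing grade data.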

Statistical analyses
Age distribution at diagnosis of breast cancer, number of births, age at first birth and time since last birth were computed for each breast cancer subtype. Pearson's chi-square tests were used to test for differences between subtypes in the distribution of reproductive risk factors of interest. Logistic regression models adjusted for age at diagnosis of breast cancer were used to estimate odds ratios (OR) and 95% confidence intervals (CI) with the most common subtype, luminal A-like, as the reference group.
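To illustrate what a case-case odds ratio measures, a minimal Python sketch follows. It computes a crude odds ratio with a Wald 95% confidence interval from a 2x2 table, and a Mantel-Haenszel odds ratio pooled over age strata as a simple stand-in for age adjustment. This is not the age-adjusted logistic regression used in the study; the function names and all counts are hypothetical.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Crude odds ratio and Wald confidence interval for a 2x2 table.

    a: comparison-subtype cases with the exposure (e.g. one or more births)
    b: comparison-subtype cases without the exposure
    c: reference-subtype (luminal A-like) cases with the exposure
    d: reference-subtype cases without the exposure
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR); requires all four cells to be non-zero.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


def mantel_haenszel_or(strata):
    """Mantel-Haenszel odds ratio pooled over strata of (a, b, c, d)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical counts: TNBC versus luminal A-like, parous versus no births.
print(odds_ratio_wald(76, 24, 69, 31))
# The same comparison split into two hypothetical age-at-diagnosis strata.
print(mantel_haenszel_or([(30, 10, 20, 8), (46, 14, 49, 23)]))
```

An odds ratio above 1 here means the exposure is more common among cases of the comparison subtype than among luminal A-like cases; it says nothing about risk relative to women without breast cancer.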
The models were run separately for each reproductive risk factor of interest. Chi-squared tests for trend were performed. To avoid introducing redundancy in the statistical models, absence of collinearity was confirmed between age at diagnosis and each reproductive risk factor by computing Spearman's correlation coefficients [20]. Tests were considered statistically significant at the 5% level. R version 3.6.0 [21] was used for all analyses.
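The collinearity check can be sketched as follows. Spearman's coefficient is Pearson's correlation applied to ranks, with tied values given their average rank. The Python functions below are an illustrative, dependency-free sketch with hypothetical names and data; in R the same quantity is returned by `cor(x, y, method = "spearman")`.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation (undefined if either input is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: age at diagnosis and years since last birth.
age_at_diagnosis = [34, 41, 45, 38, 50, 47]
years_since_last_birth = [2, 9, 15, 4, 22, 12]
print(spearman_rho(age_at_diagnosis, years_since_last_birth))
```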

Results
The final study population included 4,108 women with breast cancer diagnosed at or below 50 years of age with data available to assign breast cancer subtype, after excluding 9.7% of the initial cohort with missing hormone status or tumour grade data (Supplementary table S1). Luminal A-like was the most common type (40%) and HER2-overexpressed was the least common (5%, Fig. 1).
Distribution of age at diagnosis of breast cancer, number of births, age at first birth and time since last birth by the five breast cancer subtypes is presented in Table 1. Overall, 34% of breast cancers occurred in patients aged 40 years or younger and 66% in those aged 41-50 years. Of all luminal A-like tumours, 76% were diagnosed in the 41-50 years age group, compared with around 60% of tumours of the other subtypes. Women with the luminal A-like subtype had the highest proportion with no birth records (31%, assumed nulliparous) and the highest proportion of diagnoses made six or more years after their most recent birth (83%), while women with HER2-overexpressed tumours and TNBC had the highest proportions with one or more birth records (79% and 76%, respectively) and of diagnoses made within six years after last birth (30% and 31%, respectively). Chi-square tests revealed no statistically significant differences in age at first birth by subtype (Table 1). A significant correlation between age at diagnosis and time since last birth was observed (Spearman R² = 0.63), with no correlation between number of births or age at first birth and age at diagnosis (Spearman R² < 0.05).

Table 1 notes: P value for heterogeneity between subtypes using chi-square test: 5.61e-10. * Significant at 5% level. ** Total counts (N = 2,908) exclude parous women who did not have a pregnancy before diagnosis of breast cancer.

Women with TNBC were significantly more likely to have at least one birth (as against no births) than women with luminal A-like tumours (Table 2). Although based on fewer cases, a similar association was observed for women with HER2-overexpressed tumours, who were more likely to have three or more births (relative to no birth records) than women with luminal A-like tumours, in addition to a statistically significant test for trend across all subtypes.

Table 3 shows the case-case analysis for age at first birth by subtype. There was no evidence of an association between age at first birth and tumour subtype.

Discussion
Using Scottish cancer registry data linked to maternity health records, we show that parity, number of births and time from last birth to diagnosis of breast cancer differ by IHC-defined molecular subtypes of breast cancer among women ≤ 50 years of age at diagnosis. Breast cancer aetiology in younger women is not fully understood as few risk factors have been identified. Furthermore, few opportunities for early detection of breast cancer are available for younger women beyond genetic counselling for high-risk families.
Multiple reports and pooled analyses have recently evaluated molecular subtypes of breast cancer defined by IHC and mRNA expression profiling, and consistently show a positive association with parity for triple-negative or basal-like breast tumours [22-27]. Interestingly, significant differences in the incidence of breast cancer exist for different ethnic and racial groups that also frequently have different reproductive histories [28]. Consistent with these data, we also found evidence of heterogeneity in reproductive history across IHC-defined molecular subtypes of breast cancer in this Scottish cohort. Women with ER-negative tumours (HER2-overexpressed and TNBC) were more likely to have a higher number of births compared to women with the luminal A-like subtype. In contrast to the ER-negative cancers, we did not observe heterogeneity in number of births between luminal B-like (HER2+) and luminal A-like tumours, which concurs with other consortium efforts [29-32].
Time since last birth showed differential associations by subtype, where women with TNBC or luminal B-like (HER2+) tumours were less likely than women with luminal A-like tumours to have a longer time between their most recent birth and diagnosis of breast cancer. Findings for TNBC correspond well with existing studies [33,34].
Parity confers a dual effect on the risk of breast cancer, with an augmented risk observed in the initial years following pregnancy (3-5 years, or even up to 10-15 years) [35-37], possibly due to stimulation of the growth of cells that have undergone initial stages of malignant change and to the immunosuppressive effects of pregnancy [35,38]. It is only subsequent to this phase that the protective effect of parity sets in [39,40], owing to the differentiation of normal breast cells that have the potential to undergo malignant transformation. While this has been observed for ER+ breast cancers (luminal A-like) [9,29,41], an increased risk of ER-negative breast cancer continues to persist even in the longer term [32,34,42].
Our results revealed no significant difference across subtypes for age at first birth. However, TNBC cases were more likely to have a younger age at first birth when compared to luminal A-like cases (approximately 16% versus 12.5% of patients with age at first birth < 20 years). A similar, statistically significant association has been reported by other studies [29,30,42-44]. Luminal B-like (HER2-) cases showed no statistically significant difference from luminal A-like cases for any of the three risk factors of interest, even though studies have reported an inverse association with number of births and a positive association with age at first birth for this subtype [45,46].
ER-negative breast cancers are less likely than ER+ breast cancers to be detected through screening [47], and predictive modelling of breast cancer risk has been proposed as a possible solution for personalised medicine and risk-stratified screening [48-50]. Modelling studies using UK data suggest such risk-stratified screening approaches could reduce overdiagnosis and improve cost-effectiveness, while maintaining the benefits of screening [51].
The key strengths of our study are the high-quality longitudinal data collected within the Scottish Cancer Registry for the entire population, and the availability and high level of completeness of molecular marker and tumour grade data (10% missing data). Another strength of the study is the inclusion of women diagnosed at age 50 years or below. Although breast cancer is less common within this age range, the tumours are more aggressive with poor prognosis making it important to identify and implement effective approaches to prevention amongst this age group [52]. Moreover, breast cancer incidence appears to be increasing in younger age groups in recent years in Scotland [12] and other populations such as the United States [53].
Although this is one of the largest studies of breast cancer among young women, a limitation is the modest number of cases for rarer tumour subtypes, especially HER2-overexpressed (5% of all cases), potentially reducing the statistical power of analyses for these tumour subtypes. Future work including a comparison cohort of women not diagnosed with breast cancer would add further updated information about the role of reproductive history as a risk factor for breast cancer, including, in due course, for women whose breast cancer is diagnosed at older ages. Other limitations of our study were the potential for incomplete maternity records for women whose children were born outside Scotland, and the lack of data on other factors such as breastfeeding and on more detailed mRNA expression or mutation profiling of the cancers.
In conclusion, our data highlight the value of integrating molecular data from tumours with routinely collected health records data for understanding cancer epidemiology. There is scope for future analysis using the cancer registry linked to other datasets, including community prescription records, mammographic imaging, and primary care records, to provide more detailed information on the role and patterns of key risk factors and possible new aetiologic or prognostic factors for subtypes of breast and other cancers.

Declarations
Data availability
All data used in the present study can be accessed by submitting an application to the electronic Data Research and Innovation Service (eDRIS), a part of the Information Services Division of Public Health Scotland. More information on how to request access is available at https://www.isdscotland.org/eDRIS.

Competing interests
No competing interests were disclosed.

Funding
This study was funded by Wellcome Trust grant 207800/Z/17/Z.

Author contributions
This study was conceived and designed by JDF. Formal analysis was conducted by JDF, IME and AC.
JDF and AC prepared the original draft. SHW, DC and IME contributed to data interpretation and critical