Study design and participants
E3N (Etude Epidémiologique auprès de femmes de la Mutuelle Générale de l'Education Nationale) is an ongoing prospective cohort study involving 98,995 French women, established in 1990 to investigate risk factors for cancer and severe chronic conditions in women [31]. Participants were recruited between June 1990 and November 1991 among women aged 40-65 years, living in France and insured with the MGEN, a national health insurance plan covering people working within the French education system and their families, and have been followed up every 2-3 years with self-administered mailed questionnaires. At recruitment, participants filled in a self-administered questionnaire, which included items on lifestyle and reproductive factors, anthropometry, past medical history, and family history of cancer. To date, twelve questionnaires have been sent to the participants (participation rate at each questionnaire ~80%). Between 1994 and 1998, participants were invited to give a blood specimen, resulting in samples from 25,000 women, and saliva samples were later collected from an additional 47,000 women. Occurrence of cancer was self-reported in each questionnaire, and a small number of cancers were further identified from the insurance files or information on causes of death obtained from the National Service on Causes of Deaths (CépiDC-Inserm). The pathology report confirming the diagnosis of the primary outcome (invasive BC) was obtained for 93% of declared cases, and the proportion of false-positive self-reports was low (<5%). The residential addresses of the study subjects were reported in the baseline questionnaire (1990) and in the 5th to 9th follow-up questionnaires (years 1997, 2000, 2002, 2005, 2008, and 2011). In the 3rd and 4th follow-up questionnaires (years 1993 and 1994), only the postal codes of participants were reported.
In addition, participants’ place of birth (postal code and municipality) was obtained from the first questionnaire and assigned an urban/rural status based on data from the closest national census [32]. Informed consent was obtained from all participants and the study was approved by the French National Commission for Data Protection and Privacy (CNIL).
Nested case-control study design
A nested case-control study was designed among women of the E3N cohort who had provided their home address at baseline, lived in metropolitan France during follow-up, and had not been diagnosed with any cancer at baseline. Details of this study have been described elsewhere [27]. After excluding women with phyllodes tumors, a total of 5,382 histologically confirmed incident BC cases were identified during follow-up from 1990 to 2008. From these, we excluded 981 cases with missing addresses (including women with more than one missing address and those for whom addresses could not be retrieved). For each BC case, one control was randomly selected using incidence density sampling, i.e. among cohort participants at risk of BC at the time of case diagnosis, using the follow-up time since inclusion into the cohort as the time axis. To select appropriate controls for the planned studies, cases were divided into two complementary groups according to the availability of a biological sample (blood or saliva). For the first group (cases with a blood sample), controls were matched to cases on department of residence, age (± 1 year), date (± 3 months), and menopausal status at blood collection. For the second group (cases without a blood sample), controls were matched on the same criteria assessed at baseline, and additionally on the existence or not of a saliva sample. Finally, the nested case-control study involved 4,401 women diagnosed with a primary invasive BC and 4,401 matched controls with complete information on home address at baseline [27].
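As an illustration of incidence density sampling, the following minimal Python sketch draws one control per case from the risk set at the case's diagnosis time. It is a simplified sketch, not the selection procedure used in E3N: the record keys (`id`, `exit_time`, `is_case`) are hypothetical, and the matching factors (department, age, date, menopausal status, sample availability) are omitted for brevity.

```python
import random

def sample_controls(cohort, seed=0):
    """Pick one control per case by incidence density (risk-set) sampling.

    cohort: list of dicts with hypothetical keys
        'id'        participant identifier
        'exit_time' follow-up time since inclusion, at diagnosis or censoring
        'is_case'   True if follow-up ended with a BC diagnosis
    Returns a list of (case_id, control_id) pairs.
    """
    rng = random.Random(seed)
    pairs = []
    cases = sorted((p for p in cohort if p["is_case"]),
                   key=lambda p: p["exit_time"])
    for case in cases:
        t = case["exit_time"]
        # Risk set: participants still under follow-up and undiagnosed at time t.
        # A woman diagnosed after t remains eligible as a control at t.
        risk_set = [p for p in cohort
                    if p["id"] != case["id"] and p["exit_time"] > t]
        if risk_set:
            control = rng.choice(risk_set)
            pairs.append((case["id"], control["id"]))
    return pairs
```

In this scheme a participant who later becomes a case can serve as a control for an earlier case, which is the defining feature of sampling on the time axis rather than at the end of follow-up.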
Assessment of staging, grading, histology and other covariates
Information on tumor-node-metastasis (TNM) stage was extracted from pathological reports or any other medical document (such as bone-scan, magnetic resonance imaging (MRI), or X-ray radiography reports). Of the 4,401 BC cases, a total of 3,924 (89%) had stage information. Information on the grade of differentiation and histological subtype at diagnosis was collected from pathological reports and was available for 3,433 (78%) and 4,120 (93.6%) BC cases, respectively. Data on established BC risk factors and other potential confounding factors were obtained from the E3N self-administered questionnaires at baseline. Information was collected at baseline on smoking, anthropometry (height, weight), physical activity, diabetes, hypertension, benign breast disease, gynecological follow-up, family history of BC (FHBC), education, age at menarche and at menopause, use of exogenous hormones, number of children, age at first full-term pregnancy (AFP), and breastfeeding [31]. Follow-up questionnaires were sent every 2-3 years thereafter. Daily alcohol intake (g/day) was estimated from the validated E3N self-administered diet history questionnaire (DHQ) in 1993. Physical activity was converted into metabolic equivalent of task hours per week (MET-h/week). Education level was used as a proxy for socioeconomic status.
Assessment of long-term exposure to airborne cadmium
The method employed to estimate airborne cadmium exposure at the individual residential address level has been previously described in detail [33–35] and applied in two previous studies [27,36]. Briefly, the residential history of the women, from their enrolment in the E3N cohort until the index date (date of BC diagnosis for cases; for controls, date of diagnosis of the case in the same case-control pair), was used to estimate atmospheric exposure to cadmium within a Geographic Information System (GIS).
A detailed retrospective inventory of industrial cadmium emitting sources over the entire metropolitan France between 1990 and 2008 was performed [33]. Sources of emissions were assessed using emission factors from the OMINEA (Organization and Methods of the National Inventories of the Atmospheric Releases in France) [37] and the EMEP (European Monitoring and Evaluation Program) [38] databases. Overall, 2,700 cadmium sources were inventoried over the French national territory from 1990 to 2008 [33].
The participants' residential histories from 1990 to 2008 and the inventoried cadmium-emitting sources were geocoded (X and Y coordinates, addresses) using the ArcGIS Software (ArcGIS Locator version 10.0, Environmental Systems Research Institute – ESRI, Redlands, CA, USA) and its reference street network database, BD Adresse®, from the National Geographic Institute (IGN) [35].
To classify the study subjects according to their airborne cadmium exposure, a GIS-based metric was developed and calibrated using a set of parameters (local meteorological data, characteristics of industrial sources, e.g. emission intensity and stack height) [34]. Specifically, the annual airborne cadmium exposure index (AACEI) was estimated using the following GIS-based metric:
where j was the place of residence (j=1,…,J); i was the industrial source (i=1,…,I), EI was the source annual cadmium emission intensity (in kg/year); t was the emission period duration (in year); d was the residence-to-source distance (in m); Fi was the factor accounting for the weighted contribution of wind direction from the industrial source i to the participant’s residence j; hi was the stack height (in m); hmedian was the median value of the other sources’ stack height (in m) in a 10 km buffer, and was taken into account only when hi was greater than 90 m.
Exposure to cadmium was computed for each individual and for each calendar year. For each individual, the cumulative airborne cadmium exposure index was calculated by summing her AACEI values from entry into the cohort to the index date. The cumulative airborne cadmium exposure index was then converted from kg/m² to mg/m² [27].
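The accumulation and unit conversion described above can be sketched as follows. The functional form of the annual index in this sketch is an assumption made for illustration only (inverse-square distance decay, which matches the stated kg/m² unit, and a stack-height ratio applied above 90 m); the calibrated metric is the one described in [34]. The dictionary keys are hypothetical names mirroring the symbols EI, t, F, d, hi, and hmedian.

```python
KG_TO_MG = 1_000_000  # 1 kg = 10^6 mg

def annual_index(sources, h_threshold=90.0):
    """Annual exposure index at one residence, in kg/m^2.

    ASSUMED form (not taken from the source text):
        sum over sources of EI * t * F / (d^2 * stack_term)
    sources: list of dicts with hypothetical keys
        'EI' (kg/year), 't' (years), 'F' (wind weight),
        'd' (m), 'h' (m), 'h_median' (m)
    """
    total = 0.0
    for s in sources:
        # Stack-height correction is applied only above the 90 m threshold
        # mentioned in the text; below it the term is neutral.
        stack_term = s["h"] / s["h_median"] if s["h"] > h_threshold else 1.0
        total += s["EI"] * s["t"] * s["F"] / (s["d"] ** 2 * stack_term)
    return total

def cumulative_index_mg(annual_values_kg):
    """Sum annual indices (kg/m^2) from cohort entry to index date, in mg/m^2."""
    return sum(annual_values_kg) * KG_TO_MG
```

For a woman who moved during follow-up, the annual values would be computed at each successive address before being summed.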
Statistical analyses
Kruskal-Wallis and chi-square tests were used to compare the characteristics of BC cases across stage, grade of differentiation, and histological type for continuous and categorical variables, respectively.
Conditional logistic regression models were used to estimate odds ratios (OR) and their 95% confidence intervals (95% CI) for risk of BC associated with cadmium exposure. Models were conditioned on the matching factors, including date of blood collection or of the return of the first questionnaire, age, department of residence, menopausal status at blood collection or at baseline, and existence of a biological sample (blood, saliva, none). Two adjusted models were considered to account for predefined variables recognized as confounders or risk factors for BC. Using a directed acyclic graph to identify the confounding variables, the first model was adjusted for physical activity (< 25.3, 25.3-37.3, 37.4-56.9, and ≥ 57.0 MET-h/week), alcohol intake (never, < 6.7, ≥ 6.7 g/day), level of education (secondary, 1 to 2-year university degree, ≥ 3-year university degree), BMI (< 25, 25- < 30, and ≥ 30 kg/m²), age at menarche (< 12, 12-13, and ≥ 14 years), parity and AFP (0, 1-2 children & AFP < 30 years, 1-2 children & AFP ≥ 30 years, and ≥ 3 children), breastfeeding (ever, never), oral contraceptive use (ever, never), menopausal hormone therapy (MHT; ever, never), status of birthplace (rural, urban) [9,32,36] and smoking status (never, current, and former). In the second multivariable model, we further adjusted for FHBC (yes, no) and personal history of benign breast disease (yes, no). Since there was no difference in the OR estimates between the two models, only the results of the fully adjusted models are reported in the main manuscript. For oral contraceptive use and MHT, we considered the values collected in the last questionnaire before the date of diagnosis in cases, whereas all other adjustment variables were taken from the E3N baseline questionnaire.
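For intuition about conditioning on matched sets, consider the simplest special case of 1:1 matching with a single binary exposure and no covariates, in which the conditional maximum-likelihood OR reduces to the ratio of the two kinds of discordant pairs. The Python sketch below illustrates only this special case; it is not the multivariable quintile-based model fitted in this study, and the function name and input layout are hypothetical.

```python
import math

def matched_pair_or(pairs, z=1.96):
    """Conditional OR for 1:1 matched pairs with a binary exposure.

    pairs: list of (case_exposed, control_exposed) booleans.
    Returns (OR, lower, upper) with a Wald interval on the log scale.
    Concordant pairs carry no information and drop out of the estimate.
    """
    n10 = sum(1 for case, ctrl in pairs if case and not ctrl)  # case exposed only
    n01 = sum(1 for case, ctrl in pairs if ctrl and not case)  # control exposed only
    if n10 == 0 or n01 == 0:
        raise ValueError("both kinds of discordant pairs are needed")
    or_hat = n10 / n01
    se_log = math.sqrt(1 / n10 + 1 / n01)
    lower = math.exp(math.log(or_hat) - z * se_log)
    upper = math.exp(math.log(or_hat) + z * se_log)
    return or_hat, lower, upper
```

With 30 pairs in which only the case is exposed and 20 in which only the control is exposed, the estimate is 30/20 = 1.5 regardless of how many concordant pairs there are.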
For covariates with less than 5% missing data, missing values were imputed with the modal or median value observed in the control population; for variables with more than 5% missing data (only alcohol intake and rural/urban status at birth), a separate missing-data category was created.
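The missing-data rule can be expressed as a short Python sketch. The function name, the use of `None` for missing values, and the label `"missing"` are hypothetical choices for illustration; the text does not state how a share of exactly 5% was handled, so the sketch imputes in that case as an assumption.

```python
from statistics import median, mode

def handle_missing(values, is_control, kind, threshold=0.05):
    """Apply the 5% rule to one covariate.

    values:     list of observations, None where missing
    is_control: parallel list of booleans (True for controls)
    kind:       'continuous' (median) or 'categorical' (mode)
    """
    share_missing = sum(v is None for v in values) / len(values)
    if share_missing > threshold:
        # Too many gaps: keep the records and flag them in their own category.
        return ["missing" if v is None else v for v in values]
    # Otherwise impute from the distribution observed among controls only.
    observed = [v for v, ctrl in zip(values, is_control)
                if ctrl and v is not None]
    fill = median(observed) if kind == "continuous" else mode(observed)
    return [fill if v is None else v for v in values]
```

Taking the fill value from controls rather than from the pooled sample keeps the imputation from depending on case status.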
Analyses by quintile of the cumulative airborne cadmium exposure index were performed by stage, grade of differentiation, and histological subtype of BC at diagnosis, using the first quintile as the reference category. Quintile cut-points were based on the distribution of cumulative exposure in control subjects. For each variable, the P for linear trend was the p-value associated with the regression coefficient of the categorical variable entered as a continuous term. The statistical significance of the global effect of the quintiles of cumulative airborne cadmium exposure was derived from the likelihood ratio test comparing the models including and excluding terms for quintiles. Heterogeneity of associations across BC stage at diagnosis was assessed using polytomous logistic regression, and P values for heterogeneity were derived from Wald tests [39]. For sensitivity analyses, we repeated our analyses using the mean of the annual airborne cadmium exposure index (from entry into the cohort to the index date) instead of the cumulative exposure. Additional adjustment for mammographic examination before inclusion (yes, no) was also performed. Cubic splines [40], with four knots placed at the 5th, 35th, 65th, and 95th percentiles of the distribution of cumulative exposure to airborne cadmium [41], were fitted to explore potential non-linearity of the relationship between cadmium and BC risk.
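The categorization steps can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, the percentile interpolation rule of `statistics.quantiles` may differ slightly from the one used by the statistical package, and placing values equal to a cut-point in the lower category is an assumed convention.

```python
from bisect import bisect_left
from statistics import quantiles

def quintile_cutpoints(control_exposures):
    """Four cut-points splitting the control distribution into five groups."""
    return quantiles(control_exposures, n=5, method="inclusive")

def assign_quintile(x, cutpoints):
    """Return 1..5 for any subject (case or control); 1 is the reference.

    Values equal to a cut-point fall in the lower category (assumption).
    """
    return bisect_left(cutpoints, x) + 1

def spline_knots(exposures):
    """Knots at the 5th, 35th, 65th, and 95th percentiles."""
    pct = quantiles(exposures, n=20, method="inclusive")  # 5%, 10%, ..., 95%
    return [pct[0], pct[6], pct[12], pct[18]]
```

Because the cut-points come from controls only, cases need not be evenly spread across categories. The score from 1 to 5 returned by `assign_quintile` is the variable that would be entered as a single continuous term for the trend test.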
All statistical tests were two-sided, and P values < 0.05 were considered statistically significant. All analyses were performed using Stata version 14 (College Station, Texas, USA).