We used the case-control and follow-up studies from the population-based Long Island Breast Cancer Study Project (LIBCSP), which included 1,508 women diagnosed with incident invasive or in situ breast carcinoma between August 1, 1996 and July 31, 1997, and 1,556 women without breast cancer who were residents of the same two counties, frequency matched by 5-year age group to the expected age distribution of cases. Potentially eligible control women were identified by Waksberg’s method of random digit dialing for those under 65 years of age, and from Health Care Financing Administration (HCFA) rosters for those 65 years and older. Institutional Review Board approval was obtained from all participating institutions, and written informed consent was obtained prior to study participation.
The current study included 1,319 cases and 1,310 controls who had complete data on the components of the HLI score, including diet, alcohol consumption, physical activity, BMI, and menopausal status. The CONSORT diagram in Supplementary Fig S1 provides the details of the study population. For the 1,319 women with breast cancer, vital status was ascertained through linkage with the National Death Index (NDI). They were followed from the time of diagnosis in 1996-1997 through December 31, 2014 to determine the date and cause of death, including death from breast cancer, which was identified using International Classification of Diseases codes 174.9 and C50.9 listed on the death certificate. Over a median follow-up of 214.5 months (range = 2.8 – 224.2 months), we identified 521 deaths, including 210 deaths from breast cancer. Information on tumor receptor status, including estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2), was obtained from pathology records.
Construction of HLI
Information on demographic characteristics, pregnancy history, menstrual history, hormone use, family history of cancer, body size changes, current alcohol use, active and passive cigarette smoking, physical activity, and breastfeeding history was obtained from the main study questionnaire completed at enrollment. A detailed description of the food frequency questionnaire (FFQ) for the LIBCSP, which captured information on diet in the year prior to diagnosis among cases or prior to enrollment among controls, has been published elsewhere [42,43].
The HLI was generated from six lifestyle recommendations (body fatness, physical activity, consumption of plant foods, animal foods, and alcohol, and breastfeeding) based on the new WCRF/AICR guidelines, with additional consideration of smoking [18,44]. Each HLI component was scored from 0 to 1. Physical activity, alcohol consumption, and body fatness were each given a score of 1 point when the recommendation was met, 0.5 points when it was partially met, and 0 points otherwise. Because body fatness (BMI, calculated as weight (kg)/height (m)²) was inversely associated with breast cancer risk in premenopausal women but positively associated with breast cancer risk in postmenopausal women, a high body fatness score represents a high BMI in premenopausal women and a low BMI in postmenopausal women. Breastfeeding was assigned a score of 0 for no history of breastfeeding or 1 for a history of breastfeeding. Intake of plant foods was scored based on recommendations for the intake of fruits/vegetables (score = 0, 0.25, or 0.5 for recommendation not met, partially met, and fully met, respectively), beans (score = 0, 0.15, or 0.25), and whole grains (score = 0, 0.15, or 0.25). Intake of animal foods was scored based on the intake of red meat and processed meat (score = 0, 0.25, or 0.5). Consistent with previous studies [46,32,47,31,35,33], smoking was included in the HLI score and scored as 1 point for never smokers, 0.5 points for former smokers who quit more than 12 months before the reference date, and 0 points for current smokers at or within 12 months of the reference date. The scoring system was based on the assumption that each major recommendation would contribute equally to the study outcomes. Therefore, the total HLI score could range from 0 to 7, with higher scores indicating a healthier lifestyle. Detailed information on the operationalization of the HLI score can be found in Supplementary Table S1.
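The scoring scheme described above can be illustrated with a minimal Python sketch. It is not the study's code (the analyses were run in R), and it rests on stated assumptions: the class, field, and function names are hypothetical; each input is assumed to be already classified as "not_met", "partial", or "met" using the cut points in Supplementary Table S1, which are not reproduced here; the body fatness level is assumed to have been assigned according to menopausal status; and red meat and processed meat are assumed to contribute up to 0.5 points each, so that animal foods total at most 1 point.

```python
from dataclasses import dataclass

# Points per adherence level; values are taken from the text above.
FULL_POINT = {"not_met": 0.0, "partial": 0.5, "met": 1.0}
HALF_POINT = {"not_met": 0.0, "partial": 0.25, "met": 0.5}
QUARTER_POINT = {"not_met": 0.0, "partial": 0.15, "met": 0.25}
SMOKING = {"current": 0.0, "former": 0.5, "never": 1.0}


@dataclass
class Lifestyle:
    """Hypothetical record of pre-classified adherence levels for one woman."""
    body_fatness: str        # assumed already classified by menopausal status
    physical_activity: str
    alcohol: str
    fruit_veg: str
    beans: str
    whole_grains: str
    red_meat: str
    processed_meat: str
    ever_breastfed: bool
    smoking: str             # "never", "former", or "current"


def hli_score(p: Lifestyle) -> float:
    """Sum the seven component scores (each 0 to 1) into a total of 0 to 7."""
    plant = (HALF_POINT[p.fruit_veg]
             + QUARTER_POINT[p.beans]
             + QUARTER_POINT[p.whole_grains])
    # Assumption: red and processed meat each carry up to 0.5 points.
    animal = HALF_POINT[p.red_meat] + HALF_POINT[p.processed_meat]
    breastfeeding = 1.0 if p.ever_breastfed else 0.0
    return (FULL_POINT[p.body_fatness]
            + FULL_POINT[p.physical_activity]
            + FULL_POINT[p.alcohol]
            + plant
            + animal
            + breastfeeding
            + SMOKING[p.smoking])
```

Under these assumptions, a woman who fully meets every recommendation, has breastfed, and never smoked receives the maximum score of 7, while a woman who meets none of them receives 0.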
Based on the HLI score distribution of the controls, we then categorized the study population into tertiles as low, intermediate, and high HLI groups.
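As a hedged illustration of this step, the sketch below derives tertile cut points from the controls only and then applies them to any participant. The function names are hypothetical, and both the quantile interpolation method and the handling of scores that fall exactly on a cut point are assumptions; the text does not specify either.

```python
from statistics import quantiles


def tertile_cutpoints(control_scores):
    """Return the two cut points splitting the control HLI scores into thirds."""
    # Assumption: "inclusive" interpolation; other quantile definitions
    # can give slightly different cut points.
    lower, upper = quantiles(control_scores, n=3, method="inclusive")
    return lower, upper


def hli_group(score, cutpoints):
    """Assign a case or control to the low, intermediate, or high HLI group."""
    lower, upper = cutpoints
    # Assumption: a score equal to a cut point falls in the lower group.
    if score <= lower:
        return "low"
    if score <= upper:
        return "intermediate"
    return "high"
```

Deriving the cut points from controls alone means that cases are classified against the distribution expected in the source population rather than against their own distribution.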
The general characteristics of cases and controls were compared using the independent-samples Student’s t test or Wilcoxon test for continuous variables and the chi-square test for categorical variables. Unconditional multivariable logistic regression was used to estimate the odds ratios (ORs) and 95% confidence intervals (CIs) for associations between HLI scores and incident breast cancer. HLI scores were modeled both as a continuous variable (per 1-point increment) and as a categorical variable (tertiles), with the low tertile of HLI score serving as the reference group in the categorical analyses. All models were adjusted for reference age (continuous in years; age at diagnosis for cases and age at study enrollment for controls). Potential confounders of the association between healthy lifestyle and breast cancer risk or mortality included family history of breast cancer in a first-degree relative, education, and parity. Starting from fully adjusted models, none of these variables altered the OR or HR estimates by more than 10%; therefore, only age-adjusted results are presented. Separate models were also fitted according to breast cancer subtype (i.e., hormone receptor (ER and PR) and HER2 status). Cox regression models were used to estimate the hazard ratios (HRs) and 95% CIs for all-cause and breast cancer-specific mortality in association with HLI scores. As in the logistic regression analyses described above, only age-adjusted results are reported. We also performed stratified analyses by menopausal status and tumor subtype. Sensitivity analyses were performed with different weights for fruits/vegetables, beans, whole grains, red meat, and processed meat in the HLI food score. These sensitivity analyses did not materially alter the results. The analyses were carried out using the glm (generalized linear model) function and the “survival” and “forestplot” packages in R version 3.6.1 [48-50]. All tests were two-tailed, and P values <0.05 were considered statistically significant.
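Two of the quantities named above can be made concrete with a short sketch. It is a simplified stand-in, not the models used in the study: it computes a crude (unadjusted) odds ratio with a Wald 95% CI from a 2x2 table instead of fitting an age-adjusted logistic regression, and it expresses the 10% change-in-estimate criterion relative to the fully adjusted estimate, which is an assumed convention. The function names are hypothetical.

```python
import math


def odds_ratio_ci(exposed_cases, exposed_controls, ref_cases, ref_controls, z=1.96):
    """Crude odds ratio and Wald 95% CI for one HLI group versus the reference."""
    odds_ratio = (exposed_cases * ref_controls) / (exposed_controls * ref_cases)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / exposed_cases + 1 / exposed_controls
                          + 1 / ref_cases + 1 / ref_controls)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def changes_estimate(adjusted, reduced, threshold=0.10):
    """True if dropping a covariate shifts the estimate by more than the threshold."""
    # Assumption: relative change is measured against the fully adjusted estimate.
    return abs(reduced - adjusted) / adjusted > threshold
```

For example, with hypothetical estimates, `changes_estimate(0.80, 0.85)` is false (a 6.25% change), so the covariate in question would not be retained under this criterion.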