We conducted a two-sample Mendelian randomization (MR) study using summary data from GWAS consortia (Fig. 1). Informed consent and ethical approval had been obtained in the original studies. We used single nucleotide polymorphisms (SNPs) strongly associated with systemic iron status as instrumental variables to explore the effect of iron status on breast cancer risk. By harmonising data on the associations of these instrumental variables (SNPs) with iron status and with breast cancer, we estimated the effect of systemic iron on breast cancer. To investigate whether the effect of iron status on breast cancer might be biased or mediated by other breast cancer risk factors, we analysed the associations between the iron-related SNPs and these risk factors.
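The harmonisation step can be illustrated with a minimal Python sketch. It assumes each SNP association is stored in a hypothetical `SnpAssoc` record (effect allele, other allele, beta, SE); the outcome effect is sign-flipped when the outcome GWAS reports the opposite allele as the effect allele. For simplicity, the sketch drops palindromic (A/T, C/G) SNPs and mismatched alleles rather than resolving strand from allele frequencies, as dedicated MR software may do. It is not the exact procedure used in this study.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SnpAssoc:
    """Hypothetical record for one SNP's association in one GWAS."""
    rsid: str
    effect_allele: str
    other_allele: str
    beta: float
    se: float


# Palindromic allele pairs: strand cannot be inferred from the alleles alone.
PALINDROMIC = {frozenset("AT"), frozenset("CG")}


def harmonise(exp: SnpAssoc, out: SnpAssoc) -> Optional[Tuple[float, float, float, float]]:
    """Return (beta_exp, se_exp, beta_out, se_out) with both effects expressed
    per copy of the exposure effect allele, or None if the SNP is dropped."""
    if exp.rsid != out.rsid:
        raise ValueError("SNP identifiers differ")
    if frozenset((exp.effect_allele, exp.other_allele)) in PALINDROMIC:
        return None  # simplification: drop A/T and C/G SNPs
    exp_alleles = (exp.effect_allele, exp.other_allele)
    out_alleles = (out.effect_allele, out.other_allele)
    if out_alleles == exp_alleles:
        return exp.beta, exp.se, out.beta, out.se
    if out_alleles == exp_alleles[::-1]:
        # Outcome GWAS uses the other allele as effect allele: flip the sign.
        return exp.beta, exp.se, -out.beta, out.se
    return None  # alleles do not match; strand flips are not handled here
```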
Data sources of instrument–exposure associations
We obtained summary data for the MR analysis from the largest GWAS meta-analysis of the Genetics of Iron Status (GIS) consortium, which covered four phenotypes: serum iron, transferrin, ferritin and transferrin saturation. These data came from 11 discovery and 8 replication cohorts comprising 48,972 individuals of European ancestry. Genome-wide analyses were performed within each cohort according to a uniform analysis plan, with adjustment for principal component scores, age and other study-specific covariates. The quality-control thresholds applied in each cohort were an imputation quality score > 0.5, a Hardy–Weinberg equilibrium (HWE) test P ≥ 10⁻⁶ and a minor allele frequency (MAF) > 0.01.
Data sources of instrument–outcome associations
Publicly available summary data for breast cancer were obtained from the largest GWAS meta-analysis of the OncoArray network, which includes data on five cancers: breast, ovarian, prostate, lung and colorectal cancer. The breast cancer dataset comprised 122,977 cases and 105,974 controls, all of European ancestry. Of the cases, 69,501 had ER-positive and 21,468 had ER-negative breast cancer.
The population stratification and quality-control thresholds for the GWAS analysis were an imputation quality score > 0.3, HWE P ≥ 10⁻¹² and MAF > 0.01. Details have been reported previously [16, 17].
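The quality-control thresholds reported for the two data sources can be written as a small configuration. This is an illustrative sketch only: the names `QcThresholds` and `passes_qc` are hypothetical, reading "score" as an imputation quality (info) score is our assumption, and the filters themselves were applied by the original consortia rather than in this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QcThresholds:
    """Hypothetical container for per-SNP QC cut-offs."""
    min_info: float   # imputation quality score must exceed this
    min_hwe_p: float  # HWE test P value must be at least this
    min_maf: float    # minor allele frequency must exceed this


# Thresholds as reported in the text for each data source.
GIS_QC = QcThresholds(min_info=0.5, min_hwe_p=1e-6, min_maf=0.01)
ONCOARRAY_QC = QcThresholds(min_info=0.3, min_hwe_p=1e-12, min_maf=0.01)


def passes_qc(info: float, hwe_p: float, maf: float, qc: QcThresholds) -> bool:
    """True if a SNP meets all three cut-offs of the given QC scheme."""
    return info > qc.min_info and hwe_p >= qc.min_hwe_p and maf > qc.min_maf
```

For example, a SNP with an info score of 0.4 passes the OncoArray scheme but fails the stricter GIS scheme.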
Selection of instrumental variables
Instrumental variables strongly associated with iron status at genome-wide significance (P < 5 × 10⁻⁸) were selected from the GIS consortium data (Tables 1–4). All of the SNPs were available in the OncoArray data and were in linkage equilibrium (all pairwise r² ≤ 0.01).
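A minimal sketch of this selection rule, assuming per-SNP P values and a lookup of pairwise r² values (for example from an LD reference panel) are already available: SNPs below the genome-wide threshold are ranked by P value and kept greedily only if their r² with every already-kept SNP is ≤ 0.01. Unlike standard clumping software, this sketch uses no physical-distance window and treats pairs missing from the lookup as unlinked; `select_instruments` is a hypothetical name.

```python
from typing import Dict, FrozenSet, List

GENOME_WIDE_P = 5e-8


def select_instruments(
    pvals: Dict[str, float],
    r2: Dict[FrozenSet[str], float],
    p_max: float = GENOME_WIDE_P,
    r2_max: float = 0.01,
) -> List[str]:
    """Greedy LD pruning: keep genome-wide significant SNPs, strongest first,
    dropping any SNP whose r2 with an already-kept SNP exceeds r2_max."""
    candidates = sorted((s for s, p in pvals.items() if p < p_max), key=pvals.get)
    kept: List[str] = []
    for snp in candidates:
        # Pairs absent from the r2 lookup are treated as unlinked (r2 = 0).
        if all(r2.get(frozenset((snp, k)), 0.0) <= r2_max for k in kept):
            kept.append(snp)
    return kept
```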
To select instrumental variables, we used two approaches: a conservative and a liberal instrumental variable analysis. For the conservative analysis, we used 3 SNPs (rs855791, rs1800562, rs1799945) that were strongly associated (P < 5 × 10⁻⁸) with increased serum iron, ferritin and transferrin saturation and with decreased transferrin. Because higher ferritin, serum iron and transferrin saturation together with lower transferrin indicate higher systemic iron status, genetic instruments for iron status are expected to show a consistent pattern of association across all four markers. For the liberal analysis, we used SNPs associated at genome-wide significance (P < 5 × 10⁻⁸) with at least one iron status biomarker (5 SNPs for serum iron, 9 for transferrin, 5 for ferritin and 5 for transferrin saturation). In addition, none of the instrumental variables was associated with breast cancer risk (all P > 0.05).
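The difference between the conservative and liberal instrument sets can be expressed as two predicates. This sketch assumes each SNP's associations are given as `(beta, P)` pairs keyed by marker name, with betas expressed per iron-raising allele; the marker keys and function names are hypothetical.

```python
from typing import Dict, Tuple

# Expected direction of effect of an iron-raising allele on each marker.
EXPECTED_SIGN = {
    "serum_iron": +1,
    "ferritin": +1,
    "transferrin_saturation": +1,
    "transferrin": -1,
}
GENOME_WIDE_P = 5e-8


def is_conservative(assoc: Dict[str, Tuple[float, float]], p_max: float = GENOME_WIDE_P) -> bool:
    """All four markers genome-wide significant and in the expected direction.
    `assoc` maps marker -> (beta, P), with beta per iron-raising allele."""
    return all(
        marker in assoc and assoc[marker][1] < p_max and assoc[marker][0] * sign > 0
        for marker, sign in EXPECTED_SIGN.items()
    )


def is_liberal(assoc: Dict[str, Tuple[float, float]], p_max: float = GENOME_WIDE_P) -> bool:
    """Genome-wide significant for at least one iron marker."""
    return any(p < p_max for _, p in assoc.values())
```

Every SNP that passes `is_conservative` also passes `is_liberal`, which matches the conservative set being a stricter subset of candidate instruments.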
Validation of the selected instrumental variables
To be valid instrumental variables, the selected SNPs must satisfy three core assumptions. First, the instrumental variables should be strongly associated with the exposure (iron status). Second, they should not be associated with confounders of the relationship between the exposure (iron status) and the outcome (breast cancer). Third, they should affect the outcome (breast cancer) only through the exposure (iron status) (Fig. 2).
To minimise potential weak instrument bias, we required an F statistic above 10, which is conventionally regarded as indicating sufficient instrument strength. To limit bias from population stratification, both the exposure (iron status) and outcome (breast cancer) data were restricted to individuals of European descent. We used three strategies to address pleiotropy. First, we assessed whether the SNPs were associated with known risk factors for breast cancer. Second, MR estimates were obtained using both a conservative approach (primary analysis) and a liberal approach (secondary analysis). Third, unknown directional pleiotropy was assessed with MR-Egger regression.
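The F statistic can be approximated per SNP as the squared ratio of its exposure effect to its standard error, or, for k instruments explaining a proportion R² of exposure variance in n individuals, as F = R²(n − k − 1) / [k(1 − R²)]. A minimal sketch of both approximations follows; the function names are illustrative, and the text does not state which formula was used in this study.

```python
def f_stat_single(beta: float, se: float) -> float:
    """Approximate F statistic for one SNP: (beta / se) ** 2."""
    return (beta / se) ** 2


def f_stat_from_r2(r2: float, n: int, k: int) -> float:
    """F statistic for k instruments explaining a fraction r2 of exposure
    variance in a sample of n individuals."""
    return r2 * (n - k - 1) / (k * (1.0 - r2))
```

For example, a SNP with beta = 0.2 and SE = 0.02 gives F ≈ 100, well above the threshold of 10.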
A two-sample MR analysis was performed to test the causal relationship between iron status and breast cancer, both overall and separately for ER-positive and ER-negative breast cancer. Conservative and liberal approaches were used in the MR analyses. In the conservative approach, MR estimates were obtained with the inverse-variance weighted (IVW) method. In the liberal approach, the IVW, MR-Egger regression, weighted median and simple mode methods were used to estimate the effect of iron status on breast cancer risk. All data were publicly available GWAS summary statistics from the OncoArray network and the GIS consortium (Fig. 1). Because of restrictions on the use of these publicly available data, the associations between the instrumental variables and other potential confounders, such as physical activity and alcohol drinking, were difficult to assess. We therefore searched the GWAS Catalog (https://www.ebi.ac.uk/gwas) for other phenotypes associated with the selected SNPs and manually removed such SNPs from the MR analysis to rule out possible pleiotropic effects.
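Two of the estimators named above can be sketched from harmonised summary statistics. The sketch assumes lists `bx` and `by` hold the SNP–exposure and SNP–outcome betas and `se_y` holds the outcome standard errors. It is a simplified illustration, not the R implementation used in the study: per-SNP Wald ratios get first-order standard errors, the IVW standard error is the fixed-effect one, the weighted median interpolates at the 50th percentile of standardised cumulative weights, and the MR-Egger and simple mode estimators are omitted.

```python
import math
from typing import List, Sequence, Tuple


def wald_ratios(bx: Sequence[float], by: Sequence[float],
                se_y: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Per-SNP ratio estimates by/bx with first-order standard errors se_y/|bx|."""
    ratios = [y / x for x, y in zip(bx, by)]
    ses = [s / abs(x) for x, s in zip(bx, se_y)]
    return ratios, ses


def ivw(bx, by, se_y) -> Tuple[float, float]:
    """Fixed-effect inverse-variance weighted estimate and its standard error."""
    w = [x * x / (s * s) for x, s in zip(bx, se_y)]
    est = sum(x * y / (s * s) for x, y, s in zip(bx, by, se_y)) / sum(w)
    return est, math.sqrt(1.0 / sum(w))


def weighted_median(bx, by, se_y) -> float:
    """Weighted median of the ratio estimates with inverse-variance weights."""
    ratios, ses = wald_ratios(bx, by, se_y)
    order = sorted(range(len(ratios)), key=lambda i: ratios[i])
    b = [ratios[i] for i in order]
    w = [1.0 / ses[i] ** 2 for i in order]
    total = sum(w)
    p, cum = [], 0.0
    for wi in w:
        cum += wi
        p.append((cum - wi / 2) / total)  # standardised cumulative weight
    if p[0] >= 0.5:
        return b[0]
    for j in range(len(b) - 1):
        if p[j] < 0.5 <= p[j + 1]:
            # Linear interpolation between the two ratios straddling 50%.
            return b[j] + (b[j + 1] - b[j]) * (0.5 - p[j]) / (p[j + 1] - p[j])
    return b[-1]
```

The IVW estimate is a weighted mean of the Wald ratios and assumes all instruments are valid, whereas the weighted median stays consistent if SNPs carrying at least half of the total weight are valid instruments.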
For the sensitivity analyses, we evaluated heterogeneity with Cochran's Q statistic from the IVW and MR-Egger methods and displayed the estimate for each SNP in forest plots [23, 24]. In addition, a leave-one-out analysis, in which each SNP was removed in turn and the overall estimate recomputed, was performed to identify SNPs with a disproportionate influence on the results. To check the robustness of the MR results, we also performed MR-Egger regression as a sensitivity analysis that accounts for pleiotropic effects of the instrumental variables. In MR-Egger regression, the intercept is estimated freely and serves as an indicator of average directional pleiotropy. In the conservative approach, MR-Egger regression was not performed to test for pleiotropy, because only 3 SNPs were available and 1 SNP was removed for linkage disequilibrium, leaving too few SNPs for estimation. For the same reason, the pleiotropic effects of iron and pigments on overall breast cancer could not be tested.
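The heterogeneity and sensitivity checks described above can be sketched as follows, again assuming harmonised lists `bx`, `by` and `se_y`. Cochran's Q is computed around the IVW estimate (it would be compared with a χ² distribution on J − 1 degrees of freedom for J SNPs); the leave-one-out step recomputes the IVW estimate with each SNP removed; and MR-Egger fits an inverse-variance weighted regression of `by` on `bx` with a free intercept after orienting each SNP to a positive exposure effect. Standard errors and P values are omitted for brevity, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def ivw_estimate(bx: Sequence[float], by: Sequence[float], se_y: Sequence[float]) -> float:
    """Fixed-effect IVW causal estimate."""
    w = sum(x * x / (s * s) for x, s in zip(bx, se_y))
    return sum(x * y / (s * s) for x, y, s in zip(bx, by, se_y)) / w


def cochran_q(bx, by, se_y) -> float:
    """IVW Cochran's Q: sum of squared standardised residuals around the IVW fit."""
    b = ivw_estimate(bx, by, se_y)
    return sum((y - b * x) ** 2 / (s * s) for x, y, s in zip(bx, by, se_y))


def leave_one_out(bx, by, se_y) -> List[float]:
    """IVW estimate recomputed with each SNP removed in turn."""
    out = []
    for j in range(len(bx)):
        keep = [i for i in range(len(bx)) if i != j]
        out.append(ivw_estimate([bx[i] for i in keep], [by[i] for i in keep],
                                [se_y[i] for i in keep]))
    return out


def mr_egger(bx, by, se_y) -> Tuple[float, float]:
    """Return (intercept, slope) of a weighted MR-Egger regression."""
    if len(bx) < 3:
        # With two points the line fits exactly and nothing is left to estimate.
        raise ValueError("MR-Egger needs at least 3 SNPs")
    # Orient each SNP so that its SNP-exposure association is positive.
    xs = [abs(x) for x in bx]
    ys = [y if x >= 0 else -y for x, y in zip(bx, by)]
    w = [1.0 / (s * s) for s in se_y]
    sw = sum(w)
    sx = sum(wi * x for wi, x in zip(w, xs))
    sy = sum(wi * y for wi, y in zip(w, ys))
    sxx = sum(wi * x * x for wi, x in zip(w, xs))
    sxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    slope = (sw * sxy - sx * sy) / (sw * sxx - sx * sx)
    intercept = (sy - slope * sx) / sw  # average directional pleiotropy
    return intercept, slope
```

The explicit check for at least three SNPs mirrors the reason given above for not applying MR-Egger in the conservative analysis.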
All analyses were performed in R (version 3.6.1).