Serum Iron Status and the Risk of Breast Cancer in Europeans: A 2-Sample Mendelian Randomisation Study

Background: Previous observational studies have reported conflicting associations between serum iron status and the risk of breast cancer, an uncertainty that matters for breast cancer prevention. Objective: We conducted a two-sample Mendelian randomisation (MR) study to explore the causal relationship between iron status and the risk of breast cancer. Method: Single nucleotide polymorphisms (SNPs) used as instrumental variables for iron status were selected from the Genetics of Iron Status consortium, and the corresponding SNP-outcome associations for breast cancer were obtained from the OncoArray network. Two instrument sets were used in the MR analysis: conservative instruments (SNPs associated with all four iron biomarkers in a direction consistent with higher systemic iron) and liberal instruments (SNPs associated with at least one iron biomarker). For the conservative set we used the inverse-variance weighted (IVW) approach, and for the liberal set we used the IVW, MR-Egger regression, weighted median and simple mode approaches. Results: In the conservative approach, none of the iron biomarkers was statistically significantly associated with breast cancer or its subtypes. In the liberal approach, transferrin was positively associated with ER-negative breast cancer by the simple mode method (OR: 1.225; 95% CI: 1.064, 1.410; P=0.030), whereas the other iron biomarkers showed no association with breast cancer or its subtypes (P>0.05). Conclusion: Our MR study, under the liberal approach, suggests that higher transferrin concentration may increase the risk of ER-negative breast cancer, while the other iron biomarkers had no effect on breast cancer or its subtypes. These findings need to be verified in future studies.


Introduction
The incidence of breast cancer is increasing, and the disease affects the quality of life and economic situation of patients' families. Oxidative stress may be an important mechanism in the development of breast cancer [1]. Iron is an essential micronutrient for the human body [2]. It plays an important role in various physiological functions, such as electron transfer, oxygen transport, immune function, DNA synthesis, and energy production [3,4]. Moreover, iron catalyses the generation of reactive oxygen species, which can increase oxidative stress and activate oncogenes; thus, it may affect the development of breast cancer [2,5,6].
Previous epidemiological studies have reported that higher iron levels may be associated with a modestly increased risk of breast cancer [7,8]. In contrast, other studies have found that higher iron status was inversely related to breast cancer risk [9,10], and still others have suggested that iron status is not associated with breast cancer at all [11]. These data suggest that the link between iron status and breast cancer risk remains debated.
In addition, confounding in traditional epidemiological studies can distort causal inference. Previous studies have reported potential confounders of the association between iron status and breast cancer risk, such as the post-natal living environment, behavioural habits, social status and other environmental factors [12]. Moreover, the observed link between breast cancer and iron status may reflect reverse causation. Confounding and reverse causation therefore make it difficult to draw causal conclusions from traditional observational epidemiological studies.
Mendelian randomisation (MR) uses genetic variants that are closely related to an exposure as instrumental variables, allowing causal inferences to be made about the effect of the exposure on an outcome [13]. Because alleles are randomly allocated from parents to offspring during gamete formation and fertilisation, the associations of a genetic variant with the exposure or the outcome should not be affected by confounding factors or reverse causality [12].
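The core of MR can be illustrated with the Wald ratio for a single SNP, which divides the SNP's effect on the outcome by its effect on the exposure. The Python sketch below is illustrative only: the function name and the per-allele coefficients in the usage line are hypothetical, and the standard error is the common first-order approximation that ignores uncertainty in the SNP-exposure estimate.

```python
from typing import Tuple


def wald_ratio(beta_exposure: float, beta_outcome: float,
               se_outcome: float) -> Tuple[float, float]:
    """Single-SNP causal estimate (Wald ratio) and its first-order SE.

    beta_exposure: per-allele effect on the exposure (e.g. SD of transferrin)
    beta_outcome:  per-allele log odds ratio for the outcome (breast cancer)
    se_outcome:    standard error of beta_outcome
    """
    if beta_exposure == 0:
        raise ValueError("SNP has no effect on the exposure; ratio undefined")
    estimate = beta_outcome / beta_exposure
    # First-order SE: treats beta_exposure as known without error
    se = se_outcome / abs(beta_exposure)
    return estimate, se


# Hypothetical SNP: +0.2 SD exposure per allele, log OR 0.02 (SE 0.01) per allele
print(wald_ratio(0.2, 0.02, 0.01))  # approximately (0.1, 0.05)
```

The result is a log odds ratio per unit (here per SD) of exposure; exponentiating it gives the OR per SD.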
To our knowledge, no MR study has examined the relationship between iron status and breast cancer risk. If iron status were shown to be a causal risk factor for breast cancer, this would help breast cancer prevention and treatment. In this research, we used an MR study to investigate whether iron status is related to the incidence of breast cancer. All of the data we used were publicly available summary statistics from genome-wide association studies (GWAS).

Methods
We performed a two-sample MR study using summary data from GWAS consortia (Fig. 1). Ethical approval and informed consent had been obtained in the original studies. We used single nucleotide polymorphisms (SNPs) strongly associated with serum iron status as instrumental variables to explore the effect of iron status on breast cancer risk. By combining the associations of these SNPs with iron status and with breast cancer, we estimated the effect of systemic iron on breast cancer. To investigate whether the SNPs related to iron status could affect breast cancer through other breast cancer risk factors (potential bias or mediation), we analysed the associations between these SNPs and breast cancer risk factors.

Data sources of instrumental variables: exposure
We obtained summary data for MR analysis from the largest meta-GWAS of the Genetics of Iron Status (GIS) consortium, which covered 4 phenotypes: serum iron, transferrin, ferritin and transferrin saturation. These data came from 11 discovery and 8 replication cohorts comprising 48,972 individuals of European ancestry [14]. Genome-wide analyses were performed within each cohort according to a uniform analysis plan, with adjustment for principal component scores, age, and other specific covariates. Quality control thresholds for each cohort were an imputation quality score > 0.5, a Hardy-Weinberg equilibrium (HWE) test P ≥ 10^-6, and a minor allele frequency (MAF) > 0.01 [14].

Data sources of instrumental variables: outcome
The publicly available summary data on breast cancer were obtained from the largest meta-GWAS of the OncoArray network, which contains data on five cancers: breast, ovarian, prostate, lung and colorectal cancer [15]. The breast cancer dataset comprised 122,977 cases and 105,974 controls, all of European ancestry. Of the cases, 69,501 had ER-positive and 21,468 had ER-negative breast cancer.
Quality control thresholds for this GWAS analysis were an imputation quality score > 0.3, HWE P ≥ 10^-12, and MAF > 0.01; details have been reported previously [16,17].
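The variant-level quality control thresholds reported for the two data sources can be expressed as a simple filter. The sketch below assumes each variant record carries an imputation quality score, an HWE test P value and an MAF; the `Variant` class, function names and example records are hypothetical, and the original consortia applied these filters within their own pipelines.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Variant:
    rsid: str
    info: float   # imputation quality score
    hwe_p: float  # Hardy-Weinberg equilibrium test P value
    maf: float    # minor allele frequency


# Thresholds as described for each data source
GIS_QC = {"min_info": 0.5, "min_hwe_p": 1e-6, "min_maf": 0.01}
ONCOARRAY_QC = {"min_info": 0.3, "min_hwe_p": 1e-12, "min_maf": 0.01}


def passes_qc(v: Variant, min_info: float, min_hwe_p: float, min_maf: float) -> bool:
    """Keep a variant only if it is well imputed, in HWE and not too rare."""
    return v.info > min_info and v.hwe_p >= min_hwe_p and v.maf > min_maf


def filter_variants(variants: List[Variant], thresholds: dict) -> List[Variant]:
    return [v for v in variants if passes_qc(v, **thresholds)]


# Hypothetical records: snp_b is too poorly imputed for GIS but passes OncoArray
variants = [Variant("snp_a", 0.95, 0.2, 0.30), Variant("snp_b", 0.40, 0.5, 0.10)]
print([v.rsid for v in filter_variants(variants, GIS_QC)])        # ['snp_a']
print([v.rsid for v in filter_variants(variants, ONCOARRAY_QC)])  # ['snp_a', 'snp_b']
```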

Selection of instrumental variables
Instrumental variables strongly associated with iron status at genome-wide significance (P < 5 × 10^-8) were selected from the GIS consortium (Tables 1-4) [14].
All of the SNPs were available in the OncoArray network data and were in linkage equilibrium (all pairwise r^2 ≤ 0.01).
To select instrumental variables, we used two approaches: conservative and liberal instrumental variable analyses. For the conservative analysis, 3 SNPs (rs855791, rs1800562, rs1799945) were strongly associated (P < 5 × 10^-8) with increased concentrations of serum ferritin, serum iron and transferrin saturation and with decreased concentrations of transferrin. Because increased ferritin, serum iron and transferrin saturation together with decreased transferrin indicate increased systemic iron status [18], genetic instruments for iron status are expected to show this concordant pattern across all four markers. For the liberal analysis, we used SNPs strongly associated with at least one of the iron status biomarkers at genome-wide significance (P < 5 × 10^-8): 5 SNPs for serum iron, 9 SNPs for transferrin, 5 SNPs for ferritin, and 5 SNPs for transferrin saturation. In addition, none of the IVs was associated with the risk of breast cancer (all P > 0.05).
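The two selection rules can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the data layout (biomarker mapped to a per-allele beta and P value), the function names and all numbers in the example are hypothetical.

```python
from typing import Dict, List, Tuple

GWS_P = 5e-8  # genome-wide significance threshold
# Direction expected for higher systemic iron: +1 = increase, -1 = decrease
EXPECTED_SIGN = {"iron": 1, "ferritin": 1, "tsat": 1, "transferrin": -1}

# biomarker -> (per-allele beta, P value)
Assoc = Dict[str, Tuple[float, float]]


def is_conservative(assoc: Assoc) -> bool:
    """True if the SNP is genome-wide significant for all four biomarkers,
    each in the direction that corresponds to higher systemic iron."""
    for marker, sign in EXPECTED_SIGN.items():
        if marker not in assoc:
            return False
        beta, p = assoc[marker]
        if p >= GWS_P or beta * sign <= 0:
            return False
    return True


def liberal_instruments(snps: Dict[str, Assoc], marker: str) -> List[str]:
    """SNPs genome-wide significant for one biomarker, regardless of the others."""
    return [rsid for rsid, assoc in snps.items()
            if marker in assoc and assoc[marker][1] < GWS_P]


# Hypothetical summary statistics (not values from the GIS consortium)
snps = {
    "snp_a": {"iron": (0.19, 1e-20), "ferritin": (0.07, 1e-10),
              "tsat": (0.23, 1e-30), "transferrin": (-0.10, 1e-12)},
    "snp_b": {"iron": (0.01, 0.4), "transferrin": (0.15, 1e-40)},
}
print([r for r, a in snps.items() if is_conservative(a)])  # ['snp_a']
print(liberal_instruments(snps, "transferrin"))            # ['snp_a', 'snp_b']
```

A conservative SNP automatically qualifies for every liberal set, which is why the liberal sets are larger.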

Validation of the selected instrumental variables
To be valid instrumental variables, the selected SNPs must satisfy three key assumptions [19]. First, the instrumental variables should be strongly associated with the exposure (iron status). Second, the instrumental variables should not be associated with confounders of the exposure (iron status) and the outcome (breast cancer). Third, the instrumental variables should affect the outcome (breast cancer) only through the exposure (iron status) (Fig. 2).
To minimise possible weak instrument bias, we required an F statistic above 10 for each instrumental variable, which is conventionally regarded as sufficient instrument strength [20]. To limit possible bias from population stratification, both the exposure (iron status) and outcome (breast cancer) cohorts were restricted to individuals of European descent. We used three strategies to address pleiotropy. First, we assessed whether the SNPs were associated with known breast cancer risk factors. Second, we estimated MR effects with two approaches: the conservative approach (primary analysis) and the liberal approach (secondary analysis). Third, unknown directional pleiotropy was assessed with MR-Egger regression.
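A common per-SNP approximation of the F statistic is the squared ratio of the SNP-exposure estimate to its standard error; the paper does not state which formula was used, so this choice is an assumption. A minimal Python sketch with hypothetical names and inputs:

```python
from typing import Dict, List, Tuple


def approx_f_statistic(beta: float, se: float) -> float:
    """Approximate single-SNP F statistic as (beta / se) ** 2."""
    return (beta / se) ** 2


def weak_instruments(snps: Dict[str, Tuple[float, float]],
                     threshold: float = 10.0) -> List[str]:
    """Return SNPs whose approximate F statistic does not exceed the threshold."""
    return [rsid for rsid, (beta, se) in snps.items()
            if approx_f_statistic(beta, se) <= threshold]


# Hypothetical SNP-exposure estimates: (beta, standard error)
snps = {"snp_a": (0.10, 0.005), "snp_b": (0.03, 0.01)}
print({r: round(approx_f_statistic(b, s), 1) for r, (b, s) in snps.items()})
print(weak_instruments(snps))  # ['snp_b']
```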

MR analyses
A two-sample MR analysis was performed to test the causal relationship between iron status and breast cancer. Breast cancer was further subdivided into ER-positive and ER-negative breast cancer. Conservative and liberal approaches were used in the MR analyses. In the conservative approach, we used the inverse-variance weighted (IVW) method [20]. In the liberal approach, we used the IVW, MR-Egger regression [21], weighted median and simple mode [22] methods to estimate the effect of iron status on breast cancer risk. All data were taken from the publicly available GWAS summary data of the OncoArray network and the GIS consortium (Fig. 1). Because only publicly available summary data were used, the associations between the instrumental variables and other potential confounders, such as exercise and drinking, were difficult to assess. Hence, we further searched the GWAS Catalog database (https://www.ebi.ac.uk/gwas) for other phenotypes associated with the selected instrument SNPs and manually removed such SNPs from the MR analysis to rule out possible pleiotropic effects.
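For readers less familiar with the estimators, the two simplest ones used in the liberal analysis can be sketched in Python. The IVW function is the fixed-effect form (many MR software packages use a multiplicative random-effects variant instead), and the weighted median follows the interpolation rule proposed by Bowden and colleagues, using first-order weights and omitting the bootstrap standard error. Function names and inputs are hypothetical, and this is not the authors' code.

```python
import math
from typing import List, Sequence, Tuple


def ivw_fixed(bx: Sequence[float], by: Sequence[float],
              se_by: Sequence[float]) -> Tuple[float, float]:
    """Fixed-effect inverse-variance weighted estimate and its SE.

    bx: SNP-exposure effects; by: SNP-outcome effects (log OR); se_by: SEs of by.
    """
    num = sum(x * y / s ** 2 for x, y, s in zip(bx, by, se_by))
    den = sum(x ** 2 / s ** 2 for x, s in zip(bx, se_by))
    return num / den, math.sqrt(1.0 / den)


def weighted_median(bx: Sequence[float], by: Sequence[float],
                    se_by: Sequence[float]) -> float:
    """Weighted median of per-SNP ratio estimates (point estimate only)."""
    ratios = [y / x for x, y in zip(bx, by)]
    # Inverse-variance weights of the ratios, using first-order SEs se_by / |bx|
    weights = [(x / s) ** 2 for x, s in zip(bx, se_by)]
    total = sum(weights)
    order = sorted(range(len(ratios)), key=lambda j: ratios[j])
    b = [ratios[j] for j in order]
    w = [weights[j] / total for j in order]
    # Cumulative weight at the midpoint of each ordered SNP's weight
    p: List[float] = []
    cum = 0.0
    for wj in w:
        cum += wj
        p.append(cum - wj / 2)
    if p[0] >= 0.5:
        return b[0]
    for k in range(len(b) - 1):
        if p[k] < 0.5 <= p[k + 1]:
            # Linear interpolation between neighbouring ordered estimates
            return b[k] + (b[k + 1] - b[k]) * (0.5 - p[k]) / (p[k + 1] - p[k])
    return b[-1]


# Hypothetical three-SNP example
bx, by, se = [0.1, 0.1, 0.1], [0.01, 0.02, 0.05], [0.01, 0.01, 0.01]
print(ivw_fixed(bx, by, se))
print(weighted_median(bx, by, se))
```

The IVW estimate is pulled towards the outlying third SNP, whereas the weighted median stays at the middle ratio; this difference in robustness is why several estimators were compared.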

Sensitivity analyses
For the sensitivity analyses, we used the IVW and MR-Egger methods to evaluate heterogeneity with Cochran's Q statistic, and displayed the estimate for each SNP in forest plots [23,24]. In addition, a leave-one-out analysis, in which each SNP was removed in turn and the overall estimate recomputed, was performed to identify SNPs with a disproportionate influence. To check the robustness of the MR results, we also performed MR-Egger regression, which accounts for pleiotropic effects of the instrumental variables: its intercept, an indicator of average directional pleiotropy, is estimated freely [25]. For the conservative approach, MR-Egger regression was not performed to test pleiotropic effects because only 3 SNPs were used and 1 of them was removed for LD, leaving too few SNPs for estimation [26]. For the same reason, the pleiotropic effects of serum iron and transferrin saturation on overall breast cancer could not be tested.
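The three sensitivity statistics described above can be sketched in Python on summary statistics. This is a minimal illustration, not the authors' R code: function names and inputs are hypothetical, MR-Egger is fitted as a weighted least-squares regression after orienting SNPs so that their exposure effects are positive, and flooring the residual variance at 1 is a common convention assumed here.

```python
import math
from typing import Dict, List, Sequence, Tuple


def ivw(bx: Sequence[float], by: Sequence[float], se: Sequence[float]) -> float:
    """Fixed-effect IVW point estimate."""
    num = sum(x * y / s ** 2 for x, y, s in zip(bx, by, se))
    den = sum(x ** 2 / s ** 2 for x, s in zip(bx, se))
    return num / den


def cochran_q(bx, by, se) -> Tuple[float, int]:
    """Cochran's Q around the IVW fit and its degrees of freedom (n - 1)."""
    beta = ivw(bx, by, se)
    q = sum((y - beta * x) ** 2 / s ** 2 for x, y, s in zip(bx, by, se))
    return q, len(bx) - 1


def mr_egger(bx, by, se) -> Dict[str, float]:
    """Weighted regression of by on bx with a free intercept (directional pleiotropy)."""
    if len(bx) < 3:
        raise ValueError("MR-Egger needs at least 3 SNPs")
    # Orient SNPs so every SNP-exposure effect is positive
    xs = [abs(x) for x in bx]
    ys = [y if x >= 0 else -y for x, y in zip(bx, by)]
    w = [1.0 / s ** 2 for s in se]
    sw = sum(w)
    sx = sum(wi * x for wi, x in zip(w, xs))
    sy = sum(wi * y for wi, y in zip(w, ys))
    sxx = sum(wi * x * x for wi, x in zip(w, xs))
    sxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    d = sw * sxx - sx ** 2
    slope = (sw * sxy - sx * sy) / d
    intercept = (sy - slope * sx) / sw
    rss = sum(wi * (y - intercept - slope * x) ** 2 for wi, x, y in zip(w, xs, ys))
    # Residual variance, not allowed to fall below 1 (assumed convention)
    scale = max(1.0, rss / (len(xs) - 2))
    return {
        "slope": slope,
        "se_slope": math.sqrt(scale * sw / d),
        "intercept": intercept,
        "se_intercept": math.sqrt(scale * sxx / d),
    }


def leave_one_out(rsids: List[str], bx, by, se) -> Dict[str, float]:
    """IVW estimate recomputed with each SNP omitted in turn."""
    out = {}
    for i, rsid in enumerate(rsids):
        keep = [j for j in range(len(rsids)) if j != i]
        out[rsid] = ivw([bx[j] for j in keep], [by[j] for j in keep],
                        [se[j] for j in keep])
    return out


# Hypothetical data lying exactly on by = 0.01 + 0.2 * bx (a pleiotropic intercept)
rsids = ["s1", "s2", "s3", "s4"]
bx, by, se = [0.1, 0.2, 0.3, 0.4], [0.03, 0.05, 0.07, 0.09], [0.01] * 4
print(cochran_q(bx, by, se))
print(mr_egger(bx, by, se))
print(leave_one_out(rsids, bx, by, se))
```

In this constructed example, the Egger intercept recovers the built-in pleiotropy of 0.01 that IVW, which forces the line through the origin, cannot separate from the causal slope.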
All analyses were performed in R, version 3.6.1.

Results
Instrumental variables for iron status and breast cancer risk
Tables 1-4 show the SNPs associated with iron status that were used as instrumental variables in the liberal and conservative analyses. For the liberal analyses, Table 1 shows the characteristics of the genetic variants related to serum iron concentration (3 SNPs for overall breast cancer, 5 SNPs for ER-positive and ER-negative breast cancer); Table 2 shows those related to transferrin concentration (9 SNPs for ER-positive breast cancer, 8 SNPs for overall and ER-negative breast cancer); Table 3 shows those related to ferritin (log) concentration (4 SNPs for ER-positive breast cancer, 5 SNPs for overall and ER-negative breast cancer); and Table 4 shows those related to transferrin saturation (3 SNPs for overall breast cancer, 5 SNPs for ER-positive and ER-negative breast cancer). For the conservative analyses, the same 3 SNPs (rs1800562, rs1799945 and rs855791) were used for iron, transferrin, ferritin and transferrin saturation. F statistics for all instruments ranged from 40 (rs651007, ABO gene) to 3346 (rs8177240, TF gene), indicating that all SNPs were strong instruments (Tables 1-4).

The genetic instrument and breast cancer risk
None of the individual SNPs was associated with BMI, a confounding factor for breast cancer. In short, the variants were not associated with this breast cancer risk factor in either the liberal or the conservative approach (all P > 0.05) (Table 5).
Effect of iron status on breast cancer
Fig. 3 shows the MR estimates of the association between genetically predicted iron status and the risk of breast cancer, expressed as ORs for breast cancer and its subtypes per SD increase in each iron status biomarker. In the conservative approach, none of the 4 genetically predicted iron biomarkers was associated with overall, ER-positive or ER-negative breast cancer risk (all P > 0.05). In the liberal analysis, we found a positive association between transferrin and ER-negative breast cancer with the simple mode method (OR: 1.225; 95% CI: 1.064, 1.410; P = 0.030). The other iron biomarkers showed no association with breast cancer or its subtypes (P > 0.05).
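Because MR estimates are computed on the log-odds scale, an OR per SD and its 95% CI correspond to exp(β) and exp(β ± 1.96·SE). The Python sketch below reverses that mapping for the transferrin result above, assuming the interval was built symmetrically on the log scale with a normal quantile (the paper does not state this); the function name is hypothetical.

```python
import math
from typing import Tuple


def log_or_and_se(odds_ratio: float, lower: float, upper: float,
                  z: float = 1.96) -> Tuple[float, float]:
    """Recover the log-odds-ratio estimate and its SE from an OR and 95% CI,
    assuming the CI was exp(beta +/- z * se)."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


beta, se = log_or_and_se(1.225, 1.064, 1.410)
print(round(beta, 3), round(se, 3))  # 0.203 0.072
# Rebuilding the interval from beta and se approximately returns the reported bounds
print(round(math.exp(beta - 1.96 * se), 3), round(math.exp(beta + 1.96 * se), 3))
```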

Sensitivity analyses
With the liberal instruments, heterogeneity was not statistically significant for overall, ER-positive or ER-negative breast cancer (all P > 0.05). With the IVW method, we found no evidence of heterogeneity in the associations of the 4 iron biomarkers (iron, transferrin, ferritin and transferrin saturation) with breast cancer risk (overall breast cancer: Q = 0.02, 7.59, 1.12 and 0.08; ER-positive breast cancer: Q = 4.99, 6.21, 2.07 and 3.16; ER-negative breast cancer: Q = 3.65, 7.68, 2.81 and 4.66, respectively; all P > 0.05). Moreover, MR-Egger regression with the liberal approach identified no directional pleiotropy for the 4 iron biomarkers (ER-positive breast cancer: intercept -0.005, 0.003, 0.008 and 0.012; ER-negative breast cancer: intercept 0.010, 0.005, -0.001 and 0.017; overall breast cancer: intercept 0.0005 (transferrin) and -0.0004 (ferritin); all P > 0.05) (Supplementary Figs. 1-3).
The MR estimates did not change radically in the leave-one-out analyses, although the direction of some estimates differed (Supplementary Figs. 4-7). Previous studies have reported that rs174577 is also associated with LDL-C (LDL cholesterol), HDL-C (HDL cholesterol), TG (triglycerides) and TCHO (total cholesterol); rs4921915 with TG and TCHO; and rs1800562 and rs651007 with LDL-C and TCHO, all at genome-wide significance [27]. Nevertheless, removing these 4 SNPs (rs174577, rs4921915, rs1800562, rs651007) did not change the pattern of results (Supplementary Figs. 4-7).

Discussion
In this research, we conducted a two-sample MR study to estimate the causal relationship between iron status and breast cancer risk using summary statistics from the largest meta-GWAS of European populations. Our results showed that, under the liberal approach, serum transferrin was positively associated with the risk of ER-negative breast cancer, whereas the other iron biomarkers had no association with the risk of breast cancer or its subtypes.
As MR estimates may be biased by pleiotropic effects [28], we looked for this possibility by searching online databases for other traits associated with the SNPs. This search identified 4 SNPs (rs1800562 at HFE, rs174577 at FADS2, rs651007 at ABO and rs4921915 at NAT2) associated with LDL-C, TCHO and/or TG, which previous studies have linked to breast cancer risk [28,29]. However, removing these SNPs did not substantially change the MR estimates, indicating that the estimates in this study are unlikely to be biased by blood lipids. To further test the robustness of our findings to potential pleiotropy, we increased the number of SNPs available for analysis by relaxing the IV selection criteria. In the sensitivity analyses, we found no statistically significant differences between the conservative and liberal approaches. The slight differences in estimates and CI widths between MR methods may be due to chance or to differences in measurement error rather than to genuinely different effects. In addition, MR-Egger regression detected no pleiotropic bias. The exposure and outcome GWAS data both came from European populations, which reduces bias from population stratification. Moreover, the leave-one-out estimates were similar to the main MR estimates. Taken together, the overall analysis and conclusions of our study are unlikely to be seriously biased.
To our knowledge, this is the first MR study to investigate the association between iron status and breast cancer. Iron can catalyse the generation of reactive oxygen species, leading to increased oxidative stress and activation of oncogenes and thus affecting the occurrence and development of breast cancer [2,5,6]. Traditional observational studies have produced inconsistent results on iron status and breast cancer. In some epidemiological studies, high iron levels were related to a moderately increased breast cancer risk [7,8]; others showed that high iron status was negatively correlated with breast cancer [9,10]; and still others found no statistically significant relationship between iron status and breast cancer risk [11]. These inconsistencies may be explained by differences in ethnicity and sample size across studies. Moreover, reverse causality and residual confounding may exist in these studies: unconsidered confounders or unknown risk factors may influence the observed correlation between iron status and breast cancer.
In estimating the causal effect of serum transferrin on ER-negative breast cancer, we used the simple mode method in addition to the three other methods [OR: 1.225 (1.084, 1.366); P: 0.030]. The results showed a positive association between transferrin and ER-negative breast cancer. It has been speculated that a high concentration of transferrin binding to transferrin receptors could increase the efficiency of iron transport, raising intracellular iron concentration and resulting in lipid peroxidation, gene mutation, DNA strand breakage and oncogene activation, thereby increasing breast cancer risk [30,31]. The simple mode is a relatively new mode-based estimate (MBE) method that obtains a single causal estimate from multiple genetic instruments. It is robust to horizontal pleiotropy in a different way from the IVW, MR-Egger and weighted median methods. Its statistical power is greater than that of MR-Egger but lower than that of IVW and the weighted median [17]. Moreover, the simple mode estimate for transferrin and ER-negative breast cancer was only modestly significant, so the relationship between transferrin and ER-negative breast cancer needs further study.
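The mode-based idea can be sketched as locating the peak of a kernel density over the per-SNP ratio estimates. The Python sketch below is a simplified, unweighted version with a normal kernel and a modified Silverman bandwidth of the kind proposed for the MBE; the bandwidth constant `phi`, the grid search and all names are assumptions. It returns only the point estimate, whereas the published method obtains its standard error by bootstrap.

```python
import math
import statistics
from typing import Sequence


def simple_mode_estimate(ratios: Sequence[float], phi: float = 1.0,
                         grid_points: int = 10001) -> float:
    """Mode of an unweighted Gaussian kernel density of per-SNP ratio estimates.

    Bandwidth: phi * 0.9 * min(SD, normalised MAD) * n ** (-1/5).
    """
    n = len(ratios)
    if n < 2:
        raise ValueError("need at least two ratio estimates")
    med = statistics.median(ratios)
    mad = 1.4826 * statistics.median([abs(r - med) for r in ratios])
    h = phi * 0.9 * min(statistics.stdev(ratios), mad) / n ** 0.2
    if h <= 0:
        # Fallback: a zero MAD means at least half the estimates equal the median
        return med
    lo, hi = min(ratios) - 3 * h, max(ratios) + 3 * h
    step = (hi - lo) / (grid_points - 1)
    best_x, best_density = lo, -1.0
    for i in range(grid_points):
        x = lo + i * step
        # Unnormalised density: constant factors do not move the mode
        density = sum(math.exp(-0.5 * ((x - r) / h) ** 2) for r in ratios)
        if density > best_density:
            best_x, best_density = x, density
    return best_x


# Hypothetical ratio estimates: three similar SNPs and one outlier
print(round(simple_mode_estimate([0.10, 0.11, 0.12, 0.50]), 3))
```

The outlying estimate barely shifts the mode, which illustrates why the MBE remains consistent when the largest group of SNPs with similar ratios are valid instruments, even if others are pleiotropic.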
Breast cancer is a heterogeneous disease whose histopathological and molecular subtypes determine different clinical prognoses and risk factors [32]. Previous studies suggested that obesity may increase the risk of breast cancer [33], and BMI has been related to both iron concentrations and breast cancer risk [34,35]. Public genetic data on BMI were obtained from the GIANT consortium for 339,224 individuals of European descent [36]. None of the eleven SNPs among our selected IVs was significantly associated with BMI, a risk factor for breast cancer (Table 5). Finally, iron status and breast cancer risk may both be affected by common exposure factors. For example, inflammation affects iron status, increasing serum ferritin concentration and reducing serum iron concentration [27,37], and inflammation also promotes oxidative stress in tumours. Hence, inflammation may lead to an increase in both iron status and breast cancer risk. However, owing to the limited literature, further research is needed.
Our study has several strengths. We used summary data from the largest meta-GWAS of the GIS consortium and the OncoArray network. All data were from individuals of European descent, which reduces bias from population stratification. In addition, we selected instrumental variables with two approaches, conservative and liberal, which helps ensure the robustness of the causal estimates.
However, our study also has limitations. First, because only publicly available GWAS summary data were used, stratified analyses by age, sex or other categories across the exposure and outcome datasets were not possible. Second, the liberal instruments provide more statistical power but make the analysis particularly vulnerable to pleiotropy. Although we tried to reduce pleiotropy, bias due to unknown biological functions of the SNPs associated with iron status may be unavoidable. Finally, although the data came from the largest GWAS cohorts available, the sample size may still be too small to infer accurately the causal effect of iron status on breast cancer in our two-sample MR (TSMR) study.

Conclusion
Our MR study suggests that higher serum transferrin concentration may increase the risk of ER-negative breast cancer, whereas the other three iron biomarkers showed no association with breast cancer. As the evidence from the liberal instruments was relatively weak, these findings need to be verified in further studies.
Per-allele logarithm of the odds ratios between breast cancer cases and controls.
Figure 1 Related databases and analysis methods in the MR analysis. The publicly available summary data for SNP phenotypes were obtained from the largest meta-GWAS databases. The effect of iron status on breast cancer was estimated using a conservative approach (IVs: only SNPs associated with increased concentrations of ferritin, serum iron and transferrin saturation and a decreased concentration of transferrin, i.e. with increased systemic iron status) and a liberal approach (IVs: SNPs associated with at least one iron status biomarker). In the conservative instrument set, the inverse-variance weighted (IVW) method was used; in the liberal instrument set, the IVW, MR-Egger regression, weighted median and simple mode methods were used. MR, Mendelian randomization; IVs, instrumental variables; SNP, single nucleotide polymorphism; MR Egger, Mendelian randomization-Egger regression method; BMI, body mass index.