Exposure data sources
CRP data sources
Summary data for CRP-associated single nucleotide polymorphisms (SNPs) were acquired from the largest GWAS study (including 204,402 European individuals), which was a meta-analysis of HapMap and 1000 Genomes imputed data that included 88 studies. The analysis excluded data from individuals with autoimmune diseases, those taking immunomodulatory agents, and those with CRP levels differing from the mean value by 4 SD or more. All the participating studies were approved by their respective institutional review boards (Ligthart et al. 2018).
IL-6 and IL-1β data sources
Summary data for IL-6-and IL-1β-related SNPs were obtained from the INTERVAL study, including 3,301 normal subjects, nested based on 50,000 blood donors’ genetic biological resources at 25 centers from England. The Affymetrix Axiom UK Biobank Genotyping Array was used for genotyping. Each subject provided informed consent for participation. This work was approved by the National Research Ethics Service (Sun et al. 2018).
IL-1α data sources
Summary data for IL-1α-associated SNPs were derived from the genetic study of up to 700 maternal and infant cytokines/chemokines, of which 42% were European ancestry (Traglia et al. 2018). The study used the Affymetrix Axiom (Affymetrix 2011) EUR array for genotyping. All the studies were approved by the institutional review board.
Summary data on BC
According to the expression of estrogen receptor (ER), BC is divided into ER + BC and ER-BC, which exhibit different biological behaviors. Thus, they will be discussed in subgroups. Summary data for BC-associated SNPs, including 122,977 cases (69501 ER + cases and 21468 ER-cases) and 105974 controls, were acquired from the Breast Cancer Association Consortium (BCAC) (Michailidou et al. 2017). The study project was from European ancestry. Genotyping was performed using the 1000 Genomes Project (Phase 3) reference panel. SNPs with minor allele frequency (MAF) ≤ 0.5% and imputation quality score < 0.3 were excluded. All participating studies were approved by the appropriate ethical review boards. Moreover, all subjects provided informed consent to participate.
Statistical Analyses
The present study is a TSMR study. As shown in Fig. 1, the research of this study is based on three hypotheses: (1) IVs and exposure factors are strongly correlated (association); (2) IVs and confounders are not correlated (independence); and (3) IVs and the outcome are not directly related, and the effect on the outcome can only be demonstrated by exposure (exclusion restriction criterion). In a TSMR study, exposure-related IVs and outcome-related IVs are obtained from two independent samples (e.g., a GWAS of exposure-related SNPs and a GWAS of outcome-related SNPs) from the same ethnic group. Compared to a single-sample Mendelian randomization study, TSMR involves a larger sample size and thus can obtain greater power. Currently, TSMR is widely used because of the availability of public data from a large number of GWAS collaborative groups worldwide.
To select the IVs that are strongly correlated with exposures (first MR assumption), this study implemented quality control for selecting potentially helpful SNPs. First, a genome-wide significance analysis was conducted to identify SNPs related to exposures (IL-1α, IL-1β, IL-6, and CRP) (p < 5e-08). Second, the exposure-related helpful SNPs should not be within linkage disequilibrium (LD), because SNPs strongly related to LD possibly induce bias in outcomes. The present work implemented a clumping procedure (r2 < 0.001, threshold of distance = 10000 kb). Third, the low instrumental bias was assessed by F statistic, where F statistic of < 10 confirms that the selected genetic variants do not meet the strong correlation between IVs and exposure (Staiger and Stock 1994). F statistic was calculated using the formula: Beta2/SE2, where beta and se represent genetic association with the exposure and standard deviation, respectively. In addition, when choosing the IVs for exposures, the following conditions also need to be considered: First, we eliminated SNPs whose minor allele frequency (MAF) was lower than 0.01 in the process of extracting outcome information from IVs. Second, we eliminated recurrent SNPs from those chosen helpful SNPs during harmonization for ensuring that the effect of SNPs on exposure and the effect of identical SNPs on outcome were corresponding to the identical allele.
For CRP-associated SNPs, inverse-variance weighted (IVW), weighted median, and MR-Egger approaches were used for inferring causality, the most important of which was the IVW method. IVW is a method for meta-aggregating the effects of multiple loci in the process of MR analysis(Burgess, Butterworth, and Thompson 2013). The application of IVW is based on the concept that all SNPs are valid IVs and are completely independent of each other. On the basis of IVW, we additionally modified the MR-Egger approach. Compared to the IVW method, the core of this method is to consider the existence of the intercept term in the weighted linear regression. Simultaneously, the intercept term is used to measure the pleiotropy among the IVs, and the slope estimates the causal effect in an unbiased manner (Bowden, Davey Smith, and Burgess 2015). The weighted median method is the median values obtained by sorting all individual SNP effect values by weight. Weighted median can be a robust estimate when at least 50% of the genetic variation meets the MR core assumptions (Bowden et al. 2016). For IL-6-, IL-1α-, and IL-1β-associated SNPs, we used the Wald ratio for determining estimates for one SNP (Burgess, Small, and Thompson 2017).
Sensitivity analysis
To satisfy independence of the MR study (second MR assumption), after a comprehensive lookup of the PhenoScanner (http://www.phenoscanner.medschl.cam.ac.uk/), the significant associations of the selected SNPs with BC risk factors (P < 1e-08) were excluded, including BRCA1/2 gene mutation, childbearing, breastfeeding, mammographic density, height, obesity, alcohol intake, physical inactivity, age of menarche, and menopause (Britt, Cuzick, and Phillips 2020).
Experimental conditions, analytical platforms, and different study subjects may contribute to heterogeneity, leading to biased causal effect estimates. To meet the exclusion restriction criterion of the MR study (third MR assumption), based on MR-Egger and IVW analyses, our study used the Cochran Q statistic to conduct a heterogeneity test for detecting heterogeneity in causal estimate. P > 0.05 indicated the absence of heterogeneity within those enrolled IVs, and thus, we applied the fixed-effects MR estimates in such models; else, we used the random-effects MR estimates.
When MR-Egger, IVW, Wald ratio, and weighted median approaches were used to investigate causality, there may be other unknown confounders that were not conducive to biased estimates of genetic diversity and causal effects. We determined horizontal pleiotropy based on the intercept of MR-Egger and its p-values. If the MR-Egger regression intercept was close to 0 (< 0.1) and P > 0.05, we considered that there was no evidence of horizontal pleiotropy in the test. Mendelian randomization pleiotropy residual sum and outlier (MR-PRESSO) method was used for better validating potential outliers and horizontal pleiotropy (Verbanck et al. 2018). We set 5000 distributions as the parameter during MR-PRESSO analyses.
After the assessment of heterogeneity and horizontal pleiotropy, we performed a sensitivity analysis of the eligible SNPs by using the leave-one-out method. This method aims to re-estimate the causal effect by sequentially removing one SNP at each time in order to determine the SNP that greatly affects causal effect estimates. The results are considered to be reliable if the overall error bars do not change significantly after excluding each SNP.
Statistical power
According to the online calculation method (https://shiny.cnsgenomics.com/mRnd/) of the MR statistical power reported by Brion et al. (Brion, Shakhbazov, and Visscher 2013), α(type I error) was 0.05 and K (case composition ratio) was 54%. To calculate R2 for the SNP instrument, we used the following formula: 2×EAF×(1-EAF)×beta2(Papadimitriou et al. 2020), where EAF and beta represent effect allele frequency and predicted genetic impact on exposure, respectively. R2xz (proportion of variance associated with CRP explained by SNPs) was 3.98%, while R2xz (proportion of variance associated with IL-6 explained by SNPs) was 1.06%. Based on the meta-analysis of clinical observational trials, the RR value of CRP for BC in the European population was OR = 1.12 (Guo et al. 2015), and the risk ratio of IL-6 for BC was OR = 1.13 (Il'yasova et al. 2005). We used the number of BC summary data (n = 228,951, 122,977 cases, and 105974 controls) as the sample size. Our MR study had a high power (≥ 80%), and the results are shown in Supplementary Table S3. Statistical power for IL-1α and IL-1β was not calculated because of the lack of clinical studies on the association between IL-1α and IL-1β levels and BC risk.
The present study used the GWAS data to verify whether there was a causal relationship between CPR, IL-1α, IL-1β, and IL-6 levels and BC risk. All data were available online, and the data were analyzed with TwoSampleMR package version 0.5.6 and R version 4.1.0. P < 0.05 was considered to be statistically significant. In multiple testing, the link between exposure and outcome phenotype was considered to be statistically significant when the Bonferroni-corrected P-value was < 0.05/N (N represents exposure factor number).