Genetically predicted C-reactive protein associated with breast cancer risk: interrelation with estrogen and cancer molecular subtypes using a Mendelian randomization


 The authors have withdrawn this preprint due to erroneous posting.


Study population
We used data from the Women's Health Initiative (WHI) dbGaP Harmonized and Imputed GWA Studies (GWASs), which were coordinated to contribute to a joint imputation and harmonization effort for GWASs. Those studies, under dbGaP study accession number phs000200.v12.p3, consist of 5 GWASs (AS264, GARNET, GECCO, HIPFX, and WHIMS) (Table 1) and encompass the 2 representative WHI study arms, Clinical Trials and Observational Studies, which together represent one of the largest studies of postmenopausal women in the U.S. to date. The detailed study designs and rationale are described elsewhere [26]. Healthy women were enrolled in the WHI study between 1993 and 1998 at 40 clinical centers across the U.S. if they were 50-79 years old, postmenopausal, expected to stay near the clinical centers for at least 3 years after enrollment, and able to provide written informed consent. Participants were further eligible for the WHI dbGaP study if they had met eligibility requirements for data submission to dbGaP and provided DNA samples. From a total of 16,088 women who reported their race or ethnicity as non-Hispanic white, we applied exclusion criteria (history of diabetes, failure of genomic data quality control, less than 1 year of follow-up for cancer outcomes, and diagnosis of any cancer type at screening), resulting in a total of 10,170 women (Table 1 includes the number of participants in each study). They were followed up through August 29, 2014, for a mean of 16 years, and 537 (5% of the eligible 10,170 women) developed primary invasive breast cancer. The studies were approved by the institutional review boards of each WHI clinical center and the University of California, Los Angeles. Participants' demographic and lifestyle factors were collected at screening by self-administered questionnaires, and the collection process was monitored periodically for data quality assurance by the coordinating clinical centers.
From 48 variables initially selected through literature review for their association with inflammation and breast cancer [27][28][29], we performed preliminary analyses, including univariate and stepwise multiple regression analyses and a multicollinearity test, and finally selected 15 variables.
Primary invasive breast cancer diagnosis, our outcome of interest, was determined via a centralized review of medical charts and pathology and cytology reports by a committee of physicians. Cancer cases were coded using the National Cancer Institute's Surveillance, Epidemiology, and End-Results guidelines [35]. The time from enrollment to breast cancer development, censoring, or study end point was calculated and presented in years.

Genotyping and instrumental variables
Genotype data were extracted for this study from the WHI dbGaP Harmonized and Imputed GWASs. Details of genotyping, imputation, and the data-cleaning process have been reported [19,26,36]. In brief, DNA was obtained from blood samples and genotyped using several GWAS platforms [26]. The genotypes were normalized to Genome Reference Consortium Human Build 37, and genomic imputation was performed via the 1000 Genomes reference panels [37]. SNPs were checked for harmonization with pairwise concordance among all samples across the GWASs. Through the initial and second data quality-control steps, SNPs were retained after filtering on a missing-call rate of < 2%, a Hardy-Weinberg equilibrium of p ≥ 1E-04, and imputation quality [38], and individuals who were unexpected duplicates, first- or second-degree relatives, or outliers on the basis of principal components (PCs) were excluded.
We used 4 GWAS resources to select CRP-SNPs. One was our earlier GWAS using the WHI Harmonized and Imputed GWASs, which examined CRP as a binary outcome reflecting chronic low-grade inflammation status (> 3.0 mg/L of CRP) [39,40]. Using the same population for the genetic instruments and the cancer outcomes may reduce the bias that arises in MR analysis when population structure differs between the exposure and outcome samples. Of 82 SNPs in total, we selected 5 index/independent SNPs that were not in linkage disequilibrium (i.e., LD < 0.3) (Table 1). The other 3 GWASs recently reported CRP SNPs by analyzing CRP as a continuous variable that was natural log-transformed (mg/L). They used different genotyping and analytic strategies, such as HapMap-based 1000 Genomes imputed data analysis [20], genome-wide analysis of a discovery panel combined with a replication panel [22], and an exome-wide search of common and low-frequency/rare coding variants [21]. Of the total 89 SNPs from the 3 studies, 16 SNPs overlapped; for these, the results from the more recent study were selected. With LD < 0.3, 61 independent SNPs (the 5 from our study plus 56 from the other studies) were finally included in our analysis (Table 1). For all SNPs, the allele associated with a higher CRP level was designated the alternative (risk) allele and the other allele the reference allele.
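The allele-orientation step described above can be illustrated with a minimal sketch. The record layout (`effect_allele`, `other_allele`, `beta_crp`, `beta_bc`) and the SNP identifier are hypothetical and chosen only for illustration; they are not taken from the study's data files.

```python
def orient_to_crp_increasing(snp):
    """Return a copy of `snp` whose effect allele is the CRP-increasing allele.

    `snp` is a hypothetical dict with per-allele effects on CRP (`beta_crp`)
    and on breast cancer (`beta_bc`), both reported for `effect_allele`.
    """
    if snp["beta_crp"] >= 0:
        return dict(snp)  # already oriented to the CRP-increasing allele
    # Swap the alleles and flip the sign of every per-allele effect.
    return {
        **snp,
        "effect_allele": snp["other_allele"],
        "other_allele": snp["effect_allele"],
        "beta_crp": -snp["beta_crp"],
        "beta_bc": -snp["beta_bc"],
    }


# Illustrative values only (not study data).
example = {
    "rsid": "rs_hypothetical",
    "effect_allele": "A",
    "other_allele": "G",
    "beta_crp": -0.04,
    "beta_bc": 0.01,
}
flipped = orient_to_crp_increasing(example)
```

Flipping both effects together keeps the SNP-exposure and SNP-outcome estimates referring to the same allele, which the ratio-based MR estimators require.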

Statistical analysis
Three basic assumptions are necessary for a genetic variant to be a valid instrument in MR analysis: (i) the variant is robustly predictive of the exposure; (ii) the variant is independent of factors that confound the exposure-outcome association; and (iii) the variant is independent of the outcome, given the exposure and the confounding factors of the exposure-outcome association (i.e., the variant has no pleiotropic pathway to the outcome other than through the exposure) [41]. We checked whether our data met these assumptions for a valid inference. The first assumption was addressed by selecting only SNPs that were associated with CRP at genome-wide significance. The inter-individual variability of CRP explained by all of the selected SNPs was about 6% [19,21,22] and, on the basis of sample size and number of instruments [42], the F-statistic was 108.15. Given the traditional threshold of 10 [43], we considered that our SNPs had sufficient strength. The second and third assumptions cannot be fully tested empirically because they depend on all confounders, both measured and unmeasured [24]. To address horizontal pleiotropy, we excluded pleiotropic GWA SNPs (Additional file 1: Table S1) whose associated phenotypes can be related to both CRP exposure and breast cancer outcome, including obesity (BMI and WHR), diabetic syndromes and diabetes (fasting glucose and insulin, post 2-hour glucose, and type 2 diabetes [T2DM]), and dyslipidemia (low-/high-density lipoprotein, total cholesterol, and triglycerides) [16,44]. None of our 5 GWA SNPs overlapped with those pleiotropic SNPs, whereas 4 SNPs (related to BMI, post 2-hour glucose, T2DM, and dyslipidemia) were excluded from the 56 outside GWA SNPs. In addition, we adjusted for potential confounding factors (listed in the Lifestyle variables subsection, above) in the analysis of the association between SNPs and breast cancer risk.
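For readers unfamiliar with instrument-strength checks, the sketch below shows one widely used approximation of the F-statistic from the variance explained (R²), the sample size (n), and the number of instruments (k). This is an assumption about the general form of such a calculation; the exact formula and inputs behind the reported value are given in [42] and are not reproduced here, so the function is not expected to return the reported figure. The inputs in the example are hypothetical.

```python
def f_statistic(r2, n, k):
    """Approximate first-stage F-statistic for k instruments.

    r2: proportion of exposure variance explained by the instruments
    n:  sample size
    k:  number of instruments
    Standard approximation: F = (R^2 / (1 - R^2)) * ((n - k - 1) / k).
    """
    if not 0.0 < r2 < 1.0:
        raise ValueError("r2 must lie strictly between 0 and 1")
    if n <= k + 1:
        raise ValueError("n must exceed k + 1")
    return (r2 / (1.0 - r2)) * ((n - k - 1) / k)


# Hypothetical inputs, for illustration only.
f_example = f_statistic(r2=0.05, n=5000, k=10)
is_strong = f_example > 10  # conventional weak-instrument threshold
```

Under this approximation, F falls as the number of instruments grows for a fixed R² and n, which is why a larger instrument set does not automatically imply stronger instruments.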
We further conducted an MR-Egger regression analysis to test for directional pleiotropy (the third assumption), checking whether the pleiotropic effects of the SNPs were skewed in one direction rather than balanced [45].
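As a rough illustration of the idea behind MR-Egger, the sketch below fits a weighted linear regression of SNP-outcome effects on SNP-exposure effects with an unconstrained intercept; an intercept far from zero suggests directional pleiotropy. It assumes inverse-variance weights based on the outcome standard errors and omits the standard errors and p-value of the intercept, so it is a simplified sketch rather than the procedure used in the study. All names are hypothetical.

```python
def mr_egger(beta_exposure, beta_outcome, se_outcome):
    """Return (intercept, slope) from an inverse-variance weighted regression.

    SNPs are first oriented so that every exposure effect is positive,
    a common MR-Egger convention.
    """
    xs, ys, ws = [], [], []
    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome):
        sign = 1.0 if bx >= 0 else -1.0
        xs.append(sign * bx)
        ys.append(sign * by)
        ws.append(1.0 / se ** 2)
    total = sum(ws)
    x_mean = sum(w * x for w, x in zip(ws, xs)) / total
    y_mean = sum(w * y for w, y in zip(ws, ys)) / total
    sxx = sum(w * (x - x_mean) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_mean) * (y - y_mean) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx                      # causal-effect estimate
    intercept = y_mean - slope * x_mean    # average pleiotropic effect
    return intercept, slope


# Synthetic data constructed with intercept 0.02 and slope 0.5.
bx = [0.05, 0.10, 0.20]
by = [0.02 + 0.5 * b for b in bx]
egger_intercept, egger_slope = mr_egger(bx, by, [0.01, 0.02, 0.015])
```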
We conducted the MR analysis separately according to the CRP variable type analyzed in the GWASs: binary chronic inflammation status or continuous levels.
In addition to the traditional inverse-variance weighted (IVW) method [46], we employed recently developed MR approaches: weighted median (WM) and penalized weighted median (PWM) estimates [23,24] and MR gene-environment (G⋅E) interaction analysis [25]. The WM estimate allows up to 50% of the genetic variants to be invalid (i.e., to violate the assumptions) and provides a more consistent estimate of the causal effect when the precision of the individual estimates varies considerably, by assigning a weight to each ordered estimate and interpolating linearly between neighboring estimates [24]. The WM is, however, inappropriate when the estimates from invalid instruments are not balanced about the true effect; the PWM estimate can minimize this issue by down-weighting outlying genetic variants with heterogeneous estimates [24]. The PWM may also be a better estimator if there is directional pleiotropy. In each MR analysis, we performed exploratory stratified analyses defined by potential effect modifiers, including lifestyle factors and breast cancer molecular subtypes. Further, we calculated a corrected MR estimate by taking into account the interaction of genes with selected obesity-related factors (BMI, WHR, MET, % calories from SFA, alcohol, and depressive symptoms) and sex hormone-related lifestyle factors (exogenous estrogen only [E-only], estrogen plus progestin [E + P], and oral contraceptive [OC] use) by applying the MR G⋅E method [25]. We created a weighted genetic score (GS) for that analysis using a polygenic additive model [47] with the 56 CRP-SNPs from previous GWASs that analyzed CRP as a continuous variable. We then rescaled the GS to the unit of CRP by performing a linear regression among women without breast cancer; using the intercept (β0) and slope (β1), we computed the scaled CRP-GS (= β0 + β1⋅GS), so that the 2 GSs were perfectly correlated (r = 1.0) [23,47].
In the MR analysis for the 5 SNPs from our earlier GWAS, we adjusted for the correlation between CRP phenotype and breast cancer that arises when exposure and outcome are evaluated within the same population. For the parameters necessary for the MR analysis, the change in CRP (> 3.0 vs. ≤ 3.0 mg/L) in log-odds and the mean change in log-transformed CRP per allele were obtained from our GWAS and the 3 other previous GWASs, respectively. The effect of genetic variants on breast cancer risk was calculated in our study population by using Cox regression with adjustment for (i) age and 10 PCs and (ii) lifestyle covariates in addition to age and 10 PCs. The proportional-hazards assumption was tested via a Schoenfeld residual plot and ρ evaluation. The Cox results from each of our 5 GWASs were combined using fixed-effect meta-analysis. The final MR results were reported as risk ratios (specifically, hazard ratios [HRs]) and 95% confidence intervals (CIs) for the change in breast cancer risk per unit increase in log-odds or log-transformed CRP.
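The IVW and weighted median estimators described above can be sketched as follows. This is a simplified illustration under stated assumptions: weights are the first-order inverse variances of the per-SNP ratio estimates (exposure effect squared over outcome variance), the weighted median uses linear interpolation between neighboring ordered estimates, and the penalization step of the PWM and all standard errors are omitted. Function and variable names are hypothetical.

```python
def ratio_estimates(beta_exposure, beta_outcome, se_outcome):
    """Per-SNP ratio (Wald) estimates and first-order inverse-variance weights."""
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    weights = [(bx / se) ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    return ratios, weights


def ivw_estimate(ratios, weights):
    """Inverse-variance weighted mean of the ratio estimates."""
    return sum(w * r for r, w in zip(ratios, weights)) / sum(weights)


def weighted_median(ratios, weights):
    """Weighted median with linear interpolation between ordered estimates."""
    pairs = sorted(zip(ratios, weights))
    if len(pairs) == 1:
        return pairs[0][0]
    total = sum(w for _, w in pairs)
    cumulative, positions = 0.0, []
    for _, w in pairs:
        cumulative += w
        # Each estimate sits at the midpoint of its own weight.
        positions.append((cumulative - 0.5 * w) / total)
    below = max(i for i, p in enumerate(positions) if p < 0.5)
    r_lo, r_hi = pairs[below][0], pairs[below + 1][0]
    frac = (0.5 - positions[below]) / (positions[below + 1] - positions[below])
    return r_lo + (r_hi - r_lo) * frac


# Synthetic example: the third SNP is an outlier (ratio about 0.9).
ratios, weights = ratio_estimates([0.1, 0.1, 0.1], [0.01, 0.02, 0.09],
                                  [0.01, 0.01, 0.01])
ivw = ivw_estimate(ratios, weights)
wm = weighted_median(ratios, weights)
```

In this synthetic example the IVW estimate is pulled toward the outlying SNP, whereas the weighted median stays at the middle estimate, which is the robustness property the text attributes to the WM.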
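The pooling and reporting steps above can be illustrated with a short sketch of a fixed-effect (inverse-variance) meta-analysis of per-study log hazard ratios, followed by conversion to an HR with a 95% CI. The per-study values below are hypothetical placeholders, not results from the 5 GWASs, and the normal-approximation interval is an assumption of this sketch.

```python
import math


def fixed_effect_meta(log_hrs, ses):
    """Pool per-study log-HRs with inverse-variance weights.

    Returns the pooled log-HR and its standard error.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_hrs)) / total
    return pooled, math.sqrt(1.0 / total)


def hr_with_ci(log_hr, se, z=1.96):
    """Convert a log-HR and its SE to (HR, lower, upper) for a 95% CI."""
    return (math.exp(log_hr),
            math.exp(log_hr - z * se),
            math.exp(log_hr + z * se))


# Hypothetical per-study estimates for one SNP.
pooled_log_hr, pooled_se = fixed_effect_meta([0.0, 0.2], [0.1, 0.1])
hr, ci_low, ci_high = hr_with_ci(pooled_log_hr, pooled_se)
```

A fixed-effect model assumes that all studies estimate the same underlying effect, so heterogeneity between studies is typically checked alongside the pooled estimate.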

Results
The 61 GWA SNPs identified for their association with CRP concentrations are presented in Table 1; 5 were from our GWAS analyzing binary CRP outcomes (> 3.0 mg/L vs. [...] Table S2, including results from the first stage of adjustment for age and 10 PCs and from the second stage of adjustment for lifestyle covariates in addition to age and 10 PCs. The pooled analysis for the genetic instruments combining 5 SNPs (Fig. 1A) and 56 SNPs (Fig. 1B) [...]
We performed stratified analyses defined by obesity-related factors, lifestyles, a family history of breast cancer, depressive symptoms, and exogenous estrogen use. In the MR analysis of our 5 CRP-SNPs, a 1-unit increase in genetically predicted chronic inflammation (defined as > 3.0 mg/L of CRP) was associated with an approximately 80% decreased risk of breast cancer among E + P users, particularly in women with < 5 years' use of E + P (PWM-HR [1st stage] = 0.17, 95% CI: 0.05-0.63) (Table 2). The reduction in breast cancer risk was more pronounced in the second stage of the MR analysis, which used the SNP-cancer association adjusted for lifestyles in addition to age and 10 PCs. Likewise, genetically determined chronic inflammation status was associated with reduced risk for breast cancer among E-only users. In particular, the cancer risk was reduced by 50% in women who had used E-only for < 5 years and decreased more substantially in longer-term users, showing a dose-response relationship (Fig. 2). Of note, this reduced risk of cancer in E-only users was present only in the first MR stage. The MR-Egger test showed no significant evidence of directional pleiotropy across the tested associations (Table 2 and Fig. 2).
* Covariates adjusted in the analyses for the association between genome-wide SNPs and breast cancer risk include education; annual family income; family history of breast cancer; body mass index; waist-to-hip ratio; physical activity; depressive symptoms; number of cigarettes per day; dietary alcohol in g/day; % calories from SFA/day; age at menopause; duration of oral contraceptive use; and duration of exogenous estrogen-only use.
¶ The Mendelian randomization estimate (except weighted/penalized weighted medians) was adjusted for a correlation between CRP phenotype and breast cancer risk within the same population.
† Heterogeneity in estimates among genome-wide SNPs was evaluated with Cochran's Q test with fixed effects.
Similarly, in the MR analysis of the other 56 SNPs, a 1-unit increase in log-transformed genetically elevated CRP was associated with about 20% reduced risk for breast cancer among women who had used OC for < 5 years (Fig. 3); the estimates remained consistent in the second stage of MR analysis after excluding pleiotropic SNPs. However, a different pattern was observed among longer-term past OC users. Genetically elevated CRP levels were strongly associated with increased breast cancer risk in women who had used OC for ≥ 5 years in the past (IVW-HR [2nd stage] = 2.14, 95% CI: 1.11-4.10, after exclusion of pleiotropic SNPs) [...]
We further performed MR G⋅E analyses to estimate the corrected MR estimates by incorporating the G⋅E interactions with the selected obesity and sex-hormone lifestyle factors; none of the estimates reached statistical significance, and no pleiotropic effect of the estimates was detected (Additional file 1: Table S5).

Discussion
We performed MR analyses for genetically predicted CRP levels (> 3.0 mg/L vs. ≤ 3.0 mg/L, indicating chronic low-grade inflammation, or a natural log-transformed 1 mg/L increase) in association with postmenopausal breast cancer risk and showed that genetically elevated CRP levels increased the risk for breast cancer in women with particular lifestyle factors and breast cancer subtypes. MR findings, if the modeled genetic instruments are not linked to the outcome via any alternative pathway and are not associated with confounders of the exposure-outcome association, may be comparable with those of randomized clinical trials [46], thus providing a robust causal inference. Our MR analysis reduced the pleiotropic effect by identifying a wide range of confounding factors that are connected to the CRP-breast cancer pathway and accounting for them in the analysis of the genetic instrument-cancer outcome association, and by removing the pleiotropic SNPs that may confound the association between CRP and breast cancer. Our MR estimates, including the traditional MR estimate [...] To our knowledge, this study is the first to report the causal effect of genetically elevated CRP levels on increased breast cancer risk in an MR framework.
Most previous epidemiological studies examining measured CRP levels showed no significant association with breast cancer risk [11][12][13], despite the potential role of CRP in breast carcinogenesis both systemically and locally. As pointed out in those studies, reverse causation could not be ruled out. For example, chronic inflammatory status with elevated CRP levels may be involved in cancer cell initiation and growth [13,48], but it may also be the consequence of tumor progression, as shown by the infiltration of CD4+ and CD8+ regulatory T lymphocytes in breast cancer tissues, which is associated with poor cancer survival [49]. Accordingly, several studies have shown that high CRP levels lead to a worse prognosis after the diagnosis of breast cancer [6,13]. In addition, CRP levels are easily influenced by various modifiable and non-modifiable factors such as physiologic and pathologic stimuli, which makes one or a few measurements in time inconclusive.
An MR study is unlikely to be susceptible to reverse causation and potential confounding owing to the random assortment of genetic alleles at the time of gamete formation, before disease onset. Further, MR allows the assessment of a long-standing effect of CRP on cancer development. To date, we have found only one published MR study on the CRP phenotype and breast cancer risk [50]. That study used 4 SNPs in the CRP gene with 9 genotype combinations and adjusted for lifestyle confounding; no significant association was reported. Our study utilized 61 CRP-associated SNPs from the most recently updated GWASs and, with no evidence of weak genetic instruments or directional pleiotropy, we conducted MR analyses with 2 separate stages of lifestyle adjustment.
We further conducted stratified analyses by obesity, sex-hormone use, and breast cancer subtype to determine whether these lifestyle and pathologic factors modified the association between genetically elevated CRP and breast cancer risk. We detected a substantially reduced risk of breast cancer in relation to CRP in E-only, E + P, and past OC users, but only among relatively short-term users (< 5 years). In particular, longer-term E-only users (≥ 5 years) had a more profound CRP-decreased cancer risk, in a dose-response fashion. This finding does not align with our other finding of CRP-increased ER/PR-positive breast cancer risk. It may reflect the different effect of estrogen on cancer risk when it is taken orally, and our finding is supported by previous studies [51,52] showing that orally administered estrogen, through its first-pass hepatic effect of suppressing IGF-I production, reduces insulin-like growth factor-I (IGF-I), a promoter of carcinogenesis that is partly induced by CRP, thus suggesting a protective role of exogenous estrogen in postmenopausal breast cancer risk [53].
In contrast, E + P users have different levels of IGF-I and cancer risk owing to non-progesterone-like effects (i.e., effects different from those of natural progesterone), contrasting with the hepatocellular effect of oral estrogen [54]; however, the mechanism is not completely clear. In addition, synthetic progestin has an affinity for androgen and mineralocorticoid receptors, leading to cell proliferation and anti-apoptosis and contributing to breast carcinogenesis [55]. In our MR study, only longer-term users of E + P (≥ 5 years) had a CRP-increased risk for breast cancer, implying an effect of long-term cumulative exposure to synthetic progestin, although this association did not reach statistical significance; that result warrants future studies with a larger population for more definitive results. Similarly, the women in our study who had used OC for ≥ 5 years in the past had a strongly CRP-increased risk for breast cancer. This result is consistent with previously published findings from other studies [56,57] that showed increased breast cancer risk with long-term OC use. The use of OC, especially those containing E + P, increases the proliferation of human breast epithelial cells [55], which may partially support our findings of increased cancer risk in long-term OC and E + P users. Our data sources had no information about the type of OC preparation our participants had taken; this calls for a future study that examines the potentially different effects on cancer risk according to specific OC formulations.
In addition, genetically elevated CRP in our study was strongly associated with increased risk for ER/PR-positive breast cancer, which is consistent with recent findings [58] suggesting a role of inflammation in obesity-induced hormone receptor-positive breast cancer development. Excessive adiposity characterized by adipocyte hypertrophy leads to chronic inflammation of adipose tissues, forming crown-like structures (CLSs); the inflamed breast CLSs in turn produce inflammatory molecules such as CRP and other cytokines, leading to the activation of nuclear factor-kB, which elevates aromatase production and thus drives hormone receptor-positive tumor growth [58]. We also found that genetically predicted CRP was associated with increased risk of HER2/neu-negative breast cancer. In accord with the results of a few previous studies [48,59], our finding supports a potential mechanism connecting inflammation to HER2/neu-negative breast cancer, in which pro-inflammatory markers trigger JAK/STAT signaling pathways, activating genes responsible for cell proliferation and angiogenesis; those aberrant pathways then contribute to an immunosuppressive tumorigenic microenvironment, leading to more aggressive breast cancer such as basal-like tumors [13,48,59]. [...] potential clinical use of CRP to predict specific cancer subtypes and CRP-inflammatory marker-targeting interventions to reduce breast cancer risk in postmenopausal women.

Declarations

Ethics approval and consent to participate
The Institutional Review Boards of each WHI participating clinical center and the University of California, Los Angeles, approved this study. Participants provided written informed consent at enrollment.

Competing interests
All authors declare no potential conflict of interest.

Authors' contributions
SYJ, JP, ES, MP, HY and ZZ designed the study. SYJ performed the genomic data QC and the statistical analysis and interpreted the data. JP and ES supervised the genomic data QC and analysis. MP and HY participated in the study coordination and data interpretation. SYJ secured funding for this project. ZZ supervised the project. All authors participated in writing and editing the paper. All authors have read and approved the submission of the manuscript.

Figure 2
Forest plot of MR estimates by E-only use. The plot shows the effects of genetically predicted chronic inflammation status (CRP > 3.0 mg/L) on breast cancer risk in E-only user subgroups, presented as the 95% CIs (red lines) of the estimates and the penalized weighted medians (percentages proportional to the size of the blue squares). The MR estimates were based on the SNP-breast cancer association that was adjusted for age and 10 principal components only. (CI, confidence interval; E, exogenous estrogen; HR, hazard ratio; MR, Mendelian randomization; PWM, penalized weighted median.)

Figure 3
Forest plot of MR estimates by past OC use. The plot shows the effects of genetically predicted CRP phenotype on breast cancer risk in OC user subgroups, presented as the 95% CIs (red lines) of the estimates and the inverse-variance weights (percentages proportional to the size of the blue squares). The MR estimates were based on the SNP-breast cancer association that was adjusted for 1) only age and 10 principal components (PCs) in the first stage and 2) lifestyle covariates in addition to age and 10 PCs in the second stage. (CI, confidence interval; CRP, C-reactive protein; HR, hazard ratio; IVW, inverse-variance weighted; MR, Mendelian randomization; OC, oral contraceptive; SNP, single-nucleotide polymorphism.)
