Genome-wide association study of aromatase inhibitor discontinuation due to musculoskeletal symptoms

Aromatase inhibitors (AIs) are commonly used to treat hormone receptor positive (HR +) breast cancer. AI-induced musculoskeletal syndrome (AIMSS) is a common toxicity that causes AI treatment discontinuation. The objective of this genome-wide association study (GWAS) was to identify genetic variants associated with discontinuation of AI therapy due to AIMSS and attempt to replicate previously reported associations. In the Exemestane and Letrozole Pharmacogenetics (ELPh) study, postmenopausal patients with HR + non-metastatic breast cancer were randomized to letrozole or exemestane. Genome-wide genotyping of germline DNA was conducted followed by imputation. Each imputed variant was tested for association with time-to-treatment discontinuation due to AIMSS using a Cox proportional hazards model assuming additive genetic effects and adjusting for age, baseline pain score, prior taxane treatment, and AI arm. Secondary analyses were conducted within each AI arm and analyses of candidate variants previously reported to be associated with AIMSS risk. Four hundred ELPh participants were included in the combined analysis. Two variants surpassed the genome-wide significance level in the primary analysis (p value < 5 × 10–8), an intronic variant (rs79048288) within CCDC148 (HR = 4.42, 95% CI: 2.67–7.33) and an intergenic variant (rs912571) upstream of PPP1R14C (HR = 0.30, 95% CI: 0.20–0.47). In the secondary analysis, rs74418677, which is known to be associated with expression of SUPT20H, was significantly associated with discontinuation of letrozole therapy due to AIMSS (HR = 5.91, 95% CI: 3.16–11.06). We were able to replicate associations for candidate variants previously reported to be associated with AIMSS in this cohort, but were not able to replicate associations for any other variants previously reported in other patient cohorts. Our GWAS findings identify several candidate variants that may be associated with AIMSS risk from AI generally or letrozole specifically. Validation of these associations in independent cohorts is needed before translating these findings into clinical practice to improve treatment outcomes in patients with HR + breast cancer.


Introduction
Depleting systemic estrogen concentrations by inhibiting the CYP19A1 (aromatase)-mediated conversion of androgens to estrogens is an effective treatment strategy for hormone receptor positive (HR +) breast cancer. Third-generation aromatase inhibitors (AIs) (anastrozole, letrozole, or exemestane) are first-line treatment in postmenopausal patients with HR + breast cancer [1,2]. However, AI treatment is associated with characteristic toxicities that cause treatment non-persistence, which increases risk of breast cancer recurrence and death [3]. One of the toxicities that most often causes treatment discontinuation is AI-induced musculoskeletal syndrome (AIMSS) [4][5][6][7]. AIMSS is characterized by joint pain and stiffness, myalgias, carpal tunnel syndrome, tenosynovitis, and/or reduced grip strength and is experienced by approximately half of AI-treated patients [8][9][10].
AIMSS risk may be affected by clinical factors such as body mass index (BMI), prior chemotherapy (especially taxanes), and prior tamoxifen. However, these clinical factors alone do not accurately predict which patients will necessitate treatment discontinuation due to AIMSS [10][11][12][13]. Inherited germline variants in candidate genes including CYP19A1 [14][15][16] and ESR1 [17,18] have been reported to affect AIMSS risk [19]. Additionally, a genome-wide association study (GWAS) identified variants in TCL1A that may predict AIMSS; however, this has not been successfully validated in other cohorts [20,21]. Further pharmacogenetic discovery and replication are necessary to identify variants that may predict which patients are at increased risk of AIMSS and AI treatment discontinuation.
A substudy of the Exemestane and Letrozole Pharmacogenetics (ELPh) study was conducted to document the clinical course of AIMSS and identify clinical and genetic risk factors, primarily using candidate gene approaches [22,23]. We recently performed genome-wide genotyping within ELPh to validate that CYP2A6 genetics was a major determinant of letrozole pharmacokinetics [24]. The objective of the current study was to identify new genetic variants associated with discontinuation of AI therapy due to AIMSS in ELPh participants using a genome-wide approach and, secondarily, replicate previously reported pharmacogenetic associations.

ELPh patients and treatment
ELPh was a prospective, open-label study that enrolled postmenopausal women with stage 0-III HR + breast cancer considering AI therapy. AI could be given as front-line treatment or following completion of local (i.e., surgery and/ or radiation) and systemic (tamoxifen and/or chemotherapy) treatment [25]. Patients were randomized 1:1 to receive oral exemestane (25 mg/day) or letrozole (2.5 mg/day) for 2 years, and the arms were stratified by prior treatment with bisphosphonate, tamoxifen, and chemotherapy. Patients were enrolled from August 2005 to July 2009 from Indiana University Melvin and Bren Simon Comprehensive Cancer Center, Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins University, and the University of Michigan Rogel Cancer Center. Due to the small number of non-white patients in the ELPh study, all analyses and results were restricted to self-reported white patients. All patients provided written informed consent prior to enrollment and the study was approved by the Institutional Review Boards of each site.

Collection of treatment discontinuation due to AIMSS
Reasons for discontinuation of AI treatment, including the specific side effect, were prospectively recorded by study coordinators at each site. The primary end point for this pharmacogenetic analysis was the time to discontinuation of the initial AI medication due to musculoskeletal toxicity (i.e., AIMSS), defined as arthralgias, myalgias, joint pain or stiffness, tendonitis, numbness or tingling, and/or carpal tunnel syndrome. We have previously reported analyses of clinical and candidate genetic predictors of this endpoint [4,23,26].

Genome-wide genotyping and imputation
Methods for germline DNA isolation, genome-wide genotyping, imputation, and quality control have been previously reported [27]. Briefly, pre-treatment germline DNA [28] underwent genome-wide genotyping on the Infinium Global Screening Array V2.0 (> 650,000 variants) at the University of Michigan Advanced Genomics Biomedical Research Core. All sample call rates were > 98%. Variants with call rates (< 95%) or large departure from Hardy Weinberg Equilibrium (p < 10 -6 ) were removed. Imputation was performed to generate allelic doses for > 16 million variants using the Michigan Imputation Server [29], SHAPEIT, and Eagle (v2.4), with removal of variants with imputation r 2 < 0.20.

Statistical methods
Cox proportional hazards models were used to test the association between each genotyped or imputed variant and time to discontinuation of AI therapy due to AIMSS. Patients who discontinued for reasons other than AIMSS were censored at the time of discontinuation. Associations were analyzed under an additive genetic model using the "gwasurvivr" package in R [Rizvi et al. 2019] and adjusted for clinical variables that were previously reported to be associated with discontinuation due to AIMSS, including age (≤ 55 years), prior taxane chemotherapy (Yes vs. No), pre-treatment pain score on the visual analog scale (continuous), and treatment arm (exemestane vs. letrozole) [4]. As a secondary analysis, we repeated all association analyses by AI treatment arm (exemestane versus letrozole) to identify AI-specific associations. For variants that were significantly associated in one AI treatment group but not the other, we also tested the significance of an interaction term in the combined cohort.
Variants with minor allele frequencies (MAF) less than 0.025 in the combined cohort or 0.05 in the treatment-specific arms were excluded. Unless noted otherwise, results are reported for imputed allelic dosages. P values less than 5 × 10 -8 were considered genome-wide significant. For the twenty-five variants in 11 genes that were previously reported to be associated with AIMSS risk [16,[18][19][20][21][30][31][32][33], p values less than 0.05 were considered statistically significantly. Where noted below, associations for a subset of these variants were previously reported in the ELPh cohort [22,23].
Similar to our previous GWAS, dbSNP and LDlink [34] were used to annotate variants of interest and assess patterns of linkage disequilibrium. Genetic variants that were significantly associated with discontinuation of treatment were also functionally interrogated using publicly available databases, including Genotype-Tissue Expression (GTEx) [35], RegulomeDB [36,37], and the mQTL database [38]. Unless specified otherwise, all analyses were carried out using a combination of R and Python programs.

Results
Of the 503 patients enrolled on the ELPh study, 400 selfreported white patients who initiated AI treatment and had genome-wide genotype data were included in the analysis ( Fig. 1). Demographic and clinical data for these patients are presented in Table 1 [28]. One hundred patients discontinued their treatment due to AIMSS, with a median time of 6.3 months (interquartile range of 8.3 months). Clinical variables previously reported to be associated with AIMSS risk in ELPh [4] had consistent findings in this analysis (i.e., increased AIMSS risk with younger age, prior taxane chemotherapy, higher baseline pain score, and exemestane treatment) (data not shown).
In the primary analysis of the combined cohort of 400 AI-treated patients, two imputed variants (rs79048288 and rs912571) were associated with AIMSS-related treatment discontinuation risk after adjustment for clinical covariates ( Table 2, Fig. 2). Patients carrying the "T" allele at rs79048288 had increased risk of AIMSS (HR = 4.42, 95% CI: 2.67-7.33, p = 7.69 × 10 -9 , Supplementary Fig. 1). Rs79048288 is an intronic variant within CCDC148, the gene encoding the coiled-coil domain containing 148 protein. According to GTEx or RegulomeDB, this variant is not known to affect expression of CCDC148 or any other genes or protein. The RegulomeDB score for this variant is 0.35 (scale 0.0-1.0), with higher scores indicating increased likelihood of being a regulatory variant. There was no interaction between rs79048288 and treatment arm (p = 0.30), indicating that the association between rs79048288 and AIMSS risk is not AI-specific.
Secondary GWAS analyses were conducted within each drug separately. In the letrozole analysis, two variants in (rs1324052 and rs74418677) were associated with increased  in TCL1A, were significantly associated with AIMSS risk in either the combined cohort or the exemestane-treated group (p value < 0.05) ( Table 3 and Supplementary Table 2). However, these associations have been previously reported in similar analyses of the ELPh cohort [22,23]. None of the variants that were previously reported to be associated with AIMSS that had not been previously tested in ELPh was associated with AIMSS in this analysis (Supplementary Table 2).
In the primary GWAS analysis, the T allele of the intronic rs79048288 variant within CCDC148 was associated with a 4.4-fold higher AIMSS risk and the G allele of the intergenic rs912571 upstream of PPP1R14C was associated with a 3.3-fold lower AIMSS risk. In silico analyses did not reveal any obvious functional impact of the intronic variant within CCDC148 (rs79048288), and little is known about the physiological function of this protein,

Fig. 2
Associations between all imputed variants and discontinuation of AI therapy due to AIMSS after adjustment for age (under 55 years), baseline pain score on visual analog scale, prior taxane chemotherapy treatment, and drug (exemestane). Two variants (rs79048288 and rs912571) were significantly associated with AIMSS risk in the combined cohort (p value < 5 × 10 -8 , indicated by red horizontal line) except that dysregulated expression has been reported in several cancer types [39,40]. In silico analyses revealed that the G allele of rs912571 was associated with higher expression of PPP1R14C in sun-exposed skin, indicating that this variant could be functionally consequential. This gene encodes a signal-transducing protein phosphatase, also referred to as KEPI, that is an inhibitor of myosin phosphatase and regulates smooth muscle contraction, providing further suggestive evidence that this variant could be functionally associated with AIMSS [41][42][43]. Two additional variants of interest, rs1324052 and rs74418677, were found to be associated with AIMSS in the letrozole-only analysis. In silico analyses suggest that rs74418677 is a regulatory variant that affects methylation of cg19272349 and expression of SUPT20H, also referred to as P38IP, a protein that is known to be involved in cell cycle regulation and cellular autophagy [44,45]. Intriguingly, a nonsense variant (p.Lys25X) in this gene was identified as the likely causal variant in hereditary rheumatoid arthritis [46]. We speculate that patients with subclinical arthritislike conditions are at increased risk of clinically overt musculoskeletal pain when administered letrozole, similar to the identification of hereditary neuropathy genes as predictors of taxane-induced neuropathy [47]. However, we are not aware of any prior studies that have investigated or reported an association for these or other polymorphisms in these genes (i.e., CCDC148, PPP1R14C, or SUPT20H) with AIMSS or any other AI treatment outcome.
Our attempted replication of variants previously reported to be associated with AIMSS found only two significant associations in our combined cohort, both of which have been previously reported in ELPh. The association between rs9322336 in ESR1 and AIMSS risk in exemestane-treated patients was previously reported by our group in 2013 [23].
Interestingly, another group recently reported that this variant was also associated with lower AIMSS risk in an independent cohort of 196 patients treated with letrozole or anastrozole [18]. While these two studies provide consistent and suggestive evidence of association, the association between rs9322336 and AIMSS risk awaits validation in additional studies. In fact, such a validation was recently attempted in the racially diverse ECOG E1Z11 cohort of anastrozole-treated patients using rs2347868, which is modestly correlated with rs9322336 (linkage disequilibrium R 2 of 0.31), and no association was detected for this variant or any of the other nine variants tested [48]. The other association for RANKL (rs7984870) was previously reported by our group in 2019 [22] and was itself an attempt to replicate a previously reported association by Wang et al. [49]. This variant was not included in the E1Z11 analysis and was not successfully replicated in our recent analysis of an independent cohort of 143 AI-treated women [50]. Taken together, there is weak evidence that any of these candidate variants in CYP17A1 [14], CYP19A1 [14][15][16], ESR1 [17,18], or TCL1A [20, 21] are associated with AIMSS risk.
A genetic biomarker of AIMSS could be useful in clinical decision-making. Patients carrying a variant that increases risk of AIMSS for all AI may be candidates for enhanced toxicity monitoring [51] or evidence-based interventions such as exercise, yoga, duloxetine, and acupuncture [52][53][54]. The identification of a genetic variant that increases risk for only one or two of the AI's would be even more clinically useful. These patients could be switched within the AI class, since all third-generation AI's are similarly effective [55] and switching within the drug class can improve treatment tolerability and persistence [4]. This study indicates that none of the previously reported genetic biomarkers is sufficiently robust for clinical use. The variants identified in our GWAS, particularly the rs74418677 variant that may increase risk of letrozole-induced musculoskeletal toxicity through expression of SUPT20H, should be prioritized for future validation studies, e.g., within the E1Z11 cohort [48]. Convincing evidence of clinical validity is warranted prior to further investigation of the causal mechanism for these candidate variants and is necessary before these variants can be used for clinical decision-making. This study had several strengths, including the use of hypothesis-agnostic genome-wide association approach in a large prospectively accrued cohort of patients with a well-documented clinical outcome. However, there were also some limitations that should be considered. Though the ELPh cohort is similar in size to a prior AIMSS GWAS [20], pharmacogenetic sample sizes like these are still orders of magnitude smaller than those used in disease genetics GWAS [56][57][58], which limits power to detect associations with smaller effect sizes or for uncommon variants (i.e., MAF < 0.025), or pathways that are enriched in the detected associations [59]. Finally, we were not able to attempt validation of an AIMSS polygenic risk score reported by another group due to a lack of information in their publication with which to recapitulate their 70-variant signature [60]. While we were not in a position to functionally characterize these variants in preclinical models, whether rs74418677 plays a role in regulating SUPT20H expression should investigated.
In conclusion, we identified several new variants that were associated with AIMSS risk in our cohort of AI-treated patients, including rs912571 (PPP1R14C) and rs74418677 (SUPT20H), that should be prioritized for attempted replication in independent cohorts of AI-treated patients. Successful validation of these associations is necessary prior to prospective studies that use genetic biomarkers to inform clinical decision making to reduce AIMSS and enhance AI treatment persistence to improve clinical outcomes in patients with HR + breast cancer.
Author contribution DLH was involved in designing this analysis and writing the manuscript. JAD and RMM conducted the statistical analysis. KMK cleaned the data for analysis and helped design the analysis. CLG helped generate the genetic data. ZD and TS helped design the initial trial and generate data. AMS, VS, DFH, and NLH designed the initial clinical trial and enrolled participants. JMR helped design this analysis and oversaw the genetic data generation and analysis. Code availability Code is available upon reasonable request to the corresponding author.

Declarations
Ethics approval This was a retrospective secondary analysis of a previously reported prospective clinical study. The prospective study was performed in line with the principles of the Declaration of Helsinki. Approval was granted by the Ethics Committee of each of the three participating Universities.
Consent to participate Informed consent was obtained from all individual participants included in the study.