A data driven approach to identify ZM-447439 as a potential repurposed drug to overcome tamoxifen-resistance in breast cancer


Background: Tamoxifen is the most commonly used endocrine therapy (ET) for breast cancer (BC) patients expressing estrogen receptors (ER), who represent almost 70% of all cases. However, one third of early-stage BC patients develop endocrine resistance to tamoxifen over the initial five-year treatment period, prompting significant research effort to identify other drugs that alleviate tamoxifen resistance in ER+ BC patients.
Methods: We combined a total of 229 tamoxifen-resistant and 363 tamoxifen-sensitive tumors and tumor cell lines to select genes that were consistently up- or down-regulated among these datasets. We used these genes as the input to identify the drugs and compounds in the Library of Integrated Network-Based Cellular Signatures (LINCS) database that can reverse their expression. With an innovative and comprehensive scoring system, we performed quality assessment on the results and prioritized drugs. Finally, we validated the top five drugs using in vitro cell culture experiments.
Results: We identified that ZM-447439, an Aurora kinase inhibitor, can reverse the gene signatures associated with tamoxifen resistance. This was accomplished by a novel bioinformatics approach and scoring system, screening over 20,000 small molecules in the LINCS database. The in vitro cell culture experiments showed that ZM-447439 had high potency to reverse gene expression in the tamoxifen-resistant BC cell line (MCF7-RR).
Conclusion: We demonstrate the utility of a bioinformatics repurposing approach to identify the candidate drug ZM-447439, which has the potential to treat patients with acquired tamoxifen resistance.


Introduction
Breast cancer (BC) is one of the leading causes of cancer-related deaths in the United States (1). According to estimates reported by the American Cancer Society, 40,610 women died from BC in 2017.
Approximately 70% of breast tumors express the estrogen receptor (ER), which binds and mediates many of the effects of the estrogen hormone (2). Inhibiting ER signaling using ER antagonists or anti-estrogens such as tamoxifen, the selective ER down-regulator fulvestrant, or aromatase inhibitors such as letrozole, anastrozole, or exemestane has been a reliable therapeutic strategy since 1970 (3). In the adjuvant setting, tamoxifen reduces BC recurrence by 50% and mortality by 31% (4). However, around 40% of patients who respond to tamoxifen develop de novo resistance, which is not fully understood (5).
It is important to develop novel drugs to overcome tamoxifen resistance. Compounds that can restore endocrine sensitivity are a possible solution, building on a deep understanding of the molecular mechanisms that drive endocrine resistance (6). For example, crosstalk between the ER and mammalian target of rapamycin (mTOR) signaling pathways is thought to play a role in the development of resistance to endocrine therapy (7), and combined inhibition of these pathways using everolimus and temsirolimus has been used to restore endocrine sensitivity (8). However, such combinatorial approaches face many challenges, such as the time and cost they consume (6). In all, overcoming tamoxifen resistance remains an unmet clinical need.
Recently, repurposing drugs approved for other applications has been suggested as a solution to overcome tamoxifen resistance (9). This approach saves developmental costs and shortens the time to drug approval and launch (10). In this paper, we have developed a novel bioinformatics pipeline to identify potential drug candidates to overcome tamoxifen resistance, by screening over 20,000 small molecules and compounds in the Library of Integrated Network-Based Cellular Signatures (LINCS) database (11). We identified one kinase inhibitor, ZM-447439, that resensitized a tamoxifen-resistant BC cell line (MCF7-RR) and may therefore be a candidate for treating patients with tamoxifen resistance.

Datasets
We used differentially expressed genes (DEGs) obtained from five datasets generated on different platforms (12)(13)(14)(15)(16) that study changes in gene expression in tamoxifen-resistant tumors/cell lines, primarily available through the public Gene Expression Omnibus (GEO) database. These datasets are described below:

Dataset 1: This dataset contains gene expression values of early-stage ER+ BC tumors diagnosed in John Radcliffe, Oxford, UK and measured by the Affymetrix HG-U133A microarray platform (13
Tumors from each of the above three microarray datasets were classified as tamoxifen resistant or tamoxifen sensitive if patients relapsed or still responded to tamoxifen within 5 years, respectively.

Dataset 4 and 5:
In these two independent studies, tamoxifen-resistant and tamoxifen-sensitive BC cell lines were sequenced using next generation sequencing (NGS) technology to identify the DEGs responsible for tamoxifen resistance (15,16). These studies identified 272 DEGs, with 141 upregulated and 131 downregulated genes.

Up- and down-regulated genes
We catalogued the DEGs from the above data resources, which are available as Supplementary Tables S1-3 and S7-8. The list of DEGs for each dataset was obtained from the supplementary files of the corresponding studies without further processing.

Pathway enrichment analysis
To find the functions and pathways that the DEGs have in common, we performed pathway and gene ontology enrichment analysis using the ConsensusPathDB resource (17) with the following selection criteria: adjusted p-value <0.05 and a minimum overlap with the input list of >10 genes.

Pathways/Gene Ontology (GOs) scoring:
We developed a pathway scoring system to select consistently enriched genes among the DEGs. We selected up- and down-regulated genes that are involved in highly scored pathways/GOs. The pathway scoring system is based on three metrics: the enrichment score ($S_1$), the up/down-regulated score ($S_2$), and the overlap score ($S_3$).
For each pathway/GO $i$, we calculated its total score ($S_{t,i}$) as follows:

$$S_{t,i} = S_{1,i} + S_{2,i} + S_{3,i}$$

$$S_{1,i} = \frac{n_i}{N_i}, \qquad S_{2,i} = \frac{u_i}{u_i + d_i}, \qquad S_{3,i} = \frac{1}{m}\sum_{j=1,\, j \neq i}^{m} \frac{o_{ij}}{n_j}$$

where
$S_{t,i}$ is the total score of pathway/GO $i$;
$S_{1,i}$ is the enrichment score of pathway/GO $i$, $n_i$ is the number of DEGs involved in pathway/GO $i$, and $N_i$ is the total number of genes annotated in pathway/GO $i$;
$S_{2,i}$ is the up-regulated score, which equals the ratio between the number of up-regulated DEGs and the total number of DEGs involved in pathway/GO $i$; $u_i$ is the number of up-regulated DEGs and $d_i$ is the number of down-regulated DEGs in pathway/GO $i$;
$S_{3,i}$ is the overlap score, which represents how many DEGs in pathway/GO $i$ overlap with other pathways/GOs ($j = 1, \ldots, m$); $o_{ij}$ is the number of overlapped DEGs between pathway/GO $i$ and pathway/GO $j$, $n_j$ is the number of DEGs in pathway/GO $j$, and $m$ is the number of pathways and GO terms (85 pathways + 109 GOs = 194).
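The three pathway scores can be illustrated with a minimal Python sketch. The function name `pathway_scores`, the toy gene sets, and especially the exact form of the overlap score (the shared-DEG fraction averaged over the m terms, excluding the term itself) are assumptions made for illustration, not the pipeline's actual code.

```python
from typing import Dict, Set


def pathway_scores(
    pathway_genes: Dict[str, Set[str]],
    up_degs: Set[str],
    down_degs: Set[str],
) -> Dict[str, Dict[str, float]]:
    """Score each pathway/GO term by enrichment, up-regulation and overlap."""
    degs = up_degs | down_degs
    # DEGs annotated to each pathway/GO term
    hits = {p: genes & degs for p, genes in pathway_genes.items()}
    m = len(pathway_genes)  # number of pathways/GO terms
    scores: Dict[str, Dict[str, float]] = {}
    for p, genes in pathway_genes.items():
        n_i = len(hits[p])
        if n_i == 0:
            continue  # no DEGs in this term, nothing to score
        s1 = n_i / len(genes)              # enrichment score
        s2 = len(hits[p] & up_degs) / n_i  # fraction of the term's DEGs that are up-regulated
        # Assumed form of the overlap score: shared DEGs with every other term,
        # each divided by that term's DEG count, averaged over the m terms.
        s3 = sum(
            len(hits[p] & hits[q]) / len(hits[q])
            for q in hits
            if q != p and hits[q]
        ) / m
        scores[p] = {"S1": s1, "S2": s2, "S3": s3, "St": s1 + s2 + s3}
    return scores


# Hypothetical toy input: two terms sharing two DEGs
toy_pathways = {"A": {"g1", "g2", "g3", "g4"}, "B": {"g3", "g4", "g5"}}
toy_scores = pathway_scores(toy_pathways, up_degs={"g1", "g3"}, down_degs={"g4"})
```

To rank pathways for selecting down-regulated genes, the same function could be called with the up and down sets swapped, so that S2 becomes the down-regulated fraction.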

CLUE query system for LINCS data search
CLUE query system: To find the drugs that target the selected DEGs, we used the cloud-based platform CLUE (https://clue.io/) for the analysis of perturbational datasets generated using L1000 gene expression profiling (11). The input of the CLUE query app is a list of up- and down-regulated genes that represents the phenotype (tamoxifen resistance for this study).
Weighted connectivity score: Our input into the CLUE query system was a list of up-regulated ($q_{up}$) and down-regulated ($q_{down}$) genes that were compared to the reference database (Touchstone, $r$) in LINCS using the weighted Kolmogorov-Smirnov enrichment statistic (29) as shown below:

$$w_{q,r} = \begin{cases} (ES_{up} - ES_{down})/2, & \text{if } \operatorname{sgn}(ES_{up}) \neq \operatorname{sgn}(ES_{down}) \\ 0, & \text{otherwise} \end{cases}$$

where $w_{q,r}$ is the weighted connectivity score, which represents the similarity between a query $q$ ($q_{up}$, $q_{down}$) and a reference signature $r$; $ES_{up}$ is the enrichment of $q_{up}$ in $r$ and $ES_{down}$ is the enrichment of $q_{down}$ in $r$.
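As a small illustration of how the two enrichment values are combined, the sketch below follows the convention published for the L1000 Connectivity Map: the score is half the difference of the up and down enrichments when their signs differ, and zero otherwise. The function name and the example values are hypothetical, and the enrichment statistics themselves are taken as given rather than computed.

```python
def weighted_connectivity_score(es_up: float, es_down: float) -> float:
    """Combine the enrichment of q_up and q_down in a reference signature r."""

    def sign(x: float) -> int:
        return (x > 0) - (x < 0)

    # Up and down gene sets enriched in the same direction give no coherent signal.
    if sign(es_up) == sign(es_down):
        return 0.0
    return (es_up - es_down) / 2.0


# Hypothetical enrichment values
mimic = weighted_connectivity_score(0.6, -0.4)     # reference resembles the query
reverse = weighted_connectivity_score(-0.6, 0.4)   # reference opposes the query
incoherent = weighted_connectivity_score(0.3, 0.2)
```

A negative value marks a perturbation that opposes the query signature, which is the property sought for drugs that could reverse the tamoxifen resistance signature.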
Drug ranking score: To rank drugs, we empirically developed a novel drug prioritization system based on four metrics: the CLUE connectivity score in the MCF7 cell line subject to drug treatment ($S_1$), the drug selectivity score ($S_2$), the genetic perturbant score ($S_3$), and the drug class score ($S_4$). These are combined into a total score for each drug $i$, where:
$S_{t,i}$ is the total score of drug $i$;
$S_{1,i}$ is the potency score of drug $i$ to reverse the expression of the DEGs in the MCF7 cell line; $Cs_i$ is the connectivity score of drug $i$ in the MCF7 cell line generated from the CLUE server (11);
$S_{2,i}$ is the selectivity score of drug $i$; $T_i$ is the average transcriptional impact score of drug $i$, which describes its activity relative to all other drugs, as derived from its replicate reproducibility and magnitude of differential gene expression (11);
$S_{3,i}$ is the genetic perturbation score, which represents the averaged connectivity score of knocking down the target genes ($1, \ldots, j$) of drug $i$; $Cs_{i,j}$ is the connectivity score of knocking down target gene $j$ of drug $i$;
$S_{4,i}$ is the class score of drug $i$. It reflects the number of drugs that have negative connectivity scores and belong to the same class as drug $i$; $R_i$ is the ratio between the number of drugs that have a negative score and the total number of drugs belonging to the class of drug $i$:

$$R_i = \frac{N_i^{-}}{N_i}, \qquad N_i = N_i^{-} + N_i^{+}$$

where $N_i^{-}$ is the number of drugs that have a negative connectivity score and belong to the class of drug $i$, and $N_i^{+}$ is the number of drugs that have a positive connectivity score and belong to the class of drug $i$.
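The class ratio described above (drugs with a negative connectivity score divided by all drugs in the class) can be sketched in a few lines of Python. The function name, the tuple layout, and the drug names and scores in the example are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def class_ratios(drugs: Iterable[Tuple[str, str, float]]) -> Dict[str, float]:
    """Return, per drug class, the fraction of drugs with a negative connectivity score.

    Each input item is (drug_name, drug_class, connectivity_score).
    """
    n_neg: Dict[str, int] = defaultdict(int)
    n_pos: Dict[str, int] = defaultdict(int)
    for _name, drug_class, score in drugs:
        if score < 0:
            n_neg[drug_class] += 1
        elif score > 0:
            n_pos[drug_class] += 1
    classes = set(n_neg) | set(n_pos)
    return {c: n_neg[c] / (n_neg[c] + n_pos[c]) for c in classes}


# Hypothetical scores for two classes
example = [
    ("drug_a", "mTOR inhibitor", -95.0),
    ("drug_b", "mTOR inhibitor", -91.2),
    ("drug_c", "HMGCR inhibitor", -99.6),
    ("drug_d", "HMGCR inhibitor", 30.2),
]
ratios = class_ratios(example)
```

Multiplying the ratio by 100 gives a percentage scale on which a class whose members all score negatively reaches 100.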

Cell culture validation
The tamoxifen-resistant cell line (MCF7-RR) was purchased from EMD Millipore (Burlington, MA) and the MCF7 cell line was a gift from the HTS lab (Cancer Centre-University of Hawaii, HI). The BC cell lines were maintained at 37°C in a humidified atmosphere (5% CO2) in phenol-red free IMEM media containing 5% fetal bovine serum and 1% penicillin-streptomycin. All media and supplements were obtained from Gibco-Invitrogen, USA.
Cells were expanded in T-75 flasks and when confluent, were trypsinized and seeded in a 96-well plate (BD Biosciences) at a concentration of 5,000 cells in 200 µL of growth media per well. Cells were left to adhere overnight and the following day treated with the drugs in fivefold serial dilutions ranging from 100 to 0.0064 µM. After 72 hours of treatment, old media was replaced with fresh media and proliferation was assessed with CellTiter-Glo Luminescent Cell Viability Assay Kit (Promega). After adding CellTiter reagent, plates were placed on an orbital shaker for 2 minutes and further incubated for 10 minutes.
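The fivefold dilution range quoted above corresponds to seven concentrations per drug, which a short Python sketch can enumerate. The function name and its parameters are illustrative assumptions.

```python
import math
from typing import List


def serial_dilution(top: float, lowest: float, factor: float) -> List[float]:
    """List the concentrations from `top` down to `lowest` in steps of `factor`."""
    # Number of dilution steps between the highest and lowest concentration
    n_steps = round(math.log(top / lowest) / math.log(factor))
    return [top / factor ** k for k in range(n_steps + 1)]


# Fivefold series from 100 µM down to 0.0064 µM
series = serial_dilution(top=100.0, lowest=0.0064, factor=5.0)
```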
We purchased compounds ZM-447439, palbociclib, raltitrexed, oxindole-I and tamoxifen from Selleckchem (Houston, TX), and BMS-754807 from Sigma (North Liberty, IA). We reconstituted the drugs according to the manufacturer's instructions and stored the stock solutions at -20°C.
MCF7-RR cells were exposed to varying concentrations of tamoxifen (10 nM, 100 nM, 1000 nM) to determine the synergistic effect of the drugs. 300,000 cells were seeded and allowed to adhere overnight at 37°C. The following day, the five drugs were added and cells were incubated for 72 hours with a media change every 2 days. At the end of incubation, cells were counted using trypan blue stain and a hemocytometer. The IC50 was defined as the concentration at which half the cells were killed.
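One simple way to read an IC50 off a measured dose-response series is log-linear interpolation between the two concentrations that bracket 50% viability. The sketch below is an assumption about how this could be done, not the procedure used in the study; the function name and the viability values are hypothetical.

```python
import math
from typing import Optional, Sequence


def ic50_log_interp(concs: Sequence[float], viability: Sequence[float]) -> Optional[float]:
    """Interpolate the concentration giving 50% viability on a log10 scale.

    `viability` is the surviving fraction relative to untreated control.
    Returns None if the series never crosses 50%.
    """
    pairs = sorted(zip(concs, viability))
    for (c_lo, v_lo), (c_hi, v_hi) in zip(pairs, pairs[1:]):
        if v_lo >= 0.5 >= v_hi and v_lo != v_hi:
            # Position of the 50% point between the two bracketing doses
            frac = (v_lo - 0.5) / (v_lo - v_hi)
            log_ic50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    return None


# Hypothetical viability fractions at 0.1, 1 and 10 µM
estimate = ic50_log_interp([0.1, 1.0, 10.0], [0.9, 0.7, 0.3])
never_crosses = ic50_log_interp([0.1, 1.0], [0.9, 0.8])
```

Fitting a full sigmoidal curve would generally be more robust than interpolation when enough dose points are available.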

qPCR validation
Extracted RNA from MCF7-RR was reverse transcribed into cDNA and quantified using qPCR using SYBR Green RT-PCR kit on Step One Plus. PCR reactions were subject to 40 cycles of denaturation (95°C,

Drug reposition bioinformatics pipeline
The overall design of our proposed bioinformatics pipeline is shown in Figure 1. We used five datasets that interrogated the gene expression changes between tamoxifen-resistant and tamoxifen-sensitive tumors (12)(13)(14) and cell lines (15,16). Briefly, we first used the lists of significant differentially expressed genes (DEGs, adjusted p-value <0.05) between tamoxifen-resistant and tamoxifen-sensitive tumors/cell lines as reported in the previous five studies (12)(13)(14)(15)(16), with a total of 229 tamoxifen-resistant tumors, 363 tamoxifen-sensitive tumors, as well as tamoxifen-sensitive or tamoxifen-resistant cell lines. Then we selected the 864 genes that showed consistency as either up- or down-regulated DEGs among these datasets. Next, we identified the drugs and compounds that can reverse the expression of the consistent DEGs using the CLUE platform (11), with an innovative and comprehensive scoring system (see Materials and Methods section). We also performed quality assessment on the results and prioritized drugs based on our proposed drug ranking system. Finally, we validated the top five drugs using in vitro cell culture experiments.

Bioinformatics analysis results on differentially expressed genes
We first identified up-regulated and down-regulated genes using the five datasets (Table 1). Altogether, we identified a total of 864 unique DEGs (467 up- and 397 down-regulated) between tamoxifen-resistant and tamoxifen-sensitive tumors/cell lines, after FDR correction (adjusted p-values <0.05). Table S1 (20)).
Additionally, the GO enrichment analysis identified 109 GO entities: 97 biological processes, 6 molecular functions and 6 cellular components (Table S3). We ranked these significant pathways by the rate of enrichment, that is, the ratio between the number of DEGs and the total number of genes involved in a specific pathway (Figure 2). The FOXM1 transcription factor network and Aurora B signaling pathways have the two highest enrichment rates of 25% each (Figure 2); 12 DEGs are detected out of 43 annotated genes in these two pathways, according to the Pathway Interaction Database (PID) (18). Thus, genes in these specific pathways/GOs are good candidates to be selected to reverse the tamoxifen resistance phenotype.
In order to select genes as the input for the CLUE query system (www.clue.io) to identify the candidate drugs, we developed a comprehensive scoring system (see the Methods section) to rank the gene list. This scoring system takes into account the enrichment score (S1), the up/down-regulated score (S2), and the overlap score (S3) of each pathway/GO, and combines them into a single total score (St). We selected the 150 most up-regulated genes from the highly ranked pathways/GOs. Table S4 shows the list of pathways/GOs ranked by total score (St). The pathway APC/C mediated degradation of cell cycle proteins has the highest total score (St=1.37). Its enrichment score is S1=0.24, its up-regulated score is S2=1 with 11 up-regulated DEGs, and its overlap score is S3=0.12, as many of these 11 genes are also involved in other pathways/GOs. The final list of 150 up-regulated DEGs is given in Table S5; 72, 44, 35, 4, and 28 of these genes were selected from Datasets 1-5, respectively.
We used the same approach to select the 150 down-regulated DEGs (Table S5), except that we changed S2 to the down/up rate score instead of the up/down rate score. Table S6 shows the pathways/GOs ranked by total score (St). The pathway Beta1 integrin cell surface interactions had the highest total score (St=1.08), with 11 down-regulated genes. Of the 150 down-regulated genes, 65, 5, 49, 1, and 34 were selected from Datasets 1-5, respectively.

Filtering drug candidates with potential to re-sensitize tamoxifen resistance
We obtained 190 drugs and 376 genetic perturbants with connectivity scores < -90 (Table S7 and Table S8) in the MCF7 cell line. In Table S7, palbociclib, a CDK inhibitor, had the best connectivity score (Cs=-99.96), which is consistent with reports of the potential of CDK inhibitors to significantly reduce tamoxifen resistance (5,19,20).
Among the 190 hits, false positives and inconsistent drugs may exist, so it is critical to evaluate multiple replicates of the same compound. For example, some drugs were purchased from different vendors and produced inconsistent connectivity scores. The drug simvastatin (Broad ID #; BRD-KA81772229) produced a negative score (Cs=-99.62); however, the identical structure from another vendor (BRD-K22134346) produced a positive score (Cs=30.20). Such a discrepancy may be due to differences in drug purity between vendors (personal communication with the CLUE team). We removed 30 drugs that have inconsistent scores, such as simvastatin.
Another way to improve the biological relevance of predicted drug responses is to integrate the connectivity scores of the drugs (Table S7) with those of their genetic perturbants (Table S8). We assumed that knocking down the target genes of the effective drugs (Cs<-90) should also produce negative connectivity scores. For example, palbociclib (Cs=-99.96) has three target genes (CDK4, CDK6, and CCND3). Knocking down palbociclib's target genes produced negative scores of -88, -59 and -67, respectively, supporting the credibility of the connectivity score of palbociclib. For this reason, we removed an additional 22 drugs that did not have any target genes. Finally, among the drugs that each have a single target gene, we removed 38 whose scores were inconsistent with those obtained by knocking down the target gene. For example, the connectivity score of a JAK3-inhibitor drug (BRD-K04546108) is -99.85, whereas the score of knocking down the JAK3 gene is 80.76, which indicates that the predicted response may not be reliable. After applying these quality control steps, our final list contains 100 drugs and compounds (Figure 3).
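The three quality control filters can be summarized in a minimal Python sketch. The class and function names are hypothetical, the rule that a positive knockdown score counts as "inconsistent" is an assumption, and only the scores quoted in the text are used as example data (no target genes are listed for simvastatin because none are given).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DrugHit:
    name: str
    replicate_scores: List[float]  # connectivity scores across replicates/vendors
    target_kd_scores: Dict[str, float] = field(default_factory=dict)  # gene -> knockdown score


def passes_qc(hit: DrugHit, threshold: float = -90.0) -> bool:
    """Apply the hit threshold and the three filtering steps in order."""
    if min(hit.replicate_scores) >= threshold:
        return False  # not a hit in the first place
    if any(score > 0 for score in hit.replicate_scores):
        return False  # step 1: replicates disagree in sign
    if not hit.target_kd_scores:
        return False  # step 2: no annotated target genes
    if len(hit.target_kd_scores) == 1:
        (kd_score,) = hit.target_kd_scores.values()
        if kd_score > 0:
            return False  # step 3: single-target knockdown points the other way (assumed rule)
    return True


hits = [
    DrugHit("palbociclib", [-99.96], {"CDK4": -88.0, "CDK6": -59.0, "CCND3": -67.0}),
    DrugHit("simvastatin", [-99.62, 30.20]),
    DrugHit("JAK3 inhibitor (BRD-K04546108)", [-99.85], {"JAK3": 80.76}),
]
kept = [h.name for h in hits if passes_qc(h)]

# Bookkeeping of the reported counts: 190 hits minus the three removals
remaining = 190 - 30 - 22 - 38
```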

Prioritizing drug selection with a scoring system
The ultimate goal of this study is to find candidate drugs that overcome tamoxifen resistance. To achieve this goal, we developed a drug prioritization system to rank the final list of 100 drugs (Figure 3) based on the summarization of four scores: the CLUE connectivity score in the MCF7 cell line (S1), the drug selectivity score (S2), the genetic perturbant score (S3), and the drug class score (S4). We hypothesized that the best drug had to meet four criteria: high efficacy, that is, a strongly negative connectivity score (S1, produced by the CLUE server) for reversing DEGs in the MCF7 cell line; high selectivity (S2); consistency with knocking down its target genes (S3); and membership in a drug class in which most drugs have negative connectivity scores (S4).
The 100 drugs belong to 123 unique classes. For some classes, such as mTOR inhibitors, all the drugs in the class (10 drugs) produced negative connectivity scores (Figure 4, Table S9). This indicates that the drugs in this class have the potential to reverse tamoxifen resistance.
The final list of 100 drugs ranked by total score (St) is shown in Table S10.

Experimental validation of top 5 ranked drugs and compounds
We validated five drugs out of the list of 100 drugs (Table S10). Cell viability assay results reveal that each of the five drugs was able to induce pronounced cell death on its own after reaching a certain concentration (red curves). However, cell viability dropped even faster when tamoxifen was added in combination with one of the five drug candidates (green, blue and purple curves). Figure 5G). Further reduction in Aurora A/B mRNA levels was observed when 0.5 µM ZM-447439 was combined with 100 nM tamoxifen. Relative to AURKA, AURKB showed less significant fold changes. This trend is supported by a previous study showing that AURKA is a determinant of tamoxifen sensitivity (21). The in vitro experimental results provide preliminary evidence that ZM-447439, a known Aurora kinase inhibitor, is an alternative drug candidate that can re-sensitize BC patients to tamoxifen treatment.
Lastly, we also explored the pharmacodynamic interactions between ZM-447439 and tamoxifen. To this end, we used the software CompuSyn v1.0 (22) to compute the combination index (CI) (Figure 5H). CI<1, =1, and >1 indicate synergism, additive effect and antagonism, respectively. The CI values for ZM-447439 and tamoxifen ranged from 0.75 to 0.02 for fa=0.07~0.99 in the Fa-CI plot, which strongly suggests that these two drugs are synergistic.
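For readers unfamiliar with the combination index, the sketch below shows how a CI value can be computed under the Chou-Talalay median-effect formulation on which CompuSyn is based, assuming mutually exclusive drug action. The function names and all parameter values (median-effect doses `dm` and slopes `m`) are hypothetical and are not taken from the experiments reported here.

```python
def dose_for_effect(fa: float, dm: float, m: float) -> float:
    """Dose of a single drug needed to reach affected fraction `fa`.

    Median-effect equation: fa / (1 - fa) = (D / Dm) ** m
    """
    return dm * (fa / (1.0 - fa)) ** (1.0 / m)


def combination_index(
    fa: float, d1: float, d2: float, dm1: float, m1: float, dm2: float, m2: float
) -> float:
    """CI for doses d1 and d2 that together produce affected fraction `fa`."""
    return d1 / dose_for_effect(fa, dm1, m1) + d2 / dose_for_effect(fa, dm2, m2)


# Hypothetical example: each drug is used at a quarter of the dose it would
# need on its own to reach fa = 0.5, yet the combination reaches fa = 0.5.
ci = combination_index(fa=0.5, d1=0.25, d2=0.5, dm1=1.0, m1=1.2, dm2=2.0, m2=0.9)
```

In this toy case the CI is below 1, which under the interpretation given above would indicate synergism.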

Discussion
In this study, we developed a systematic computational drug repositioning strategy to rapidly identify FDA-approved or investigational drugs that have the potential to overcome tamoxifen resistance in breast cancer patients. We identified ZM-447439, a known Aurora kinase inhibitor, as a potential drug candidate for treatment of patients with tamoxifen-resistant breast tumors.
BC remains the most prevalent cancer in women, with the majority of these tumors expressing ERα. Resistance to endocrine therapies remains a critical limitation to curing a significant subset of ER+ patients. A large number of gene expression signatures pertaining to tamoxifen resistance and the behavior of drugs in many BC cell lines have been generated, although few have been translated into effective clinical treatments. A key challenge addressed in this paper is the extraction of meaningful information from these large datasets to accelerate the translation into effective medical treatments for BC patients with tamoxifen resistance. To this end, we addressed two significant challenges. The first was the absence of reliable overlapping biomarkers within the lists of DEGs reported by the independent datasets used in this study (lack of reproducibility at the individual gene level). Our solution was to find a group of functionally related genes, rather than a single gene, that contributes significantly to tamoxifen resistance. To achieve this, we developed a pathway ranking system to score pathways and GO terms (Equation 1). This highly selective method, with stringent selection criteria for DEGs, improved the identification of target genes with likely relevance to tamoxifen resistance. Second, a large number of drugs that have highly negative connectivity scores from the CLUE platform may be false positives. We therefore used a novel drug scoring system that scores drugs based not only on the CLUE score but also on the target gene perturbation score and the drug class score.
Our rigorous bioinformatics pipeline, followed by in vitro viability experiments, successfully identified five novel compounds of distinct chemical structures and primary mechanisms as being able to reverse tamoxifen resistance gene expression. Among them, ZM-447439 is the most potent in this regard. The strong synergistic effect (CI < 0.1) when combined with tamoxifen adds to this confidence. Additionally, many previous clinical or experimental results support the potential role of ZM-447439 proposed here. ZM-447439 was in phase 1 clinical trials for other gynecologic cancers (23). It also appeared to sensitize resistant cancer cells to cisplatin in treating cervical cancers (24). ZM-447439 showed both antiproliferative and pro-apoptotic properties in gastroenteropancreatic neuroendocrine tumor cells (25). It likely reduces cell proliferation by triggering aberrant cell division and mitotic defects (23). It was also reported to reduce histone H3 phosphorylation at Ser10 (H3S10ph) in Hep2 carcinoma cells (26). H3S10ph is known to be mediated by Aurora kinase, the known target of ZM-447439, to regulate chromatin condensation during cell division (Sawicka and Seiser, 2012, PMID 22564826). Therefore, the molecular mechanisms underlying the anti-proliferative and pro-apoptotic effects of ZM-447439 are likely complex, and Aurora kinase inhibition is one part of them.
The Aurora kinase family has three members: Aurora kinase A, B, and C. It has been reported that inhibition of AURKB with barasertib (drug #14 in Table S10) could be a candidate approach to treat BC patients with tamoxifen resistance (27). In addition, AURKA was overexpressed in NSCLC and contributed to cisplatin-based chemotherapy resistance (28). To our knowledge, the combined inhibition of AURKA and AURKB with ZM-447439 has not been described elsewhere as a way to restrict the growth of antiestrogen-resistant BC cells. Our in vitro data indicate that ZM-based treatment, alone or in combination with tamoxifen, may induce a regression of tamoxifen-resistant tumors in breast cancer patients in vivo. Co-treatment of tamoxifen with an Aurora kinase inhibitor such as ZM-447439 could be especially favorable in vivo, as we have shown the combination to be synergistic, that is, the antitumor effect of the combination is likely to be greater than that of either drug separately.

Conclusions
We propose a generalizable, systematic bioinformatics pipeline to connect tamoxifen resistance, based on consistent gene expression in different datasets, with drug perturbation expression profiles. We provide preliminary evidence that drugs with high scores from such a pipeline show potency to reverse tamoxifen resistance, with ZM-447439 being the most promising candidate.

Ethical Approval and Consent to participate
Not applicable

Consent for publication
Not applicable

Availability of supporting data
All data generated or analyzed during this study are included in this published article and its supplementary information files.
We gratefully acknowledge Dr Herbert Yu from the University of Hawaii-Cancer Center for introducing the research question addressed in the paper.

based on the summarized average of four criteria: S1 is the drug connectivity score in MCF7 cell line calculated by CLUE system [10]; S2 is the drug selectivity score; S3 is the drug consistency score between the connectivity score of the drug and its target genes; S4 is the score of drug class which represents the proportion of drugs with negative connectivity score in that drug class.

Figure 4:
The score of drug class. This score represents the ratio between the drugs that have negative connectivity scores and the total number of drugs belonging to this class. For example, the score of mTOR inhibitor class is 100 as all drugs in this class have negative scores.

Supplementary Materials