Drug Repositioning Study in Search of Potent Inhibitors Blocking Multiple Pathways for the Treatment of Breast Cancer

Breast cancer is the most common malignancy in women worldwide, and efforts to improve patient survival and treatment strategies are ongoing. The goal of the present work is to find connections among drugs, genes and breast cancer and to repurpose approved drugs for the treatment of breast cancer. To this end, gene expression data from breast cancer samples were analyzed to identify the genes that are upregulated or downregulated in different clinical stages of breast cancer. A large number of genes were found to be dysregulated; some were stage specific and some were common to all stages. Biological pathways were studied in the early and late stages of breast cancer. Methylglyoxal Degradation I, Catecholamine biosynthesis and Serotonin/Melatonin biosynthesis were enriched in the early stage, whereas Inhibition of Matrix metalloproteases, Airway pathology in COPD, Glycogen degradation II and Glycogen degradation III were enriched in the late stages. The drug repurposing analysis revealed that different classes of drugs, such as enzyme inhibitors, CNS agents, glucocorticoids, insulin sensitizers, tubulin inhibitors and adhesion inhibitors, showed strong connections with the MCF7 cell line. Further, Nortriptyline, AZD-6482, Acitretin and GW-5074 were found to target multiple genes in the interleukin, enzymatic and GPCR signaling pathways, and Caffeine, Canertinib and Triciribine were found to target multiple dysregulated genes involved in the interleukin, enzymatic, GPCR signaling and metabolic pathways.


Introduction
Breast cancer is the most frequently diagnosed cancer in women and is the leading cause of cancer death among women worldwide [1]. According to ACS statistics for 2017, approximately 250,000 new cases of breast cancer were detected. Its heterogeneity is one of the major challenges in treatment [2][3][4], so there is a pressing need to identify effective therapies for breast cancer. The ultimate goal of cancer research is to connect the disease to the genes and proteins underlying it and, finally, to the chemical modulators or drugs that can be used to treat it.
Connectivity Map (CMAP) [5] is an online tool developed by Golub et al. that provides a generic solution to this problem by establishing disease-gene-drug connections, thereby accelerating the pace of drug development. Genomic signatures can be used to identify drugs with a common action, to discover the mechanisms of action of unannotated small molecules and to recognize potential drugs. CMAP initially included only 164 drugs tested on a few cell lines [5]. CLUE (CMAP Linked User Environment) is the updated version, with a 1000-fold increase over the CMAP pilot data [6].
Drug repurposing, also known as drug repositioning, is a method for identifying new therapeutic uses of existing or approved drugs, that is, treating pathological conditions other than those for which the drugs were originally intended. It significantly reduces the escalating cost and length of time needed to bring new drugs to the market. It is an alternative to the conventional drug discovery pipeline, which takes 10-15 years to discover de novo drug compounds. In the United States, around 30% of newly marketed drugs now come through drug repositioning. Despite recent advances in various technologies, the number of newly approved drugs has not increased. Only 5-10% of the drugs entering Phase I clinical trials reach the market; the rest fail because of significant toxicity or a suboptimal pharmacological response. Moreover, given the huge research and development costs and time involved, it is not possible to develop de novo drug compounds for more than 8000 orphan diseases. In 2004, Ashburn and Thor introduced the drug repurposing approach, in which new targets and indications of existing drugs are explored and analysed using bioinformatics resources [7,8]. When the strategy relies on computational methods, algorithms and tools, it is called computational drug repurposing. Various in-silico methods and techniques help to identify the relationships among drugs, targets and diseases.
Polypharmacological compounds acting on multiple targets can be identified through high-throughput screening and can be used in the treatment of multiple diseases. Data mining tools and bioinformatics analysis can be used to derive the results and conclusions required for repurposing drugs. Examples of repurposed drugs include Nelfinavir for cancer, Tamoxifen for bipolar disorder, Gleevec for rheumatoid arthritis, Pentylenetetrazol for Down syndrome, Astemizole for malaria, Lipitor for Alzheimer's disease and Metformin for cancer [9]. In CLUE, various molecular perturbagens have been tested against different cell lines, covering breast cancer, prostate cancer, leukemia, melanoma and others, and scored from negative to positive connections. A CLUE query consists of the set of differentially expressed genes for a particular pathological condition of interest. CLUE connects these gene lists to drugs and assigns a connectivity score ranging from +100 (positive connectivity) to -100 (negative connectivity). All instances are then ranked by connectivity score: those at the top are the positive connections and those at the bottom are the negative connections.
In the present work, we used a bioinformatics approach to relate FDA-approved drugs to the genes underlying breast cancer. We first analysed gene expression data from breast cancer and adjacent normal samples and evaluated the genes and biological pathways dysregulated in different clinical stages of breast cancer. The genes were then related to molecular perturbagens and drugs with the help of the Connectivity Map, and the biological pathways of the target genes were studied. The drugs were screened by mapping them to biological pathways and finally selected using Venn analysis.

Materials and Methods
The workflow of the computational approach employed in the present study is shown in Fig. 1.

Data collection and classification
The expression data of transcripts was downloaded from Cancer RNA-seq Nexus [10].

Differential gene expression analysis
Differential gene expression was computed for stages I, Ia, Ib, II, IIa, IIb, IIIa, IIIb, IIIc and IV of breast invasive carcinoma. The fold change was calculated from transcript expression values in carcinoma and adjacent normal samples, and the fold-change values were converted to the logarithmic (log2) scale. Genes with an absolute Log2 fold change (FC) greater than 1 and an adjusted p-value < 0.05 were considered significant: genes with Log2 FC > 1 were considered significantly upregulated and genes with Log2 FC < -1 were considered significantly downregulated.
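As an illustration of the thresholds in this subsection, the following minimal Python sketch computes a log2 fold change and labels a gene, taking downregulation as Log2 FC below -1. The function names, the pseudocount and the example values are assumptions made for illustration; the adjusted p-values are assumed to have been computed beforehand, and this is not the pipeline used in the study.

```python
import math

def log2_fold_change(tumor, normal, pseudocount=1.0):
    """log2(tumor / normal); the pseudocount is an assumption that avoids division by zero."""
    return math.log2((tumor + pseudocount) / (normal + pseudocount))

def classify_gene(log2_fc, adj_p, fc_cutoff=1.0, p_cutoff=0.05):
    """Label a gene 'up', 'down' or 'ns' (not significant) using the stated cut-offs."""
    if adj_p >= p_cutoff:
        return "ns"
    if log2_fc > fc_cutoff:
        return "up"
    if log2_fc < -fc_cutoff:
        return "down"
    return "ns"

# Hypothetical values (tumor expression, normal expression, adjusted p-value).
example = {
    "GENE_A": (31.0, 3.0, 0.001),
    "GENE_B": (1.0, 15.0, 0.010),
    "GENE_C": (9.0, 7.0, 0.200),
}
calls = {
    gene: classify_gene(log2_fold_change(t, n), p)
    for gene, (t, n, p) in example.items()
}
```

With these made-up inputs, GENE_A is labeled upregulated, GENE_B downregulated and GENE_C not significant.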

Biological pathway analysis
Biological pathways were analyzed using Ingenuity Pathway Analysis (IPA) software (https://www.qiagenbioinformatics.com/products/ingenuitypathway-analysis/). The top 200 upregulated and downregulated genes from all stages were considered for pathway analysis. Pathways were analysed separately for the early and late stages of breast invasive carcinoma: the early stage comprised stages I, Ia, Ib, II, IIa and IIb, and the late stage comprised stages IIIa, IIIb, IIIc and IV. The top pathways ranked by p-value were considered significant.
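The gene selection that feeds the pathway analysis can be sketched as follows. The text does not say whether the 200 genes are taken per direction or how the stages are pooled, so this sketch assumes 200 upregulated and 200 downregulated genes per stage, pooled by set union within each stage group. All function names and the toy data are hypothetical.

```python
EARLY_STAGES = ("I", "Ia", "Ib", "II", "IIa", "IIb")
LATE_STAGES = ("IIIa", "IIIb", "IIIc", "IV")

def top_genes(log2_fc_by_gene, n=200):
    """Return the n most upregulated and the n most downregulated genes of one stage."""
    ranked = sorted(log2_fc_by_gene, key=log2_fc_by_gene.get, reverse=True)
    up = [g for g in ranked if log2_fc_by_gene[g] > 0][:n]
    down = [g for g in reversed(ranked) if log2_fc_by_gene[g] < 0][:n]
    return up, down

def pool_stage_group(fc_by_stage, stages, n=200):
    """Union of the top genes over all stages of one group (early or late)."""
    pooled = set()
    for stage in stages:
        up, down = top_genes(fc_by_stage.get(stage, {}), n)
        pooled.update(up, down)
    return pooled

# Toy log2 fold changes per stage (hypothetical gene names and values).
toy = {
    "I": {"A": 3.0, "B": -2.0, "C": 1.5},
    "IIa": {"A": 2.0, "D": -4.0},
    "IV": {"E": 5.0, "B": -1.2},
}
early_input = pool_stage_group(toy, EARLY_STAGES, n=1)
late_input = pool_stage_group(toy, LATE_STAGES, n=1)
```

The pooled sets would then be submitted to the pathway tool as the early-stage and late-stage gene lists.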
A CLUE query consists of the set of differentially expressed genes for a particular pathological condition of interest. CLUE connects these gene lists to drugs and assigns a connectivity score ranging from +100 (positive connectivity) to -100 (negative connectivity). All instances were then ranked by connectivity score. In the present study, we mapped specific breast cancer genes to approved drugs with the help of CLUE. The drug targets were then pooled, and the enrichment of biological pathways was studied using IPA software. The drugs were mapped against specific gene targets in the pathway maps, and drugs affecting multiple genes in a pathway were studied further.
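The ranking and filtering of connections can be illustrated on an exported results table. This sketch does not call the CLUE service or reproduce its scoring; it only post-processes scores that are assumed to be already available. The record layout, the perturbagen names and the scores are hypothetical, and the cutoff of -95 is the one reported in the results of this study.

```python
from typing import NamedTuple

class Connection(NamedTuple):
    perturbagen: str
    cell_line: str
    score: float  # connectivity score, between -100 and +100

def rank_connections(connections):
    """Sort from the most positive to the most negative connectivity score."""
    return sorted(connections, key=lambda c: c.score, reverse=True)

def strong_negative(connections, cutoff=-95.0):
    """Keep the connections whose score is at or below the negative cutoff."""
    return [c for c in connections if c.score <= cutoff]

# Hypothetical records; names and scores are placeholders, not study results.
records = [
    Connection("drug_A", "MCF7", -99.0),
    Connection("drug_B", "MCF7", 42.5),
    Connection("drug_C", "MCF7", -95.0),
    Connection("drug_D", "MCF7", -60.1),
]
ranked = rank_connections(records)
candidates = [c.perturbagen for c in strong_negative(ranked)]
```

Strongly negative scores are of interest here because they mark perturbagens whose signatures oppose the query signature.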

Differential expression analysis
Differential expression was studied in stages I, Ia, Ib, II, IIa, IIb, IIIa, IIIb, IIIc and IV of breast invasive carcinoma. A large number of genes were found to be dysregulated in the different stages (supplementary tables S1a to S1j). Some genes were stage specific and some were common to all stages: about 1984 genes were specific to the early stages and around 1753 genes were specific to the late stages of breast cancer (Fig. 2a and 2b, supplementary table S2). The numbers of upregulated and downregulated genes in each stage are shown in Table 1. Additionally, we found some genes that were specific to breast cancer cells and were not expressed in adjacent normal cells (Table 2). Hierarchical clustering based on differential expression showed that the genes of stages I and Ia clustered together and then clustered with the genes of stages IIIb, IIIc and IV, whereas the genes of stages IIb and IIIa clustered together (Fig. 3a). In addition, clustering of the genes common to all stages grouped stages I and II together and stages III and IV together (Fig. 3b).
Table 2
Gene        Expression in clinical stages  Gene                Expression in clinical stages
IFNL2       I, IA, IIA, IIB, IIIA          KRTAP10-3           IIA
KRTAP5-6    I, IA, IIA, IIB, IIIA, IIIC    LOC146481           IIA, IIB, IIIA
SPANXN3     I, IA, IIA, IIB, IIIA, IIIC    MBD3L1              IIA, IIB
ELSPBP1     I, IIB                         MBD3L5              IIA, IIB
ACTL9       IIA                            OFCC1               IIA, IIB
AK056267    IIA, IIIA                      OR2G6               IIA
AK093214    IIA                            OR2W5               IIA
AK131021    IIA                            OR5T2               IIA, IIB
APCS        IIA                            OR6M1               IIA
ATOH1       IIA                            PDCL2               IIA, IIB, IIIA
AX747578    IIA                            PRODH2              IIA
CGB2        IIA                            RNASE9              IIA
CHAT        IIA                            SLC22A24            IIA, IIB, IIIA
CT45A3      IIA                            SNAR-B2             IIA, IIB
ELP4        IIA                            SSX7                IIA
FAM25C      IIA                            TBC1D3P2            IIA
FTHL17      IIA                            TCP10               IIA
GAGE1       IIA, IIB                       TINAG               IIA
GAGE12D     IIA                            CELA2A              IIB
GAGE12J     IIA                            CGB1                IIA
GAGE2A      IIA                            LGALS14             IIB
GAGE2D      IIA, IIB                       PLGLA               IIB
GAGE4       IIA, IIB                       PRAMEF1             IIB, IIIA
IFNA10      IIA                            SI                  IIB
IL1F10      IIA                            SNORA51             IIB
KIF2B       IIA                            TRIM53AP (TRIM53)   IIB
KRMP1       IIA, IIB                       OR8D2               IIB

To compare the genomic variations among the clinical stages of breast cancer, we plotted the differentially expressed genes according to their fold-change values in a Circos plot. The outermost track (track 0) is a circular ideogram representing the chromosome numbers. Moving inward, the heatmap tracks represent the different stages of breast cancer, from stage I to IV (Fig. 4).
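The stage-specific and common gene sets reported in this section can be derived with plain set operations. In the sketch below, "early specific" is taken to mean dysregulated in at least one early stage and in no late stage. That definition, the function names and the toy gene sets are assumptions, since the text does not spell out the exact rule.

```python
EARLY = ("I", "Ia", "Ib", "II", "IIa", "IIb")
LATE = ("IIIa", "IIIb", "IIIc", "IV")

def common_genes(degs_by_stage):
    """Genes that are dysregulated in every stage present in the input."""
    gene_sets = [set(genes) for genes in degs_by_stage.values()]
    return set.intersection(*gene_sets) if gene_sets else set()

def group_specific(degs_by_stage, group, other):
    """Genes seen in at least one stage of `group` and in no stage of `other`."""
    in_group = set().union(*(degs_by_stage.get(s, ()) for s in group))
    in_other = set().union(*(degs_by_stage.get(s, ()) for s in other))
    return in_group - in_other

# Toy differentially expressed gene sets per stage (hypothetical gene names).
degs = {
    "I": {"A", "B"},
    "IIa": {"A", "C"},
    "IIIa": {"A", "D"},
    "IV": {"A", "B"},
}
shared = common_genes(degs)
early_only = group_specific(degs, EARLY, LATE)
late_only = group_specific(degs, LATE, EARLY)
```

In this toy input, gene A is common to all stages, gene C is early specific and gene D is late specific; gene B is neither, because it appears in both groups.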

Biological Pathway analysis
The top upregulated and downregulated genes in each stage were further studied for biological pathways. In the early stages, 127 biological pathways were found, and in the late stages, 69 pathways were found (supplementary tables S3a and S3b). Figures 5a and 5b show bar plots of the biological pathways with their ratio and -log(p-value).
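The two quantities in these bar plots can be illustrated with a short sketch. It assumes that the ratio is the number of query genes in a pathway divided by the pathway size and that the p-value comes from a right-tailed Fisher's exact test, as pathway-enrichment tools such as IPA are generally documented to do; the text does not state either point. The base-10 logarithm, the function names and the numbers are also assumptions.

```python
import math

def pathway_ratio(overlap, pathway_size):
    """Fraction of the pathway's genes that appear in the query gene list."""
    return overlap / pathway_size

def right_tailed_p(overlap, pathway_size, query_size, background_size):
    """P(X >= overlap) under the hypergeometric distribution.

    This equals the right-tailed Fisher's exact test on the 2x2 table of
    query membership versus pathway membership.
    """
    total = math.comb(background_size, query_size)
    upper = min(pathway_size, query_size)
    tail = sum(
        math.comb(pathway_size, k)
        * math.comb(background_size - pathway_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

def neg_log10(p):
    """The -log(p-value) plotted in the bar charts, assuming base 10."""
    return -math.log10(p)

# Toy numbers: 10 background genes, a 4-gene pathway, a 3-gene query, 2 shared genes.
ratio = pathway_ratio(2, 4)
p_value = right_tailed_p(2, 4, 3, 10)
```

A larger -log(p-value) indicates that the overlap is less likely to arise by chance, while the ratio shows how much of the pathway is covered by the query genes.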
In the early stage of breast invasive carcinoma, the Methylglyoxal Degradation I, Catecholamine biosynthesis and Serotonin and Melatonin biosynthesis pathways were found, and in the late stage, the Inhibition of Matrix metalloproteases, Airway pathology in COPD, Glycogen degradation II and Glycogen degradation III pathways were found (supplementary figures S1a to S1f). Literature reports suggest that these pathways are involved, directly or indirectly, in the development or progression of breast cancer. Knockdown of glyoxalase I in MCF-7 cells significantly reduced tumor-associated properties such as migration and proliferation, which suggests that glyoxalase inhibitors may be used as anti-cancer drugs [11][12][13][14]. Chronic stress can enhance breast cancer progression by upregulating catecholamine levels and β-adrenergic receptor signaling [15]. Serotonin regulates homeostasis in the breast and also blocks ER-alpha [16,17]. An increased level of 5-HT signaling was found to favor the malignant progression of human breast cancer cells [16].

Repurposing of drugs
Genes expressed only in breast cancer cells were considered for drug repurposing, and inhibitors of these genes were mapped using CLUE.
Around 177 drugs (inhibitors) were mapped against those genes with a connectivity score cutoff of −95.0 (supplementary table S4). The inhibitors belonged to different classes, such as enzyme inhibitors, CNS agents, antioxidants, glucocorticoid receptor agonists, hormonal agents, diuretics, sigma receptor agonists, DNA alkylating agents, mucolytic agents, angiogenesis inhibitors, reuptake inhibitors, platelet aggregation inhibitors and retinoid receptor agonists (Fig. 6a). These inhibitors were also found to target various other genes, which are listed in supplementary table S5. The biological pathways of all these genes were studied using IPA software. A total of 421 pathways were found (supplementary table S6), of which 22 were significantly enriched (p-value < 0.05) (Fig. 6b and Table 3), including fatty acid biosynthesis, melatonin degradation, tryptophan degradation, IL-22 signaling, Bupropion degradation, role of JAK family kinases in IL-6 type cytokine signaling, PXR/RXR activation, EGF signaling, IL-15 signaling, cAMP-mediated signaling and G-protein coupled receptor signaling. The drugs were mapped against specific genes in the pathway maps (supplementary figures S2a to S2q) and their frequency was calculated (Fig. 6c) using the pathway matrix table (supplementary table S7). The 22 pathways were then grouped into four broader categories: interleukin pathway, enzymatic pathway, GPCR signaling pathway and metabolic pathways. A Venn analysis indicated that Nortriptyline, AZD-6482, Acitretin and GW-5074 were found in three categories (interleukin pathway, enzymatic pathway and GPCR signaling pathway) and that Caffeine, Canertinib and Triciribine were found in all four categories (interleukin pathway, enzymatic pathway, GPCR signaling and metabolic pathways) (Fig. 7a). The connectivity score of Canertinib, Nortriptyline, Triciribine, Caffeine, Acitretin, AZD-6482 and GW-5074 against the MCF7 cell line was -99.
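The Venn analysis over the four pathway categories amounts to counting, for each drug, the categories in which it appears. The following sketch shows one way to do this; the category labels follow the text, while the function names and the drug lists are placeholders rather than the study's data.

```python
def category_membership(drugs_by_category):
    """Map each drug to the sorted list of categories in which it appears."""
    membership = {}
    for category, drugs in drugs_by_category.items():
        for drug in set(drugs):  # set() guards against duplicates within a category
            membership.setdefault(drug, []).append(category)
    return {drug: sorted(cats) for drug, cats in membership.items()}

def drugs_in_exactly(membership, k):
    """Drugs that appear in exactly k categories, in alphabetical order."""
    return sorted(d for d, cats in membership.items() if len(cats) == k)

# Placeholder drug lists per category (hypothetical names, not study results).
toy_categories = {
    "interleukin": ["drug_A", "drug_B", "drug_C"],
    "enzymatic": ["drug_A", "drug_B"],
    "GPCR signaling": ["drug_A", "drug_B", "drug_D"],
    "metabolic": ["drug_A", "drug_D"],
}
membership = category_membership(toy_categories)
in_all_four = drugs_in_exactly(membership, 4)
in_three = drugs_in_exactly(membership, 3)
```

Drugs found in three or four categories correspond to the central regions of the Venn diagram and are the multi-pathway candidates.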
Additionally, the genes involved in breast cancer were related to strong knockdown-overexpression (KD/OE) gene pairs using CLUE. Around 26 strong KD/OE pairs were found, which indicate how the knockdown or overexpression of a gene is related to breast cancer. For example, the overexpression of the query genes (the input genes to CLUE) is positively connected to the knockdown of ANKZF1 and negatively connected to the overexpression of ANKZF1 (Fig. 7c).
Nortriptyline, AZD-6482, Acitretin and GW-5074 were found to target multiple genes in the interleukin, enzymatic and GPCR signaling pathways, and Caffeine, Canertinib and Triciribine were found to target multiple genes in the interleukin, enzymatic, GPCR signaling and metabolic pathways. These drugs also showed strong connections to other cancers, such as prostate cancer, lung cancer, liver cancer, colon cancer, kidney cancer and melanoma.
Our approach seeks to maximize the use of datasets, tools and techniques to understand the role of approved drugs in breast cancer. We hope that one of these drugs, or a combination of a few of them, will contribute to drug design and discovery against breast cancer.

Conflict of interest
No potential conflict of interest was reported by the author(s).

Figure 1
The flow of the present employed computational approach.

Heatmap showing all differentially expressed genes (a) and common genes (b) in different stages.

Figure 5
Bar diagrams representing biological pathways in a) early stage and b) late stage of breast cancer.