Alterations of Genetic Variants and Transcriptomic Features of Response to Tamoxifen in the Breast Cancer Cell Line

Breast cancer is one of the most important causes of mortality in the world, and Tamoxifen therapy is known as a medication strategy for estrogen receptor-positive breast cancer. In current study, two hypotheses of Tamoxifen consumption in breast cancer cell line (MCF7) were investigated. First, the effect of Tamoxifen on genes expression prole at transcriptome level was evaluated between the control and treated samples. Second, due to the fact that Tamoxifen is known as a mutagenic factor, there may be an association between the alterations of genetic variants and Tamoxifen treatment, which can impact on the drug response.


Abstract Background
Breast cancer is one of the most important causes of mortality in the world, and Tamoxifen therapy is known as a medication strategy for estrogen receptor-positive breast cancer. In current study, two hypotheses of Tamoxifen consumption in breast cancer cell line (MCF7) were investigated. First, the effect of Tamoxifen on genes expression pro le at transcriptome level was evaluated between the control and treated samples. Second, due to the fact that Tamoxifen is known as a mutagenic factor, there may be an association between the alterations of genetic variants and Tamoxifen treatment, which can impact on the drug response.

Methods
In current study, the whole-transcriptome (RNA-seq) dataset of four investigations (19 samples) were derived from European Bioinformatics Institute (EBI). At transcriptome level, the effect of Tamoxifen was investigated on gene expression pro le between control and treatment samples. Moreover, Tamoxifen is known as a mutagenic factor, therefore, its contribution to alterations of genetic variants and drug response were examined.

Results
Results achieved from RNA-seq analysis indicated the contribution of several candidate genes to tumor suppression process and consequently, the achievement of an effective treatment. For instance, XIAPassociated factor 1 (XAF1) was reported as an up-regulated gene under Tamoxifen therapy. XAF1 is a tumor suppressor that contributes to the apoptosis induction and tumor growth inhibition along with TP53. Results of gene ontology enrichment analysis of differential gene expressions indicated that most of them could considerably lead to the cell death, apoptosis, and negative regulation of proteolysis process. Findings achieved from evaluating Tamoxifen mutagenicity effect on drug response was not con rmed perfectly. The most reported candidate genes, which were related to differential genetic variants between control and treated samples, played the oncogene and tumor suppressor dual roles and also their exact roles in breast cancer were not investigated precisely.

Conclusion
At transcriptome level, Tamoxifen consumption in MCF7 cell line could be associated with candidate genes and biological pathways that contribute to the apoptosis, proteolysis, and tumor suppression. The mutagenicity effect of Tamoxifen and its contribution to drug response was not con rmed perfectly.

Page 3/33
Background Cancer is one of the most important causes of mortality in the world and breast cancer is a common disease among women. Hormonal therapy is a medical strategy for breast cancer treatment ( Akram et al. 2017). Tamoxifen is considered as the main non-steroidal drug in breast cancer treatment for postmenopausal women (Abo-Touk et al. 2010), which inhibits the estrogen activity through binding to the estrogen receptor competitively (Geisler, 2011).
There are several investigations carried out with the purpose of illustrating the hormonal therapy effects that provide a better understanding of the drug response mechanism and select an effective strategy for the therapeutic period Genetic variants refer to the genetic differences between individuals of a population (Kharrati et al., 2012). DNA is a vulnerable molecule against various mutagens including ultraviolet, toxins, chemical agent, and free radicals (Koopaee and Esmailizadeh, 2014). Recently, high-throughput sequencing platforms have been applied as powerful tools in order to investigate the association between a massive number of genetic variants and drug response (Madian et al. 2012;Esfandiari et al., 2020).
It is shown that Tamoxifen has a mutagenic effect on the endometrium cells and increases the incidence of endometrial tumors (White and Smith, 1996). Results of evaluating the rat hepatic tissue showed that activated Tamoxifen could bind to guanine N2-position of DNA and consequently, produce pro-mutagenic lesion (Brown, 2009). More importantly, it was found that Tamoxifen mutagenicity effect induced DNA damages in human endometrial cells (Liapis et al., 2008). Considering the in vitro conditions, Tamoxifen would lead to the gene mutations and increased incidence of abnormal chromosomal structures in rat liver tissues (Davies et al., 1997). All of the above-mentioned literature reviews indicated that Tamoxifen could play a critical role in the alterations of genetic variants background. There are several examples about the role of genetic variants in drug response. To achieve a therapeutic effect, there has to be an interaction between the drug and its target. DNA variations can both increase and decrease the binding a nity of a drug to its target. In addition, genetic variations can transform the antagonist role of drug into an agonist one; therefore, the most common problem of treatment procedures is resistant mutations in drug targets. Tamoxifen blackens Estrogen (ER-positive cancer) in the breast cancer treatment procedure and consequently, decreases the risk of cancer recurrence. Tamoxifen is an anti-estrogen hormone that inhibits the Estrogen receptors; however, its e ciency would be decreased as a result of the occurrence of mutations in estrogen receptors and it leads to the conversion of RE-positive into progesterone-positive cancer. Consequently, it causes the drug resistance development and nonresponse to treatment (Wardell et al. 2019).
It is noteworthy that genetic variations may contribute to drug metabolism and in uence the drug response. For instance, if the drug is rapidly metabolized, its concentration will decrease due to the weaker drug action or side effects. Considering slower metabolism procedures, higher drug levels would result in the stronger or longer actions and side effects (Levy and Polatsek, 2002). Gene expression analysis at transcriptome level is considered as another considerable tool that can show the effects of hormone therapy on genes expression pro les. It has been shown through several studies that there is a close association between transcriptome response and therapeutic drug consumption (Sjöström et  Current study investigates two hypotheses of Tamoxifen consumption in breast cancer cell line (MCF7). First, there may be an association between genetic variants alterations and Tamoxifen treatment due to the fact that Tamoxifen is a mutagenic factor that impacts on the drug response. Second, how does Tamoxifen affect the gene expression pro le at transcriptome level? Investigating Tamoxifen therapychanged genetic variants and genes expression could be useful to understand the drug response mechanism; also, it would provide a new insight to the increase of the chance of survival, decrease the side effects, and select an appropriate strategy for the therapy period.

Materials
The summary of data collections, genetic variant, and RNA-seq analyses are represented in Figure 1.

Data collection
In current study, the whole-transcriptome (RNA-seq) dataset of four investigations were derived from EBI (https://www.ebi.ac.uk/). More details of collected datasets were provided in Table1.
Table1. More details of RNA-seq datasets to discover the genetic variants and gene expression analysis was observed within cancer cell lines. Therefore, the variant calling procedure was carried out using the low frequency algorithmon the basis of the following parameters: required variant probability (%) = 95.0 ignore broken pairs = yes, minimum coverage = 10, minimum count = 2, minimum frequency (%) = 30, base quality lter = Yes, neighborhood radius = 15, minimum central quality = 30, and minimum neighborhood quality = 25 (Doan et al., 2012). Chi-square test was performed with the purpose of explaining the differences of genetic variants distribution between control and treated samples.

Comparing the variants and gene ontology (GO) enrichment analysis
After performing the variants calling process, genetic variants of treatment samples were compared with the reads of control samples in order to remove the common genetic variants among treated and control samples. The le of gene ontology association, which included the gene names and associated gene ontology terms, was downloaded from the gene ontology consortium (http://geneontology.org/) and imported to CLC Genomic Workbench 12. Moreover, differential genetic variants were applied to perform GO enrichment analysis at levels of biological process, molecular function, and cellular component. The signi cance of the level of GO analysis was determined to be 0.01.

RNA-seq analysis
RNA-seq analysis was carried out based on the annotated genome reference with transcripts using various parameters including mismatch cost=2, insertion cost=2, deletion cost=2, length fraction=0.8, and similarity fraction= 0.8; also, the expression value was calculated through RPKM implementation. Reads per kilobase of transcript, per million mapped reads (RPKM) is a normalized unit of transcript expression. Results achieved from RNA-seq analysis of treated and control samples were compared in order to identify the differential genes expression (DGE). The signi cant level (P≤0.01) was considered as a threshold of DGEs detection. Heat map and volcano plot were implemented to visualize the RNA-seq analysis results; also, the process of RNA-seq analysis was carried out using CLC genomic workbench 12.

Genetic variants detection
Results of quality control indicated that there was not any necessary special trimming strategy for RNAseq datasets. To minimize the alignment errors, 5% of the short reads with the lowest Phred scores were removed. Results of alignments of short reads against reference genome (hg 38) are provided in Table2. Furthermore, 66%-89% was reported for the mapping percentage.
Table2. The mapping summary of short reads against reference genome There were almost 5.8 million genetic variants identi ed in current study including single nucleotide variations (SNVs), multi nucleotide variations (MNVs), insertion, deletion, and replacement. The highest and lowest frequencies among detected genetic variants were respectively related to SNVs and replacement. More details of genetic variants frequencies are provided in Figure 2.
To investigate the effect of Tamoxifen on genetic variants distribution within control and treated samples, a statistical analysis was separately carried out for each genetic variant on the basis of chi-square test for total genetic variants in the control and treatment samples. Results showed that genetic variants distribution within control and treated samples was signi cant (P≤0.05). Also, it was found that all of the genetic variants distributions in control and treatment samples were signi cant (Table 3), which indicated the possible effects of Tamoxifen on the genetic variants frequency. Results of the comparison between genetic variants of control and treatment samples indicated that there were 117 differential genetic variants. Among all of the differential variants, 13 genetic variants were located in the coding regions and 12 variants led to the amino acid sequence transformation within the protein structure. Table 4 shows more details of differential genetic variants. The process of gene ontology enrichment analysis of differential genetic variants was carried out at three levels of biological process, cellular component, and molecular function; therefore, a total number of 77 signi cant GO terms were reported (Table 5). At the biological process level, the most repetitive of reported overlapping gene names were GEN1, HSPA5, NSMCE2, AURKA, and DDX11 candidate genes. GEN1 (Flap endonuclease GEN homolog 1) encoded a member of Rad2/xeroderma pigmentosum group G nuclease family. As it was observed for BRCA1 and BRCA2, ndings showed that GEN1 contributed to resolve the Holliday junction in the homologous recombination. It is noteworthy that Holliday junction can play a vital role in the cancer chemo-sensitivity (Wu et al., 2016). Somatic truncating GEN1 mutations have been reported in breast cancers; therefore, it would indicate the fact that GEN1 may be a predisposition gene in breast cancer. However, it was shown that although it plays a critical role in the double-strand DNA break repair, GEN1 would not make any appreciable contribution to breast cancer susceptibility through acting as a high-or intermediate-penetrance breast cancer predisposition gene such as BRCA1, BRCA2, CHEK2, ATM, BRIP1, and PALB2 (Turnbull et al., 2010). Sun et al. (2014) suggested that GEN1 would play a vital role in DNA damage response; therefore, its alteration could lead to the breast cancer. HSPA5 (Heat-shock protein 5) is considered as a marker of poor prognosis in breast cancer patients, and plays a critical role in promoting the drug resistance and metastasis (Chang et al., 2014). A close association was observed between the cancer behaviors of heat shock proteins family; however, all members of HSP family have not been studied completely (Zoppino et al., 2018). NSMCE2 is an E3 SUMO ligase and a subunit of SMC5/6 complex that could be associated with DNA repair (Pond et al., 2019). Although SMC5/6 complex functions were not described precisely, it was reported that it could act as a tumor suppressor in mice (Jacome et al., 2015). AURKA (Aurora Kinase A) is a serine/threonine kinase that contributes to the regulation of cell cycle progression; therefore, it could be a potential cancer susceptibility gene (Cox et al., 2006 (Fig. 3). Therefore, it could be said that Tamoxifen therapy may impact on gene expression pro les at the transcriptome level among treated and control samples.
The visualization of RNA-seq analysis results derived from control and treated samples were shown in a heat map (Fig 4). It was also shown that mitochondrial respiratory genes were expressed at lower levels within treated samples compared to control ones. In current study, it was reported that MT-CO1, MT-CO3, MT-ND2, MT-ND4, MT-ND5, MT-ND6, and MT-ATP6 were the mitochondrion respiratory genes. MT-ND genes provide NADH dehydrogenase. This protein is a part of a large enzyme complex encoded by the mitochondrial genome. Moreover, the dysfunction of MT-ND proteins would lead to the electron transport chain disruption and ATP production. MT-CO genes encode Cytochrome C Oxidase subunits within mitochondria. It is found that they were the last enzyme in the mitochondrial electron transport chain for ATP synthesis ( GATA binding protein 3 (GATA3) is a highly conserved transcription factor that belongs to GATA family and leads to the expression of a large number of important genes (Voduc.et al., 2008). Furthermore, it contributes to the human growth and differentiation cells including the mammary tissue. Lower levels of GATA3 expression in breast tumors are associated with larger tumors. Therefore, GATA3 is considered as an important gene in breast cancer development; however, its exact role as an oncogene or tumor suppressor is unclear (Afzaljavan.et al., 2021;Takaku.et al., 2015). Keratin 18 (KRT18) is a member of the intermediate lament family of cytoskeletal protein that is involved in the tissue integrity, and its overexpression has been reported in many cancers (Zhang et al., 2019). It was also reported that KT18 was over-expressed in breast cancer and played a vital role in the breast tumorigenesis and tumor dedifferentiation (HA et al., 2011). Insulin-like growth factor binding proteins (IGFBPs) would regulate many cellular processes such as cell proliferation. IGFBPs act as binding proteins for insulin-like growth factor (IGF); furthermore, it is evidenced that they play a critical role in the cancer progression, especially in breast cancer (Hermani.et al., 2013). However, there are various reports regarding their activities as oncogenes or tumor suppressors. IGFBP5 may be considered as an oncogene due to its contribution to metastasis, proliferation, and limited responses to endocrine treatment; also, it acts as a tumor suppressor because of its apoptotic role, anti-metastatic function, and anti-migratory effects (Akkiprik et al., 2015) Sulfatase family, which includes sulfatase1 (SULF1) and sulfatase 2 (SULF2), plays an important role in the multiple biological pathways through regulating the sulfation status (Jiang.et.al, 2020). It was con rmed that SULF2 would promote the breast cancer progression and regulate the tumor-related genes expression in breast cancer (Zhu.et.al, 2016).
Results of the whole transcriptome analysis showed that there were 21515 DGEs among control and treated samples, while ndings of Volcano plot indicated that most of DGEs were classi ed as the downregulated genes. At signi cant levels (P<0.01, -Log 10 (P-value)>2), there were 910 and 3 candidate genes reported as the signi cant down-and up-regulated ones (Fig.5).
Results of Volcano plot showed that three candidate genes including GREB1, EGR3, and XAF1 were clustered as the up-regulated genes. It was also found that the estrogen-based growth regulation in breast cancer 1 (GREB1) was an early estrogen-responsive gene, and there was a close association between GREB1 expression and estrogen levels in breast cancer patients. In fact, GREB1 was an ESR1 (estrogen receptor 1) that could mediate the estrogen action. It was reported that the optimal level of GREB1 expression was required for breast cancer cells proliferation (Cheng.et al, 2018). However, GREB1 knockdown could prevent the breast cancer cell lines proliferation; therefore, it was found that targeting GREB1 could provide a possible treatment strategy through inhibiting the tumor-promoting pathways (Hodgkinson.et al., 2018). Early growth response (EGR) is a family of transcription factors that contributes to various biological pathways (Pio.et al., 2013). It was reported that EGR3 could be induced by estrogen in breast cancer MCF-7 cells and consequently, become involved in the estrogen-signaling pathway (Inoue.et al., 2004). Moreover, EGR3 levels were signi cantly higher within tissue samples derived from patients with recurrent breast cancer compared to those with primary tumors (Knudsen.et al., 2020). XIAP-associated factor 1 (XAF1) is a tumor suppressor observed in the multiple human neoplasms (Shin et al., 2017). It was shown that XAF1 loss expression would be resulted from tumor staging and its dysfunction was associated with tumor progression. Moreover, its appropriate expression could play a critical role in the apoptosis inductions and tumor growth inhibition in the gastric cancer It is noteworthy that PROM1, FBN2, and KLHL14 were highly down-regulated (Fig.5). Prominin 1 (PROM1orCD133) is known as a biomarker of cancer stem cells; however, its biological role is not illustrated perfectly (Saha et al., 2020). Findings showed that there was an association between PROM1 levels and malignancy properties stages including initiation, progression, and metastasis. Moreover, it was reported that PROM1 would contribute to the cell motility and invasion, and may affect the malignancy of breast tumors. Also, PROM1 genes were highly expressed in TNBC cell lines (Brugnoli et al., 2019;Guo et al., 2017). Fibrillin-2 (FBN2) is an extracellular calcium-binding micro bril that contributes to several biological pathways including the bone mineralization, osteoblast maturation, and calcium binding (UniProtKB: P35556). FBN2 is considered as a biomarker of cancers early diagnosis. For example, Promotor hypermethylation of FBN2 is associated with colorectal cancer as an early event. In fact, Methylation may lead to FBN2 down-regulation in primary tumors (Yi.et al., 2012), while Kelch-like protein 14 (KLHL14) belongs to Kelch family genes and interacts with torsin-1A (UniProtKB: Q9P2G3). It was shown that KLHL14 was signi cantly overexpressed in breast cancer compared to normal breast tissues, and had a positive relationship with tumor aggressiveness (Fritzsche.et al, 2006). Moreover, ndings indicated the vital role of KLHL14 in the development of various cancers including ovarian cancer (Chen.et.al, 2020).
Results derived from GO enrichment analysis of DGEs showed that most of DGEs were enriched in the regulations of apoptosis and cell death pathways (Table 6). Cell cycle damage is considered as the main cause of cancer incidence; therefore, the balance between proliferation and cell death is disrupted in cancers (Robert and David, 2005). It was shown that apoptosis inactivation would play a vital role in the process of cancer development (Brown and Attardi, 2005). Therefore, it could be said that signi cant GO term of apoptosis pathways could contribute to cancerous cell death under Tamoxifen therapy.
Proteolysis is a hydrolysis reaction that occurs when peptide bonds and proteins are broken down into smaller polypeptides or amino acids. There is an association between the metastasis of malignancy tumor and overexpression of proteolytic enzyme. More importantly, proteolysis inactivation in cancerous tissue plays a critical role in the inhibition of tumor invasion, angiogenesis, and migration (Wyganowska-Świątkowska et al., 2019). Interestingly, our ndings suggested that negative proteolysis regulation and consequent regulations of proteolytic pathways could be regarded as considerable GO terms that control the cancer under Tamoxifen treatment (  vesicle  14  ACSL4, AKR1B1,  CRISPLD1, CTSZ, FSTL1,  GREB1, GSTP1, IGFBP7,  ITM2C, LDHB, MSN,  PSAT1, RAB34, SCNN1A   0044432  28  Endoplasmic  reticulum part   9  ACSL4, COL9A2, CTSZ,  FADS2, FSTL1, GJA1,  IGFBP3, IGFBP7, RCN1 Discussion Generally, breast cancer tumors are hormone receptor-positive with highly estrogen-and progesteronedependent growth rates. Tamoxifen is a type of hormonal therapy implemented with the purpose of treating the estrogen receptor-positive breast cancer; also, it can decrease the risk of invasive cancer development. However, it is able to affect the gene expression pro le at transcriptome level (Taylor et al., 2010). Furthermore, the activated form of it could bind to DNA and act as a mutagen factor (Brown, 2009); therefore, it may impact on the drug response process through the alteration of genetic variants.
To con rm this idea, Chan et al. (2016) showed that the genetic alteration of variants would contribute to drug response and provide a new insight about pharmacogenomics implication. Therefore, it could be said that Tamoxifen consumption may lead to several biological pathways that can increase or decrease the chances of survival and effective treatments. Results achieved from RNA-seq analysis in MCF7 cell line under Tamoxifen therapy revealed that there were several candidate genes and biological pathways that could result in the tumor suppression and consequently, effective treatments. XIAP-associated factor 1 (XAF1) can be up-regulated under Tamoxifen therapy; also, it is a tumor suppressor that plays a critical role in the apoptosis induction and tumor growth inhibition in gastric cancer (Tu.et.al, 2009 reported that Tamoxifen could regulate the gene expression in breast cancer cells and indicated several Tamoxifen-regulated candidate genes including SOCS1 and IEX-1, which was in accordance with our ndings. Cytokine Signaling 1 (SOCS1) suppressor is considered as a tumor suppressor that plays a negative regulatory role for cytokine action through JAK/STAT pathway and suppresses the growth of hepatocellular carcinomas. IEX-1, which is an immediate early response gene that can widely be expressed in various tissues, is overexpressed in breast cancer cells and has an inhibitory effect on breast cancer cell proliferation. Generally, it could be concluded that Tamoxifen implementation in the treatment procedures could be bene cial at the transcriptome level; however, it may induce the oncogene expression in rodent uterine (Nephew et al., 1996).
Our hypothesis regarding Tamoxifen mutagenicity effect and its role in the drug response was not approved appropriately. It was found that most of the candidate genes with differential genetic variants had dual roles as oncogenes or tumor suppressors; however, their exact contribution in breast cancer has not been investigated precisely.
For example, ndings of the genetic variant analysis revealed that differential genetic variants between control and treated samples (under Tamoxifen therapy) were overlapped with NSMCE2 and DDX11 candidate genes. NSMCE2 contributes to DNA repair pathway and acts as a cancer suppressor in mice (Jacome.et.al, 2015). DDX11 is a member of helicase family, which involves in DNA replication pathways; however, its role in cancer has not been investigated precisely. DNA helicases may have a tumor suppressor function in cancers (Mahtab.et.al, 2021).
It was revealed by GO analysis that reported differential genetic variants were associated with FNTA, HSPA5, IL-6, and AURKA candidate genes. FTase enzyme (FNTA) could lead to the tumor progression and abnormal copy numbers of FNTA were associated with pathological changes of breast cancer (Tian.et.al, 2020). HSPA5 (Heat-shock protein 5) promoted the drug resistance and metastasis (Chang et al., 2014), while Interleukin-6 (IL-6) contributed to the expansion and differentiation of tumor cells (Masjedi et al., 2018). AURKA gene was associated with potential cancer susceptibility and became involved in the cell cycle progression regulation (Cox et al., 2006).

Conclusion
Results of current study that was carried out at transcriptome level showed that Tamoxifen consumption in MCF7 cell line could be associated with candidate genes and biological pathways that contribute to the apoptosis, proteolysis, and tumor suppression. Moreover, Tamoxifen could decrease the expression of candidate genes involved in tumor progression, invasion, and metastasis. The mutagenicity effect of Tamoxifen and its contribution to alterations of genetic variants was not con rmed perfectly. Therefore, it was suggested that Tamoxifen could not have any signi cant role in an effective treatment through changing the genetic variants pro le. Availability of data and materials The datasets analyzed during the current study are available in the European Bioinformatics Institution (EBI) repository (Table1).

Con ict of interest
The authors declare no con ict of interest. Author's contributions HKK, MNC and STH conceived the study. HKK contributed to data analysis and prepared the primary draft of manuscript. AD, MD, and KBL revised the manuscript. All authors read and approve the manuscript.   The heat map of RNA-seq analysis for control and treated samples