A pivotal aspect in any gene expression study is the selection of appropriate internal controls that are expressed uniformly irrespective of culturing conditions, experimental treatment, nutrient stress etc. These controls (or references) normalize any variations in starting quantities, calibration issues or poor pipetting, thereby providing accurate results. However, complex gene-gene interactions and environmental effects on gene expression complicate identification of such controls. Another layer of complexity in this pursuit is added by intrinsic heterogeneity of the cancer cells. Several tools and algorithms (NormFinder, geNorm, BestKeeper, Comparative ∆Ct and RefFinder) have been developed that can aid in sorting, selection, and validation of the reference genes. However, the considerations regarding applicability of these algorithms and the acceptable cut-off limits are often misinterpreted or misapplied. Finally, identification and extensive validation of the reference genes for each type of biological object in different conditions may not be feasible in every instance. Such studies are often time and resource (labor, financial) consuming. In the present study, we have identified and validated novel reference genes for normalization of RT-qPCR results in SK-BR-3 breast cancer cell line.
In the past the TCGA database has been commonly used to identify novel and reliable candidate genes in different cancer types [37–38], but it has never been investigated individually for HER2-E subtype of breast cancer. Our analysis of TCGA legacy data revealed that GABARAP, ACTB and PFN1 were the top three genes with the least variation amongst the samples (Fig. 1b). However, upon validation in the SK-BR-3 cell line, we found that the least variation amongst the samples was displayed by RBX1, UBC and CFL1 (Additional File 3). These differences between TCGA database and RT-qPCR results are normal and expected. Firstly, the database represents a heterogenous mixture of similar subtypes while the SK-BR-3 is a single cell line, therefore constitutes a more homogenous sample. Secondly, all samples of TCGA repository come from primary untreated tumors collected from different institutes. Thirdly, there is an inherent possibility of bias in the biorepository generation process, stemming from different institutional research interests, operative protocols, or patient populations . Also, it must be taken into account that tissue specimens (TCGA data) in addition to cancer and normal mammary cells contain various types of stromal and immune cells, which are absent from the cell culture samples. This type of heterogeneity also may have effects on the overall transcription levels and stability. The database accommodates for most of the possible gene expression variations in tissues (in vivo) and, since cell lines like SK-BR-3 have been under constant cultivation over long-term, they tend to behave like outliers when compared to the TCGA database. Nonetheless, such extrapolation, bridging and application of in vivo data to in vitro conditions enable us to find common core genes which are uniformly expressed in both scenarios.
Gene Ontology (GO) analysis (Table 2) revealed that the candidate genes selected based on the TCGA samples and literature were most significantly enriched in cellular response to cytokine stimulus (GO:0071345; CFL1, GAPDH, HSP90AB1, PPIA, PSMB4, RBX1, RPL13A, TUBA1B and UBC). Indeed, it is well known that breast cancer cells respond to various cytokines in their microenvironment. The cells have been shown to evade cytokines like TGF-β in the early stages (due to its anti-proliferative effects), however, in the later stages TGF-β stimulates the progression of the disease by induction of epithelial to mesenchymal transition (EMT type 3) [72–74]. IL-1 (adipokine) is known to increase the aromatase activity in SK-BR-3 cells, resulting in generation of bioactive estrogens and increased cellular proliferation . Other cytokines like IL-6, IL-19, IL-20, TGF-α, TNF-α, and IL-23 are also known to promote cancer progression . Using data available from Mouse Genome Informatics (MGI; http://www.informatics. jax.org/), CFL1 was associated with cellular response to IL-1, IL-6 and TNF, GAPDH and RPL13A were associated with response to IFN-γ, while HSP90AB1 and TUBA1B were associated with the response to IL-4. Apart from cytokine response, the genes were found to be enriched in viral life cycle (GO:0019058; BSG, HSP90AB1, PCBP1, PPIA and UBC) and positive regulation of viral processes (GO:0048524; BSG, PFN1 and PPIA). Evidence of various virus-related DNAs like EBV (Epstein-Barr virus), HPV (Human Papillomavirus), BLV (bovine leukemia virus) and MMTV (Mouse mammary tumor virus) has been found in breast cancers [76–79]. In fact, Lawson et al., demonstrated presence of HPV associated pre-malignant koilocytes in normal and malignant breast tissues , indicating the oncogenic correlation between viruses and breast cancer and the possible explanation for the enrichment of virus associated pathways in SK-BR-3 cell line. Using data from MGI, we found that BSG was associated with positive regulation of viral entry into the host cell, while HSP90AB1 was associated with virion attachment to the host cell. On the other hand, PFN1 and PPIA were found to be associated with positive regulation of viral transcription and viral genome replication, respectively.
To understand the role of the four validated reference genes in metabolic processes, GO biological processes analysis in association with data from MGI were investigated. The genes were found to be associated with negative regulation of apoptosis (DAD1) and nucleobase containing compound metabolic process (Table 2). HSP90AB1 was found to be associated with positive regulation of activities of telomerase, phosphoprotein phosphatase and protein serine/threonine kinase. PFN1 was found to play a role in positive regulation of DNA metabolic processes and of transcription by RNA polymerase II, while PUM1 was associated with regulation of mRNA stability and production of miRNAs (microRNAs). Based on Molecular function, HSP90AB1 and PFN1 shared two common ontologies – RNA binding and Cytoskeletal protein binding (Additional File 2), both of which seemed to be downregulated in hypoxic conditions (for PFN1, fold change was 0.68 and 0.45 after 24 hours 72 hours of acute hypoxia, respectively, and 0.89 in chronic hypoxia). With respect to cellular component analysis the regulation of different genes (even from the same ontology) during hypoxia was divergent/unrelated. Whilst PUM1, HSP90AB1 and PFN1 shared two ontologies - nuclear and cytosol associated genes, the genes were differently regulated with PUM1 showing upregulation while other two showed downregulation in gene expression.
Our analysis revealed that the four reference genes (HSP90AB1, DAD1, PUM1 and PFN1), when used in any combination of three, successfully normalized the genes of interest (AURKA, BUB1 and SNAI1) in both hypoxic and normoxic conditions over multiple passages. HSP90AB1, previously known as HSPCB, was first described as a candidate reference gene by Jacob et al.  in a variety of 25 different cancer cell lines. However, the only breast cancer line in that study was MCF-7 (Luminal A subtype). The gene has been included among the most stable reference genes in other tissues/organs, e.g., ovary, muscle tissue, adipose tissue etc. [81–82]. Heat Shock Proteins like HSP90AB1 have been previously shown to be downregulated in response to hypoxia in pig adipose-derived stromal/stem cells  as well as in human hepatocytes . Such downregulation (fold expression change of 0.71 after 24 hours and 0.88 after 72 hours in acute hypoxia and 0.52 in chronic hypoxia) appeared to have an impact on the stability of the gene by marginally lowering the expression stability in hypoxic conditions (Fig. 5).
DAD1, a novel reference gene identified in the present study, is a small integral membrane protein of the oligosaccharyltransferase (OST) enzyme complex involved in the highly conserved asparagine-linked glycosylation of proteins in all eukaryotic cells . The gene has been shown to be involved in apoptosis. Loss of DAD1 gene led to apoptosis in hamster cell lines and yeast cells [86–87]. In fact, DAD1 was preferentially expressed in hepatocellular carcinoma (HCC) and prostate cancer cells [88–89]. Hence, it has been postulated that high expression of DAD1 in HCC cells can activate OST and block apoptosis, thereby enhancing tumor cell survival . We speculate a similar role of the gene in the SK-BR-3 breast cancer cell line that could explain its consistent and stable expression. In A431 epithelial carcinoma cells, DAD1 was upregulated in hypoxic conditions (by a fold change of 1.2) after exposure of 72 hours . In our results, however, DAD1 was downregulated by 0.64-fold change after exposure to acute hypoxia for 24 and 72 hours. The expression then increased and approached normoxia levels during chronic hypoxia (fold change 0.95). The differences in our results from those in A431 cells could be explained by the cardinal differences in background transcriptome and epigenome of A4321 and SK-BR-3 cells. Furthermore, there were differences in culturing conditions. While our cells were exposed to 2% O2 in acute hypoxia, the A431 cells were exposed to < 0.1% O2 . Our results suggest that SK-BR-3 cells after an initial shock phase quickly adapted to chronic hypoxic conditions and continued to express core genes in levels similar to normoxia. This is also supported by the fact that the expression stability of DAD1 improved in hypoxia, i.e., it was expressed even more stably after exposure to hypoxia.
PFN1, another novel reference gene identified in the present study, is often regarded as the founding member of its family constituting four profilin genes (PFN1, PFN2, PFN3 and PFN4). These genes have been associated with almost every aspect of cellular functions including proliferation, survival, motility, endocytosis, membrane trafficking, mRNA splicing as well as gene transcription [91–92]. The overexpression of PFN1 could negatively regulate cancer cell motility in breast cancer cells . Additionally, it has been demonstrated that in triple negative breast cancer cell lines, overexpression of PFN1 suppresses AKT (serine-threonine kinase) activation via upregulation of PTEN (phosphatase and tensin homolog) , indicating the tumor-suppressive character of PFN1 gene. Similar observations have also been reported in pancreatic cancer cells . These findings are of interest since the expression of PFN1 in the SK-BR-3 cell line seems to be uniform and stable despite its anti-tumorigenic nature. In fact, depletion of PFN1 in breast cancer cells has enabled hyper-migratory phenotype in vitro and enhanced hematogenous dissemination from primary tumor in vivo . We can hypothesize that some downstream regulation is at play, which leads to decreased PFN1 protein levels in breast cancer cells despite rather uniform expression of the gene.
Finally, the fourth reference gene identified for the SK-BR-3 cell line, PUM1 has been previously described as a candidate reference gene in various cancers including breast cancers [28,33,35]. The gene has been associated with cancer cell growth, migration, and invasion . In fact, amongst the four selected reference genes in our study, only PUM1 showed an increase in expression fold change in hypoxia. The fold change was 0.80 and 1.13 after 24 hours and 72 hours of acute hypoxia, respectively, and reached 1.56 in chronic hypoxia. These findings indicate the crucial role played by the gene in the maintenance of cellular functions in hypoxic conditions. Another candidate gene, RPL13A, has been previously described as a stable reference gene in breast cancers [27,33]. The expression of RPL13A was quite stable in our analysis, however, it did not yield successful normalization of the genes of interest.
Various other reference genes in the past have been described to be suitable reference genes in breast cancers including GAPDH, ACTB, PPIA and RNA28S/RNA18S [26,29,32,34]. However, there is also contrary evidence suggesting against the use of these genes as references [18–22]. Use of GAPDH as a reference gene has long been a topic of debate. It is overexpressed in cervical, prostate, pancreatic and lung cancers, and it has been the least stable gene in multiple studies [29,34,98–99]. Similar concerns have been raised concerning use of ACTB as a single internal control . RNA18S and RNA28S have been reported previously to be stable reference gene candidates , however, concerns regarding absence of purified mRNA samples and their relatively high abundance compared to the target mRNAs have been reported . Both RNA28S and RNA18S were highly expressed (mean Cq = 7–8) in the present study and in our study of MCF-7 cell line . Secondly, these two genes are not included in the TCGA database, which makes it difficult to compare their expression between in vivo and in vitro scenarios.
PPIA along with ACTB was found to be the most stable reference gene for basal type breast cancer cell lines in hypoxic and serum deprived conditions . However, our analysis revealed that the expression of PPIA was moderately stable and ranked 9th in the combined analysis (Additional File 3). In our study of MCF-7 breast cancer cell line, we identified GAPDH-CCSER2-PCBP1 triplet as the most stable reference gene triplet which could be used to normalize the expression of genes of interest in various nutrient stress conditions . However, the expression of CCSER2 was extremely variable in the present study. The gene did not reach the thresholds set for TCGA analysis, and the expression of the gene was low (mean Cq = 27.4). Hence, we eliminated it from our analysis in early stages. PCBP1 and GAPDH were among the top 15 most stable reference genes in our analysis (Additional File 3). Interestingly, GABARAP was identified to have the least variation (CV%) among our novel candidate reference genes, however, the gene was associated with low expression and poor primer efficiency when confirmed by RT-qPCR (Additional Files 1 and 3).
Nonetheless, the results of the present study are constrained by some limitations. First, the expression stability of the reference genes was validated in vitro only. Second, these reference genes were tested in normoxic and hypoxic conditions only. Their use for normalization of expression in other conditions such as nutrient stress, drug treatment etc. remains to be validated. Finally, given the heterogeneous behavior of cancer cells, there is a need for inter-laboratory validation to further confirm our results. The major question that arises is how many more reference genes we need to identify. Rather, how many reference genes can we identify. Although a precise number will be difficult to predict, the estimates in normal human tissues (using ESTs) predict the numbers to be in the range of 3100 to 6900 genes , thereby making a plethora of reference genes still waiting to be identified and validated that could be more promising than the ones previously reported. However, we agree with other authors that rather than testing thousands of genes, we need to validate a panel of reference genes whose expression under varying conditions can be proven to be as minimally variable and as robust as possible . Accordingly, based on our analysis, we suggest the use of HSP90AB1, DAD1, PFN1 and PUM1 in any combination of three (triplet), thereby giving other researchers not one but four different combinations to choose from based on their individual experimental designs and needs.