Cancer Stem Cell Transcriptome Landscape Reveals Biomarkers Driving Breast Cancer Heterogeneity

Breast cancers are heterogeneous diseases with distinct clinical outcomes and cancer stem cell percentages. Exploring breast cancer stem cell landscape could help understand the heterogeneity of such cancers with profound clinical relevance. Methods We conducted transcriptional proling of cancer stem cells and non-cancer stem cells isolated from 3 triple negative breast cancer cell lines, analyzed the cancer stem cell transcriptome landscape that drives breast cancer heterogeneity through differential expressed gene analysis, gene ontology and pathway enrichment as well as network construction, and performed experimental validations on the network hub gene. Results We identied a cancer stem cell feature panel consisting of 122 and 381 over-represented and under-expressed genes capable of differentiating breast cancer subtypes. We also underpin the prominent roles of the PI3K-AKT pathway in empowering cancer cells with uncontrolled proliferative and migrative abilities that ultimately foster cancer stemness, and reveal the potential promotive roles of ATP6V1B1 on breast tumor stemness through functional in vitro studies. Conclusions Our study contributes in identifying a cancer stem cell feature panel for breast tumors that drives breast cancer heterogeneity at the transcriptional level, which provides a reservoir for diagnostic marker and/or therapeutic target identication once experimentally validated as demonstrated by ATP6V1B1.


Introduction
Breast cancers are highly heterogeneous regarding the molecular feature, pathogenesis and clinical outcome, which can be classi ed into at least luminal A, luminal B, human epithelial receptor 2 (HER2) positive, and triple negative breast cancer (TNBC) subtypes [1]. TNBCs are the most aggressive and heterogeneous among all subtypes, which can be further distinguished into, e.g., luminal androgen receptor (LAR), basal-like immunosuppressed (BLIS), basal-like immune-activated (BLIA), and mesenchymal (MES) subclasses [2], rendering the precise diagnosis di cult to make. Further, TNBCs lack targeted therapies due to the lack of sur cial receptors estrogen receptor (ER), progestrogen receptor (PR) and HER2 [1], and patients carrying TNBCs suffer from severe side effects if treated with chemo-or radiotherapies [3]. Therefore, exploring effective therapeutic strategies with little side effects against TNBCs is an important yet di cult task for breast cancer management.
TNBCs contain higher percentage of cancer stem cells (CSCs) than the other subtypes [4]. Therefore, targeting CSCs, which are considered to promote DNA mutation during carcinogenesis and the generation of heterogeneous bulk cancer cells, brought up new opportunities to the therapeutics of TNBCs. CSCs are not homogeneous across cancer types [5]. Thus, exploring the transcriptomic pattern of CSCs of TNBCs can advance our understandings on the molecular features of TNBCs and help us identify the diagnostic marker and/or therapeutic targets for establishing novel medical platform for precise TNBC management.
By exploring the molecular differences between CSCs and non-CSCs of breast cancer cells, following pathway and gene ontology enrichment analysis, as well as network construction, we identi ed a CSC feature panel consisting of 503 genes that is capable of differentiating breast cancer subtypes as validated using two publicly available datasets. We hypothesize that CSCs drive the molecular heterogeneity of breast cancers and modulating key CSC relevant genes in this panel may alter cancer cell stemness with experimental validations con rmed using ATP6V1B1 that has not been previously associated with cancer stemness. Our study contributes in providing a marker panel capturing breast cancer stemness and stimulating in-depth studies on the novel roles of APT6V1B1 in CSC control.

Differential expression analysis
The GSE132083 data stored at Gene Expression Omnibus (GEO) was used in this study. Differential expression analysis was performed based on student T test and Bayes theorem using the 'llimma' library from the 'Bioconductor' package (http://www.bioconductor.org/). Genes differentially expressed in stem versus non-stem cells were identi ed according to FPKM (fragments per kb per million reads, Equation 1) using the empirical Bayes approach (the 'eBayes' function), with the signi cance criteria set to log2FoldChange ≥ 2 and Benjamin-Hochberg adjusted p value≤0.05 Hierarchical clustering analysis Hierarchical clustering was performed to evaluate the differential expression pattern between stem and non-stem cell cohorts, where Euclidean distance and the Ward linkage were used [6]. An additional microarray gene expression data comprised of 56 cell-lines [7] was used for computational validation. The data was publicly available and stored at European Bioinformatics institution (https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-181/).

Functional enrichment analysis
Functional enrichment analysis was performed based on Gene Ontology (GO; http://www.geneontology.org) and Kyoto Encyclopedia of Genes and Genomes database (KEGG; http://www.genome.jp/kegg/pathway.html) using the R package 'clusterPro ler'. Fisher's exact test was utilized to measure the signi cance of GO terms and biological pathways. The p-values were adjusted using Benjamini-Hochberg false discovery rate (FDR), and the threshold of p<0.01 was used to assess the statistical signi cance of each test [8].
Gene interference and cell transfection assay SKBR3 cells were plated in the 6-well plate, and three siRNAs (RIBOBIO, Guangzhou, China) were designed to silence ATP6V1B1 (supplementary Table 1). SKBR3 cells were transfected with ATP6V1B1 siRNA-1, siRNA-2, siRNA-3 and non-targeting siRNA (negative control) separately and in combination using siRNA mate transfection agent (RIBOBIO, Guangzhou, China). The siRNA concentrations used for transfection were 50 nm for single siRNA. Cells were incubated in 5% CO 2 at 37°C for 36 h and 48 h, respectively, before testing the silencing effect at transcriptional and translation levels using qRT-PCR and Western blot. Primer pairs of ATP6V1B1 used in this study are provided in Supplementary Table 2. Triplicates were conducted, with statistical signi cance cutoff being set at p≤0.05 from student T-test.
Flow cytometry assay SKBR3 cells after siRNA transfection for 36h were harvested and resuspended in PBS. 200,000 cells were incubated with the antibodies in recommended concentrations at 37°C for 0.5 h, followed by PBS washing for two times. Cells were sorted and analyzed by Flow cytometry (BD Biosciences). The ALDEFLUOR Kit (StemCell Technologies) was used to test ALDH activity following the manufacturer's protocol. The Diethylaminobenzaldehyde (DEAB), a speci c ALDH inhibitor, was used as negative control in the ALDH analysis. Flow cytometry data processing was performed with owJov10(Tree Star ,USA).

Transwell invasion and migration assays
Cell invasion and migration were examined in 24-well Transwell chambers (8 μm; Corning, USA). 50 ul Matrigel (2 mg/mL; Sigma-Aldrich) was used to cover the surface of the upper chambers in the invasion assay but not in the migration assay. SKBR3 cells (1 × 10 4 ) in 100 μL FBS-free medium were seeded in the upper chamber followed by the addition of 500 μL complete medium (with 10% FBS) to the bottom well. After 24-h incubation, cells invaded to the lower surfaces were gently cleaned and xed with 4% Paraformaldehyde for 30 min and stained using 0.1% crystal violet for 30 min. Five random elds from the membrane were selected, and the number of migrated cells was counted under the ortho microscope (Nikon; ×100 magni cation) and photographed.

Results
Selection of CSC feature genes There are 212 and 522 genes concomitantly up-and down-regulated, respectively, in both SUM149 and HCC1937 cells (Figures 1A, 1B). Genes over-and under-expressed in CSCs of SUM149PT and HCC1937 were reduced to 122 and 381, respectively, when being mapped to the upper and lower 50% quantiles of SUM159PT (which is comprised primarily of CSCs). These 503 genes were selected as candidates capturing the features of breast cancer stem cells and named as 'CSC feature genes' (Figure 1C,   supplementary Table 3).
Our experimental samples could be nicely clustered based on their cancer stemness following cell linewise difference (Figure 2A), suggesting that these CSC feature genes primarily capture differences on cancer stemness. Such a clustering pattern could be well-reproduced using the 56 breast cancer cell line data retrieved from [7] Figures  3C, 3D), where the critical roles of PI3K/AKT pathway in cancer cell migration [9,10] and differentiation [11] and migration are well-known, and losing tight junctions represents an important feature of cancer metastasis [12].
Regulatory networks were constructed using up-and down-regulated CSC feature genes ( Figure 4).
Overall, more CSC feature genes were down-regulated than up-regulated, which is in accordance with the less differentiated state that CSCs represent. PRKCA and ESR1 each has the most connectivity among the up-and down-regulated CSC feature genes, where PRKCA is a central signaling node and therapeutic target for breast cancer stem cells [13] and co-inhibiting ESR1, mTORC1, HDAC suppresses breast cancer stemness [14].
Functional studies on one CSC feature gene Among these CSC feature genes, selected ATP6V1B1, a gene with the highest connectivity next to ESR1 and being connected with FA2H (our recently identi ed gene with suppressive role on cancer stemness [15]), and tested its functionalities in transiting cells between states having different CSC and non-CSC percentages.
The siRNA technique was used to silence ATP6V1B1 in SKBR3 cells to explore the functional roles of ATP6V1B1 on breast cancer stem cell regulation. By testing the e ciency of the designed siRNAs separately and in combination, we found that siRNA-1 and siRNA-3 could effectively silence ATP6V1B1 in SKBR3 cells at the transcriptional level ( Figure 5A, p=0.0012 for siRNA-1, p=0.0064 for siRNA-3) and translational level (Figures 5B, p=0.015 for siRNA-1, p=0.007 for siRNA-3). We used siRNA-1 in the following assays. Flow cytometry analysis was conducted following the procedure aforementioned to examine the CSC percentage of cancer cells. The results showed a substantial reduction on CSC percentage after silencing ATP6V1B1, with and without supplementing cells with ALDH inhibitor (DEAB) in SKBR3 cells ( Figure 5C). Transwell assays were performed to test cell invasion and migration according to procedures described in [16]. With non-transfected cells as the control, silencing ATP6V1B1 signi cantly reduced cell invasion (p=0.0335) and migration (p=0.0213) in SKBR3 cells ( Figure 5D). In addition, breast cancer patient overall survival could be signi cantly strati ed by ATP6V1B1 protein expression using the 'Tang_2018 (n=118)' dataset (p=0.0067, Figure 5E).
In summary, by silencing ATP6V1B1 in SKBR3 (considered to be comprised of non-CSCs), cells exhibited increased cancer stem cell percentage, cell invasion and migration abilities, which is consistent with our observation from the transcriptome data that ATP6V1B1 is lowly expressed in CSCs.

Discussion
More genes were suppressed than over-expressed in CSCs, i.e., 381 versus 122 genes, in the CSC feature panel, suggesting a more plastic state that CSCs represent. Thus, CSCs are similar with stem cells in a sense that they are pluripotent, but differ from stem cells as they are chaotic and occur under pathological conditions. Down-regulated genes were enriched in 'tight junction' and 'epidermis development' (Figures 3A, 3C), attracting CSCs in a less differentiated state. Up-regulated genes were enriched in the 'PI3K-AKT signaling pathway' and cell migration related GO terms ( Figures 3B, 3D), implicating the more proliferative and aggressive feature of CSCs than the bulk tumor cells.
The regulatory network constructed using down-regulated CSC feature genes was centered around ESR1 and ATP6V1B1. ESR1, also known as ER, is the primary biomarker used for breast cancer subtyping and typically under-expressed in triple negative breast cancers that harbor higher cancer stemness. Several genes connected with ESR1 have been associated with breast cancer stemness. For example, we previously reported the suppressive role of FOXA1 on breast cancer stemness [17]; SOX9 is as a stem cell factor[18] that drives the epithelial mesenchymal transition (EMT) in non-small cell lung cancer through Wnt pathway [19] and maintains human breast luminal progenitor and breast cancer stem cells through SOX9 mediated signaling [20], and both the SOX9/FXYD3/SRC [21] and the SOX9/SOX2 [20] axes are critical for breast cancer stem cell functionalities; PLA2G7 is associated with ER negativity in clinical breast cancer samples and regulates EMT in vitro [22]. The prognostic values of several markers have been reported, including PTPN6 and KRT19 in breast cancers [23][24][25], EPPK1 in non-small cell lung cancers [26], SERPINB5 in colorectal cancers [27], and NCCRP1 in squamous cell carcinoma [28]. Besides, the roles of S100A7 and SLC37A1 was implicated in breast cancers [29,30], that of SERPINB4 was reported in squamous cell carcinoma [31], and that of MYH14 was lately identi ed in pancreatic cancers [32]. Also identi ed down-regulated in the CSC feature panel include several famous genes in cancer stem cell maintenance or signaling such as CDH1 [33], EPCAM and CLDN7 [34].
While the ER centered cluster has already been well-associated with cancer stemness (i.e., 12 out of 14 densely connected genes already have well-supported evidences), relatively less has been reported on the association between the cluster centered at ATP6V1B1 and CSC features. Most genes in this cluster are involved in cell energy production and metabolic regulations, suggesting the pivotal roles of metabolic reprogramming in CSC feature maintenance. Out of the 11 genes, 5 have known roles in cancer initiation and progression, i.e., SH3YL1 together with DOCK4 regulate breast cancer cell migration [35], ATP6V1C2 is an early prognostic marker for colorectal cancers [36], PRKAR2B promotes EMT and is oncogenic in prostate cancers [37,38], MYO5B is associated with gastric cancers [39,40], and the physiological role of AKR1B15 and its involvement in cancer development has been characterized in [41]. Among the rest 6 members in the cluster, FAT2 encodes a cadherin family member and is aliased as FAT tumor suppressor homolog 2, MYO5C encodes a member of the same family with MYO5B that has already been implicated in cancers, ATP6V1B1, ATP6V1C2 and ATP2C2 encode ATPase subunits, and FA2H encodes fatty acid 2hydroxylase, where the association between FA2H and cancers has been recently reported [15].
Up-regulated genes are centered at PRKCA which is a key component in PI3K-AKT signaling. This is consistent with the pathway analysis and suggests the prominent roles of uncontrolled proliferation and increased migration ability in CSC feature initiation and maintenance. VIM [42], MMP2 [43], FGF2 [44], ITBG3 [45,46], ANXA6 [47] are metastasis-associated genes. The oncogenic roles of KCNMA1 has been revealed in breast, prostate, ovarian and colorectal cancers [48][49][50][51][52], and RRAD is associated with glucose uptake in a human ovarian cancer model [53] and implicated in non-small cell lung cancers [54].
This CSC feature panel can clearly distinguish samples with high and low stemness in GSE132083, and characterize basal-A, basal-B and luminal cell lines in the E-MTAB-181 dataset. Though cell line wise difference exists in the molecular pro ling of GSE132083, it is secondary to the differences characterized by cancer stemness (Figure 2). These not only suggest the validity of identi ed CSC feature in differentiating breast cancer cells with different CSC percentages, but also implicate the prominent roles of CSCs in driving breast cancer heterogeneity.
We experimentally explored the functionalities of ATP6V1B1 and its association with breast cancer stemness, through which the novel role of ATP6V1B1 in suppressing breast cancer stemness has been revealed. However, in-depth investigations on the regulatory mechanism of ATP6V1B1 in breast cancer progression and its potential prognostic value on breast cancer outcome await to be conducted.

Conclusion
We identify a CSC feature panel consisting of 503 genes which can be used for breast cancer cell subtype differentiation and offer a reservoir for diagnostic marker and therapeutic target identi cation. We propose that CSCs are the driving force of breast cancer heterogeneity empowering cells with differential uncontrolled proliferative and migrative abilities. We also reveal the potential encouraging role of ATP6V1B1 on cancer stemness with explorations on the underlying mechanisms await in-depth elucidation.

Ethical Approval and Consent to participate
Not applicable.

Consent for publication
The authors are consent for the publication of this paper.

Competing Interests
The authors declare no con ict of interests