CD133 expression is associated with less DNA repair, better response to chemotherapy and survival in ER-positive/HER2-negative breast cancer

Purpose: CD133, a cancer stem cell (CSC) marker, has been reported to be associated with treatment resistance and worse survival in triple-negative breast cancer (BC). However, the clinical relevance of CD133 expression in ER-positive/HER2-negative (ER+/HER2−) BC, the most abundant subtype, remains unknown. Methods: The BC cohorts from the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC, n = 1904) and The Cancer Genome Atlas (TCGA, n = 1065) were used to obtain biological variables and gene expression data. Results: Epithelial cells were the exclusive source of CD133 gene expression in bulk BC. CD133-high ER+/HER2− BC was associated with CD24, NOTCH1, DLL1, and ALDH1A1 gene expression, as well as with the WNT/β-Catenin, Hedgehog, and Notch signaling pathways, all characteristic of CSC. Consistent with a CSC phenotype, CD133-low BC was enriched with gene sets related to cell proliferation, such as G2M Checkpoint, MYC Targets V1, and E2F Targets, and had higher Ki67 gene expression. CD133-low BC was also linked with enrichment of genes related to DNA repair, such as BRCA1, E2F1, E2F4, and CDK1/2. On the other hand, CD133-high tumors had a proinflammatory microenvironment, higher activity of immune cells, and higher expression of genes related to inflammation and immune response. Finally, CD133-high tumors had a higher pathological complete response rate after neoadjuvant chemotherapy in the GSE25066 cohort and better disease-free survival and overall survival in both the TCGA and METABRIC cohorts. Conclusion: CD133-high ER+/HER2− BC was associated with a CSC phenotype, such as less cell proliferation and DNA repair, but also with enhanced inflammation, a better response to neoadjuvant chemotherapy, and a better prognosis.


Introduction
Although cancer stem cells (CSC) comprise only 0.1-1% of the cancer cells within a bulk tumor, they possess properties of self-renewal, initiate tumors from a single cell, and differentiate to resist treatments, thereby being implicated in causing relapses [1][2][3]. The roles of CSC in breast cancer (BC) have been studied [4][5][6]; however, the results were sometimes contradictory or incongruent due to heterogeneity in the techniques used to detect CSC populations [7][8][9]. For example, some studies utilized protein expression, often detected by immunohistochemistry, while others relied on gene expression of multiple markers.
CD133 was shown to activate the WNT/β-Catenin pathway in vitro, which is an essential signaling pathway for cell proliferation of CSCs [14]. CD133 also activates the NOTCH and Hedgehog pathways, which are other characteristic pathways related to CSCs [15]. Expression of CD133 assessed by immunohistochemistry in 67 patients was shown to correspond to the aggressiveness of triple-negative breast cancer (TNBC) [9]. CD133 was associated with a poorer response to neoadjuvant chemotherapy (NAC) in 102 BC patients when measured by immunohistochemistry [16]. CD133 mRNA overexpression was linked with a poor prognosis in invasive BC [17]. On the other hand, the clinical significance of CD133 expression in ER-positive/HER2-negative (ER+/HER2-) BC remains unexplored.
Despite the distinct biology of each BC subtype and considering that ER+/HER2- is the most prevalent subtype, there is a gap in our understanding of the specific implications of CD133 in this context.
Our group has been employing an in-silico approach to conduct translational research, investigating the clinical relevance of gene expression. Unlike experiments involving cell lines or animals, we have gained comprehensive and reliable insights by analyzing multiple independent large patient cohorts of transcriptomes associated with clinical parameters [18][19][20][21]. In this study, we hypothesized that high CD133 expression is associated with the prognosis of ER+/HER2- BC. To better elucidate the association between the prognosis of ER+/HER2- BC and the CSC surface marker CD133, we utilized two distinct cohorts: The Cancer Genome Atlas (TCGA) cohort, which includes 1065 breast cancer patients, and the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) cohort, consisting of 1904 breast cancer patients.
BC staging was conducted according to the staging guidelines of the American Joint Committee on Cancer.

Gene set enrichment analysis (GSEA)
GSEA investigates the extent to which the expression of genes related to a certain pathway differs between groups. In this study, the cohort was categorized into 'high' and 'low' expression groups using the median value as the cutoff. Fifty Hallmarks of Cancer gene sets in the Molecular Signatures Database (MSigDB) [39] were studied, as previously described by the Broad Institute (http://www.gsea-msigdb.org/gsea/index.jsp) [40]. The Normalized Enrichment Score (NES) was employed to assess the strength of the correlation. The False Discovery Rate (FDR) was utilized for statistical analysis, with an FDR value of less than 0.25 considered the cutoff for significance. This choice aligns with the recommendation by the Broad Institute to adjust for gene set size, considering the multiple gene sets analyzed in our study.
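As a rough illustration of the two steps described above (median dichotomization followed by an enrichment test), the following Python sketch splits samples at the median of a marker gene and computes a simplified, unweighted running-sum enrichment score for a gene set over a ranked gene list. It is not the Broad Institute implementation and not the code used in this study: the function names, gene labels, and values are hypothetical, the tie rule at the median is an assumption, and the actual GSEA software additionally weights by the ranking metric, normalizes the score to an NES, and estimates the FDR by permutation.

```python
from statistics import median


def median_split(expression):
    """Label each sample 'high' or 'low' relative to the cohort median.

    `expression` maps sample id -> expression of the marker gene.
    Samples exactly at the median are assigned to 'low' here; that
    tie rule is an assumption, not something stated in the text.
    """
    cutoff = median(expression.values())
    return {s: ("high" if v > cutoff else "low") for s, v in expression.items()}


def enrichment_score(ranked_genes, gene_set):
    """Unweighted (Kolmogorov-Smirnov-like) running-sum enrichment score.

    `ranked_genes` is ordered from most up-regulated in the 'high' group
    to most up-regulated in the 'low' group. The running sum steps up at
    gene-set members and down elsewhere; the score is the value of the
    running sum with the largest absolute deviation from zero. A negative
    score means the set is concentrated toward the 'low' end.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranking only partially")
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy data (hypothetical values, for illustration only)
cd133 = {"s1": 2.1, "s2": 5.4, "s3": 0.7, "s4": 3.9}
groups = median_split(cd133)

ranking = ["g1", "g2", "g3", "g4", "g5", "g6"]
toy_gene_set = {"g5", "g6"}  # members sit at the 'low' end of the ranking
es = enrichment_score(ranking, toy_gene_set)
```

In this toy ranking the gene set sits entirely at the 'low' end, so the score is negative, which mirrors how a negative NES is read as enrichment in the low-expression group.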

Cell composition analysis
Thorsson et al. computed and reported cell proliferation score, homologous recombination defects score, mutation rate score, neoantigens score, and immune activity score [41]. We utilized xCell, the web-based computational algorithm developed at the University of California San Francisco (https://www.xcell.ucsf.edu), to analyze the association between the transcriptomic data and the enrichment of immune cells in different BC groups, as previously described [24-25, 27-31, 42-44]. Transcriptomic data of 64 cell types in the tumor microenvironment (TME) can be analyzed by the xCell algorithm. In this study, we analyzed transcriptomic data of immune cells such as T helper cells [45], regulatory T cells, M1 and M2 macrophages, CD8+ cells, CD4 memory cells [23], dendritic cells [24], and B cells.

Statistical Analysis
Data downloading, analysis, organization, and visualization were performed using R software (version 4.0.1; http://www.r-project.org/). Histograms were created to describe differences between "high" and "low" CD133 tumors. Two-sided tests were employed to calculate p-values, and a p-value of less than 0.05 was regarded as statistically significant. Median and interquartile range values were displayed using Tukey-type boxplots.
Survival analyses were performed using Kaplan-Meier plots with log-rank tests.
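As a minimal illustration of a Kaplan-Meier estimate and a two-group log-rank test, the following is a self-contained Python sketch; it is not the R code used in this study. The function names and the follow-up times are hypothetical toy values, and the p-value uses the chi-square distribution with one degree of freedom.

```python
import math


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    `times` are follow-up times; `events` are 1 for an observed event
    and 0 for censoring.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv, curve, i = 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        deaths = sum(1 for tt, e in data if tt == t and e == 1)
        leaving = sum(1 for tt, _ in data if tt == t)
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving  # events and censored cases both leave the risk set
        i += leaving
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})
    obs_minus_exp, variance = 0.0, 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Toy follow-up data (hypothetical, for illustration only)
high_t, high_e = [5, 8, 12, 15, 20], [0, 1, 0, 0, 1]
low_t, low_e = [2, 3, 6, 7, 10], [1, 1, 1, 0, 1]

low_curve = kaplan_meier(low_t, low_e)
chi2, p_value = logrank_test(high_t, high_e, low_t, low_e)
```

The survival curve only steps down at observed events, while censored patients simply leave the risk set; the log-rank statistic compares observed and expected events in one group across all event times.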

CD133 gene was predominantly expressed in epithelial cells in the BC tumor microenvironment (TME) of single-cell sequence cohorts
Given that CD133 is a well-characterized cell surface marker of CSCs [13], we first investigated which type of cells in the TME express CD133. This analysis was conducted using two independent single-cell sequence cohorts of BC patients, SCP1039 and SCP1106. We found that epithelial cells, from which BC cells arise, predominantly express CD133 in the TME (p < 0.001, Fig. 1). Since epithelial cells in bulk tumors are almost exclusively cancer cells, the observation of high CD133 expression in "normal" epithelial cells in SCP1039 likely represents less proliferative cancer cells.
Furthermore, CD133-low tumors were associated with higher Ki67 gene (MKI67) expression compared to CD133-high tumors consistently in both cohorts (both p < 0.001, Fig. 3B). In concurrence with this, the precalculated Proliferation Score by Thorsson et al. [41] demonstrated an inverse relationship with CD133 expression in the TCGA cohort (p < 0.001, Fig. 3C). These findings compellingly suggest that elevated CD133 expression is linked to lower cell proliferation in ER+/HER2- BC.

CD133 expression was associated with less DNA repair activity and a lower mutation rate
We have previously demonstrated that DNA repair pathway enhancement is linked with cell proliferation [18,45,52]. Given that CD133 expression was negatively associated with cell proliferation in ER+/HER2- BC, it was of interest to explore whether CD133 gene expression was associated with the DNA repair pathway and its related gene expression. We found that the DNA repair gene set was significantly enriched in CD133-low tumors in both the TCGA (NES = -1.83, FDR = 0.07) and METABRIC cohorts (NES = -1.60, FDR = 0.10, Fig. 4A), aligning with the relationship between CD133 expression and cell proliferation. Expression of genes related to DNA repair, such as BRCA1 and E2F7, was significantly higher in CD133-low tumors in both the TCGA and METABRIC cohorts (all p < 0.001, Fig. 4B), whereas E2F1 and CDK1/2 expression was higher in only one cohort each (TCGA and METABRIC, respectively, both p < 0.001) and was not validated in the other. Although E2F4 is also known as a gene related to DNA repair [53], it showed no relationship with CD133 expression in our study. Homologous recombination deficiency (HRD) scores were negatively correlated with CD133 expression in the TCGA cohort (Fig. 4C), and silent and non-silent mutation rates were slightly higher in CD133-low tumors (Fig. 4D). On the other hand, there were no differences in single-nucleotide variant (SNV) neoantigens or indel neoantigens by CD133 expression in the TCGA cohort. Overall, these results suggest that CD133-high ER+/HER2- BC is associated with less DNA repair activity.

CD133-high tumors were associated with better response to neoadjuvant chemotherapy (NAC) and survival
Given that CD133 expression was associated with less DNA repair but enhanced inflammation and immune response, we hypothesized that CD133-high ER+/HER2- BC may be vulnerable to cellular insult and respond better to chemotherapy. We utilized the GSE25066, GSE20194, and GSE32646 cohorts, which included ER+/HER2- BC patients who underwent NAC: taxane and anthracycline in GSE25066; paclitaxel, 5-fluorouracil, cyclophosphamide, and doxorubicin in GSE20194; and paclitaxel followed by a 5-fluorouracil, epirubicin, and cyclophosphamide (P-FEC) combination in GSE32646. CD133-high tumors responded significantly better than CD133-low tumors, with a pathological complete response (pCR) rate of 14.9% versus 6.6% after NAC in the GSE25066 cohort (n = 278, p = 0.03), the largest cohort of the three. Although statistical significance was not achieved, the trend that CD133-high tumors achieved a higher pCR rate was consistent in the GSE20194 cohort (7.7% versus 3.1%, n = 129) and the GSE32646 cohort (14.3% versus 3.7%, n = 55; Fig. 6A). CD133-high tumors were associated with better disease-free survival (DFS, p = 0.015) and overall survival (OS, p = 0.05) in the TCGA cohort, and both DFS (p = 0.027) and OS (p < 0.001) were validated in the METABRIC cohort (Fig. 6B).

Discussion
In summary, CD133 was exclusively expressed in cancer cells compared to stromal and immune cells and was associated with other CSC markers (CD24, NOTCH1, DLL1, and ALDH1A1), as well as with enriched WNT/β-Catenin, Hedgehog, and NOTCH signaling, validating CD133 as a CSC marker. We found that the expression of the cancer stem cell marker CD133 is associated with reduced cell proliferation and DNA repair, yet heightened inflammation, and is linked to more favorable outcomes after NAC and improved survival among ER-positive/HER2-negative BC patients.
Based on the fact that CSCs are less proliferative than other types of cells in the tumor, we expected the expression of the CSC marker CD133 to be related to less cell proliferation. However, Joseph et al. reported that CD133 is associated with greater cell proliferation, less response to NAC, and worse prognosis in invasive BC [17]. Our data were consistent with our expectation and contradicted Joseph et al.'s report, which analyzed invasive BC as a whole, as opposed to our study, which specifically investigated the ER+/HER2- subtype based on the understanding that biology and characteristics differ significantly by subtype.
It may be worth noting that CD133 protein expression evaluated by flow cytometry did not correlate with its mRNA expression level [54].
Our team, alongside other investigators, has reported an association between DNA repair enhancement and cell proliferation [18,52]. The same trend has been shown by Oshi et al. in hepatocellular carcinoma [45], who found that enhanced DNA repair was associated with a worse prognosis and more cell proliferation but not with the fraction of immune cell infiltration nor immune response. Consistently, high expression of RAD51 [18] or BRCA2 [52], both of which play a critical part in DNA repair, was associated with increased cell proliferation and aggressive biology in BC. Given that CD133-high BC was associated with less cell proliferation, its association with less DNA repair may explain its mechanism. On the other hand, Cheah et al. reported that CD133-marked putative CSCs correlated with proficient mismatch repair [55]; thus, multiple mechanisms may be involved in the relationship between CD133 expression and DNA repair.
We also found that inflammation and immune response were enriched in the CD133-high TME. The numbers of many types of infiltrating cells in the TME were not significantly different between high and low CD133 tumors and, interestingly, some types of cells were negatively correlated with CD133 expression. However, cytolytic activity, which represents the overall activity of the immune cells and thus cancer immunity, was significantly and positively correlated with high CD133 expression. It remains unclear precisely how and, after all, whether low DNA repair leads to high inflammation in the TME. Several previous studies reported that in several cell lines and cancer types, low DNA repair led to a higher neoantigen load, therefore high immunogenicity, and, as a result, more lymphocyte infiltration and richer inflammation [38,56]. Nevertheless, while we observed slightly higher silent and non-silent mutation rates in CD133-low tumors, no discernible difference was noted in SNV neoantigens and indel neoantigens based on CD133 expression. This observation diminishes the persuasiveness of that explanation in our study. However, there are still several possible mechanisms by which impaired DNA repair may result in richer inflammation in the TME without a higher neoantigen load. One is through the accumulation of DNA damage and subsequent activation of several signaling pathways, such as the ATM/ATR pathway and the DNA-PK pathway, which can lead to the activation of NFκB and other proinflammatory transcription factors that induce the production of pro-inflammatory cytokines, chemokines, and growth factors by cancer cells and surrounding immune cells [57]. This hypothesis is further supported by the fact that TNFα signaling via NFκB, which is also known to enrich inflammation [58], is enriched in CD133-high tumors in our study (Fig. 5C). Another possible explanation is that impaired DNA repair results in the accumulation of damaged or misfolded proteins in the endoplasmic reticulum of cancer cells, leading to endoplasmic reticulum stress and activation of the unfolded protein response (UPR). The UPR can also activate pro-inflammatory pathways, leading to the production of pro-inflammatory cytokines and chemokines [59][60]. However, several previous studies suggest that chronic activation of the UPR is a mechanism of tumor progression [61-63], which runs counter to the better DFS and OS observed in our study; this discrepancy may be due to differences in cohorts.
Finally, and most importantly, we found that CD133-high BC carried a better survival outcome. We cannot help but speculate that while CD133-high tumors have a poor prognosis, as previous studies suggest in accordance with the cancer stem cell concept that involves self-renewal, differentiation, and the initiation of tumorigenesis, CD133-low tumors may carry an even worse prognosis due to their ability to repair DNA, greater cell proliferation, and decreased immunogenicity, and hence a poorer response to NAC and worse survival outcome. The correlation between inflammation and pCR in invasive BC has been proposed by Hatzis et al. [36].
Furthermore, less cell proliferation in CD133-high BC may explain the better prognosis, in line with prior findings that showed an association between higher expression of genes related to proliferation, such as G2M [26-27], E2F [23,25], and MYC [64], and worse prognosis in ER+/HER2- BC. In summary, the association of elevated CD133 expression in breast cancer cells with diminished DNA repair, improved response to NAC, and enhanced survival underscores CD133's potential role as a marker for predicting the treatment response in ER+/HER2- subtype BC.
Our method is subject to certain limitations inherent in the essentially retrospective nature of this study.
Firstly, the utilization of patient sample data from a public domain means the analysis relies on information that had previously been cataloged, resulting in limited granularity. Secondly, the origin of the sample within the bulk tumor may vary among patients, even though the spatial relationship of CSCs in the bulk tumor may
