Abundance of reactive oxygen species (ROS) is associated with tumor aggressiveness, immune response, and worse survival in breast cancer

Reactive oxygen species (ROS) are oxygen-containing molecules that have high reactivity and play roles in protection or harm the cancer cells. We aimed to clarify the clinical relevance of ROS in breast cancer (BC) tumor microenvironment (TME). We hypothesized that it is associated with worse BC patient outcomes. ROS score was generated by Gene Set Variation Analysis of Hallmark ROS pathway gene set and a total of 6245 BC patients were analyzed. High ROS BC significantly enriched cell proliferation-related gene sets (MYC targets v1 and v2, G2M checkpoint, E2F targets), pro-cancer-related gene sets (DNA repair, unfolded protein response, MTORC1 signaling, PI3K/AKT/MTOR signaling, glycolysis, and oxidative phosphorylation), immune-related gene sets (inflammatory response, allograft rejection, interferon-α and γ responses, complement, and IL6/JAK/STAT3 signaling), and infiltrated immune cells (CD4+ memory and CD8+ T cells, Th1 and Th2, dendritic cells, Tregs, M1 and M2 macrophages) and B cells, as well as elevated cytolytic activity consistently in both METABRIC and GSE96058 cohorts. Cancer cells were the major source of ROS in BC TME of single-cell sequence (GSE75688) cohort. High ROS was associated with intratumor heterogeneity, homologous recombination defects, mutation rates, and neoantigens, and with clinical aggressiveness in AJCC stage, Nottingham grade and Ki67 expression, as well as worse overall survival in both GSE96058 and METABRIC, and with worse disease-specific survival in METABRIC. Abundant ROS in BC patients is associated with abundant mutations, aggressive cancer biology, immune response, and worse survival.


Introduction
The body maintains its homeostasis and health by balancing oxidative system that produce energy and antioxidant system that prevent the negative effects of excess oxidation, also known as oxidative stress [1,2]. Recently, it has become clear that excessive oxidation and oxidative stress directly damage DNA and are deeply involved not only in carcinogenesis but also in cancer progression, including cancer growth, invasion, and metastatic spread.
Reactive oxygen species (ROS) are oxygen-containing molecules that have high reactivity. Hydroxyl (OH*) and superoxide (O 2 *) free radicals both comprise ROS along with hydrogen peroxide (H 2 O 2 ). ROS are physiologically produced in the mitochondrial electron-transfer system, peroxisomes, and phagosomes in the cells, where they contribute to energy production and phagocytosis [3]. ROS is also produced in small amounts by the NADPH oxidase (Nox) family at the plasma membrane and is involved in a number of intracellular signaling pathways, such as Ras, c-jun N-terminal kinase (JNK), p38 and mitogen-activated protein kinase (MAPK), and PI3K/AKT/mTOR pathways. Recently, it has been shown that ROS is involved in cell proliferation [4,5] as well as autophagy [6], apoptosis [7], and inflammation via the NLRP3 inflammasome and nuclear factor-kB (NF-kB) pathway [8,9]. Cells experience oxidative stress if either the ROS production increases or the number of scavenged ROS decreases [10]. ROS have been reported to be increased in breast cancer cells as well as several immune cells in tumor microenvironment (TME), and, therefore, is an important factor in the TME [11]. It likely plays a multifaceted role in tumor progression and metastasis by ensuring bidirectional communication among various components [12]. Therefore, it has been suggested that ROS may be a potential therapeutic target in breast cancer [13]. While ROS have physiological functions as described, they can be harmful by directly damaging DNA during oxidative stress. Our group and others have shown that abundant DNA damage causes higher mutation load that leads to carcinogenesis and aggravation of cancer [14]. Ongoing aerobic glycolysis followed by pyruvate oxidation in the mitochondria (the Warburg effect) generates ROS which in turn increases the activity of oncogenes and receptors. This results in stimulation of oxidizing enzymes and/or growth factor-dependent pathways, thus inducing instability of genes [10]. For this reason, cells are equipped with antioxidant system that either take in antioxidants or produce them by enzymes. Although excessive induction of oxidative stress is expected to prevent and/or treat cancer since cancer cells have higher ROS than normal cells as a base line, ROS may aggravate cancer by promoting cell proliferation. To this end, the clinical relevance of abundant ROS in breast cancer remains unclear.
In this study, we generated a ROS score using ROSrelated gene expressions in human breast cancer to quantify the abundance of ROS and investigated its clinical relevance. Gene Set Variation Analysis (GSVA) with 200 ROS-related genes defined by Molecular Signatures Database (MSigDB) Hallmark collection was used to generate the ROS score [15]. This approach is commonly used in genetic research because pathways involve various genes, and a single gene expression may fail to grasp the whole picture. For instance, we found the association between G2M checkpoint pathway and pancreatic cancer survival and drug response [16,17], angiogenesis and aggressiveness of gastric cancer and worse survival [18], and DNA repair signaling [19] as well as unfolded protein response [20] and worse survival in liver cancer. We have demonstrated in breast tumor that pathways such as early estrogen response [21], KRAS signaling [22], angiogenesis [23], inflammation [24], thermogenesis [25], adipogenesis [26], and apoptosis [27] are associated with survival, and G2M checkpoint pathway [28] and E2F pathway [29] predict response to neoadjuvant chemotherapy in Luminal A subtype. In this study, we hypothesized that abundant ROS is significantly associated with cancer aggressiveness and tumor microenvironment, as well as patient survival.

Data acquisition
Clinical information and gene expression were obtained on 1,903 breast cancer patients from the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) cohort, as we previously reported [30,31]. The GSE96058 cohort consists of full transcriptome profile of resected tumors from 3,273 breast cancer patients whose clinical data were obtained from the Swedish Breast Cancer Analysis Network (SCAN-B) [32]. Transcriptomic data were also obtained on 1069 female breast cancer patients from The Cancer Genome Atlas (TCGA) cohort [33]. The GSE75688 cohort has single-cell RNA-sequencing data of cancer cells, myeloid cells, immune cells, and stromal cells in breast cancer [34], which was obtained from Gene Expression Omnibus.

Gene set enrichment analysis
The difference in activated signaling pathways between lowand high-ROS score groups was investigated by Gene Set Enrichment Analysis (GSEA) [46] using GSEA Java software (version 4.2) with MSigDB Hallmark gene sets [15]. False discovery rate (FDR) < 25% was defined as statistically significant, as recommended by GSEA.

Other statistical analyses
All analyses and data plots were generated using Microsoft Excel (version 16) and R software (version 4.0.1). The analysis of the comparison of groups used the Kruskal-Wallis test, the Mann-Whitney U test, or the Fisher exact test. Logrank test and the Kaplan-Meier plot were used for survival analyses. Values of p < 0.05 generally indicate a statistically significant difference.

Breast cancer with a high reactive oxygen species (ROS) score enriched cell proliferation and other pro-cancerous gene sets
The Gene set variation analysis (GSVA) algorithm on MSigDB was used to define the ROS score (Table S1), similar to how we defined the other scores in our previous publications [16][17][18][19]. We first investigated the underlying mechanism involved in the score by conducting Gene Set Enrichment Analysis (GSEA) on two independent breast cancer cohorts, GSE96058 and METABRIC. We found that tumors with a high ROS score significantly enriched expression of Hallmark cell proliferation gene sets; MYC targets v1 and v2, G2M checkpoint, and E2F targets (Fig. 1a), as well as other pro-cancerous signaling; glycolysis, MTORC1 signaling, PI3K/AKT/MTOR signaling, oxidative phosphorylation, DNA repair, and Unfolded protein response, in both the cohorts consistently (Fig. 1b). To this end, there is a significant association of high ROS score with both cell proliferation and also other pro-cancer signaling and responses in breast cancer. Fig. 1 Cell proliferation-related and other pro-cancer signalingrelated gene sets enriched to breast cancer with a high ROS score consistently in both GSE96058 and METABRIC cohorts. a Enrichment plots of cell proliferation-related gene sets (MYC targets v1 and v2, G2M checkpoint, E2F targets), and b other pro-cancerous gene sets (glycolysis, PI3K/AKT/MTOR signaling, DNA repair, Unfolded protein response, oxidative phosphorylation, and MTORC1 signaling) in the GSE96058 and METABRIC cohorts. The classical gene set enrichment analysis (GSEA) method was used to determine the NES (normalized enrichment score) and the FDR (false discovery rate) where the cut-off to divide two score groups was defined as median value

Cancer cells were the major source of ROS in tumor microenvironment (TME), and immune-related gene sets significantly enriched to high ROS breast cancer
To identify which cell type is the major source of ROS in the tumor microenvironment (TME), the score was measured in a single-cell sequence cohort (GSE75688) that differentiate transcriptome from cancer cells, myeloid cells, T cells, B cells, or stromal cells. We found that cancer cells are the major source of ROS in TME ( Fig. 2a; p < 0.001). Next, we investigated the relationship between ROS and immune response in TME of breast tumor. We found that immune-related gene sets, allograft rejection, interferon (IFN)-γ response, IFN-α response, inflammatory response, complement, and IL6/JAK/STAT3 signaling, were significantly enriched to breast cancer with elevated ROS score in both METABRIC and GSE96058 cohorts consistently (Fig. 2b). To this end, immune response was significantly associated with high ROS score in breast cancer.

Immune cells were universelly infiltrated in ROS high tumor immune microenviornment (TIME) where cytolytic activity (CYT) was elevated
Since immune response was enhanced in ROS high tumor, to investigate which type of immune cell is infiltrated in breast cancer TIME was of interest. The xCell algorithm was used to estimate immune cell infiltration, in a similar fashion as previous publications [39,42,[47][48][49][50][51]. We found that in high ROS breast cancer, there was significant infiltration of not only pro-cancer immune cells; M2 macrophages, T helper type 2 (Th2) cells, and regulatory T cells, but also anti-cancer immune cells; CD4 + memory T cells, CD8 + T cells, T helper type 1 (Th1) cells, M1 macrophages, and dendritic cells (DC), and B cells, consistently in both GSE96058 and METABRIC cohorts ( Fig. 3a; all p < 0.001). Furthermore, they were significantly associated with high CYT (Fig. 3b; both p < 0.001). To this end, immune cells are universally infiltrated with high cytolytic activity in high ROS breast cancer. The median value was used as a cut-off to divide high vs low score groups. NES (normalized enrichment score) and FDR (false discovery rate) were determined with the classical gene set enrichment analysis (GSEA) method

High ROS score was significantly associated with high level of intratumor heterogeneity, homologous recombination deficiency (HRD), and mutation load in breast cancer
Tumor-infiltrating immune cells are known to be attracted to TME through neoantigens generated by high tumor mutational burden [14]. To this end, investigating the relationship of the ROS score with the intratumor heterogeneity, HRD, and mutation load in breast cancer was of interest. We found that breast cancer with a high ROS score was significantly associated with HRD and intratumor heterogeneity, as well as mutation load-related score; single nucleotide variation (SNV) neoantigens, silent and non-silent mutation rates, in the TCGA cohort ( Fig. 4; all p < 0.001). This result is in agreement with the notion that immune cells are attracted to cancer cells with high mutation load, and our observation that high ROS score breast cancer is associated with immune response and immune cell infiltration is commonly seen in aggressive cancer.

A high ROS breast cancer was significantly associated with clinical aggressiveness
To study the clinical relevance of the ROS score in breast cancer, its association with breast cancer subtypes, AJCC ROS score groups in the GSE96058 and METABRIC cohorts. b Box plots of cytolytic activity score (CYT) by low and high ROS score groups in both cohorts. The median value was used as a cut-off to divide two score groups. P-values were calculated by the Mann-Whitney U test Fig. 4 Association of the ROS score with the intratumor heterogeneity, homologous recombination deficiency (HRD), and mutationrelated score in the TCGA cohort. Boxplots of homologous recombination defects (HRD), and intratumor heterogeneity, and the mutation-related scores; silent and non-silent mutation rate, fraction altered, single nucleotide variation (SNV) neoantigens, by low and high ROS score groups. The median value was used as a cut-off to divide two score groups. P-values were calculated by the Mann-Whitney U test (American Joint Committee on Cancer) staging including the lymph node category, the Nottingham histological grade were investigated in the GSE96058 and METABRIC cohorts. In the METABRIC cohort, the high score was significantly associated with triple-negative breast cancer (TNBC) subtype, advanced stage and lymph node metastasis, and higher grade (Fig. 5, all p < 0.001). GSE96058 cohort was used to validate these results (all p < 0.001). Therefore, ROS score was associated with clinical parameters of tumor aggressiveness.

High ROS ER-positive/HER2-negative breast cancer was significantly associated with worse survival, but not the other subtypes
Given that ROS score was significantly different by breast cancer subtypes, we hypothesized that there may be a different relationship of each subtype with ROS. Therefore, overall survival (OS) in the GSE96058 and METABRIC cohorts and disease-specific survival (DSS) in the METABRIC by subtypes were analyzed. High ROS score was significantly associated with worse OS in ER-positive/HER2-negative subtype and the whole group, but not in the other subtypes in the GSE96058 cohort (Fig. 6). OS and DSS in METABRIC cohort validated these results. To this end, high ROS was associated with worse survival outcome only in ER-positive/ HER2-negative breast cancer subtype. Furthermore, we investigated the association of ROS score with infiltrating immune cells in ER-positive/HER2-negative breast cancer. We found that high ROS ER-positive/HER2-negative breast cancer was also associated with high infiltration of Th1 and Th2 cells, CD4 + memory T cells, and M1 macrophages, consistently in both cohorts (Fig. S1).

Discussion
In this study, we generated a ROS score utilizing Hallmark ROS pathway gene sets by GSVA algorithm and quantified the abundance of ROS in total of 6245 breast cancer patients. We found that high ROS breast cancer enriched cell proliferation gene sets; MYC targets v1 and v2, G2M checkpoint, E2F targets, as well as other pro-cancerous gene sets; glycolysis, PI3K/AKT/MTOR signaling, DNA repair, unfolded protein response, MTORC1 signaling, and oxidative phosphorylation, consistently in both GSE96058 and METABRIC cohorts. Cancer cells were the major source of ROS among the cells in TIME, including myeloid cells, T cells, B cells, and stromal cells, based on single-cell sequence cohort. High ROS breast cancer enriched immune gene sets; inflammatory response, IL6/JAK/STAT3 signaling, IFN-γ response, IFN-α response, allograft rejection, Fig. 5 Breast cancer with a high ROS score was significantly associated with advanced stage, lymph node metastasis, higher pathological grade, and TNBC. Box plots of the score by American Joint Committee on Cancer (AJCC) stage in the METABRIC, and lymph node metastasis status, Nottingham pathological grade, and breast cancer subtypes in the METABRIC and GSE96058 cohorts. P values were calculated by Kruskal-Wallis and Mann-Whitney U test and complement, in both the cohorts consistently. High ROS breast cancer was significantly associated with high infiltration of pro-cancer immune cells; Tregs, Th2 cells, and M2 macrophages, as well as anti-cancer immune cells; M1 macrophages, CD4 + memory T cells, CD8 + T cells, dendritic cells, Th1 cells, and B cells in both cohorts. Furthermore, high ROS breast cancer was significantly associated with enhanced CYT. High ROS score was significantly associated with high levels of homologous recombination defects and intratumor heterogeneity, along with mutation-related score, including single nucleotide variant and indel neoantigens, fraction altered, silent and non-silent mutation rate in the TCGA cohort. Clinically, high ROS breast cancer was significantly associated with advanced cancer by AJCC stage, lymph node metastasis, and Nottingham grade. Furthermore, high ROS was significantly associated with worse OS in both cohorts, and DSS in the METABRIC cohort in ERpositive/HER2-negative subtype.
ROS attack DNA, thus inducing DNA damage which could either inhibit or induce transcription, replication errors, signal transduction pathways and genomic instability, resulting in carcinogenesis [52]. Several risk factors of breast cancer are related to ROS production and oxidative stress, thus, supporting the role of ROS involvement in breast cancer pathogenesis. Aging is associated with the production of ROS, like H 2 O 2 and HOCl, that induce mutations in mitochondrial DNA [53]. Mutations in BRCA1 and BRCA2, the breast cancer susceptibility genes, are also implicated in breast cancer. BRCA1 upregulates the expression of multiple genes that are involved in the cytoprotective antioxidant response, such as, glutathione-S-transferase (GST), oxidoreductase, glutathione peroxidase, and other antioxidant genes. A potential source of ROS, ionizing radiation, can induce irreversible DNA damage, thus initiating carcinogenesis. Female breast tissue is particularly susceptible to the carcinogenic effects of ionizing radiation, especially before it is fully differentiated before first child birth [54]. Estrogens are known key player in all stages of breast cancer, where estrogen receptor (ER) independent effects result from oxidative metabolism of estradiol to genotoxic metabolites, which then cause direct damage to DNA [55]. This supports our observed finding that ROS was associated with aggressive breast cancer, as it was associated with higher stage and pathological grade.
We also observed that ROS was associated with higher mutation load in tumors. It is known that ROS can cause unpaired or misrepaired DNA damage in cellular division leading to mutations. ROS induce mutations that commonly cause G to T transversions [56], which are the most frequent mutations in the human tumors in the p53 suppressor gene [57]. The presence of oxidative DNA damage in cancer supports the fact that initiation of cancer may at least partly be triggered by ROS [58]. In our study, tumors with high ROS were shown to be enriched for cell proliferation-related gene sets. This is consistent with previous reports that ROS in high amounts is associated with uncontrolled cell proliferation [4,5].
Our study showed that all the immune cells were higher in ROS high breast cancers. Proper T cell activation, proliferation, and differentiation depend on low levels of ROS [59,60]. Tregs are able to persist in the oxidant environment, and therefore, a greater number of Tregs have been detected at tumor sites. Tregs have high antioxidant capacity, and are therefore less sensitive to oxidant stress-induced cell death [61]. In breast cancer models, ROS were essential for tumor-associated macrophages (TAMs) to invade the Fig. 6 ER-positive/HER2-negative breast cancer with a high score is significantly associated with worse survival in the GSE96058 and METABRIC cohorts. Forest plots with log-rank p-values and hazard ratio (HR) of OS in the GSE96058 cohort, and OS and DSS in the METABRIC cohort between low and high ROS score in the whole, ER + /HER2-, HER2 + and TNBC subgroups. Log-rank test was used to calculate p values. DSS, disease-specific survival; ER, estrogen receptor; HER2, human epidermal growth factor 2; OS, overall survival; TNBC, triple-negative breast cancer TIME and acquire a M2 pro-tumorigenic phenotype [62]. ROS also results in impairment of antigen presentation by dendritic cells [63][64][65]. However, we observed higher M1 macrophages as well. This could be explained by a general inflammatory state induced by the ROS in the TIME [66]. Our result that cytolytic activity was elevated in ROS high tumor agrees with a recent report that suggested that ROS contribute to immunogenic cell death and T-cell-mediated immunity [67], and it was shown that immune cells infiltrated in breast cancer with high ROS, not because of the characteristics of TNBC which have high ROS, as it has been shown that even ER-positive/HER2-negative breast cancer with high ROS attracts high level of immune cells. Furthermore, we found that not only tumor cells but also myeloid cells have high ROS score in breast cancer. Since there are several types of myeloid cells with different functions, it is of interest to investigate which type of myeloid cells has high ROS in breast cancer in the future. It is well known that breast cancer with high infiltration of immune cells has better prognoses, especially in TNBC [68]. On the other hand, the type of infiltrating immune cells is the key in relationship to patient prognoses [69]. We have previously reported that breast cancers with high CD8 + T cell infiltrations have better prognoses [51]. On the other hand, breast cancer with high Treg infiltration was not associated with patient survival, although they showed significant association with NAC response [42]. In the case of high ROS breast cancer, we found that overall pro-cancerous immune cells appear to outperform the anti-cancer immune cell infiltration. The same trend was seen in immune response. IFN-γresponse and Inflammatory response are known to be associated with better outcome, particularly in TNBC [24], whereas IL-6 is known to negatively affect anti-immune response [70], and we observed that both are elevated in high ROS tumor. Based upon our observation in high mutation load breast cancer, where aggressive biology and high immune response are both counterbalanced [14], we cannot help but speculate that IL-6 is outperforming the favorable effects of IFN-γresponse and Inflammatory response in high ROS tumor.
Unlike the ER-positive subtype, a high ROS score did not correlate with worse survival in either TNBC or HER2positive subtype. Given that ROS score was significantly high in TNBC as well as HER2-positive breast cancer, and the fact that both subtypes are biologically aggressive and initially respond to chemotherapy better than ER-positive subtype, we speculate that the ROS signaling alone was not strong enough to predict clinical outcomes of both tumor subtypes. We speculate that ROS score may have the potential to be used as a predictive biomarker, and is a useful tool to investigate new drug treatment for breast cancer.
There were several limitations in our study. The degree of ROS signaling was determined by GSVA algorithm with hallmark collection in MSigDB, that may or may not include all the genes related to ROS pathway. Since this is a retrospective study that utilized multiple publicly available large patient cohorts, we did not have access to some of the pertinent clinical data including the details on systemic treatments received and it is assumed that all the patients received "Standard of Care." Our results are based on tumor transcriptomics alone and we are incapable of generating any data on mechanistic role of ROS activity. The novelty of our study is to assess the role of ROS in the TIME using a score combining multiple genes. A prospective trial is needed to confirm the utility of ROS score as a predictive biomarker in breast cancer.

Conclusions
In conclusion, ROS abundance is associated with cancer aggressiveness, enhanced immune response, and with worse survival in ER-positive/HER2-negative breast cancer.
Author contributions Conceptualization was performed by AY, IE, KT, MO, and RM. Methodology was performed by KT, MO, and YT. Formal Analysis was performed by MO. The first draft of the manuscript was written by MO, and all authors commented on previous versions of the manuscript. Supervision and project administration were performed by KT. All authors read and approved the final manuscript.
Funding This research was supported by National Institutes of Health, USA grant number R37CA248018, R01CA250412, R01CA251545, R01EB029596, as well as US Department of Defense BCRP grant number W81XWH-19-1-0674 and W81XWH-19-1-0111 to K.T. National Cancer Institute, cancer center support grant P30CA016056 supports Roswell Park Comprehensive Cancer Center. Research reported in this publication was supported by the National Center for Advancing Translational Sciences of the National Institutes of Health under award numbers KL2TR001413 and UL1TR001412. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.

Data availability
The datasets generated during and analyzed during the current study are available from the original source as they are publicly available deidentified databases.

Conflict of interests
The authors have no relevant financial or nonfinancial interests to disclose.
Ethical approval Institutional review board (IRB) approval at Roswell Park Comprehensive Cancer Center (Buffalo, New York, United States of America) was waived as publicly available deidentified databases were used. Consent to participate Not applicable.