Prognostic Predictors Do Not Equate to Predictors of Chemotherapy Benefit in HR-Positive, HER2-Negative Breast Cancer: A Population-Based Study

Background: Patients with hormone receptor-positive (HR+), human epidermal growth factor receptor 2-negative (HER2-), early breast cancer have a favorable prognosis. We conducted a study to identify patients in this cohort who benefit from adjuvant chemotherapy. Methods: Patients with HR+, HER2-, early breast cancer were identified from the SEER database and classified into chemotherapy and non-chemotherapy groups. Propensity score matching (PSM) was performed, and subgroup analyses were conducted in the matched patients to explore which subgroups benefited from chemotherapy. Gene expression RNA-seq data and phenotypic data of HR+, HER2-, early breast cancer were obtained from the TCGA database; differentially expressed genes (DEGs) between paired tumor and normal samples were identified, and the DEGs were filtered through interactions between chemotherapy and gene expression level to obtain genes associated with chemotherapy benefits. A chemotherapy predictive clinical score (CPCS) and a chemotherapy predictive genetic score (CPGS) were established based on the variables and genes identified, respectively. Results: In total, 86158 patients with HR+, HER2-, early breast cancer were identified from the SEER database, and there were 15259 patients in each matched group. Race, T stage and N stage were associated with chemotherapy benefits and were used to establish the CPCS. Among the 29 normal samples and 254 tumor samples of HR+, HER2-, early breast cancer in the TCGA database, 29 pairs of samples were used to filter 1709 DEGs, of which 84 were associated with chemotherapy benefits. SOWAHA, RP11-205M3.3, IRX6, PPBP and EMX1 were chosen to establish the CPGS according to their P-values for interaction. Conclusions: Nodal status was an important indicator for chemotherapy decisions in HR+, HER2-, early breast cancer, and most patients without nodal involvement could be spared chemotherapy.
In addition, a CPCS and a CPGS were established respectively using a completely new method to identify patients


Background
Breast cancer remains the most common cancer in women, and the prognosis of certain patients is favorable owing to adjuvant therapy, such as adjuvant chemotherapy [1]. With a deeper understanding of the biological and pathological characteristics of breast cancer, there has been a trend towards de-escalation of adjuvant chemotherapy in recent years [2]. For example, the PLANB trial suggested that the TC6 regimen was comparable with the EC-T regimen in disease-free survival (DFS) for patients with high-risk HER2- breast cancer, introducing the "non-anthracycline regimen" of adjuvant chemotherapy in breast cancer [3].
Studies of endocrine therapy, such as the SOFT and TEXT studies, emphasized the effectiveness of endocrine therapy in early breast cancer, which has also reduced the role of adjuvant chemotherapy [4][5][6][7]. Studies of neoadjuvant therapy have shown that hormone receptor-positive (HR+), HER2- breast cancer, even early breast cancer, is insensitive to neoadjuvant chemotherapy: the pathologic complete response (pCR) rate was lower than 20%, much lower than the rates in HER2+ breast cancer or triple-negative breast cancer [8].
When selecting patients who can safely be spared adjuvant chemotherapy, clinicopathological characteristics and genetic biomarkers are often taken into consideration. From the clinicopathological perspective, risk factors such as tumor size and nodal status were indicators for chemotherapy in previous studies. However, clinicians often face a dilemma when making decisions about chemotherapy, especially for patients with 1-3 positive nodes.
Genetic biomarkers play a role in this situation. According to the American Society of Clinical Oncology (ASCO) Clinical Practice Guideline, several gene assays, such as the 21-gene recurrence score (Oncotype DX), the 12-gene risk score (EndoPredict), and the PAM50 risk of recurrence (ROR) score, were recommended for guiding decisions on adjuvant chemotherapy, whereas in the National Comprehensive Cancer Network (NCCN) guidelines, only the Oncotype DX was suggested to be capable of predicting chemotherapy benefits [9][10][11]. However, the 21-gene assay was initially designed to predict the prognosis of estrogen receptor-positive (ER+), N0 breast cancer, and evidence for its effectiveness in predicting chemotherapy benefits in HR+, HER2-, early breast cancer is lacking: no prospective study has validated that patients with high 21-gene scores benefit from the addition of adjuvant chemotherapy, because in previous studies all patients with HR+, HER2-, early breast cancer and high 21-gene scores were assigned to receive adjuvant chemotherapy, except for those who did not adhere to the trial design [12,13].
With these considerations in mind, we performed a population-based study to identify patients with HR+, HER2-, early breast cancer who would benefit from adjuvant chemotherapy. In addition, we aimed to identify genes associated with chemotherapy benefits in this cohort. We also established two chemotherapy predictive scores (one based on clinicopathological characteristics and another based on gene expression levels) for guiding decisions on adjuvant chemotherapy.

Data sources from the SEER database
Patients with HR+, HER2-, early (AJCC 7th stage I-III) breast cancer were identified from the Surveillance, Epidemiology, and End Results (SEER) database. The SEER database collects cancer information from registries covering around 34.6% of the US population, and we obtained permission to access the database with the reference number 12296-Nov2018. Our study was approved by the review board of the Affiliated Jinhua Hospital, Zhejiang University School of Medicine.
The inclusion criteria were as follows: patients diagnosed during 2010-2014 (because HR and HER2 statuses were not recorded in the database until 2010); patients aged 20-80 years; patients undergoing cancer-directed surgery; patients with tumor site ICD-O-3 codes C-50.1-50.5, C-50.8 and histology ICD-O-3 codes 8500, 8520, 8521, 8522, 8523, 8524 (infiltrating duct carcinoma, lobular carcinoma, infiltrating ductular carcinoma, infiltrating duct and lobular carcinoma, infiltrating duct mixed with other types of carcinoma, and infiltrating lobular mixed with other types of carcinoma).
The exclusion criteria were as follows: patients with incomplete records, such as a missing AJCC stage; patients with multiple primary tumors; patients with a survival time of less than 1 month (because these patients were at risk of death from perioperative complications).
The age and year at diagnosis, marital status, race, histological type, differentiation grade, AJCC 7th T stage and N stage, chemotherapy information, cause of death and survival months were retrieved from the SEER database. Patients were classified into the chemotherapy group and the non-chemotherapy group according to whether chemotherapy was performed. The chemotherapy group included patients receiving chemotherapy (irrespective of whether they received hormone therapy), and the non-chemotherapy group included patients receiving hormone therapy and patients who did not receive any therapy after surgery.

Propensity score matching
To reduce bias from confounders and achieve balance between the chemotherapy and non-chemotherapy groups, propensity score matching (PSM) was performed. Based on demographic and clinicopathological characteristics (i.e., age, marital status, race, histological type, differentiation grade, AJCC 7th T stage and N stage), patients were matched at a 1:1 ratio using the nearest neighbor method. Further analyses were conducted on the matched patients [14].
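The matching step can be illustrated with a minimal sketch. It assumes that a propensity score (the estimated probability of receiving chemotherapy given the covariates) has already been computed for every patient, for example by logistic regression, and it applies greedy 1:1 nearest neighbor matching without replacement. The function name, the optional caliper argument, the processing order and the toy scores are all hypothetical; the matching software and settings actually used in the study are not stated in the text.

```python
def match_nearest_neighbor(treated, control, caliper=None):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    treated, control: dicts mapping patient id -> propensity score.
    caliper: optional maximum allowed score distance (an assumption,
             not something described in the paper).
    Returns a list of (treated_id, control_id) pairs.
    """
    available = dict(control)  # controls that have not been used yet
    pairs = []
    # Process treated patients in a fixed order so the result is reproducible.
    for t_id in sorted(treated):
        if not available:
            break
        t_score = treated[t_id]
        # Closest remaining control by absolute score distance (ties by id).
        c_id = min(available, key=lambda c: (abs(available[c] - t_score), c))
        if caliper is not None and abs(available[c_id] - t_score) > caliper:
            continue  # no acceptable match for this treated patient
        pairs.append((t_id, c_id))
        del available[c_id]
    return pairs


# Toy propensity scores (hypothetical values, not from the study).
chemo = {"T1": 0.80, "T2": 0.35, "T3": 0.60}
no_chemo = {"C1": 0.30, "C2": 0.62, "C3": 0.78, "C4": 0.10}
matched = match_nearest_neighbor(chemo, no_chemo)
print(matched)
```

Because matching is greedy, the result depends on the order in which treated patients are processed; dedicated matching packages offer random or optimal ordering, which this sketch does not attempt.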

Survival analysis and subgroup analysis
Considering that the prognosis of HR+, HER2-, early breast cancer is favorable and the risk of non-cancer-specific death (non-CSD) cannot be ignored, we used competing risk analyses to calculate the cumulative incidence of cancer-specific death (CSD) and compared groups using Gray's test [15]. The time to CSD was calculated from the date of diagnosis to the date of death from cancer; the time to non-CSD was calculated from the date of diagnosis to the date of death from other causes. CSD was regarded as the outcome event, and non-CSD was regarded as the competing event. In constructing the prognostic model, subdistribution hazard ratios (SHRs) and 95% confidence intervals (CIs) were calculated.
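To make the competing risk idea concrete, the following is a minimal sketch of the standard nonparametric cumulative incidence estimator: at each event time, the probability of still being event-free is multiplied by the cause-specific event fraction among patients at risk. The event coding (0 = censored, 1 = CSD, 2 = non-CSD) and the toy follow-up data are assumptions for illustration. Gray's test and the subdistribution hazard model are not implemented here.

```python
def cumulative_incidence(times, events, cause=1):
    """Nonparametric cumulative incidence for one cause of death.

    times:  follow-up time for each patient
    events: 0 = censored, 1 = CSD, 2 = non-CSD (coding is an assumption)
    Returns (time, cumulative incidence) at each time with an event of any cause.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0  # probability of being free of any event just before time t
    cif = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d_cause = d_any = n_here = 0
        # Collect every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_here += 1
            if data[i][1] != 0:
                d_any += 1
                if data[i][1] == cause:
                    d_cause += 1
            i += 1
        if d_any > 0:
            cif += surv * d_cause / n_at_risk
            surv *= 1 - d_any / n_at_risk
            curve.append((t, cif))
        n_at_risk -= n_here
    return curve


# Toy data in months (hypothetical, not from SEER).
times = [5, 8, 12, 12, 20, 25]
events = [1, 2, 0, 1, 0, 2]
csd_curve = cumulative_incidence(times, events, cause=1)
non_csd_curve = cumulative_incidence(times, events, cause=2)
print(csd_curve[-1], non_csd_curve[-1])
```

Unlike one minus a Kaplan-Meier estimate that treats non-CSD as censoring, this estimator does not overstate the incidence of CSD when competing deaths are common.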
To explore whether chemotherapy improved prognosis in particular groups, subgroup analyses were conducted in patients stratified by age, race, marital status, histological type, differentiation grade, T stage and N stage. A forest plot was drawn to present the results. The interactions between chemotherapy and age, race, marital status, histological type, differentiation grade, T stage and N stage were each assessed using the generic inverse variance method, which has been reported in a previous study, and P-values for interaction were calculated [16].

Data sources from the TCGA database
Gene expression RNA-seq data and phenotypic data of breast cancer were downloaded from The Cancer Genome Atlas (TCGA) database (https://genome-cancer.ucsc.edu/). Samples from patients with HR+, HER2-, early breast cancer with both complete RNA-seq data and complete phenotypic data were included in further analyses. In total, 29 normal samples and 254 tumor samples were identified.

Differentially expressed genes
Among the 29 normal samples and 254 tumor samples, tumor and normal samples were paired according to the TCGA barcode, yielding 29 pairs. Differentially expressed genes (DEGs) between the paired samples were filtered using the R package "DESeq2". We set |log2 fold change (FC)| > 2.0 and false discovery rate (FDR) < 0.05 as the thresholds for DEGs.
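The thresholding rule can be written as a short filter. This sketch assumes the differential expression results have already been exported as one record per gene; the field names ("gene", "log2fc", "fdr") and the gene records are hypothetical. Genes without an adjusted P-value, which DESeq2 can report as NA, are treated here as not significant.

```python
def filter_degs(results, lfc_threshold=2.0, fdr_threshold=0.05):
    """Keep genes with |log2 fold change| > lfc_threshold and FDR < fdr_threshold."""
    return [
        row["gene"]
        for row in results
        if row["fdr"] is not None          # missing FDR: not significant
        and abs(row["log2fc"]) > lfc_threshold
        and row["fdr"] < fdr_threshold
    ]


# Hypothetical per-gene results (not taken from the study).
results = [
    {"gene": "GENE_A", "log2fc": 3.1, "fdr": 0.001},   # up-regulated, kept
    {"gene": "GENE_B", "log2fc": -2.4, "fdr": 0.020},  # down-regulated, kept
    {"gene": "GENE_C", "log2fc": 1.5, "fdr": 0.0001},  # fold change too small
    {"gene": "GENE_D", "log2fc": -4.0, "fdr": 0.30},   # FDR too high
    {"gene": "GENE_E", "log2fc": 2.8, "fdr": None},    # no adjusted P-value
]
degs = filter_degs(results)
print(degs)
```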

Chemotherapy predictive genes
For each of the identified DEGs, patients were classified into high and low expression groups according to the median expression count. Because the TCGA database did not record causes of death, we could not perform competing risk analyses and chose Cox proportional hazards analyses instead. Death or tumor recurrence was regarded as the DFS event, and the time to DFS was calculated.
Univariate Cox proportional hazards analyses for chemotherapy were performed separately in the high expression group and the low expression group, and hazard ratios (HRs) and 95% CIs were calculated.
The interaction between chemotherapy and gene expression level as a categorical variable was examined, and the P-value for interaction and the I² value were calculated. We set a P-value for interaction < 0.05 as the threshold to identify genes associated with chemotherapy benefits, which were called "chemotherapy predictive genes" (CPGs) in our study. A higher I² value indicates a stronger statistical association between gene expression level and the magnitude of the chemotherapy effect.
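A test of this kind can be sketched from subgroup hazard ratios alone. The example below assumes the generic inverse variance test for subgroup differences with two subgroups: each log HR is weighted by the inverse of its variance, the heterogeneity statistic Q is compared with a chi-square distribution with 1 degree of freedom, and I² = max(0, (Q - df) / Q). Standard errors are recovered from the 95% CIs. As inputs it uses the rounded HRs and CIs reported in the Results for the low and high CPGS groups, so it is only an approximate reconstruction, not the authors' computation; the function names are hypothetical.

```python
import math


def log_hr_and_se(hr, ci_low, ci_high, z_crit=1.96):
    """Recover log(HR) and its standard error from a 95% CI."""
    log_hr = math.log(hr)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    return log_hr, se


def interaction_test(subgroups):
    """Inverse-variance test for a difference between two subgroups.

    subgroups: list of two (log_hr, se) tuples.
    Returns (Q, df, p_value, I2 in percent).
    """
    weights = [1 / se ** 2 for _, se in subgroups]
    pooled = sum(w * b for w, (b, _) in zip(weights, subgroups)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, (b, _) in zip(weights, subgroups))
    df = len(subgroups) - 1
    if df != 1:
        # The closed-form chi-square tail used below is valid only for df = 1.
        raise NotImplementedError("this sketch handles two subgroups only")
    p_value = math.erfc(math.sqrt(q / 2))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, p_value, i2


# HR of chemotherapy in the low and high CPGS groups, as reported in the Results.
low = log_hr_and_se(8.95, 1.12, 71.56)
high = log_hr_and_se(0.05, 0.01, 0.36)
q, df, p_value, i2 = interaction_test([low, high])
print(f"Q = {q:.2f}, P for interaction = {p_value:.4f}, I2 = {i2:.1f}%")
```

With these inputs the P-value for interaction falls below 0.01, which agrees with the value reported for the CPGS. With more than two subgroups the same Q statistic applies, but a general chi-square tail function would be needed.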

Chemotherapy predictive score
In establishing the chemotherapy predictive clinical score (CPCS), a variable was regarded as associated with chemotherapy benefits and was included when its P-value for interaction was less than 0.05. In establishing the chemotherapy predictive genetic score (CPGS), we chose the 5 most statistically significant genes according to their P-values for interaction.
The chemotherapy predictive score was established as follows. First, the coefficient of chemotherapy was calculated in each subgroup of a variable (demographic and clinicopathological characteristics in the CPCS and gene expression levels in the CPGS). Second, the coefficients from the same variable were transformed so that the highest value equaled 0 and each other value equaled the absolute difference between it and the highest value. Third, the transformed coefficients from all variables were pooled, the highest transformed value was set to 100 and the lowest to 0, and the other transformed values were scaled proportionally between 0 and 100 (Appendix Fig. 1).
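The transformation can be illustrated with a short sketch. The coefficient values below are hypothetical and are not the values in Table 2; the function and variable names are also illustrative. Summing the points across variables to obtain a patient's total score is an assumption, since the text does not spell out how the per-variable points are combined.

```python
def build_score(coefficients):
    """Convert subgroup chemotherapy coefficients into 0-100 points.

    coefficients: {variable: {level: coefficient of chemotherapy in that subgroup}}
    """
    # Within each variable: the highest coefficient becomes 0, the others
    # become their absolute difference from the highest.
    transformed = {}
    for var, levels in coefficients.items():
        highest = max(levels.values())
        transformed[var] = {lvl: abs(c - highest) for lvl, c in levels.items()}
    # Pool all transformed values and rescale linearly to the range 0-100.
    pooled = [v for levels in transformed.values() for v in levels.values()]
    lowest, span = min(pooled), max(pooled) - min(pooled)
    return {
        var: {lvl: 100 * (v - lowest) / span if span else 0.0
              for lvl, v in levels.items()}
        for var, levels in transformed.items()
    }


# Hypothetical coefficients (log hazard ratios of chemotherapy per subgroup).
coefs = {
    "N stage": {"N0": 0.20, "N1": -0.10, "N2-3": -0.60},
    "T stage": {"T1": 0.10, "T2": -0.10, "T3-4": -0.30},
}
points = build_score(coefs)

# Assumed scoring rule: a patient's total is the sum of points over variables.
patient = {"N stage": "N1", "T stage": "T2"}
total = sum(points[var][level] for var, level in patient.items())
print(points, total)
```

Under this construction, the subgroup in which the coefficient of chemotherapy is lowest relative to the other levels of the same variable receives the most points, so higher totals correspond to larger expected chemotherapy benefit.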
After establishing the CPCS, a subpopulation treatment effect pattern plot (STEPP) analysis was conducted to explore the chemotherapy-CPCS interaction, and a Chow test was conducted to identify the cut-off value of the CPCS [17]. After establishing the CPGS, each patient was scored using the CPGS, and the patients were classified into a low score group and a high score group according to the median of the calculated scores. The chemotherapy-CPGS interaction was tested to validate the performance of the CPGS in predicting chemotherapy benefits.

Genes from previously established multiple gene assays
To test the predictive performance of genes from previously established multiple gene assays, we extracted the genes in the Oncotype Dx, 70-gene (MammaPrint), ROR, 28-gene signature, EndoPredict and Breast Cancer Index (BCI) assays, and then drew a heatmap of their expression counts in the 254 tumor samples [12,18-22]. The samples were hierarchically clustered to see whether samples with the same pathological stage would cluster together.

Patient characteristics from the SEER database
In total, 86158 patients with HR+, HER2-, early breast cancer were identified from the SEER database, and around 30% of the patients received chemotherapy (Appendix Table 1). Compared with the patients not receiving chemotherapy, the patients receiving chemotherapy had a poorer differentiation grade and higher T stage and N stage (all P < 0.001). In the chemotherapy group, 38.2% of the tumors were poorly differentiated, in contrast to 11.1% in the non-chemotherapy group. Most tumors were T1 stage (80.7%) and N0 stage (86.8%) in the non-chemotherapy group, while only 46.9% of the tumors were T1 stage and 41.0% were N0 stage in the chemotherapy group. Patients in the chemotherapy and non-chemotherapy groups were matched 1:1, and there were 15259 patients in each matched group. The demographic and clinicopathological characteristics of the matched groups are shown in Table 1. The P-values of all variables were ≥ 0.05, indicating that the demographic and clinicopathological characteristics were well balanced after matching.

Survival analysis for patients from the SEER database
The median follow-up period was 49 months (IQR: 34-65 months) in the matched patients. During the follow-up period, 873 patients experienced CSD and 630 patients experienced non-CSD.

Chemotherapy predictive clinical score
According to the P-values for interaction, race, T stage and N stage were associated with chemotherapy benefits and were used to establish the CPCS. The coefficient values, transformed coefficient values and corresponding scores of the three variables are shown in Table 2.
The chemotherapy-CPCS interactions are shown in the STEPP (Fig. 2), from which we observed that, as the CPCS increased, the CSD of patients receiving chemotherapy became increasingly lower than that of patients not receiving chemotherapy. The Chow test identified 103.12 as the optimal cut-off value.

Differentially expressed genes and chemotherapy benefit genes
Among the 29 pairs of tumor and normal samples, 1709 DEGs were filtered. The DEGs are displayed in a heatmap and a volcano plot (Appendix Fig. 2, Appendix Fig. 3).
For each of the identified DEGs, the statistical interaction between chemotherapy and the gene expression level was tested, and we finally identified 84 genes associated with chemotherapy benefits, which are summarized in Appendix Table 3.
According to the median of the CPGS, 125 patients were classified into the low score group (62 patients receiving chemotherapy and 63 patients not) and 129 patients were classified into the high score group (57 patients receiving chemotherapy and 73 patients not). The HR of chemotherapy was 8.95 (95% CI: 1.12-71.56, P < 0.01) in the low score group and 0.05 (95% CI: 0.01-0.36, P < 0.01) in the high score group, and the P-value for interaction between chemotherapy and CPGS as a categorical variable was less than 0.01 (Fig. 3).

Genes from previously established multiple gene signatures
Figure 4 shows the expression counts of 130 genes from previous gene assays (values were scaled by row); hierarchical clustering did not group patients with similar age, AJCC stage, T stage or N stage together.

Discussion
HR and HER2 statuses are two well-established prognostic factors in breast cancer, and the prognosis of patients with HR+, HER2-, early breast cancer is favorable [23,24]. Our population-based study found that non-CSD accounted for 41.92% of all deaths, indicating that non-CSD deserves attention and that clinicians should be cautious when giving adjuvant chemotherapy to this cohort, considering the side effects of chemotherapy.
In our survival analyses, we found that the entire cohort failed to benefit from the addition of chemotherapy, indicating that some patients may be spared chemotherapy. In our subgroup analyses, N stage was associated with chemotherapy benefits, which is consistent with guidelines in which nodal status is an indicator for choosing adjuvant chemotherapy. We concluded from our analyses that, among patients with HR+, HER2-, early breast cancer, most patients with N0 disease might be spared adjuvant chemotherapy and patients with N2 and N3 disease should receive adjuvant chemotherapy, while the chemotherapy benefits for patients with N1 disease remain uncertain on the basis of clinical characteristics alone.
There has been a tendency to equate factors associated with prognosis with those related to chemotherapy benefits. However, in our study, we found that these two types of factors were not entirely the same. In the multivariable analyses, age, marital status, race, histological type, differentiation grade, T stage and N stage were all associated with CSD, while only race, T stage and N stage were significant when considering the interactions between adjuvant chemotherapy and demographic and clinicopathological characteristics. Therefore, we concluded that only race, T stage and N stage were associated with chemotherapy benefits, and factors such as histological type and differentiation grade were not (even though they were associated with prognosis). In addition, we established a score for predicting chemotherapy benefits with these 3 factors (race, T stage and N stage). Unlike previous predictive scores, which were transformed from models for predicting prognosis, our score was established using a totally new method.
The 21-gene recurrence score assay is the only genetic assay recommended for predicting chemotherapy benefits in the NCCN guidelines, but its predictive ability in HR+, HER2-, early breast cancer is doubtful [9]. The 21-gene assay was initially established to predict the prognosis of estrogen receptor-positive (ER+), N0 breast cancer rather than the chemotherapy benefits of HR+, HER2-, early breast cancer.
Evidence supporting the 21-gene assay for predicting chemotherapy benefits comes mainly from the NSABP B20 trial, the TAILORx trial and the SWOG-8814 trial. The NSABP B20 trial enrolled 2363 patients with ER+, N0 breast cancer, who were randomly assigned to chemotherapy or not [25]. Among these 2363 patients, 651 were assessed using the 21-gene recurrence score (RS) assay and classified into three groups according to the score: low-RS (RS < 18), intermediate-RS (18 ≤ RS ≤ 30) and high-RS (RS ≥ 31). The results suggested that patients with a low RS did not benefit from chemotherapy, patients with a high RS benefited from chemotherapy, and the benefit was uncertain for patients with an intermediate RS. Note that the study included patients with both HER2- and HER2+ breast cancer; it is therefore doubtful to conclude that patients with HR+, HER2-, N0 breast cancer benefited from chemotherapy, because the HER2 score accounts for a certain proportion of the 21-gene RS [12].
In a secondary analysis of the NSABP B20 trial, gene expression results of the 21-gene assay were obtained for the 651 patients, and 569 of them were identified as HER2- according to HER2 gene expression from the RT-PCR assay [26]. Chemotherapy benefits for the 569 patients were then assessed using Kaplan-Meier plots, and patients were reported to benefit from the addition of chemotherapy in the high-RS group (whether high RS was defined as 31 or more or as greater than 25), with no significant benefit in the low- and intermediate-RS groups. Considering the retrospective nature and the relatively small sample size of the high-RS group (97 patients in the RS ≥ 31 group or 121 patients in the RS > 25 group), the results of this secondary analysis need further validation. In addition, the NSABP B20 trial was conducted in an early era when the effectiveness of endocrine therapy was limited, and this effectiveness has since improved.
The TAILORx trial considered the impact of HER2 status on survival and chemotherapy benefits, and enrolled patients with HR+, HER2-, N0 breast cancer [13]. The 6711 patients with a midrange RS of 11 to 25 by the 21-gene assay were randomly assigned to receive endocrine therapy alone or endocrine therapy plus chemotherapy. Although some patients aged 50 or younger seemed to benefit from chemotherapy, the entire cohort failed to benefit (endocrine therapy alone versus endocrine therapy plus chemotherapy: invasive disease recurrence or death, P = 0.26; recurrence at a distant site, P = 0.48). The 1389 patients with an RS of 26 or higher were all assigned to receive chemoendocrine therapy; thus the study did not directly demonstrate that patients with HR+, HER2-, N0 breast cancer and a high RS benefited from chemotherapy.
In a secondary analysis of the TAILORx trial, 1389 patients with HR+, HER2-, N0 breast cancer and a high RS of 26 or more were identified, and 89 of them had no chemotherapy [27]. The study reported that patients without chemotherapy had a worse prognosis than patients with chemotherapy in invasive disease-free survival (IDFS) (HR 0.48, 95% CI: 0.29-0.80), but not in freedom from recurrence at a distant site (HR 0.74, 95% CI: 0.32-1.69). However, in the non-chemotherapy group, 57 of the 89 patients underwent breast conservation surgery while fewer patients (29 patients) received postoperative radiation therapy, indicating that these non-chemotherapy patients might have had poor adherence. Radiation therapy has been validated to reduce local recurrence after breast conservation and is suggested for patients treated with conservation surgery. Therefore, the finding of this secondary analysis that chemotherapy conferred IDFS benefits for patients with HR+, HER2-, N0 breast cancer and a high RS needs further validation.
According to the ASCO guidelines, the 70-gene assay is also recommended for treatment decision making, and the MINDACT trial provides the highest level of evidence supporting its predictive value [28,29]. In this randomized, phase 3 study, patients with early breast cancer were classified into four groups according to their clinical risk (using a modified version of Adjuvant! Online) and genomic risk (using the 70-gene signature), and the groups with discordant risk results (high clinical and low genomic risk or low clinical and high genomic risk) were randomly assigned to receive chemotherapy or not. The results showed no statistically significant chemotherapy benefits with respect to survival without distant metastasis, disease-free survival or overall survival, indicating that high genomic risk may not be an accurate indicator of chemotherapy benefits. Additionally, subgroup analyses of the MINDACT trial in patients with ER+, HER2-, N- breast cancer showed that this subgroup failed to benefit from the addition of chemotherapy (DMFS for chemotherapy versus no chemotherapy in the two discordant risk groups, in the HR+, HER2-, N- subgroup: P = 0.456 in high clinical risk/low genomic risk; P = 0.333 in low clinical risk/high genomic risk), which is consistent with our results.
The other multiple gene assays are also not suitable for predicting chemotherapy benefits in HR+, HER2-, early breast cancer because none of them took the interaction between chemotherapy and gene expression level into consideration. The finding that genes from previous multiple gene assays (including genes in the Oncotype Dx, MammaPrint, PAM50, 28-gene signature, EndoPredict and BCI) failed to cluster patients with similar pathological stages also suggests that these genes are not suitable for prediction in HER2-, HR+, early breast cancer [12,18-22]. Therefore, we adopted a new method that considers the interaction, and identified SOWAHA, RP11-205M3.3, IRX6, PPBP and EMX1 as the genes most closely related to chemotherapy benefits in HER2-, HR+, early breast cancer; we then established a chemotherapy predictive genetic score based on these 5 genes.
In colorectal cancer, Iroquois homeobox 6 (IRX6) expression has been reported to be lower in tumor samples and to be a prognostic predictor [30]. Pro-platelet basic protein (PPBP) is also known as C-X-C motif chemokine ligand 7 (CXCL7); its overexpression has been reported to promote cell proliferation in vivo and in vitro, and it has been proposed as a novel biomarker for the diagnosis of lung cancer and renal cell carcinoma [31][32][33]. Zhou et al. reported that empty spiracles homeobox 1 (EMX1) was a potential target of miR-497, which is recognized as a tumor suppressor in many cancers, and that high expression of EMX1 was related to advanced clinicopathologic characteristics in endometrial cancer [34]. Our study found that high expression of EMX1 was associated with chemotherapy benefits in breast cancer, indicating an inconsistency between the prognostic and predictive values of EMX1.
There are several limitations in our study. First, our study was retrospective, and biases such as selection bias and treatment bias were inevitable. Second, our established CPCS and CPGS need further external validation to prove their effectiveness in predicting chemotherapy benefits. Finally, some important information, such as the regimens and courses of chemotherapy, is not available in the SEER and TCGA databases, which limited our further analyses.

Conclusion
For patients with HR+, HER2-, early breast cancer, our study found that adjuvant chemotherapy failed to confer survival benefits in the entire cohort. Nodal status helped to identify patients benefiting from chemotherapy, and most N0 patients and some N1 patients might be spared adjuvant chemotherapy. In addition, we adopted a new method to establish a chemotherapy predictive clinical score and a chemotherapy predictive genetic score for predicting chemotherapy benefits in HR+, HER2-, early breast cancer.

Declarations
Ethics approval and consent to participate
The study was approved by the review board of the Affiliated Jinhua Hospital, Zhejiang University School of Medicine.

Consent for publication
Not applicable

Availability of data and materials
Not applicable

Competing interests