qRT-PCR-Based DNA Homologous Recombination-Associated 4-Gene Score Predicts Pathologic Complete Response to Platinum-Based Neoadjuvant Chemotherapy in Triple-Negative Breast Cancer

Accumulating evidence suggests that adding platinum agents to neoadjuvant chemotherapy (NACT) can improve the pathologic complete response (pCR) rate in triple-negative breast cancer (TNBC). Previous studies showed that DNA homologous recombination deficiency (HRD) is a potential biomarker for predicting pCR in ER-negative breast cancer. A predictive biomarker for platinum sensitivity would help personalize the use of platinum agents. We therefore sought to develop an HRD gene expression score to predict tumor sensitivity to platinum-based NACT in TNBC.

Accumulating evidence suggests that TNBC is more sensitive to DNA-damaging interstrand crosslinking agents such as platinum [6,7]. The largest recent meta-analysis, which included nine randomized controlled trials, showed that platinum-based NACT is associated with significantly higher pathologic complete response (pCR) rates than platinum-free NACT: the pCR rate increased from 37% to 52.1% [8]. Nevertheless, results from individual studies remain too controversial to recommend a platinum agent as a standard component of NACT in unselected TNBC patients [8]. Additionally, the long-term outcomes associated with the incorporation of carboplatin, such as overall survival (OS) and disease-free survival (DFS), are not yet known. Therefore, the updated National Comprehensive Cancer Network (NCCN) breast cancer guideline does not recommend the routine addition of carboplatin to standard neoadjuvant chemotherapy for patients with TNBC, but suggests that it be considered in select patients [9].
In this regard, suitable biomarkers that predict sensitivity to platinum agents are needed to better identify the TNBC patients who can benefit from platinum-based NACT. In an earlier study, Pitroda et al. showed that a Recombination Proficiency Score (RPS) system could provide prognostic data in individual cancers. The RPS is calculated by combining the expression levels of four genes (RIF1, PARI, RAD51, and XRCC5) involved in the DNA homologous recombination (HR) repair pathway [10].
The RPS system was initially studied in patients with non-small cell lung carcinoma (NSCLC). Low RPS was associated with poor prognosis in NSCLC; however, this disadvantage might be negated by chemotherapy, because low-RPS tumors are especially sensitive to platinum-based chemotherapy [10]. This validated the ability of the RPS to determine sensitivity to platinum-based chemotherapy. Although the predictive effect of the RPS was subsequently evaluated in breast cancer, the patients in that study underwent anthracycline-based NACT without a platinum agent [11]. Thus, further study is needed to determine whether a similar scoring system can predict the efficacy of platinum-based NACT in TNBC.
In our study, we built an algorithmic model to correlate the final pCR results with the expression levels of HR-related genes (RIF1, PARI, RAD51, XRCC5, BRCA1, PARP1, C-Met, and E2F1). Besides the four genes (RIF1, PARI, RAD51, and XRCC5) taken from Pitroda's work, four additional genes (BRCA1, PARP1, C-Met, and E2F1) were included in the primary analysis to explore whether they also predict response to platinum-based NACT in TNBC. As reported previously, these genes play important roles in the homologous recombination repair process [2,12-20]. Our final model revised the set of included genes, and the combined 4-gene score was able to predict sensitivity to platinum-based NACT in TNBC.

Sample selection and data collection
This study of human TNBC specimens was approved by the institutional review board of Fudan University Shanghai Cancer Center. A total of 129 formalin-fixed paraffin-embedded (FFPE) core needle biopsy samples, obtained before platinum-based NACT from TNBC patients between 2012 and 2017, were collected. The detection of HR-related genes was conducted by Shuwen (Shuwen Biotech Co., LTD), blinded to clinical information. Two samples with less than 20% tumor content were excluded, and the remaining 127 samples were used to build the scoring model. The clinical and pathological data of the patients were collected and reviewed. Tumor-node-metastasis (TNM) stage was evaluated according to the eighth edition of the American Joint Committee on Cancer (AJCC) cancer staging system. The extent of residual disease was assessed using the Miller-Payne (MP) grading system [21]. Pathologic complete response (pCR) in breast/axilla (ypT0/Tis ypN0) was defined as the absence of invasive cancer in both the breast and the axillary lymph nodes. Pathologic complete response (pCR) in breast (ypT0/Tis) was defined as the absence of residual invasive cancer in the breast only, corresponding to MP grade 5.

Expression of target genes and reference genes
One hematoxylin and eosin (HE)-stained slide per sample was used to confirm a tumor content above 20% and served as the reference. RNA was extracted from macrodissected samples with the commercially available RNXtract® RNA Extraction Kit (BioNTech Diagnostics). The extracted RNA was quantified using a Qubit™ 3 Fluorometer (Life Technologies) and stored at -20°C until use. Expression of the target genes (RAD51, XRCC5, RIF1, PARI, PARP1, BRCA1, C-Met, and E2F1) and a set of four reference genes (CALM2, B2M, TBP, and GUSB) was detected using TaqMan-based qRT-PCR reactions in 96-well plates on a 7500 Real-Time PCR System (Applied Biosystems). The cycle threshold (Ct) values of the target and reference genes were recorded.

DNA recombination deficiency scoring
The expression of each target gene was normalized by subtracting the mean Ct value of the four housekeeping genes from the Ct value of the target gene. We built a random forest (RF) model to estimate the weight of each gene expression level and clinicopathological factor. Samples were randomized into a training set and a validation set with different splitting percentages, from 50%:50% to 90%:10%. The training set was used to tune parameters and select the best model using 5-fold cross-validation. The performance of the final model was evaluated in the validation set.
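The normalization step described in this section can be illustrated with a minimal sketch. The function name `delta_ct` and the Ct readings below are hypothetical and serve only to show the arithmetic; they are not data from this study.

```python
from statistics import mean

# Reference (housekeeping) and target genes as listed in the Methods.
REFERENCE_GENES = ("CALM2", "B2M", "TBP", "GUSB")
TARGET_GENES = ("RAD51", "XRCC5", "RIF1", "PARI", "PARP1", "BRCA1", "C-Met", "E2F1")


def delta_ct(ct_values):
    """Return {target gene: Ct(target) - mean Ct(reference genes)} for one sample."""
    ref_mean = mean(ct_values[gene] for gene in REFERENCE_GENES)
    return {
        gene: ct_values[gene] - ref_mean
        for gene in TARGET_GENES
        if gene in ct_values
    }


# Hypothetical Ct readings for a single sample (illustrative numbers only).
sample = {
    "CALM2": 24.0, "B2M": 22.0, "TBP": 30.0, "GUSB": 28.0,
    "RAD51": 29.5, "BRCA1": 31.0,
}
normalized = delta_ct(sample)
```

Because a lower Ct indicates more template, a smaller normalized value corresponds to higher expression relative to the reference genes.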

Statistical analysis
All analyses were performed using R software and SPSS 20.0 (IBM Corp, Armonk, NY, USA). Predictive model training and validation were performed using the R package caret (Classification and Regression Training), version 3.6.1.
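The splitting scheme described under the scoring method (repeated training/validation splits, with 5-fold cross-validation inside the training set) can be sketched independently of any modelling library. This is an illustration only, not the caret procedure used in the study: the helper names, the fixed seed, the intermediate split ratios (60%, 70%, 80%), and the absence of outcome stratification are all assumptions.

```python
import random


def split_indices(n_samples, train_fraction, seed=0):
    """Shuffle sample indices and split them into training and validation lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


def k_folds(indices, k=5):
    """Yield (fit, held_out) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i, held_out in enumerate(folds):
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, held_out


N_SAMPLES = 127  # samples used for model building, as reported in the Methods

# Assumed grid of split ratios between the reported extremes 50%:50% and 90%:10%.
for train_fraction in (0.5, 0.6, 0.7, 0.8, 0.9):
    train, validation = split_indices(N_SAMPLES, train_fraction)
    for fit, held_out in k_folds(train, k=5):
        pass  # a model would be tuned on `fit` and scored on `held_out` here
```

The validation indices are never touched during cross-validation, which mirrors the statement that final performance was evaluated in the validation set only.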
One-way analysis of variance (ANOVA) showed that the expression levels of RAD51, XRCC5, PARP1, and BRCA1 differed significantly among the MP grade subgroups (all P < 0.05, Table S1). Spearman's rank correlation analysis indicated that the expression levels of RAD51, XRCC5, and BRCA1 were correlated with Miller-Payne grade (Table S2).
To determine the weight of each gene, we built a random forest (RF) model to estimate the variable importance of each gene. The initial predictors comprised the expression levels of the 8 target genes together with clinical and pathologic factors: age (< 48 vs ≥ 48), T stage, N stage, histological grade, and Ki-67 index (< 40% vs ≥ 40%). With a 50%:50% split into training and validation sets, this model reached its maximum accuracy (74.6%), sensitivity (67.9%), and specificity (80.0%) (Table S3). Variables with importance scores < 70 (on a scale of 0 to 100) were eliminated; RAD51, XRCC5, PARP1, and BRCA1 were therefore included in the final RF model. The final model performed best at a 70%:30% split, with an accuracy of 76.3% and a sensitivity of 76.5% (Table S4). The variable importance scores estimated in the final model were then used to calculate the weight of each gene expression level, such that the weight coefficients summed to one. Given the inverse relationship of RAD51 and BRCA1 with MP grade (Table S2) independent predictive factors of pCR in breast, however, the 4-gene score (OR = 3.241; P < 0.001) is the only independent predictive factor of pCR in breast/axilla (Table 2). Spearman correlation analysis showed that the 4-gene score is positively correlated with Ki-67 index (P = 0.002) but negatively correlated with positive lymph nodes (P = 0.003) (Table 3).
To investigate the predictive value of the 4-gene score in identifying pCR in breast and pCR in breast/axilla, receiver operating characteristic (ROC) curve analysis was performed and yielded an area under the curve (AUC) of 0.816 for pCR in breast and 0.782 for pCR in breast/axilla (Fig. 1). The point on the curve that maximized sensitivity (93.0% for pCR in breast; 91.8% for pCR in breast/axilla) corresponded to a cut-off value of -2.644.
The optimal cut-off value maximizing specificity (85.7% for pCR in breast; 80.8% for pCR in breast/axilla) was -1.969 (Table 4). At the cut-off value of -2.644, a negative result might rule out patients who are less sensitive to a platinum regimen, given the low negative likelihood ratios. At the cut-off value of -1.969, a positive result may increase the likelihood of achieving pCR in breast and pCR in breast/axilla by 4.54-fold and 3.40-fold, respectively (Table 4). With the two cut-off values (-2.644 and -1.969), the subjects were divided into three subgroups (Table 5). The distributions of pCR in breast (P < 0.001), pCR in breast/axilla (P < 0.001), and Ki-67 index (P = 0.012) differed significantly among the groups. The rates of both pCR in breast and pCR in breast/axilla showed an increasing trend, with a higher 4-gene score yielding a better response rate. To better illustrate the trend of pCR in breast, the distributions of patients with different Miller-Payne grades in the three subgroups are presented in Fig. 2. Notably, when the 4-gene score was higher than -1.969, 91.5% of the patients showed a high Ki-67 index (≥ 40%), whereas when the 4-gene score was less than -2.644, the rate of high Ki-67 index (≥ 40%) decreased significantly to 65.9% (Table 5).
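The likelihood ratios referred to above follow from sensitivity and specificity by the standard definitions, sketched below. Treating the reported 4.54-fold and 3.40-fold increases as positive likelihood ratios is an assumption made for this illustration; under that reading, the sensitivity at the -1.969 cut-off, which is not stated in the text, can be back-calculated from the reported specificities.

```python
def positive_lr(sensitivity, specificity):
    """LR+ = sensitivity / (1 - specificity)."""
    return sensitivity / (1.0 - specificity)


def negative_lr(sensitivity, specificity):
    """LR- = (1 - sensitivity) / specificity."""
    return (1.0 - sensitivity) / specificity


def implied_sensitivity(lr_pos, specificity):
    """Rearranged LR+ definition: sensitivity = LR+ * (1 - specificity)."""
    return lr_pos * (1.0 - specificity)


# Reported at the -1.969 cut-off: specificity 85.7% (breast) and 80.8%
# (breast/axilla). ASSUMPTION: the 4.54 and 3.40 fold-increases are LR+ values.
sens_breast = implied_sensitivity(4.54, 0.857)
sens_breast_axilla = implied_sensitivity(3.40, 0.808)
```

Under this assumption both endpoints imply a sensitivity of roughly 0.65 at the rule-in cut-off, illustrating the usual trade-off: the threshold chosen for high specificity gives up sensitivity, while the -2.644 threshold does the opposite.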

Discussion
Platinum agents (i.e., carboplatin and cisplatin) are cytotoxic DNA-damaging agents that cause DNA strand breaks and consequent apoptosis [12]. This mechanism makes them useful in cancers with DNA HR repair deficiency, especially those harbouring deleterious mutations in the BRCA1/2 genes [6,7]. BRCA gene abnormalities and TNBC are closely related: up to 20% of TNBC patients carry a BRCA germline mutation [22]. Moreover, several other mechanisms involved in the HR pathway have been implicated in platinum response, defining the concept of BRCAness. This genetic signature is defined by epigenetic inactivation of BRCA, mutations in other genes, or post-translational modifications of key proteins involved in the HR system [23,24]. Since the functions of the BRCA1/2 genes and the phenomenon of BRCAness in the HR repair mechanism were revealed, several studies have investigated the role of platinum-based chemotherapy in the neoadjuvant TNBC setting. A series of phase II studies, such as the GEICAM/2006-03, GeparSixto, and CALGB 40603 trials [25-27], suggested possible activity and a survival benefit from adding platinum to NACT; however, the available results are mixed and controversial. According to current breast cancer guidelines, the routine use of platinum agents as part of neoadjuvant therapy for TNBC is not recommended for most patients, although it may be considered in select patients requiring better local control [9]. Therefore, predictive biomarkers for platinum response in breast cancer represent an unmet clinical need.
Recently, three independent DNA-based measures of genomic instability resulting from HR repair defects were developed. They are based on the following genomic alterations: loss of heterozygosity (LOH) [28], telomeric allelic imbalance (TAI) [29], and large-scale state transitions (LST) [30]. The combination of the three scores, called the HRD score, could distinguish homologous recombination deficient tumors from non-deficient tumors [31,32]. HRD is defined by an HRD score of 42 or higher [33,34]. HRD-high tumors have been reported to be sensitive to platinum-containing regimens, indicating a clinical utility of the HRD score for selecting patients who are more likely to respond to platinum [34]. However, the BrighTNess study showed that higher pCR rates were not related to HRD status in platinum-containing NACT [35]. Additionally, Tutt et al. found similar response rates to carboplatin in HRD-high and HRD-low tumors in the metastatic setting (TNT trial) [36]. Therefore, further studies are needed to confirm the clinical utility of HRD scores as a predictor of response to platinum-containing regimens.
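For orientation, the DNA-based HRD classification described above can be written as a short sketch. It assumes that the "combination" of the three measures is their unweighted sum, which is the commonly cited definition but is not spelled out in the text; the function names are hypothetical.

```python
HRD_THRESHOLD = 42  # HRD is defined by an HRD score of 42 or higher


def hrd_score(loh, tai, lst):
    """Combine the three genomic-instability measures.

    ASSUMPTION: the combination is an unweighted sum of the LOH, TAI,
    and LST scores.
    """
    return loh + tai + lst


def is_hr_deficient(loh, tai, lst, threshold=HRD_THRESHOLD):
    """Return True when the combined score reaches the HRD threshold."""
    return hrd_score(loh, tai, lst) >= threshold
```

In contrast to this DNA-based score, the 4-gene score studied here is derived from RNA expression levels.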
Genomic alterations detected by next-generation sequencing (NGS) for HRD status reflect pre-transcriptional events, whereas gene expression profiling captures the current transcriptional state of a tumor sample, since the quantity of RNA varies dynamically during cellular processes. In 2014, Pitroda et al. developed a recombination proficiency score (RPS) system based on gene expression profiling to evaluate HRD status and tumor sensitivity to chemotherapy [10]. The RPS is calculated from the expression levels of four genes (RIF1, PARI, RAD51, and XRCC5) involved in the DNA repair pathway. The RPS system was initially studied in patients with non-small cell lung carcinoma (NSCLC), in whom low-RPS tumors were especially sensitive to platinum-based chemotherapy [10]. This indicated that the RPS has the potential to determine sensitivity to platinum-based chemotherapy. In a later study, the RPS was used to evaluate the sensitivity of breast cancers to anthracycline-based NACT [11]. In contrast to the work of Pitroda et al., our study focused on the response to platinum-based rather than anthracycline-based NACT. Additionally, the final genes used in our study were obtained from correlation analysis with the pCR results, whereas the gene list in Pitroda's study was selected from a cell line database on topotecan sensitivity.
In our study, eight target genes (RIF1, PARI, RAD51, XRCC5, BRCA1, PARP1, C-Met, and E2F1) were included in the primary analysis. Besides the four genes (RIF1, PARI, RAD51, and XRCC5) taken from Pitroda's work, four additional genes (BRCA1, PARP1, C-Met, and E2F1) were added. As reported previously, these genes play important roles in the HR repair process. The BRCA1 protein is an upstream effector and is considered a permanent factor throughout the process, despite the complex mechanisms involved in the double-strand break repair pathway [12,13]. E2F1 is a transcription regulator that participates in various pathways, such as cell cycle, proliferation, apoptosis, development, and differentiation [15]. E2F1 accumulates in response to DNA double-strand damage in tumor cells and promotes DNA repair [14]. Compared with RIF1 and PARI, PARP1 not only mediates excision repair of single-strand breaks but also acts as a sensor of DNA double-strand breaks, controlling and recruiting important HR proteins [16]. In addition, TNBC has been shown to express PARP1 more frequently than other breast cancer subtypes, and high levels of PARP1 expression in breast cancer correlated with improved response to chemotherapy [2]. Importantly, active PARP1 has been shown to enhance E2F1 transcription factor activity in the HR-related DNA repair process [17]. Furthermore, several studies showed that C-Met inhibition reduced RAD51 phosphorylation by impairing its nuclear translocation and decreased the formation of the RAD51/BRCA2 complex in the DNA damage response [18,19]. C-Met is not only a clinical prognostic marker but also a predictive marker of response to chemotherapy in patients with breast cancer. Interestingly, previous studies have also indicated that C-Met associates with and phosphorylates PARP1 at Tyr907, and that inhibiting both C-Met and PARP1 can synergistically suppress the growth of breast cancer cells [20].
Considering the evidence shown above, these eight genes were entered into the primary analysis. The expression levels of BRCA1 and PARP1 correlated significantly with the MP grade subgroups, whereas those of RIF1 and PARI did not. Thus, the two genes RIF1 and PARI in Pitroda's final formula were replaced by BRCA1 and PARP1.
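The conversion of variable importance scores into gene weights, as outlined in the Results (importance scores below 70 eliminated, remaining weights summing to one), can be sketched as follows. The importance values are hypothetical placeholders, not the values estimated in this study, and the sketch applies the filter and the rescaling in a single pass, whereas the study re-estimated importance in the final four-gene model. The signed formula that combines the weights into the 4-gene score is not reproduced here.

```python
def importance_to_weights(importance, cutoff=70.0):
    """Drop variables with importance below `cutoff`, then rescale to sum to one."""
    kept = {name: score for name, score in importance.items() if score >= cutoff}
    total = sum(kept.values())
    return {name: score / total for name, score in kept.items()}


# HYPOTHETICAL importance scores on a 0-100 scale (illustrative numbers only).
hypothetical_importance = {
    "RAD51": 100.0, "BRCA1": 90.0, "XRCC5": 80.0, "PARP1": 80.0,
    "RIF1": 40.0, "PARI": 30.0, "C-Met": 25.0, "E2F1": 20.0,
}
weights = importance_to_weights(hypothetical_importance)
```

With these placeholder inputs, only the four retained genes receive a weight, and each weight is that gene's share of the total retained importance.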
Our study showed that TNBC with a higher score was nearly four times as likely to achieve pCR with platinum-based NACT as TNBC with a lower score. Moreover, a test result below -2.6440 might be used to rule out patients with less sensitivity to a platinum regimen, and a result above -1.9692 might be used to rule in patients with an increased likelihood of achieving pCR. However, for values between -2.6440 and -1.9692, the predictive ability of the 4-gene score was significantly attenuated.
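The two-threshold reading of the score can be expressed as a small decision rule. The subgroup labels and the handling of scores that fall exactly on a cut-off are assumptions of this sketch; the text specifies only "below", "above", and "between".

```python
RULE_OUT_CUTOFF = -2.6440  # below this value: less sensitive to platinum
RULE_IN_CUTOFF = -1.9692   # above this value: increased likelihood of pCR


def assign_subgroup(score):
    """Map a 4-gene score to one of three subgroups (labels are hypothetical)."""
    if score < RULE_OUT_CUTOFF:
        return "low"
    if score > RULE_IN_CUTOFF:
        return "high"
    # ASSUMPTION: scores exactly on a cut-off fall into the intermediate group.
    return "intermediate"
```

Scores in the intermediate band are the ones for which the predictive ability of the score was attenuated, so a rule of this kind would leave those patients unclassified rather than force a rule-in or rule-out call.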
We also showed that a high Ki-67 level (≥ 40%) correlated with pCR in breast in TNBC. As demonstrated in other studies, the definitions of Ki-67 cutoff values differ widely, ranging from 10% to 61% in TNBC [37]. In our study, the median Ki-67 index was 70%, consistent with the observation that baseline Ki-67 values are much higher in TNBC than in luminal tumors [37]. Owing to interobserver variation, Ki-67 assessment was always subject to misclassification when the expression level lay in the grey zone. In the PACS01 study, the risk of misclassification was 37% when Ki-67 expression ranged between 10% and 25%, but only 11% when Ki-67 expression was < 10% or ≥ 25% [38]. In our study, the median Ki-67 index was very high (70%), so it was almost impossible to misclassify the Ki-67 value. However, using such a high threshold for further analysis would be unwise. Several studies suggested that a high Ki-67 proliferative index (≥ 30%) could identify patients with significantly more breast-related events [39,40]. Thus, we defined Ki-67 as a categorical variable with a 40% threshold. This cutoff value is in line with that reported earlier in another TNBC study by Wei Wang et al. [41]. Additionally, the 4-gene score is positively correlated with Ki-67, indicating that a higher 4-gene score represents a higher tumor proliferation rate.

Conclusions
In summary, the RT-qPCR-based 4-gene score showed potential for predicting both pCR in breast and pCR in breast/axilla after platinum-based NACT in TNBC patients. Although large-scale genomic characterization of breast cancers is now prevalent [42] and can provide enormous information for a given patient, our 4-gene score analysis offers several practical advantages as well. First, our study indicates that the expression level of HR-related genes is a suitable biomarker for identifying tumors with an increased likelihood of response to platinum-based neoadjuvant therapy in TNBC patients. Second, the 4-gene score analysis can be performed on formalin-fixed paraffin-embedded material from limited core-needle

Fig. 2. The distributions of patients with different Miller-Payne grading in three subgroups.