Prediction of pathologic complete response to neoadjuvant chemotherapy using machine learning models in patients with breast cancer

The aim of this study was to develop a machine learning (ML) based model to accurately predict pathologic complete response (pCR) to neoadjuvant chemotherapy (NAC) using pretreatment clinical and pathological characteristics of electronic medical record (EMR) data in breast cancer (BC). The EMR data from patients diagnosed with early and locally advanced BC and who received NAC followed by curative surgery were reviewed. A total of 16 clinical and pathological characteristics was selected to develop ML model. We practiced six ML models using default settings for multivariate analysis with extracted variables. In total, 2065 patients were included in this analysis. Overall, 30.6% (n = 632) of patients achieved pCR. Among six ML models, the LightGBM had the highest area under the curve (AUC) for pCR prediction. After hyper-parameter tuning with Bayesian optimization, AUC was 0.810. Performance of pCR prediction models in different histology-based subtypes was compared. The AUC was highest in HR+HER2- subgroup and lowest in HR−/HER2- subgroup (HR+/HER2- 0.841, HR+/HER2+ 0.716, HR−/HER2 0.753, HR−/HER2- 0.653). A ML based pCR prediction model using pre-treatment clinical and pathological characteristics provided useful information to predict pCR during NAC. This prediction model would help to determine treatment strategy in patients with BC planned NAC.


Introduction
Breast cancer (BC) is the most common cancer in women worldwide, and 2.1 million women were newly diagnosed with BC in 2018 [1]. While BC specific mortality has gradually decreased, it still causes the highest portion of cancer mortality in females [1,2]. Because most BC specific mortality occurs with metastatic breast cancer (MBC) [2], early detection of BC recurrence and effective treatment to eradicate metastasis are the most accurate strategies to reduce BC mortality.
Neoadjuvant chemotherapy (NAC) has been performed in patients with early and locally advanced BC (LABC). NAC is used to reduce tumor size in inoperable cases and eliminate micrometastasis [3]. In addition, response to NAC is one of the most powerful surrogate markers to predict BC prognosis [4]. Pathologic complete response (pCR), defined as no residual tumor cells in either breast or axilla Ji-Yeon Kim and Eunjoo Jeon have contributed equally to this work. after NAC, represents long term survival without BC recurrence [5]. In addition, residual cancer burden (RCB) score, based on residual tumor volume after NAC, can accurately predict BC prognosis [6].
Clinical trials have indicated that human epidermal growth factor receptor 2 (HER2) positive BC had about 60-70% pCR for NAC with dual anti-HER2 blockade [7,8], while triple negative breast cancer (TNBC) achieved about 30% pCR after anthracycline and taxane based NAC [9]. These clinical trials aid in determining overall patient prognosis but do not clarify the individual fates of patients with BC who receive NAC. There is also insufficient knowledge of baseline characteristics that affect NAC response at the time of BC diagnosis, regardless of BC subtype.
Current machine learning (ML) methods improve the accuracy of BC diagnosis and help to predict BC recurrence and prognosis [10][11][12]. If precise prediction of NAC response is possible using a ML algorithm, it may also be used to predict NAC responses of individual patients. This may change treatment plans for LABC during NAC and maximize the pCR rate and improve patient outcome. Therefore, we aimed to develop ML based model to accurately predict pCR to NAC using pretreatment clinical and pathological characteristics of electronic medical record (EMR) data in a neoadjuvant BC cohort in this study.

Study population
In this retrospective study, clinical information of 2168 patients with BC treated with NAC followed by curative surgery at Samsung Medical Center between January 2010 and September 2018 was collected. Among 2168 patients, 103 were excluded: 49 who had bilateral synchronous BC, 52 who did not receive curative surgery due to patient refusal, 1 who did not have complete surgical pathology results, and 1 who did not have information regarding NAC regimen. The Institutional Review Board (IRB) of Samsung Medical Center, Seoul, Korea (IRB No. 2019-04-021) reviewed and approved this study with an informed consent waiver due to use of retrospective clinical data. This study was performed in accordance with the Declaration of Helsinki.

Measurements
Detailed information of 16 baseline characteristics of patients with BC was obtained. Age, body mass index (BMI), menopausal status, smoking status, comorbidities at BC diagnosis, baseline CEA and CA-15-3, clinical and pathologic T and N stages, NAC regimen, and information regarding pathologic diagnosis, including immunohistochemistry (IHC) for estrogen receptor (ER), progesterone receptor (PR), HER2, and Ki-67.
BMI was calculated as weight in kilograms divided by height in meters squared at the time of BC diagnosis. Women were considered post-menopausal if they were age < 60 and amenorrheic for ≥ 12 months in the absence of chemotherapy, tamoxifen, or ovarian suppression; had undergone prior bilateral oophorectomy; or were aged ≥ 60 years [12]. Pathologic and clinical stages were based on criteria of the American Joint Committee on Cancer, 7th Edition [13]. Two experienced pathologists reviewed and determined primary tumor characteristics based on biopsy and surgical specimens. Primary tumor characteristics based on size, axillary nodal status, and resection margin were determined using the surgical specimen. Morphologic diagnosis and receptor status (ER, PR, and HER2) were determined by IHC staining. ER positivity and PR positivity were defined as Allred scores of 3-8 based on IHC staining with antibodies against ER (Immunotech, France) and PR (Novocastra, UK), respectively. Hormone receptor (HR) positivity was defined as ER or PR positivity. HER2 status was evaluated using the appropriate antibody (Dako, CA) and/or silver in situ hybridization (SISH). HER2 grades 0 and 1 indicated a negative result, while grade 3 indicated a positive result. Amplification of HER2 was confirmed by SISH for results of 2 +.

Data pre-processing
Menopause status was divided into premenopausal or postmenopausal, and smoking status was into current/ex-smoker or never-smoker. Serum CEA and CA-15-3 were used as continuous variables. Morphology was defined as invasive ductal carcinoma (IDC) or non-IDC. ER and PR ranges were between 0 and 8, and Ki-67 range was between + 1 and + 4. HER2 was handled as a binary variable with negative or positive results. The chemotherapy regimen of each patient was reconstructed into six subcategories and reviewed by an oncologist: (1) anthracycline based chemotherapy (AC: doxorubicin plus cyclophosphamide) followed by taxane (T), (2) chemotherapy with trastuzumab as an anti-HER2 targeted agent, (3) chemotherapy with dual anti-HER2 blockade (combination of trastuzumab and pertuzumab), (4) AC followed by T with platinum, (5) AC alone, and (6) others. pCR was defined as absence of residual invasive cancer on pathologic evaluation of the resected breast specimen and all sampled regional lymph nodes (ypT0/isN0).

Feature selection
Features were selected by univariate and correlation analyses. Univariate analysis was conducted to identify significant differences between the pCR and non-pCR groups using t test after testing for homogeneity of variance (Levene's test) or chi-square test. If similar features were associated with Pearson correlation and CramerV > 0.5, the feature with the higher effect size (only) was included to minimize collinearity among features. If variables were not significant in univariate analysis and the correlations were high, other features were additionally selected after expert review of the hospital workflow.

Model selection
Six ML models were compared using default settings for multivariate analysis with extracted variables: (1)

Model development and statistical analysis
Since missing data comprised less than 10% of the total data, a simple imputation technique was used. Missing data were imputed using mean for continuous features and median for categorical features. Stratified tenfold cross-validation (StratifiedKFold function in Python Scikit-Learn) was used to prevent overfitting and generate comparison results with lower bias and lower variance [15]. In tenfold cross-validation, the entire dataset was divided into ten mutually exclusive subsets with approximately the same class distribution as the original dataset (stratified). The dataset (2065 records) was randomly divided into ten disjoint subsets (folds), with each fold containing approximately the same number of records (208 records). Sampling was stratified by class label to ensure the subset proportions were the same as the original set. For each fold, a classifier was trained using 9 of the 10 folds and was tested on the 10th one.
In the training phase, Bayesian optimization using Bayes-Opt [16] was employed with 100 iterations to select hyperparameters of LightGBM: number of leaves, feature fraction, bagging fraction, max depth, lambda1, lambda2, min split gain, and min child weight. After choosing the best parameters, the LightGBM model was trained at learning rate 0.01 for 2000 iterations using an early sopping round of 100. The performance of classification models was evaluated by four standard metrics, which provide better results than accuracy evaluation for unbalanced data sets. The four standard metrics were area under the receiver operation characteristics (ROC) curve (AUC), accuracy, sensitivity, and specificity.
To avoid feature importance bias, a weakness of the mean decrease in importance, the permutation importance of features was calculated. This indicates the difference between baseline and the decrease in overall accuracy caused by permuting the column [17].

Baseline characteristics
In total, this study included 2065 patients diagnosed with early or locally advanced BC treated with NAC followed by curative surgery (Table S1). Data on every characteristic were collected for all patients except baseline serum CEA (null counts: n = 35, 1.7%), CA-15-3 (n = 30, 1.4%), and ER score (n = 1, 0.1%

Feature selection
Univariate analyses of the relationship between baseline patient characteristics and pathologic response to NAC were performed (Table 1). Baseline patient characteristics of age, menopausal status, baseline CA-15-3, Allred scores of ER and PR, HER2 status, expression level of Ki-67, clinical stage, T and N stages, and NAC regimens significantly affected pCR status (p < 0.05, respectively). Among these factors, age and clinical stage were excluded due to interference with other factors: age and menopausal status (correlation coefficient: 0.75) and clinical stage and N stage (correlation coefficient: 0.92) ( Figure S1). Both ER and PR (correlation coefficient: 0.84) were clinically important and were included for further analysis. Lastly, CEA and histology were added for further analysis after literature review, even though these two factors were not significantly associated with pCR [18][19][20]. Therefore, 11 features were selected: menopausal status, CEA, CA-15-3, histology, ER score, PR score, HER2 status, Ki-67, T stage, N stage, and NAC regimen.

Discussion
We developed a ML model to predict pCR after NAC in patients with BC using clinicopathologic characteristics. Moreover, a pCR prediction model was also developed according to BC subtype. This ML model suggested that NAC regimen was the most significantly contributing feature to predict pCR regardless of BC subtype. Features affecting pCR differed by BC subtype. Baseline serum CEA in HR+ HER2-BCs, PR score in HR+ HER2+ BCs, clinical N stage in HR-HER2+ BCs, and NAC regimen in TNBC were the most significantly contributing factors among all analyzed clinical features. Advances of chemotherapeutic agents was the most important factor affecting NAC response irrespective of BC subtype. In the era of cytotoxic agents, patients with BC treated with anthracycline and taxane have high pCR rate and survival outcome compared to those treated with anthracycline alone [21,22]. While the effect of additional carboplatin in the NAC regimen remains controversial, some patients with TNBC experience a survival benefit from platinum based therapeutic agents [23]. Our study suggested that NAC regimen affected pCR rate except in HR+ HER2-BC subtype. In particular, HR-HER2+ BCs were most influenced by NAC regimen. Recent advances in anti-HER2 antibodies have remarkably improved the clinical outcomes of patients with HER2+ BCs. The addition of trastuzumab, an anti-HER2 antibody, to chemotherapy increased pCR rate by about two times compared to chemotherapy alone in HER2+ BCs [24]. Moreover, dual anti-HER2 blockade with chemotherapy achieved an about 60% pCR rate in HER2+ BCs [8]. Therefore, the result of our research would reflect the impact of advances of NAC regimen on pCR rate in real world practice.
Most studies of associations between baseline characteristics and NAC response showed BC subtypes were mostly associated pathologic characteristics to pCR. Among BC subtypes, HER2+ BCs and TNBCs had higher pCR rates compared with HR+/HER2-BC subtype [25]. Despite a   [26]. In addition, pCR is a surrogate marker of survival regardless of BC subtype. Therefore, another approach for predicting NAC response by BC subtype is urgently needed to enable precise treatment of BC patients with NAC.
We developed a ML model for pCR prediction according to BC subtype. This pCR prediction model showed that accuracy and baseline clinical factors affecting pCR varied by BC subtype. In terms of TNBC, slightly low accuracy of our prediction model was observed. TNBC consists of several TNBC subtypes based on microarray data with different fates but this information could not be integrated into our model [27]. Therefore, further deep learning models should be developed using additional pathologic information to precisely predict NAC response, especially TNBC group.
In HER2+ BC subtype, HR+ HER2+ subtype has lower AUC value compared with that of HR-HER2+ subtype. Even though HER2+ BC was treated with chemotherapy in combination with anti-HER2 antibody regardless of HR status, the factors affecting pCR were different according to HR status.
We suggested that this would result from HR+ HER2+ subtype had more heterogeneous gene expression profile compared with HR-HER2+ subtype [28]. Therefore, PR, which be associated with intrinsic subtype, was the most significantly affecting factor to NAC response in HR+/HER2+ BC and slightly lower AUC value was observed compared with in HR−/HER2+ BC.
Baseline serum CEA and CA-15-3 affected pCR status in HR+ HER2-BCs. Baseline serum CEA and CA-15-3 are considered prognostic factors in patients with BC who undergo curative surgery, irrespective of BC subtype [19,29]. A previous study showed that the prognosis of ER+ or HER2-BCs was related to levels of both CEA and CA-15-3, while prognosis of ER-or HER2+ BCs was related to the level of CA-15-3 [29]. In our study, CEA level was associated with HR+/HER2-and HR+/HER2+ subtypes, while CA-15-3 was associated with HR-HER2+ subtype.
Currently, the American Society of Clinical Oncology (ASCO) does not recommend the use of CEA and CA-15-3 to screen, diagnose, stage, or monitor treatment in BC due to low sensitivity and specificity [30]. However, this guideline suggests that there is a significant prognostic contribution for CA-15-3 in early BC although there is no role of CA-15-3 in the management of early BC. Therefore this guideline does not recommend tumor marker measurement at diagnosis. In addition, this guideline suggests that CEA can be only used in conjunction with diagnostic imaging, history, and physical examination due to less commonly elevation of CEA compared with CA-15-3.
In our study, initial tumor marker elevation had significant predictive value of NAC response in specific BC subtypes and therefore we might suggest that tumor marker tests were valuable in NAC setting in spite of the lack of role in management plan. CEA had significant prognostic value in cases of HR+/HER2-and HR+/HER+ BC subtypes while CA-15-3 did in HR−/HER2+ BC subtype. In addition, the Fig. 2 Permutation feature importance. Chemo-Chemotherapy regimen, ER_scoreestrogen receptor, Menopause-menopausal status, HER2-human epidermal growth factor receptor 2, IDC_ or_Not-histology, PR_scoreprogesterone receptor costs of serum tumor marker tests are usually not expensive, about ten to twenty dollars. Therefore, we suggested that even though tumor markers was not recommended for BC diagnosis in ASCO guideline, they had significant predictive value of NAC response with regards to BC subtypes and these tumor markers would be performed to improve prediction value of NAC response in our ML model. ER and PR IHC scores affected pCR in HR+ BCs regardless of HER2 status. ER score had more significant influence than PR score in HR+ /HER2-BCs, but PR score had more significant influence on pCR in HR+/HER2+ BCs. A previous study analyzing NAC response suggested that BCs with low ER score had similar pCR rate to ER negative BCs [31]. In addition, IHC4 score determined for HR, HER2, and Ki-67 status in pre-treated biopsy tissue was associated with NAC response in ER positive BCs [32]. In terms of PR score, lower PR score was associated with higher pCR rate regardless of HER2 status but was also associated with worse survival outcome in patients with BC who did not achieve pCR after NAC [33]. Other studies revealed that high growth factor activity may be associated with decreased PR score, and one-quarter of HER2+ BCs were ER+ /PRcompared with 10% of ER+/PR+ in HER2-BCs [34,35]. Our study also suggested different contributions of ER and PR according to HER2 status, as demonstrated in previous studies in a neoadjuvant setting.
Interestingly, menopausal status influenced NAC response in HR+/HER+ BCs. We might suggest that serum estrogen level influenced ER and HER2 pathways to affect pCR status.
Pathologic complete response is the most important clinical characteristic to predict relapse free survival and overall survival [4]. Therefore, pCR is the standard primary end point of studies in the setting of NAC [36]. Most studies for NAC response have focused on identifying biomarkers to predict NAC response [37][38][39]. According to these translational research studies, presence of ctDNA was associated with worse prognosis. However, we cannot use results of translational research to predict pCR in real world practice at the present time. In this study, we suggested that different factors were associated with pCR in different BC subtype. This would be helpful to decide to treat early and locally advanced BC patients with NAC or not. In addition, in TNBC and HER2+ BC subtypes, chemotherapeutic regimen was significantly associated with NAC response. Therefore, our ML model would guide to decision making for NAC treatment as well as chemotherapeutic regimen to improve NAC response.
We validated the prediction model internally, not externally using another cohort of BC treated with NAC. To improve the performance of our prediction model, external validation is warranted.
In conclusion, we developed a ML model of pCR prediction using baseline clinical characteristics in patients with early and locally advanced BC treated with NAC followed by curative surgery. This model indicated that different clinical characteristics with different weights affect pCR according to BC subtype. We suggest that our prediction model would help physicians to treat patients with BC in real world practice. Further external validation using another BC cohort would be warranted.
Author contributions YI and JK conceived and planned the experiments. JK, EJ and HJ carried out analyses and experiments. JK, JEL, SJN, YHP, JSA and YI contributed to collection of samples and clinical data. JK, EJ, HJ, SK, SJ and YP contributed to the interpretation of the results. JK, EJ and YI took the lead in writing the manuscript. YI supervised the project. All authors reviewed and confirmed the manuscript. Data availability Data can be made available upon request; some restrictions will apply.

Conflict of interest
The authors have no competing interests to declare.
Ethical approval This study was reviewed and approved by the Institutional Review Board (IRB) of Samsung Medical Center, Seoul, Korea (IRB No. 2019-04-021).

Fig. 4
Permutation feature importance according to breast cancer subtype: A HR+/HER2-, B HR+/HER2+, C HR-HER2+, D Triple negative breast cancer. ER_score-estrogen receptor, PR score-pro-gesterone receptor, Menopause-menopausal status, Chemo-chemotherapy regimen, HER2-human epidermal growth factor receptor 2, IDC_or_Not-histology Consent to participate The requirement for individual informed consent was waived because of the retrospective clinical data review. This study was performed in accordance with the Declaration of Helsinki.
Consent to publish All authors have agreed to the content of the manuscript and provided consent regarding its publication.