Digital image analysis and machine learning-assisted prediction of neoadjuvant chemotherapy response in triple-negative breast cancer

Background Pathological complete response (pCR) is associated with favorable prognosis in patients with triple-negative breast cancer (TNBC). However, only 30–40% of TNBC patients treated with neoadjuvant chemotherapy (NAC) show pCR, while the remaining 60–70% show residual disease (RD). The role of the tumor microenvironment (TME) in NAC response in patients with TNBC remains unclear. In this study, we developed a machine learning-based two-step pipeline to distinguish between various histological components in hematoxylin and eosin (H&E)-stained whole slide images (WSIs) of TNBC tissue biopsies and to identify histological features that can predict NAC response. Methods H&E-stained WSIs of treatment-naïve biopsies from 85 patients (51 with pCR and 34 with RD) were separated through a stratified 8-fold cross validation strategy for the first step and leave one out cross validation strategy for the second step. A tile-level histology label prediction pipeline and four machine learning classifiers were used to analyze 468,043 tiles of WSIs. The best-trained classifier used 55 texture features from each tile to produce a probability profile during testing. The predicted histology classes were used to generate a histology classification map of the spatial distributions of different tissue regions. A patient-level NAC response prediction pipeline was trained with features derived from paired histology classification maps. The top graph-based features capturing the relevant spatial information across the different histological classes were provided to the radial basis function kernel support vector machine (rbfSVM) classifier for NAC treatment response prediction. Results The tile-level prediction pipeline achieved 86.72% accuracy for histology class classification, while the patient-level pipeline achieved 83.53% NAC response (pCR vs. RD) prediction accuracy. The histological class pairs with the strongest NAC response predictive ability were tumor and tumor tumor-infiltrating lymphocytes for pCR and microvessel density and polyploid giant cancer cells for RD. Conclusion Our machine learning pipeline can robustly identify clinically relevant histological classes that predict NAC response in TNBC patients and may help guide patient selection for NAC treatment.


INTRODUCTION
Triple-negative breast cancer (TNBC) is an aggressive breast cancer subtype that lacks expression of estrogen, progesterone, and human epidermal growth factor 2 receptors (1). TNBC accounts for 15%-20% of all breast cancers, affecting nearly half a million women in the US each year (2,3). The 5-year survival rate of TNBC patients is 15% lower than that of patients with other breast cancer subtypes (1). At the time of diagnosis, TNBCs tend to have a more advanced histologic grade and larger size compared to hormone-positive breast cancers (4). TNBCs also have higher recurrence and metastasis rates and typically metastasize to the brain, lung, and liver (5,6). No targeted or endocrine therapy is available for TNBC, and neoadjuvant chemotherapy (NAC) is the standard of care. NAC involves the use of chemotherapy prior to surgery to reduce tumor size, downgrade tumors amenable to resection, and improve long-term clinical outcomes. The primary endpoint of NAC is a pathological complete response (pCR), de ned as the absence of residual invasive disease (RD) in the breast and axilla.
pCR is an important predictor of disease-free survival and overall survival in patients with TNBC (7). Only 30-40% of TNBC patients achieve pCR with conventional NAC; the rest (~ 70%) either do not respond or respond partially to NAC. Non-responders and partial responders can be spared treatment side effects and offered alternative treatment regimens (e.g., a combination of NAC and immunotherapy) to improve outcomes and decrease morbidity (8-10). More recently, immunotherapy has shown success in TNBC management, and the FDA has approved pembrolizumab for use in combination with NAC in high-risk patients with early-stage TNBC (11).
Although the mechanisms underlying chemoresistance in TNBCs remain elusive, the marked inter-and intratumoral heterogeneity in TNBCs may contribute to variability in NAC response. Currently, there is a lack of multi-modal biomarkers that can stratify TNBC patients into NAC responders, partial or nonresponders, hindering personalized approaches for TNBC management. Furthermore, there is limited information on the robustness and accuracy of current biomarkers, e.g., Ki67, pH3, tumor-in ltrating lymphocytes (TILs), and histological features in predicting NAC treatment response individually or in combination. Traditional staining techniques provide limited information about the immune landscape (e.g., type of TILs). The low reproducibility and objectivity of traditional scoring methods also impair the clinical adoption of these markers. TNBCs are heterogeneous, and their tumor microenvironment (TME) represents a complex ecosystem of cellular components, such as tumor, stromal, and immune cells.
Communication between TME components and their spatial relationships affect cancer progression, treatment response, and disease outcomes (12,13). Studies have shown that the histomorphological components of the TME, such as a tumor, microvessels (MVD), polyploid giant cancer cells (PGCCs), immune cells, and necrotic areas, can help predict NAC response in TNBC (14). Advances in computing, imaging, and pathology have created new opportunities to explore the relationships between histology, molecular events, and clinical outcomes to help predict NAC response in patients with TNBC (15).
Manual histomorphological characterizations of hematoxylin and eosin (H&E)-stained tissues is timeconsuming and prone to inter-and intra-observer variability and fails to capture the TME spatial architecture, limiting their clinical value. Machine learning (ML) can more accurately and e ciently characterize the TME (16-18). ML outperforms humans in terms of accuracy and speed and can identify novel predictive features and spatial patterns beyond human recognition (16-18). The aim of this study was to develop an ML-based model to effectively predict NAC response (pCR or RD) in TNBC patients using spatial histological features from whole slide images (WSIs) of H&E-stained biopsy tissue sections.

Study population
H&E-stained pre-NAC core needle biopsies from treatment-naïve patients with TNBC were acquired from the Decatur Hospital, Georgia, USA, and the University of Galway, Ireland. The Decatur cohort was used as a discovery cohort for model development, and the Galway cohort was used as a validation cohort for the developed model. Patient samples with little to no tissue area, staining issues, or plating artifacts were excluded from the analysis. After this screening process, the nal sample sizes of the model development and validation cohorts for prediction analyses were 85 and 79, respectively (19).

Tile-level preprocessing
Before model training, representative histology regions in WSIs were annotated with contours for each histology class. For each contour, a bounding box was created within the ground truth area to extract the annotated tissue region. A sliding window of size 224 x 224 pixels was used to partition each WSI into image tiles. Only tiles overlapping the annotated areas by at least 90% were retained (Fig. 1C). The spatial containment query was invoked to identify the histology class for each tile. All image tiles were normalized by the stain color prior to model development (21). Additionally, the image channel associated with the hematoxylin stain was separated from each color image tile by color deconvolution (Fig. 1C) (22).

Tile-level histology feature extraction and classi cation
Tile-level histology features were derived from tissue tiles of different histology classes. After the preprocessing step, 468,043 tiles were produced from WSIs of model development cohort. The tile histology classi cation performance was evaluated by a strati ed 8-fold cross validation strategy (23)(24)(25). Additionally, an independent validation set was established with all tiles from the validation cohort. 80 tile-level features were derived from each tile by six image texture extraction methods: gray-level cooccurrence matrix (GLCM; Method S1A), Gabor lter (Method S1B), local binary patterns (LBP; Method S1C), Tamura (Method S1D), lower-order histogram (Method S1E), and higher-order histogram (Method S1E) (26-34). These 80 features were further reduced by excluding features that had 0, not applicable (NA), or repetitive values. 55 texture features were retained after this process (Table S1). Four ML classi ers were used to classify image tiles by the resulting feature set, including 1-nearest neighbor (1NN), linear support vector machine (linSVM), radial basis function SVM (rbfSVM), and ensemble tree (ensembleTree) with the RUSBoost method (35)(36)(37). For model development and validation, the 8-fold cross-validation mechanism was used. Each time, seven folds of data were used for training while the remaining one fold was used as the testing set (Fig. 1E). The strati ed 8-fold cross-validation method ensured that each data fold contained representative samples from each class and reliably assessed the tile histology class prediction performance. For each testing image tile, a trained classi er produced 16 probability values, one for each histology class of interest. Using the predicted tile histology class labels and tile spatial coordinates, we assembled tile level histology class labels spatially and produced tile level histology classi cation maps for each patient (Fig. 2). Each histology class was represented by a unique color in the classi cation maps, enabling the visualization of TME histology component distributions within a tissue context (Fig. 1F).

Spatial TME feature extraction and NAC response prediction
The leave-one-out cross-validation (LOO-CV) method was used to evaluate the patient-level NAC response prediction performance (38, 39). Similar to the tile-level histology class classi cation step, the model development and validation cohorts were used for model development and validation, respectively. As tumor cells interact closely with immune cells, stroma, PGCCs, and adipocytes in the TME (40-45), histology classi cation maps were generated for each patient. To better model the spatial relationships of these histology components in biological systems (46-53), we created TME graphs to characterize tissue TME states and the spatial interactions of tissue regions of paired histology classes. For each pair of histology classes (e.g., tumor and PGCC maps), a TME graph was constructed from the corresponding tile-level histology maps. A simple graph is undirected and unweighted, with V as the graph node set and E as the graph edge set (46, 49). Each tile cluster was determined as a spatially connected tile component, with all connected tiles sharing the same histology class label. The centroids of the resulting tile clusters were used as nodes in a graph (47). A graph edge between a pair of nodes and , i.e., was established using the Euclidean distance for a given pair of node histology classes. In this way, the spatial histology class distribution was represented by a TME graph structure (46). Next, a set of TME features was extracted from each TME graph. In total, 20 graph features related to texture feature averages, local node con guration, and global graph connectivity were produced for each patient (Table S2) (47). Speci cally, the texture feature averages were derived from the tile-level feature set (Method S2A). Local node con guration features were used to characterize the local neighborhood information (Method S2B), while graph connectivity features represented the global graph structures (Method S2C).
For an optimal prediction performance, we retained the top eight TME graph features using the importance weight method. The importance weight of graph features was determined using the ReliefF algorithm, a lter-method approach designed to solve classi cation problems with discrete or numerical features (54)(55)(56)(57). A feature has a lower importance weight if a feature value difference is observed in a neighboring instance pair with the same histology class (i.e., a 'hit'). By contrast, a feature presents a higher importance weight if a feature value difference is observed in a neighboring instance pair with different histology classes (i.e., a 'miss'). The top eight TME graph features ranked by importance weights were retained for patient-level NAC response prediction(58-66).
Four ML classi ers, including 1NN, linSVM, rbfSVM, and ensemble tree with the RUSBoost method, were used for patient-level NAC response prediction. Each patient was represented by the top eight TME graph derived features associated with histology class pairs. Each classi er produced two NAC response class probability values, one for pCR and the other for RD. A patient was predicted to belong to the NAC response class associated with a larger class probability.

Statistical analysis
Statistical analysis was performed by Python (Python Software Foundation, https://www.python.org/), MATLAB 2020a (Natick, MA, USA), and R (R Foundation for Statistical Computing, Vienna, Austria, http://www.R-project.org/). The ReliefF importance weights were used to assess the signi cance of the selected TME graph features (54-56, 67, 68). The resulting prediction performance was represented by a confusion matrix. The NAC response class pCR and RD were considered positive and negative group, respectively. A false positive (FP) was a RD case incorrectly predicted as pCR, while a false negative (FN) case was a pCR case incorrectly predicted as RD. Multiple evaluation metrics were computed, including Accuracy, Sensitivity (i.e. Recall), Speci city, Precision, and F1-Score. The tile-level histology classi er was evaluated by the strati ed k-fold cross-validation (k = 8), while the patient-level NAC response prediction was assessed by the LOO-CV.

ML classi er provides accurate tile-level histology classi cation in H&E-stained WSIs
Our results suggest that the 16 histology classes of interest were well differentiated by 55 tile texture features for the model development cohort (Table S1). With the strati ed 8-fold cross validation strategy, we used 1-fold of image tiles to train the classi er and test it with the remaining 7-folds in each round.
The average training accuracies of tile level histology classi cation by 1NN, linSVM, and ensemble tree were 63.86%, 61.33%, and 80.08%, respectively. Of the four classi ers that we trained and tested, the best performance was achieved by the rbfSVM with an average training and testing accuracy of 87.16% and 86.72%, respectively (Fig. 3) (Fig. 4). It was noticed that the rbfSVM classi er was good at recognizing classes such as stroma, tumor and adipocytes, but weak at recognizing apocrine or mucinous change.
When the best tile level histology classi er (rbf-SVM), trained using the model development cohort, was applied to the validation cohort, an average validation Accuracy of 83.59% was achieved (Fig. S1). The validation Accuracy for individual histology classes in the validation cohort ranged from 69.74-87.88%. Additionally, the classi er reached an average Recall ranging from 72.0-90.32%, an average Precision from 81. 43-84.19%, and an average F1-score from 76.11-86.92% for all histology classes (Fig. S2). The true histology class, the predicted histology class, and the image tile spatial coordinates with respect to each WSI were recorded for each tile. Histology classi cation maps were generated for visualizing spatial distribution of histological classes. Each histology class was assigned a unique color and the predicted histology class results were spatially assembled by the image tile spatial order (Fig. S3).
Spatial TME features of paired histology classes predict NAC response Using the importance weights from the ReliefF algorithm, we ranked all TME features from paired histology classes. For each histology class pair, we computed 20 TME features that were related to texture feature averages using a random geometric construct, local node-based features using a spectral construct, and global graph-based features using a minimum spanning tree construct (Table S2). The top eight histology pairs are presented in Fig. 5. For example, the histological class pair of tumors and tTILs with the largest importance weight is known to be strongly associated with pCR. In contrast, the histological class pair of microvessel density and PGCCs strongly correlate with RD. These results are in line with previously published studies (58-66).
With each TNBC patient represented by selected TME features, we trained and tested the classi er for NAC response prediction using LOO-CV strategy. The prediction Accuracies by 1NN, linSVM, and ensemble tree were 72.31%, 62.70%, and 70.09%, respectively. Similar to the tile-level histology classi cation, rbfSVM achieved the best NAC response prediction at the patient level with a prediction Accuracy of 83.53%. Out of 51 cases, 42 were correctly predicted as pCR. 29 out of 34 patients were correctly predicted as RD (Fig. 6A). The FP and FN groups included ve and nine misclassi ed patients, respectively, resulting in a speci city of 85.29% and sensitivity of 82.35% (Fig. 6A). Additionally, the receiver operating characteristic (ROC) curve of the best performing NAC prediction pipeline is presented in Fig. 6B with area under the curve (AUC) reaching 0.83.
The top eight graph derived features capture histological information from the TME that is critical for NAC response prediction. Figure S4 highlights representative tissue regions from where image tiles of the associated paired histology classes were derived for TME graph nodes. The resulting attention maps are tissue areas with high discriminating values for NAC response prediction, i.e., pCR vs RD (Fig. S4).
Validating the NAC response prediction performance in an independent cohort We validated the prediction performance of the pipeline in an independent cohort consisting of 79 WSIs.
Using the same image quality checks and preprocessing steps, we analyzed each WSI using the previously trained classi ers. The positive and negative group included 41 and 38 patients, respectively. Our prediction method correctly predicted 33 and 31 patients from the positive and negative group, respectively, with a prediction Accuracy of 81.01%. The FP and FN groups included seven and eight misclassi ed patients, respectively, resulting in a speci city of 81.58% and sensitivity of 80.49% (Fig. 7A).
Additionally, the ROC curve of the best NAC prediction pipeline is presented in Fig. 7B, with AUC reaching 0.83.

DISCUSSION
Women with TNBC exhibit signi cantly worse 5-year survival rates than those with non-TNBC, regardless of the tumor stage at diagnosis (69). No targeted or endocrine therapy is available for TNBC, and NAC is the cornerstone of treatment. However, only 30-40% of TNBC patients achieve pCR with NAC, and there is a dire need for early identi cation of the nearly 70% of patients who should be offered alternative regimens to improve treatment outcomes. In this study, we used ML approaches to predict NAC response and stratify patients into NAC responders and non-responders based on H&E-stained WSIs of tissue biopsies. We developed a two-step prediction model: in the rst step, the histology class of each H&E image tile was determined using a tile-level classi cation pipeline; in the second step, the spatial graphderived features associated with histology class pairs were used to predict patient-level NAC response (pCR vs RD). Our model unveils and leverages novel NAC response predictive features and spatial patterns of TME histology components from WSIs of TNBC tissue biopsies. This study also highlights the role of various TME components in accurately predicting NAC response. TME components and their interactions can in uence NAC response in patients with TNBC (70)(71)(72). Traditional methods using human annotations are unable to capture these spatial relationships. In contrast, our approach incorporates the spatial relationships of various TME components to predict NAC response in patients with TNBC. Using a graph structure for spatial TME characterization, we identi ed eight histology component pairs that accurately predicted NAC response. We expect that an investigation with higher-order combinations (e.g., tertiary and quaternary) can further increase NAC response prediction accuracy. The top three TME features captured the spatial interactions between (1) tumor cells and tTILs, (2) stroma and sTILs, and (3) tTILs and PGCCs. Studies have shown the predictive importance of tumor area, immune activation markers, and TILs in TNBC biopsies (73)(74)(75)(76). Our results provide further evidence that the interrelationships between TILs, stroma, adipocytes, and tumor cells can predict NAC response in patients with TNBC. Other recently published studies that have relied on WSI models (77,78) include one that used a federated learning model to predict NAC response in TNBC, and found hemorrhage, TILs, and necrosis as predictive of pCR and apocrine change, brosis, and noncohesive tumor cells being predictive of RD (77). Another study quanti ed the stromal and tumor features in a WSIbased multi-omic (WSI, clinical, pathological) ML model and found that high collagenous stroma was best associated with lower pCR rates (78). Our study used expert annotations that effectively guided the ML models to identify speci c histological patterns in spatial TME contexts. While our supervised ML model identi ed the common histological component of TILs, it did not rank hemorrhage, necrosis, brosis, and apocrine change as important predictors due to the lack of annotated training data.
Our NAC response prediction pipeline provides classi cation accuracy and attention maps that can be highly useful in clinical practice. Attention maps help pathologists and researchers by identifying tissue regions in a WSI that are highly predictive of NAC response, thereby improving slide review, reducing visual fatigue, and facilitating image data interpretation. Information from attention maps can be combined with other WSI derived data such as, Ki67 and pH3 immunohistochemistry-stained serial tissue sections, to train deep-learning models for enhanced prediction (79)(80)(81). Ki67 and pH3 are clinical biomarkers with demonstrated NAC response predictive value in TNBC tumors (82, 83). Furthermore, our predictive model is promising for integrating data from various sources, such as electronic health records, laboratory test results, and demographic information, to provide predictions based on the overall view of the health status of patients.
Limitations of the study include small sample sizes, slide quality issues, and expensive computational processes. Quality checks are necessary to ensure inclusion of adequate samples to develop effective training classi ers. The different slide staining protocols, artifacts, and plating variances from different institutions (e.g., cutting glass slide edges) may have resulted in inconsistencies in slide quality. Thus, although we had a larger number of WSIs to begin with, the nal validation cohort was whittled down. Because the sample size was small, there was an imbalance of histology classes presented among different patient slides. More histology classes (e.g., microcalci cation, muscle) should be included to improve the training of the tile-level histology classi er in all histology classes. We had two pathologists independently annotate the WSIs, however, more experts can be included in the future to validate the annotations and reduce interobserver variability. Additionally, our pipeline is computationally expensive because multiple processes occur throughout the pipeline such as partitioning gigapixel WSIs, calculating various feature measures for each tile, constructing graphs based on spatial relationships. Computational constraints can stem from institutional high-performance computing (HPC) server data loss, standard maintenance, and outages. Re ning the code for faster processing times (parallel processing) based on an advanced computer architecture could help support ML processes and data management. We cannot identify important spatially related histological features using image viewing software alone because the software is not scalable for large datasets. Each digital pathology software is limited in the amount of data processed through its graphical user interfaces before exceeding the computational capabilities.
Future work will include model validation in a larger cohort. Future work will also include the development of prediction models with higher-order feature combinations and graph convolution networks. It is important to develop an e cient pipeline to increase the amount of image data and decrease the computational time. Additionally, combinations of features with the highest predictive value will be used to increase the predictive power of the full feature set. For example, attention map regions can be leveraged to focus on regions of interest, which can be used for more complex analyses, such as imaging mass cytometry, to distinguish between the various TIL subtypes and to further re ne NAC response prediction. We also plan to extend our pipeline to incorporate other tissue stains, including immunohistochemistry. A more e cient pipeline can reduce the frequency of false negatives and thus, minimize risk of undertreating patients which can result in early relapse and poor outcomes.

CONCLUSIONS
Using feature engineering and supervised ML, we demonstrated the strong discriminating power of TME histological components and their spatial relationships in predicting NAC response in patients with TNBC. Among 120 histology feature pairs, we identi ed eight with the highest predictive value. The most predictive histology feature pair for pCR was tumor and tTILs, whereas microvessel density and PGCCs was the feature pair most strongly correlated with RD. The proposed ML pipeline can help identify tissue areas in H&E-stained WSIs with a high predictive value for NAC response prediction and can help in clinical decision-making.

Declarations
Ethics approval and consent to participate: All aspects of this study were approved by the Institutional Review Boards of the institutions involved. Patient consent was not required because all samples were archival.

Consent for publication: NA
Availability of data and materials: The data underlying this article will be shared on request to the corresponding author.
Competing interest: The authors declare no con icts of interest or nancial interest.

Supplementary Files
This is a list of supplementary les associated with this preprint. Click to download.