Both the MRI- and CEM-based models are able to distinguish breast cancers of different IHC-based subtypes using a radiomics-based machine learning approach. The CEM-based model performs numerically better than the MRI-based model (71.5% vs. 70.2%), although the difference is not statistically significant (p-value = 0.82). The similar performance of the CEM- and MRI-based models is somewhat surprising given that CEM provides only a two-dimensional view of the tumor, whereas MRI provides a complete three-dimensional view. We also notice that the overall predictive performance for the HR-HER2- class in terms of accuracy, PPV, recall (sensitivity), and AUC is superior to that for the other classes across both CEM and MRI test cases. These results provide evidence that CEM imaging could be as informative as MRI from a machine learning perspective. The unexpected performance of the CEM-based model could potentially be attributed to its higher resolution (nearly ten times) in comparison to MR images. High-resolution images preserve the details of the tumor, especially the geometric (or shape-based) features that consistently remain the prominent features in all the classifications, and resolve the presence of microcalcifications. Alternatively, this could be explained by the larger size of the CEM cohort, which could lead to slightly better model training and prediction.
While the problem of IHC subtype classification has recently been studied in the literature [27; 28], radiomics-based predictive models are still emerging. Because radiomic features can be automatically extracted from segmented images, they are fast to compute, quantitative, and reproducible. This differs from the current state of BI-RADS classification, which requires trained experts and has been shown to exhibit both inter- and intra-reader variability [30; 31]. Similar studies have emerged in recent years that focus on the classification of tumor subtypes using radiomics, clinical features, BI-RADS, or a combination. For instance, Wu et al. [32] employed BI-RADS features to classify four different IHC subtypes (Luminal A, Luminal B, HER2, and basal-like breast cancer), achieving an accuracy of 74.1% on a cohort of 363 patients. Leithner et al. [28] employed radiomic signatures extracted from CEM images to develop a predictive model using 91 patients from one institution and validated it on 52 patients from another institution, with an accuracy of 79.4% for Luminal A vs. Luminal B and 77.1% for Luminal B vs. TNBC. However, the authors did not report the recall and specificity of their models. More recently, Son et al. [27] predicted IHC subtypes using radiomic signatures of synthetic mammography constructed from digital breast tomosynthesis (DBT) for a cohort of 365 patients, with accuracies of 81.7%, 76.1%, and 56.3% for TNBC, HER2, and luminal A and B, respectively, in a one-class-vs.-others framework using the craniocaudal (CC) view. There was no improvement in performance when the features from the CC and mediolateral oblique (MLO) views were combined. Similarly, our work demonstrated no significant difference in model performance between the CC and MLO views.
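To make the one-class-vs.-others evaluation concrete, the following minimal Python sketch collapses multi-class subtype labels into a binary problem and computes accuracy, PPV, recall, and specificity. The function name `one_vs_rest_metrics` and the toy labels are assumptions introduced purely for illustration; they are not taken from any of the cited studies and do not reproduce their results.

```python
from collections import namedtuple

OvrMetrics = namedtuple("OvrMetrics", "accuracy ppv recall specificity")


def one_vs_rest_metrics(y_true, y_pred, positive):
    """Score `positive` against all other classes (hypothetical helper)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        is_pos = truth == positive
        called_pos = pred == positive
        if is_pos and called_pos:
            tp += 1
        elif called_pos:
            fp += 1
        elif is_pos:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        # Return NaN rather than raising when a class never occurs.
        return num / den if den else float("nan")

    return OvrMetrics(
        accuracy=ratio(tp + tn, tp + fp + tn + fn),
        ppv=ratio(tp, tp + fp),          # positive predictive value
        recall=ratio(tp, tp + fn),       # also called sensitivity
        specificity=ratio(tn, tn + fp),
    )


# Toy labels, invented for illustration only.
y_true = ["TNBC", "TNBC", "HER2", "Luminal", "Luminal", "Luminal"]
y_pred = ["TNBC", "Luminal", "HER2", "Luminal", "TNBC", "Luminal"]
metrics = one_vs_rest_metrics(y_true, y_pred, positive="TNBC")
print(metrics)
```

Reporting recall and specificity alongside accuracy matters in this setting because, with imbalanced subtypes, a model can reach a high one-vs.-others accuracy while rarely identifying the minority class.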
We also studied the importance of radiomic features (without performing kernel PCA) using a game-theoretic approach based on Shapley values [33]. In agreement with existing studies, we note that several shape-based features emerged as prominent features in both the MRI- and CEM-based models. The features that were consistently prominent across all the predictive models include shape sphericity, axis length, shape flatness, and shape surface area. Specifically, we noted that for HR-HER2- patients, the tumors were consistently round and spherical in shape, whereas for HR+HER2- patients, the tumors were irregular. These observations are aligned with the findings reported in the literature and observed in clinical practice. For instance, Son et al. [27] reported that triple-negative tumors tend to be round or oval. In addition to the shape-based features, we also noted several intensity- and correlation-based features to be significant in model prediction, particularly first-order features such as correlation and entropy extracted from the gray level co-occurrence matrix and gray level dependence matrix.
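As an illustration of the idea behind Shapley-value attribution, the sketch below computes exact Shapley values by brute-force enumeration of feature subsets. It is a minimal example under stated assumptions and not the implementation used in this study: the scoring function `toy_score` and its numeric contributions are hypothetical, the feature names are borrowed from the text only as labels, and exhaustive enumeration scales exponentially, so it is feasible only for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley value of each feature for a set-valued score function."""
    n = len(features)
    phi = {}
    for feature in features:
        others = [g for g in features if g != feature]
        total = 0.0
        for k in range(n):  # size of the coalition that excludes `feature`
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                coalition = frozenset(subset)
                gain = value_fn(coalition | {feature}) - value_fn(coalition)
                total += weight * gain
        phi[feature] = total
    return phi


def toy_score(present):
    """Hypothetical model output given which features are available."""
    score = 0.0
    if "sphericity" in present:
        score += 0.30
    if "flatness" in present:
        score += 0.10
    # Interaction term: entropy only helps when sphericity is also present.
    if "sphericity" in present and "entropy" in present:
        score += 0.20
    return score


features = ["sphericity", "flatness", "entropy"]
phi = shapley_values(toy_score, features)
for name, value in phi.items():
    print(f"{name}: {value:.2f}")
```

In this toy game the interaction term is split equally between the two features involved, and the attributions sum to the difference between the score with all features and the score with none, which is the efficiency property that makes Shapley values attractive for ranking feature importance.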
From the present study, as well as other recent reports in the literature, it is evident that radiomic features are effective in distinguishing IHC subtypes. Limitations include the classic large \(p\), small \(n\) problem in machine learning [34], caused by the limited number of patients (\(n = 170\) for CEM and \(n = 124\) for MRI) relative to the high dimensionality of the feature set (\(p = 960\) for MRI features). This also limits the development of multi-class predictive models [35]. Class imbalance leads to poor precision and recall performance of the predictive models, and while synthetic resampling strategies could help augment the existing datasets, they seldom improve the predictive performance. The issue of class imbalance could partially be addressed by analyzing larger cohorts. To avoid the problem of data harmonization arising from inter-scanner and/or inter-radiologist variability, we used data collected from a single scanner and annotated by the same set of radiologists, which limits generalizability. The high model complexity and black-box nature of the machine learning models employed limit their interpretability. The authors' ongoing work is focused on making these machine learning models more interpretable so that the inferences generated from them may not only improve understanding of the biology but also inform practitioners in the decision-making process. Interpretability and model fairness also allow for monitoring against potential biases associated with the underrepresentation of racial minorities in most datasets. In general, these limitations are being addressed via multi-institutional collaboration to generate much larger and more diverse datasets for the development and comparison of different models.
For the breast MRI image analyses in this study, we utilized only dynamic contrast-enhanced images and did not incorporate the associated T2-weighted imaging [36; 37]. We expect that these sequences would provide additional surrogates for biological data of the tumors, and they will be included in future studies. The patients included in the study had biopsy-proven invasive breast cancer prior to undergoing MRI and CEM, so post-biopsy change may confound our results. The heterogeneous enhancement that can occur in a post-biopsy bed can alter the appearance of the native tumor. However, imaging after biopsy is standard clinical care, so models built on such images reflect the imaging that is available in true clinical practice.