Deep learning model for predicting the presence of stromal invasion of breast cancer on digital breast tomosynthesis

To develop a deep learning (DL)-based algorithm to predict the presence of stromal invasion in breast cancer using digital breast tomosynthesis (DBT). Our institutional review board approved this retrospective study and waived the requirement for informed consent from the patients. Initially, 499 patients (mean age 50.5 years; age range, 29–90 years) who were referred to our hospital under the suspicion of breast cancer and who underwent DBT between March 1 and August 31, 2019, were enrolled in this study. Among the 499 patients, 140 who underwent surgery after being diagnosed with breast cancer were selected for the analysis. Based on the pathological reports, the 140 patients were classified into two groups: those with non-invasive cancer (n = 20) and those with invasive cancer (n = 120). VGG16, ResNet50, DenseNet121, and Xception architectures were used as DL models to differentiate non-invasive from invasive cancer. The diagnostic performance of the DL models was assessed based on the area under the receiver operating characteristic curve (AUC). The AUCs for the four models were 0.56 [95% confidence interval (95% CI) 0.49–0.62], 0.67 (95% CI 0.62–0.74), 0.71 (95% CI 0.65–0.75), and 0.75 (95% CI 0.69–0.81), respectively. Our proposed DL model trained on DBT images is useful for predicting the presence of stromal invasion in breast cancer.


Introduction
Breast cancer is the most common cancer affecting women worldwide, and its incidence and mortality rates are expected to increase [1]. Stromal invasion in cancer is closely related to local recurrence rates and metastatic potential [2] and is an important factor in determining 5-year survival rates [3]. Stromal invasion in breast cancer is currently assessed histologically using specimens obtained after surgery [4]. Based on the presence of stromal invasion by cancer cells, breast cancer is categorized as non-invasive or invasive, which affects the clinical flow of treatment [5]. As tumor invasiveness is closely correlated with metastatic potential and directly linked to treatment policy (e.g., additional lymph node dissection and chemotherapy), preoperative diagnostic reliability is essential for determining the presence of stromal invasion [4]. Nevertheless, preoperatively diagnosed non-invasive cancer sometimes yields a final diagnosis of invasive cancer. Since a preoperative biopsy assesses only a partial segment of the cancer, it is not always sufficient to make a complete diagnosis of non-invasive or invasive cancer [4].
Recently, digital breast tomosynthesis (DBT), a tomographic imaging technique newer than full-field digital mammography (FFDM), has been increasingly applied in clinical practice [6,7]. DBT allows volumetric reconstruction of the entire breast from several two-dimensional projections obtained using different X-ray tube angles [8]. The cancer detection sensitivity of DBT has been shown to be higher than that of conventional FFDM because DBT reduces breast tissue overlap and provides better visualization of lesions [9,10]. Therefore, DBT is better than conventional FFDM for assessing the morphological features of tumors [11].
Artificial intelligence, particularly deep learning (DL) algorithms, is gaining extensive attention because of its excellent performance in image diagnosis and prediction [12,13]. Recent studies have demonstrated the usefulness of DL models to diagnose breast cancer using DBT [14–16].
Many studies have reported that the morphological features of tumors are closely associated with their invasiveness [17,18], and DBT is one of the best imaging techniques for assessing the morphological features of breast cancer. We therefore hypothesized that a DL model trained on DBT images would accurately predict stromal invasion in breast cancer.
This study aimed to develop a DL model to predict stromal invasion in breast cancer using DBT.

Patients
The Institutional Review Board approved this retrospective study and waived the requirement for informed consent. The dataset used in this study is identical to that used in a previously reported study [16]. That study aimed to develop a DL model to compare bilateral differences in breast tissue, whereas the present study aimed to predict the presence of stromal invasion in patients with breast cancer using DBT. The patient selection flow in this study therefore differed from that in the previous study. Figure 1 shows the inclusion and exclusion criteria for patient enrollment. Initially, 499 patients (mean age 50.5 years; age range, 29-90 years) who were referred to our hospital under suspicion of breast cancer and underwent DBT between March 1, 2019, and August 31, 2019, were recruited for this study. Of the 499 patients, 186 with pathologically confirmed breast cancer who underwent surgery were enrolled. Among these 186 patients, the following were excluded: those whose tumors were detected by methods other than DBT (n = 24); those who underwent neoadjuvant chemotherapy (n = 7); those who underwent partial mastectomy (n = 4); those who were diagnosed with phyllodes tumor (n = 4), lobular carcinoma in situ (n = 2), or microinvasive carcinoma (n = 2); and those with clip markers after vacuum-assisted biopsy (n = 3). Ultimately, 140 patients with breast cancer were included in the analysis. Based on pathological reports, the 140 patients were classified as follows: non-invasive cancer (ductal carcinoma in situ [DCIS]) (n = 20) and invasive cancer (n = 120; invasive ductal carcinoma in 96 patients, invasive lobular carcinoma in 14, mucinous carcinoma in five, invasive micropapillary carcinoma in three, and invasive cribriform carcinoma in two). Based on the TNM classification (8th edition), the pathological tumor stages (pT) of the carcinomas were as follows: pTis in 20 patients, pT1 in 86, pT2 in 30, and pT3 in four.

Clinical interpretation of tomosynthesis images
Anonymized images of both breasts in the mediolateral oblique (MLO) view were used. All DBT images were acquired using a 3-Dimensions Mammography System (Hologic Inc., Bedford, MA, USA). The scanning parameters of the DBT images were as follows: peak tube voltage, 26-45 kV; tube current, 140-200 mA; exposure time, 154-489 ms; compression force, 26.7-191.4 N; breast thickness, 13-108 mm; and absorbed dose, 0.0092-0.0688 Gy. The total tomographic angle range was 15° (-7.5° to 7.5°), consisting of 15 projection views taken at increments of 1°. The interslice interval was 1 mm, and the resolution was 70 μm × 70 μm per pixel.
All breast lesions were diagnosed by a radiologist with 30 years' experience. The radiological findings of all 140 patients (a total of 2860 images) were as follows: 81 lesions (a total of 1177 images, including seven non-invasive cancers and 74 invasive cancers) were depicted as mass; 50 lesions (a total of 568 images, including 13 non-invasive cancers and 37 invasive cancers) were depicted as calcification; 78 lesions (a total of 807 images, including four non-invasive cancers and 74 invasive cancers) were depicted as distortion; and 20 lesions (a total of 308 images, including three non-invasive cancers and 17 invasive cancers) were depicted as focal asymmetric density (FAD). When multiple findings were observed in a patient, the images with each finding were counted independently. To generate an image dataset for the DL model, the lesions were annotated by a radiologist with over 5 years of experience. All lesions were cropped and resized to 256 × 256 pixel grayscale images with a 16-bit depth.

Deep learning model training and implementation
In this study, we used four well-established convolutional neural network architectures for the DL model: VGG16 [19], ResNet50 [20], DenseNet121 [21], and Xception [22], which are widely recognized for their performance in computer vision tasks. These models have been extensively applied in various research domains, demonstrating their robustness and adaptability to diverse applications. The network weights were initialized from models pretrained on ImageNet [23]. Bootstrap sampling and random undersampling, which have been shown to train DL models symmetrically and without bias even for datasets with class imbalance [24,25], were used during training. Figure 2 illustrates the use of these two methods to resolve class imbalance in minibatch learning. The DL model was trained with the same ratio of invasive to non-invasive cancers in all minibatches. The training parameters were a batch size of 200 and 400 epochs. RAdam [26] was used for optimization, and binary cross entropy was used as the loss function. The networks were implemented using a machine with the following specifications: Intel Core i7-7800X Central Processing Unit with a 6 K core memory, and a 48 GB NVIDIA RTX 8000 Graphics Processing Unit. The operating system was an Ubuntu 18.04.5 long-term support Xenial Xerus. All analyses were performed using Python, version 3.8.2 (Python Software Foundation, http://www.python.org). Keras 2.2.0 with TensorFlow 1.9.0 as the backend was used as the DL framework. Figure 3 illustrates the DL model assessment and overall research framework used in this study. To assess the performance of our model, we used five-fold cross-validation, in which 80% of the data were used for training and 20% for validation. In the cross-validation assessment, validation accuracy was calculated five times by changing the DBT images included in the training and validation datasets.
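The class-balanced minibatch construction described above can be sketched as follows. This is a minimal illustration rather than the study's implementation: the function name `balanced_minibatch`, the 1:1 class ratio, and the integer image identifiers are assumptions introduced here; only the batch size of 200 comes from the text.

```python
import random


def balanced_minibatch(invasive, non_invasive, batch_size, rng):
    """Draw one minibatch with the same number of samples from each class.

    Both classes are sampled with replacement (bootstrap sampling), so the
    majority class is undersampled in every batch while minority-class
    images may be reused. The 1:1 ratio is an assumption of this sketch.
    """
    if batch_size % 2 != 0:
        raise ValueError("batch_size must be even for a 1:1 class ratio")
    half = batch_size // 2
    # Label 1 = invasive cancer, label 0 = non-invasive cancer.
    batch = [(image, 1) for image in rng.choices(invasive, k=half)]
    batch += [(image, 0) for image in rng.choices(non_invasive, k=half)]
    rng.shuffle(batch)
    return batch


# Hypothetical image identifiers for an imbalanced training set.
invasive_ids = list(range(1000))
non_invasive_ids = list(range(1000, 1100))
example_batch = balanced_minibatch(
    invasive_ids, non_invasive_ids, batch_size=200, rng=random.Random(0)
)
```

Because every batch contains both classes in equal numbers, the binary cross-entropy loss is not dominated by the invasive class, at the cost of showing each non-invasive image to the network more often.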
Because the available data were limited, we adopted a five-fold cross-validation strategy to assess the model performance as comprehensively as possible. In this process, all data were separated by patient so that DBT images of the same patient never appeared in both the training and validation sets. The diagnostic performance of the DL model was quantified using the area under the receiver operating characteristic curve (AUC).
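A patient-wise split and a rank-based AUC can be sketched in plain Python as below. The helper names `patient_wise_folds` and `roc_auc` and the toy patient identifiers are hypothetical, and the random, non-stratified assignment of patients to folds is an assumption; the text does not state how patients were distributed across folds.

```python
import random


def patient_wise_folds(image_patient_ids, k=5, seed=0):
    """Assign whole patients, not single images, to one of k folds.

    Returns k lists of image indices. Images of one patient always share
    a fold, so they cannot appear in both training and validation data.
    """
    patients = sorted(set(image_patient_ids))
    random.Random(seed).shuffle(patients)
    fold_of = {patient: i % k for i, patient in enumerate(patients)}
    folds = [[] for _ in range(k)]
    for index, patient in enumerate(image_patient_ids):
        folds[fold_of[patient]].append(index)
    return folds


def roc_auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Ties between a positive and a negative score count as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Toy data: 10 hypothetical patients with 3 images each.
toy_ids = ["patient_%d" % (i // 3) for i in range(30)]
toy_folds = patient_wise_folds(toy_ids, k=5, seed=1)
```

In each round, one fold serves as the validation set and the remaining four as the training set. With only 20 non-invasive patients, a stratified assignment would be a reasonable refinement so that every fold contains both classes.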

Deep learning model assessment
For additional assessment of the diagnostic accuracy according to the radiological findings of breast cancer, we divided the validation datasets into four groups: a validation dataset of mass lesions (mass dataset), a validation dataset of calcification lesions (calcification dataset), a validation dataset of distortion lesions (distortion dataset), and a validation dataset of FAD lesions (FAD dataset). The diagnostic accuracy of the best-performing DL model was assessed independently for each group to investigate the relationship between radiological findings and diagnostic performance. In this analysis, the cutoff value was set at 0.5. The accuracy was calculated as follows:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FN + FP + TN} \times 100\%,$$
where TP, FN, FP, and TN represent the true positives, false negatives, false positives, and true negatives, respectively. TP indicates that invasive cancer was correctly diagnosed as invasive cancer, whereas TN indicates that non-invasive cancer was correctly diagnosed as non-invasive cancer.
Fig. 3 Flowchart of the five-fold cross-validation and research framework. In this process, all data were separated patient-wise to avoid mixing DBT images of the same patient in the training and validation data. K is the number of folds in the cross-validation

Results
The best-performing model, as shown in Table 1, was Xception. Figure 4 displays the ROC curves of the VGG16, ResNet50, DenseNet121, and Xception models for discriminating invasive from non-invasive cancer. Table 2 provides the associated confusion matrix for the Xception model, which exhibited the best diagnostic performance among the models. In this analysis, the cutoff value was set at 0.5.
The diagnostic accuracies of the Xception model for the four groups with different radiological findings, at a cutoff value of 0.5, are shown in Table 3 and were as follows: 0.900 for the mass dataset, 0.799 for the calcification dataset, 0.941 for the distortion dataset, and 0.933 for the FAD dataset.
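The accuracy definition used for the per-finding analysis can be sketched as follows. The function names and the toy labels and probabilities are hypothetical, and treating a predicted probability equal to the cutoff as invasive is an assumption; the text specifies only the cutoff value of 0.5.

```python
def confusion_counts(labels, probabilities, cutoff=0.5):
    """Count TP, FN, FP, and TN, with invasive cancer as the positive class.

    labels: 1 for invasive cancer, 0 for non-invasive cancer.
    probabilities: predicted probability of invasive cancer per image.
    """
    tp = fn = fp = tn = 0
    for label, probability in zip(labels, probabilities):
        predicted_invasive = probability >= cutoff  # tie handling is assumed
        if label == 1 and predicted_invasive:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted_invasive:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def accuracy_percent(tp, fn, fp, tn):
    """Accuracy = (TP + TN) / (TP + FN + FP + TN) x 100%."""
    total = tp + fn + fp + tn
    if total == 0:
        raise ValueError("at least one case is required")
    return 100.0 * (tp + tn) / total


# Toy example with made-up predictions for five images.
toy_counts = confusion_counts([1, 1, 1, 0, 0], [0.9, 0.6, 0.3, 0.4, 0.7])
toy_accuracy = accuracy_percent(*toy_counts)
```

Applying the same two functions separately to the images of each radiological finding (mass, calcification, distortion, FAD) yields one accuracy value per group.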

Discussion
In the present study, our DL model trained on DBT images predicted the presence of stromal invasion in breast cancer, which is related to the morphological features of the tumors [17,18]. The best-performing model reached an AUC of 0.75, which suggests that it could extract morphological characteristics that reflect the radiological-pathological correlation between morphology and invasion.
The performances of these models, as measured by the AUC, are presented in Table 1. The observed variation in the AUC can be discussed from the perspective of model complexity and diagnostic accuracy. In this study, the models with higher AUC values tended to have fewer parameters. Specifically, VGG16 had the largest number of parameters, whereas Xception had the smallest. It has been suggested that models with a large number of parameters are prone to overfitting, especially when dealing with limited or unbalanced datasets such as medical images. For VGG16, the large number of parameters could have led to overfitting, resulting in a lower AUC value. In contrast, the other models, which have fewer parameters, may generalize better to unseen data, leading to higher AUC values. Additionally, architectural differences among the models may have contributed to their varying performances. ResNet50 and DenseNet121 incorporate skip connections that enable a more efficient gradient flow during training, potentially leading to better feature learning and improved classification performance. Xception, with its depthwise separable convolutions, was designed to efficiently capture spatial and channelwise information, which may have contributed to its superior performance in this task.
In Fig. 5, we present examples of TP, FN, FP, and TN cases to illustrate where our DL model succeeded and failed in discriminating between non-invasive and invasive breast cancers. Figure 5A shows a TP case in which the DL model correctly predicted invasive cancer. The image shows mass-forming breast cancer with spiculated margins, which is a characteristic feature of invasive cancers. This typical example of an invasive cancer was correctly classified by the DL model. Figure 5B shows an FN case in which the model incorrectly classified an invasive cancer as non-invasive. The image displays a well-defined, smoothly contoured mass in dense breast tissue. Because the margin of the tumor was obscured by the dense breast tissue, it may not have been detected by the DL model. Figure 5C shows an FP case in which the model incorrectly classified a non-invasive cancer as invasive. The image shows a well-defined, lobulated mass. Noncalcified fibroadenoma may be considered a differential diagnosis; however, lobulated breast cancer cannot be excluded by imaging interpretation alone. Consequently, it may be inferred that this case presented a diagnostic challenge for the DL model as well. Figure 5D shows a TN case in which the model correctly identified a non-invasive cancer. The image shows an area with amorphous-to-pleomorphic calcifications arranged in a segmental distribution, suggestive of DCIS with a predominant intraductal component. In such cases, the DL model could classify the lesion accurately.
Our results suggest that the diagnostic performance for detecting stromal invasion in breast cancer differs according to the radiologic findings. The mass and distortion datasets showed higher diagnostic accuracy in identifying stromal invasion than the average accuracy of all validation datasets. Invasive cancers commonly form solid components that typically show characteristic radiological findings indicating their invasiveness, such as irregular margins and distortion of the surrounding breast tissues. Although distortion is sometimes difficult for radiologists to detect during clinical assessments, it strongly suggests the pathological presence of stromal invasion in breast cancer. Our DL model successfully learned such radiological-pathological correlations to detect stromal invasion in breast cancer. Although calcification is often pathognomonic in the diagnosis of breast cancer during clinical radiological assessments, the calcification dataset showed a lower diagnostic accuracy in identifying stromal invasion. O'Flynn et al. [27] reported that widespread microcalcification is associated with the risk of breast cancer invasion. In the present study, the extent of microcalcification was not reflected in the DL model because all cropped DBT images were resized to 256 × 256 pixels regardless of the original lesion size. Information on the spread of calcification may therefore have been lost, leading to a decline in accuracy. In future studies, the diagnostic accuracy is expected to improve by training DL models to learn the spread of microcalcifications. Although lesions presenting as FAD are sometimes difficult to differentiate from benign lesions in clinical radiological assessments, the FAD dataset showed a high diagnostic accuracy in identifying stromal invasion (0.933), second only to that of the distortion dataset (0.941). However, this result may be biased because the FAD lesions used in this study were all pathologically proven breast cancers, and common FAD lesions with asymmetric differences in normal breast tissue were not included.
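The loss of scale information caused by resizing every crop to 256 × 256 pixels can be illustrated with a short calculation. This is only a sketch: it assumes square crops taken at the native in-plane resolution of 70 μm per pixel, and the crop sizes and names used below are hypothetical, since the text does not report the crop dimensions.

```python
NATIVE_PIXEL_UM = 70   # in-plane pixel size stated for the DBT images
TARGET_SIDE_PX = 256   # side length of the resized network input


def effective_pixel_size_um(crop_side_px):
    """Physical size covered by one pixel after resizing a square crop.

    A larger crop squeezed into the same 256 x 256 grid means each pixel
    covers more tissue, so lesion extent is no longer encoded in the image.
    """
    if crop_side_px <= 0:
        raise ValueError("crop_side_px must be positive")
    return crop_side_px * NATIVE_PIXEL_UM / TARGET_SIDE_PX


def crop_extent_mm(crop_side_px):
    """Physical side length of a square crop in millimetres."""
    return crop_side_px * NATIVE_PIXEL_UM / 1000


# Hypothetical crop sizes: a small cluster versus a widespread lesion.
summary = {
    side: (crop_extent_mm(side), effective_pixel_size_um(side))
    for side in (256, 512, 1024)
}
```

Under these assumptions, a crop four times wider is presented to the network at one quarter of the sampling density, so a focal cluster and widespread microcalcification can look alike in size once resized. Supplying the original crop size as an additional input would be one possible way to retain this information.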
Our findings suggest that the DL model is a useful tool for predicting stromal invasion in breast cancer preoperatively. An early indication of the risk of stromal invasion can help breast surgeons determine treatment plans ahead of the final pathological diagnosis after surgery and can enhance patient comfort. Preoperative pathological examinations sometimes yield false negatives because only a partial area of the breast cancer is sampled through preoperative needle biopsies. In such cases, treatment plans may be changed through additional chemotherapy or lymph node sampling/resection after the final pathological diagnosis of invasive breast cancer is made from a surgical specimen. In contrast, our DL model may provide a more comprehensive assessment of the entire tumor rather than focusing on the limited tissue samples available for preoperative diagnosis. When a discrepancy regarding the presence of stromal invasion is detected between the results of the DL model and the preoperative pathological diagnosis, preoperative pathological resampling and reconfirmation may be considered. Our approach differs from conventional DL applications in breast cancer screening research in that it focuses on the clinically important challenge of differentiating between non-invasive and invasive cancers, which may influence decision-making in treatment strategies. Addressing this issue may aid treatment planning and the prediction of patient outcomes, and it constitutes the novelty of our research.
Our study had several limitations. First, this was a retrospective study, and because of the limited size of the data, no assessments were performed using an independent test dataset. The limited size of our dataset, which necessitated the use of five-fold cross-validation instead of the standard practice of splitting data into training, validation, and test datasets, might have affected the generalizability of our results. Although training and evaluation were undertaken using cross-validation, bootstrap sampling, and random undersampling to ensure objectivity and minimize overfitting induced by disparities in the number of cases, further prospective investigations are imperative. Specifically, before clinical applications are considered, such investigations should use larger, independent test datasets to validate our model and confirm its robustness and generalizability. Second, we used cropped images of the tumor as training data for the DL model. For clinical application, the DL model must detect the location of the lesion in an entire breast image in addition to classifying the tumor. Third, the study was limited to patients with breast cancer. Benign lesions and normal breast tissues, such as FAD, were not included in this study. Therefore, although distinguishing breast cancer from noncancerous lesions is of significant importance in clinical applications, our DL model cannot differentiate between breast cancer and noncancerous lesions. Nevertheless, our DL model appears to be beneficial for clinical applications when a treatment plan is considered after a lesion yields a diagnosis of breast cancer.

Conclusion
The results of this study suggest that our DL model trained on DBT images can predict the presence of stromal invasion in breast cancer. Although model calibration and further validation in a larger population are required, our DL model is potentially applicable as a clinical decision support tool for preoperative treatment planning for breast cancer.