Clinicopathologic characteristics of all breast lesions
The clinicopathologic characteristics and subgroups of all 1271 breast lesions are summarized in Table 1. Among the 1271 BI-RADS 4 lesions, 749 (58.9%) were malignant and 522 (41.1%) were benign, as shown in Table 2. The mean age of the entire cohort was 45.40±9.65 years (range, 19–79 years). The mean age was greater in the malignant group than in the benign group (51.29±7.29 years vs. 39.56±6.73 years, P < 0.001), and there was no significant difference in mean age between the training and the validation cohorts (P > 0.05 for all). The maximum diameter was larger in the malignant group than in the benign group (19.98±5.22 mm vs. 14.55±7.74 mm, P < 0.001), but did not differ significantly between the training and the validation cohorts (P > 0.05 for all). The lesions were divided into three subgroups according to the maximum diameter of the lesion (MD). The subgroup with MD ≤15 mm included 218 lesions (167 in the training cohort and 51 in the validation cohort; 89 benign and 129 malignant); the subgroup with 15 mm <MD ≤25 mm included 779 lesions (598 in the training cohort and 181 in the validation cohort; 327 benign and 452 malignant); and the subgroup with MD >25 mm included 274 lesions (206 in the training cohort and 68 in the validation cohort; 112 benign and 162 malignant).
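The size-based grouping can be illustrated with a minimal sketch. The function name md_subgroup and the label strings are hypothetical and are not taken from the study; the cut-offs (15 mm and 25 mm) and the per-cohort counts are those reported in the text.

```python
def md_subgroup(md_mm: float) -> str:
    """Assign a lesion to a size subgroup by maximum diameter (MD, in mm).

    Hypothetical helper: cut-offs follow the subgroup definitions in the text.
    """
    if md_mm <= 15:
        return "MD <= 15 mm"
    if md_mm <= 25:
        return "15 mm < MD <= 25 mm"
    return "MD > 25 mm"


# Reported lesion counts per subgroup as (training, validation).
reported_counts = {
    "MD <= 15 mm": (167, 51),
    "15 mm < MD <= 25 mm": (598, 181),
    "MD > 25 mm": (206, 68),
}

# Totals per cohort and overall, summed from the reported counts.
n_training = sum(train for train, _ in reported_counts.values())
n_validation = sum(val for _, val in reported_counts.values())
n_total = n_training + n_validation
```

Summing the reported subgroup counts in this way recovers the overall cohort size of 1271 lesions.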
Quantitative SWE parameters of all breast lesions
Across all 1271 lesions, the SWV values of the malignant group were significantly higher than those of the benign group, for both the intratumoral stiffness (SWV1: 3.76±0.78 m/s vs. 1.85±0.65 m/s; P < 0.01) and the peritumoral stiffness (SWV5: 4.02±0.82 m/s vs. 1.67±0.74 m/s; P < 0.01).
In the malignant group, SWV5 values were significantly higher than SWV1 values, whereas in the benign group, SWV5 values were lower than SWV1 values (all P < 0.01). SWV5 and SWV1 values in the subgroup with 15 mm <MD ≤25 mm were higher than those in the subgroups with MD ≤15 mm and MD >25 mm (all P < 0.01), because this subgroup contained more malignant lesions.
Consistency analysis of manual vs. CNN model lesion segmentation
We calculated the average values of the intra- and interrater consistency metrics for the segmentations of the three radiologists, as shown in Table 3. Wilcoxon signed-rank tests were conducted, and Pearson’s correlation coefficients were calculated using geometric features extracted from pairwise comparison metrics in the radiologists’ segmentations. The Wilcoxon tests indicated that the CNN segmentation model satisfactorily matched the performance of the radiologists with respect to sensitivity, specificity, the Dice coefficient, Cohen’s kappa, and the 95% Symmetric Hausdorff Distance (P > 0.05). The Dice coefficient and Cohen’s kappa were best for the Radiologist 2-CNN comparison (0.83 and 0.82, respectively); the specificity was best for the Radiologist 3-CNN comparison (0.99); and the 95% Symmetric Hausdorff Distance was best for the Radiologist 1-CNN comparison (1.19 mm) (Table 3). The box-plot diagrams show the intra- and interrater consistency of the radiologists and the consistency of the CNN model with the radiologists (Fig. 6). The Pearson’s correlation coefficients showed a strong relationship among all observers for the area, major axis length, and minor axis length (radiologists vs. CNN model: r = 0.98, 0.97, and 0.99, respectively). As shown in the Bland–Altman plots, the differences between the CNN model and the three radiologists in segmentation area, major axis length, and minor axis length were close to 0 (Fig. 7).
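For readers unfamiliar with the agreement metrics named above, the following is a minimal sketch of how the Dice coefficient, Cohen's kappa, and the Bland–Altman bias with 95% limits of agreement are commonly computed. It assumes segmentations flattened to binary lists (1 = lesion, 0 = background); the function names and toy data are hypothetical, and this is not the analysis code used in the study.

```python
from statistics import mean, stdev


def confusion(ref, seg):
    """Count (tp, fp, fn, tn) between two flattened binary masks."""
    tp = fp = fn = tn = 0
    for r, s in zip(ref, seg):
        if r and s:
            tp += 1
        elif s:
            fp += 1  # seg marks lesion, ref marks background
        elif r:
            fn += 1  # ref marks lesion, seg marks background
        else:
            tn += 1
    return tp, fp, fn, tn


def dice(ref, seg):
    """Dice coefficient: 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn, _ = confusion(ref, seg)
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom


def cohen_kappa(ref, seg):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    tp, fp, fn, tn = confusion(ref, seg)
    n = tp + fp + fn + tn
    p_observed = (tp + tn) / n
    p_expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    return 1.0 if p_expected == 1 else (p_observed - p_expected) / (1 - p_expected)


def bland_altman(x, y):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    diffs = [a - b for a, b in zip(x, y)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # 95% limits of agreement
    return bias, bias - spread, bias + spread


# Toy example (made-up masks and areas, for illustration only).
mask_radiologist = [1, 1, 1, 0, 0, 0, 0, 0]
mask_cnn = [1, 1, 0, 1, 0, 0, 0, 0]
dice_value = dice(mask_radiologist, mask_cnn)
kappa_value = cohen_kappa(mask_radiologist, mask_cnn)
bias, lower, upper = bland_altman([10, 12, 14], [11, 11, 15])
```

A bias close to 0 with narrow limits of agreement corresponds to the pattern described for the Bland–Altman plots.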
Diagnostic performance of the quantitative SWE parameters, US CNN model, and SWE CNN models for predicting breast cancer in the training and the validation cohorts
The diagnostic performances of quantitative SWE parameters, US CNN models, and SWE CNN models in both the training and the validation cohorts are summarized in Table 4.
Among these three single models, the 1.0 mm SWE image CNN model had the highest AUC, ACC, sensitivity, and specificity for predicting breast cancer in the subgroup with MD ≤15 mm in both the training cohort (0.81, 79.26%, 68.86%, 82.52%) and the validation cohort (0.75, 74.49%, 62.97%, 78.53%).
In the subgroups with 15 mm <MD ≤25 mm and MD >25 mm, the 2.0 mm SWE image CNN model had the highest AUC, ACC, sensitivity, and specificity both in the training cohort (15 mm <MD ≤25 mm: 0.85, 82.64%, 66.24%, 80.33%; MD >25 mm: 0.84, 80.34%, 69.34%, 82.63%) and the validation cohort (15 mm <MD ≤25 mm: 0.81, 78.87%, 63.44%, 76.85%; MD >25 mm: 0.78, 77.73%, 65.73%, 77.64%).
Regardless of the grouping method, the AUCs and ACC of the SWE image CNN models, except for the 0.5 mm and the internal SWE image CNN models, were all higher than those of SWV5 and the US CNN model in predicting breast cancer (all P < 0.05). There was no significant difference between SWV5 and the US CNN model in sensitivity, specificity, or AUC (all P > 0.05, Fig. 8).
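The performance measures reported in this section (AUC, ACC, sensitivity, and specificity) can be computed from true labels and model scores as in the sketch below. The function names, the 0.5 decision threshold, and the toy data are assumptions for illustration; the study's actual thresholds and evaluation code are not given in the text. The AUC is computed here as the fraction of malignant/benign pairs in which the malignant lesion receives the higher score, with ties counted as half.

```python
def classification_metrics(y_true, y_score, threshold=0.5):
    """ACC, sensitivity, and specificity at an assumed decision threshold."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "ACC": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # malignant lesions correctly flagged
        "specificity": tn / (tn + fp),  # benign lesions correctly cleared
    }


def auc(y_true, y_score):
    """Rank-based AUC over all malignant/benign score pairs."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (made-up labels and scores; 1 = malignant, 0 = benign).
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.6, 0.4, 0.5, 0.2]
metrics = classification_metrics(labels, scores)
auc_value = auc(labels, scores)
```

Because the AUC depends only on the ranking of scores, it is unaffected by the choice of threshold, whereas ACC, sensitivity, and specificity all change with it.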
Diagnostic performance of the US + SWE dual-modal CNN model for predicting breast cancer in the training and the validation cohorts
In both the training and the validation cohort, the overall performances of the US + SWE image CNN models were slightly higher than those of the corresponding single CNN models.
The US CNN + 1.0 mm SWE model achieved the highest AUC for MD ≤15 mm both in the training cohort (0.94) and in the validation cohort (0.91) (Fig. 8). In the subgroup with MD ≤15 mm, the US + 1.0 mm SWE CNN model achieved the highest ACC, sensitivity, and specificity in the training cohort (88.67%, 78.53%, 85.63%, respectively) and the validation cohort (85.54%, 77.72%, 82.52%, respectively).
In the subgroups with 15 mm <MD ≤25 mm and MD >25 mm, the US CNN + 2.0 mm SWE model achieved the highest AUCs both in the training cohort (0.96 and 0.95, respectively) and in the validation cohort (0.93 and 0.91, respectively) (Fig. 8). Compared with the US image CNN model, the US + SWE dual-modal CNN model showed average increases of 6.2%, 5.7%, and 8.7% in ACC, sensitivity, and specificity, respectively. The improvement in specificity was significant, indicating that the dual-modal CNN model can improve specificity without loss of sensitivity for classifying breast cancer.
Similarly, the US + 2.0 mm SWE CNN model achieved the highest ACC, sensitivity, and specificity in the training and the validation cohorts, respectively, for 15 mm <MD ≤25 mm (91.34%, 75.63%, 86.98% and 90.76%, 72.53%, 84.22%) and for MD >25 mm (86.65%, 80.32%, 87.44% and 84.23%, 77.38%, 84.75%).