Building on our previous study, which initially identified 27 independent variables, we selected 7 of them (direction, margins blur, margins angulation, margins microlobulation, margins burr, posterior echoes, and surrounding tissue edema) to develop six machine learning models for BC diagnosis. The logistic model showed superior performance, with an area under the ROC curve (AUC) of 0.771 in the test set and 0.906 in the validation set, and Brier scores of 0.18 and 0.165, respectively. We therefore recommend a logistic regression model fitted with ultrasound imaging features for BC diagnosis, particularly in primary hospitals.
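For readers less familiar with the two metrics reported above, the following is a minimal sketch of how an AUC and a Brier score can be computed from predicted probabilities. It is not the code used in this study; the function names and the toy data are assumptions made for illustration only.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (0 or 1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def roc_auc(y_true, y_prob):
    """AUC as the probability that a malignant case is scored above a benign one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 0 = benign, 1 = malignant.
labels = [0, 0, 1, 1]
probs = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, probs)
brier = brier_score(labels, probs)
```

A higher AUC indicates better discrimination, whereas a lower Brier score indicates that the predicted probabilities lie closer to the observed outcomes.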
Logistic regression can identify important predictors of BC using odds ratios and generate confidence intervals that provide additional information for decision-making. In our logistic regression model, tumor margins burr and the direction of tumor growth had a relatively large impact on the classification of tumors as benign or malignant. The odds ratios (ORs) were 3.267 (2.013–5.303) and 4.281 (3.098–5.917), respectively. This is consistent with the findings of Chhatwal et al., who reported that the most important predictor of BC identified by their model was spiculated mass margins. In the current study, the OR represents the ratio of the odds of malignant BC when a given ultrasound feature is present to the odds when it is absent. The greater the OR (OR > 1), the greater the risk of malignancy in the presence of the feature. Direction of tumor growth, non-identifiable margins burr, and edema of the surrounding tissue showed the highest ORs, indicating that non-parallel growth, non-identifiable margins burr, and edema of the surrounding tissue are the most important factors for predicting malignant BC. This is consistent with the findings of previous studies. Nianan reported that non-parallel growth and irregular morphology are the most important predictors of BC in the new version of the BI-RADS. Some studies have also shown that axillary lymphadenopathy is indicative of the probability of metastasis in BC [22–23].
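As an illustration of how an OR and its confidence interval relate to a logistic regression coefficient, the sketch below assumes a Wald-type 95% interval, that is, exp(β ± 1.96·SE). The method used to obtain the intervals is not stated here, so this is an assumption, and the helper names are hypothetical. The standard error is back-calculated from the reported interval for direction of tumor growth purely to show the round trip.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Return (OR, lower, upper) for a coefficient and its standard error.

    Assumes a Wald interval on the log-odds scale (an assumption, see text).
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def se_from_ci(lower, upper, z=1.96):
    """Back-calculate the standard error of log(OR) from a Wald interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


# Reported values for direction of tumor growth: OR 4.281 (3.098-5.917).
beta = math.log(4.281)          # coefficient on the log-odds scale
se = se_from_ci(3.098, 5.917)   # implied standard error under the Wald assumption
or_value, ci_low, ci_high = odds_ratio_with_ci(beta, se)
```

Under this assumption the reconstructed interval closely matches the reported one, which suggests that the reported interval is approximately symmetric on the log-odds scale.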
The average AUC of the models was 0.741 ± 0.052 in the test set and 0.880 ± 0.025 in the validation set, so the overall performance of the models was better in the validation set than in the test set. Compared with internal validation, external validation is more concerned with model transportability and generalizability [24–26]. We therefore believe that the predictive model can be applied across population samples and is suitable for wider adoption.
When compared with clinician diagnosis, the logistic regression model showed lower accuracy (clinician vs. model: 0.906 vs. 0.772) and a slightly lower AUC (0.913 vs. 0.906). When model performance was evaluated by type of hospital (tertiary class A hospitals and primary hospitals), the model performed better in primary hospitals than in tertiary class A hospitals. This may be due to the different distribution of benign and malignant tumors in the two groups. The proportion of patients with benign tumors was significantly higher in primary hospitals (n = 892, 85.93%) than in tertiary class A hospitals (n = 575, 62.02%). For complex malignant tumors, predictions based on models alone are more likely to be biased. In primary hospitals, the accuracy of clinician diagnosis was higher than that of the logistic model (0.929 vs. 0.806), and the AUC of clinician diagnosis was also slightly higher (0.913 vs. 0.906). Similarly, in tertiary class A hospitals, the accuracy of clinician diagnosis was higher than that of the logistic model (0.880 vs. 0.734), and the AUC of clinician diagnosis was also slightly higher (0.890 vs. 0.875). The high sensitivity of clinician diagnosis in tertiary class A hospitals indicates that clinicians there are more likely to diagnose malignant tumors accurately, with a lower possibility of missed diagnosis. Meanwhile, the high specificity of clinician diagnosis in primary hospitals indicates that clinicians in these hospitals can accurately diagnose benign tumors, with a lower possibility of misdiagnosis. Although there was no significant difference in AUC between the models and clinician diagnosis, the accuracy was markedly different. This may be caused by the imbalance in sample distribution between the malignant and benign groups. Subsequent studies should validate the usefulness of the model using samples with a balanced class distribution, particularly in primary hospital populations alone. This will ultimately help establish the use of the model in primary hospitals.
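The dependence of accuracy on the benign-to-malignant ratio can be illustrated with a short calculation. The sketch below is hypothetical: the sensitivity and specificity values are invented for illustration and are not results of this study. Only the benign proportions (85.93% and 62.02%) are taken from the text above.

```python
def accuracy_from_rates(sensitivity, specificity, malignant_fraction):
    """Overall accuracy as a prevalence-weighted mix of sensitivity and specificity."""
    return (
        sensitivity * malignant_fraction
        + specificity * (1.0 - malignant_fraction)
    )


# Hypothetical classifier, identical in both settings (assumed values).
SENSITIVITY = 0.80
SPECIFICITY = 0.90

# Malignant fractions implied by the reported benign proportions.
primary_malignant = 1.0 - 0.8593
tertiary_malignant = 1.0 - 0.6202

acc_primary = accuracy_from_rates(SENSITIVITY, SPECIFICITY, primary_malignant)
acc_tertiary = accuracy_from_rates(SENSITIVITY, SPECIFICITY, tertiary_malignant)
```

With these assumed rates, the same classifier yields a higher accuracy in the setting with more benign cases, simply because specificity carries more weight there. AUC, by contrast, does not depend on the class proportions in this way.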
Our models enable the prediction of BC and can thus be used by clinicians to make appropriate patient management decisions. As shown in Fig. 3, the predictive capability of the models ranged from 0.2 to 0.4. We analyzed the predicted probabilities of the model at the 1st, 2nd, 5th, 10th, 50th, 90th, 95th, 98th, and 99th percentiles and applied the logistic model in the clinic for preliminary evaluation of BC. If the predicted probability is below the 1st percentile of the population (corresponding to a predicted probability of 0.2158926), the patient most likely does not need to undergo pathological biopsy. Malignancy can be largely ruled out, and the patient can undergo regular follow-up. When the predicted probability is above the 90th percentile of the population (corresponding to a predicted probability of 0.8769365), it is highly indicative of malignant lesions, and clinicians are required to intervene. Patients should immediately undergo a pathological biopsy to confirm malignancy. For patients whose predicted probabilities lie between these values, a short-term follow-up (within 1 year, preferably 3 to 6 months) can be recommended. Clinicians can further use the models to assist in decision-making according to the follow-up outcomes. However, the cut-off values of the predicted probability need to be verified and recalculated in studies with a larger sample size to improve accuracy.
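The three-way rule described above can be sketched as follows. This is a minimal illustration, not the implementation used in the clinic: the function names and category labels are hypothetical, and the percentile interpolation method of Python's `statistics.quantiles` is an assumption that may differ from the one used to derive the reported cut-offs. The default cut-off values are the ones reported in the text.

```python
import statistics

# Cut-offs reported in the text (1st and 90th percentiles of predicted probability).
LOW_CUTOFF = 0.2158926
HIGH_CUTOFF = 0.8769365


def percentile_cutoffs(probabilities, low_pct=1, high_pct=90):
    """Return the predicted probabilities at the given percentiles of a cohort."""
    cuts = statistics.quantiles(probabilities, n=100)  # 99 cut points
    return cuts[low_pct - 1], cuts[high_pct - 1]


def triage(prob, low=LOW_CUTOFF, high=HIGH_CUTOFF):
    """Map a predicted probability of malignancy to a suggested next step."""
    if prob < low:
        return "regular follow-up"
    if prob > high:
        return "pathological biopsy"
    return "short-term follow-up"


# Hypothetical predicted probabilities for three patients.
suggestions = [triage(p) for p in (0.10, 0.50, 0.95)]
```

In a new cohort, `percentile_cutoffs` could be used to recompute the two thresholds before applying `triage`, in line with the need to re-verify the cut-off values in larger samples.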
This study has some limitations. First, this study was mainly an external validation of the previous model. The independent variables in the modeling population differ from those in the validation population, which may cause selection bias. Second, this study did not modify or improve the model to address the imbalance in the distribution of the predictor variables and outcome classes, and the model therefore has low accuracy. Future studies should take measures to improve accuracy during the modeling process. Third, this study did not collect demographic information or baseline data of the patients, which made it difficult to balance patient baseline characteristics in the pre-modeling stage. This may have affected the performance of the model and introduced confounding factors. Further studies are needed to improve model accuracy and to establish a more balanced clinical prediction model that can be used not only during diagnosis but also at follow-up.
In conclusion, of the six machine learning models, the logistic regression model showed the highest predictive capability and generalizability, indicating its potential for application in primary hospitals. The model showed similar predictive performance to clinicians. Further, it had better predictive capability in primary hospitals than in tertiary class A hospitals. Collectively, these findings indicate that the model can help clinicians distinguish between benign and malignant breast tumors.