Improving the diagnostic performance of ultrasound in classifying breast lesions: the potential value of S-Detect for residents-in-training

Background To explore the potential value of S-Detect™, a high-end computer-assisted diagnosis (CAD) software system, for residents-in-training. Methods Routine breast ultrasound (US) examinations were conducted and assessed by an experienced radiologist. Archived images of the lesions (including grayscale, color Doppler flow and elastography images) were retrospectively assessed by each of five in-training residents who were blinded to the histopathological findings and any other US diagnosis. The diagnostic performances of S-Detect™ and the five residents were measured and compared. Afterwards, category 4a lesions assessed by the residents were downgraded when classified as possibly benign by S-Detect™. The diagnostic performance of the integrated results was compared with the original results of the residents. Results A total of 195 focal breast lesions were consecutively enrolled, including 82 malignant lesions and 113 benign lesions. S-Detect™ presented higher specificity and area under the curve (AUC) than the residents. After combination with S-Detect™ in category 4a lesions, the specificity and AUC of the five residents were significantly improved. The intraclass correlation coefficient (ICC) of the five residents also increased after integration. Conclusions With the help of the CAD software, the specificity, overall diagnostic performance and interobserver agreement of the residents greatly improved. S-Detect™ can be utilized as an assistant tool for residents-in-training in classifying breast lesions.


Background
Because the incidence of breast cancer has increased over the past decade, it has become a growing public health concern worldwide 1. Early detection of breast cancer can largely improve patient prognosis 2,3. As an important adjunctive tool to mammography, ultrasound (US) has shown great potential for diagnosing breast masses, especially in dense breast tissue, allowing identification of masses that are occult on mammography 4. Given its accessibility and cost-effectiveness, US has become the most popular imaging method for breast screening in China, and it has been shown to perform better than, or not inferior to, mammography 5. Nevertheless, low specificity and high interobserver variability remain problematic disadvantages of US, especially for residents who have received only short-term training in breast US [6][7][8][9]. Although the Breast Imaging Reporting and Data System (BI-RADS) lexicon was put forward by the American College of Radiology [10][11][12], residents-in-training still tend to have relatively poor diagnostic performance when assessing breast lesions 13. BI-RADS subcategory 4a lesions, which present a few suspicious features but mainly benign characteristics, can be difficult for inexperienced residents to classify, and wrong decisions can lead to subsequent overtreatment.
Computer-aided diagnosis (CAD) systems have played a growing part in many fields of medical imaging, including breast US [14][15][16][17][18]. S-Detect™ for Breast is a cutting-edge CAD system that acts as an assistant tool for US imaging diagnosis of breast lesions. The diagnostic efficacy of the CAD software for classifying breast lesions has been validated by several studies [19][20][21]. Furthermore, S-Detect™ has been reported to increase the diagnostic performance of in-training residents 19,22. BI-RADS 4a lesions pose a particular challenge in breast US, and, as far as we know, the feasibility of using S-Detect™ to improve the diagnostic accuracy of residents-in-training for BI-RADS 4a lesions has not been investigated in previous studies.
In this study, we evaluated the diagnostic performance of S-Detect™ and five residents-in-training for classifying breast lesions. The results of the residents were reevaluated after some of the category 4a lesions were downgraded by CAD. The aim of the study was to further explore the potential role of S-Detect™ in aiding in-training readers and to determine how this system can help improve diagnostic performance, especially for BI-RADS category 4a lesions.

Methods
This was a cross-sectional observational study. Ethics approval was obtained from the Institutional Review Board of Peking Union Medical College Hospital. Written informed consent was obtained from the adult patients in the study. For patients under 18 years old, written informed consent was signed by the guardians who accompanied them to the US examination.

Patients and imaging
A total of 195 focal breast lesions from patients aged 15 to 82 years (mean age, 45.7 years; median age, 45.0 years) were enrolled consecutively in this study.
The inclusion criteria for the study were as follows: (1) palpable masses verified by breast imaging; and (2) nonpalpable masses found by breast imaging, with or without other symptoms. The exclusion criteria were as follows: The patients underwent US examinations before they received further treatment. All lesions were biopsied and had a final pathological diagnosis. The pathological results were deemed the gold standard for the study.

Study protocol

Image assessment of S-Detect™ for Breast and the five in-training residents
A single grayscale US image demonstrating the lesion at its maximum size was manually selected for S-Detect™ for Breast analysis. First, the radiologist clicked the center of the target mass, and the contour of the lesion was segmented automatically by S-Detect™. The outline of the lesion was adjusted manually by the radiologist when necessary. Then, S-Detect™ provided a dichotomous classification of each lesion (possibly benign or possibly malignant). The US descriptors extracted by S-Detect™ were also displayed, including shape, orientation, margins, pattern and posterior acoustic features.
Five in-training residents with 1-3 years of working experience were invited to assess the US lesions independently. All images of the lesions (including grayscale, color Doppler flow and elastography images) were retrospectively reviewed by the five in-training residents, who were asked to classify the lesions based on the BI-RADS lexicon. The residents were blinded to the S-Detect™ and pathology results. The five residents are denoted R1-R5. R1, R2 and R3 were third-year residents, each with one year of experience with breast US. R4 and R5 were second-year residents, each with six months of experience with breast US. All five residents had received a standard training program for breast US and had passed the basic US exams organized by our medical center.
A cutoff value was set at category 4 to transform the residents' results into a dichotomous form. Category 2 and 3 lesions were deemed possibly benign, and category 4 and 5 lesions were considered possibly malignant. The diagnostic performances of S-Detect™ and the five residents were evaluated, and comparisons were made between S-Detect™ and the residents.
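The cutoff rule can be illustrated with a minimal Python sketch. The function name `dichotomize` and the string encoding of the categories (for example "4a") are assumptions made for illustration; they are not taken from the study's own analysis.

```python
def dichotomize(birads_category: str, cutoff: int = 4) -> str:
    """Map a BI-RADS category such as '3', '4a' or '5' to a binary label."""
    # Only the leading digit matters for the cutoff; the subcategory letter is ignored.
    major = int(birads_category.strip()[0])
    return "possibly malignant" if major >= cutoff else "possibly benign"


# Hypothetical categories, not study data.
labels = [dichotomize(c) for c in ["2", "3", "4a", "4c", "5"]]
```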

Integration of the results of the five residents and S-Detect™ for Breast
To evaluate the potential of S-Detect™ to help improve the diagnostic accuracy of residents, the results of the five in-training residents were integrated with those of S-Detect™ for category 4a lesions. We compared the results of S-Detect™ and those of the residents for each lesion. If a lesion was diagnosed as category 4a by a resident but as possibly benign by S-Detect™, the decision of S-Detect™ was adopted, thus downgrading the category 4a lesion to the possibly benign group. Because the residents showed high sensitivity in the preliminary experiments, we did not change category 3 lesions when they were classified as possibly malignant by S-Detect™. All other classifications made by the residents remained unchanged.
The diagnostic performances of the integrated results were calculated and compared with the original results of the residents without S-Detect™. Interrater variability before and after integration with S-Detect™ was assessed using intraclass correlation coefficients (ICCs).
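The integration rule described here can be written as a small decision function. This is a sketch under stated assumptions: the function name `integrate`, the string encoding of BI-RADS categories and the boolean flag for the S-Detect™ output are all hypothetical and chosen only for illustration.

```python
def integrate(resident_category: str, sdetect_benign: bool) -> str:
    """Return the final dichotomous label for one lesion.

    resident_category: BI-RADS category given by the resident, e.g. '3', '4a', '4b', '5'.
    sdetect_benign: True if S-Detect classified the lesion as possibly benign.
    """
    category = resident_category.strip().lower()
    if category == "4a" and sdetect_benign:
        # Downgrade: the CAD decision is adopted for category 4a only.
        return "possibly benign"
    if category[0] in ("4", "5"):
        # Every other category 4 or 5 call by the resident is kept.
        return "possibly malignant"
    # Categories 2 and 3 are never upgraded, whatever the CAD output.
    return "possibly benign"
```

Note that the S-Detect™ output influences the result in exactly one branch, which is why the sensitivity can only stay the same or decrease while the specificity can only stay the same or increase.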

Statistical analysis
The diagnostic performances of the residents, S-Detect™ and the integrated results of the residents and S-Detect™ for category 4a lesions were evaluated using the sensitivity, specificity, positive likelihood ratio (PLR), negative likelihood ratio (NLR), positive predictive value (PPV), negative predictive value (NPV), receiver operating characteristic (ROC) curve and area under the ROC curve (AUC). In addition, 2 × 2 contingency tables were constructed to calculate these indicators. Sensitivity and specificity were compared between readers using the chi-square test. The AUC values were compared using the Z test.
The ICC with 95% confidence intervals was calculated to evaluate the interrater variability among multiple raters. In this study, each subject was rated by the same raters, and the ICC was computed as a measure of absolute agreement, as the systematic differences among the raters were relevant. The ICC value was interpreted as follows.
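As an illustration of how these indicators follow from a 2 × 2 contingency table, the sketch below uses the standard definitions. The function name `diagnostic_metrics` and the example counts are hypothetical and are not taken from the study data.

```python
import math


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute standard diagnostic indicators from 2 x 2 table counts."""
    sensitivity = tp / (tp + fn)  # malignant lesions correctly called malignant
    specificity = tn / (tn + fp)  # benign lesions correctly called benign
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        # Likelihood ratios are unbounded when the denominator is zero.
        "PLR": sensitivity / (1 - specificity) if specificity < 1 else math.inf,
        "NLR": (1 - sensitivity) / specificity if specificity > 0 else math.inf,
    }


# Made-up counts for illustration only.
example = diagnostic_metrics(tp=40, fp=20, fn=10, tn=80)
```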

Results
A total of 195 focal breast lesions, including 82 malignant lesions and 113 benign lesions, from 195 consecutive patients (mean age, 45.7 years; median age, 45.0 years; range, 15-82 years) who were referred to the medical center were enrolled.
The diagnostic performances of S-Detect™ and the five residents, and the comparisons of the sensitivity, specificity and AUC between S-Detect™ and the residents, are listed in Table 1. Table 1 shows that the residents tended to have a high sensitivity but an evidently low specificity. All the residents showed a relatively high sensitivity (92.68-100.00%). In contrast, the specificity of S-Detect™ (77.88%) was higher than that of R2-5 (19.47-48.67%), with a p value of <0.05. The AUC value of S-Detect™ (0.82) was significantly higher than those of the five residents (0.62-0.74), with a p value of <0.05 for all the residents. In this study, S-Detect™ had better overall diagnostic performance than the residents-in-training with limited breast US experience.
The numbers of downgraded lesions, which were classified as category 4a by the residents but as possibly benign by S-Detect™, are listed in Table 2. The sensitivity of the integrated results remained at a relatively high level (92.68-100.00%). The specificities of all the residents were significantly improved after using the results of S-Detect™ (46.02-76.11%), with a p value of <0.001 for all the residents.
The ROC curves of the five residents, S-Detect™ and the residents combined with S-Detect™ are presented in Figs. 1-5. For each resident, the ROC curve shifted toward the upper left corner after combination with S-Detect™. Additionally, the AUC values of the residents with S-Detect™ increased evidently (0.71-0.85), with statistical significance (p<0.001), indicating an improvement in the overall diagnostic performances of the five residents (Table 1).
To evaluate the interobserver variability of the five residents, we calculated the ICC values of the integrated results and the original results. Systematic differences among the five raters were found to be relevant after ANOVA (p < 0.05), and the ICC was regarded as a measure of absolute agreement. The single-measure ICC of the five residents increased from 0.480 (0.415-0.549) to 0.643 (0.586-0.700) after integration with the results of S-Detect™, indicating that the agreement level increased from moderate to substantial.
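For readers unfamiliar with the statistic, the following is a minimal sketch of a single-measure, absolute-agreement ICC computed from two-way ANOVA mean squares. It assumes the two-way random-effects form often written as ICC(2,1) or ICC(A,1); the source does not state which software or exact model was used, so the function name `icc_absolute_single` and the toy ratings are illustrative assumptions only. Confidence intervals are not computed here.

```python
def icc_absolute_single(ratings):
    """Single-measure, absolute-agreement ICC from a two-way layout.

    ratings: list of n rows (lesions), each a list of k scores (one per rater).
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for lesions (rows), raters (columns) and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))

    # Rater variance (msc) enters the denominator, so systematic
    # differences between raters lower the agreement.
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Toy example: three raters in perfect agreement on three lesions.
perfect = icc_absolute_single([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
```

Because the between-rater mean square appears in the denominator, a rater who systematically assigns higher categories reduces this ICC even when the ranking of lesions is identical, which matches the absolute-agreement definition used in the text.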

Discussion
US is one of the most commonly used modalities in breast imaging. As a convenient and cost-effective imaging method, US has played an essential role in the detection and evaluation of breast lesions in many countries, including China 23. However, despite the promotion of the BI-RADS lexicon, operator dependence and interobserver variability are still the major flaws of US 6-9. The performance of the BI-RADS lexicon can be largely affected by the clinical experience of the operators. The specificity of a resident-in-training has been reported to be significantly inferior to that of a high-level radiologist when using the BI-RADS lexicon in the assessment of breast lesions 8. As a result, methods to enhance the diagnostic efficiency of inexperienced readers and to decrease the interobserver variability for breast US findings are in demand.
CAD systems have emerged as powerful tools for medical imaging with the dramatic advancement of artificial intelligence technology 14. The feasibility of using CAD systems to aid in the diagnosis of breast lesions has been verified by previous studies 24,25. S-Detect™, a dedicated CAD software integrated into a high-end US unit, is built on deep learning algorithms and trained on large-scale clinical databases. The diagnostic process of S-Detect™ is free from the interference of human-identified features. The potential use of S-Detect™ to assist doctors in improving diagnostic performance, especially those who lack experience, has been elucidated in previous studies. According to the results of our study, S-Detect™ was distinguished by its high specificity compared with that of the five in-training residents with limited US experience, who presented a remarkable sensitivity but a low specificity. Therefore, we speculated that S-Detect™ could help improve the residents' specificity.
Breast lesions classified as BI-RADS 4a are defined as having a low suspicion of malignancy. In clinical settings, category 4a is a relatively complicated subgroup of the BI-RADS classification, with a malignancy rate of 3-10% and a PPV of 6% 28. In this study, the malignancy rates of the 4a lesions classified by the five residents were 9.38%, 8.11%, 15.38%, 10.25% and 9.09%, respectively, most of which were within the range defined by the guidelines. Most category 4a lesions are benign but may undergo unnecessary biopsies. To better address this tradeoff for 4a lesions, new modalities, such as elastography, have been put into clinical use to lower the false-positive rate 29,30. In this study, a statistically significant improvement in the specificity and AUC was achieved for the residents after using S-Detect™ for category 4a lesions, suggesting that a dedicated CAD system might also provide additional diagnostic information.
A CAD system could also be an effective method to downgrade benign category 4a lesions and reduce unnecessary biopsies. It is noteworthy that the malignancy rates of the CAD-downgraded 4a lesions in the current study were not satisfactory: 0%, 9.68%, 7.89%, 6.25% and 4.76%, respectively. This implies that further improvement of S-Detect™ is necessary before clinical application.
The ICC of the five residents improved after integration with S-Detect™, from a moderate level of agreement to a substantial level. This result suggests that S-Detect™ could also be effective in decreasing interobserver variability in breast US for inexperienced raters.
In clinical practice, residents are required to undergo systematic training programs before entering clinical work. S-Detect™ can act as a powerful assistant tool to audit the diagnoses made by inexperienced US readers. Notably, the workflow of S-Detect™ is less time-consuming than that of the double reading process. In addition, the US features extracted by S-Detect™ are displayed for readers, providing a useful reference for residents to learn from the images case by case. S-Detect™ may therefore have potential value in the training of inexperienced US readers.
There were several limitations in this study. First, the performance of the residents may have been underestimated. In a regular US examination workflow, radiologists often evaluate a breast lesion based on overall diagnostic information. In addition to dynamic real-time US images, medical history and mammography results are taken into consideration. In this retrospective study, however, only static images were provided to the residents for classification. Second, the good performance of S-Detect™ was supported by the high-quality US images used for classification, which were collected by an experienced radiologist who participated in the study. This condition might not be met in real clinical settings at different medical centers in other regions, which may impair the diagnostic performance of S-Detect™.

Conclusions
In this study, S-Detect™ had better diagnostic performance for classifying breast lesions than the five residents. After category 4a lesions were reclassified by S-Detect™, the diagnostic performances of the residents were significantly enhanced, with higher specificity and without a significant loss of sensitivity. S-Detect™ shows promise for improving the specificity of inexperienced readers and avoiding unnecessary biopsies of category 4a lesions. S-Detect™ can also help to decrease interobserver variability among different readers.