MUSICA pre-processing performance
After MUSICA processing, contrast uniformity between bone and lung tissue was clearly improved (Fig. 3). Although the two raw images differed considerably in contrast and detail, the processed images appeared similar in image quality and contrast.
Deep learning YOLOv3 network performance
The training set included 918 patients with 2,647 fractures. The CNN model made 3,580 detections, of which 2,435 were correct; 212 fractures were missed, and 1,145 detections were false positives. The test set included 162 patients with 437 fractures. The model made 488 detections, of which 398 were correct; 39 fractures were missed, and 90 detections were false positives. In the training set, the sensitivity (correctly detected fractures / marked fractures) was 92.0%, and the precision (correctly detected fractures / all detected fractures) was 68.0%. In the test set, the sensitivity was 91.1% and the precision was 81.6% (Table 1). Multi-lesion detection in the test set was also evaluated with free-response receiver operating characteristic (FROC) analysis: at a false-positive rate of 0.56, the lesion-level sensitivity reached 91.3% (Fig. 6b).
Table 1
Sensitivity and precision of the CNN model in the training and testing sets
Data set | Marked fractures | Detected fractures | Correctly detected fractures | Sensitivity | Precision |
Training set | 2647 | 3580 | 2435 | 92.0% | 68.0% |
Testing set | 437 | 488 | 398 | 91.1% | 81.6% |
Note: Sensitivity = fractures detected correctly / fractures marked; Precision = fractures detected correctly / fractures detected.
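The fracture-level arithmetic behind Table 1 can be checked with a short Python sketch. It uses only the counts in the table; the helper names `pct` and `fracture_level` are hypothetical, and rounding half-up to one decimal place is an assumption about how the reported percentages were rounded.

```python
from decimal import Decimal, ROUND_HALF_UP


def pct(num: int, den: int) -> float:
    """Percentage rounded half-up to one decimal place (assumed rounding rule)."""
    value = Decimal(num) * 100 / Decimal(den)
    return float(value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP))


def fracture_level(marked: int, detected: int, correct: int) -> dict:
    """Derive missed and false detections, plus sensitivity and precision."""
    return {
        "missed": marked - correct,    # marked fractures the model did not find
        "false": detected - correct,   # detections matching no marked fracture
        "sensitivity": pct(correct, marked),
        "precision": pct(correct, detected),
    }


# Counts taken from Table 1.
training = fracture_level(marked=2647, detected=3580, correct=2435)
testing = fracture_level(marked=437, detected=488, correct=398)
print(training)  # {'missed': 212, 'false': 1145, 'sensitivity': 92.0, 'precision': 68.0}
print(testing)   # {'missed': 39, 'false': 90, 'sensitivity': 91.1, 'precision': 81.6}
```

The derived missed and false counts match those reported in the text, which confirms that each detection was classified as either correct or false, with no third category.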
To evaluate the model's ability to identify radiographs with rib fractures at the case level, radiographs without rib fractures were added to the test set. In total, 395 radiographs were included: 162 with fractures and 233 without. The CNN model classified 199 radiographs as containing fractures and 196 as fracture-free. The accuracy was 85.1%, and the sensitivity and specificity were 93.2% and 79.4%, respectively (Table 2). In ROC analysis, the AUC was 0.92 (95% CI: 0.86–0.96) (Fig. 6a).
Table 2
Detection rate of the CNN model in the testing set at the case level
CNN model | Radiographs with rib fractures | Radiographs without rib fractures | Total |
Detected fractures | 151 | 48 | 199 |
Undetected fractures | 11 | 185 | 196 |
Total | 162 | 233 | 395 |
Note: Sensitivity = TP / (TP + FN) ×100% = 151/162 ×100% = 93.2%; Specificity = TN / (TN + FP) ×100% = 185/233 ×100% = 79.4%; Positive predictive value (PPV) = TP / (TP + FP) ×100% = 151/199 ×100% = 75.9%; Negative predictive value (NPV) = TN / (TN + FN) ×100% = 185/196 ×100% = 94.4%; Accuracy = (TP + TN) / (TP + FP + FN + TN) ×100% = (151 + 185)/395 ×100% = 85.1%.
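The case-level formulas in the note to Table 2 can be sketched as a small Python confusion-matrix type. The class name `Confusion` and its `metrics` method are hypothetical; the four cell counts are read directly from Table 2, and half-up rounding is again assumed.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import NamedTuple


def pct(num: int, den: int) -> float:
    """Percentage rounded half-up to one decimal place (assumed rounding rule)."""
    value = Decimal(num) * 100 / Decimal(den)
    return float(value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP))


class Confusion(NamedTuple):
    tp: int  # radiographs with fractures that the model flagged
    fp: int  # radiographs without fractures that the model flagged
    fn: int  # radiographs with fractures that the model did not flag
    tn: int  # radiographs without fractures that the model did not flag

    def metrics(self) -> dict:
        total = self.tp + self.fp + self.fn + self.tn
        return {
            "sensitivity": pct(self.tp, self.tp + self.fn),
            "specificity": pct(self.tn, self.tn + self.fp),
            "ppv": pct(self.tp, self.tp + self.fp),
            "npv": pct(self.tn, self.tn + self.fn),
            # Accuracy counts both kinds of correct call: (TP + TN) / total.
            "accuracy": pct(self.tp + self.tn, total),
        }


table2 = Confusion(tp=151, fp=48, fn=11, tn=185)
print(table2.metrics())
# {'sensitivity': 93.2, 'specificity': 79.4, 'ppv': 75.9, 'npv': 94.4, 'accuracy': 85.1}
```

Accuracy uses TP + TN in the numerator, which is what the worked value (151 + 185)/395 in the note computes.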
Reading experiment
At the fracture level, the CNN model flagged 130 radiographs with 437 detected fractures, of which 351 were correct; 51 marked fractures were missed, and 86 detections were false positives. The senior radiologist flagged 125 radiographs with 392 fractures, of which 323 were correct; 79 were missed, and 69 were false positives. The junior radiologist flagged 97 radiographs with 361 fractures, of which 295 were correct; 107 were missed, and 66 were false positives. The sensitivity and precision were 87.3% and 80.3% for the CNN model, 80.3% and 82.4% for the senior radiologist, and 73.4% and 81.7% for the junior radiologist. The sensitivity of the CNN model was significantly higher than that of the junior radiologist (P = 0.01). Neither the difference between the senior and junior radiologists nor that between the CNN model and the senior radiologist was significant (P > 0.05), and no difference in precision was significant (Table 3).
Table 3
Comparison of sensitivity and precision in the independent testing group at the fracture level
Reader | Marked fractures | Detected fractures | Correctly detected fractures | Sensitivity | Precision |
CNN model | 402 | 437 | 351 | 87.3% | 80.3% |
Senior radiologist | 402 | 392 | 323 | 80.3% | 82.4% |
Junior radiologist | 402 | 361 | 295 | 73.4% | 81.7% |
P1 | NA | NA | NA | 0.15 | 0.57 |
P2 | NA | NA | NA | 0.13 | 0.43 |
P3 | NA | NA | NA | 0.01 | 0.43 |
Note: Sensitivity = fractures detected correctly / fractures marked; Precision = fractures detected correctly / fractures detected. P1 = P-value for senior vs. junior radiologist; P2 = P-value for CNN model vs. senior radiologist; P3 = P-value for CNN model vs. junior radiologist. Comparisons were performed using the chi-squared test.
NA = not available
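The missed and false counts in the reading experiment follow from Table 3 alone, since every reader was scored against the same 402 marked fractures. A minimal Python sketch, with hypothetical names (`MARKED`, `readers`, `summary`) and the same assumed half-up rounding, reproduces them; it does not attempt the significance tests.

```python
from decimal import Decimal, ROUND_HALF_UP


def pct(num: int, den: int) -> float:
    """Percentage rounded half-up to one decimal place (assumed rounding rule)."""
    value = Decimal(num) * 100 / Decimal(den)
    return float(value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP))


MARKED = 402  # reference-standard fractures in the independent test group

# (detected, correctly detected) for each reader, from Table 3.
readers = {
    "CNN model": (437, 351),
    "Senior radiologist": (392, 323),
    "Junior radiologist": (361, 295),
}

summary = {}
for name, (detected, correct) in readers.items():
    summary[name] = {
        "missed": MARKED - correct,     # marked fractures not found
        "false": detected - correct,    # detections with no matching marked fracture
        "sensitivity": pct(correct, MARKED),
        "precision": pct(correct, detected),
    }

for name, row in summary.items():
    print(name, row)
```

The derived counts (51/86, 79/69, and 107/66 missed/false) agree with those stated in the text.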
At the case level, the CNN model classified 130 radiographs as having fractures and 71 as fracture-free. The senior radiologist identified 125 radiographs with fractures and 76 without, and the junior radiologist identified 97 with fractures and 104 without. The accuracy and sensitivity were 91.5% and 96.7% for the CNN model, 94.0% and 96.7% for the senior radiologist, and 85.1% and 77.7% for the junior radiologist (Table 4).
Table 4
Detection rate of marked fractures in the independent testing set at the case level
a. CNN model
CNN model | Radiographs with rib fractures | Radiographs without rib fractures | Total |
Detected fractures | 117 | 13 | 130 |
Undetected fractures | 4 | 67 | 71 |
Total | 121 | 80 | 201 |
b. Senior radiologist
Senior radiologist | Radiographs with rib fractures | Radiographs without rib fractures | Total |
Detected fractures | 117 | 8 | 125 |
Undetected fractures | 4 | 72 | 76 |
Total | 121 | 80 | 201 |
c. Junior radiologist
Junior radiologist | Radiographs with rib fractures | Radiographs without rib fractures | Total |
Detected fractures | 94 | 3 | 97 |
Undetected fractures | 27 | 77 | 104 |
Total | 121 | 80 | 201 |
d. Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy in the independent testing set at the case level
Reader | Sensitivity | Specificity | PPV | NPV | Accuracy |
CNN model | 96.7% | 83.8% | 90.0% | 94.4% | 91.5% |
Senior radiologist | 96.7% | 90.0% | 93.6% | 94.7% | 94.0% |
Junior radiologist | 77.7% | 96.3% | 96.9% | 74.0% | 85.1% |
Note: Sensitivity = TP / (TP + FN) ×100%; Specificity = TN / (TN + FP) ×100%; Positive predictive value (PPV) = TP / (TP + FP) ×100%; Negative predictive value (NPV) = TN / (TN + FN) ×100%; Accuracy = (TP + TN) / (TP + FP + FN + TN) ×100%.
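The case-level figures in Table 4d can be recomputed from the 2×2 tables in Table 4a–c. In this Python sketch, the names `case_metrics`, `cells`, and `results` are hypothetical, the cell order (TP, FP, FN, TN) follows the table rows, and half-up rounding to one decimal place is an assumption.

```python
from decimal import Decimal, ROUND_HALF_UP


def pct(num: int, den: int) -> float:
    """Percentage rounded half-up to one decimal place (assumed rounding rule)."""
    value = Decimal(num) * 100 / Decimal(den)
    return float(value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP))


def case_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Case-level metrics from a 2x2 table of reader calls vs. reference standard."""
    return {
        "sensitivity": pct(tp, tp + fn),
        "specificity": pct(tn, tn + fp),
        "ppv": pct(tp, tp + fp),
        "npv": pct(tn, tn + fn),
        "accuracy": pct(tp + tn, tp + fp + fn + tn),  # correct calls / all radiographs
    }


# (TP, FP, FN, TN) read off Table 4a-c.
cells = {
    "CNN model": (117, 13, 4, 67),
    "Senior radiologist": (117, 8, 4, 72),
    "Junior radiologist": (94, 3, 27, 77),
}

results = {}
for reader, (tp, fp, fn, tn) in cells.items():
    # Every reader saw the same 121 fracture and 80 fracture-free radiographs.
    assert (tp + fn, fp + tn) == (121, 80), reader
    results[reader] = case_metrics(tp, fp, fn, tn)

for reader, row in results.items():
    print(reader, row)
```

Half-up rounding is the choice that reproduces every value in Table 4d. The junior radiologist's specificity is exactly 77/80 = 96.25%, which rounds to the reported 96.3% only under half-up rounding; round-half-to-even would give 96.2%.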