Clinical characteristics of thyroid nodules
The clinical characteristics of the thyroid nodules are presented in Table 1. Of the 451 enrolled thyroid nodules, 300 nodules (66.5%) were surgically confirmed as malignant. Compared to the benign nodules, the malignant nodules were more frequently found in male patients (29.3% vs. 15.2%, p = 0.001) and were smaller on average (1.81 ± 1.0 vs. 2.52 ± 1.2 cm, p < 0.001). Patients’ mean age at the time of diagnosis was similar between groups. The cases of thyroid cancer were categorized as conventional papillary thyroid carcinoma (cPTC), follicular variant papillary thyroid carcinoma (fvPTC), follicular thyroid carcinoma (FTC), medullary thyroid carcinoma, poorly differentiated thyroid carcinoma, and anaplastic thyroid carcinoma. cPTC, fvPTC, and FTC accounted for 83.7%, 7.0%, and 7.3% of the malignant nodules, respectively. The tumor size was < 2 cm in 78.9% of cPTCs, while 65.1% of FTCs and fvPTCs combined (FTC/fvPTC) had a size of ≥ 2 cm (p < 0.001, Table S1). Of the benign nodules, 38.4% were follicular adenoma, 31.8% were nodular hyperplasia, 23.8% were NIFTP, and 6.0% were other benign lesions.
Table 1
Baseline Characteristics of Thyroid Nodules
| | Total | Benign | Malignancy | P-value |
|---|---|---|---|---|
| N (%) | 451 | 151 (33.5) | 300 (66.5) | |
| Age of diagnosis, yrs | 50.0 ± 14.3 | 51.7 ± 13.5 | 49.1 ± 14.6 | 0.064 |
| Male sex, n (%) | 112 (24.8) | 23 (15.2) | 89 (29.3) | 0.001 |
| Size, cm | 2.05 ± 1.1 | 2.52 ± 1.2 | 1.81 ± 1.0 | < 0.001 |
| Histologic subtype, n (%) | | | | |
| cPTC | — | — | 251 (83.7) | |
| fvPTC | — | — | 21 (7.0) | |
| FTC | — | — | 22 (7.3) | |
| MTC/PDTC/ATC | — | — | 6 (2.0) | |
| Follicular adenoma | — | 58 (38.4) | — | |
| Nodular hyperplasia | — | 48 (31.8) | — | |
| NIFTP | — | 36 (23.8) | — | |
| Other benign lesions | — | 9 (6.0) | — | |
cPTC, conventional papillary thyroid carcinoma; fvPTC, follicular variant papillary thyroid carcinoma; FTC, follicular thyroid carcinoma; MTC, medullary thyroid carcinoma; PDTC, poorly differentiated thyroid carcinoma; ATC, anaplastic thyroid carcinoma; NIFTP, noninvasive follicular thyroid neoplasm with papillary-like nuclear features. P-values are for the comparison of benign vs. malignant nodules.
Diagnostic performance of thyroid US CAD
The diagnostic performance of the CAD system is presented in Table 2 and Fig. 2. Overall, the AUC was 0.855 (Fig. 2A), and the sensitivity, specificity, PPV, NPV, and accuracy were 85.3%, 63.6%, 82.3%, 68.6%, and 78.0%, respectively (Table 2). In the subgroup analysis, the CAD system showed higher diagnostic performance for thyroid nodules with a size < 2 cm than for larger nodules (≥ 2 cm) in terms of AUC (0.895 vs. 0.751, Fig. 2B and 2C), sensitivity (94.4% vs. 62.4%), PPV (84.9% vs. 73.6%), NPV (70.7% vs. 67.7%), and accuracy (82.9% vs. 70.2%). Since cPTC was significantly smaller than the other cancers (Table S1), we then analyzed the diagnostic performance of the CAD system according to histologic subgroup. Compared to FTC/fvPTC, a higher AUC was found for cPTC (0.925 vs. 0.499, Fig. 2D and 2E). For cPTC, the CAD system also showed higher sensitivity (94.4% vs. 34.9%), PPV (85.3% vs. 26.8%), NPV (84.1% vs. 72.5%), and accuracy (85.0% vs. 56.3%). Interestingly, within the cPTC group, the diagnostic performance of the CAD system was similar regardless of size (AUC, 0.919 for nodules < 2 cm, Fig. 3A; 0.907 for nodules ≥ 2 cm, Fig. 3B).
Table 2
Diagnostic Performance of Computer-Aided Diagnosis (CAD)
| | AUC | Sensitivity (%) | Specificity (%) | PPV (%) | NPV (%) | Accuracy (%) |
|---|---|---|---|---|---|---|
| Total | 0.855 (0.820–0.889) | 85.3 (82.2–88.1) | 63.6 (57.4–69.1) | 82.3 (79.3–85.0) | 68.6 (61.9–74.5) | 78.0 (73.9–81.7) |
| Size group | | | | | | |
| Size < 2 cm | 0.895 (0.857–0.932) | 94.4 (91.6–96.7) | 44.6 (35.4–52.0) | 84.9 (82.4–87.0) | 70.7 (56.1–82.5) | 82.9 (78.6–86.3) |
| Size ≥ 2 cm | 0.751 (0.678–0.825) | 62.4 (54.6–69.0) | 77.9 (70.2–84.5) | 73.6 (64.4–81.5) | 67.7 (61.0–73.4) | 70.2 (62.5–76.8) |
| Histologic subtypes | | | | | | |
| cPTC vs. benign | 0.925 (0.899–0.952) | 94.4 (91.6–96.6) | 64.3 (58.1–69.0) | 85.3 (82.7–87.2) | 84.1 (76.0–90.2) | 85.0 (81.1–87.9) |
| FTC and fvPTC vs. benign | 0.499 (0.399–0.599) | 34.9 (21.0–50.9) | 64.3 (54.9–73.1) | 26.8 (15.8–40.3) | 72.5 (62.8–80.9) | 56.3 (49.7–63.7) |
Values are shown with 95% confidence intervals in parentheses. AUC, area under the curve; PPV, positive predictive value; NPV, negative predictive value; cPTC, conventional papillary thyroid carcinoma; FTC, follicular thyroid carcinoma; fvPTC, follicular variant papillary thyroid carcinoma.
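The measures in Table 2 all follow from a 2×2 table of CAD calls against surgical pathology. The Python sketch below shows that arithmetic for the total cohort and the two size groups. The counts are an assumption: they are back-calculated from the reported percentages and group sizes and are not stated in the text. The names `Confusion`, `metrics`, and `GROUPS` are hypothetical and used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """2x2 table of CAD call vs. surgical pathology."""
    tp: int  # malignant nodules called malignant
    fn: int  # malignant nodules called benign
    tn: int  # benign nodules called benign
    fp: int  # benign nodules called malignant


def metrics(c: Confusion) -> dict:
    """Return the five summary measures as percentages."""
    total = c.tp + c.fn + c.tn + c.fp
    return {
        "sensitivity": 100 * c.tp / (c.tp + c.fn),
        "specificity": 100 * c.tn / (c.tn + c.fp),
        "ppv": 100 * c.tp / (c.tp + c.fp),
        "npv": 100 * c.tn / (c.tn + c.fn),
        "accuracy": 100 * (c.tp + c.tn) / total,
    }


# ASSUMPTION: counts inferred from the reported percentages; the source
# does not list them explicitly.
GROUPS = {
    "total": Confusion(tp=256, fn=44, tn=96, fp=55),
    "size < 2 cm": Confusion(tp=203, fn=12, tn=29, fp=36),
    "size >= 2 cm": Confusion(tp=53, fn=32, tn=67, fp=19),
}

for name, counts in GROUPS.items():
    rounded = {k: round(v, 1) for k, v in metrics(counts).items()}
    print(name, rounded)
```

Under these inferred counts, the rounded values agree with the corresponding rows of Table 2, and the two size groups sum to the 451 nodules of the total cohort.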
Diagnostic performance of physicians before and after CAD assistance
Next, the diagnostic performance was compared between the CAD system and physicians with various levels of experience (the E0, E1, and E5 groups) (Table 3). The E0 group showed significantly lower sensitivity (78.9% vs. 85.3%, p = 0.003), specificity (45.0% vs. 63.6%, p < 0.001), PPV (75.2% vs. 82.3%, p = 0.002), NPV (49.5% vs. 68.6%, p < 0.001), and accuracy (67.5% vs. 78.0%, p < 0.001) than the CAD system. The E1 group showed significantly lower sensitivity (80.3% vs. 85.3%, p = 0.015) and accuracy (74.6% vs. 78.0%, p = 0.050) than the CAD system. The E5 group showed higher sensitivity than the CAD system (89.4% vs. 85.3%, p = 0.017). CAD assistance significantly improved the diagnostic performance of the E0 group: both sensitivity and accuracy increased after CAD assistance (sensitivity, 78.9% [before] vs. 85.3% [after], p = 0.002; accuracy, 67.5% [before] vs. 73.9% [after], p = 0.001, Table 3). Meanwhile, the diagnostic performance of the E1 and E5 groups did not significantly change after CAD assistance (Table 3).
Table 3
Diagnostic Performance of Physicians with Different Levels of Experience Before and After CAD Assistance
| | CAD | E0 Before | E0 After | E0 P a | E0 P b | E1 Before | E1 After | E1 P a | E1 P b | E5 Before | E5 After | E5 P a | E5 P b |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Sensitivity | 85.3% | 78.9% | 85.3% | 0.003 | 0.002 | 80.3% | 81.3% | 0.015 | 0.356 | 89.4% | 90.4% | 0.017 | 0.292 |
| Specificity | 63.6% | 45.0% | 51.2% | < 0.001 | 0.076 | 63.2% | 64.2% | 0.498 | 0.400 | 57.6% | 58.3% | 0.079 | 0.463 |
| PPV | 82.3% | 75.2% | 78.7% | 0.002 | — | 82.2% | 82.8% | 0.515 | — | 80.7% | 81.1% | 0.261 | — |
| NPV | 68.6% | 49.5% | 66.5% | < 0.001 | — | 62.2% | 63.5% | 0.070 | — | 73.1% | 75.2% | 0.133 | — |
| Accuracy | 78.0% | 67.5% | 73.9% | < 0.001 | 0.001 | 74.6% | 75.6% | 0.050 | 0.310 | 77.4% | 79.6% | 0.396 | 0.134 |
ACR-TIRADS 4 was used as the cut-off to calculate the diagnostic performance of physicians. CAD, computer-aided diagnosis; E0/E1/E5, physicians with 0 months/1 year/5 years of experience; Before, physicians before CAD assistance; After, physicians after CAD assistance; PPV, positive predictive value; NPV, negative predictive value.
P a, CAD vs. before; P b, before vs. after.
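The footnote states that ACR-TIRADS 4 served as the cut-off for physician reads. The sketch below illustrates one way such a cut-off turns an ordinal category into a binary call before sensitivity and specificity are computed. It assumes that categories 4 and 5 count as positive calls, which the text does not spell out. The readings are made-up toy data, not study data, and `is_positive` and `sens_spec` are hypothetical helper names.

```python
CUTOFF = 4  # ASSUMPTION: categories >= 4 are treated as a "malignant" call


def is_positive(tirads_level: int, cutoff: int = CUTOFF) -> bool:
    """Dichotomize an ordinal TI-RADS category (1-5) at the cut-off."""
    return tirads_level >= cutoff


def sens_spec(readings):
    """readings: iterable of (tirads_level, is_malignant_on_pathology)."""
    tp = fn = tn = fp = 0
    for level, malignant in readings:
        call = is_positive(level)
        if malignant and call:
            tp += 1
        elif malignant:
            fn += 1
        elif call:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Toy readings for illustration only (not from the study).
toy_readings = [
    (5, True), (4, True), (3, True), (5, True),
    (2, False), (4, False), (1, False), (3, False),
]

sensitivity, specificity = sens_spec(toy_readings)
print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```

Moving the cut-off up or down trades sensitivity against specificity, which is why the cut-off is fixed before comparing reads made with and without CAD assistance.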
A subgroup analysis was performed according to the subtype of thyroid cancer. The AUC of the physicians was higher for cPTC than for FTC/fvPTC (0.737–0.902 vs. 0.437–0.605), and CAD assistance significantly improved the AUC for most physicians in the E0 group and a subset of experienced physicians (E1 and E5 groups) for the diagnosis of cPTC, but not for FTC/fvPTC (Table S2). With CAD assistance, the mean sensitivity and accuracy for diagnosing cPTC significantly improved in the E0 group (sensitivity, 83.4% [before] vs. 91.6% [after], p < 0.001; accuracy, 71.3% [before] vs. 79.4% [after], p < 0.001), but not in the E1 and E5 groups (Table 4). Additionally, in the E0 group, the mean accuracy for diagnosing cPTC significantly increased after CAD assistance regardless of nodule size (nodules < 2 cm, 74.3% [before] vs. 81.5% [after], p = 0.003; nodules ≥ 2 cm, 64.7% [before] vs. 74.6% [after], p = 0.014) (Table S3).
Table 4
Comparisons of Diagnostic Performances Between CAD and Physicians Before and After CAD Assistance According to the Pathologic Subtype
| | CAD | E0 Before | E0 After | E0 P a | E0 P b | E1 Before | E1 After | E1 P a | E1 P b | E5 Before | E5 After | E5 P a | E5 P b |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| cPTC | | | | | | | | | | | | | |
| Sensitivity | 94.4% | 83.4% | 91.6% | < 0.001 | < 0.001 | 87.8% | 88.8% | < 0.001 | 0.309 | 94.6% | 95.5% | 0.564 | 0.243 |
| Specificity | 64.3% | 45.0% | 52.7% | < 0.001 | 0.060 | 65.7% | 67.0% | 0.384 | 0.452 | 60.1% | 61.7% | 0.202 | 0.388 |
| PPV | 85.3% | 78.0% | 81.8% | 0.002 | — | 85.6% | 86.2% | 0.460 | — | 83.7% | 84.5% | 0.272 | — |
| NPV | 84.1% | 53.2% | 77.8% | < 0.001 | — | 71.9% | 73.6% | 0.006 | — | 83.1% | 86.5% | 0.470 | — |
| Accuracy | 85.0% | 71.3% | 79.4% | < 0.001 | < 0.001 | 80.9% | 82.0% | 0.019 | 0.307 | 83.6% | 84.9% | 0.265 | 0.265 |
| FTC and fvPTC | | | | | | | | | | | | | |
| Sensitivity | 34.9% | 53.8% | 50.6% | 0.010 | 0.432 | 38.4% | 39.5% | 0.380 | 0.497 | 61.2% | 62.0% | < 0.001 | 0.540 |
| Specificity | 64.3% | 45.0% | 52.7% | < 0.001 | 0.060 | 65.7% | 67.0% | 0.414 | 0.452 | 56.5% | 57.4% | 0.053 | 0.460 |
| PPV | 26.8% | 25.3% | 27.6% | 0.449 | — | 30.4% | 33.5% | 0.335 | — | 34.2% | 35.5% | 0.152 | — |
| NPV | 72.5% | 73.3% | 76.3% | 0.546 | — | 74.1% | 74.7% | 0.397 | — | 79.2% | 80.4% | 0.066 | — |
| Accuracy | 56.3% | 47.4% | 52.1% | 0.015 | 0.139 | 58.2% | 59.5% | 0.345 | 0.402 | 57.6% | 58.2% | 0.403 | 0.469 |
ACR-TIRADS 4 was used as the cut-off to calculate the diagnostic performance of physicians.
CAD, computer-aided diagnosis; E0/E1/E5, physicians with 0 months/1 year/5 years of experience; Before, physicians before CAD assistance; After, physicians after CAD assistance; PPV, positive predictive value; NPV, negative predictive value; cPTC, conventional papillary thyroid carcinoma; FTC, follicular thyroid carcinoma; fvPTC, follicular variant papillary thyroid carcinoma. P a, CAD vs. before; P b, before vs. after.