Characteristics of patients and nodules. Of 3201 thyroid nodules, 1727 (54.0%) were diagnosed as benign and 1474 (46.0%) as malignant by surgical pathology (Table 1). The benign nodules were larger than the malignant nodules (20.4 mm ± 15.8 vs. 12.6 mm ± 11.7, P < 0.001).
Table 1
Summary of demographics for patients with thyroid nodules
| Characteristics | Histopathology: benign (n = 1727) | Histopathology: malignant (n = 1474) | Total | P |
|---|---|---|---|---|
| Age, y | | | | < 0.001 |
| Mean | 49.6 ± 12.1 | 44.7 ± 11.9 | 47.3 ± 12.2 | |
| Range | 12–82 | 7–80 | 7–82 | |
| Gender, n (%)* | | | | 0.019 |
| Male (510, 23.8%) | 375 (11.7) | 372 (11.6) | 747 (23.3) | |
| Female (1636, 76.2%) | 1352 (42.2) | 1102 (34.4) | 2454 (76.7) | |
| Size, mm | | | | < 0.001 |
| Mean | 20.5 ± 15.8 | 12.6 ± 11.7 | 17.0 ± 14.8 | |
| Range | 2.0–100.0 | 2.0–102.0 | 2.0–102.0 | |
| < 10 | 582 (33.7) | 829 (56.2) | 1411 (44.1) | < 0.001 |
| ≥ 10 | 1145 (66.3) | 645 (43.8) | 1790 (55.9) | |
Note: Data are reported as mean ± standard deviation for continuous variables.
*Denotes number of patients within each category. Numbers in parentheses represent percentage within a given group (benign, malignant).
Gender was not significantly associated with malignancy risk (P > 0.05). Patients younger than 55 years had a significantly higher risk of malignancy than those aged ≥ 55 years (P < 0.05). There were 1411 (44.1%) nodules < 10 mm in diameter and 1790 (55.9%) nodules ≥ 10 mm. The malignancy risk was higher for nodules < 10 mm than for nodules ≥ 10 mm (56.2% vs. 43.8%, P < 0.05).
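As a quick check on the size comparison, the per-group malignancy rate can be recomputed from the nodule counts in Table 1. The sketch below is illustrative only: the function and variable names are assumptions, and it takes "malignancy rate" to mean malignant nodules divided by all nodules in a size group. Under that definition the counts give roughly 58.8% for nodules < 10 mm and 36.0% for nodules ≥ 10 mm, whereas 56.2% and 43.8% are the shares of all malignant nodules that fall in each size group (829/1474 and 645/1474).

```python
# (benign, malignant) nodule counts per size group, copied from Table 1.
counts = {"< 10 mm": (582, 829), ">= 10 mm": (1145, 645)}


def malignancy_rate(benign, malignant):
    """Fraction of nodules in one group that are malignant (assumed definition)."""
    return malignant / (benign + malignant)


# Row-wise rate: malignant nodules over all nodules of that size.
rates = {group: malignancy_rate(b, m) for group, (b, m) in counts.items()}

# Column-wise share: fraction of all malignant nodules in each size group.
total_malignant = sum(m for _, m in counts.values())
shares = {group: m / total_malignant for group, (_, m) in counts.items()}

for group in counts:
    print(f"{group}: rate {rates[group]:.1%}, share of malignant {shares[group]:.1%}")
```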
Malignancy rates according to US categories in ACR TI-RADS and AI TI-RADS. The risks of malignancy for the categories of the two TIRADSs are shown in Table 2. All calculated risks of malignancy for the two TIRADSs were higher than the suggested risks of malignancy, except for TR3 (5.0% [25 of 497]) and TR5 (83.0% [1135 of 1367]) in ACR TIRADS and TR5 (84.0% [1230 of 1464]) in AI TIRADS.
Table 2
Malignancy risks according to categories in ACR TIRADS and AI TIRADS
| System and category | Pathological diagnosis: benign (n = 1727) | Pathological diagnosis: malignant (n = 1474) | Calculated ROM (%) | Suggested ROM (%) |
|---|---|---|---|---|
| ACR TI-RADS | | | | |
| 1 | 66 | 7 | 9.6 | ≤ 2 |
| 2 | 461 | 25 | 5.1 | ≤ 2 |
| 3 | 472 | 25 | 5.0 | ≤ 5 |
| 4 | 496 | 282 | 36.2 | 5–20 |
| 5 | 232 | 1135 | 83.0 | ≥ 20 |
| AI TI-RADS | | | | |
| 1 | 583 | 35 | 5.7 | ≤ 2 |
| 2 | 233 | 17 | 6.8 | ≤ 2 |
| 3 | 304 | 29 | 8.7 | ≤ 5 |
| 4 | 373 | 163 | 30.4 | 5–20 |
| 5 | 234 | 1230 | 84.0 | ≥ 20 |
Note: ROM, risk of malignancy.
The calculated ROM is the percentage of malignant nodules among all nodules in each category.
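The calculated ROM column follows directly from the benign and malignant counts in Table 2. The following is a minimal sketch of that arithmetic, not the authors' code; the names `rom_percent` and `rom_table` are assumptions introduced for illustration.

```python
# (benign, malignant) counts per TI-RADS category, copied from Table 2.
acr_counts = {1: (66, 7), 2: (461, 25), 3: (472, 25), 4: (496, 282), 5: (232, 1135)}
ai_counts = {1: (583, 35), 2: (233, 17), 3: (304, 29), 4: (373, 163), 5: (234, 1230)}


def rom_percent(benign, malignant):
    """Risk of malignancy: malignant nodules as a percentage of all nodules."""
    return 100.0 * malignant / (benign + malignant)


def rom_table(counts):
    """Map each category to its ROM, rounded to one decimal as in Table 2."""
    return {cat: round(rom_percent(b, m), 1) for cat, (b, m) in counts.items()}


acr_rom = rom_table(acr_counts)
ai_rom = rom_table(ai_counts)
print("ACR TI-RADS:", acr_rom)
print("AI TI-RADS:", ai_rom)
```

Summing the counts per system also confirms that each column adds up to the 1727 benign and 1474 malignant nodules in the cohort.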
Comparison of diagnostic performance between original TIRADS and modified TIRADS. In the simulation (Table 3), we reduced the FNA thresholds of ACR TIRADS from ≥ 10 mm to ≥ 5 mm or to zero in TR5, from ≥ 15 mm to ≥ 10 mm in TR4, and from ≥ 25 mm to ≥ 20 mm in TR3. The RABM was 1-to-6.46 (≥ 5 mm) and 1-to-5.93 (zero) in TR5, 1.67-to-1 in TR4, and 26.5-to-1 in TR3. The threshold changes were accepted for TR5 but not for TR4 or TR3, so the new TR5 threshold of zero was used to establish the modified ACR TIRADS. We applied the same reduced thresholds to AI TIRADS, where the RABM was 1-to-6.26 (≥ 5 mm) and 1-to-6.35 (zero) in TR5, 2.4-to-1 in TR4, and 10-to-1 in TR3. Again, the new threshold of zero for TR5 was accepted to establish the modified AI TIRADS.
Table 3
The original and simulated thresholds for FNA in ACR TIRADS and AI TIRADS
| Systems | Original threshold for FNA | Simulated threshold for FNA | Additional benign nodules for FNA | Additional malignant nodules for FNA | RABM | Final simulated threshold for FNA |
|---|---|---|---|---|---|---|
| ACR TIRADS | | | | | | |
| TR1 | - | - | - | - | - | - |
| TR2 | - | - | - | - | - | - |
| TR3 | ≥ 25 mm | ≥ 20 mm | 53 | 2 | 26.5-to-1 | ≥ 25 mm |
| TR4 | ≥ 15 mm | ≥ 10 mm | 77 | 46 | 1.67-to-1 | ≥ 15 mm |
| TR5 | ≥ 10 mm | 0 | 112 | 664 | 1-to-5.93 | 0 |
| | | ≥ 5 mm | 28 | 181 | 1-to-6.46 | |
| AI TIRADS | | | | | | |
| TR1 | - | - | - | - | - | - |
| TR2 | - | - | - | - | - | - |
| TR3 | ≥ 25 mm | ≥ 20 mm | 30 | 3 | 10-to-1 | ≥ 25 mm |
| TR4 | ≥ 15 mm | ≥ 10 mm | 60 | 25 | 2.4-to-1 | ≥ 15 mm |
| TR5 | ≥ 10 mm | 0 | 115 | 730 | 1-to-6.35 | 0 |
| | | ≥ 5 mm | 84 | 526 | 1-to-6.26 | |
Note: RABM, the ratio of additional benign to malignant nodules being biopsied when using the simulated thresholds for FNA
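The RABM values in Table 3 can be reproduced from the counts of additional benign and malignant nodules. The sketch below is illustrative and assumes only the definition given in the note above; the helper name `rabm_label` and the three-significant-figure formatting are assumptions chosen to match how the ratios are printed in the table.

```python
def rabm_label(extra_benign, extra_malignant):
    """Format the ratio of additional benign to malignant nodules biopsied.

    Ratios of at least 1 are written 'x-to-1', smaller ones '1-to-x',
    each with three significant figures (assumed display convention).
    """
    if extra_benign <= 0 or extra_malignant <= 0:
        raise ValueError("both counts must be positive")
    if extra_benign >= extra_malignant:
        return f"{extra_benign / extra_malignant:.3g}-to-1"
    return f"1-to-{extra_malignant / extra_benign:.3g}"


# (system, category, simulated threshold) -> (additional benign, additional malignant),
# copied from Table 3.
simulated = {
    ("ACR", "TR3", ">= 20 mm"): (53, 2),
    ("ACR", "TR4", ">= 10 mm"): (77, 46),
    ("ACR", "TR5", "0"): (112, 664),
    ("ACR", "TR5", ">= 5 mm"): (28, 181),
    ("AI", "TR3", ">= 20 mm"): (30, 3),
    ("AI", "TR4", ">= 10 mm"): (60, 25),
    ("AI", "TR5", "0"): (115, 730),
    ("AI", "TR5", ">= 5 mm"): (84, 526),
}

labels = {key: rabm_label(b, m) for key, (b, m) in simulated.items()}
for key, label in labels.items():
    print(*key, label)
```

A ratio written as 1-to-x means that fewer benign than malignant nodules would be added to the biopsy pool, which is the pattern seen only for TR5 in both systems.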
The diagnostic performances were calculated according to the FNA indications and were compared between the original TIRADS and modified TIRADS (Fig. 1). The modified ACR TIRADS had higher sensitivity, PPV, NPV, and accuracy, and lower unnecessary biopsy and missed malignancy rates, than ACR TIRADS (83.8% vs. 38.7%, 65.5% vs. 51.5%, 81.8% vs. 56.8%, 72.2% vs. 55.0%, 34.5% vs. 48.5%, and 18.2% vs. 43.2%, respectively; all P < 0.05). However, specificity was higher in ACR TIRADS (68.8% vs. 62.4%, P < 0.05). Similar trends were seen for AI TIRADS versus modified AI TIRADS: specificity was higher in AI TIRADS, while sensitivity, PPV, NPV, and accuracy were higher, and the unnecessary biopsy and missed malignancy rates lower, in modified AI TIRADS (P < 0.05 for all). The diagnostic performances according to FNA indications with ACR TIRADS and AI TIRADS are shown in Table 4.
Table 4
Diagnostic performance of ACR TIRADS and AI TIRADS according to FNA indications
| Systems | Sensitivity | Specificity | PPV | NPV | Accuracy | UFR | MMR |
|---|---|---|---|---|---|---|---|
| ACR TIRADS | 38.7% (571/1474) | 68.8% (1189/1727) | 51.5% (571/1109) | 56.8% (1189/2092) | 55.0% (1760/3201) | 48.5% (538/1109) | 43.2% (903/2092) |
| AI TIRADS | 39.2% (578/1474) | 76.4% (1319/1727) | 58.6% (578/986) | 59.5% (1319/2215) | 59.3% (1897/3201) | 41.4% (408/986) | 40.5% (896/2215) |
| P value | 0.349 | < 0.001 | 0.001 | 0.071 | < 0.001 | 0.001 | 0.071 |
Note: PPV, positive predictive value; NPV, negative predictive value; UFR, unnecessary FNA rate; MMR, missed malignancy rate
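Each entry in Table 4 is a ratio of cells in a 2 × 2 table of FNA indication against pathology. The sketch below shows the arithmetic under stated assumptions: the function name `fna_metrics` is hypothetical, and the "modified ACR TIRADS" cells are derived by assuming the modification only adds the TR5 nodules listed for the zero threshold in Table 3 (112 benign, 664 malignant) to the biopsied group.

```python
def fna_metrics(tp, fp, tn, fn):
    """Diagnostic metrics, in percent, from 2 x 2 counts.

    tp: malignant with FNA indicated; fn: malignant without FNA indicated
    tn: benign without FNA indicated; fp: benign with FNA indicated
    """
    total = tp + fp + tn + fn
    raw = {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / total,
        "ufr": fp / (tp + fp),  # unnecessary FNA rate = 1 - PPV
        "mmr": fn / (tn + fn),  # missed malignancy rate = 1 - NPV
    }
    return {name: round(100 * value, 1) for name, value in raw.items()}


# Counts read from the fractions in Table 4.
acr = fna_metrics(tp=571, fp=538, tn=1189, fn=903)
ai = fna_metrics(tp=578, fp=408, tn=1319, fn=896)

# Assumption: modified ACR TIRADS = original plus the additional TR5 nodules
# biopsied at a zero threshold (Table 3: 112 benign, 664 malignant).
modified_acr = fna_metrics(tp=571 + 664, fp=538 + 112, tn=1189 - 112, fn=903 - 664)

for name, metrics in [("ACR", acr), ("AI", ai), ("modified ACR", modified_acr)]:
    print(name, metrics)
```

Under that assumption the derived values agree with the modified ACR TIRADS figures quoted in the text (83.8% sensitivity, 62.4% specificity, 72.2% accuracy).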
Comparison of diagnostic performance between the “<10 mm” and “≥10 mm” groups. We calculated the diagnostic performances in the two groups (< 10 mm and ≥ 10 mm) according to US-based final assessment categories (Table 5).
Table 5
Diagnostic performance of ACR TIRADS and AI TIRADS by nodule size
| Systems | Sensitivity | Specificity | PPV | NPV | Accuracy | AUC |
|---|---|---|---|---|---|---|
| ACR TIRADS | | | | | | |
| < 10 mm | 97.7% (96.6–98.6) | 49.5% (45.5–53.3) | 73.4% (70.7–76.1) | 93.8% (90.9–96.4) | 77.8% (75.8–80.0) | 0.843 (0.823–0.862) |
| ≥ 10 mm | 94.1% (92.1–95.8) | 62.1% (59.2–65.0) | 58.3% (55.2–61.3) | 94.9% (93.3–96.4) | 73.6% (71.3–75.6) | 0.870 (0.854–0.85) |
| P value | < 0.001 | < 0.001 | < 0.001 | 0.446 | 0.006 | < 0.05 |
| AI TIRADS | | | | | | |
| < 10 mm | 97.0% (95.8–98.2) | 56.4% (52.2–60.1) | 76.0% (73.3–78.5) | 92.9% (90.1–95.8) | 80.2% (78.0–82.1) | 0.862 (0.843–0.880) |
| ≥ 10 mm | 91.3% (89.1–93.5) | 69.2% (66.6–71.8) | 62.5% (59.2–65.7) | 93.4% (91.6–95.0) | 77.2% (75.3–79.2) | 0.880 (0.864–0.895) |
| P value | < 0.001 | < 0.001 | < 0.001 | 0.763 | 0.035 | > 0.05 |
Note: Numbers in parentheses are 95% confidence intervals; PPV, positive predictive value; NPV, negative predictive value
The AUC of ACR TIRADS was higher for nodules ≥ 10 mm than for nodules < 10 mm (0.870 vs. 0.843, P < 0.05), whereas the AUCs of AI TIRADS did not differ significantly between nodules < 10 mm and nodules ≥ 10 mm (0.862 vs. 0.880, P > 0.05).