Patient information and nodule characteristics
A total of 548 nodules (306 benign, 242 malignant) from 548 patients were enrolled in this study between June 2016 and June 2019. Three hundred and thirty-four nodules from 334 patients (mean age: 49.39±12.35 years, range: 22-80 years) were in the training cohort and 144 nodules from 144 patients (mean age: 47.84±13.69 years, range: 18-78 years) were in the external validation cohort from center 1, and 70 nodules from 70 patients (mean age: 50.17±12.00 years, range: 22-76 years) were in the internal validation cohort from centers 2 and 3. Of the 334 nodules in the training cohort (benign: 188, malignant: 146), 123 were classified as TR4 (benign: 107, malignant: 16) and 187 as TR5 (benign: 60, malignant: 127). There were no significant differences between the cohorts in patient age and gender or in nodule characteristics (size, location, ratio of benign to malignant results, and elastic parameters) (all p>0.05; Table 1).
CUS findings
Nodules classified as ACR TR4 and TR5 are shown in Table 2. In the training cohort, the features that differed significantly between benign and malignant nodules were echogenicity, shape, margin and echogenic foci (all p<0.001). Composition was the only CUS feature related to malignancy among TR4 nodules (p=0.009). Shape and margin differed significantly between benign and malignant nodules among TR5 nodules (all p<0.001) (Table 3).
SE score and 2D SWE
The elastography results for thyroid nodules are presented in Table 3. The elastic parameters SE and VTIQ max differed significantly between benign and malignant nodules, both overall and within the TR5 classification (all p<0.001). Among TR4 nodules, VTIQ max (p=0.012) showed an advantage over SE and CUS features (Table 3). Across the whole training cohort, 91 nodules had a VTIQ max value greater than the cut-off value (2.855 m/s), comprising 87 benign and 4 malignant nodules. Among TR4 nodules in the training cohort, 32 nodules (benign: 20, malignant: 12) had a VTIQ max value greater than the redefined cut-off value (3.225 m/s). These results indicate that VTIQ max improved the detection rate of malignant thyroid nodules in the TR4 classification (Fig 2, 3).
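The dichotomization described above can be sketched in a few lines. This is a minimal illustration, assuming that a "positive" VTIQ max result means a value strictly greater than the cut-off and that the redefined cut-off applies only to TR4 nodules; the function and constant names are hypothetical, and the example measurement is not taken from the study data.

```python
# Cut-off values quoted in the text, in m/s.
CUTOFF_ALL_NODULES = 2.855
CUTOFF_TR4 = 3.225


def vtiq_max_positive(vtiq_max_m_s: float, is_tr4: bool = False) -> bool:
    """Return True when VTIQ max is greater than the applicable cut-off."""
    cutoff = CUTOFF_TR4 if is_tr4 else CUTOFF_ALL_NODULES
    return vtiq_max_m_s > cutoff


# Hypothetical measurement of 3.0 m/s (illustrative only).
print(vtiq_max_positive(3.0))               # compared against 2.855 m/s
print(vtiq_max_positive(3.0, is_tr4=True))  # compared against 3.225 m/s
```

A value between the two cut-offs is therefore positive under the overall threshold but negative under the TR4 threshold.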
Predictive models
1. Prediction models based on CUS and elastography features
Based on the training cohort, nodule characteristics on CUS and elastic images were entered into binary logistic regression predictive models:
ACR model: Binary logistic regression analysis showed that age was significantly negatively correlated with the risk of thyroid malignancy. Additionally, positive findings for taller-than-wider shape (OR: 8.130, 95% CI: 3.947-17.867, p<0.001), lobulated or irregular boundary (OR: 3.728, 95% CI: 2.095-6.732, p<0.001), extra-thyroid extension (OR: 9.194, 95% CI: 2.393-47.348, p=0.011), microcalcification (OR: 2.871, 95% CI: 1.680-4.969, p<0.001) and VTIQ max (OR: 4.802, 95% CI: 3.102-7.807, p<0.001) were associated with an increased risk of malignancy in the ACR model.
ACR TR4 model: A positive VTIQ max result (OR: 5.248, 95% CI: 2.390-13.974, p=0.001) was the only factor independently associated with an increased risk of malignancy in the ACR TR4 model.
ACR TR5 model: Positive findings for taller-than-wider shape (OR: 4.904, 95% CI: 2.421-10.602, p=0.010) and VTIQ max (OR: 4.412, 95% CI: 2.537-8.391, p<0.001) were significantly positively correlated with the risk of thyroid malignancy (Table 4).
2. Three predictive model formulas were established by combining the independent risk factors for malignancy:
(1) ACR model: Logit(p) = -1.862 - 0.446 * age + 3.420 * taller-than-wider + 2.223 * lobulated or irregular boundary + 3.800 * extra-thyroid extension + 2.518 * microcalcification + 1.545 * VTIQ max
(2) ACR TR4 model: Logit(p) = -3.278 + 1.657 * VTIQ max
(3) ACR TR5 model: Logit(p) = -0.305 - 0.510 * age + 1.399 * taller-than-wider + 1.484 * VTIQ max
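As an illustration of how these formulas translate into a predicted probability of malignancy, the sketch below applies the inverse logit, p = 1 / (1 + exp(-Logit p)). It assumes that each imaging predictor is coded 1 for a positive finding and 0 otherwise; the coding of the age term is not stated here, so the caller must supply it in the scale used when the model was fitted. The dictionary keys and function name are hypothetical.

```python
import math

# Coefficients copied from formulas (1)-(3); the key names are illustrative.
ACR = {
    "intercept": -1.862, "age": -0.446, "taller_than_wider": 3.420,
    "lobulated_or_irregular": 2.223, "extra_thyroid_extension": 3.800,
    "microcalcification": 2.518, "vtiq_max": 1.545,
}
ACR_TR4 = {"intercept": -3.278, "vtiq_max": 1.657}
ACR_TR5 = {
    "intercept": -0.305, "age": -0.510,
    "taller_than_wider": 1.399, "vtiq_max": 1.484,
}


def predicted_probability(coefs: dict, predictors: dict) -> float:
    """Evaluate Logit(p) for the given predictors and convert it to p.

    Assumption: imaging predictors are coded 1 (positive) or 0 (negative);
    any predictor left out of `predictors` contributes nothing to the logit.
    """
    logit = coefs["intercept"] + sum(
        coefs[name] * value for name, value in predictors.items()
    )
    return 1.0 / (1.0 + math.exp(-logit))


# ACR TR4 model: VTIQ max negative versus positive.
p_negative = predicted_probability(ACR_TR4, {"vtiq_max": 0})
p_positive = predicted_probability(ACR_TR4, {"vtiq_max": 1})
print(round(p_negative, 3), round(p_positive, 3))
```

Under this coding, exponentiating a coefficient gives the corresponding odds ratio: exp(1.657) is approximately 5.24, in line with the OR reported for VTIQ max in the ACR TR4 model.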
3. Performance evaluation and goodness-of-fit testing of the prediction models
3.1 Discrimination
The performance of the predictive models in the training and validation cohorts was evaluated with ROC curves (Fig 3), and the AUC values are shown in Table 5. The ACR model yielded an AUC of 0.912 (95% CI 0.880–0.944) with a diagnostic accuracy (ACC) of 85.9%; this was confirmed in the internal and external validation cohorts, which yielded AUCs of 0.877 (95% CI 0.818–0.935) and 0.935 (95% CI 0.884–0.986) and ACCs of 77.7% and 88.2%, respectively. In the ACR TR4 model, a positive VTIQ max result was the only variable associated with increased risk; the model yielded an AUC of 0.809 (95% CI 0.684–0.935) and an ACC of 89.4%, with AUCs of 0.842 (95% CI 0.719–0.962) and 0.705 (95% CI 0.271–1) and ACCs of 82.0% and 80.9% in the internal and external validation cohorts, respectively. In the ACR TR5 model, the AUC was 0.859 (95% CI 0.801–0.918) with an ACC of 82.3%, and this performance was verified in the validation cohorts, with AUCs of 0.830 (95% CI 0.716–0.945) and 0.906 (95% CI 0.816–0.995) and ACCs of 79.2% and 86.2%.
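For readers less familiar with these metrics, the sketch below shows one way AUC and ACC can be computed from predicted probabilities and reference labels. It is not the software used in the study: the AUC is computed through its rank interpretation (the probability that a randomly chosen malignant nodule scores higher than a randomly chosen benign one, with ties counted as one half), the 0.5 classification threshold is an assumption, and the four data points are invented for illustration.

```python
def auc(scores, labels):
    """AUC as the fraction of (malignant, benign) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def accuracy(scores, labels, threshold=0.5):
    """Share of nodules classified correctly at the given threshold."""
    correct = sum(
        (s >= threshold) == (y == 1) for s, y in zip(scores, labels)
    )
    return correct / len(labels)


# Toy data (hypothetical): 1 = malignant, 0 = benign.
toy_scores = [0.10, 0.40, 0.35, 0.80]
toy_labels = [0, 0, 1, 1]
print(auc(toy_scores, toy_labels))       # 0.75
print(accuracy(toy_scores, toy_labels))  # 0.75
```

The pairwise loop is quadratic in the number of nodules, which is acceptable for cohorts of this size.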
3.2 Calibration
The calibration curves of all three models showed good results (all p > 0.05), indicating good agreement between predicted and observed outcomes (Table 4).
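The goodness-of-fit test behind these p values is not named here. One commonly used option for logistic models is the Hosmer-Lemeshow test, and the sketch below computes its statistic as an illustration only, assuming equal-sized risk groups formed by sorting on predicted probability. The function name and the toy data are hypothetical.

```python
def hosmer_lemeshow_statistic(probs, labels, groups=10):
    """Return (statistic, degrees of freedom) for a Hosmer-Lemeshow check.

    Nodules are sorted by predicted probability and split into `groups`
    bins; observed and expected malignancies are compared in each bin.
    """
    pairs = sorted(zip(probs, labels), key=lambda pair: pair[0])
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        mean_p = expected / size
        variance = size * mean_p * (1.0 - mean_p)
        if variance > 0:  # skip bins whose predictions are all 0 or all 1
            statistic += (observed - expected) ** 2 / variance
    return statistic, groups - 2


# Toy data (hypothetical): observed counts match the predictions exactly.
toy_probs = [0.25] * 4 + [0.75] * 4
toy_labels = [0, 0, 0, 1, 1, 1, 1, 0]
print(hosmer_lemeshow_statistic(toy_probs, toy_labels, groups=4))
```

The statistic is referred to a chi-square distribution with the returned degrees of freedom; a p value above 0.05 means the test finds no significant departure between predicted and observed outcomes.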