Computed Tomography Radiomics for the Preoperative Prediction of Cervical Lymph Node Metastasis in Papillary Thyroid Carcinoma: Development and External Validation

Background Cervical lymph node (LN) status is a critical factor in the treatment and prognosis of papillary thyroid carcinoma (PTC). The aim of this study was to investigate the preoperative prediction of cervical LN metastasis in PTC using computed tomography (CT) radiomics. Methods A total of 134 PTC patients who underwent CT examinations at two institutes between January 2018 and January 2020 were enrolled. From these patients, 289 cervical LNs (institute 1: 206 LNs from 88 patients; institute 2: 83 LNs from 46 patients) were selected. All cases were confirmed by surgery and pathology. Each LN was segmented, and 1408 radiomic features were calculated from the noncontrast and contrast-enhanced CT images. Features were selected using the Boruta algorithm followed by an iterative culling-out algorithm. We compared four machine learning classifiers, namely random forest (RF), support vector machine (SVM), neural network (NN), and naïve Bayes (NB), for the classification of LN metastasis. The models were first trained and validated by 10-fold cross-validation using data from institute 1 and then tested using independent data from institute 2. The performance of the models was compared using the area under the receiver operating characteristic curve (AUC).


Background
Thyroid carcinoma is the most common malignant tumor of the endocrine system and one of the fastest-growing tumors in the world [1]. Among thyroid carcinomas, papillary thyroid carcinoma (PTC) accounts for approximately 90% [2,3]. Although the 5- and 10-year survival rates of low-risk PTC patients approach 100%, 30-90% of them have cervical lymph node (LN) metastasis at the time of diagnosis [4,5]. LN metastasis is the most important risk factor for local recurrence, which is more severe than the primary lesion [6]. Resection may have to be performed twice or more because of local recurrence, which affects patients' quality of life. Therefore, accurate preoperative identification of cervical LN metastasis is crucial to the selection of surgical methods and the prediction of local tumor recurrence, especially in high-risk groups with PTC metastasis [7,8].
Ultrasound (US) is a common imaging modality in the preoperative assessment of cervical LN metastasis in PTC, but it is limited by operator dependence and a restricted examination field. In addition, US views are obscured by the trachea, esophageal gas, and the sternum [9,10]. Previous studies have shown that the specificity and sensitivity of US for the diagnosis of PTC cervical LN metastasis are 85.0%-97.4% and 36.7%-61.0%, respectively [11][12][13]. CT overcomes these deficiencies of US and permits more comprehensive and intuitive visualization of lesions and their relationships with surrounding structures.
In addition, the combination of noncontrast and contrast-enhanced CT allows better visualization and evaluation of the internal microcirculation of LNs [14]. Current guidelines recommend CT examination when US assessment is inadequate or when LN metastasis is large or extensive [15,16]. However, the sensitivity of conventional CT diagnosis for predicting PTC cervical LN metastasis is also insufficient, at approximately 55%-62% [17,18]. Moreover, the interpretation of imaging signs is highly subjective, and diagnoses may not be accurate.
Radiomics can be applied to large amounts of image data for thorough extraction and analysis of relevant features from regions of interest (ROI) in images. Machine learning and statistical analyses can be used to extract key information and establish objective analysis models. This can improve clinical diagnosis, which is currently highly subjective and consequently less evidence-based. Such prediction models can assist clinical diagnosis and prognostic analysis and facilitate more accurate diagnoses and more effective treatments [19,20]. The value of a CT radiomic model of the PTC primary lesion for predicting cervical LN metastasis has been reported [21]. However, to the best of our knowledge, there are few reliable reports of CT radiomics predicting cervical LN metastasis in PTC by analyzing the LN itself. Building on our previous studies [22][23][24], this study established a CT radiomic model for predicting cervical LN metastasis from PTC. Internal cross-validation and external validation were conducted for the first time to explore the reliability of CT radiomic models in predicting cervical LN metastasis.

Study subjects
We retrospectively reviewed data of PTC patients at two institutes between January 2018 and January 2020. The inclusion criteria were: (i) adult patients (over 18 years old) who had not received surgery or antitumor treatment; (ii) PTC confirmed by surgical pathology, with LN surgical pathology obtained by cervical LN dissection; (iii) preoperative noncontrast and contrast-enhanced CT scans performed within 2 weeks prior to surgery and biopsy. The exclusion criteria were: (i) incomplete clinical or imaging data; (ii) unclear LN display due to image quality or other problems.
Based on our previous study and other reports [25][26][27][28], the following inclusion criteria were established for metastatic LNs: (i) no other definite cervical LN lesion, such as tuberculosis or lymphoma; (ii) selected from the cervical LN group with two LN metastases confirmed by pathology; (iii) short diameter of LN ≥ 5 mm; (iv) the LN with the highest score and ≥ two points (the classic signs of PTC LN metastasis, namely maximum short diameter, short diameter/long diameter ≥ 1/2, highest enhancement, cystic degeneration/necrosis, and microcalcification, were each assigned one point). Non-metastatic LNs had to meet the following three criteria: (i) no other definite cervical LN lesion, such as tuberculosis or lymphoma; (ii) selected from the cervical LN groups with no LN metastasis confirmed by pathology; (iii) the LN with the largest diameter and a short diameter ≥ 5 mm within the groups mentioned in point (ii).
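The scoring rule for metastatic LNs can be illustrated with a minimal sketch. The `Node` record, its field names, and the tie-breaking behaviour (the first highest-scoring node wins) are assumptions made for illustration only; "maximum short diameter" and "highest enhancement" are read here as comparisons within the same nodal group, which the text does not state explicitly.

```python
from dataclasses import dataclass

@dataclass
class Node:
    """Hypothetical record of one lymph node's CT signs within a nodal group."""
    name: str
    short_diameter_mm: float
    long_diameter_mm: float
    largest_short_diameter: bool  # assumed: largest short diameter in its group
    highest_enhancement: bool     # assumed: strongest enhancement in its group
    cystic_or_necrotic: bool
    microcalcification: bool

def sign_score(node: Node) -> int:
    """One point for each of the five classic signs of PTC LN metastasis."""
    signs = [
        node.largest_short_diameter,
        node.short_diameter_mm / node.long_diameter_mm >= 0.5,
        node.highest_enhancement,
        node.cystic_or_necrotic,
        node.microcalcification,
    ]
    return sum(signs)

def pick_metastatic_candidate(group):
    """Highest-scoring node with short diameter >= 5 mm and score >= 2, else None."""
    eligible = [n for n in group
                if n.short_diameter_mm >= 5 and sign_score(n) >= 2]
    return max(eligible, key=sign_score, default=None)

# Toy nodal group (invented values, not study data).
a = Node("a", 6.0, 14.0, False, False, False, False)
b = Node("b", 8.0, 10.0, True, True, False, True)
c = Node("c", 4.0, 5.0, False, False, True, True)
chosen = pick_metastatic_candidate([a, b, c])
```

In this toy group, node `c` carries several signs but is excluded by the 5 mm short-diameter threshold, so the candidate is chosen among the remaining nodes.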

CT Examinations
Each patient underwent noncontrast and contrast-enhanced CT examinations in the supine position with a 16-slice CT scanner (Institute 1: Lightspeed, GE, United States; Institute 2: Siemens Healthineers, Germany). The scanning range was from the oropharynx to the superior clavicle. The contrast agent (Institute 1: Bayer, Germany; Institute 2: Yangtze River, China) was injected intravenously through an elbow vein with a high-pressure syringe. The specific parameters used at the two institutes are listed in Table 1.

LN Segmentation
The noncontrast and contrast-enhanced CT images of patients were retrieved from the picture archiving Medical School). It facilitated data loading, segmentation, feature calculation, feature selection, and building of the radiomic model. One radiologist (P.W., with 5 years of experience) manually contoured the target LN on the contrast-enhanced CT images, slice by slice, and used these contours as the reference for delineating the LN at the same level on the noncontrast CT scan, so that the two phases remained consistent. Care was taken to avoid the surrounding blood vessels, calcification, peripheral fat, and other non-LN tissues. The corresponding sagittal and coronal planes of the LN were consulted when its boundary was ambiguous in the axial plane. The segmentation results were checked by a senior head and neck radiologist (Z.H., with 18 years of experience in head and neck radiology). Any disagreement between the two radiologists was resolved by discussion and consensus. Both radiologists were blinded to the postoperative pathological assessment of the LNs.

Radiomic Features Extraction And Analysis
Radiomic feature extraction and analysis were carried out on the 3DQI platform. wavelet-transformed textures in one scan. The CT images of each LN comprised two phases, pre-contrast and post-contrast, and texture features were calculated for each phase separately, giving a total of 1408 texture features.
Features useful for classification and image recognition were selected from this large set of texture features for modeling. We adopted a two-round feature selection method to select important features for the classification of benign and malignant LNs. First, the importance scores calculated by the Boruta algorithm were used for a rapid reduction of texture dimensionality [29]. The Boruta algorithm is a feature ranking and selection algorithm based on the random forest algorithm, which identifies all features that are either strongly or weakly related to the decision variable. Nonrelevant features were rejected using a Z-score cutoff of less than 0.01. In the second round, an iterative culling-out algorithm was used to refine the performance of a classifier; an RF model was used [30]. In each iteration, we removed one texture at a time and calculated the classification performance of the resulting model, measured by the area under the receiver operating characteristic (ROC) curve (AUC). If any model with one fewer texture parameter had a higher AUC than the current model, the reduced model with the maximum AUC was selected. The iteration continued until the current model had the highest AUC.
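The second-round culling-out loop can be sketched as a greedy backward elimination. This is an illustrative sketch, not the 3DQI implementation: `cull_features` and `toy_auc` are hypothetical names, and `toy_auc` is an invented stand-in for the cross-validated AUC of an RF model trained on a given feature subset.

```python
from typing import Callable, FrozenSet, Iterable

def cull_features(features: Iterable[str],
                  score: Callable[[FrozenSet[str]], float]) -> FrozenSet[str]:
    """Drop one feature per iteration for as long as the score improves."""
    current = frozenset(features)
    best = score(current)
    while len(current) > 1:
        # Score every candidate model that uses one feature fewer.
        candidates = [(score(current - {f}), f) for f in sorted(current)]
        top_score, dropped = max(candidates)
        if top_score <= best:
            break  # no reduced model beats the current one
        current, best = current - {dropped}, top_score
    return current

def toy_auc(subset: FrozenSet[str]) -> float:
    """Invented score: features 'a' and 'b' help, any other feature hurts."""
    useful = {"a", "b"}
    return 0.5 + 0.2 * len(subset & useful) - 0.05 * len(subset - useful)

selected = cull_features(["a", "b", "c", "d"], toy_auc)
```

With the toy score, the two uninformative features are removed one per iteration and the loop stops once removing either remaining feature would lower the score.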
We compared four machine-learning classification algorithms, namely random forest (RF), support vector machine (SVM), neural network (NN), and naïve Bayes (NB), and determined the optimal algorithm for the radiomic model. The performance of each model was measured by AUC. We used 10-fold cross-validation to validate the models during training. For the 10-fold cross-validation, the entire training dataset was randomly divided into 10 subsets. Nine subsets were used for training the classification model, and the trained model was then tested on the subset that had not been used for training. This procedure was repeated until every subset had been used once for testing. The 10-fold cross-validation was repeated 10 times to optimize and stabilize the performance of the models. In addition, the built models were tested with the external dataset from institute 2. We compared the performance of the models in classifying benign and malignant LNs using AUC, sensitivity, specificity, and accuracy.
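The repeated 10-fold splitting described above can be sketched as follows. The function name `repeated_kfold` and the fixed seed are assumptions for illustration; the text does not state whether folds were formed per LN or per patient, so the sketch simply indexes samples, using the 206 LNs of institute 1 as the sample count.

```python
import random

def repeated_kfold(n_samples, k=10, repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated, reshuffled k-fold CV."""
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n_samples))
        rng.shuffle(order)                       # new random partition per repeat
        folds = [order[i::k] for i in range(k)]  # k subsets of near-equal size
        for i, test_idx in enumerate(folds):
            # Train on the other k - 1 subsets, test on the held-out one.
            train_idx = [j for m, fold in enumerate(folds) if m != i
                         for j in fold]
            yield train_idx, test_idx

# 10 repeats x 10 folds = 100 train/test splits over 206 samples.
splits = list(repeated_kfold(206))
```

Within each repeat, every sample appears in exactly one test fold, so averaging the metric over all splits uses each sample for testing once per repeat.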

Statistical analysis
The statistical methods used in feature selection and model construction were provided by the 3DQI software. SPSS 22.0 was used to analyze general data. The Kolmogorov-Smirnov test was used to assess the normality of age. Normally distributed data are presented as mean ± standard deviation (SD) and were compared with the t-test; non-normally distributed data are presented as median (interquartile range) and were compared with the Mann-Whitney test. Pearson's chi-square test or Fisher's exact test was used to compare gender between the two groups. P < 0.05 was considered statistically significant.

Baseline Characteristics
Of

Performance Of Radiomic Models
Seven important features were selected from the 1408 features of the pre-contrast and contrast-enhanced CT images (Fig. 1): three histogram features, one GLCM feature, and three GLZSM features. The three histogram features were 0_HIST_quant0.75 (pre-contrast CT histogram, 75% quantile of the voxel intensity), 1_HIST_quant0.975 (contrast-enhanced CT histogram, 97.5% quantile of the voxel intensity), and 1_HIST_quant0.75 (contrast-enhanced CT histogram, 75% quantile of the voxel intensity). The GLCM feature was 0_GLCM_clusShade (Cluster Shade of noncontrast CT, which is a measure of the skewness and uniformity of the GLCM). The three GLZSM features were 0_GLZSM_sae (Small Area Emphasis of noncontrast CT: a measure of the distribution of small size zones), 1_GLZSM_gln (Gray Level Non-Uniformity of contrast-enhanced CT: a measure of the variability of gray-level intensity values in the image), and 1_GLZSM_ze (Zone Entropy of contrast-enhanced CT: a measure of the uncertainty/randomness in the distribution of zone size and gray levels). Five texture features (0_HIST_quant0.75, 1_HIST_quant0.975, 1_HIST_quant0.75, 0_GLCM_clusShade, and 1_GLZSM_gln) were higher in LNs of the metastatic group than in those of the non-metastatic group. In contrast, two texture features (1_GLZSM_ze and 0_GLZSM_sze) were higher in LNs of the non-metastatic group than in those of the metastatic group.
Figure 2B shows the ROC curves of the four radiomic models (RF, SVM, NN, and NB) on the data from institute 2. The external validation yielded AUCs of 0.926 (95% CI: 0.86-0.98), 0.936 (95% CI: 0.88-0.98), 0.925 (95% CI: 0.86-0.97), and 0.912 (95% CI: 0.83-0.98) for RF, SVM, NN, and NB, respectively. The sensitivity, specificity, and accuracy of these models are listed in Table 2.
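For readers unfamiliar with the reported metrics, the following minimal sketch shows how AUC, sensitivity, specificity, and accuracy can be computed from model scores. The labels and scores are invented toy values rather than study data, and the 0.5 decision threshold is an assumption; the text does not state which operating point was used.

```python
def auc(labels, scores):
    """AUC as the probability that a metastatic LN outscores a benign one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Count pairwise wins; a tie contributes half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def confusion_metrics(labels, scores, threshold=0.5):
    """Sensitivity, specificity, and accuracy at an assumed score threshold."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, pred) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, pred) if y == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(labels),
    }

# Toy example: 1 = metastatic, 0 = non-metastatic (invented values).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
toy_auc_value = auc(labels, scores)
toy_metrics = confusion_metrics(labels, scores)
```

AUC is independent of the threshold, whereas sensitivity, specificity, and accuracy all change with it, which is why the AUC serves as the primary comparison between classifiers.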

Discussion
In this study, we used the 3DQI software to extract the features of PTC cervical LNs at two institutes and to select the feature parameters closely related to LN metastasis, and we established a CT radiomic model to predict LN metastasis based on these parameters. Internal 10-fold cross-validation showed that the AUC and accuracy were slightly higher for RF and SVM than for NN and NB; SVM had a higher sensitivity, while RF had a slightly higher specificity. External validation showed similar results, indicating the stability of the model. The model performed satisfactorily in the preoperative prediction of PTC cervical LN metastasis.
Currently, the preoperative CT evaluation of PTC cervical LN metastasis relies mainly on signs such as microcalcification, necrotic or cystic degeneration, uniform or non-uniform high enhancement, and minimum/maximum diameter > 0.5 [25,26]. However, the subjective evaluation of these signs by radiologists limits diagnostic performance for LN metastasis. Previous studies reported a sensitivity of approximately 55%-62% and a specificity of 87% [17,18], which are significantly lower than the results of our study. In recent years, radiomics has been applied to the evaluation of PTC cervical LN metastasis, mainly based on US image analysis [5,31,32]. Kim et al. [31] and Liu et al. [32] used US texture analysis to evaluate the predictive value of PTC primary lesions for cervical LN metastasis and reached opposite conclusions. We speculate that this might be related to the high operator dependence of US, in addition to the different inclusion criteria of the two samples.
There are fewer studies on CT-based radiomic assessment of cervical LN metastasis in PTC. Lu et al. [21] analyzed the performance of CT radiomics for predicting cervical LN metastasis from a PTC primary lesion and found that the clinical nomogram yielded an AUC of 0.867 when incorporating the radiomic signature. Lee et al. [33] predicted cervical LN metastasis in thyroid cancer using CT deep learning. The AUC of their best-performing algorithm was 0.953, which was similar to our results. However, we employed machine-learning radiomic methods and were the first to perform external validation of model reliability. In addition, multiple cervical LN metastases are common in PTC patients, and a node-by-node comparison between LN imaging and pathology is hardly achievable with CT. In this study, we proposed for the first time to score the classic signs of PTC LN metastasis and to classify the LN with the highest score as the metastatic LN, thus bringing the comparison between LN imaging and pathology as close as possible to the node level. As for non-metastatic LNs, we selected the largest one from each pathologically confirmed non-metastatic LN group to ensure the reliability and repeatability of our study.
The radiomic features of LN images extracted in the study included histogram features and GLCM and GLZSM texture features, in both pre-contrast and contrast-enhanced CT. The histogram features 0_HIST_quant0.75, 1_HIST_quant0.975, and 1_HIST_quant0.75 reflect intensity information in a given ROI and quantify the heterogeneity within the LN; they can also represent lesion volume. Our results showed that these histogram features were higher in LNs of the metastatic group than in those of the non-metastatic group, indicating increased heterogeneity and volume in metastatic LNs. The GLCM feature 0_GLCM_clusShade represents the uniformity and skewness of the spatial distribution of CT values within LNs, which mainly reflects the influence of spatially dependent pixels and their relationship with their surroundings [34]. A higher cluster shade implies more asymmetric and heterogeneous pixels in metastatic than in non-metastatic LNs. 0_GLZSM_sze, 1_GLZSM_ze, and 1_GLZSM_gln were GLZSM textures, which represent small zone emphasis, zone entropy, and gray-level non-uniformity, respectively. The lower GLZSM_sze in pre-contrast images and lower GLZSM_ze in contrast-enhanced images of metastatic LNs indicate that non-metastatic LNs have more small zones, or finer textures, and more randomness in the contrast-enhanced images, whereas the higher GLZSM_gln indicates that metastatic LNs are more heterogeneous in texture patterns. The Boruta algorithm was used to screen the features. The goal of feature selection in general is to obtain the feature set that minimizes the loss function of the current model, whereas the goal of the Boruta algorithm is to identify all features relevant to the dependent variable, so that the factors influencing the dependent variable can be considered more comprehensively. The model was trained and validated with 10 iterations of 10-fold cross-validation to avoid model overfitting. Subsequently, model testing was carried out with external data to ascertain real-world performance.
Our study had two limitations. First, LNs with a short diameter < 5 mm were not included, to avoid the partial volume effect, since the slice thickness of the CT scans was 3-3.75 mm. Further studies are therefore needed to evaluate model performance on these small LNs. Second, our study was retrospective and involved only two medical institutes. Further prospective studies with larger patient samples from more institutions are required to validate our results.

Conclusions
The CT radiomic model showed high diagnostic performance for predicting PTC cervical LN metastasis and provided support to clinicians for accurate preoperative evaluation and treatment of LN metastasis.