This study was conceived within the Gemelli Generator Molise (G2M) study group and was approved by the Institutional Review Board of the Gemelli Molise Hospital (Campobasso, Italy). Thirty patients were retrospectively enrolled from October 2015 to October 2019. All patients aged >18 and <75 years, diabetic and nondiabetic, with an indication for carotid endarterectomy (TEA) for extracranial high-grade (>70%) internal carotid artery stenosis were included. High-risk cardiac patients who were candidates for carotid revascularization by stent angioplasty, or for combined myocardial and carotid revascularization by aorto-coronary bypass surgery plus carotid TEA, were excluded.
Patients entering the study signed informed consent for treatment and for the use of their clinical data for research or educational purposes. Patient data were anonymized, managed according to the existing legislation for the protection of privacy, and uploaded to the segmentation and analysis platforms.
Imaging data and segmentation
CT angiography was performed with a 128-slice scanner (Brilliance 128, Philips Healthcare, Best, the Netherlands). An 18-gauge intravenous catheter was placed in the antecubital vein, and 55 mL of contrast medium (iomeprol 400 mg/mL; Iomeron®, Bracco, Milan, Italy) was infused at 4 mL/s; the scan delay was triggered when attenuation reached 140 HU in the ascending aorta, and images were acquired with a slice thickness of 0.9 mm. Curved multiplanar and volume-rendering reconstructions were obtained by means of dedicated computer software.
CTA images were transferred via the Digital Imaging and Communications in Medicine (DICOM) protocol to a treatment planning system (Philips Pinnacle, Fitchburg, WI, USA) in our radiation oncology department, which contains a dedicated module for image segmentation and analysis. The plaques were manually segmented on all CT slices by a senior radiologist (A.P.) and a vascular surgeon (P.M.).
Feature extraction was performed using Moddicom (31), an R software package whose classes are implemented with functions and the S3 object paradigm and which has been standardized against the Image Biomarker Standardization Initiative (IBSI) (32). Two hundred thirty features were extracted, including first-order statistical features, shape-based features, gray-level co-occurrence matrix (GLCM), gray-level run length matrix (GLRLM), gray-level size zone matrix (GLSZM), neighboring gray tone difference matrix (NGTDM), and gray-level dependence matrix (GLDM) features (32).
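To give an idea of the simplest family of extracted descriptors, the following is an illustrative base-R sketch of a few first-order features computed from the Hounsfield-unit values of a segmented plaque. It is not the Moddicom API: `plaque_hu` is a hypothetical placeholder for the voxel intensities inside the plaque contour, and the synthetic values only stand in for real data.

```r
# Illustrative only: a few first-order features computed in base R.
# The actual extraction in the study was performed with Moddicom (IBSI-compliant).
set.seed(1)
plaque_hu <- rnorm(500, mean = 60, sd = 25)   # hypothetical voxel intensities (HU)

first_order <- c(
  mean     = mean(plaque_hu),
  sd       = sd(plaque_hu),
  median   = median(plaque_hu),
  energy   = sum(plaque_hu^2),
  skewness = mean((plaque_hu - mean(plaque_hu))^3) / sd(plaque_hu)^3,
  kurtosis = mean((plaque_hu - mean(plaque_hu))^4) / sd(plaque_hu)^4
)
round(first_order, 3)
```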
Feature selection and modeling
Radiomic feature extraction usually leads to a highly redundant feature space. Since many of these features may be highly correlated with one another, a feature selection method is necessary to avoid collinearity and reduce dimensionality. Following the common practice known as the "one in ten rule", i.e., for each feature at least ten subjects should be available per class, our aim was to select no more than two features to build reliable models. To address this problem, we used a multistage feature selection method. First, the pairwise feature interdependencies were evaluated using the Spearman rank correlation coefficient (ρ), which also captures nonlinear monotonic dependencies between features; redundant features, defined as those having |ρ| ≥ 0.8 with another feature, were eliminated. In the second stage, a univariate analysis was performed to assess the association between each feature and the plaque classification and to choose the top-ranked features. Subsequently, we investigated the predictive value of these features in a binary logistic regression. To determine the relative importance of the features, the remaining features were included in a stepwise backward elimination approach. At each step of this method, a feature is considered for removal from the set of explanatory variables based on a specified criterion, the Akaike information criterion (AIC); in particular, the feature whose deletion causes the smallest, statistically insignificant deterioration of the model is removed. The AIC is derived from information theory and is a model selection criterion that penalizes models in which adding new explanatory variables does not supply sufficient information; the aim is to minimize the AIC. It is defined as:
$$\mathrm{AIC} = -2\,\log L(M) + \frac{2KN}{N-K-1}$$

where log L(M) is the maximized log-likelihood of the fitted model, N is the sample size and K is the number of covariates, including the intercept.
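A minimal sketch of this multistage selection is given below. All object names are hypothetical: `radiomics` stands for the per-patient feature table with a binary outcome column `plaque_class`, the synthetic data only stand in for the real extracted features, the "top five" univariate cutoff is an illustrative choice, and MASS::stepAIC implements backward elimination driven by the (standard) AIC.

```r
# Sketch of the multistage feature selection, under the stated assumptions.
library(MASS)

set.seed(1)
n <- 30
radiomics <- data.frame(plaque_class = rbinom(n, 1, 0.5), replicate(10, rnorm(n)))
names(radiomics)[-1] <- paste0("feat", 1:10)          # hypothetical feature names

features <- radiomics[, setdiff(names(radiomics), "plaque_class")]

# Stage 1: remove redundant features (pairwise Spearman |rho| >= 0.8)
rho  <- cor(features, method = "spearman")
keep <- colnames(features)
for (f in colnames(features)) {
  if (f %in% keep) {
    redundant <- setdiff(names(which(abs(rho[f, ]) >= 0.8)), f)
    keep <- setdiff(keep, redundant)
  }
}

# Stage 2: univariate logistic regressions; rank features by p-value
pvals <- sapply(keep, function(f) {
  fit <- glm(reformulate(f, "plaque_class"), data = radiomics, family = binomial)
  coef(summary(fit))[2, "Pr(>|z|)"]
})
top <- names(sort(pvals))[seq_len(min(5, length(pvals)))]   # illustrative cutoff

# Stage 3: stepwise backward elimination driven by the AIC
full  <- glm(reformulate(top, "plaque_class"), data = radiomics, family = binomial)
final <- stepAIC(full, direction = "backward", trace = FALSE)
summary(final)
```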
The selected features were then used to build the models for binary classification of carotid plaques, including logistic regression (LR), support vector machine (SVM), and classification and regression tree (CART) analysis.
LR is a classical machine learning algorithm that is commonly used for binary classification tasks. The model provides the probability p(y = 1|x), i.e., the probability of a positive result y = 1 given the data x. LR has the advantage of fast training, and both discrete and continuous variables can be used as inputs (33).
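A minimal LR sketch with base R's glm() is shown below; it reuses the hypothetical `radiomics` data frame from the feature selection sketch above, with `feat1` and `feat2` standing for the two finally selected features.

```r
# Logistic regression sketch (hypothetical data frame and feature names).
lr_model <- glm(plaque_class ~ feat1 + feat2, data = radiomics, family = binomial)

# p(y = 1 | x): predicted probability of the positive class for each patient
p_hat <- predict(lr_model, type = "response")
head(round(p_hat, 3))
```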
SVM-based classification models are widely used supervised classification algorithms. For a given set of feature data from two groups of patients, SVM attempts to determine the maximum-margin hyperplane between the two classes, i.e., the hyperplane that maximizes the distance to the nearest data points on each side (the so-called support vectors). SVM is a powerful method able to obtain good classification results using only a few data points (34). In particular, if the two samples are not linearly separable, kernel functions can be used to map them to a higher-dimensional space in which they become more separable (35). In this study, we used four kernel types, including linear, power, sigmoid and radial basis function (RBF) kernels. To maximize the margin between the hyperplane and the closest samples of both classes, the internal parameter of the training process, the cost C, was set to 1 after a tuning process to obtain the optimal classification results.
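One possible implementation is sketched below with the e1071 package (kernel names follow e1071, which offers linear, polynomial, sigmoid and radial kernels); the data frame and feature names are the hypothetical ones used above, and cost = 1 mirrors the tuned value reported in the text.

```r
# SVM sketch with e1071 (hypothetical data frame and feature names).
library(e1071)

svm_rbf <- svm(factor(plaque_class) ~ feat1 + feat2, data = radiomics,
               kernel = "radial", cost = 1, probability = TRUE)

pred_svm <- predict(svm_rbf, radiomics)
table(predicted = pred_svm, observed = radiomics$plaque_class)
```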
Finally, the selected features were used as input for a classification and regression tree (CART) analysis (36) to visually stratify patients into plaque risk groups. The CART model is represented as a binary tree: each internal node represents a single input feature and a split point on that feature, while the leaf nodes of the tree contain the output variable used to make a prediction. The best splits were identified by the Gini impurity (GI) index:
$$GI = 1 - \sum_{i} p_i^{2}$$

where p_i is the fraction of items in class i.
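A corresponding sketch with the rpart package, which uses the Gini index as its default splitting criterion for classification trees, is given below; the data frame and feature names are again the hypothetical ones used above.

```r
# CART sketch with rpart (hypothetical data frame and feature names).
library(rpart)

cart_model <- rpart(factor(plaque_class) ~ feat1 + feat2, data = radiomics,
                    method = "class", parms = list(split = "gini"))
print(cart_model)
# plot(cart_model); text(cart_model)   # visualize the tree when splits are found
```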
The models were cross-validated using 5-fold cross-validation (37). This is a resampling method that randomly partitions the feature dataset into five equally sized subsets of samples (folds), maintaining a balanced proportion of both classes in each fold. In this way, five models were trained and tested: each of the five folds was used once as the test set, while the four remaining folds were used to train the model. The whole process was repeated ten times to reduce the variance of the cross-validation results and the chance of obtaining overoptimistic results from a single run.
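The sketch below illustrates the repeated, stratified 5-fold scheme for the logistic model, under the same hypothetical data assumptions; caret::createFolds builds folds stratified on the outcome and pROC computes the per-fold AUC.

```r
# Repeated stratified 5-fold cross-validation sketch (hypothetical data).
library(caret)
library(pROC)

set.seed(42)
aucs <- c()
for (repetition in 1:10) {                                  # 10 repetitions
  folds <- createFolds(factor(radiomics$plaque_class), k = 5)
  for (test_idx in folds) {                                 # each fold used once as test set
    train_df <- radiomics[-test_idx, ]
    test_df  <- radiomics[test_idx, ]
    fit  <- glm(plaque_class ~ feat1 + feat2, data = train_df, family = binomial)
    prob <- predict(fit, newdata = test_df, type = "response")
    aucs <- c(aucs, as.numeric(auc(test_df$plaque_class, prob)))
  }
}
mean(aucs)   # cross-validated AUC averaged over the 10 x 5 test folds
```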
The performance of the models (i.e., the ability of the radiomic features to discriminate and classify plaques) was assessed using receiver operating characteristic (ROC) curves and the area under the curve (AUC). In addition, class-specific accuracy, precision, recall and F-measure were used as evaluation metrics of classifier output quality. The accuracy is defined as the proportion of correct predictions (both true positives and true negatives) among the total number of cases examined. The precision is defined as the number of true positive results divided by the number of all positive results, including those not identified correctly (it is also known as the positive predictive value); it can be thought of as a measure of classifier exactness, and a low precision indicates a large number of false positives. The recall is defined as the number of true positive results divided by the number of all samples that should have been identified as positive (in binary classification it is also known as sensitivity or true positive rate); it can be thought of as a measure of classifier completeness, and a low recall indicates a large number of false negatives. The F-measure is defined as the harmonic mean of the precision and the recall. Similar to the AUC, these measures range from 0 to 1, with higher values indicating better classification performance.
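For completeness, the sketch below shows how these metrics follow from the confusion matrix; `obs`, `prob` and `pred` are hypothetical and here taken from the logistic model of the previous sketches, with an illustrative 0.5 probability cutoff.

```r
# Evaluation metrics sketch (hypothetical labels and predictions).
library(pROC)

obs  <- radiomics$plaque_class
prob <- predict(lr_model, type = "response")
pred <- as.integer(prob >= 0.5)

tp <- sum(pred == 1 & obs == 1); tn <- sum(pred == 0 & obs == 0)
fp <- sum(pred == 1 & obs == 0); fn <- sum(pred == 0 & obs == 1)

accuracy  <- (tp + tn) / length(obs)
precision <- tp / (tp + fp)                 # positive predictive value
recall    <- tp / (tp + fn)                 # sensitivity / true positive rate
f_measure <- 2 * precision * recall / (precision + recall)
auc_value <- as.numeric(auc(obs, prob))     # area under the ROC curve

round(c(accuracy = accuracy, precision = precision,
        recall = recall, F = f_measure, AUC = auc_value), 3)
```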