Patients
Data for all cases were collected from two medical centers (center 1 and center 2) between May 2018 and May 2022. A total of 568 patients who underwent unilateral adrenalectomy and had a pathological diagnosis of adrenal hyperplasia or adenoma were initially included. All patients underwent contrast-enhanced CT before surgery. The CT images were then reviewed to identify iMAD or LPA using the following inclusion criteria: (1) a single nodular lesion and (2) lesion attenuation > 10 HU on unenhanced CT images. The exclusion criteria were: (1) lesion attenuation ≤ 10 HU on unenhanced CT images, (2) bilateral or diffuse lesions, (3) necrosis or hemorrhage within the lesion on pathological examination, and (4) incomplete preoperative clinical or imaging data. The patient recruitment pathway is presented in Figure 1.
Finally, 148 patients were enrolled: the 108 patients from center 1 were randomly assigned to a training cohort and a testing cohort at a ratio of 2:1 using computer-generated random numbers, and the 40 patients from center 2 formed the external validation cohort.
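A seeded 2:1 random allocation can be sketched in a few lines of Python. The patient identifiers and the seed below are hypothetical; the study does not report its random-number procedure. Under this rounding rule, 108 patients split into 72 training and 36 testing patients.

```python
import random


def split_train_test(patient_ids, ratio=(2, 1), seed=2022):
    """Randomly split patient IDs into training and testing cohorts by `ratio`.

    The seed is a hypothetical value chosen only to make the split reproducible.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)          # computer-generated random numbers
    rng.shuffle(ids)
    n_train = round(len(ids) * ratio[0] / sum(ratio))
    return ids[:n_train], ids[n_train:]


# Hypothetical identifiers for the 108 center-1 patients.
center1 = [f"C1-{i:03d}" for i in range(1, 109)]
train, test = split_train_test(center1)
print(len(train), len(test))  # 72 36
```

A simple random split does not guarantee equal iMAD/LPA proportions in both cohorts; shuffling each class separately (a stratified split) is a common alternative, but the text does not indicate whether stratification was used.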
The institutional review board of our hospital approved this retrospective study and waived the requirement for informed consent.
CT image acquisition
Patients from center 1 underwent contrast-enhanced CT on multidetector-row CT systems (Aquilion ONE, Toshiba; LightSpeed 64, GE Healthcare; Somatom Force, Siemens Healthcare; Ingenuity CT, Philips). The scanning parameters were as follows: tube voltage, 120 kV; tube current, 250-400 mA with automatic tube current modulation; rotation time, 0.5 or 0.6 s; detector collimation, 192 × 0.6 mm or 64 × 0.625 mm; matrix, 512 × 512; and pitch, 0.6 or 1.0. Axial images were reconstructed at a slice thickness of 1 mm. A volume of 80-90 mL of iodinated contrast medium (Omnipaque 350, GE Healthcare, Shanghai, China) was injected into an antecubital vein with a power injector at a rate of 3.0 mL/s. An unenhanced scan was acquired first, followed by arterial-phase (25-30 s) and venous-phase (60-70 s) scans after contrast injection.
Patients from center 2 were examined on multidetector-row CT systems (Discovery 750, GE Healthcare; Somatom Force, Siemens Healthcare). The scanning parameters were as follows: tube voltage, 100 or 120 kV; tube current, 250-400 mA with automatic tube current modulation; rotation time, 0.5 or 0.6 s; detector collimation, 192 × 0.6 mm or 64 × 0.625 mm; matrix, 512 × 512; and pitch, 1.0. Axial images were reconstructed at a slice thickness of 1 mm. A volume of 80-90 mL of iodinated contrast medium (iopromide, Ultravist 300, Bayer, Germany) was injected into an antecubital vein with a power injector at a rate of 2.5 mL/s. An unenhanced scan was acquired first, followed by arterial-phase (25-30 s) and venous-phase (55-60 s) scans after contrast injection.
Conventional CT feature evaluation
All image analyses were performed by a radiology resident (Reader 1, HY) and a radiologist (Reader 2, CJ) with 5 and 10 years of experience in abdominal imaging, respectively, both of whom were blinded to the radiological reports and pathological findings. Reader 1 assessed the following CT features by consensus on axial CT images: the maximum diameter (MD) and the CT attenuation in the unenhanced phase (CTpre) and the venous phase (CTV). Each quantitative value was measured three times, and the mean was used.
Clinical factor selection and construction of the clinical factor model
Clinical data included laboratory test results (cortisol, aldosterone, and renin), age, sex, and history of hypertension, all of which were obtained from medical records. Univariate logistic regression analysis was used to compare the clinical factors (including clinical data and CT features) between the two groups in the training cohort. The variables that were significant in the univariate analysis were then entered into a multivariable logistic regression analysis to build the clinical model. Odds ratios (ORs) with 95% confidence intervals (CIs) were calculated for each independent factor as relative risk estimates.
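The two-step procedure (univariate screening, then a multivariable model on the retained factors, reporting ORs with 95% CIs) can be sketched as follows. This is an illustrative NumPy implementation using Newton-Raphson maximum likelihood and Wald statistics, not the software the authors used; the p < 0.05 screening threshold, the 0/1 outcome coding, and the synthetic variables are assumptions.

```python
import math

import numpy as np


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson.

    X: (n, p) predictors without an intercept column; y: (n,) outcomes coded 0/1.
    Returns the coefficients (intercept first) and their standard errors.
    """
    Xd = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        hessian = Xd.T @ (Xd * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hessian, Xd.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-Xd @ beta))
    cov = np.linalg.inv(Xd.T @ (Xd * (p * (1.0 - p))[:, None]))
    return beta, np.sqrt(np.diag(cov))


def wald_p(b, se):
    """Two-sided p-value of the Wald statistic z = b / se."""
    return math.erfc(abs(b / se) / math.sqrt(2.0))


def clinical_model(X, y, names, alpha=0.05):
    """Univariate screening, then a multivariable model on the retained factors.

    Returns {name: (OR, lower 95% CI, upper 95% CI)}.
    """
    keep = []
    for j in range(X.shape[1]):
        b, se = fit_logistic(X[:, [j]], y)
        if wald_p(b[1], se[1]) < alpha:
            keep.append(j)
    b, se = fit_logistic(X[:, keep], y)
    result = {}
    for k, j in enumerate(keep, start=1):  # index 0 is the intercept
        result[names[j]] = (math.exp(b[k]),
                            math.exp(b[k] - 1.96 * se[k]),
                            math.exp(b[k] + 1.96 * se[k]))
    return result


# Hypothetical training data: age is unrelated to the outcome, CTpre is related.
rng = np.random.default_rng(0)
n = 200
age = rng.normal(50, 10, n)
ctpre = rng.normal(30, 10, n)
y = (rng.random(n) < 1 / (1 + np.exp(-0.2 * (ctpre - 30)))).astype(float)
model = clinical_model(np.column_stack([age, ctpre]), y, ["age", "CTpre"])
print(model)
```

Each OR here is per one-unit change of the predictor (per HU for CTpre), so rescaling a variable changes its OR but not its p-value.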
Three-dimensional segmentation and radiomics feature extraction
To reduce the variability of radiomics features, images were resampled to 1 × 1 × 1 mm³ voxels using B-spline interpolation before feature extraction. A soft-tissue window (level, 40 HU; width, 300 HU) was applied. Three-dimensional lesion regions of interest (ROIs) were manually delineated on the images of all three phases using ITK-SNAP (http://www.itksnap.org/pmwiki/pmwiki.php). When contours were delineated on the unenhanced images, the corresponding arterial- and venous-phase images were used as references. An example of manual segmentation is shown in Figure 2.
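The resampling and windowing steps can be illustrated with a short sketch. It assumes volumes stored as NumPy arrays in (z, y, x) order with known voxel spacing in mm, and uses `scipy.ndimage.zoom` with spline order 3 (cubic B-spline) as a stand-in for the toolkit's resampler, so values will differ slightly from ITK-based resampling. Masks are resampled with nearest-neighbour interpolation so labels stay binary. The text does not state whether the window was used only for display during delineation or also to clip intensities before extraction; the clipping function here is an assumption.

```python
import numpy as np
from scipy import ndimage


def resample_image(volume, spacing, new_spacing=(1.0, 1.0, 1.0)):
    """Resample a CT volume (z, y, x) to new_spacing (mm) with cubic B-spline interpolation."""
    zoom = [s / t for s, t in zip(spacing, new_spacing)]
    return ndimage.zoom(volume.astype(np.float32), zoom, order=3)


def resample_mask(mask, spacing, new_spacing=(1.0, 1.0, 1.0)):
    """Resample a binary ROI with nearest-neighbour interpolation so labels stay 0/1."""
    zoom = [s / t for s, t in zip(spacing, new_spacing)]
    return ndimage.zoom(mask.astype(np.uint8), zoom, order=0)


def apply_window(hu, level=40.0, width=300.0):
    """Clip Hounsfield units to [level - width/2, level + width/2], i.e. [-110, 190] HU."""
    return np.clip(hu, level - width / 2.0, level + width / 2.0)


# Hypothetical volume: 40 slices of 64 x 64 pixels, 1.0 mm slices, 0.7 mm in-plane pixels.
rng = np.random.default_rng(0)
ct = rng.normal(40.0, 100.0, size=(40, 64, 64))
roi = np.zeros((40, 64, 64), dtype=np.uint8)
roi[10:20, 20:40, 20:40] = 1
spacing = (1.0, 0.7, 0.7)
ct_iso = apply_window(resample_image(ct, spacing))
roi_iso = resample_mask(roi, spacing)
print(ct_iso.shape, roi_iso.shape)  # (40, 45, 45) (40, 45, 45)
```

Clipping after resampling also removes the small overshoots that cubic interpolation can introduce near sharp intensity edges.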
Radiomics features were extracted using the Deepwise Research Portal (Deepwise Bolian Co., Ltd.), which integrates PyRadiomics. A total of 1288 radiomics features were extracted from each phase. To evaluate feature stability, 20 patients (10 iMAD and 10 LPA) were randomly selected, and both radiologists repeated the same delineation procedure two weeks later. Intraclass correlation coefficients (ICCs) were used to assess the inter-observer reliability and intra-observer reproducibility of the extracted features. By convention, an ICC < 0.5 indicates poor reliability, 0.5-0.75 moderate reliability, and > 0.75 good or excellent reliability [24]. Features with an ICC < 0.75 were therefore excluded.
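For a single feature, the ICC can be computed from a two-way ANOVA decomposition of a patients × segmentations matrix. The sketch below assumes the ICC(2,1) form (two-way random effects, absolute agreement, single measurement); the text does not specify which ICC form was used. The helper `stable_features` is hypothetical and applies the 0.75 cut-off to one pair of segmentations.

```python
import numpy as np


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    ratings: (n_subjects, k_raters) array of one feature's values.
    """
    Y = np.asarray(ratings, dtype=float)
    n, k = Y.shape
    grand = Y.mean()
    ss_rows = k * np.sum((Y.mean(axis=1) - grand) ** 2)   # between-patient
    ss_cols = n * np.sum((Y.mean(axis=0) - grand) ** 2)   # between-segmentation
    ss_err = np.sum((Y - grand) ** 2) - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


def stable_features(seg_a, seg_b, threshold=0.75):
    """Indices of features whose ICC between two segmentations is >= threshold.

    seg_a, seg_b: (n_patients, n_features) feature matrices from two delineations.
    """
    keep = []
    for j in range(seg_a.shape[1]):
        if icc_2_1(np.column_stack([seg_a[:, j], seg_b[:, j]])) >= threshold:
            keep.append(j)
    return keep


# Hypothetical features for 20 patients: feature 0 is reproducible, feature 1 is not.
rng = np.random.default_rng(0)
r1 = rng.normal(size=(20, 2))
r2 = r1.copy()
r2[:, 0] += rng.normal(0.0, 0.1, 20)
r2[:, 1] = rng.normal(size=20)
print(stable_features(r1, r2))
```

Because any feature with ICC < 0.75 was excluded, the index lists from the inter-observer comparison (Reader 1 vs. Reader 2) and the intra-observer comparison (first vs. repeated delineation) would presumably be intersected.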
Construction and comparison of the radiomics signatures
Before further analysis, all features were standardized using z-score normalization. Dimension reduction was performed before signature construction to reduce overfitting. The stable radiomics features were first assessed with one-way analysis of variance (ANOVA), and the features that differed significantly between the two groups (p < 0.05) were entered into a least absolute shrinkage and selection operator (LASSO) regression model to select the most valuable features in the training cohort. The selected features were then used to build the radiomics signatures. Two radiomics models were established: a triphasic radiomics model based on unenhanced and contrast-enhanced (arterial- and venous-phase) CT features, and an unenhanced radiomics model based on unenhanced CT features only. A radiomics score (Rad-score) was calculated for each patient as a linear combination of the selected features weighted by their respective LASSO coefficients.
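The feature-selection pipeline (z-score, ANOVA filter, LASSO, Rad-score) can be sketched in Python. This is an illustrative re-implementation, not the authors' code: it uses `scipy.stats.f_oneway` for the ANOVA filter and a plain coordinate-descent LASSO with a least-squares loss and a fixed penalty `lam`. The text does not state the LASSO loss or how the penalty was tuned (cross-validation is common), and including the LASSO intercept in the Rad-score is an assumption.

```python
import numpy as np
from scipy.stats import f_oneway


def anova_filter(Z, y, alpha=0.05):
    """Indices of features whose one-way ANOVA p-value between the groups is < alpha."""
    return [j for j in range(Z.shape[1])
            if f_oneway(Z[y == 0, j], Z[y == 1, j]).pvalue < alpha]


def lasso_cd(X, y, lam, n_iter=1000, tol=1e-8):
    """LASSO by cyclic coordinate descent.

    Minimises (1/2n)||y - b0 - X b||^2 + lam * ||b||_1 for column-centred X,
    so the intercept b0 is simply mean(y).
    """
    n, p = X.shape
    b0 = y.mean()
    b = np.zeros(p)
    r = y - b0                          # current residual
    col_sq = (X ** 2).sum(axis=0) / n   # equals 1 for z-scored columns
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-threshold
            delta = new - b[j]
            if delta != 0.0:
                r -= X[:, j] * delta
                b[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b0, b


def build_rad_score(X_train, y_train, lam=0.05):
    """Z-score -> ANOVA filter -> LASSO; returns the weights and a Rad-score function."""
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0)
    sd[sd == 0] = 1.0                   # guard against constant features
    Z = (X_train - mu) / sd
    cand = anova_filter(Z, y_train)
    b0, b = lasso_cd(Z[:, cand], y_train.astype(float), lam)
    weights = {c: w for c, w in zip(cand, b) if w != 0.0}
    cols = list(weights)
    coefs = np.array([weights[c] for c in cols])

    def rad_score(X):
        Zx = (X - mu) / sd              # training mean/SD reused for every cohort
        return b0 + Zx[:, cols] @ coefs

    return weights, rad_score


# Hypothetical training cohort: 72 patients, 20 features, features 0 and 1 informative.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 36)
X = rng.normal(size=(72, 20))
X[:, 0] += 2 * y
X[:, 1] += 2 * y
weights, rad_score = build_rad_score(X, y)
print(sorted(weights))
```

Reusing the training-set mean and SD when scoring the testing and external cohorts keeps information from those cohorts out of the model.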
Development of a radiomics nomogram and assessment of the performance of different models
DeLong's test was first used to compare the AUCs of the two radiomics models before the radiomics nomogram was constructed. If the two radiomics signatures differed significantly, the triphasic radiomics model was used to build the nomogram; otherwise, the unenhanced radiomics model was used. A radiomics nomogram was then established by combining the chosen radiomics model with the selected clinical factors, and a radiomics nomogram score (Nomo-score) was calculated for each patient. The calibration of the nomogram was evaluated with a calibration curve, and its goodness of fit was assessed with the Hosmer–Lemeshow test. The diagnostic performance of the clinical model, the chosen radiomics signature, and the radiomics nomogram was assessed using the area under the receiver operating characteristic curve (AUC) in the training, testing, and external validation cohorts. Decision curve analysis (DCA) was performed to evaluate the clinical usefulness of the three models by calculating the net benefit across a range of threshold probabilities in the external validation cohort [21].
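The Hosmer–Lemeshow statistic and the DCA net benefit follow standard formulas; at threshold probability pt, net benefit = TP/n - (FP/n) × pt/(1 - pt). A minimal sketch, assuming predicted probabilities from an already fitted model, ten groups formed by sorting the predictions (ties are not handled specially), and SciPy for the chi-squared tail probability:

```python
import numpy as np
from scipy.stats import chi2


def hosmer_lemeshow(y, p, groups=10):
    """Hosmer-Lemeshow statistic and p-value (df = groups - 2)."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    order = np.argsort(p)
    stat = 0.0
    for idx in np.array_split(order, groups):
        obs1, exp1 = y[idx].sum(), p[idx].sum()        # observed / expected positives
        obs0, exp0 = len(idx) - obs1, len(idx) - exp1  # observed / expected negatives
        stat += (obs1 - exp1) ** 2 / exp1 + (obs0 - exp0) ** 2 / exp0
    return stat, chi2.sf(stat, groups - 2)


def net_benefit(y, p, thresholds):
    """Net benefit of classifying patients with predicted probability >= pt as positive."""
    y = np.asarray(y)
    p = np.asarray(p, dtype=float)
    n = len(y)
    out = []
    for pt in thresholds:
        pred = p >= pt
        tp = np.sum(pred & (y == 1))
        fp = np.sum(pred & (y == 0))
        out.append(tp / n - fp / n * pt / (1 - pt))
    return np.array(out)


def net_benefit_treat_all(y, thresholds):
    """Reference curve in which every patient is classified as positive."""
    prev = np.mean(y)
    return np.array([prev - (1 - prev) * pt / (1 - pt) for pt in thresholds])


# Hypothetical, well-calibrated predictions for 400 patients.
rng = np.random.default_rng(0)
p_hat = rng.uniform(0.05, 0.95, 400)
y_obs = (rng.random(400) < p_hat).astype(int)
print(hosmer_lemeshow(y_obs, p_hat))
```

A decision curve plots `net_benefit` for each model against the threshold, together with the treat-all curve and the treat-none line (net benefit 0).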
Statistical analysis
Statistical tests were performed using MedCalc statistical software (version 20.110, https://www.medcalc.org), R statistical software (version 3.3.3, https://www.r-project.org), and MATLAB (version R2021a). Continuous variables were compared using Student's t-test or a non-parametric test, and categorical variables using the chi-squared test or Fisher's exact test, as appropriate. Categorical variables are presented as frequencies (percentages), and continuous variables as mean ± standard deviation or median (interquartile range), as appropriate. AUCs were compared using DeLong's test. A p-value < 0.05 was considered statistically significant.