Patients
A total of 190 SPN patients who underwent 18F-FDG-PET/CT examination at our hospital were retrospectively enrolled from January 2013 to December 2019. A flow chart of patient enrollment and study design is shown in Fig. 2. All cases in the training group were used to train the prediction model, while the cases in the independent testing group were used to evaluate the performance of the model.
Imaging
A GE Discovery Elite PET/CT scanner was used. 18F-FDG was produced by a GE Mini Tracer cyclotron and synthesized by an automatic synthesis module, with a radiochemical purity > 99%. Before the examination, patients fasted for more than 6 h, and the blood glucose was < 15 mg/L. The injected 18F-FDG dose was 3.7 MBq/kg body mass, and routine PET/CT was performed after 60 min of rest. CT scans were acquired first, with a tube voltage of 120 kV, automatic tube current (15 to 180 mA), a tube rotation time of 0.8 s/rot, and a slice thickness of 3.8 mm. PET scans were acquired in 3-dimensional mode with a 192 × 192 matrix at 2 min per bed position. The scanning range extended from the upper thighs to the top of the head. After scanning, images were reconstructed with the ordered subsets expectation maximization (OSEM) iterative method. PET and CT images were then transferred to a Xeleris workstation for image fusion.
Image Segmentation And Feature Extraction
CT and PET images in Digital Imaging and Communications in Medicine (DICOM) format were imported into the Artificial Intelligence Kit (AK, version 3.3.0, GE Healthcare, China) platform. Both CT and PET images were resampled by linear interpolation to an isotropic voxel size of 1.0 mm × 1.0 mm × 1.0 mm. The resampled images were then imported into ITK-SNAP software (http://www.itksnap.org, version 3.6.0) for segmentation. Three-dimensional (3D) regions of interest (ROIs) were manually delineated along the lesion edges on all contiguous slices, separately on the CT images and the PET images. Figure S1 and Figure S2 show the original CT and PET images and the three-dimensional ROI images for a benign and a malignant case, respectively.
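The resampling step can be illustrated with a minimal one-dimensional sketch of linear interpolation. It is not the AK platform's implementation, which is not described here; the function name `resample_linear_1d` and its parameters are hypothetical, and a 3D volume would apply the same idea along each axis in turn.

```python
def resample_linear_1d(values, spacing, new_spacing=1.0):
    """Resample samples spaced `spacing` mm apart onto a `new_spacing` mm grid.

    Hypothetical helper for illustration only; assumes at least two samples.
    """
    if len(values) < 2:
        return list(values)
    extent = (len(values) - 1) * spacing          # physical length covered (mm)
    n_new = int(extent / new_spacing + 1e-9) + 1  # number of points on the new grid
    resampled = []
    for j in range(n_new):
        pos = j * new_spacing / spacing           # position in original index units
        i = min(int(pos), len(values) - 2)        # left neighbour, clamped at the end
        frac = pos - i                            # fractional distance to the right neighbour
        resampled.append(values[i] * (1.0 - frac) + values[i + 1] * frac)
    return resampled


# Example: intensities sampled every 2.0 mm, resampled to 1.0 mm.
profile_1mm = resample_linear_1d([0.0, 10.0, 20.0], spacing=2.0, new_spacing=1.0)
```

Each new grid point is a weighted average of its two nearest original samples, so the original samples are reproduced exactly wherever the two grids coincide.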
The CT and PET images, together with their delineated ROI files, were imported into the AK platform for radiomics feature extraction. Intraclass correlation coefficients (ICCs) were used to assess the intra- and interobserver reproducibility of radiomics feature extraction. To assess interobserver reproducibility, the ROIs of 30 randomly chosen images were segmented independently by two chest radiologists (reader 1 and reader 2), who were blinded to all patient information. To evaluate intraobserver reproducibility, reader 1 repeated the same procedure after a 1-month interval. Reader 1 completed the remaining image segmentations. Features with ICCs greater than 0.75 were considered to have good reproducibility and were selected for subsequent analysis.
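As an illustration of the reproducibility filter, the sketch below computes an ICC for one feature from a subjects-by-readers table and keeps features whose ICC exceeds 0.75. The ICC form is not stated in the text; the two-way random-effects, absolute-agreement, single-measurement form, ICC(2,1), is assumed here, and the function and feature names are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1) for `ratings`, a list of rows (one per lesion), each row holding
    the feature value obtained from each reader's segmentation.

    Assumed ICC form; requires at least 2 lesions, 2 readers, and some variance.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between lesions
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between readers
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols                    # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def reproducible_features(feature_ratings, cutoff=0.75):
    """Keep the names of features whose ICC is greater than `cutoff`."""
    return [name for name, table in feature_ratings.items()
            if icc_2_1(table) > cutoff]


# Hypothetical feature values from reader 1 and reader 2 on three lesions.
example = {
    "feature_a": [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]],  # identical readings
    "feature_b": [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]],  # constant reader offset
}
kept = reproducible_features(example)
```

Because this form measures absolute agreement, a constant offset between readers lowers the ICC even when the two readers rank the lesions identically.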
The maximum standardized uptake value (SUVmax) of each SPN was measured. PET VCAR software was used to automatically select the entire tumor area as the volume of interest (VOI), and the SUVmax of the primary tumor was recorded.
Feature Selection And Modeling
The synthetic minority oversampling technique (SMOTE) [12] was used to balance the samples, producing a benign:malignant ratio of 1:1. Before analysis, outliers and missing values in the training group were replaced by the median. The least absolute shrinkage and selection operator (LASSO) method [13] with 5-fold cross-validation was then used for dimensionality reduction. Spearman correlation analysis was subsequently applied to remove redundancy, whereby features that were highly correlated (|r| > 0.9) with other features were eliminated. Finally, the most informative CT-based and PET-based features were retained for subsequent modeling. In the training group, the CT radiomics features, the PET radiomics features, and the combined CT and PET radiomics features were used as independent variables, with the pathological result of each patient's SPN as the dependent variable. Multivariate logistic regression models were constructed by backward stepwise elimination, and the radiomics signature was calculated as: Radiomics signature = β0 + β1×x1 + β2×x2 + … + βn×xn, where β0 is the constant term, xi (i = 1, 2, ..., n) are the selected radiomics features, and βi (i = 1, 2, ..., n) are the corresponding regression coefficients. Based on this formula, the CT radiomics signature, the PET radiomics signature, and the joint radiomics signature were constructed in the training group.
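Two of the steps above, the Spearman redundancy filter and the signature formula, can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the study: the text does not say which feature of a highly correlated pair is dropped, so the sketch keeps the one encountered first, and all function names, feature names, and coefficient values are hypothetical. Conversion of the signature to a probability uses the standard logistic function.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks (non-constant inputs)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)


def drop_redundant(features, threshold=0.9):
    """Greedy filter (assumed rule): keep a feature only if |r| <= threshold
    against every feature already kept. `features` maps name -> list of values."""
    kept = []
    for name in features:
        if all(abs(spearman(features[name], features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


def radiomics_signature(beta0, betas, x):
    """beta0 + sum of beta_i * x_i over the selected features."""
    return beta0 + sum(betas[name] * x[name] for name in betas)


def predicted_probability(signature):
    """Logistic transform of the signature (linear predictor)."""
    return 1.0 / (1.0 + math.exp(-signature))


# Hypothetical feature table: "b" rises monotonically with "a", so it is dropped.
table = {"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 11], "c": [5, 1, 4, 2, 3]}
selected = drop_redundant(table)
score = radiomics_signature(0.5, {"a": 1.0, "c": -0.5}, {"a": 1.0, "c": 3.0})
```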
Performance Evaluation
Receiver operating characteristic (ROC) curves, calibration curves, and decision curves were plotted to evaluate discrimination, calibration, and clinical usefulness, respectively.
Discrimination
The optimal diagnostic threshold was determined by maximizing the Youden index and was then applied to the independent testing dataset. The area under the ROC curve (AUC), sensitivity (SEN), specificity (SPE), positive predictive value (PPV), negative predictive value (NPV), and accuracy (ACC) were calculated from the ROC analysis in both the training and testing groups to evaluate the diagnostic efficacy of the three models. In addition, the ROC curve of SUVmax was plotted. DeLong tests were used to assess whether the differences between the AUCs of the radiomics models were statistically significant.
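The threshold selection and the derived metrics can be sketched in a few lines. The example below is a minimal illustration rather than the "pROC" workflow used in the study: it assumes that higher scores indicate malignancy (label 1), that a case is called positive when its score is at or above the threshold, and that both classes are present; all names and data are hypothetical. The DeLong test is not covered.

```python
def _ratio(num, den):
    """Safe division; returns NaN when the denominator is zero."""
    return num / den if den else float("nan")


def confusion(scores, labels, threshold):
    """Counts (TP, FP, FN, TN) when score >= threshold is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    return tp, fp, fn, tn


def youden_threshold(scores, labels):
    """Threshold maximizing J = sensitivity + specificity - 1 (first one on ties)."""
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        tp, fp, fn, tn = confusion(scores, labels, t)
        j = _ratio(tp, tp + fn) + _ratio(tn, tn + fp) - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


def metrics(scores, labels, threshold):
    tp, fp, fn, tn = confusion(scores, labels, threshold)
    return {
        "SEN": _ratio(tp, tp + fn),
        "SPE": _ratio(tn, tn + fp),
        "PPV": _ratio(tp, tp + fp),
        "NPV": _ratio(tn, tn + fn),
        "ACC": _ratio(tp + tn, len(labels)),
    }


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly,
    counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical training scores; the threshold found here would then be
# passed unchanged to metrics() together with the testing-group scores.
train_scores = [0.1, 0.2, 0.3, 0.6, 0.4, 0.5, 0.7, 0.8, 0.9]
train_labels = [0, 0, 0, 0, 1, 1, 1, 1, 1]
cutoff, j_index = youden_threshold(train_scores, train_labels)
train_metrics = metrics(train_scores, train_labels, cutoff)
```

Fixing the threshold on the training data and reusing it on the testing data avoids tuning the cutoff to the cases on which performance is reported.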
Calibration
Calibration curves were plotted in both the training and testing cohorts to assess the agreement between the observed outcome frequencies and the probabilities predicted by the models. The Hosmer-Lemeshow test was used to assess the goodness of fit of the models, and a P value greater than 0.05 was considered to indicate good calibration.
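For illustration, the Hosmer-Lemeshow statistic can be computed by hand as sketched below. The grouping scheme (equal-sized groups of cases sorted by predicted probability, ten by default) is an assumption, since the text does not specify it, and the function name is hypothetical. The P value is obtained by comparing the statistic with a chi-squared distribution with (number of groups − 2) degrees of freedom, which is left out here because the Python standard library has no chi-squared distribution function.

```python
def hosmer_lemeshow_statistic(probs, outcomes, groups=10):
    """Sum over groups of (observed - expected)^2 / (n_g * p_g * (1 - p_g)),
    where p_g is the mean predicted probability in group g.

    Assumes mean predicted probabilities strictly between 0 and 1 in every group.
    """
    pairs = sorted(zip(probs, outcomes))          # order cases by predicted risk
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        observed = sum(y for _, y in chunk)       # observed number of events
        expected = sum(p for p, _ in chunk)       # expected number of events
        mean_p = expected / size
        statistic += (observed - expected) ** 2 / (size * mean_p * (1.0 - mean_p))
    return statistic


# Hypothetical example: predicted risks that match the observed event rates.
probs = [0.2] * 5 + [0.8] * 5
outcomes = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
hl = hosmer_lemeshow_statistic(probs, outcomes, groups=2)
```

A small statistic (and therefore a large P value) means the observed event counts are close to the counts the model predicts in each risk group.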
Clinical Usefulness
Decision curve analysis (DCA) was performed to evaluate the clinical usefulness of the models by quantifying their net benefit at different threshold probabilities.
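The net benefit at a given threshold probability follows the standard decision-curve formula, net benefit = TP/N − (FP/N) × pt/(1 − pt), where pt is the threshold probability. A minimal sketch is given below; the function names and data are hypothetical and the code is not the "rmda" implementation.

```python
def net_benefit(probs, outcomes, threshold):
    """Net benefit of treating cases whose predicted probability >= threshold."""
    n = len(outcomes)
    tp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(outcomes, threshold):
    """Reference strategy: every case is treated as malignant."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical predicted probabilities and pathology labels (1 = malignant).
probs = [0.9, 0.8, 0.3, 0.2]
outcomes = [1, 1, 0, 0]
model_nb = net_benefit(probs, outcomes, threshold=0.5)
treat_all_nb = net_benefit_treat_all(outcomes, threshold=0.5)
```

A decision curve plots these quantities over a range of thresholds; the treat-none strategy has a net benefit of zero at every threshold.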
Statistical Analysis
The baseline characteristics and SUVmax values of the training and testing cohorts were compared using Student’s t-test or the Mann–Whitney U-test for continuous variables and the chi-squared test or Fisher’s exact test for categorical variables. All statistical analyses were performed using R (version 3.5.1) with the “glmnet”, “pROC”, “rms”, and “rmda” packages. A two-tailed P < 0.05 was considered statistically significant.