1. Study subject
A retrospective analysis was performed on patients who underwent 18F-FDG PET/CT examination from November 2017 to April 2021 at the Third Affiliated Hospital of Soochow University. The patients underwent surgical resection and EGFR testing within one month after PET/CT examination, and lung adenocarcinoma was confirmed by postoperative pathology. According to the EGFR test results, the patients were divided into mutant and wild-type groups. The inclusion criteria were: (1) lung cancer manifesting as GGNs on imaging; (2) PET/CT and HRCT examinations performed within one month before surgery; (3) lung adenocarcinoma confirmed by postoperative pathology, with EGFR mutation status determined. The exclusion criteria were: (1) poor image quality or GGNs that were difficult to segment; (2) other concurrent tumors; (3) severe liver disease or diabetes; (4) FDG uptake of the GGN too low to distinguish it from the lung background. Finally, 106 patients with 106 GGNs were enrolled, of which 81 were EGFR-mutant and 25 were wild-type (Figure 1). This study was approved by the Ethics Committee of the Third Affiliated Hospital of Soochow University [Approval Number: (2020) Section No. 075].
2. EGFR mutation analysis
EGFR mutation analysis was performed on postoperative samples by experienced pathologists in our pathology department. The slice thickness was 3-5 mm, and the samples were frozen for 30 minutes. Gene amplification was performed using the amplification refractory mutation system (ARMS) method, also known as allele-specific amplification. Exons 18, 19, 20, and 21 of the EGFR gene were tested using the Shanghai Yuanqi EGFR gene mutation detection kit (AmoyDx EGFR mutation detection kit), and the results were interpreted according to the criteria provided with the test kit.
3. Radiomics analysis
The radiomics analysis consisted of six steps, which are shown in Figure 2 and explained in detail below.
The image acquisition protocol was based on the Image Biomarker Standardization Initiative (IBSI) reporting guidelines (28). Within one month before surgery, the patients underwent 18F-FDG PET/CT examination (Biograph mCT 64, Siemens, Erlangen, Germany). According to the European Association of Nuclear Medicine (EANM) guideline 1.0 (version 2.0 released in February 2015) (29), the patients fasted for 4 to 6 hours and then received an intravenous injection of 18F-FDG at 3.70 to 5.55 MBq/kg. Images were acquired after 60 minutes of rest. All patients lay supine; the acquisition time was 2 min per bed position, and the scan range was from the skull base to the mid-femur. After PET/CT, the images were reconstructed on the post-processing workstation (TrueD software), and a spiral CT scan of the GGNs was immediately performed under breath-hold. The scanning and reconstruction parameters were: tube voltage 140 kV; tube current automatically adjusted by the CARE Dose software according to patient anatomy and tissue density; rotation time 0.5 s per rotation; pitch 0.6; slice thickness 3.0 mm; matrix 512×512; lung window: window width 1200 HU, window level -600 HU; mediastinal window: window width 350 HU, window level 40 HU. The images were reconstructed with a 1.0 mm slice thickness and a 0.5 mm slice interval. Attenuation correction of the PET images was performed using the CT data, and the corrected PET images were fused with the CT images. No respiratory gating was applied during image acquisition.
3D Slicer (version 4.11.20200930, www.slicer.com) was used for image segmentation. For PET images, we used the semi-automatic segmentation method developed by Beichel et al. (30). For CT images (3 mm), we used NVIDIA AI-Assisted Annotation (built into 3D Slicer) and a boundary-based CT segmentation model to segment the lung nodules. All segmentations were verified by experienced nuclear medicine physicians who were blinded to the patients' pathology and EGFR mutation status.
Before feature extraction, the images were normalized and resampled by interpolation (sitkBSpline algorithm, third-order B-spline) to isotropic voxel spacing, so that texture features were rotationally invariant and image data from different samples could be compared. The CT images were resampled to a voxel size of 1 mm × 1 mm × 1 mm, and the PET images to 3 mm × 3 mm × 3 mm. The images were discretized using the fixed bin width method; the bin widths for CT and PET images were 25 and 0.313, respectively. Discretization, Laplacian of Gaussian (LOG), and wavelet transform preprocessing were performed to generate different feature sets. Sigma values for the LOG filter ranged from 0.5 to 5 in steps of 0.5 to extract fine, medium, and coarse features (31). The wavelet transform produced 8 decompositions per level (applying all possible combinations of high- or low-pass filters in each of the three dimensions: HHH, HHL, HLH, HLL, LHH, LHL, LLH, and LLL). These preprocessing steps (discretization, LOG, and wavelet transform) were applied for all shape features, first-order statistics, and textural features.
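The preprocessing settings described above can be illustrated with a minimal sketch. It assumes the fixed bin width rule as we understand it from the PyRadiomics documentation (bin index = floor(x / W) - floor(min(x) / W) + 1); the function name and the sample intensity values are hypothetical and are not taken from the study data.

```python
import itertools

import numpy as np


def discretize_fixed_bin_width(values, bin_width):
    """Map intensities to 1-based bin indices using a fixed bin width.

    Assumed rule: floor(x / W) - floor(min(x) / W) + 1, so bin edges fall on
    multiples of W and the lowest occupied bin receives index 1.
    """
    values = np.asarray(values, dtype=float)
    shifted = np.floor(values / bin_width) - np.floor(values.min() / bin_width)
    return (shifted + 1).astype(int)


# LOG sigma values: 0.5 to 5.0 in steps of 0.5 (as stated in the text).
log_sigmas = [round(0.5 * k, 1) for k in range(1, 11)]

# Wavelet sub-bands: every low/high-pass combination over the three axes.
wavelet_subbands = ["".join(p) for p in itertools.product("LH", repeat=3)]

# Hypothetical intensities inside a nodule mask (illustration only).
ct_hu = np.array([-700.0, -690.0, -650.0, -600.0, -480.0])
pet_suv = np.array([0.4, 0.7, 1.0, 2.0])

ct_bins = discretize_fixed_bin_width(ct_hu, bin_width=25)       # CT bin width from the text
pet_bins = discretize_fixed_bin_width(pet_suv, bin_width=0.313)  # PET bin width from the text
```

A fixed bin width keeps the intensity meaning of each gray level comparable across patients, whereas a fixed bin count would rescale each lesion to its own intensity range.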
In the next step, multiple features from different feature classes were extracted. These classes included shape and morphological features (14 shape features), first-order statistics (18 FOS features), gray-level co-occurrence matrix (24 GLCM features), gray-level dependence matrix (14 GLDM features), gray-level run length matrix (16 GLRLM features), gray-level size zone matrix (16 GLSZM features), and neighboring gray tone difference matrix (5 NGTDM features). The open-source Python (3.7.10) library PyRadiomics (version 3.0.1, https://pyradiomics.readthedocs.io/en/latest/#) was used for image feature extraction (32). The radiomic features calculated by this package conform to the feature definitions of the Image Biomarker Standardization Initiative (IBSI), which supports the harmonization and reproducibility of the calculated radiomic features and thereby the reproducibility of this study (32,33). Detailed information on the radiomic features is listed in Supplementary Material Table S1.
Because of the large number of radiomic features and the relatively small number of cases in this study, a variance threshold was used to remove features with low variance (threshold=0.18) to reduce the risk of overfitting. Subsequently, the data set was split by stratified random sampling at a ratio of 65:35 into a training set (n=68) and a testing set (n=38). In the training set, the Mann-Whitney U test was used to identify 55 radiomic features (p-value<0.1) related to EGFR mutation status. Then, in the standardized training set, the Least Absolute Shrinkage and Selection Operator (LASSO) algorithm selected the optimal 14 predictive features from the 55 features (34). The LASSO algorithm adds an L1 regularization term to the least-squares loss to avoid overfitting, and 5-fold cross-validation was used. The Spearman correlation matrix of the 14 selected radiomic features was calculated (Supplementary Material Figure S1), and highly correlated redundant features (r>0.85) were examined.
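The selection steps above can be sketched with scikit-learn and SciPy as follows. This is an illustration under stated assumptions, not the study's actual code: the feature matrix is synthetic, the number of candidate features (200) and the random seeds are placeholders, and only the class counts, variance threshold, split ratio, p-value cutoff, number of folds, and correlation cutoff come from the text.

```python
import numpy as np
from scipy.stats import mannwhitneyu, rankdata
from sklearn.feature_selection import VarianceThreshold
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 81 mutant (1) and 25 wild-type (0) nodules.
rng = np.random.default_rng(0)
y = np.array([1] * 81 + [0] * 25)
X = rng.normal(size=(106, 200))      # 200 candidate features is a placeholder
X[:, :5] += 1.5 * y[:, None]         # make five synthetic features informative
X[:, -10:] *= 0.1                    # make ten synthetic features nearly constant

# 1) Remove low-variance features (threshold from the text).
X_var = VarianceThreshold(threshold=0.18).fit_transform(X)

# 2) Stratified 65:35 split into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    X_var, y, test_size=0.35, stratify=y, random_state=0
)

# 3) Univariate screening on the training set only (Mann-Whitney U, p < 0.1).
p_values = np.array([
    mannwhitneyu(X_train[y_train == 1, j], X_train[y_train == 0, j],
                 alternative="two-sided").pvalue
    for j in range(X_train.shape[1])
])
keep = np.flatnonzero(p_values < 0.1)

# 4) Standardize the screened features, then fit LASSO with 5-fold CV.
scaler = StandardScaler().fit(X_train[:, keep])
X_train_std = scaler.transform(X_train[:, keep])
lasso = LassoCV(cv=5, random_state=0).fit(X_train_std, y_train)
nonzero = lasso.coef_ != 0
selected = keep[nonzero]             # column indices (in X_var) kept by LASSO

# 5) Spearman correlation among selected features (Pearson on ranks);
#    absolute values are used here, which is an assumption of this sketch.
ranks = rankdata(X_train_std[:, nonzero], axis=0)
rho = np.atleast_2d(np.corrcoef(ranks, rowvar=False))
redundant_pairs = np.argwhere(np.triu(np.abs(rho) > 0.85, k=1))
```

With 106 samples, `test_size=0.35` yields 68 training and 38 testing cases, which matches the set sizes reported above. The scaler is fitted on the training set only so that no information from the testing set leaks into feature selection.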
Classifier and Modeling
The analysis, including feature selection and classification, was performed using an in-house Python framework built on the open-source library scikit-learn (35). The 14 selected predictive features were used to train four ML models: Logistic Regression (LR), Random Forest (RF), Support Vector Machine (SVM), and Extreme Gradient Boosting (XGBoost). The StratifiedKFold iterator in scikit-learn was used to apply 5-fold cross-validation in the training set to assess model generalization. StratifiedKFold is a variation of k-fold cross-validation that ensures each fold contains approximately the same percentage of samples of each target class as the whole training dataset. The AUC, a metric suitable for imbalanced classification, was used to evaluate the parameter settings.
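A minimal sketch of stratified 5-fold cross-validation scored by AUC is given below. The training data are synthetic, the class counts (52 vs. 16) are only an approximation of what a stratified 68-case training set would contain, and the hyperparameters are placeholders rather than the study's settings. XGBoost is omitted so that the example depends on scikit-learn alone; an `xgboost.XGBClassifier` could presumably be added to the dictionary in the same way.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Synthetic training set: 68 cases, 14 features (placeholder values).
rng = np.random.default_rng(0)
y_train = np.array([1] * 52 + [0] * 16)
X_train = rng.normal(size=(68, 14))
X_train[:, :3] += 1.0 * y_train[:, None]   # three weakly informative features

# Placeholder hyperparameters, not the tuned values from the study.
models = {
    "LR": LogisticRegression(max_iter=1000),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "SVM": SVC(kernel="rbf"),
}

# Each fold keeps roughly the same mutant/wild-type ratio as the full set.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_mutant_fraction = [
    float(y_train[val_idx].mean()) for _, val_idx in cv.split(X_train, y_train)
]

# Per-fold AUC for each model; "roc_auc" uses decision scores or probabilities.
cv_auc = {
    name: cross_val_score(model, X_train, y_train, cv=cv, scoring="roc_auc")
    for name, model in models.items()
}
mean_auc = {name: float(scores.mean()) for name, scores in cv_auc.items()}
```

Stratification matters here because the wild-type class is small: an unstratified split could leave a fold with very few wild-type cases, which would make the per-fold AUC unstable or undefined.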
4. Statistical analysis
The R software (version 3.4.3; http://www.R-project.org/) was used for statistical analysis. Continuous variables were expressed as mean ± standard deviation (normal distribution) or median (Q1-Q3) (skewed distribution); categorical variables were expressed as frequency or rate (%). The χ2 test (categorical variables), t-test (normal distribution), or Mann-Whitney U test (skewed distribution) was used to compare general data and conventional PET/CT parameters between EGFR mutation states (a binary categorical variable). To test the models' generalization ability, 5-fold cross-validation was performed on the models based on the selected features. The receiver operating characteristic (ROC) curve and AUC were applied to the testing set to evaluate model performance. A classification report and confusion matrix were generated, and sensitivity, specificity, accuracy, positive predictive value (PPV), and negative predictive value (NPV) were calculated as quantitative performance measures. Pairwise comparison of the AUCs of the models was performed using the method proposed by DeLong et al. (36), and calibration curves were drawn to compare the prediction accuracy of the models. All statistical tests were two-sided, and P<0.05 was considered statistically significant.
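For reference, the performance measures named above follow directly from the four cells of a binary confusion matrix. The sketch below uses a hypothetical helper name and deliberately round, made-up counts; they are not results from this study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Return standard diagnostic metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),              # true positive rate
        "specificity": tn / (tn + fp),              # true negative rate
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "ppv": tp / (tp + fp),                      # positive predictive value
        "npv": tn / (tn + fn),                      # negative predictive value
    }


# Made-up counts for illustration only (not study data).
metrics = binary_metrics(tp=40, fp=10, tn=30, fn=20)
```

Sensitivity and specificity depend only on the classifier's behavior within each class, whereas PPV and NPV also depend on class prevalence, which is relevant given the imbalance between mutant and wild-type cases.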