In this study, we investigated whether CT-based radiomic features extracted from the intra- and peritumoral regions of NSCLC can reflect differences in tissue distribution between LUSC and LUAD. We also developed a CT-based radiomics strategy that combines high-throughput features with an ensemble classifier for the preoperative prediction of LUSC and LUAD. Three widely used methods, SVM-RFE, LASSO, and mRMR, were employed to select optimal features with significant intergroup differences between LUSC and LUAD for model development. Five independent classifiers (QDA, SVM with an RBF kernel, SVM with a sigmoid (tanh) kernel, RF, and XGBoost), which have been reported to offer favorable classification performance and robustness when diagnosing cancer phenotypes from small datasets, were combined to form the ensemble classifier. The model built with the ensemble classifier and the optimal features selected by SVM-RFE from both intra- and peritumoral regions showed favorable discriminative power in both the training and testing cohorts.
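Of the three selectors named above, mRMR is the one without a single canonical implementation. The sketch below shows one common greedy variant, assuming an ANOVA F statistic for relevance and mean absolute Pearson correlation for redundancy (the quotient, or "FCQ", form). The study may have used a mutual-information formulation instead, and the helper name `mrmr_fcq` is hypothetical.

```python
import numpy as np
from sklearn.feature_selection import f_classif


def mrmr_fcq(X, y, k):
    """Greedy mRMR, F-statistic relevance / correlation quotient variant.

    Assumption: this is one illustrative mRMR variant, not the study's code.
    Returns the column indices of up to k features, in the order selected.
    """
    X = np.asarray(X, dtype=float)
    k = min(k, X.shape[1])
    # Relevance: one-way ANOVA F statistic of each feature against the label
    relevance = np.nan_to_num(f_classif(X, y)[0])
    # Redundancy: absolute Pearson correlation between every pair of features
    corr = np.nan_to_num(np.abs(np.corrcoef(X, rowvar=False)))
    selected = [int(np.argmax(relevance))]  # start from the most relevant feature
    while len(selected) < k:
        remaining = [j for j in range(X.shape[1]) if j not in selected]
        redundancy = corr[np.ix_(remaining, selected)].mean(axis=1)
        # Quotient form: high relevance and low redundancy both raise the score
        scores = relevance[remaining] / np.maximum(redundancy, 1e-6)
        selected.append(remaining[int(np.argmax(scores))])
    return selected


# Toy check on synthetic data: only column 0 is shifted by class
rng = np.random.default_rng(0)
y_demo = np.repeat([0, 1], 50)
X_demo = rng.normal(size=(100, 8))
X_demo[:, 0] += 2.0 * y_demo
selected_demo = mrmr_fcq(X_demo, y_demo, k=4)
print(selected_demo)
```

Because the quotient form divides by redundancy, a feature that is nearly uncorrelated with those already chosen can score highly even when its relevance is modest; the mutual-information difference form (MID) is the usual alternative.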
In recent years, radiomics strategies based on CT, PET-CT, and multimodal MRI have repeatedly been shown to discriminate effectively between LUSC and LUAD [2, 9, 16, 18–20], with reported diagnostic performance ranging from 0.72 to 0.843. Nevertheless, these studies focused only on extracting ever more features from the intratumoral region of the image and disregarded the peritumoral parenchyma, which may also contain substantial information of comparable importance for the prediction task. Several studies have shown, in representative hematoxylin and eosin–stained images, that the tumor interface is lined by a “rim” of densely packed tumor-infiltrating lymphocytes and tumor-associated macrophages [8, 21, 51, 52]. At the macroscopic scale, the densely packed stromal tumor-infiltrating lymphocytes surrounding LUAD appear as fine, smooth textures on CT images and could therefore serve as imaging biomarkers for distinguishing LUAD from LUSC [21]. However, whether radiomic features extracted from the peritumoral parenchyma effectively reflect intergroup differences in tissue and microenvironment between LUSC and LUAD remains unknown.
In this study, we found that many radiomic features extracted from both the intratumoral and peritumoral regions differed significantly between LUSC and LUAD, and that the first peritumoral ring (0–5 mm) yielded far more significant features than the second ring (5–10 mm). To our knowledge, these results provide the first evidence supporting the hypothesis that the peritumoral region on CT images also contains substantial information reflecting differences in tissue texture between LUSC and LUAD. Moreover, across the two rings examined, peritumoral tissue closer to the tumor carried more discriminative information.
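The 0–5 mm and 5–10 mm rings can be derived from a tumor segmentation with a Euclidean distance transform that respects voxel spacing. This is a minimal sketch assuming a 3D boolean tumor mask and known voxel spacing in millimeters; the function name is hypothetical. A practical pipeline would likely also intersect each ring with a lung-parenchyma mask so that chest wall and mediastinum are excluded, a step omitted here.

```python
import numpy as np
from scipy import ndimage


def peritumoral_rings(tumor_mask, spacing_mm, edges_mm=(0.0, 5.0, 10.0)):
    """Split the tissue surrounding a binary tumor mask into distance rings.

    tumor_mask : 3D boolean array (z, y, x), True inside the segmented tumor.
    spacing_mm : voxel spacing along each axis in mm, e.g. (1.25, 0.7, 0.7).
    edges_mm   : ring boundaries; ring i covers distances in (edges[i], edges[i+1]].
    Assumption: no lung-parenchyma masking is applied in this sketch.
    Returns one boolean mask per ring.
    """
    tumor_mask = np.asarray(tumor_mask, dtype=bool)
    # For every voxel outside the tumor: Euclidean distance (mm) to the nearest
    # tumor voxel. Tumor voxels themselves get distance 0.
    dist = ndimage.distance_transform_edt(~tumor_mask, sampling=spacing_mm)
    rings = []
    for inner, outer in zip(edges_mm[:-1], edges_mm[1:]):
        rings.append(~tumor_mask & (dist > inner) & (dist <= outer))
    return rings


# Toy check: a single-voxel "tumor" in the middle of a 21-voxel line, 1 mm spacing
demo_mask = np.zeros((1, 1, 21), dtype=bool)
demo_mask[0, 0, 10] = True
demo_rings = peritumoral_rings(demo_mask, spacing_mm=(1.0, 1.0, 1.0))
print([int(r.sum()) for r in demo_rings])  # → [10, 10]
```

Measuring distances in millimeters rather than counting dilation iterations keeps ring widths consistent across scans with different slice thicknesses.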
Most previous studies extracted features only from the original images, neglecting image filters, which can reduce noise, enhance image quality, and accentuate texture [53, 54]. In this study, we therefore applied 10 filters (wavelet-HL, wavelet-LL, wavelet-LH, wavelet-HH, square, square root, logarithm, exponential, gradient, and LBP) to preprocess the images before feature extraction. Seven categories of radiomic features, spanning morphological, first-order, second-order, and higher-order texture features, were extracted to characterize the shape of the lesion and the global, local, and regional distribution patterns of the tissue, respectively. Student’s t-tests combined with three widely applied feature selection algorithms (SVM-RFE, LASSO, and mRMR) were used for optimal feature selection and performance comparison. The results indicate that the optimal features selected by SVM-RFE from all significant features of both the intra- and peritumoral regions provide the strongest discrimination between LUSC and LUAD.
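The two-stage selection described above (a univariate Student's t-test followed by SVM-RFE on the surviving features) can be sketched with SciPy and scikit-learn. The significance threshold, the number of retained features, and the linear-kernel SVM used for ranking are illustrative assumptions rather than the study's settings, and `ttest_then_svm_rfe` is a hypothetical helper.

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.feature_selection import RFE
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def ttest_then_svm_rfe(X, y, alpha=0.05, n_keep=10):
    """Two-stage selection: Student's t-test filter, then SVM-RFE ranking.

    Assumption: alpha, n_keep and the linear SVM are illustrative choices.
    Returns the column indices of X that survive both stages.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Stage 1: two-sample Student's t-test (equal variances) per feature
    _, p_values = ttest_ind(X[y == 0], X[y == 1], axis=0, equal_var=True)
    significant = np.flatnonzero(p_values < alpha)
    if significant.size <= n_keep:
        return significant
    # Stage 2: recursively drop the feature with the smallest |weight| in a
    # linear SVM (RFE needs coef_, which SVC exposes only for a linear kernel)
    X_sig = StandardScaler().fit_transform(X[:, significant])
    rfe = RFE(SVC(kernel="linear", C=1.0), n_features_to_select=n_keep, step=1)
    rfe.fit(X_sig, y)
    return significant[rfe.support_]


# Toy check: the first 5 of 30 columns carry a class-dependent shift
rng = np.random.default_rng(42)
y_demo = np.repeat([0, 1], 50)
X_demo = rng.normal(size=(100, 30))
X_demo[:, :5] += 1.5 * y_demo[:, None]
kept_demo = ttest_then_svm_rfe(X_demo, y_demo, alpha=0.05, n_keep=3)
print(kept_demo)
```

In a real analysis both stages would be fit on the training cohort only and then applied unchanged to the testing cohort; running them on pooled data would leak test information into feature selection.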
Classification model development is the final and arguably most crucial step in the proposed radiomics strategy for the prediction of LUSC and LUAD. In this step, the choice of classifier (e.g., SVM with an RBF or sigmoid kernel, RF, QDA, or XGBoost) is a major source of performance variation [40], so selecting an appropriate classifier is critical. To combine the strengths of these five independent classifiers, we built an ensemble classifier from SVM with an RBF kernel, SVM with a sigmoid kernel, RF, QDA, and XGBoost, and compared its diagnostic performance with that of each individual classifier. The results indicate that i) the model built with the ensemble classifier achieved the most favorable, consistent, and robust diagnostic performance among all classifiers, and ii) combining the optimal features selected by SVM-RFE from both intra- and peritumoral regions with the ensemble classifier yielded the best diagnostic performance for the prediction of LUSC and LUAD in both the training and testing cohorts. Furthermore, across all models built with each classifier and with optimal features selected by SVM-RFE, LASSO, or mRMR from the intratumoral region, the peritumoral region, or both, the ensemble-based model did not always achieve the highest AUC, but it consistently ranked among the top two in both cohorts, indicating notable consistency and robustness in the prediction of LUSC and LUAD.
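One way to combine the five base classifiers is soft voting, which averages their predicted class probabilities. The text does not state how the ensemble merged its members, so soft voting, the hyperparameters, and the use of scikit-learn's `GradientBoostingClassifier` as a stand-in for XGBoost (to avoid an extra dependency) are all assumptions; `build_ensemble` is a hypothetical helper, and the data below are synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_ensemble(random_state=0):
    """Soft-voting ensemble of QDA, two SVMs, RF and a boosted-tree model.

    Assumption: GradientBoostingClassifier stands in for XGBoost; swap in
    xgboost.XGBClassifier if that package is available.
    """
    members = [
        # reg_param shrinks each class covariance, which helps QDA on small cohorts
        ("qda", QuadraticDiscriminantAnalysis(reg_param=0.1)),
        ("svm_rbf", make_pipeline(
            StandardScaler(),
            SVC(kernel="rbf", probability=True, random_state=random_state))),
        ("svm_sigmoid", make_pipeline(
            StandardScaler(),
            SVC(kernel="sigmoid", probability=True, random_state=random_state))),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=random_state)),
        ("boost", GradientBoostingClassifier(random_state=random_state)),
    ]
    # voting="soft": final score = unweighted mean of the members' class probabilities
    return VotingClassifier(estimators=members, voting="soft")


# Synthetic stand-in for a radiomic feature matrix (not study data)
X, y = make_classification(n_samples=200, n_features=10, n_informative=5,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

model = build_ensemble().fit(X_train, y_train)
auc_demo = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"test AUC: {auc_demo:.3f}")
```

Soft voting requires every member to expose `predict_proba`, which is why both SVMs set `probability=True` (Platt scaling fitted by internal cross-validation).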
This study has several limitations. First, inherent bias may exist given the retrospective design and the relatively small cohort collected from a single clinical center. Larger cohorts from two or more clinical centers are needed to validate the performance of the proposed model. Second, other potentially relevant clinical factors, such as gene mutations and key molecular biomarkers, were not included because the data in the archival database were incomplete; these factors warrant further analysis. Third, combining deep radiomic features with the current handcrafted radiomic features might further improve performance in the prediction of LUSC and LUAD.
In conclusion, the proposed CT-based radiomics strategy, which extracts features from intra- and peritumoral regions, selects optimal features with SVM-RFE, and builds the classification model with ensemble learning, demonstrated favorable predictive performance and stability for the preoperative prediction of LUSC and LUAD.