Intratumoral and peritumoral CT-based radiomics strategy reveals distinct subtypes of non-small-cell lung cancer

This study evaluated a new radiomics strategy that incorporates intratumoral and peritumoral features extracted from lung CT images with ensemble learning for pretreatment prediction of lung squamous cell carcinoma (LUSC) and lung adenocarcinoma (LUAD). A total of 105 patients (47 LUSC and 58 LUAD) with pretherapy CT scans were involved in this retrospective study, and were divided into training (n = 73) and testing (n = 32) cohorts. Seven categories of radiomics features involving 3078 metrics in total were extracted from the intra- and peritumoral regions of each patient's CT data. Student's t tests in combination with three feature selection methods were adopted for optimal feature selection. An ensemble classifier was developed using five common machine learning classifiers with these optimal features. The performance was assessed using both training and testing cohorts, and further compared with that of the Visual Geometry Group-16 (VGG-16) deep network for this predictive task. The classification models developed with the ensemble classifier using optimal feature subsets determined from the intratumoral region and the peritumoral region achieved mean areas under the curve (AUCs) of 0.87 and 0.83 in the training cohort and 0.66 and 0.60 in the testing cohort, respectively. The model developed with the ensemble classifier using the optimal feature subset selected from both intra- and peritumoral regions improved performance markedly, with AUCs of 0.87 and 0.78 in the training and testing cohorts, respectively, which is also superior to that of VGG-16 (AUC of 0.68 in the testing cohort). The proposed new radiomics strategy that extracts image features from the intra- and peritumoral regions with ensemble learning could greatly improve the diagnostic performance for histological subtype stratification in patients with NSCLC.


Introduction
Lung cancer is the most frequently occurring cancer and the leading cause of cancer-related death in men globally (Sung et al. 2020). In women, lung cancer is the third most commonly diagnosed cancer and the second leading cause of cancer-related death (Sung et al. 2020). Approximately 85% of primary lung malignancies are non-small-cell lung cancer (NSCLC), and the 5-year survival rate is less than 20% (Bashir et al. 2019;Herbst et al. 2008;Bray, et al. 2018;Su 2019;Ma et al. 2018a).
Lung squamous cell carcinoma (LUSC) and lung adenocarcinoma (LUAD) are two major histological subtypes of NSCLC that constitute approximately 35% and 60% of primary NSCLC cases, respectively (Bashir et al. 2019;Herbst et al. 2008;Su 2019;Zhu et al. 2018;Hoffman et al. 2000;Tang, et al. 2020). LUSC often shows keratinization, pearl formation, and intercellular bridges, whereas LUAD may exhibit lepidic, glandular, papillary or micropapillary, or solid architecture (Bashir et al. 2019). These two histological subtypes always present different anatomical sites and glucose metabolism levels, and reflect the need for different optimal treatments to improve clinical outcomes (Herbst et al. 2008;Ma et al. 2018a;Zhu et al. 2018;Hoffman et al. 2000). Therefore, accurately predicting LUSC and LUAD is of paramount importance prior to clinical interventions (Ma et al. 2018a).
The first-line reference in preoperatively diagnosing LUSC and LUAD is lung biopsy (Herbst et al. 2008;Ma et al. 2018a;Zhu et al. 2018;Hoffman et al. 2000;Mahon et al. 2019), which is an invasive diagnostic approach with a high level of risks in clinical practice (Ebrahimi et al. 2016). In addition, concerning the issue of tumor heterogeneity of NSCLC, lung biopsy examines only very limited proportions of the tumor tissue and is incapable of completely characterizing tumor properties (Su 2019;Zhu et al. 2018).
Developing a noninvasive strategy for the accurate prediction of LUSC and LUAD preoperatively is desirable.
Non-invasive imaging technologies, such as computed tomography (CT) and multiparametric magnetic resonance imaging (mpMRI), have recently been widely used for the pretherapy diagnosis of NSCLC (Su 2019;Zhu et al. 2018;Sun et al. 2018;Starkov et al. 2018;Sollini et al. 2017;Shen et al. 2017). Compared with mpMRI, CT offers considerably better imaging efficiency, higher resolution, and fewer motion artifacts caused by breathing and is thus recommended in the guidelines for NSCLC screening and diagnosis (Bashir et al. 2019;Starkov et al. 2018). However, it is very challenging for clinicians to visually predict the histological subtype of NSCLC directly from CT images to discriminate between LUSC and LUAD.
In recent years, radiomics strategies have been used for the prediction of LUSC and LUAD. In 2016, Wu et al. explored a CT-based radiomics strategy with 440 extracted features and a naive Bayes classifier, achieving fair performance for the differentiation of LUSC and LUAD with an area under the curve (AUC) of the receiver-operating characteristic (ROC) curve of 0.72 (Wu, et al. 2016). Bashir et al. extracted 115 radiomics features from CT data and developed a prediction model based on the optimal features and a random forest (RF) classifier, achieving an AUC of 0.82 for discriminating between LUSC and LUAD (Bashir et al. 2019). Chaunzwa et al. introduced the convolutional neural network (CNN) to the prediction task and developed a prediction model based on the Visual Geometry Group-16 (VGG-16) network (Chaunzwa, et al. 2018), obtaining an optimal AUC of 0.751.
In addition, some recent studies also integrated the radiomics strategy with positron emission tomography computed tomography (PET-CT) images, achieving favorable diagnostic performance in the differentiation of these two subtypes of NSCLC (Ma et al. 2018b;Koyasu et al. 2020;Ren 2020). For instance, Koyasu et al. proposed a PET-CT-based radiomics strategy with an extreme gradient boosting (XGBoost) classifier for the prediction task (Koyasu et al. 2020), achieving good performance with an AUC of 0.843.
Although these previous studies have repeatedly demonstrated the feasibility of the radiomics strategy based on CT or PET-CT for the prediction of histological subtypes of NSCLC, all the features they extracted were from the intratumoral region of the image. We are not aware of any work that has attempted to evaluate the peritumoral area outside the tumor to distinguish LUSC from LUAD. According to a recent study (Beig et al. 2019), perinodular region-based radiomics features on lung CT images effectively reflect the difference between LUAD and granulomas and accurately distinguish these two types of lung nodules. Whether the radiomics features extracted from the peritumoral region of NSCLC can reflect significant differences between LUSC and LUAD and further be used for the prediction task remains an open question. Therefore, the first aim of this study was to investigate whether the radiomics features extracted from the peritumoral region of NSCLC differ significantly between LUSC and LUAD. To achieve this goal, seven feature categories were employed in this study, including morphological features, histogram-based features (first-order features, hereafter), Haralick features of co-occurrence matrices (CM features, hereafter) (Haralick et al. 1973), and features derived from the run length matrix (RLM features, hereafter) (Galloway 1975), the neighborhood gray-tone difference matrix (NGTDM features, hereafter) (Amadasun and King 1989), the gray-level size zone matrix (GLSZM features, hereafter) (Thibault et al. 2014), and the gray-level dependence matrix (GLDM features, hereafter) (Sun and Wee 1983) to fully characterize the global, local, and regional differences of the tissue in the peritumoral region between LUSC and LUAD (Xu, et al. 2019).
The second aim was to develop an accurate and consistent model for predicting LUSC and LUAD. To fulfill this aim, both intra- and peritumoral region-based radiomics features were utilized, and an ensemble classifier that combined multiple binary classifiers, such as support vector machine (SVM), RF, and XGBoost, was used to form a more robust predictive model. The diagnostic performance of the model was then assessed with AUC for the differentiation of LUSC and LUAD. In addition, the performance of the proposed model was compared with that of the widely used deep network Visual Geometry Group-16 (VGG-16) (Li 2019).

Materials and methods
This retrospective study was approved by the institutional ethics review board of Xijing Hospital, and informed consent was waived. The overall methodological pipeline of this study is shown in Fig. 1.

Patients
A total of 146 archival patients with postoperatively confirmed NSCLC were collected from Xijing Hospital. The inclusion criteria were as follows: (i) primary LUSC or LUAD was pathologically confirmed; (ii) CT scan was performed prior to any therapies. Patients who met one of the following conditions were excluded: (i) lack of postoperative pathological information to confirm the histopathological subtype of the patient as LUSC or LUAD (n = 21); (ii) missing preoperative CT scan (n = 16); or (iii) poor imaging quality that made accurate tumor annotation extremely difficult (n = 4). Finally, 105 subjects were eligible for this study, including 47 patients with LUSC and 58 patients with LUAD. The patients were then randomly allocated into the training cohort (n = 73) and testing cohort (n = 32). The inclusion-exclusion process is illustrated in Fig. 2.
Fig. 1 The schematic pipeline of the proposed strategy for the prediction of lung squamous cell carcinoma (LUSC) and lung adenocarcinoma (LUAD) via intra- and peritumoral CT radiomics features and ensemble learning

Image acquisition and region of interest annotation
All patients underwent thoracic CT imaging using a uCT 760 system (United Imaging Healthcare, China). The primary scanning parameters were as follows: 80 kV; 80 mAs; detector collimation: 64 × 0.6 mm; rotation time: 0.4 s; slice thickness: 5 mm; spacing between slices: 5 mm; pixel spacing: 0.6 × 0.6 mm; and matrix size: 512 × 512. The entire lung region was scanned in each patient, and the number of image slices varied from 100 to 400.
Two types of regions of interest (ROIs), the intratumoral and peritumoral regions, were annotated from the CT images, as shown in Fig. 3. Prior to the intratumoral region annotation of each CT dataset, the axial image slice showing the largest cross-sectional area of the tumor in each patient's lung region was selected. Then, a manually depicted polygonal ROI was used to segment the intratumoral region on the selected image slice. Two radiologists with 20 and 10 years of lung CT interpretation experience independently performed intratumoral region delineation using a custom-developed package. Discrepancies between their delineation results were then carefully resolved by consensus.
After the intratumoral region mask was obtained, we adopted the morphological dilation operator to generate a new region mask that was approximately 10 mm larger in radial distance than the intratumoral region according to pixel size (Beig et al. 2019). Then, the corresponding peritumoral region was the ring of the lung parenchyma around the tumor that was obtained by subtracting the intratumoral region mask from the new region mask after morphological expansion, as shown in Fig. 3. Finally, the peritumoral region was further divided into two rings including the first ring (0-5 mm) and the second ring (5-10 mm) for feature extraction and comparison (Beig et al. 2019).
Fig. 3 Illustration of the intratumoral region (light green) manually delineated and the first ring (0-5 mm, light purple) and second ring (5-10 mm, red) of the peritumoral regions generated by morphologically expanding the segmented intratumoral region mask
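The ring construction described above can be illustrated with a minimal sketch. It assumes a 2D binary tumor mask, the 0.6 mm pixel spacing reported for the scans, and a disk-shaped structuring element; the helper names `disk` and `peritumoral_rings` and the synthetic square "tumor" are hypothetical and are not taken from the study's code package.

```python
import numpy as np
from scipy import ndimage


def disk(radius_px):
    """Boolean disk-shaped structuring element with the given radius in pixels."""
    r = int(np.ceil(radius_px))
    yy, xx = np.ogrid[-r:r + 1, -r:r + 1]
    return (xx ** 2 + yy ** 2) <= radius_px ** 2


def peritumoral_rings(tumor, pixel_mm=0.6, widths_mm=(5.0, 10.0)):
    """Return (ring 0-5 mm, ring 5-10 mm) around a 2D boolean tumor mask."""
    tumor = tumor.astype(bool)
    # Dilate the tumor mask by 5 mm and by 10 mm (converted to pixels).
    grown_5 = ndimage.binary_dilation(tumor, structure=disk(widths_mm[0] / pixel_mm))
    grown_10 = ndimage.binary_dilation(tumor, structure=disk(widths_mm[1] / pixel_mm))
    ring_1 = grown_5 & ~tumor      # first ring: dilated mask minus tumor
    ring_2 = grown_10 & ~grown_5   # second ring: outer mask minus inner mask
    return ring_1, ring_2


# Synthetic example: a 20 x 20 pixel square "tumor" in a 128 x 128 slice.
tumor_mask = np.zeros((128, 128), dtype=bool)
tumor_mask[54:74, 54:74] = True
ring_1, ring_2 = peritumoral_rings(tumor_mask)
```

In practice the rings would additionally be restricted to lung parenchyma (for example, by intersecting with a lung mask), a step this sketch omits.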

Radiomics feature extraction
After intra- and peritumoral ROI segmentation, ten filters, including wavelet-HL, wavelet-LL, wavelet-LH, wavelet-HH, square, square root, logarithm, exponential, gradient, and local binary pattern (LBP), were applied to the original image to magnify the tissue patterns and unearth important features. Then, six feature categories, including first-order features, GLCM features, GLRLM features, NGTDM features, GLSZM features, and GLDM features, were calculated from the original segmented image data and the ten filtered images of the intratumoral region and the two rings of the peritumoral region (Zwanenburg, et al. 2020). Given that the peritumoral region was generated by dilating the intratumoral region, the shape 2D features were calculated only from the intratumoral region. Therefore, 1032, 1023, and 1023 radiomics features were extracted from the intratumoral region, the first ring, and the second ring of the peritumoral region, respectively, as shown in Table 1. Open source Pyradiomics (version 3.0.1) was used to perform this analysis (Griethuysen et al. 2017). All code and results are provided in the Appendix document in the supplementary material.

Feature selection
In this study, a two-step feature selection strategy was adopted to determine an optimal subset of features for model construction, as shown in Fig. 1. The first step was a statistical analysis of all these features between LUSC and LUAD, which was performed with Scikit-learn. Student's t test with a significance threshold of p = 0.05 was applied to all radiomics features to select those with significant intergroup differences between LUSC and LUAD (Probable et al. 1992).
Then, all significant features were standardized to remove differences in feature-value scales. The normalized value z of each feature x for a specific patient is calculated as follows:
$$z = \frac{x - \bar{x}}{\sigma} \quad (1)$$
where $\bar{x}$ and $\sigma$ are the mean and standard deviation, respectively, of each feature from the training cohort.
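The first selection step and the standardization can be sketched as follows. This is an illustration on synthetic data, not the study's code: the feature matrix, the class sizes within the training cohort, and the decision to run the t test on the training cohort only are assumptions made for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-in for the radiomics matrix (rows: patients, columns: features).
# The 33/40 class split within the training cohort is hypothetical.
y_train = np.array([0] * 33 + [1] * 40)        # 0 = LUSC, 1 = LUAD
X_train = rng.normal(size=(73, 50))
X_train[y_train == 1, :5] += 1.5               # make the first five features informative
X_test = rng.normal(size=(32, 50))

# Step 1: two-sided Student's t test per feature, keep features with p < 0.05.
_, p_values = stats.ttest_ind(X_train[y_train == 0], X_train[y_train == 1], axis=0)
keep = p_values < 0.05

# Standardization: z = (x - mean) / std, with mean and std taken from the
# training cohort only and reused unchanged for the testing cohort.
mean_train = X_train[:, keep].mean(axis=0)
std_train = X_train[:, keep].std(axis=0, ddof=1)
Z_train = (X_train[:, keep] - mean_train) / std_train
Z_test = (X_test[:, keep] - mean_train) / std_train
```

Reusing the training-cohort statistics for the testing cohort keeps information from the held-out patients out of the preprocessing.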
In the second step of feature selection, three widely used feature selection algorithms, including the minimum redundancy maximum relevance method (mRMR) (Peng et al. 2005), the least absolute shrinkage and selection operator (LASSO) (Tibshirani 1996;Sauerbrei et al. 2007), and the linear SVM-based recursive feature elimination (SVM-RFE) (Fehr et al. 2015), were further implemented with these significant features to select an optimal feature subset from the training cohort for model development and external testing.
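Two of the three second-step selectors can be sketched with Scikit-learn as follows. The synthetic data, the target of nine retained features, and the use of `LassoCV` on the binary labels are assumptions for illustration; mRMR has no Scikit-learn implementation and is not shown.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LassoCV
from sklearn.svm import SVC

# Synthetic stand-in for the significant, standardized training features.
X, y = make_classification(n_samples=73, n_features=40, n_informative=6,
                           random_state=0)
X = (X - X.mean(axis=0)) / X.std(axis=0)

# SVM-RFE: repeatedly fit a linear SVM and drop the feature with the
# smallest absolute weight until the requested number remains.
svm_rfe = RFE(SVC(kernel="linear", C=1.0), n_features_to_select=9, step=1)
svm_rfe.fit(X, y)
rfe_idx = np.flatnonzero(svm_rfe.support_)

# LASSO: the L1 penalty drives some coefficients exactly to zero; the
# features with non-zero coefficients form the selected subset.
lasso = LassoCV(cv=5, random_state=0).fit(X, y)
lasso_idx = np.flatnonzero(lasso.coef_ != 0)

print("SVM-RFE features:", rfe_idx)
print("LASSO features:", lasso_idx)
```

The number of features kept by SVM-RFE is a free parameter here, whereas LASSO determines the subset size through the cross-validated penalty strength.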

Model development based on ensemble learning and validation
With optimal features selected, the predictive model was developed using the training cohort and the ensemble learning strategy with 10 rounds of tenfold cross-validation, as shown in Fig. S1 of the Appendix. The performance of the model was then externally evaluated using the testing cohort. In each split, nine folds were used for model training and the remaining fold was used for performance validation. The training performance finally obtained was the average over the validation results of the ten splits. We then repeated the entire process for ten rounds to obtain the optimal hyperparameters with the best average performance. After that, the entire training cohort was used for model development with these optimal hyperparameters, and the testing cohort, which did not participate in the training process, was used as the external cohort to verify the overall performance. Five commonly used binary classifiers, including the quadratic discriminant analysis (QDA) classifier, SVM with radial basis function (RBF) kernel, SVM with sigmoid/tanh kernel, RF, and XGBoost, were included in the ensemble learning framework. QDA is a commonly used binary classifier that does not assume the same covariance for each class (Linear and Quadratic Discriminant analysis 2022; Tharwat 2016). SVM is a classical machine learning classifier with several typical kernels, such as RBF and sigmoid/tanh, that computes the decision boundary separating two classes with the maximum margin (Hastie et al. 2009;Lam, et al. 2012;Stenzinger, et al. 2021). It has advantages in dealing with nonlinear features and is not easily overfit even with small datasets (Liang et al. 2018). The RF classifier builds multiple random decision trees (100 trees, the default in Scikit-learn, to avoid overfitting) and integrates them to make an accurate diagnosis (Liang et al. 2018;Khalilia et al. 2011;Seera and Lim 2014).
XGBoost offers many benefits in classification, including high precision and consistency and the prevention of overfitting (Chen and Guestrin 2016; Colen 2021); thus, it was also included in the ensemble learning strategy.
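One way to read the cross-validation scheme above is as a repeated stratified tenfold search over hyperparameters, followed by a refit on the whole training cohort. The sketch below shows this for a single classifier (SVM with RBF kernel) on synthetic data; the parameter grid and the use of `GridSearchCV` are assumptions, not the study's settings.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold
from sklearn.svm import SVC

# Synthetic stand-in for the training cohort with nine selected features.
X_train, y_train = make_classification(n_samples=73, n_features=9,
                                       n_informative=5, random_state=0)

# Tenfold cross-validation repeated for ten rounds (100 train/validation splits).
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)

# Hypothetical grid: each candidate is scored by its mean validation AUC.
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid,
                      scoring="roc_auc", cv=cv, refit=True)
search.fit(X_train, y_train)

# refit=True retrains the best configuration on the entire training cohort,
# which is the model that would then be applied to the testing cohort.
print(search.best_params_, round(search.best_score_, 3))
```

Stratified folds keep both subtypes present in every validation fold, which matters with only about seven patients per fold.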
The ensemble classifier was finally developed by weighting the predictive value of these five classifiers in the model training process, which can be expressed as follows:
$$P(j) = \sum_{i=1}^{5} \omega_i \, p_i(j) \quad (2)$$
where $P(j)$ represents the final predictive value of the jth patient; $p_i(j)$ denotes the predictive value of the jth patient using the ith classifier; and $\omega_i$ is the weighting parameter of the ith classifier in the ensemble learning process, which meets the following condition:
$$\sum_{i=1}^{5} \omega_i = 1 \quad (3)$$
In this study, the optimal weight $\omega_i$ was determined by minimizing the predictive error in the training process, and the cutoff of $P(j)$ for assigning the patient to the LUAD group was set as 0.5. If $P(j)$ was greater than or equal to 0.5, the jth patient was allocated to the LUAD group. The overall performance was evaluated using both the training cohort and the testing cohort with the quantitative metrics of accuracy and AUC (Gupta and Mittal 2019a, b, c;Kora and Krishna 2014).
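The weighted combination and the 0.5 cutoff can be sketched as below. The text does not specify the error function or the optimizer, so this example assumes a mean squared error between the ensemble output and the labels, non-negative weights that sum to one, and SciPy's SLSQP solver; the five columns of simulated probabilities stand in for the outputs of the five classifiers.

```python
import numpy as np
from scipy.optimize import minimize


def fit_ensemble_weights(probs, y):
    """Weights (non-negative, summing to 1) minimizing the mean squared error.

    probs: array of shape (n_patients, n_classifiers) with predicted LUAD
    probabilities; y: array of 0/1 labels (1 = LUAD).
    """
    n_clf = probs.shape[1]
    result = minimize(
        lambda w: np.mean((probs @ w - y) ** 2),      # assumed error function
        x0=np.full(n_clf, 1.0 / n_clf),               # start from equal weights
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n_clf,
        constraints={"type": "eq", "fun": lambda w: w.sum() - 1.0},
    )
    return result.x


rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=73)
# Five simulated classifiers of decreasing reliability (hypothetical data).
probs = np.column_stack([
    np.clip(y + rng.normal(0.0, s, size=73), 0.0, 1.0)
    for s in (0.2, 0.3, 0.4, 0.5, 0.6)
])

weights = fit_ensemble_weights(probs, y)
P = probs @ weights                    # ensemble predictive value per patient
predicted = (P >= 0.5).astype(int)     # 1 = LUAD when P(j) >= 0.5
```

Because the weights sum to one and each classifier outputs a value in [0, 1], the ensemble value also lies in [0, 1], so the 0.5 cutoff keeps its usual meaning.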
In addition, we compared the performance of the proposed ensemble classifier with that of the VGG-16 network. The experiment was conducted on a machine with an NVIDIA GeForce RTX 3090 GPU with 24 GB of memory. The hyperparameters were as follows: 50 epochs, a batch size of 8, and a learning rate of 0.0001. The Adam optimizer was used.

Statistical analysis
Statistical analyses of the patient demographics were performed using IBM SPSS statistics (version 19.0, Armonk, NY), and Python software (version 3.6 DL-GPU) was used to perform statistical selection of features with significant differences between LUSC and LUAD. Chi-square tests were performed to evaluate significant differences in primary clinical factors distributed between the training and testing cohorts, and Student's t tests were used to select significant radiomics features between LUSC and LUAD. Two-sided p values less than 0.05 were considered significant (Xu, et al. 2019;Wu et al. 2017;Wu et al. 2018).
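The cohort comparison can be illustrated with a chi-square test on a 2 × 2 contingency table. The study ran these tests in SPSS; the sketch below uses SciPy instead, and the per-cohort subtype counts are hypothetical values chosen only to match the reported totals (47 LUSC, 58 LUAD, 73 training, 32 testing).

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: LUSC, LUAD; columns: training cohort, testing cohort.
# The cell counts are hypothetical; only the margins match the reported totals.
table = np.array([[33, 14],
                  [40, 18]])

chi2, p_value, dof, expected = chi2_contingency(table)

# A p value of 0.05 or more indicates no significant difference in subtype
# distribution between the two cohorts.
balanced = p_value >= 0.05
print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p_value:.3f}")
```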

Results

Demographics of eligible patients
A total of 105 NSCLC patients were eligible for this study, including 47 patients with LUSC and 58 with LUAD. These patients were randomly allocated into the training cohort (n = 73) and the testing cohort (n = 32). The baseline demographics and clinical information of these patients were collected from the archival medical records, as shown in Table 1. Statistical analyses indicate no significant differences between the training and testing cohorts in terms of any of these primary factors.
Fig. 5 Intra- and peritumoral tissue distribution differences between LUSC and LUAD characterized by the significant radiomics feature energy on CT images with the unit normalized as "1" on the color bar

Results of the two-step feature selection strategy
A total of 3078 standardized radiomics features, including 1032 features from the intratumoral region, 1023 from the first ring (0-5 mm), and 1023 from the second ring (5-10 mm) of peritumoral regions, were analyzed using Student's t test (p value < 0.05) to determine those with significant intergroup differences between LUSC and LUAD. Eventually, 500 significant features were selected from the intratumoral region, whereas only 220 and 119 significant features were selected from the first ring and second ring of peritumoral regions, respectively, as shown in Fig. 4. These results indicate that (i) a large number of radiomics features extracted from the peritumoral region can also reflect the significant differences in tissue distribution patterns between LUSC and LUAD; and (ii) the closer the peritumoral region is to the intratumoral region, the more features with significant differences are obtained to reflect the tumor property difference. Figure 5 illustrates an example of the intra- and peritumoral tissue distribution differences of LUSC and LUAD determined using one of the significant radiomics features, energy, with 3 × 3 sliding patches on the CT image. After statistical analysis-based feature selection, three radiomics feature subsets were finally obtained, including (i) 500 significant features from the intratumoral region, (ii) 339 significant features from the entire peritumoral region, and (iii) 839 significant features from both intratumoral and peritumoral regions. The significant features in each subset were further selected using three commonly applied strategies: SVM-RFE, LASSO, and mRMR with the mutual information difference (MID), as shown in Figs. 6, 7, 8. Table 2 shows the results after the second-step feature selection procedure.
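The energy map computed with 3 × 3 sliding patches can be sketched with NumPy as follows. Energy is taken here as the sum of squared intensities within each patch (Pyradiomics additionally allows an intensity shift, which is omitted), and scaling the map by its maximum to reach a unit color bar is an assumption; the 4 × 4 input is a toy image.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def energy_map(image, patch=3):
    """First-order energy (sum of squared intensities) of every patch x patch window."""
    windows = sliding_window_view(image.astype(float), (patch, patch))
    return (windows ** 2).sum(axis=(-2, -1))


# Toy 4 x 4 image; a real input would be the CT region of interest.
image = np.arange(16).reshape(4, 4)
emap = energy_map(image)          # shape (2, 2): one value per valid 3 x 3 patch
emap_unit = emap / emap.max()     # rescaled so that the maximum equals 1
```

For the top-left patch of the toy image, the energy is 0² + 1² + 2² + 4² + 5² + 6² + 8² + 9² + 10² = 327.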

Classification model development and performance evaluation
As these optimal feature subsets were determined, classification models were developed using five commonly used machine learning classifiers and the ensemble classifier with the training cohort, and the performance of each model was evaluated using both training and testing cohorts for distinguishing LUSC from LUAD. The results are presented in Fig. 9. Three columns of subfigures in Fig. 9 exhibit the performance of predictive models developed using optimal feature subsets determined from the intratumoral region, peritumoral region, and both intra- and peritumoral regions. These findings indicate that (i) the classification model determined from the peritumoral region achieved comparable performance to that from the intratumoral region; (ii) the classification model determined from intra- and peritumoral regions dramatically improved the overall performance for the prediction of LUSC and LUAD; and (iii) the model developed by the ensemble classifier achieved more favorable and consistent performance with training and testing cohorts compared with those developed by five independent classifiers. Table 3 shows the performance of classification models developed by the ensemble classifier for the prediction task, indicating that the ensemble classification model developed by SVM-RFE-based optimal features determined from intra- and peritumoral regions achieved the best performance with AUC values of 0.87 and 0.78 in the training and testing cohorts, respectively.
Fig. 6 Optimal features selected using SVM-RFE approach: a 12 optimal features selected from the intratumoral region; b six optimal features selected from the peritumoral region; and c nine optimal features selected from intra- and peritumoral regions
Fig. 8 Optimal features selected using mRMR with MID: a 12 optimal features selected from the intratumoral region; b 12 optimal features selected from the peritumoral region; and c 12 optimal features selected from intra- and peritumoral regions
In addition, the VGG-16 model was trained with the same training cohort and validated with the same testing cohort as those used for our proposed model. It obtained an AUC of 0.67 in the testing cohort for the identification of LUSC and LUAD, which is clearly inferior to that of our proposed model.

Discussion
In this study, we investigated the feasibility of CT-based radiomic features extracted from intra- and peritumoral regions of NSCLC to reflect the tissue distribution differences between LUSC and LUAD, and developed a CT-based radiomics strategy that incorporated high-throughput features with an ensemble classifier for the preoperative prediction of LUSC and LUAD. Three widely used methods, SVM-RFE, LASSO, and mRMR, were employed to select optimal features with significant intergroup differences between LUSC and LUAD for classification model development. Five independent classifiers, QDA, SVM with RBF kernel, SVM with sigmoid/tanh kernel, RF, and XGBoost, which were reported to have favorable classification performance and robustness for the diagnosis of cancer phenotypes with a small database, were utilized to form an ensemble classifier for classification model building. The results of the model that was developed using the ensemble classifier and optimal features selected by SVM-RFE from intra- and peritumoral regions demonstrate favorable discriminative power with both the training and testing cohorts.
In recent years, CT-/PET-CT/multimodal MRI-based radiomics strategies have been repeatedly demonstrated to have great capability for the prediction of LUSC and LUAD (Bashir et al. 2019;Tang, et al. 2020;Wu, et al. 2016;Ma et al. 2018b;Koyasu et al. 2020;Ren 2020). The reported diagnostic performance (AUC) ranged between 0.72 and 0.843. Nevertheless, all these previous studies only focused on how to extract an increasing number of features from the intratumoral region of the image, disregarding the peritumoral parenchyma, which might also contain substantial information and be of equal importance for the prediction task. Some studies have revealed that the interface of the tumor has a "rim" of densely packed tumor-infiltrating lymphocytes and tumor-associated macrophages in representative hematoxylin and eosin-stained images (Hoffman et al. 2000;Beig et al. 2019;Kirienko et al. 2018;Jong et al. 2018). At a macroscopic scale, the densely packed stromal tumor-infiltrating lymphocytes around LUAD appear as fine and smooth textures on CT images and thus could be potential imaging biomarkers for the identification of LUAD from LUSC (Beig et al. 2019). However, whether radiomics features extracted from the peritumoral parenchyma region effectively reflect the intergroup difference of the tissue and microenvironment between LUSC and LUAD has remained unknown.
In this study, we found that a large number of radiomics features extracted from the intratumoral region and peritumoral region were significantly different between LUSC and LUAD, and the total number of significant features extracted from the first ring (0-5 mm) peritumoral region was much greater than that of the significant features extracted from the second ring (5-10 mm) peritumoral region. These results demonstrate and verify for the first time the hypothesis that the peritumoral region on CT images also contains substantial information that can reflect the tissue texture difference between LUSC and LUAD. In addition, the closer the peritumoral region is to the intratumoral region, the more substantial the information it contains.
Most of the previous studies only focused on extracting features from the original image data, neglecting the image filters that not only reduce the noise but also enhance the quality and magnify the texture in the image (Xu et al. 2017a, b). Therefore, in this study, ten filters, including wavelet-HL, wavelet-LL, wavelet-LH, wavelet-HH, square, square root, logarithm, exponential, gradient, and LBP, were utilized to preprocess the image for feature extraction. Seven categories of radiomics features, including morphological features, first-order features, second-order features, and high-order texture features, were adopted in this study to fully characterize the shape properties and global, local, and regional distribution patterns of the tissue, respectively. Student's t tests integrated with three widely applied feature selection algorithms (SVM-RFE, LASSO, and mRMR) were adopted for optimal feature selection and performance comparison. The results indicate that the optimal features selected using the SVM-RFE algorithm from all significant features of both intra- and peritumoral regions have the most powerful diagnostic ability for the discrimination between LUSC and LUAD.
Fig. 9 Classification models developed using five independent classifiers and the ensemble classifier with optimal features determined by three different feature selection methods: a performance of classification models developed by using different classifiers and optimal features selected by SVM-RFE approach; b performance of classification models developed using different classifiers and optimal features selected by LASSO approach; c performance of classification models developed using different classifiers and optimal features selected by mRMR with MID
Classification model development is the last but most crucial step in the proposed radiomics strategy for the prediction of LUSC and LUAD. In this step, the choice of decision classifier, for instance, SVM with RBF kernel or sigmoid kernel, RF, QDA, or XGBoost, is the main source of performance variation (Liang et al. 2018). Hence, the determination of an optimal classifier is of critical importance. To fully integrate the merits of these five independent classifiers (SVM with RBF kernel, SVM with sigmoid kernel, RF, QDA, and XGBoost), an ensemble classifier was generated from them, and its diagnostic performance was compared with that of each independent classifier. The results indicate that (i) the classification model developed using the ensemble classifier achieves the most favorable, consistent and robust diagnostic performance compared with the independent classifiers, and (ii) optimal features determined by SVM-RFE from both intra- and peritumoral regions with the ensemble classifier achieve the best diagnostic performance for the prediction of LUSC and LUAD with both training and testing cohorts. In addition, the classification results of all the models developed by each classifier with optimal features determined from intratumoral, peritumoral, or both intratumoral and peritumoral regions using SVM-RFE, LASSO, and mRMR also revealed that although the model based on the ensemble classifier did not always obtain the best results, it always ranked as one of the top two models in terms of the AUC with both cohorts, suggesting remarkable consistency and robustness in the prediction of LUSC and LUAD.
We further compared the performance of our methodology with that of the deep-learning algorithm VGG-16, which has been widely used for NSCLC image analysis. VGG-16 uses more channels, deeper convolutional layers, and wider feature maps, which can extract more representative features for disease characterization. In addition, small kernels (3 × 3) are used in VGG-16 to replace the large kernels in other deep networks, which can largely reduce the number of parameters, carry out more nonlinear mapping, and help to increase the fitting ability of the network. The results indicate that the proposed model has more advantages in the diagnosis of LUSC and LUAD with very limited data size.
However, the results of this study should be carefully interpreted due to the following limitations. First of all, the sample size of our study is small and single-centered, which might impair the generalizability of the model for the large multi-center database application. Moreover, other potential clinical factors, such as gene mutations and key molecular biomarkers, were not included in the current study given the incomplete data in the archival database, which should be further analyzed. In addition, deep radiomics features incorporating the current manual radiomics features might further improve current performance in the prediction of LUSC and LUAD. In future work, a large database from multiple centers will be collected for further evaluating the proposed method. Besides, multimodal imaging data like PET/CT or PET/MR will be considered to further improve the diagnostic performance.
In conclusion, the proposed CT-based radiomics strategy that extracts features from intra- and peritumoral regions, adopts SVM-RFE for optimal feature selection, and utilizes ensemble learning for classification model development demonstrates favorable predictive precision and stability for the preoperative prediction of LUSC and LUAD.
Author contributions XX, XT, and HH contributed to the study concept, design, and data interpretation. XT contributed to the CT and clinical data collection. XT and HY contributed to the intratumoral region annotation. HH, XX and PD performed the peritumoral region extraction and radiomics feature calculation; XX, HH and XT contributed to the model construction and data analysis. XX, XT, and HH contributed to the manuscript drafting, editing and revision. All authors approve the final version of the manuscript for submission.
Funding This work was funded by the National Natural Science Foundation of China (No. 81901698) and Young Eagle plan of High Ambition Project (No. 2020CYJHXXP).

Data availability statement
The raw/processed data of this study cannot be publicly shared at present as it forms part of an ongoing study, but it could be made available upon reasonable request from the corresponding author with the permission of the Institutional Review Board. Results and the code package for each step of this study have been arranged in a document named "Appendix". The code package has also been uploaded to Gitee for public sharing and further improvement (https://gitee.com/yang-tianran-01/radiomics_-ensemble_learning/commit/d51e6859ef48c92cc0c794639f08286ac89569f8).