Clinical and radiomic factors for predicting invasiveness in pulmonary ground‑glass opacity

Patients with preinvasive or invasive pulmonary ground-glass opacity (GGO) often face different clinical treatments and prognoses. The present study aimed to identify the invasiveness of pulmonary GGO by analysing clinical and radiomic features. Patients with pulmonary GGOs who were treated between January 2014 and February 2019 were included. Clinical features were collected, while radiomic features were extracted from computed tomography records using the 3D Slicer software. Predictors of GGO invasiveness were selected by least absolute shrinkage and selection operator logistic regression analysis, and receiver operating characteristic (ROC) curves were drawn for each prediction model. A total of 194 patients with pulmonary GGOs were included in the present study. The maximum diameter of the solid component, waveletHLL_ngtdm_Coarseness (P=0.03), waveletLHH_firstorder_Maximum (P<0.01) and waveletLLH_glrlm_LongRunEmphasis (P<0.01) were significant predictors of invasive lung GGOs. The area under the ROC curve (AUC) for the prediction models of clinical features and radiomic features was 0.755 and 0.719, respectively, whereas the AUC for the combined prediction model was 0.864 (95% CI, 0.802-0.926). Finally, a nomogram was established for individualized prediction of invasiveness. The combination of radiomic and clinical features can enable the differentiation between preinvasive and invasive GGOs. The present results can provide some basis for the best choice of treatment in patients with lung GGOs.


Introduction
In recent years, the detection rate of pulmonary ground-glass opacity (GGO) has increased significantly with the application of high-resolution chest CT (HRCT) [1][2]. Studies have demonstrated that 59%-75% of persistent GGOs are precancerous lesions or early-stage adenocarcinoma [3][4]. According to the 2011 classification of the International Association for the Study of Lung Cancer, American Thoracic Society and European Respiratory Society, adenocarcinoma is classified as atypical adenomatous hyperplasia (AAH), adenocarcinoma in situ (AIS), minimally invasive adenocarcinoma (MIA) and invasive adenocarcinoma (IA) [5]. In general, preinvasive GGOs comprise AAH and AIS, whereas MIA and IA are categorized as invasive lesions [6]. To date, the treatment of pulmonary GGO has often been based on CT manifestations and clinical experience. AAH and AIS appear as pure GGO or mixed GGO with few solid components on chest CT [7][8], and these preinvasive nodules often need close follow-up or limited resection [9][10]. Mixed GGOs with more solid components tend to be invasive lesions, and segmentectomy or lobectomy with lymph node dissection is often necessary for this type of GGO. After appropriate surgery, the 5-year disease-free survival is 100% for AAH and AIS and almost 100% for MIA [11], whereas the long-term survival of patients with invasive adenocarcinoma remains poor [12][13][14][15][16], with reported rates of 74.6%-92.4%. Therefore, identifying invasiveness in pulmonary GGO is essential for clinical treatment decisions and for prognosis.
Radiomics refers to the extraction of quantitative image features from radiological images [17]. In oncology, tumour radiomic features derived from imaging data can be related to the diagnosis, prediction and prognosis of cancer patients; these features often include nodal shape, nodal volume, intensity and a series of "texture" features [18][19][20]. Our purpose was to identify the invasiveness of GGOs using clinical features and radiomic features from chest CT.
Patients And Methods

Patient selection and grouping
From January 2014 to February 2019, a total of 268 patients (301 GGOs; 2 patients had 3 GGOs, 29 had 2 GGOs, and the rest had a single GGO) who underwent surgery for pulmonary GGOs in the Department of Thoracic Surgery of Xuanwu Hospital were collected. Inclusion criteria: (1) complete clinical characteristics, including gender, age, smoking history, family history of lung cancer and pathological type; (2) complete non-enhanced chest CT data within two months before surgery; and (3) final pathological results indicating malignant lesions (including AAH, AIS, MIA and IA). Exclusion criteria: (1) before the chest CT examination, the patient had undergone puncture biopsy, radiotherapy, radiofrequency ablation or other treatment of GGOs; and (2) the maximum diameter of the GGO on CT images was greater than 3 cm. According to the pathological results, the final cases were divided into two groups: preinvasive lesions (including AAH and AIS) and invasive lesions (including MIA and IA).

Chest CT examination and general imaging feature acquisition
All preoperative chest CTs were non-enhanced and performed on one of two scanners (Sensation Cardiac 64, Siemens Medical Solutions, Forchheim, Germany; Somatom Definition Flash, Siemens Medical Solutions, Forchheim, Germany). All CT examinations were performed with the following parameters: 120 kVp; pitch, 1.2; 100-200 mAs; and collimation, 5.0 mm. The chest CT images of the patients were analysed by two radiologists. The largest diameters of the tumour (T) and of the consolidation component (C) were measured separately on the lung window and on the mediastinal window. The consolidation/tumour ratio (CTR) of a pure GGO was 0, while that of a solid nodule was 1. A part-solid nodule had a CTR between 0 and 1 [21]. The final value was the average of the two radiologists' measurements.

Stable radiomic feature selection
To obtain stable radiomic features, each image data set was subjected to volume of interest (VOI) segmentation and radiomic feature extraction twice, the intraclass correlation coefficient (ICC) was calculated for each radiomic feature, and features with ICC > 0.75 were selected as stable radiomic features.
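The stability filter described above can be sketched in Python as follows. This is an illustrative sketch, not the authors' code (the study used the "psych" package in R). The ICC form shown, the two-way random-effects, single-measurement ICC(2,1), is an assumption, because the text does not state which form was used. The function names and the feature values are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per lesion and one column per extraction run."""
    n = len(ratings)        # number of lesions
    k = len(ratings[0])     # number of repeated extractions per lesion
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: lesions, extraction runs, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    bms = ss_rows / (n - 1)               # between-lesion mean square
    jms = ss_cols / (k - 1)               # between-run mean square
    ems = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


def select_stable_features(feature_tables, threshold=0.75):
    """Keep the names of features whose ICC exceeds the threshold."""
    return [name for name, table in feature_tables.items()
            if icc_2_1(table) > threshold]


# Hypothetical values: two extraction runs (columns) per lesion (rows).
example = {
    "feature_a": [[1.0, 1.1], [2.0, 2.1], [3.0, 2.9], [4.0, 4.2]],
    "feature_b": [[1.0, 3.0], [2.0, 1.0], [3.0, 2.0]],
}
print(select_stable_features(example))
```

In this toy input, feature_a is reproduced closely between the two runs and is kept, whereas feature_b changes rank order between runs and is discarded.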

Selection of prediction factors and establishment of the prediction model
Patients enrolled in our study were divided into a training cohort and a validation cohort. In the training cohort, multivariate logistic regression analysis with the backward selection method was used to select independent predictors from the clinical features (CTR, Maximum_diameter of the GGO, age, family history, gender and smoking history). Receiver operating characteristic (ROC) curves were plotted, and the area under the curve (AUC) was used to represent the predictive ability of the clinical prediction model. For radiomic features, the LASSO algorithm with 10-fold cross-validation was used to obtain independent predictors distinguishing the two pathological groups in the training cohort; ROC curves for the radiomic prediction model were plotted, and AUC values were calculated. Finally, all meaningful predictors were used to build a combined prediction model, which was compared with the clinical and radiomic prediction models. The validation cohort was used to validate the predictive ability of the prediction models. A nomogram was constructed to predict invasiveness for individual GGOs.
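As an illustration of how the AUC summarizes a prediction model, the following Python sketch computes it from model scores and pathological labels using the rank (Mann-Whitney) formulation: the AUC is the proportion of invasive/preinvasive pairs in which the invasive lesion receives the higher score, with ties counted as one half. This is not the authors' code (the study used the "pROC" package in R); the function name and the example values are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a positive case outscores a negative case."""
    pos = [s for s, y in zip(scores, labels) if y == 1]   # invasive lesions
    neg = [s for s, y in zip(scores, labels) if y == 0]   # preinvasive lesions
    if not pos or not neg:
        raise ValueError("both classes are required to compute an AUC")
    # Count concordant pairs; a tied pair contributes one half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical scores for four lesions (label 1 = invasive, 0 = preinvasive).
print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.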

Statistical analysis
Continuous variables were compared between the two groups using the Mann-Whitney U test, and categorical variables using Pearson's chi-square test, in SPSS (version 22.0; SPSS Inc., an IBM Company, Chicago, Illinois, USA). R software (version 3.5.2, http://www.R-project.org) was also used for data analysis. ICC was calculated using the "psych" package. The "MASS" package was used for logistic regression on the clinical features. LASSO regression analysis for radiomic feature selection and combined predictor selection was performed with the "ncvreg" package. ROC curves were built with the "pROC" and "ggplot2" packages. The "OptimalCutpoints" package was used for cut-off calculation. The nomogram was formulated with the "rms" package. The concordance index (C-index), which represents the performance of the nomogram, was calculated with the rcorrcens function of the "Hmisc" package. P < 0.05 was considered statistically significant. The related R code is listed in the Appendix.
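The cut-off calculation mentioned above can be illustrated with a short Python sketch. The text does not state which criterion was used with the "OptimalCutpoints" package, so the Youden index (sensitivity + specificity - 1) used here is an assumption, chosen because it is a common default. The function name and the example diameters are hypothetical.

```python
def youden_cutoff(values, labels):
    """Return (cutoff, J) maximizing Youden's J for the rule 'value >= cutoff'."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes are required")
    best_cutoff, best_j = None, -1.0
    for c in sorted(set(values)):          # each observed value is a candidate
        tp = sum(1 for v, y in zip(values, labels) if y == 1 and v >= c)
        tn = sum(1 for v, y in zip(values, labels) if y == 0 and v < c)
        j = tp / n_pos + tn / n_neg - 1    # sensitivity + specificity - 1
        if j > best_j:
            best_cutoff, best_j = c, j
    return best_cutoff, best_j


# Hypothetical maximum diameters in cm (label 1 = invasive, 0 = preinvasive).
diameters = [0.8, 0.9, 1.0, 1.2, 1.4, 1.6]
invasive = [0, 0, 0, 1, 1, 1]
print(youden_cutoff(diameters, invasive))
```

Other criteria, such as fixing a minimum sensitivity, would generally give a different cut-off from the same data.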

Ethics approval and consent to participate
All procedures performed in studies involving human participants were in accordance with the ethical standards of both institutional and national research committees. Written informed consent from either the patients or their representatives was obtained before surgery.

Results
The clinical features of the two groups of patients are compared in Table 1. There were no significant differences in gender (P = 0.651), age (P = 0.382), smoking history (P = 0.685), family history (P = 0.183) or nodule location (P = 0.554) between the preinvasive GGO and invasive GGO groups. However, there were significant differences in Maximum_diameter (P < 0.001) and CTR (P < 0.001) between the two groups. All GGOs were randomly divided into a training cohort (144 GGOs) and a validation cohort (51 GGOs).

Radiomic prediction model
After MCP-penalized LASSO regression analysis and 10-fold cross-validation of 567 radiomic features in the training cohort of 144 GGOs, 2 radiomic features were identified as independent predictors of invasiveness: waveletLHLglcmIdmn (p < 0.001) and waveletLLLglcmInverseVariance (p < 0.001). ROC curves were drawn based on these radiomic features. In the prediction model, the AUCs of the individual texture features ranged from 0.678 to 0.724, so the predictive ability of a single texture feature for invasiveness was poor. The AUC of the texture features combined (radiomic_training) was 0.809, indicating improved predictive ability (Fig. 2). The formula for calculating the radiomic prediction model score was as follows: Radiomic-score = -47.088 + 51.501 * waveletLHLglcmIdmn - 9.871 * waveletLLLglcmInverseVariance.
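The Radiomic-score formula above can be evaluated as in the following Python sketch. The coefficients are taken from the text. The conversion to a probability assumes that the score is the linear predictor (log-odds) of a logistic model, which the text does not state explicitly. The function names and the input feature values are hypothetical.

```python
import math


def radiomic_score(idmn, inverse_variance):
    """Radiomic-score from the two selected wavelet features (coefficients from the text)."""
    return -47.088 + 51.501 * idmn - 9.871 * inverse_variance


def score_to_probability(score):
    """Logistic transform; assumes the score is a log-odds value (an assumption)."""
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical feature values for one lesion.
score = radiomic_score(idmn=0.97, inverse_variance=0.30)
print(score, score_to_probability(score))
```

Under this assumption, a score of 0 corresponds to a predicted probability of 0.5, and higher scores to a higher predicted probability of an invasive lesion.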

Combined prediction model
Maximum_diameter, CTR and waveletLHLglcmIdmn were selected from all the clinical and radiomic features by MCP-penalized LASSO regression analysis and 10-fold cross-validation for building the combined prediction model. The ROC curves are shown in Fig. 3. The combined prediction model score was calculated as follows: Combined-score = -23.036 + 1.196 * Maximum_diameter + 3.31 * CTR + 22.383 * waveletLHLglcmIdmn.
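To show how each predictor enters the Combined-score, the following Python sketch evaluates the formula above term by term. The coefficients are taken from the text. Expressing Maximum_diameter in centimetres is an assumption based on the units reported in the Discussion. The function name and the example lesion are hypothetical.

```python
# Coefficients of the combined model as given in the text.
INTERCEPT = -23.036
COEFFICIENTS = {
    "Maximum_diameter": 1.196,       # assumed to be in cm
    "CTR": 3.31,
    "waveletLHLglcmIdmn": 22.383,
}


def combined_score(features):
    """Return the Combined-score and the contribution of each predictor."""
    contributions = {name: coef * features[name]
                     for name, coef in COEFFICIENTS.items()}
    return INTERCEPT + sum(contributions.values()), contributions


# Hypothetical lesion: 1.3 cm diameter, CTR 0.27, Idmn 0.95.
lesion = {"Maximum_diameter": 1.3, "CTR": 0.27, "waveletLHLglcmIdmn": 0.95}
total, parts = combined_score(lesion)
print(total)
for name, value in parts.items():
    print(name, value)
```

Listing the contributions separately makes it easy to see which predictor dominates the score for a given lesion.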
The sensitivity, specificity, positive predictive value, negative predictive value, accuracy, AUC and 95% CI of each prediction model in the training and validation cohorts were calculated to show the predictive ability (Table 2). No significant difference in AUC values was found between the training and validation cohorts for any of the three prediction models.
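The threshold-based measures listed above follow directly from the 2 x 2 table of predicted versus pathological class. The following Python sketch shows the standard definitions. The function name and the example labels are hypothetical, and the values in Table 2 are not reproduced here.

```python
def diagnostic_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV, NPV and accuracy for binary labels (1 = invasive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Each ratio is undefined (division by zero) if its denominator is empty,
    # for example PPV when no lesion is predicted to be invasive.
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / len(y_true),
    }


# Hypothetical pathological labels and model predictions for six lesions.
truth = [1, 1, 1, 1, 0, 0]
predicted = [1, 1, 1, 0, 0, 1]
print(diagnostic_metrics(truth, predicted))
```

Unlike the AUC, these measures depend on the cut-off used to turn a model score into a predicted class.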

Nomogram establishment and validation
Based on the 3 predictors selected in the combined model, a nomogram was constructed to predict individual invasiveness of GGOs, as shown in Fig. 4. The C-indexes of the invasion prediction nomogram were 0.848 (95% CI, 0.814-0.881) in the training cohort and 0.815 (95% CI, 0.755-0.874) in the validation cohort. The nomogram was subjected to 1,000 bootstrap resamples for internal validation, and the calibration curve was plotted (Fig. 5). The mean absolute errors of the calibration curves were 0.019 in the training cohort and 0.05 in the validation cohort.
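The idea of bootstrap resampling can be sketched in Python as follows. Note the substitution: the study applied 1,000 bootstrap resamples to the calibration curve with the "rms" package, whereas this sketch applies the same resampling idea to the C-index and reports a percentile interval, which is a simpler stand-in and not the authors' procedure. For a binary outcome the C-index equals the AUC. All names and example values are hypothetical.

```python
import random


def c_index(scores, labels):
    """Concordance for a binary outcome: share of positive/negative pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_c_index(scores, labels, n_resamples=1000, seed=0):
    """Percentile 95% interval of the C-index from resampling lesions with replacement."""
    n = len(scores)
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0 or n_pos == n:
        raise ValueError("both classes are required")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:                # skip resamples containing one class only
            stats.append(c_index([scores[i] for i in idx], ys))
    stats.sort()
    lower = stats[(25 * n_resamples) // 1000]                        # 2.5th percentile
    upper = stats[min(n_resamples - 1, (975 * n_resamples) // 1000)]  # 97.5th percentile
    return lower, upper


# Hypothetical model scores (label 1 = invasive, 0 = preinvasive).
scores = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 1, 1]
print(c_index(scores, labels), bootstrap_c_index(scores, labels, n_resamples=200))
```

Fixing the seed makes the resampling reproducible, which is useful when the interval is reported alongside the point estimate.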

Discussion
Determining the best treatment for pulmonary GGOs remains a challenge for thoracic surgeons. The main reason is that it is difficult to classify GGOs before surgery, even though a new classification of pulmonary adenocarcinoma was defined in 2011.
Preoperative percutaneous CT-guided fine-needle aspiration biopsy, endobronchial ultrasonography images and virtual bronchoscopy have been used for the pathological diagnosis of GGOs, but the diagnostic yield remains lower for smaller pure GGOs (pGGOs) [22][23]. In recent years, an increasing number of studies have been devoted to finding imaging biomarkers for GGO classification through chest CT, especially for the identification of invasive GGO lesions.
On chest CT, preinvasive GGO often appears as a pure GGO, and invasive GGO is more often a larger, mixed GGO [23][24][25][26]. Eguchi et al. reported that a pure GGO with a diameter larger than 11 mm is more likely to be invasive [27]. Recently, Li et al. reported a cut-off diameter of 13.5 mm for evaluating invasiveness in GGO nodules [28]. In 2013, Lee SM et al. reported that the cut-off diameter for invasive GGO was 14 mm [16]. A study in 2019 reported that in the part-solid group of GGOs, a diameter larger than 1 cm was a significant factor for predicting invasiveness [29]. In our study, Maximum_diameter was an independent predictor of invasiveness for GGO; the mean Maximum_diameter values were 0.97 ± 0.42 cm in the preinvasive group and 1.43 ± 0.63 cm in the invasive group (p<0.01), and the cut-off diameter for invasive GGO was 1.3 cm. The proportion of solid components of GGOs is often evaluated by CTR. Studies from 2001 to 2006 suggested CTR > 0.5 as a predictor of pathological invasiveness [14][30][31][32][33][34][35]. A multi-institutional prospective study concluded that an adenocarcinoma ≤ 2 cm with a CTR ≤ 0.25 could be considered a radiologically non-invasive lung adenocarcinoma [36]. Because these studies [14][30][31][32][33][34][35][36] grouped nodules by CTR range (0-0.25, 0.25-0.5, 0.5-0.75, 0.75-1) for statistical analysis, the CTR values they obtained were not accurate cut-off values. In our study, we included CTR as an independent variable in the multivariate regression analysis. In the clinical prediction model, CTR > 0.27 was a significant predictor of invasive GGO. Zhou et al. reported that the combination of CTR > 0.38 and SUVmax > 1.46 seems to be a better predictor of invasive GGO [37], but their study only enrolled patients with mixed GGOs.
In our study, the AUCs of Maximum_diameter and CTR from ROC curves were 0.739 and 0.781, respectively, in the prediction model of invasiveness established by clinical features. The combined predictive ability of Maximum_diameter and CTR was found to be improved (AUC=0.84).
Texture analysis (TA) is an important means of medical image processing, as it can measure tissue heterogeneity characteristics that cannot be observed by the human eye and can quantitatively display subtle changes in image pixel values and their arrangement. In recent years, only a few studies have introduced TA and radiomic features of chest CT into the differentiation of invasive pulmonary GGOs. Hee-Dong Chae et al. [38] concluded that in part-solid GGOs, higher kurtosis and smaller mass could significantly differentiate preinvasive lesions from invasive pulmonary adenocarcinoma (IPA). Wei Li et al. [39] found that the voxel count and correlation features were significant differentiators of preinvasive lesions from IPAs and MIAs. Another 2018 study found that pure or mixed lesions (PM) and fractal dimension (FD) were predictors of invasive adenocarcinomas appearing as GGOs [40]. In a recent study [41], a support vector machine (SVM) trained on all the heterogeneity indicators showed higher accuracy (88.1%) in differentiating between indolent and invasive lesions. In the above studies, only two- or three-dimensional original texture features were used, and certain wavelet transform features were excluded. To the best of our knowledge, this is the first study to incorporate the 107 original features and 8 groups of wavelet features (each group containing 93 wavelet features) into radiomic predictor selection for differentiating the invasiveness of GGO. For such high-dimensional data, to avoid overfitting in the prediction, we used MCP-penalized LASSO regression and 10-fold cross-validation to identify relevant variables for the subsequent establishment of the radiomic prediction model. Finally, waveletLHLglcmIdmn and waveletLLLglcmInverseVariance were selected, and their AUCs were 0.724 and 0.678, respectively. However, the combination of the two radiomic predictors showed a better predictive ability for invasiveness (AUC = 0.809).

In the combined prediction model for differentiating the invasiveness of GGO, predictors were re-selected from all 6 clinical features and 567 stable radiomic features; Maximum_diameter, CTR and waveletLHLglcmIdmn were selected. The predictive ability of the combined prediction model (AUC = 0.848) was better than that of the prediction models developed from clinical features alone (AUC = 0.84) or radiomic features alone (AUC = 0.809). The nomogram for individual prediction was constructed from the three predictors. The C-index (0.848 and 0.815 for the training and validation cohorts, respectively) and the calibration curve showed that the nomogram had good predictive ability.
Our study has some limitations. Due to the small sample size, we did not analyse the pure GGO (pGGO) group alone. Although we speculated that Maximum_diameter and some radiomic features might show a good predictive ability for invasiveness in the pGGO group, further confirmation by a large-sample study is needed in the future.

Conclusion
The combined prediction model constructed from clinical and radiomic predictors showed a good predictive ability for invasiveness.

Competing interests
There are no conflicts of interest.

Consent for publication
Not applicable.

Availability of data and materials
On reasonable request, the datasets used during the current study are available from the corresponding author.

Code availability
All R code used in this study is available in the Appendix.

Authors' contributions
Dang YT was responsible for conceiving and designing the study, data analysis, writing of the manuscript and all manuscript revisions. Wang RT and Qian K were responsible for patient data collection. Lu J was responsible for CT data collection and editing of the manuscript. Zhang Y was responsible for project conceptualization, manuscript revisions and editing of the manuscript. All authors read and approved the final manuscript.

Fig. 4. A nomogram incorporating all the significant predictors of invasiveness was constructed with the training cohort. The predictors include Maximum_diameter, CTR and a radiomic feature (waveletLHLglcmIdmn). The points received for each variable value are summed and located on the total points axis, and a line is drawn downward to the prediction axis to determine the probability of invasiveness.