Characteristics of the training and validation cohorts
In total, 396 patients with SPNs were included in this retrospective study: 295 patients from Sun Yat-sen University Cancer Center formed the training cohort (clinical, CT image, and laboratory data are presented in Supplement Table 1), and the other 101 patients, from Henan Tumor Hospital, were used for external validation (Supplement Table 2). The mean (SD) age of patients in the training cohort was 57.0 (11.0) years; 192 patients (65.1%) were men, and 189 patients (64.1%) were diagnosed with MSPNs, including 163 (86.2%) adenocarcinomas, 17 (9.0%) squamous cell carcinomas, and 9 (4.8%) others. In the external validation cohort, the numbers of adenocarcinomas, squamous cell carcinomas, and others were 60 (91.0%), 3 (4.5%), and 3 (4.5%), respectively.
To select potential predictors of SPN malignancy, we used LASSO logistic regression analysis. The coefficient trajectory of each variable is shown in Figure 1A. Ten-fold cross-validation was employed for model construction, and the confidence interval under each λ is presented in Figure 1B. According to the 1-SE criterion, we selected λ = 0.044 as the optimal value for the model, which retained 11 potential predictors with non-zero coefficients (age, previous cancer history, diameter, spiculation, calcification, pleural stretch, VC, FEV1, DLCO1, CEA, and NSE) from the 63 candidate variables in the training cohort. The clinical and laboratory data for these selected predictors in the training, validation, and external validation cohorts are presented in Table 1.
Construction and evaluation of the novel prediction model
To estimate each individual patient’s malignancy risk, a risk score was calculated for each patient with the following formula:
Risk score = -1.137 + (0.036*age) + (0.380*previous cancer history) + (0.195*diameter) + (0.016*spiculation) - (0.290*calcification) + (0.026*pleural stretch) - (0.168*VC) - (0.236*FEV1) + (0.052*DLCO1) + (0.018*CEA) + (0.004*NSE).
Subsequently, we used the following formula to calculate the probability of malignancy: probability (P) = e^(risk score) / (1 + e^(risk score)), where e is the base of the natural logarithm. Values for the continuous variables were taken from the medical records; each binary variable (previous cancer history, spiculation, calcification, pleural stretch) equals 1 if the feature is present and 0 otherwise.
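As an illustration, the risk score and logistic probability described above can be computed as follows (a minimal sketch; the function and variable names are ours, not from the study, and input units follow the medical record):

```python
import math

# Coefficients from the fitted model; binary predictors (previous cancer
# history, spiculation, calcification, pleural stretch) are coded 1 if
# present and 0 otherwise.
def risk_score(age, prev_cancer, diameter, spiculation, calcification,
               pleural_stretch, vc, fev1, dlco1, cea, nse):
    return (-1.137 + 0.036 * age + 0.380 * prev_cancer + 0.195 * diameter
            + 0.016 * spiculation - 0.290 * calcification
            + 0.026 * pleural_stretch - 0.168 * vc - 0.236 * fev1
            + 0.052 * dlco1 + 0.018 * cea + 0.004 * nse)

def malignancy_probability(score):
    # Logistic transform: P = e^score / (1 + e^score)
    return math.exp(score) / (1.0 + math.exp(score))
```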
Finally, the calibration of the model was assessed with the Hosmer-Lemeshow (HL) test, which indicated good calibration (P = 0.964, Supplement Figure 1A). The AUC of the novel model was 0.768 (95% CI: 0.716-0.815). A predicted probability of 0.58 was ultimately selected as the cut-off point, and probabilities above this value were considered to indicate malignant disease. In the training cohort, the model's sensitivity was 78.84% (72.3%-84.4%), specificity 61.32% (51.4%-70.6%), positive likelihood ratio (LR+) 2.04, and negative likelihood ratio (LR-) 0.35.
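The likelihood ratios reported here follow directly from sensitivity and specificity; a quick check (our own helper, not the authors' code):

```python
def likelihood_ratios(sensitivity, specificity):
    # LR+ = sensitivity / (1 - specificity); LR- = (1 - sensitivity) / specificity
    return (sensitivity / (1.0 - specificity),
            (1.0 - sensitivity) / specificity)

# Training-cohort values: sensitivity 78.84%, specificity 61.32%
lr_pos, lr_neg = likelihood_ratios(0.7884, 0.6132)  # ≈ 2.04 and 0.35
```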
Validation of the novel prediction model
The performance of the novel prediction model was assessed in the external validation cohort. Using the formula constructed in the training cohort, a risk score and probability of malignancy were calculated for each patient in the validation set, and the discrimination and calibration of the model were assessed with ROC analysis, the calibration curve, and the HL test. In the external validation cohort, the AUC was 0.718 (95% CI: 0.620-0.803), and the sensitivity, specificity, LR+, and LR- of the model were 81.82%, 40.00%, 1.36, and 0.45, respectively. In addition, the calibration curve and HL test showed that the new model predicted MSPNs with high accuracy in the external validation cohort (P = 0.950, Supplement Figure 1B).
Assessment of the performance of our model, the PKUPH model, the Shanghai model, and the Mayo model for SPN screening using ROC analysis, DCA, NRI, and IDI
The data for the training, validation, and external validation cohorts were entered into our proposed model, the PKUPH model, the Shanghai model, and the Mayo model to generate the respective ROC curves (Figure 2 and Table 2). In the training cohort, the AUCs of the four models were 0.768, 0.659, 0.728, and 0.602, respectively; the AUC of our model was higher than those of the PKUPH model (P < 0.001), the Shanghai model (P = 0.180), and the Mayo model (P < 0.001). In the external validation cohort, the AUCs of the four models were 0.718, 0.674, 0.632, and 0.562, respectively; again, the AUC of our model was higher than those of the PKUPH model (P = 0.404), the Shanghai model (P = 0.048), and the Mayo model (P = 0.007).
DCA was employed to evaluate the clinical utility of the four models in the training and external validation cohorts (Figure 3). The x-axis of the decision curve is the threshold probability at which each model classifies patients as having MSPNs versus BSPNs, and the y-axis shows the net clinical benefit of decisions made at that threshold. The decision curves of the treat-all and treat-none schemes served as references. Our model (red) showed a higher overall net benefit than the PKUPH model (black), the Shanghai model (blue), and the Mayo model (brown) in both the training and external validation cohorts, indicating reasonably good clinical utility across the cohorts.
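The net benefit plotted on the y-axis is conventionally computed with the standard decision-curve formula (a sketch of that general formula, not the study's own code):

```python
def net_benefit(tp, fp, n, pt):
    # Net benefit at threshold probability pt:
    # NB = TP/n - (FP/n) * pt / (1 - pt)
    # where TP and FP are counts of true/false positives at that threshold.
    return tp / n - (fp / n) * (pt / (1.0 - pt))

def net_benefit_treat_all(n_events, n_nonevents, pt):
    # Treat-all reference curve: every patient is classified malignant,
    # so TP = all events and FP = all non-events.
    n = n_events + n_nonevents
    return net_benefit(n_events, n_nonevents, n, pt)
```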
The improvement in predictive accuracy of our proposed model relative to the PKUPH, Shanghai, and Mayo models was estimated by calculating the NRI and IDI in the training and external validation cohorts (Table 3). Comparing our model with the PKUPH, Shanghai, and Mayo models, the NRIs in the training and external validation cohorts were 0.177 (P = 0.005) and -0.035 (P = 0.726), 0.127 (P = 0.058) and 0.027 (P = 0.769), and 0.396 (P < 0.001) and 0.249 (P = 0.008), respectively. The corresponding IDIs were -0.019 (P = 0.433) and -0.043 (P = 0.341), -0.076 (P = 0.005) and -0.017 (P = 0.709), and 0.112 (P < 0.001) and 0.086 (P < 0.001), respectively. These results indicate that the new model can compensate for deficiencies of the existing models in predicting MSPNs.
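NRI and IDI compare the predicted probabilities of two models separately in events and non-events. A category-free sketch of both measures (illustrative helpers under our own naming, not the study's implementation):

```python
from statistics import mean

def idi(new_ev, old_ev, new_ne, old_ne):
    # Integrated discrimination improvement: gain in mean predicted
    # probability among events minus the gain among non-events.
    return (mean(new_ev) - mean(old_ev)) - (mean(new_ne) - mean(old_ne))

def continuous_nri(new_ev, old_ev, new_ne, old_ne):
    # Category-free NRI: net proportion of events whose predicted risk
    # moved up, plus net proportion of non-events whose risk moved down.
    def net_up(new, old):
        up = sum(a > b for a, b in zip(new, old))
        down = sum(a < b for a, b in zip(new, old))
        return (up - down) / len(new)
    return net_up(new_ev, old_ev) - net_up(new_ne, old_ne)
```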
Comparison of the sensitivity, specificity, positive likelihood ratio, and negative likelihood ratio of the four models analyzed in this study
The sensitivity, specificity, LR+, and LR- of the four models were compared in the training and external validation cohorts (Supplement Table 3). The thresholds of our model and the Shanghai model were 0.58 and 0.67, respectively, while the thresholds of the PKUPH and Mayo models were taken from literature reports as 0.463 and 0.10, respectively. In the training cohort, our model had a sensitivity of 78.84% (95% CI: 72.3%-84.4%), specificity of 61.32% (95% CI: 51.4%-70.6%), LR+ of 2.04 (95% CI: 1.7-2.4), and LR- of 0.35 (95% CI: 0.2-0.5). For the PKUPH model, sensitivity was 85.19% (95% CI: 79.3%-90.4%), specificity 34.91% (95% CI: 25.9%-44.8%), LR+ 1.31 (95% CI: 1.0-1.7), and LR- 0.42 (95% CI: 0.3-0.6). For the Shanghai model, sensitivity was 70.9% (95% CI: 63.9%-77.3%), specificity 87.74% (95% CI: 79.9%-93.3%), LR+ 2.16 (95% CI: 1.7-2.3), and LR- 0.45 (95% CI: 0.3-0.6). For the Mayo model, sensitivity was 26.46% (95% CI: 20.3%-33.3%), specificity 87.74% (95% CI: 79.9%-93.3%), LR+ 2.16 (95% CI: 1.7-2.8), and LR- 0.84 (95% CI: 0.5-1.4). The specificity of our model was better than that of the PKUPH model, whereas its sensitivity was lower; conversely, our model's sensitivity exceeded those of the Shanghai and Mayo models, but its specificity was worse. Results in the external validation cohort were inconsistent with these findings. Overall, comparison of the four models at their respective thresholds in the two cohorts was inconclusive: each model has its own merits and demerits in predicting MSPNs.
Building and validating combined predictive nomogram
To combine the merits of each model in predicting MSPNs, a combined nomogram was constructed from our model, the PKUPH model, the Shanghai model, and the Mayo model to predict the malignancy of SPNs in the training and external validation cohorts (Figure 4A, B, respectively). Each model's prediction is assigned a point value: locate our model's risk score and draw a line straight upward to the "Points" axis to read the points associated with that score; repeat the process for each model, sum the points, and locate the total on the "Total Points" axis. Finally, draw a line straight down to find the patient's risk of malignancy. The AUC of the combined nomogram was 0.789 in the training set and 0.735 in the external validation set, both higher than those of the individual models. Calibration curves for the probability of malignancy were then used to assess the agreement between predicted and observed outcomes in the training and external validation cohorts (Figure 4C, D, respectively), and showed a good match between the nomogram's predictions and actual observations. These results demonstrate improved discrimination of SPNs using the combined nomogram.
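The point-reading procedure amounts to linearly rescaling each model's score onto a shared "Points" axis and summing. A schematic sketch (the axis ranges and 0-100 scale here are hypothetical; the real scaling is determined by the fitted nomogram):

```python
def to_points(value, lo, hi, max_points=100.0):
    # Linearly map a model score from its range [lo, hi] onto the
    # 0..max_points "Points" axis of the nomogram.
    return (value - lo) / (hi - lo) * max_points

def total_points(scores_and_ranges):
    # Sum the points contributed by each model's score;
    # scores_and_ranges is a list of (score, (lo, hi)) pairs.
    return sum(to_points(v, lo, hi) for v, (lo, hi) in scores_and_ranges)
```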
Correlations between the novel prediction model and the PKUPH, Shanghai, and Mayo models
Figure 5 and Supplement Table 4 show the correlations between the novel prediction model and the PKUPH, Shanghai, and Mayo models in the training cohort (A) and external validation cohort (B). Pearson's correlation coefficients (PCCs) were computed to determine the interrelationships among the models. The new prediction model was significantly and positively correlated with the PKUPH model (PCC: training cohort, 0.669, P < 0.001; external validation cohort, 0.586, P < 0.001), the Shanghai model (PCC: training cohort, 0.613, P < 0.001; external validation cohort, 0.665, P < 0.001), and the Mayo model (PCC: training cohort, 0.429, P < 0.001; external validation cohort, 0.379, P < 0.001), indicating that our results have credible predictive value.
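The PCC reported here is the standard sample Pearson correlation; for reference, it can be computed as follows (our own helper, shown for clarity):

```python
from math import sqrt

def pearson(x, y):
    # Sample Pearson correlation coefficient between two equal-length series.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)
```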