A total of 639 consecutive patients with lung cancer at Nanfang Hospital (Guangzhou, Guangdong, China) between January 2010 and December 2019 were identified retrospectively. The medical records and histopathology reports were retrospectively analyzed. The inclusion criteria were (1) primary NSCLC treated by radical resection with routine systematic lymph node dissection and (2) a post-surgical diagnosis of NSCLC with or without LNM. The exclusion criteria were (1) non-NSCLC, such as mixed carcinoma, polymorphic carcinoma, atypical carcinoid, and mucoepidermoid carcinoma; (2) radiotherapy or chemotherapy before surgery; (3) incomplete resection or no mediastinal node dissection; and (4) missing key information. Finally, 152 eligible patients were enrolled. The patient selection process is shown in Fig. 1. The institutional review board at Nanfang Hospital approved the study, and the requirement for informed consent was waived.
Based on literature reviews [15, 18, 20–24], the opinions of clinicians and pathologists on factors that may increase the risk of lymph node metastasis, and our existing data, we identified 17 potential predictors covering basic patient information, clinicopathologic parameters, and immunohistochemical features. All predictors are categorical variables except the maximum tumor diameter, which is continuous. The definitions of the predictors are given in Appendix A1.
The outcome was lymph node status in NSCLC patients. All patients underwent anatomical lung resection and systematic nodal dissection performed by thoracic surgeons. Experienced pulmonary pathologists histologically assessed all resected tumor specimens and nodal samples, and the final diagnosis was made according to the WHO classification.
Model Development and Validation
We randomly divided the complete original dataset into a training set (80%) and a validation set (20%). The analysis consisted of four main stages: data preprocessing, feature selection, construction of prediction models, and model evaluation. In the first stage, data preprocessing comprised standardization and handling of missing values. Any record with a missing categorical variable was deleted, and missing values of continuous variables were filled by random forest imputation. In the second stage, for the selection models, the least absolute shrinkage and selection operator (LASSO) algorithm with 10-fold cross-validation was used to select the features related to the outcome; for the full models, all features were used. In the third stage, prediction models were built on the chosen features with three algorithms: random forest, support vector machine, and penalized logistic regression. Finally, we evaluated the models using the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, specificity, calibration curves, and decision curve analysis (DCA). The calibration curve evaluates the degree of calibration, that is, the agreement between the predicted probabilities and the observed outcomes. DCA is a method for assessing the clinical usefulness of a prediction model; it helps identify high-risk patients for intervention and low-risk patients who can avoid over-treatment.
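The four stages above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the study's actual code: the data are synthetic (generated to match only the sample size and the number of predictors), L1-penalized logistic regression stands in for the LASSO step, all hyperparameters are placeholders rather than the tuned values in Appendix A2, and missing-value handling is omitted because the synthetic data are complete.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: synthetic stand-in for the study data (152 patients, 17 predictors).
X, y = make_classification(n_samples=152, n_features=17, n_informative=5,
                           random_state=0)

# Random 80/20 split into training and validation sets.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Stage 1: standardize with statistics estimated on the training set only.
scaler = StandardScaler().fit(X_train)
X_train_s, X_val_s = scaler.transform(X_train), scaler.transform(X_val)

# Stage 2: LASSO-style selection, here L1-penalized logistic regression
# with the penalty strength chosen by 10-fold cross-validation.
lasso = LogisticRegressionCV(Cs=10, cv=10, penalty="l1", solver="liblinear",
                             scoring="roc_auc", random_state=0)
lasso.fit(X_train_s, y_train)
selected = np.flatnonzero(lasso.coef_.ravel() != 0)
if selected.size == 0:
    # Fall back to all features if the penalty removed every coefficient.
    selected = np.arange(X.shape[1])

# Stage 3: the three algorithms (placeholder hyperparameters).
models = {
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "svm": SVC(kernel="rbf", probability=True, random_state=0),
    "penalized_logistic": LogisticRegression(C=1.0, max_iter=1000),
}

def validation_auc(model, columns):
    """Fit a fresh copy of `model` on the given columns; return validation AUC."""
    fitted = clone(model).fit(X_train_s[:, columns], y_train)
    prob = fitted.predict_proba(X_val_s[:, columns])[:, 1]
    return roc_auc_score(y_val, prob)

# Stage 4: compare "selection" models with "full" models by AUC.
all_columns = np.arange(X.shape[1])
results = {
    name: {"selection": validation_auc(model, selected),
           "full": validation_auc(model, all_columns)}
    for name, model in models.items()
}

for name, aucs in results.items():
    print(f"{name}: selection AUC={aucs['selection']:.3f}, "
          f"full AUC={aucs['full']:.3f}")
```

Fitting the scaler and the LASSO step on the training set only keeps the validation set untouched until the final evaluation, which is one common way to avoid optimistic estimates.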
Statistical analysis was conducted with R software (version 4.0.2; http://www.Rproject.org) and Python (version 3.7.0; https://www.python.org). The models were implemented in Python using the sklearn library. The ROC curves, calibration curves, and DCA were generated using the "ggplot2", "rms", and "dca" packages, respectively. The tuned parameters of each algorithm are listed in Appendix A2. The chi-square test was used to compare categorical data between two groups, and the two independent samples t-test was used for continuous data. All statistical tests were two-sided, with the significance level set at .05, except for the DeLong test, for which it was set at .01. Discrimination was assessed with the AUC, accuracy, sensitivity, and specificity; calibration was assessed with calibration curves; and clinical usefulness was assessed with DCA.
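For readers unfamiliar with these measures, the sketch below computes accuracy, sensitivity, specificity, and the net benefit used in DCA at a single threshold probability. It uses the standard net benefit definition, TP/n - (FP/n) x pt/(1 - pt), where pt is the threshold probability. The function name `classification_metrics` and the toy labels and probabilities are hypothetical and are not taken from the study.

```python
import numpy as np

def classification_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, sensitivity, specificity, and DCA net benefit at one threshold."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_prob) >= threshold  # classify as positive at or above pt
    tp = np.sum(y_pred & y_true)
    fp = np.sum(y_pred & ~y_true)
    tn = np.sum(~y_pred & ~y_true)
    fn = np.sum(~y_pred & y_true)
    n = y_true.size
    return {
        "accuracy": (tp + tn) / n,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        # Net benefit weighs false positives by the odds of the threshold.
        "net_benefit": tp / n - (fp / n) * threshold / (1 - threshold),
    }

# Hypothetical toy data: 1 = lymph node metastasis, 0 = no metastasis.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
metrics = classification_metrics(y_true, y_prob, threshold=0.5)
print(metrics)
```

A decision curve is obtained by repeating the net benefit calculation over a range of threshold probabilities and comparing the model with the treat-all and treat-none strategies.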