Ethics statement
All data used in this study came from the publicly available SEER database, and permission was granted to access these research data (SEER*Stat version 8.3.8; username: 16809-Nov2019). The Nov 2018 Sub (1973–2016 varying) datasets were selected for analysis. Participant informed consent was not required for this study because SEER data are publicly available and de-identified.
Data Source and Study Population
The study cohort was restricted to patients in the SEER data set who were diagnosed between 2004 and 2015 with stage I NSCLC and a tumor size ≤ 3 cm. The extracted clinical information included the following: patient ID, age at diagnosis, sex, marital status at diagnosis, laterality, primary site, histologic type, grade, surgery at primary site, scope of regional lymph node surgery, number of regional lymph nodes examined, 6th/7th edition TNM stage, T stage, N stage, M stage, visceral pleural invasion, CS extension, separate tumor nodules in the ipsilateral lung, survival months, vital status recode, first malignant primary indicator, and sequence number.
Data Processing
Patients who met all of the following eligibility criteria were enrolled in the study: (1) age > 18 years; (2) pathologically confirmed NSCLC (histologic types were classified as squamous carcinoma, adenocarcinoma, large cell carcinoma, and NSCLC); (3) diagnosis of stage IA NSCLC according to the 8th edition TNM classification of the American Joint Committee on Cancer (AJCC) staging manual[12]; (4) surgery performed by lobectomy, segmentectomy, or wedge resection. The AJCC 8th edition stage was calculated from tumor size, extension, visceral pleural invasion, and the 6th/7th edition TNM stages. The exclusion criteria were: (1) pathologically confirmed small cell lung cancer or any subtype of sarcoma; (2) age < 18 years; (3) tumors located in the main bronchus and tumors with positive visceral pleural invasion (VPI), neither of which can be classified as stage IA according to the 8th edition AJCC manual.
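The stage IA assignment described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the patient is already N0 M0, uses the AJCC 8th edition size cut-offs for T1a/T1b/T1c (≤ 1 cm, > 1 to 2 cm, > 2 to 3 cm), and ignores special cases such as minimally invasive adenocarcinoma. The function name and arguments are hypothetical.

```python
from typing import Optional


def stage_ia_substage(size_cm: float,
                      main_bronchus: bool,
                      vpi_positive: bool) -> Optional[str]:
    """Return 'IA1', 'IA2' or 'IA3', or None if the tumor is not stage IA.

    Assumes N0 M0 disease (an assumption of this sketch).
    """
    # Main bronchus involvement or visceral pleural invasion makes the
    # tumor at least T2, so it cannot be stage IA.
    if main_bronchus or vpi_positive:
        return None
    # Stage IA is limited to tumors of at most 3 cm.
    if size_cm <= 0 or size_cm > 3:
        return None
    if size_cm <= 1:
        return "IA1"  # T1a
    if size_cm <= 2:
        return "IA2"  # T1b
    return "IA3"      # T1c
```

Patients for whom the function returns None would fall under the exclusion criteria rather than into any of the three substages.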
The study variables obtained from the datasets included: age, sex, race, marital status, tumor size, histologic type, grade, first malignancy, malignancy sequence, surgery, primary site, laterality, scope of regional lymph node surgery, and number of regional lymph nodes examined. Tumor size and the number of lymph nodes examined were treated as numerical variables, and all other variables were treated as factor variables.
Statistical Analysis
The whole eligible patient cohort was randomly divided into a training set and a validation set at a ratio of 7:3. The training set was used to develop the model, and the validation set was used to evaluate the prediction accuracy of the model. In the training set, survival prognosis was investigated with a univariate Cox regression model followed by a multivariate Cox regression model to identify the independent prognostic factors. In addition, the variance inflation factor (VIF) and correlation coefficient were calculated between all independent prognostic factors to identify multicollinearity. When two independent variables were found to be collinear, only one was included among the final factors. Finally, the independent prognostic risk factors used to construct the Cox regression model and nomogram for stage IA NSCLC patients were identified.
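As an illustration of the 7:3 random split, a minimal sketch is given below. The function name, the seed value, and the use of patient IDs as the unit of splitting are assumptions made for the example; the source does not state how the randomization was implemented.

```python
import random


def split_cohort(patient_ids, train_fraction=0.7, seed=2020):
    """Randomly split patient IDs into training and validation sets.

    The seed is an arbitrary value chosen here for reproducibility.
    """
    rng = random.Random(seed)
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    # Number of patients assigned to the training set (70% by default).
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


train_ids, validation_ids = split_cohort(range(1000))
```

Fixing the seed keeps the split reproducible, so the same patients are used for model development and for validation on every run.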
During validation of the nomograms, the total points for each patient were calculated according to the established nomograms, and Cox regression was then performed in the validation cohort using the total points as the predictor. The performance of the nomogram was evaluated by the concordance index (C-index)[13], calibration curves, and the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. ROC curves were used to assess the sensitivity and specificity of the nomogram. Finally, decision curve analysis (DCA), performed with the Decision Curve package[14], was used to evaluate the clinical benefit and utility of the nomogram compared with the AJCC 8th TNM stage alone in predicting 1-, 3-, and 5-year survival rates. To clarify the descriptive power of the nomogram, all patients in the training and validation sets were divided into four subgroups according to quartiles of the nomogram-derived total scores. Survival analyses were then carried out using the Kaplan-Meier method, and differences in OS and LCSS were examined using the log-rank test.
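To make the C-index concrete, the following is a minimal sketch of Harrell's concordance index for right-censored data. It is not the implementation used in the study: the function name is hypothetical, higher scores are assumed to mean higher risk, and pairs with tied survival times are simply skipped.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times       -- follow-up time for each patient
    events      -- 1 if the event was observed, 0 if censored
    risk_scores -- higher score is assumed to mean higher risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is only usable if the earlier time is an event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # earlier event had the higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk, which is why the index is used to compare the nomogram with staging alone.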
In addition, we compared the prognostic predictive power of our established nomogram with that of the AJCC 8th stage (IA1, IA2, and IA3) using Cox regression analysis and the C-index. We also compared the OS and LCSS benefit between chemotherapy and non-chemotherapy for stage IA patients in all subgroups defined by nomogram scores. To minimize potential selection bias, propensity score matching (PSM) was performed to balance the confounding factors between the two groups using the nearest-neighbor matching method with a 1:1 ratio. Patients were matched on the following variables: age, sex, marital status, year of diagnosis, race, primary site, histologic type, grade, surgery, laterality, scope of regional lymph node surgery, number of regional lymph nodes examined, tumor size, and AJCC 8th stage.
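The 1:1 nearest-neighbor matching step can be sketched as below. This is a simplified greedy version without replacement and without a caliper, written only to illustrate the idea; it is not the algorithm of any specific package. It assumes the propensity scores have already been estimated (for example by logistic regression on the matching variables), and the function name and patient IDs are hypothetical.

```python
def match_nearest(treated, controls):
    """Greedy 1:1 nearest-neighbor matching on propensity scores.

    treated, controls -- dicts mapping patient ID to propensity score
    Returns a list of (treated_id, control_id) pairs.
    """
    available = dict(controls)  # controls not yet matched
    pairs = []
    # Process treated patients from the highest to the lowest score.
    ordered = sorted(treated.items(), key=lambda kv: kv[1], reverse=True)
    for t_id, t_score in ordered:
        if not available:
            break  # no controls left to match
        # Pick the unmatched control with the closest propensity score.
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        pairs.append((t_id, c_id))
        del available[c_id]  # matching without replacement
    return pairs


chemo = {"t1": 0.80, "t2": 0.30}
no_chemo = {"c1": 0.75, "c2": 0.35, "c3": 0.10}
matched = match_nearest(chemo, no_chemo)
```

Unmatched controls (here "c3") are dropped, so the survival comparison is restricted to pairs with similar propensity scores.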
All statistical computations were done using the statistical programming language R (R version 3.6.3, https://www.r-project.org). Differences in the distribution of demographic and clinical characteristics between the training and validation cohorts were evaluated using the CBCgrps package. The rms, foreign, and survival packages were used to construct the nomogram. The survminer package was used to estimate cumulative survival by the Kaplan-Meier method and to compare survival curves by the log-rank test. A Spearman correlation matrix (R package PerformanceAnalytics) and variance inflation factors (vif function in R package car) were computed to evaluate possible collinearity among explanatory variables. Values of VIF > 5 or a correlation coefficient > 0.7 were considered indications of collinearity between variables[15, 16]. The MatchIt package was used to perform the PSM between the chemotherapy and non-chemotherapy groups. P < 0.05 was considered statistically significant.
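The collinearity rule (VIF > 5 or correlation coefficient > 0.7) can be illustrated with the sketch below. It is only an illustration under stated assumptions: all function names are hypothetical, the Spearman coefficient is computed as the Pearson correlation of average ranks, the VIF is taken from the standard definition 1 / (1 - R²) with R² supplied by the caller, and applying the 0.7 cut-off to the absolute value of the coefficient is our assumption.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[pos]]):
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def vif_from_r_squared(r_squared):
    """VIF of a predictor, given R^2 from regressing it on the others."""
    return 1.0 / (1.0 - r_squared)


def is_collinear(rho, vif, rho_cutoff=0.7, vif_cutoff=5.0):
    """Flag collinearity using the thresholds stated in the text."""
    return abs(rho) > rho_cutoff or vif > vif_cutoff
```

When a pair of variables is flagged, only one of the two would be carried forward into the final model, as described in the statistical analysis.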