Prognostic nomograms for predicting overall survival of stage I lung cancer patients: a large population-based study

Background: The postoperative prognosis of patients with stage I non-small cell lung cancer (NSCLC) undergoing lymph node dissection is heterogeneous. We therefore sought to construct a novel survival prediction model for these patients. Methods: Using data from the Surveillance, Epidemiology, and End Results (SEER) program, we identified independent prognostic markers and incorporated them into a nomogram. The nomogram was then externally validated in an independent cohort of patients. The performance of the survival prediction model was assessed by concordance index (C-index), calibration plots, and risk subgroup classification. Results: The nomogram showed good discrimination, with a C-index of … (95% CI, … to …), and good calibration. Calibration plots demonstrated optimal consistency between nomogram-predicted and actual survival.


Introduction
Non-small cell lung cancer (NSCLC) remains a major health problem worldwide [1]. Although smoking control measures have been introduced in recent years, the incidence of NSCLC has not decreased significantly [2,3], and the number of deaths from NSCLC rises every year [4]. Although patients with stage I NSCLC can be cured by surgery, prognosis still varies from patient to patient. At present, there is no accurate tool for personalized evaluation of prognosis in stage I NSCLC, which is an urgent clinical issue.
The latest NSCLC treatment guidelines released by the National Comprehensive Cancer Network (NCCN) recommend surgical treatment with lymph node dissection or sentinel lymph node biopsy for patients with stage I NSCLC. The prognosis of patients with NSCLC is related to many factors. Some studies [5,6] have shown that patient age may be an independent prognostic factor. Other studies [7][8][9][10][11] have shown that racial differences, possibly arising from genetic differences, may also independently affect prognosis. Pathological type, pathological grade, and clinical stage are all independent factors affecting prognosis [12]. The mode of operation is also an independent factor affecting prognosis [13].
Given the number of relevant factors, an accurate prediction model that includes all of them is needed. The purpose of this study was to develop a valid but simple prediction tool for assessing the prognosis of stage I NSCLC using only characteristics that are easily available at the start of follow-up.

Patients and Methods

Patients
Patients with stage I NSCLC who underwent surgery were eligible for this study. Patients were included regardless of whether surgery involved lymph node dissection, no lymph node dissection, or sentinel lymph node biopsy. Patients with unknown or missing information were excluded. A total of 300410 patients with early lung cancer who underwent surgery between 2003 and 2015 were identified in the Surveillance, Epidemiology, and End Results (SEER) database. The variables analyzed were age, gender, race, histology, differentiation grade, stage, lymph node dissection mode, and months of survival.

Statistical analysis
All data including demographic, disease, and treatment characteristics were expressed as count (%).
Statistical analysis was performed using R software (version 3.6.2; https://www.R-project.org).
The study included two cohorts, a training cohort and a verification cohort. The training cohort was used to establish the Cox proportional hazards model, the nomogram, and the survival curves [14]. The verification cohort was used to assess the accuracy of the model. Multivariate Cox proportional hazards regression was used to determine independent predictors of mortality [15], and the prediction model was built from these independent predictors. P < .05 was considered statistically significant. Calibration curves were plotted to assess the calibration of the nomogram. To quantify its discrimination performance, Harrell's C-index was measured [16], and a relatively corrected C-index was calculated in the verification cohort. Survival curves for the different therapies in stage IA and IB, younger and older age, and squamous cell carcinoma and adenocarcinoma were estimated by the Kaplan-Meier method.
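As a rough illustration of the discrimination measure named above, the following Python sketch computes Harrell's C-index for right-censored data. It is not the R code used in this study: the function name `harrell_c_index`, the handling of tied follow-up times (such pairs are simply skipped), and the toy data are assumptions made for illustration only.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data (illustrative sketch).

    times: follow-up time per patient
    events: 1 if death was observed, 0 if censored
    risk_scores: higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: pairs with tied times are skipped
        # a is the patient with the shorter follow-up time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher predicted risk died earlier
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data (not from SEER): months, event flag, model risk score
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_scores = [0.9, 0.4, 0.6, 0.1]
print(harrell_c_index(toy_times, toy_events, toy_scores))  # 4 of 5 comparable pairs concordant: 0.8
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk.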

Patients' characteristics
The training cohort comprised 23394 patients with stage I NSCLC who received surgery during the study period: 20421 (87.3%) underwent lymph node dissection, 2860 (12.2%) did not undergo lymph node dissection, and 113 (0.5%) underwent sentinel lymph node biopsy. The characteristics of patients in the training and verification cohorts are listed in Table 1.

Development of an individualized prognosis model
The results of the multivariate Cox proportional hazards regression including age, sex, race, histology, differentiation grade, stage, and lymph node dissection mode are given in Table 2. A model incorporating these predictors was developed and presented as the prognosis nomogram (Figure 1).
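To make the link between a Cox model and a nomogram concrete, the sketch below shows one common convention for converting log-hazard coefficients into nomogram points: the predictor with the widest coefficient range spans 0 to 100 points, and every other level is scaled proportionally. This is an assumption about the general technique, not a description of how Figure 1 was produced. The function name `nomogram_points` and all coefficient values are hypothetical and are not taken from Table 2.

```python
def nomogram_points(coefs):
    """Scale Cox log-hazard coefficients to a 0-100 point axis (illustrative).

    coefs: {variable: {level: beta}}, with beta = 0 for the reference level.
    The variable with the widest range of beta receives 100 points at its
    highest-risk level; all other levels are scaled proportionally.
    """
    ranges = {
        var: max(levels.values()) - min(levels.values())
        for var, levels in coefs.items()
    }
    widest = max(ranges.values())
    if widest == 0:
        raise ValueError("all coefficients are equal; nothing to scale")
    return {
        var: {
            level: round(100 * (beta - min(levels.values())) / widest)
            for level, beta in levels.items()
        }
        for var, levels in coefs.items()
    }


# Hypothetical coefficients chosen only to show the scaling (NOT from Table 2)
hypothetical_coefs = {
    "age": {"<65": 0.0, ">=65": 0.5},
    "sex": {"female": 0.0, "male": 0.25},
    "grade": {"well": 0.0, "undifferentiated": 1.0},
}
print(nomogram_points(hypothetical_coefs))
```

Under this convention, a level's points are proportional to its log hazard ratio, so a predictor scoring 50 points carries half the log-hazard effect of the predictor scoring 100.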

Apparent performance of the prognosis nomogram
In the training cohort, the C-index for overall survival (OS) prediction was 0.666 (95% CI, 0.652 to 0.681). The calibration plots for the probability of survival at 3 and 5 years after surgery showed optimal agreement between nomogram prediction and actual observation (Figure 2A and 2B).
In the verification cohort, the C-index of the nomogram for predicting OS was 0.658 (95% CI, 0.650 to 0.665), and the calibration curves showed good agreement between prediction and observation for the probability of 3- and 5-year survival (Figure 2C and 2D).

Survival curves
The OS curves for the different cohorts are shown in Figure 3. In the total sample, there was a significant difference between lymph node dissection (LND) and non-lymph node dissection (NLND), with better survival after LND than after NLND. There was no significant difference between LND and sentinel lymph node biopsy (SLNB), whereas the difference between SLNB and NLND was significant. Further analysis was performed to identify the effects of other factors (Figure 3). Patients were divided into six subgroups based on age, histology, and stage. In the stage IB, older age (≥65), and squamous cell carcinoma (SCC) subgroups, there was no significant difference between the SLNB and NLND curves. In the stage IA, younger age (<65), and adenocarcinoma subgroups, survival was better after SLNB than after NLND. In all subgroups, there was no significant difference between the SLNB and LND curves, but there was a significant difference between the LND and NLND curves.
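The survival curves above are Kaplan-Meier estimates. As a minimal sketch of how such a curve is computed, the Python function below implements the product-limit estimator for a single group. The function name `kaplan_meier` and the toy follow-up data are hypothetical. The sketch does not perform the significance testing between groups reported above.

```python
def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) survival estimate (illustrative sketch).

    times: follow-up time per patient
    events: 1 if death was observed, 0 if censored
    Returns a list of (time, survival probability) at each distinct death time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after time t
        n_at_risk -= leaving
    return curve


# Hypothetical toy data (months of follow-up, event flag), not from SEER
toy_curve = kaplan_meier([6, 12, 12, 24, 36], [1, 1, 0, 0, 1])
for month, prob in toy_curve:
    print(month, round(prob, 3))
```

Censored patients contribute to the risk set up to their last follow-up but do not cause a step in the curve, which is why the estimator only changes at observed death times.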

Discussion
Nomograms are now widely used as prognostic tools in oncology and medicine [17][18][19][20]. They offer user-friendly digital interfaces, increased accuracy, and more easily understood prognoses, which aid clinical decision making [20]. The purpose of our study was to establish a nomogram to predict the prognosis of stage I NSCLC and to guide clinical decision making.
We developed and validated a novel prediction tool for prognosis among stage I NSCLC patients using only seven easily available variables. Incorporating demographic, disease, and treatment risk factors into an easy-to-use nomogram facilitates individualized prediction of prognosis in stage I NSCLC. This study provides a relatively accurate prognostic tool for these patients. Validation demonstrated good discrimination and calibration; in particular, the C-index in the large validation cohort indicates that the nomogram can be applied widely and accurately.
In this study, age was an independent predictor: the risk in the older age group (≥65) was higher than that in the younger age group (<65). In the nomogram, the weight score of the older group was 48, while that of the younger group was 0. This demonstrates that, in stage I NSCLC, younger patients have a better prognosis.
Race was also included in the model, although it was not statistically significant (P=0.864) in the Cox proportional hazards regression; it was included only to make the prediction model more comprehensive. One possible reason for the lack of significance is that white patients account for a very large proportion (84.4%) of the sample, which may affect the statistics. Another study [21] based on the SEER database reported differences in prognosis among races for advanced lung cancer with bone metastasis. However, there is still no convincing evidence of a difference in prognosis among races in early lung cancer.
Gender was another independent predictor. The prognosis of male patients was worse than that of female patients. In the nomogram, the weight score of the male group was 42, while that of the female group was 0. This may be because men are more likely to smoke than women, and smoking is an independent predictor of lung cancer prognosis.
Pathological grade was also an independent predictor. It was divided into four grades, from well differentiated to undifferentiated. The weight score of undifferentiated tumors in the nomogram was 100, indicating that undifferentiated patients had the worst prognosis. The scores of poorly, moderately, and well differentiated tumors were 68, 32, and 0, respectively, showing that well differentiated tumors had the best prognosis. This is consistent with the NCCN lung cancer treatment guidelines. Pathological classification was another independent predictor. In the nomogram, the weight score was 0 for adenocarcinoma and 32 for squamous cell carcinoma, suggesting that the prognosis of the adenocarcinoma group was better than that of the squamous cell carcinoma group.
Clinical stage was also an independent predictor: the prognosis of the IB group was worse than that of the IA group. In the nomogram, the weight score of the IB group was 32 and that of the IA group was 0. Tumors in the IB group are larger than those in the IA group. According to one study, patients in the IB group may have tumor micrometastasis, which would explain their worse prognosis.
The mode of lymph node dissection was an important independent predictor. Current treatment guidelines for stage I NSCLC recommend lymph node dissection or sentinel lymph node biopsy. In the nomogram, the weight score of the lymph node dissection group was 0 and that of the non-lymph node dissection group was 52, demonstrating that the prognosis of the lymph node dissection group was better than that of the non-lymph node dissection group. In the Kaplan-Meier survival curves, the prognosis of the lymph node dissection group was likewise better than that of the non-lymph node dissection group in every subgroup, and the differences were statistically significant. However, cancer treatment is becoming increasingly standardized and personalized, so lymph node dissection guided by sentinel lymph node biopsy may become a more useful approach. In our study, the sentinel lymph node biopsy group did not differ significantly from the lymph node dissection group, which suggests that the two approaches may have similar effects on prognosis. Some studies [22] have shown that patients with positive sentinel lymph node biopsy results followed by lymph node dissection may have a better prognosis. Direct lymph node dissection without sentinel lymph node biopsy may cause more complications and affect long-term prognosis [23,24]. Based on these results, sentinel lymph node biopsy after surgical resection of the tumor may be a better approach.
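The weight scores quoted in the preceding paragraphs can be combined into a total nomogram score for an individual patient. The Python sketch below does only that arithmetic, using the scores stated in this Discussion. The dictionary layout, key names, and the function `total_points` are hypothetical. Race and sentinel lymph node biopsy are omitted because their scores are not given in the text, and converting total points into 3- or 5-year survival probability would require the axes of Figure 1, so it is not attempted.

```python
# Weight scores as quoted in the Discussion; key names are illustrative only.
# Race and sentinel lymph node biopsy are omitted: their scores are not stated.
POINTS = {
    "age": {"<65": 0, ">=65": 48},
    "sex": {"female": 0, "male": 42},
    "grade": {"well": 0, "moderate": 32, "poor": 68, "undifferentiated": 100},
    "histology": {"adenocarcinoma": 0, "squamous": 32},
    "stage": {"IA": 0, "IB": 32},
    "lymph_node": {"dissection": 0, "none": 52},
}


def total_points(patient):
    """Sum the nomogram weight scores for one patient (illustrative sketch)."""
    missing = set(POINTS) - set(patient)
    if missing:
        raise KeyError(f"missing variables: {sorted(missing)}")
    return sum(POINTS[var][patient[var]] for var in POINTS)


# Hypothetical patient: man aged 65 or older, moderately differentiated
# squamous cell carcinoma, stage IB, with lymph node dissection
example_patient = {
    "age": ">=65",
    "sex": "male",
    "grade": "moderate",
    "histology": "squamous",
    "stage": "IB",
    "lymph_node": "dissection",
}
print(total_points(example_patient))  # 48 + 42 + 32 + 32 + 32 + 0 = 186
```

A higher total corresponds to a worse predicted prognosis. For the same hypothetical patient, omitting lymph node dissection would add 52 points.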

Conclusion
This study developed a novel nomogram with relatively good accuracy to help clinicians assess the prognosis of stage I NSCLC patients when starting follow-up. For a better prognosis, sentinel lymph node biopsy is recommended for patients with stage I NSCLC.

Notes: Removed means lymph node dissection; None means no lymph node dissection; Sentry means sentinel lymph node biopsy.

Figure 1. Constructed nomogram for predicting survival in patients with stage I non-small cell lung cancer undergoing lymph node dissection. SC, squamous cell carcinoma.