Development and Validation of a Nomogram to Predict the Overall Survival of Patients with Stage I Non-Small Cell Lung Cancer After Surgery

Background: This study aimed to establish an effective prognostic nomogram for patients with stage I non-small cell lung cancer (NSCLC) undergoing surgery. Methods: Stage I NSCLC patients at Taizhou Hospital of Zhejiang Province from October 2010 to June 2014 were included. Clinical, pathological and laboratory variables ascertained before surgery were selected through least absolute shrinkage and selection operator (LASSO) and Cox regression analysis, and a nomogram model was established. The nomogram model was validated using stage I NSCLC patients from the same institution between July 2014 and December 2015. Results: The training cohort included 274 patients. From 46 potential predictors, the independent risk factors for overall survival (OS) of patients with stage I NSCLC were age, CA125, and tumor diameter. The C-index of the nomogram model was 0.759 (95% confidence interval (CI), 0.666–0.852). The area under the ROC curve (AUC) of the nomogram was 0.776. The patients in the low-risk subgroup had significantly better survival than those in the high-risk subgroup (log-rank, p < 0.001). The AUC in the validation cohort was 0.809, and the C-index was 0.707 (95% CI, 0.572–0.842). Conclusion: We developed a nomogram model that includes clinical, pathological and laboratory variables to predict the OS of stage I NSCLC patients.
node dissection with microscopically radical resection (R0); histopathologically confirmed NSCLC. Exclusion criteria: second primary tumor; history of malignant tumors before NSCLC surgery; tumor-node-metastasis (TNM) stage II to IV disease; radiotherapy after surgery.


Introduction
Lung cancer is the most common cause of cancer death in the world, and 85% of patients have non-small cell lung cancer (NSCLC) [1]. With the increasing use of high-resolution computed tomography (CT) and low-dose CT for lung cancer screening in high-risk groups, an increasing number of early lung cancers have been diagnosed. Surgery is still the most effective treatment for early NSCLC patients [2].
Nomograms have been widely used to predict the prognosis of cancer patients because they can integrate multiple prognostic factors into one model and thus better stratify cancer patients [3][4][5]. Although there are many nomograms for the prognosis of NSCLC patients after surgery, there are still few for stage I NSCLC patients. The five-year survival rate for stage I NSCLC patients generally ranges from 73% to 90% [6,7], and their survival after surgery is still difficult to assess. Therefore, nomograms are needed to help clinicians identify high-risk groups and carry out effective long-term follow-up and treatment.
Our goal was to establish a nomogram containing clinical and pathological data and laboratory indicators to predict survival of stage I NSCLC patients.

Materials And Methods
Continuous variables were expressed as the mean (standard deviation) or median (interquartile range) and compared with nonparametric tests (Mann-Whitney U). Categorical variables were described as percentages and compared with the χ² test. A two-sided p-value of < 0.05 was considered statistically significant. Forty-eight variables were screened by least absolute shrinkage and selection operator (LASSO) regression. The most predictive covariates were selected at the minimum criterion (λ min). The selected variables were then analyzed by multivariate Cox regression.
The most meaningful variables were entered into the nomogram model [8,9]. The performance of the nomogram was assessed by discrimination (Harrell's concordance index) and calibration (calibration plots). The performance of the nomogram model in the training cohort and the validation cohort was compared using the receiver operating characteristic (ROC) curve. The total score of each patient was calculated with the established nomogram model, and risk stratification was then performed with X-tile software. Survival curves were depicted using the Kaplan-Meier method, and the log-rank test was used to compare survival between groups. R software version 4.0.2 was used to develop the nomogram and perform the decision curve and LASSO regression analyses.
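As an illustration of the discrimination measure named above, the following is a minimal sketch of Harrell's concordance index for right-censored data. It is not the authors' R code; the function name and the toy data are assumptions made for illustration only. A pair of patients is treated as comparable when the one with the shorter follow-up time had an observed death, and ties in the risk score count as half-concordant.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times:       follow-up time for each patient
    events:      1 if death was observed, 0 if censored
    risk_scores: higher value means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Pair (i, j) is comparable only if patient i died
            # strictly before patient j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # model ranks the earlier death as riskier
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the study.
    toy_times = [5, 10, 15, 20]
    toy_events = [1, 1, 0, 1]
    toy_scores = [90, 60, 80, 70]
    print(harrell_c_index(toy_times, toy_events, toy_scores))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported C-index values should be read.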

Characteristics of the Training Cohort
The training cohort comprised 274 stage I NSCLC patients who had undergone surgery. The median survival time was 45.3 months (range 2.0-78.5). A total of 28 patients died, and the overall mortality rate was 10.2%. The 1-, 3-, and 5-year survival rates were 96.72%, 92.34%, and 89.78%, respectively. A total of 151 (55.1%) patients were male, only 8% had vascular invasion, and 15% had pleural invasion. There were 119 smokers (43.4%). The tumor size was 2.0±0.9 cm. There were 58 patients with squamous cell carcinoma (21.2%), 112 patients (40.9%) with tumors located in the left lung, and 87 patients (31.8%) who underwent postoperative chemotherapy (Table 1). Laboratory findings of the training cohort are presented in Table 2.

Variable selection
Forty-six variables in the training cohort were included in the LASSO regression. Five variables remained significant predictors of survival of stage I NSCLC patients after surgery: age, CA125, pathological type, vascular invasion and tumor size (Supplementary Figure 1). Cox regression analysis showed that three of these variables were independent risk factors for the OS of stage I NSCLC patients. These variables included age (HR=3.849; 95% confidence interval (CI),

Nomogram construction
Three independent risk factors for predicting the OS of patients were found, and a nomogram was constructed (Figure 1A). The C-index of the nomogram was 0.759 (95% CI, 0.666-0.852). The calibration curves for the survival rate of stage I NSCLC patients at 1, 3, and 5 years showed good agreement between the nomogram predictions and the actual observations (Figure 2A/2B/2C). We also compared the predictive ability for OS between tumor size, CA125, the nomogram and the TNM staging system. The C-index of the nomogram was 0.759, which was significantly higher (P < 0.001) than that of CA125 (0.503), tumor size (0.665), and the TNM staging system (0.709). Decision curve analysis (DCA) and clinical impact curve analysis showed that the nomogram had a better overall net benefit than tumor size and TNM staging (Figure 1B
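For readers unfamiliar with decision curve analysis, the net benefit at a given threshold probability is commonly defined as the true-positive rate minus the false-positive rate weighted by the odds of the threshold. The sketch below applies that standard definition to a binary outcome. It is a simplification and an assumption on our part: it ignores censoring, the function names and toy values are hypothetical, and it is not the authors' analysis code.

```python
def net_benefit(predicted_risk, outcomes, threshold):
    """Net benefit of treating every patient whose predicted risk >= threshold.

    predicted_risk: model-predicted probability of the event per patient
    outcomes:       1 if the event occurred, 0 otherwise (binary, no censoring)
    threshold:      threshold probability, strictly between 0 and 1
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(outcomes)
    pairs = list(zip(predicted_risk, outcomes))
    true_pos = sum(1 for p, y in pairs if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in pairs if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - (false_pos / n) * (threshold / (1.0 - threshold))


def treat_all_net_benefit(outcomes, threshold):
    """Reference strategy: treat every patient regardless of predicted risk."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1.0 - prevalence) * (threshold / (1.0 - threshold))


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the study.
    risks = [0.05, 0.10, 0.30, 0.40, 0.60]
    died = [0, 0, 0, 1, 1]
    for pt in (0.1, 0.2, 0.3):
        print(pt, net_benefit(risks, died, pt), treat_all_net_benefit(died, pt))
```

A decision curve plots these values over a range of thresholds; a model is useful at thresholds where its net benefit exceeds both the treat-all line and zero (treat none).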

Risk stratification of the nomogram in predicting OS
Using the nomogram, we calculated the total score of each patient in the training cohort. The patients were divided into two groups by X-tile software: the low-risk group (<98.8) and the high-risk group (≥98.8) (Supplementary Figure 2). In the training cohort, the survival time of the low-risk group was 47.1±11.7 months, and the survival time of the high-risk group was 41.9±20.7 months (P=0.027) (Figure 3C). The patients in the low-risk subgroup had significantly better survival than those in the high-risk subgroup (log-rank, p < 0.001). In the validation cohort, the survival time of the low-risk group was 41.4±5.7 months, and that of the high-risk group was 37.6±9.9 months (P=0.008) (Figure 3D). In the validation cohort, the high-risk group also had worse survival than the low-risk group (log-rank, p < 0.001).
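The stratification step can be sketched as follows: patients are split at the reported cutoff of 98.8 nomogram points, and a Kaplan-Meier curve is estimated for each group. Only the cutoff comes from the text; the function names and the toy patients are hypothetical, and the sketch omits the log-rank test and the X-tile cutoff search.

```python
CUTOFF = 98.8  # nomogram-score cutoff reported in the text (from X-tile)


def split_by_risk(scores, times, events, cutoff=CUTOFF):
    """Return (low_risk, high_risk) lists of (time, event) pairs."""
    rows = list(zip(scores, times, events))
    low = [(t, e) for s, t, e in rows if s < cutoff]
    high = [(t, e) for s, t, e in rows if s >= cutoff]
    return low, high


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(event time, survival probability), ...]."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored patients leave the risk set
    return curve


if __name__ == "__main__":
    # Hypothetical toy patients: nomogram score, months of follow-up, death flag.
    scores = [60, 75, 98.8, 120, 130, 90]
    months = [70, 65, 30, 12, 40, 50]
    died = [0, 0, 1, 1, 0, 1]
    low, high = split_by_risk(scores, months, died)
    print("low risk: ", kaplan_meier([t for t, _ in low], [e for _, e in low]))
    print("high risk:", kaplan_meier([t for t, _ in high], [e for _, e in high]))
```

A patient scoring exactly 98.8 falls in the high-risk group, matching the "≥98.8" definition above.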

Discussion
In this study, we established a nomogram model for the postoperative survival of patients with stage I NSCLC. The nomogram model incorporated three variables: CA125, age and tumor size. The nomogram model had good discrimination ability: the C-index was 0.759 in the training cohort and 0.707 in the validation cohort. The DCA and ROC curve analyses further showed that the nomogram model had better net clinical benefits. We also divided patients into low-risk and high-risk groups, which can better guide clinicians in the follow-up and treatment of stage I NSCLC patients after surgery.
In recent years, there have been a large number of nomogram studies on tumor prognosis, including breast cancer [10,11], colorectal cancer [12], gastric cancer [5] and so on. These studies showed that the nomogram model had a better predictive value for the prognosis of tumors than the traditional TNM staging system. There are many nomogram models for the prognosis of lung cancer. For example, a nomogram was established to predict brain metastasis in NSCLC patients after radical surgery [13]. There are also many nomogram models for predicting disease-free survival after surgery for early NSCLC (pathological stage I and stage II) [14][15][16]. However, there are still few nomogram models for predicting the survival rate of stage I NSCLC patients that include clinical data, pathological data, and laboratory indicators.
In this study, 48 variables, including clinical and pathological data and laboratory indicators, were screened through LASSO regression. Cox regression analysis found three independent risk factors for postoperative survival in stage I NSCLC patients: age, tumor size, and CA125. Many reports have shown that age is a risk factor for poor postoperative prognosis in lung cancer patients, whether in early or advanced lung cancer [17][18][19]. Tumor size is a major factor in the prognosis of NSCLC patients, especially stage I NSCLC patients [20]. The TNM staging system also considers tumor size, but only in categories defined by 1 cm, 2 cm, and 3 cm cutoffs. We believe that the actual tumor size has a great influence on the OS of stage I NSCLC patients. Therefore, this paper analyzed tumor size as a continuous variable, which may have better predictive value for the prognosis of stage I NSCLC patients and is consistent with the study of Cao et al. [21]. Laboratory indicators are generally not included in other nomograms, but they are easy to obtain and easy to monitor, so our nomogram also includes laboratory indices such as serum tumor markers. CA125, a common tumor marker, has great value in predicting the prognosis of tumors. Some studies have also shown that elevated CA125 is an independent risk factor for the survival of NSCLC patients [22,23].
This study combined clinical, pathological and laboratory variables to establish a nomogram model to predict the postoperative survival of stage I NSCLC patients. The calibration curve showed good agreement between the predicted survival rates of stage I NSCLC patients at 1, 3, and 5 years after surgery and the actual observations. The C-index also showed the good predictive ability of the nomogram model (0.759 for the training cohort and 0.707 for the validation cohort). Both DCA and the clinical impact curve indicated that the nomogram model had better overall net benefits than the TNM staging system.
The 5-year survival rate of patients with stage I NSCLC was relatively high. Therefore, not all patients need adjuvant therapy and frequent follow-up. In this study, the risk stratification of patients with stage I NSCLC was based on the score of the nomogram model. For stage I NSCLC patients in the low-risk group, regular follow-up is enough, while for patients in the high-risk group, more careful follow-up examination and adjuvant therapy may be required. This could also help clinicians make individualized treatment decisions.
This study also had some shortcomings, as follows: 1. This study was a single-center study, and there may be regional biases. 2. The relatively small sample size limits the statistical analysis. Although the nomogram model was validated internally and externally, data from different regions are still needed for further validation. 3. This study failed to include some important genetic factors, such as EGFR mutations.

Conclusions
In summary, we developed and validated a nomogram model of postoperative survival of stage I NSCLC patients that includes clinical, pathological and laboratory indicators. With this model, clinicians could better identify patients with poor prognosis and make more personalized treatment decisions.

Declarations
Ethics approval and consent to participate
The study was approved by the Ethics Committee of Taizhou Hospital of Zhejiang Province.

Consent for publication
In cases where images are entirely unidentifiable and there are no details on individuals reported within the manuscript, consent for publication of images may not be required.

Availability of data and materials
The authors declare that the data supporting the findings of this study are available within the article and additional supporting files.

Competing interests
The authors declare that they have no competing interests.

Funding
The authors declare that they have no funding.

Authors' contributions
YX, WL, and LH contributed to conception and design of the study. CL organized the database. CS performed the statistical analysis. YX wrote the first draft of the manuscript. WL and CS wrote sections of the manuscript. All authors contributed to manuscript revision, read, and approved the submitted version.