Nomogram Predicting the Prognosis of Patients with Surgically Resected Stage IA Non-small Cell Lung Cancer

The 8th edition of the American Joint Committee on Cancer (AJCC) staging system has limited accuracy in predicting the prognosis of patients with stage IA non-small cell lung cancer (NSCLC). This study aimed to establish and validate two nomograms that predict overall survival (OS) and lung cancer-specific survival (LCSS) in patients with surgically resected stage IA NSCLC. Postoperative patients with stage IA NSCLC diagnosed between 2004 and 2015 in the SEER database were examined. Survival and clinical information was collected according to the inclusion and exclusion criteria. All patients were randomly divided into a training cohort and a validation cohort at a ratio of 7:3. Independent prognostic factors were identified using univariate and multivariate Cox regression analyses, and predictive nomograms were established based on these factors. Nomogram performance was measured using the concordance index (C-index), calibration plots, and decision curve analysis (DCA). Patients were grouped by quartiles of nomogram scores, and survival curves were plotted by Kaplan–Meier analysis. In total, 33,533 patients were included in the study. The nomograms contained 12 prognostic factors for OS and 10 prognostic factors for LCSS. In the validation set, the C-index was 0.652 for predicting OS and 0.651 for predicting LCSS. The calibration curves showed good agreement between the nomogram-predicted and observed probabilities of OS and LCSS. DCA indicated that the clinical value of the nomograms was higher than that of the AJCC 8th edition stage for predicting OS and LCSS. Risk stratification based on nomogram scores revealed statistically significant survival differences and better discrimination than the AJCC 8th edition stage. The nomograms can accurately predict OS and LCSS in surgically resected patients with stage IA NSCLC.


Background
In 2018, lung cancer caused more than 1.7 million deaths worldwide; it remains the most common cancer (11.6% of all cancers) and the leading cause of cancer-related death [1]. Non-small cell lung cancer (NSCLC) is the major pathological type, accounting for 85% of all lung cancers, and is classified by the WHO into three main types: adenocarcinoma, squamous cell carcinoma, and large cell carcinoma [2]. The AJCC staging system is used worldwide and is considered a strong predictor of prognosis in NSCLC.
Even among patients with AJCC 8th edition stage IA NSCLC, the reported 5-year OS ranges from 77% to 92% [3]. Postoperative adjuvant chemotherapy and targeted therapy are not recommended for these patients by the National Comprehensive Cancer Network (NCCN) guidelines. A more effective method of stratifying stage IA NSCLC patients with clearly different prognoses may help identify those who could benefit from postoperative adjuvant therapy. Studies have proposed that independent prognostic factors other than the AJCC staging system could contribute significantly to individualized prediction of overall survival (OS) and lung cancer-specific survival (LCSS) in early-stage NSCLC [4][5][6].
A nomogram is a graphical representation of a mathematical model in which different factors are combined to predict a defined endpoint, and it has been used as a convenient and reliable tool to predict the outcomes of cancer patients [7]. Several studies have shown that nomograms predict prognosis more accurately than the AJCC staging system [8][9][10][11]. However, nomograms for predicting survival outcomes in surgically resected stage IA NSCLC have not been published.
In this study, we aimed to develop clinical nomograms for predicting OS and LCSS in patients with surgically resected NSCLC classified as stage IA under the current staging edition, using the Surveillance, Epidemiology, and End Results (SEER) database.

Ethics Statement
All the data used in our study came from the publicly available SEER database with permission granted to access these research data (SEER*Stat version 8.3.8 username: 16,809-Nov2019). The Nov 2018 Sub (1973-2016 varying) datasets were selected for analysis. Participant informed consent was not required for this study as SEER data is publicly available and de-identified.

Data Source and Study Population
Data were extracted from SEER for patients diagnosed between 2004 and 2015 and were restricted to stage I NSCLC with tumor size ≤ 3 cm. The extracted clinical information included the following: patient ID, age at diagnosis, sex, marital status at diagnosis, laterality, primary site, histologic type, grade, surgery at primary site, scope of regional lymph node surgery, the number of regional lymph nodes examined, the 6th/7th edition TNM stages, T stage, N stage, M stage, Visceral Pleural Invasion, CS extension, Separate Tumor Nodules-Ipsilateral Lung, survival months, vital status recode, first malignant primary indicator, and sequence number.

Data Processing
Patients who met all of the following eligibility criteria were enrolled in the study: (1) age > 18 years; (2) pathologically confirmed NSCLC (histologic types were classified as squamous cell carcinoma, adenocarcinoma, large cell carcinoma, and NSCLC); (3) diagnosis of stage IA NSCLC according to the 8th edition of the American Joint Committee on Cancer (AJCC) staging manual [12], with the 8th edition stage calculated from tumor size, extension, visceral pleural invasion, and the 6th/7th edition TNM stages; and (4) surgery performed by lobectomy, segmentectomy, or wedge resection.
The exclusion criteria were as follows: (1) pathologically confirmed small cell lung cancer or any subtype of sarcoma; (2) age < 18 years; and (3) tumors located in the main bronchus and tumors with positive visceral pleural invasion (VPI), which cannot be classified as stage IA according to the 8th edition AJCC manual.
The study variables obtained from the datasets included the following: age, sex, race, marital status, tumor size, histologic type, grade, first malignancy, malignancy sequence, surgery, primary site, laterality, lymph node Scope Region, and number of lymph nodes examined. Tumor size and the number of lymph nodes examined were treated as numerical variables, and the other variables were treated as factor variables.
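The eligibility rules above can be sketched as a simple record filter. This is only an illustration under stated assumptions: the `Patient` fields, the string labels, and the helper name `is_eligible` are hypothetical and do not correspond to actual SEER field names, and nodal and metastatic status are omitted for brevity.

```python
from dataclasses import dataclass

# Hypothetical labels; real SEER codes for surgery and histology differ.
ALLOWED_SURGERY = {"lobectomy", "segmentectomy", "wedge resection"}
EXCLUDED_HISTOLOGY = {"small cell", "sarcoma"}


@dataclass
class Patient:
    age: int
    histology: str
    tumor_size_cm: float
    main_bronchus: bool   # tumor located in the main bronchus
    vpi_positive: bool    # positive visceral pleural invasion
    surgery: str


def is_eligible(p: Patient) -> bool:
    """Apply the inclusion and exclusion criteria described in the text."""
    if p.age <= 18:                          # inclusion requires age > 18 years
        return False
    if p.histology in EXCLUDED_HISTOLOGY:    # small cell lung cancer or sarcoma
        return False
    if p.tumor_size_cm > 3.0:                # data restricted to tumors <= 3 cm
        return False
    if p.main_bronchus or p.vpi_positive:    # not stage IA under the 8th edition
        return False
    return p.surgery in ALLOWED_SURGERY      # lobectomy, segmentectomy, or wedge


if __name__ == "__main__":
    example = Patient(66, "adenocarcinoma", 2.1, False, False, "lobectomy")
    print(is_eligible(example))
```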

Statistical Analysis
The whole eligible patient cohort was randomly divided into a training set and a validation set at a ratio of 7:3. The training set was used to develop the model, and the validation set was used to evaluate its prediction accuracy. In the training set, prognosis was investigated with a univariate Cox regression model followed by a multivariate Cox regression model to identify the independent prognostic factors. In addition, the variance inflation factor (VIF) and correlation coefficient were calculated among all independent prognostic factors to detect multicollinearity. When two independent variables were found to be collinear, only one was included among the final factors. Finally, the independent prognostic factors used to construct the Cox regression model and nomogram for stage IA NSCLC patients were identified.
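A random 7:3 division of this kind can be sketched as follows. The study was conducted in R and does not describe its exact sampling procedure, so the function name `split_cohort`, the seed, and the shuffle-and-cut approach are assumptions made purely for illustration.

```python
import random


def split_cohort(patient_ids, train_fraction=0.7, seed=2020):
    """Shuffle patient identifiers and cut them into training and validation sets.

    The seed is arbitrary and only makes the sketch reproducible.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


if __name__ == "__main__":
    train, valid = split_cohort(range(33533))
    print(len(train), len(valid))
```

An exact 70% cut of 33,533 patients gives 23,473 and 10,060, whereas the study reports 23,589 and 9944, so its sampling evidently did not cut at exactly 70%.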
During validation of the nomograms, the total points for each patient were calculated according to the established nomograms, and Cox regression was then performed in the validation cohort using the total points as the predictor. The performance of the nomograms was evaluated by the concordance index (C-index) [13] and calibration curves. Decision curve analysis (DCA), performed with the Decision Curve package [14], was used to evaluate the clinical benefit and utility of the nomograms compared with the 8th edition AJCC TNM stage alone in predicting 1-, 3-, and 5-year survival. To assess the discriminative power of the nomograms, all patients in the training and validation sets were divided into four subgroups according to quartiles of the nomogram-derived total scores. Survival analyses were then carried out using the Kaplan–Meier method, and differences in OS and LCSS were examined using the log-rank test.
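To make the C-index concrete, the following is a minimal sketch of Harrell's concordance index for right-censored data. It is not the authors' implementation (the study used R packages); the function name and the toy data are hypothetical, and pairs with tied survival times are simply skipped, which is a simplification.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: fraction of usable pairs in which the patient with the
    higher risk score has the shorter observed survival.

    times       : follow-up time per patient
    events      : 1 if the event (death) was observed, 0 if censored
    risk_scores : higher score means higher predicted risk
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a usable pair
        for j in range(n):
            if times[i] < times[j]:  # i failed strictly before j was last seen
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    times = [5, 10, 15, 20]
    events = [1, 1, 0, 1]
    scores = [300, 250, 260, 200]
    print(concordance_index(times, events, scores))  # 4 of 5 usable pairs: 0.8
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking of survival times by the score.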
All statistical computations were done using the statistical programming language R (R version 3.6.3, https://www.r-project.org). Differences in the distribution of demographic and clinical characteristics between the training and validation cohorts were evaluated using the CBCgrps package. The rms, foreign, and survival packages were used to construct the nomograms. The survminer package was used to estimate cumulative survival by the Kaplan–Meier method and to compare survival curves by the log-rank test. A Spearman correlation matrix (R package PerformanceAnalytics) and variance inflation factors (vif function in R package car) were computed to evaluate possible collinearity among explanatory variables; a VIF > 5 or a correlation coefficient > 0.7 was taken to indicate collinearity between variables [15,16]. P < 0.05 was considered statistically significant.
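The collinearity rule can be illustrated with a small sketch. It assumes a simplified pairwise setting: with only two predictors, the VIF reduces to 1 / (1 - r²), whereas the vif function in car regresses each predictor on all the others. The function names are hypothetical, and the value 0.9 is the correlation reported in the results for first malignant versus malignant sequence.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of 1-based positions pos+1..end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def pairwise_vif(r):
    """VIF for a two-predictor model, where R^2 equals r squared."""
    if abs(r) >= 1.0:
        return math.inf
    return 1.0 / (1.0 - r * r)


def flag_collinear(r, vif_cutoff=5.0, corr_cutoff=0.7):
    """Apply the thresholds stated in the text: VIF > 5 or |r| > 0.7."""
    return abs(r) > corr_cutoff or pairwise_vif(r) > vif_cutoff


if __name__ == "__main__":
    print(round(pairwise_vif(0.9), 3), flag_collinear(0.9))
```

Under this simplification a correlation of 0.9 gives a VIF of about 5.26, so the pair exceeds both thresholds.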

Patient Characteristics
A total of 33,533 patients with stage IA NSCLC (AJCC 8th edition) were finally included based on the inclusion and exclusion criteria and were randomly divided into a training set of 23,589 patients and a validation set of 9944 patients (Fig. 1). Table 1 summarizes the baseline demographic and clinical characteristics of the study population.

Cox Regression Analysis
Multivariate Cox regression analysis was performed using the significant prognostic factors identified in univariate analysis. In the multivariate Cox regression analyses, 13 factors were significantly associated with OS (Fig. 2B), while 10 variables were identified as independent prognostic factors for LCSS (Fig. 3B). Among the 13 factors associated with OS, the correlation coefficient between first malignant and malignant sequence was 0.9 (eFigure 1A). The variable first malignant was therefore removed because it was less statistically significant than malignant sequence. There was no significant correlation among the 10 selected independent variables for LCSS in the training set (eFigure 1B). Finally, 12 and 10 variables were included in the construction of the nomograms for OS and LCSS, respectively. The Cox regression results for OS are shown in Fig. 2, and those for LCSS are shown in Fig. 3.

Construction and Validation of the Nomogram
The nomogram for OS was constructed from the above 12 variables (Fig. 4A), and the nomogram for LCSS was constructed from 10 variables (Fig. 5A). The nomogram-derived total score for OS and LCSS for each patient in the training and validation sets was calculated by adding the scores of each selected factor. When Cox regression analysis was performed in the validation set using the total score as the only factor, the C-index was 0.652 (95% CI, 0.643-0.662) for OS and 0.651 (95% CI, 0.639-0.664) for LCSS. The C-index of the nomograms showed relatively good performance and was significantly superior to that of the AJCC 8th edition stage in both the training set and the validation set (P < 0.001) (eTable 1). The final multivariate Cox regression model was also superior to the AJCC 8th edition stage Cox regression model in terms of discrimination for OS (Fig. 4D and E) and LCSS (Fig. 5D and E) in both the training set and the validation set. The calibration curves for 1-, 3-, and 5-year OS (Fig. 4B and C) and LCSS (Fig. 5B and C) showed favorable calibration of the nomograms in the training set and validation set. In the validation set, the 1-, 3-, and 5-year DCA for both OS (Fig. 4F-H) and LCSS (Fig. 5F-H) indicated that the clinical value of the nomograms was higher than that of the AJCC 8th edition stage. Patients were divided into four groups according to quartiles of the nomogram scores. The log-rank test revealed significant differences among the four groups in both OS (P < 0.001) and LCSS (P < 0.001), showing greater discriminative power than the AJCC 8th edition stage groups (Fig. 6A-H).
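The scoring and grouping steps described here, summing per-factor points and splitting patients by score quartiles, can be sketched as below. The points table is entirely hypothetical: the real point values must be read from the published nomogram axes (Fig. 4A and Fig. 5A), and only three example factors are shown.

```python
import bisect
import statistics

# Hypothetical points per factor level, invented for illustration only.
POINTS = {
    "sex": {"female": 0, "male": 20},
    "surgery": {"lobectomy": 0, "segmentectomy": 15, "wedge resection": 30},
    "grade": {"I": 0, "II": 10, "III": 25},
}


def total_points(patient):
    """Sum the points assigned to each factor level for one patient."""
    return sum(POINTS[factor][patient[factor]] for factor in POINTS)


def quartile_groups(scores):
    """Assign each score to a risk group 0..3 using the three quartile cut points."""
    cuts = statistics.quantiles(scores, n=4)
    groups = [bisect.bisect_right(cuts, s) for s in scores]
    return groups, cuts


if __name__ == "__main__":
    patient = {"sex": "male", "surgery": "wedge resection", "grade": "II"}
    print(total_points(patient))
    groups, cuts = quartile_groups([1, 2, 3, 4, 5, 6, 7, 8])
    print(groups, cuts)
```

Group 3 then corresponds to the highest-score, highest-risk quartile, whose upper bound is the maximum observed score.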

Prognosis of Adjuvant Chemotherapy
From our data set we extracted the patients who had only one primary lung cancer, together with information on whether they received chemotherapy. Of the 16,864 patients included, 488 received adjuvant chemotherapy and 16,376 did not. According to their nomogram scores, these patients were divided into four risk stratification groups as previously described for OS (eTable 2) and LCSS (eTable 3). No significant difference in OS was observed between patients with and without adjuvant chemotherapy in the two highest nomogram score groups (264.69-Max and 238.35-264.69) (eFigure 2C-D). However, in the other nomogram score groups, patients without adjuvant chemotherapy had superior OS to those who received it (eFigure 2A-B). Additionally, patients without adjuvant chemotherapy showed significantly superior LCSS in every nomogram score subgroup (eFigure 2E-H). These results remained consistent in propensity matching analysis (eFigure 3A-H). They suggest that surgically resected stage IA patients who received adjuvant chemotherapy had a poorer prognosis than those who did not.

Discussion
In this study, we evaluated OS and LCSS in patients with surgically resected stage IA NSCLC diagnosed between 2004 and 2015 in the SEER database. Two nomograms were built to predict the probability of OS and LCSS as easy-to-use clinical tools. The good predictive power and clinical utility of the two nomograms were supported by the C-index, calibration curves, and DCA curves.
In the final Cox regression models, there were 12 independent prognostic factors for OS and 10 for LCSS. Compared with the LCSS nomogram, the OS nomogram additionally included malignant sequence and age range. This means that patients with a prior cancer diagnosis and older patients were at greater risk of death overall, but not at greater risk of lung cancer-specific death. As previously reported, increased treatment intolerance due to prior cancer treatment is likely to worsen the survival of patients with early-stage lung cancer [17]. Because of declining health, elderly patients tolerate surgery and resist disease less well than younger patients, which results in a poorer prognosis [18]. However, in our study, age had no significant influence on LCSS in patients with stage IA NSCLC. This may be explained by the excellent overall prognosis of stage IA NSCLC, with a median LCSS of over 10 years. Interestingly, our study found that married patients with stage IA NSCLC had superior OS and LCSS to unmarried patients. The reason may be that married patients have a better psychosocial environment in which to fight lung cancer. It has been reported that marriage can significantly reduce mortality in women and that social and psychological factors can influence the prognosis of patients with cancer [19]. Notably, a primary site with an overlapping lesion was associated with poorer survival. Similarly, Dariusz Dziedzic also reported that lung cancer with adjacent lobe invasion is a separate group of tumors that lies between stages T2 and T3 because of its poor survival rate [20]. Recently, several studies demonstrated that lobectomy compared with sublobar resection, and segmentectomy compared with wedge resection, both showed significantly superior OS and LCSS in stage IA NSCLC, which is in line with our findings [21].
In our study, we found that both LN Scope Region and the number of LN examined should be considered important prognostic factors, even for stage IA NSCLC without positive LN. This finding is consistent with a published study in which the authors recommended that at least 16 LN be resected in NSCLC patients [22]. However, LN Scope Region was not considered in their analysis.
We additionally analyzed OS and LCSS by dividing the patient population into four subgroups according to quartiles of nomogram scores. The high-score subgroups could be considered high-risk stage IA patients, who might benefit from adjuvant therapy. However, to our surprise, we noticed that the better the prognosis of a subgroup, the more adversely adjuvant chemotherapy affected OS and LCSS. One recent study found that patients with surgically resected EGFR-mutated stage IB NSCLC could benefit significantly from adjuvant targeted therapy based on osimertinib [23]. Whether stage IA patients identified as high risk could receive a similar benefit remains to be determined.
To our knowledge, this is the first large retrospective cohort study to construct nomograms for NSCLC patients with AJCC 8th edition stage IA disease. However, this study has several limitations. First, the study was retrospective, which introduces various biases, including selection bias, loss to follow-up, and missing data. Second, other known prognostic risk factors, such as information on chemotherapy, lymphovascular invasion, spread through air spaces, and the latest classification of adenocarcinoma, were hard to obtain from the SEER database. Finally, there was no external validation independent of the SEER database, and the generalizability of the prognostic model to the global population is still unclear.

Conclusions
In summary, we developed practical nomograms with good prediction accuracy and discrimination that can help clinicians make personalized survival predictions for patients with AJCC 8th edition stage IA NSCLC. These nomograms can help to identify high-risk stage IA patients after surgical resection.