Patient Characteristics
In total, we enrolled 3629 patients diagnosed with ESCC between 2004 and 2015 from the SEER database. As shown in the flow chart (Supplementary Figure 1), we first identified ESCC cases according to the pathological diagnosis, and then excluded patients lacking T stage information (n=20354), N stage information (n=108), M stage information (n=108), survival information (n=8192) or other information (n=4164). Baseline characteristics of the patients are presented in Table 1. Of these patients, 1732 were diagnosed from 2004 to 2009 and 1897 from 2010 to 2015. ESCC was more frequent in patients older than 50 years and in male patients. In addition, the lymph node metastasis rate was 54.42%, while patients with distant metastasis accounted for 24.28%. The median survival was 9 months, ranging from 4 to 23 months.
Grouping of lymph nodes in EC patients
Using the KAPS algorithm, we found that the optimal cut-off dividing the number of examined lymph nodes (LNs) into two groups was 8, and we then performed Kaplan-Meier survival analysis. As shown in Figure 1A, the overall survival (OS) rate differed significantly between the two groups (P<0.001). Likewise, in the analysis of CSS, patients with fewer than 8 examined LNs had a worse prognosis than those with more than 8 examined LNs (Figure 1B). Additionally, we performed Kaplan-Meier survival analysis within each stage and found that patients with >8 examined LNs continued to have better survival (Figure 2). The survival difference between the two groups was statistically significant across the different TNM stages (P<0.0001).
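For readers unfamiliar with the Kaplan-Meier estimator used for these comparisons, the following is a minimal sketch of the product-limit calculation. It is not the code used in this study; the function name `kaplan_meier` and the toy follow-up times are hypothetical and were chosen only for illustration.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with an observed death.

    times  -- follow-up time for each patient (e.g. months)
    events -- 1 if death was observed, 0 if the patient was censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # deaths and censorings both leave the risk set
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            # Product-limit step: multiply by the fraction surviving time t
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        at_risk -= leaving[t]
    return curve


# Toy data (hypothetical, NOT taken from the SEER cohort)
few_lns = kaplan_meier([2, 4, 4, 6, 9, 12], [1, 1, 1, 0, 1, 1])
many_lns = kaplan_meier([5, 9, 14, 20, 23, 30], [1, 0, 1, 1, 0, 0])
```

Each curve is a list of step points; a formal comparison between the two groups would additionally require a log-rank test, which is omitted here.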
EC survival prediction model
Because we had extracted a sufficient number of variables, we performed Lasso regression analysis to select the most suitable characteristics for predicting prognosis and found that age, tumor size, T stage, N stage, M stage and examined LNs were highly associated with survival (Figure 3). Moreover, multivariate Cox regression analysis identified age, tumor size, T stage, N stage, M stage and examined LNs as independent prognostic factors (Figure 4). Patients aged >=70 years, with tumor size >5 cm or with advanced stage had a poorer prognosis, whereas examined LNs >8 was associated with a better prognosis. A nomogram predicting survival was then constructed based on the multivariate Cox analysis (Figure 5). The nomogram showed that T stage contributed the most to prognosis, followed by examined LNs, M stage, tumor size and age, whereas lymph node metastasis had the least effect on prognosis. To use the nomogram, the points assigned to each predictor are read from the 0–10 scale at the top and then summed. The sum is then located on the “Total Points” scale, and a straight line drawn down to each time-point axis gives the corresponding 1-, 3-, and 5-year predictions.
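The point-reading procedure can be sketched in code. The example below assumes the common convention in which the predictor with the widest range of Cox coefficients spans the full 0–10 axis and the other predictors are scaled proportionally. All coefficient values, level labels and function names are hypothetical and do not reproduce the fitted model in Figure 5.

```python
def nomogram_points(coefs, scale=10.0):
    """Map per-level Cox coefficients to points on a 0..scale axis.

    coefs -- {predictor: {level: log hazard ratio}}
    The predictor with the widest coefficient range spans the full axis.
    """
    widest = max(max(lv.values()) - min(lv.values()) for lv in coefs.values())
    return {
        var: {
            level: scale * (beta - min(levels.values())) / widest
            for level, beta in levels.items()
        }
        for var, levels in coefs.items()
    }


def total_points(points, patient):
    """Sum the points of the levels observed for one patient."""
    return sum(points[var][level] for var, level in patient.items())


# Hypothetical coefficients for illustration only (NOT the study's estimates)
HYPOTHETICAL_COEFS = {
    "T stage": {"T1": 0.0, "T4": 0.8},
    "examined LNs": {">8": 0.0, "<=8": 0.4},
    "age": {"<70": 0.0, ">=70": 0.2},
}

points = nomogram_points(HYPOTHETICAL_COEFS)
patient = {"T stage": "T4", "examined LNs": ">8", "age": ">=70"}
score = total_points(points, patient)
```

In a full nomogram the total score would then be mapped to 1-, 3-, and 5-year predictions through the baseline survival function of the fitted Cox model, a step this sketch leaves out.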
Validation and Calibration of Nomogram
In the internal validation, the C-index of the established nomogram for predicting CSS (0.708, 95% CI, 0.678-0.753) was significantly higher than that of the 7th edition TNM staging (0.601, 95% CI, 0.573-0.656, P<0.001) (Table 2). In the external validation, the C-index of the new model (0.687, 95% CI, 0.601-0.734) was also higher than that of the 7th edition TNM staging (0.605, 95% CI, 0.563-0.659, P<0.001) (Table 2). The calibration plots showed good agreement between the nomogram predictions and the actual observations for 1-year, 3-year and 5-year CSS (Figure 6A-6C). In terms of discrimination as measured by the AUC, the nomogram performed better than TNM staging in both the internal cohort (1-year AUC: 0.753 vs 0.653, 3-year AUC: 0.761 vs 0.701, 5-year AUC: 0.783 vs 0.733, P<0.001, Figure 6D-6F) and the external cohort (1-year AUC: 0.761 vs 0.641, 3-year AUC: 0.753 vs 0.687, 5-year AUC: 0.75 vs 0.685, P<0.001, Figure 6G-6I). Finally, to compare the clinical usability of the nomogram and TNM staging, we performed DCA; the results are shown in Figure 7. For predicting 1-year, 3-year and 5-year CSS, the nomogram showed a greater net benefit than TNM staging in both the internal and external cohorts.
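As a reminder of what the two summary measures quantify, the sketch below computes Harrell's C-index and the net benefit used in decision curve analysis. It is an illustrative simplification, not the study's code: the function names and toy values are hypothetical, and the net-benefit function treats the outcome at a fixed time point as binary and ignores censoring.

```python
def harrell_c_index(times, events, risk):
    """Fraction of comparable pairs in which the higher-risk patient dies first."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when i has an observed event before j's time
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # ties in predicted risk count as half
    return concordant / comparable


def net_benefit(outcomes, probs, threshold):
    """Net benefit at one threshold probability (binary outcome, no censoring)."""
    n = len(outcomes)
    tp = sum(1 for y, p in zip(outcomes, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(outcomes, probs) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


# Toy data (hypothetical, NOT taken from the study cohorts)
times = [4, 9, 12, 23, 30]
events = [1, 1, 0, 1, 0]
risk = [0.9, 0.5, 0.6, 0.3, 0.1]

c_index = harrell_c_index(times, events, risk)
nb = net_benefit(events, risk, threshold=0.5)
```

A decision curve is obtained by evaluating the net benefit over a range of threshold probabilities and comparing it with the "treat all" and "treat none" strategies.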