The development and external validation of a nomogram predicting overall survival of gastric cancer patients with inadequate lymph nodes based on an international database

Inadequate sampling of lymph nodes could lead to stage migration and indicate a poor prognosis for gastric cancer after curative surgery. Some emerging novel predictors and the application of a nomogram could increase the accuracy of survival prediction. An international database regarding gastric cancer was employed as the primary cohort. The patients with inadequate (< 30) lymph nodes (LN) were analyzed by Cox proportional hazards regression. Based on the selected model, a nomogram was plotted and calibrated against an external validation database. A total of 1109 patients were included in the primary cohort, and there were 6584 patients in the validation cohort. There were significant differences regarding the clinical characteristics between the two cohorts. The model containing age, T stages, N stages, metastatic lymph nodes (mLN), and the number of total LN retrieved (TLN) showed superiority over the conventional TNM stages. Harrell's concordance index of the nomogram and TNM stages was 0.744 and 0.717, respectively. The external validation demonstrated a good concordance with the nomogram-predicted survival. The nomogram including age, T stages, N stages, mLN, and TLN had a better accuracy than the conventional TNM staging system in predicting overall survival for gastric cancer patients with inadequate (< 30) LN.


Introduction
Gastric cancer remains the fifth most prevalent cancer and the third leading cause of cancer death worldwide according to the latest epidemiological data [1]. A radical gastrectomy is crucial for treating gastric cancer, and extended peri-gastric lymphadenectomy (D2 surgery) is considered a standard procedure for treating locally advanced stages [2]. When applying an enlarged surgical field, the number of lymph nodes (LN) retrieved could theoretically be increased, meaning that the LN count could be regarded as a criterion for surgical quality control [3]. A lower LN count could imply a deficiency in the surgical treatment and lead to an unfavorable prognosis [4,5].
Additionally, the number of LN retrieved is a prerequisite for applying the American Joint Committee on Cancer (AJCC) TNM classification system for gastric cancer [6], which is the most universally accepted prognostic indicator. According to the AJCC system, at least 16 regional nodes should be removed or assessed pathologically, and the removal or evaluation of more nodes (≥ 30) is desirable [6]. A substandard LN count could lead to N stage migration [7,8] and subsequent underestimation of the survival risk. To increase the predictive value of LN-related parameters, some novel predictors have been studied and introduced in clinical practice. According to multiple studies [9][10][11][12], predictors including the ratio of metastatic LN, the ratio of negative to positive LN, and the log odds of positive lymph nodes (LODDS) have been confirmed to be superior to the conventional AJCC N stage in predicting survival.
Besides the attempts to refine the predictors related to LN status, parameters related to tumor volume [13,14] have been investigated, and they have also shown some value in survival prediction. We carried out this study based on the hypothesis that adding information to the conventional AJCC staging system might help increase its predictive accuracy. An international database was employed to minimize bias, and a nomogram was used to maximize the practical clinical value of the model.

Primary database and external validation database
The primary database was comprised of the IMIGASTRIC database and the PUCH database. The former is an international database dedicated to the international study of minimally invasive surgery for gastric cancer. The database was established and is run by St. Mary's Hospital of Terni and "La Sapienza" University of Rome. The latter is a prospectively collected and maintained database for gastric cancer, established and run by Peking University Cancer Hospital. As the largest database providing information on cancer statistics, the Surveillance, Epidemiology, and End Results (SEER) database was chosen as the external validation dataset. The study was performed in accordance with the Declaration of Helsinki and approved by the Institutional Review Board of Peking University Cancer Hospital (2017KT102).

Inclusion and exclusion criteria
The definition of inadequate lymph nodes was taken from the 8th edition of the AJCC TNM classification for gastric cancer [6], where fewer than 30 examined lymph nodes was considered undesirable. Thus, the inclusion criteria for case selection in the primary database were as follows: (1) histologically confirmed gastric adenocarcinoma; (2) gastrectomy was performed with D2 lymphadenectomy; (3) the number of total LN retrieved (TLN) was less than 30; and (4) the follow-up time was longer than 60 months or the end-point (death) was reached.
The exclusion criteria were as follows: (1) neoadjuvant treatments were applied; (2) a previous surgery history related to the stomach; (3) noncurative surgery was confirmed, including a positive margin and positive cytology; (4) metastatic disease was confirmed before or at the time of surgery, including distant metastasis, distant LN metastasis, and peritoneal dissemination; and (5) a diagnosis of another malignancy.
As for the validation database, the inclusion and exclusion criteria were the same, except that the surgical procedure had to be defined roughly as gastrectomy (coded as 30, 40, 50, 60, and 80 in the variable "RX Summ-Surg Prim Site") as an inclusion criterion instead of D2 lymphadenectomy, because there was no corresponding variable coding for D2 lymphadenectomy in the SEER database. Additionally, to minimize the impact of inadequacy of surgical treatment in the SEER database, the minimum number of LN retrieved was required to be no fewer than 8, in keeping with the number in the primary database. Cases of unnatural death and any cases missing data for the number of metastatic LN and the TLN, the T stage, and the tumor size were excluded as well.
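As an illustration only, the SEER-specific filters described above could be expressed as follows. This is a minimal sketch in Python, not the authors' code: the record layout and the field names (`surgery_code`, `tln`, `mln`, `t_stage`, `tumor_size`) are hypothetical, and the criteria shared with the primary cohort (histology, follow-up, exclusions) are not covered.

```python
# Gastrectomy codes for "RX Summ-Surg Prim Site" as listed in the text.
GASTRECTOMY_CODES = {30, 40, 50, 60, 80}
MIN_TLN = 8    # lower bound applied to the validation cohort
MAX_TLN = 30   # "inadequate" means fewer than 30 nodes examined

# Hypothetical field names; any missing value excludes the case.
REQUIRED_FIELDS = ("surgery_code", "tln", "mln", "t_stage", "tumor_size")


def eligible_for_validation(record):
    """Return True if a record (a dict) passes the SEER-specific filters."""
    if any(record.get(field) is None for field in REQUIRED_FIELDS):
        return False
    if record["surgery_code"] not in GASTRECTOMY_CODES:
        return False
    # Keep cases with at least 8 but fewer than 30 retrieved nodes.
    return MIN_TLN <= record["tln"] < MAX_TLN
```

Checking for `None` explicitly, rather than for falsy values, keeps node-negative cases (`mln` equal to 0) in the cohort.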

Parameter definition
The depth of tumor invasion (T stage), LN metastasis (N stage), and the overall stage (TNM stage) were categorized with the 8th edition of the American Joint Committee on Cancer (AJCC) staging system for gastric cancer [6].
With the goal of maintaining the simplicity and practicability of the prognostic factors used for the nomogram, we only adopted some common, simple and easy-to-calculate parameters related to tumor volume and LN metastasis, including the tumor diameter defined as the longest length of tumor volume, the tumor index (TI) [14] calculated as tumor diameter × T stage, the number of metastatic lymph nodes (mLN), and the ratio of mLN to total lymph nodes examined (mLNR). The overall survival (OS) was defined as the time (in months) from the surgery to the date of death from any cause or the last follow-up.
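The two derived parameters can be written down in a few lines. The sketch below is in Python and rests on assumptions not stated in the text: the T stage is coded numerically as T1 = 1 through T4 = 4, and the tumor diameter is passed in whatever unit the dataset uses.

```python
def tumor_index(diameter, t_stage_number):
    """Tumor index (TI) = tumor diameter x T stage.

    Assumes the T stage is coded numerically (T1 = 1 ... T4 = 4);
    the coding is not specified in the text.
    """
    if t_stage_number not in (1, 2, 3, 4):
        raise ValueError("T stage must be coded as 1, 2, 3, or 4")
    return diameter * t_stage_number


def metastatic_ln_ratio(mln, tln):
    """mLNR = metastatic lymph nodes / total lymph nodes examined."""
    if tln <= 0:
        raise ValueError("TLN must be positive")
    if not 0 <= mln <= tln:
        raise ValueError("mLN must lie between 0 and TLN")
    return mln / tln
```

For example, a T3 tumor with a diameter of 4.0 gives a TI of 12.0, and 3 metastatic nodes among 20 examined give an mLNR of 0.15.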

Factor and model selection
The univariate analyses were performed in the primary cohort to identify the significant prognostic factors for OS. Then, the factors with a p value less than 0.1 in the univariate analyses were included in the multivariate analyses to identify the independent prognostic factors. The selected model was compared with the control model using the Akaike information criterion (AIC) and Harrell's concordance index (C-index). A smaller AIC indicates a better goodness of fit, and a larger C-index indicates a better discriminative capability.
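For readers unfamiliar with the two criteria, the following Python sketch shows how they are commonly computed. It is a simplified illustration rather than the routine used in this study: pairs with tied survival times are skipped, and for a Cox model the log-likelihood passed to `aic` would be the partial log-likelihood.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable patient pairs ranked correctly.

    A pair is comparable when the patient with the shorter follow-up
    died. A higher risk score should go with the shorter survival.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: ignore tied times
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter follow-up is censored: not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def aic(log_likelihood, n_parameters):
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * n_parameters - 2 * log_likelihood
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, whereas the AIC has no absolute scale and is only meaningful when comparing models fitted to the same data.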

Nomogram development and external validation
The nomogram was plotted with RStudio (version 1.1.463, with packages "Hmisc", "lattice", "Formula", "ggplot2", and "rms") according to Zhang's method [15]. The selected factors in the primary cohort were introduced into the Cox proportional hazards model, and a visualized risk diagram was drawn based on the model. For the external validation, the calibration curves for 3- and 5-year survival rates were plotted by comparing the means of the nomogram-predicted OS with the observed Kaplan-Meier estimate of actual OS in both the primary and the validation cohorts. The bootstrapping method (1000 repetitions) was used to reduce the estimate bias.
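The observed side of a calibration curve is a Kaplan-Meier estimate at a fixed time point. The Python sketch below shows that estimate for one group of patients; it is a generic illustration, not the "rms" routine used in this study, and it omits the grouping by predicted risk and the bootstrap correction.

```python
def kaplan_meier(times, events, horizon):
    """Kaplan-Meier survival probability at `horizon`.

    `times` are follow-up times (same unit as `horizon`), and
    `events` flags deaths (truthy) versus censored cases (falsy).
    """
    survival = 1.0
    at_risk = len(times)
    for t in sorted(set(times)):
        if t > horizon:
            break
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            # Multiply by the conditional probability of surviving time t.
            survival *= 1 - deaths / at_risk
        at_risk -= leaving  # deaths and censored cases leave the risk set
    return survival
```

With follow-up in months, a horizon of 36 or 60 yields the observed 3-year or 5-year OS that is compared with the mean nomogram-predicted OS of the same group.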

Statistical analyses
Categorical data were presented as numbers (percentages); continuous data were presented as the mean (± standard deviation) if normally distributed or as the median (interquartile range) if not normally distributed. Differences in categorical and continuous data were analyzed by chi-square tests and independent-samples t-tests, respectively. The difference in OS was analyzed by the log-rank test.
For the univariate analyses, Cox proportional hazards regressions were performed to determine the predicting factors of OS. The multivariate analysis was performed with the bidirectional stepwise method. All analyses were performed with SPSS® (version 23.0) and RStudio (version 1.1.463). Statistical significance was defined as a two-sided p < 0.05 for all tests.

The clinical characteristics and survival
A total of 1109 patients were included in the primary cohort, and 6584 patients in the validation cohort. The age, sex, race, tumor location, T stage, N stage, TNM stage, tumor diameter, TI, mLN, mLNR, and the total LN were significantly different between the two cohorts (Table 1). In summary, the population in the primary cohort was younger, had more male patients, was dominated by Asian patients, had more cancers located in the middle and upper third of the stomach, had more patients in early stages, and had a smaller tumor diameter compared to the validation cohort. Although the mLN was similar between the two cohorts, the mLNR was lower in the primary cohort due to the higher TLN. The median follow-up times were 78 and 90 months, respectively, and the 3-year and 5-year OS rates were significantly higher in the primary cohort than in the validation cohort (all p < 0.001) (Table 1) (Fig. 1).

The univariate and multivariate analyses
The univariate analyses showed that age, T stages, N stages, tumor diameter, TI, mLN, and mLNR were significantly associated with OS (all p < 0.001). The multivariate analysis showed that age, T stages, N stages, and mLN were independent prognostic factors for OS (Table 2).

The model selection
In the multivariate model, the "N3b" stage did not reach statistical significance, and the hazard ratio (HR) was even lower than for the "N3a" stage, which was obviously inconsistent with the clinical impression. This result might be explained by the limited number of cases in "N3b" stage in the primary cohort (n = 71, 6.4%). In addition, the number of cases of "T4b" was even lower (n = 26, 2.3%), so we combined the stage "N3b" with "N3a" as "N3", and the stage "T4b" with "T4a" as "T4" in the subsequent analyses (Tables 1 and 2). The mLNR has a fixed mathematical relationship with mLN and TLN, and given the strong correlation between mLN and mLNR, the multivariate analysis might be biased by the collinearity between them. On the other hand, since the TLN was documented to be an independent prognostic factor in multiple studies [4,5] and was recommended to be above 30 for a favorable prognosis [6], we introduced the mLN and TLN into the model to improve its clinical benefits. Generally, a model is evaluated by its discriminative capability and its goodness of fit. The C-index and AIC were used as the assessment criteria for these two features, respectively. As the control model, the TNM stage model had a C-index of 0.717 and an AIC of 6662.988, which was inferior to the other models. Interestingly, all of the other models had the same C-index of 0.744; however, the model (age + T stages + N stages + mLN + TLN) had the lowest AIC (6607.119), which implied a superiority in goodness of fit, in spite of the minor differences among the models (Table 3).
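The category merging can be made concrete with a short Python sketch. The N stage thresholds are those of the AJCC 8th edition (N1: 1-2, N2: 3-6, N3a: 7-15, N3b: 16 or more metastatic nodes), which are stated here from the staging manual rather than from the text above; the function names are illustrative.

```python
def n_stage(mln, merge_n3=True):
    """Map the metastatic LN count to an AJCC 8th edition N stage.

    With merge_n3=True, N3a and N3b are reported as a single "N3"
    category, as done in the subsequent analyses.
    """
    if mln < 0:
        raise ValueError("mLN cannot be negative")
    if mln == 0:
        return "N0"
    if mln <= 2:
        return "N1"
    if mln <= 6:
        return "N2"
    if merge_n3:
        return "N3"
    return "N3a" if mln <= 15 else "N3b"


def merge_t4(t_stage):
    """Collapse T4a and T4b into a single "T4" category."""
    return "T4" if t_stage in ("T4a", "T4b") else t_stage
```

Because this cohort is restricted to fewer than 30 examined nodes, an "N3b" case requires at least 16 positive nodes among at most 29, which is consistent with the small size of that subgroup.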

The nomogram development and external validation
The model including age, T stages, N stages, mLN, and TLN was selected to develop the nomogram (Fig. 2). The nomogram was used to predict the patient's 3-year and 5-year OS by measuring all of the variables on the "Points" scale and summing up all of the scores. Then, the projective point of the "Total points" on the "survival" scale could be used to read the survival rate. The C-index of the nomogram was 0.744, which was close to the cutoff point of 0.75, and indicated a relatively good discriminative capability.
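The arithmetic behind reading a Cox-based nomogram can be sketched in Python as follows. This is a generic illustration under stated assumptions, not the fitted model: the regression coefficients and the baseline survival are not reported in the text, so both functions take them as inputs, and the 0-100 scaling follows the usual convention that the most influential variable spans 100 points.

```python
import math


def points(contribution, max_contribution):
    """Rescale one variable's linear-predictor contribution to points.

    `contribution` is coefficient x value, measured from the variable's
    reference level; `max_contribution` is the largest such contribution
    among all variables and defines 100 points.
    """
    if max_contribution <= 0:
        raise ValueError("max_contribution must be positive")
    return 100.0 * contribution / max_contribution


def predicted_survival(baseline_survival, linear_predictor):
    """Cox model: S(t | x) = S0(t) ** exp(linear predictor)."""
    if not 0.0 <= baseline_survival <= 1.0:
        raise ValueError("baseline survival must lie in [0, 1]")
    return baseline_survival ** math.exp(linear_predictor)
```

Summing the points of all variables is equivalent to summing their contributions to the linear predictor, so the "Total points" axis maps monotonically onto the survival axis: more points always means a lower predicted OS.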
The calibration curves in the validation cohort demonstrated good concordance between the nomogram-predicted survival and the observed survival at both 3 years and 5 years (Fig. 3). The C-index of the nomogram in the validation cohort was 0.694 and 0.690 for 3-year and 5-year OS, respectively. It is notable that, even though there was a significant difference in the survival rate between the primary cohort and the validation cohort, the nomogram still had good accuracy in predicting survival outside the primary cohort.
Fig. 1 The 3-year and 5-year overall survival curves for the primary and validation cohorts

Discussion
In clinical practice, survival prediction is the foundation upon which the treatment strategy is built. As a universally adopted predictive tool, the N staging system for gastric cancer has been challenged by the emerging novel parameters, and the AJCC has made some important modifications regarding the N stages. In the latest version, the TNM stage groupings have been reclassified, and the "N3a" and "N3b" contribute to the TNM grouping separately, which refines the role of N stages in TNM classification. This modification could yield a more accurate prediction of survival [16]. However, the system has a prerequisite: the TLN should be no fewer than 16 [6]. For patients with inadequate TLN, the AJCC TNM staging system still leaves a gap.
The definition of "inadequate" has varied greatly among different studies. In terms of stage migration, the number 15 is generally regarded as the cutoff value [7,8]. The survival risk of patients with substandard TLN could be underestimated because of an inaccurate N stage. Furthermore, in terms of survival benefits, the number defined as "inadequate" could increase to a range of 25-30 [5,17,18]. For a total gastrectomy, the "adequate" number could even be as high as 40 [19]. The patients left in the gap between the two definitions of "inadequate" should not be neglected and deserve further investigation to avoid a misleading survival prediction. In the present study, we adopted 30 as the cutoff value to cover these patients. The nomogram developed in this study showed superiority over the conventional AJCC TNM staging system with regard to its discriminative capability and applicability. For better practicability, we only included certain simple and easy-to-calculate parameters. The independent prognostic factors selected in the primary cohort were similar to those of other studies on nomogram-based prediction for gastric cancer after D2 lymphadenectomy [20,21]. The selection of independent predictors is a crucial process for nomogram development, especially when highly correlated factors are present, such as the commonly used parameters related to LN involvement (mLN, mLNR, TLN, LODDS, and so on). Because no standard and effective statistical methods are available to handle the collinearity problem, predictor selection based on clinical consideration is permitted.
In the present study, both mLN and mLNR were significant predictors in the univariate analysis; however, in the multivariate analysis with the stepwise method, mLNR was removed in the first step. Nevertheless, we tested the discriminative capability of the combination containing mLNR and found the same C-index (0.744) as for the combination containing mLN, with only a tiny difference in AIC (6609.483 vs 6609.415). Interestingly, we also found that mLN combined with TLN could improve the goodness of fit of the model, which is in agreement with most of the aforementioned studies; as such, we finally used a model containing age, T stage, N stage, mLN, and TLN to plot the nomogram.
Fig. 2 A nomogram predicting 3-year and 5-year overall survival for gastric cancer patients with inadequate lymph nodes. The nomogram is used to predict the 3-year and 5-year overall survival by measuring all the variables on the "Points" scale and summing up all the points. The survival rate can be read from the projective point of the "Total points" on the "Survival" scale
Fig. 3 The calibration curves for the nomogram predicting overall survival. The x-axis represents the nomogram-predicted survival, and the y-axis represents the actual survival and 95% confidence interval by Kaplan-Meier analysis. a Calibration curve for 3-year overall survival; b Calibration curve for 5-year overall survival
Notably, this study was based on an international database. Both the primary and validation cohorts had relatively large samples. The data were collected from multiple centers and contained multiple races, which could make the results more convincing than those of monocentric research with mono-ethnicity. The data on clinical characteristics revealed significant differences between the primary and validation cohorts for tumor features, T stages, N stages, and TNM stages. Above all, even with the significant difference in survival, the nomogram could still predict the survival in the validation cohort.
The present study, although carefully conducted, has some limitations. First, the information about adjuvant chemotherapy was absent in this study because of the data shortage in the primary cohort; thus, the impact of adjuvant chemotherapy could not be precisely evaluated or incorporated into the model. Therefore, the application of this prognostic model in the era of multimodality therapy is not yet warranted and needs further investigation. Second, the discriminative capability of the nomogram was moderate (C-index = 0.744), which was lower than that of the nomograms developed at the Seoul National University Hospital (C-index = 0.78) [20] and the Catholic University St. Mary's Hospital (C-index = 0.87) [21]. The latter two studies had different populations, of which the majority had an LN count of more than 15 with no upper limit; thus, their predictive accuracy could be higher than ours.

Conclusion
A nomogram containing age, T stages, N stages, mLN, and TLN was superior to the conventional AJCC TNM staging system in predicting OS for gastric cancer patients with inadequate (< 30) LNs. Continuous improvements of the AJCC TNM staging system are advocated, especially for patients with a limited number of LNs to be examined.