Development and validation of a prognostic nomogram for patients with stage II colon mucinous adenocarcinoma

Mucinous histology is generally considered a prognostic risk factor in stage II colon cancer, but there is no appropriate model for prognostic evaluation and treatment decisions in patients with stage II colon mucinous adenocarcinoma (C-MAC). Thus, there is an urgent need to develop a comprehensive, individualized evaluation tool that reflects the heterogeneity of stage II C-MAC. Patients with stage II C-MAC who underwent surgical treatment in the Surveillance, Epidemiology, and End Results Program were enrolled and randomly divided into a training cohort (70%) and an internal validation cohort (30%). Prognostic predictors determined by univariate and multivariate analysis in the training cohort were included in the nomogram. The calibration curves, decision curve analysis, X-tile analysis, and Kaplan–Meier curves of the nomogram were validated in the internal validation cohort. A total of 3762 patients with stage II C-MAC were enrolled. Age, pathological T (pT) stage, tumor number, serum carcinoembryonic antigen (CEA), and perineural invasion (PNI) were independent predictors of overall survival (OS) and were used to establish a nomogram. Calibration curves of the nomogram indicated good consistency between nomogram prediction and actual survival for 1-, 3-, and 5-year OS. In addition, patients with stage II C-MAC could be divided into high-, middle-, and low-risk subgroups by the nomogram. Further subgroup analysis indicated that patients in the high-risk group could have a survival benefit from chemotherapy after surgical treatment. We established the first nomogram to predict the survival of stage II C-MAC patients who underwent surgical treatment. In addition, the nomogram identified low-, middle-, and high-risk subgroups of patients and found that chemotherapy might improve survival in the high-risk subgroup of stage II C-MAC patients.


Background
Jia Huang, Yiwei Zhang, and Jia Zhou contributed equally to this work.
Colon mucinous adenocarcinoma (C-MAC) is the second most common histological type of colon cancer (CC), with specific clinicopathological and molecular characteristics, and accounts for 10-15% of all CC [1,2]. The treatment strategy and prognosis of C-MAC remain controversial, because there are only a few studies on this topic and no guidelines or consensus have been established [1,3,4]. Although some oncologists consider mucinous histology a high-risk prognostic factor in CC [5,6], this dilemma is especially prominent in stage II C-MAC, for which no study is yet available to guide prognostic prediction and treatment decisions after surgical treatment in the clinic.
Stage II C-MAC patients are a very heterogeneous group in terms of survival and therapy selection [1,6,7]. The majority of studies have demonstrated that C-MAC patients had worse long-term survival than non-mucinous adenocarcinoma (non-MAC) patients in stage II, but some others disagreed [6][7][8][9]. Recently, several clinicopathologic factors have been associated with the prognosis of C-MAC, but these factors have not yet been integrated into a systematic or mathematical model [3,5]. Thus, there is an urgent need to develop a comprehensive, individualized, and convenient evaluation tool that reflects the heterogeneity of stage II C-MAC, which would be valuable for prognosis prediction and treatment decisions.
A nomogram is a straightforward, intuitive tool for individualized assessment that is widely used to predict prognosis in various diseases [10,11]. It is a two-dimensional graphic calculator that integrates multiple variables to rapidly estimate the risk probability for a particular individual, and it has also been applied in CC [12][13][14]. Accordingly, we identified a series of overall survival (OS)-related variables in patients with stage II C-MAC who underwent surgical treatment, based on the Surveillance, Epidemiology, and End Results (SEER) Program database, and then constructed and validated a nomogram to predict 1-, 3-, and 5-year OS and to stratify risk in these patients.

Data source
Data were retrieved from the National Cancer Institute's SEER program (https://seer.cancer.gov/). Cases were obtained from the SEER registries as anonymized data. Permission was obtained to download the data from the SEER database, and prior informed patient consent was not required. SEER*Stat software (version 8.3.8) was used for data extraction and patient selection.

Patient selection
From 2010 to 2015, patients with stage II C-MAC who underwent surgical treatment were enrolled. The detailed data processing flowchart is shown in Fig. 1. Selection criteria were as follows: (1) treatment with surgical resection, with surgical types including partial/subtotal colectomy, hemicolectomy or greater, total colectomy, colectomy, and colectomy (subtotal, hemicolectomy or total) plus partial or total removal of other organs (code 30-70); (2) American Joint Committee on Cancer (AJCC) 7th ed TNM stage II; (3) pathologically confirmed diagnosis; (4) complete survival data; (5) morphology ICD-O-3 code limited to mucinous adenocarcinoma (8480/3). The exclusion criteria were as follows: (1) no surgery information, (2) primary tumor located in the rectum, and (3) radiation received before surgery. Furthermore, other clinical variables were extracted for patients in the SEER database: race, age, sex, tumor location, carcinoembryonic antigen (CEA), perineural invasion (PNI), tumor differentiation, primary surgery site, tumor number, pathological T (pT) stage (AJCC, 7th ed), and survival months. Proximal colon cancer was defined as cancer located in the cecum, ascending colon, hepatic flexure, or transverse colon; distal colon cancer was defined as cancer in the splenic flexure, descending colon, or sigmoid colon. The final cohort contained 3762 eligible patients with stage II C-MAC who underwent surgical treatment.

Statistical analysis
Descriptive statistics of patient characteristics were summarized, and the 3762 eligible patients were randomly divided into a training cohort (n = 2826, 70%) and a validation cohort (n = 936, 30%) by the random number method in Microsoft Excel (version 16.64). Continuous variables with normal distribution were compared using the one-way ANOVA test, and categorical variables were presented as [...]. In the training cohort, hazard ratios (HRs) and 95% confidence intervals (CIs) were estimated by univariate and multivariable Cox proportional hazards regression to identify the independent prognostic factors associated with OS. Variables with p < 0.05 in the univariate analyses were entered into the multivariate analyses. The nomogram package in R software was then used to construct the predictive nomogram model based on the independent prognostic factors. Variables with two-tailed p < 0.05 in the multivariate Cox analysis were included in this nomogram model. Then, the performance of the nomogram was independently validated using the validation cohort. Calibration plots and decision curve analysis (DCA) were performed to assess the predictive accuracy of the nomogram model. The nomogram was used to score all patients retrospectively, and patients were divided into low-, middle-, and high-risk subgroups using the X-tile program (www.tissuearray.org/rimmlab/). All statistical evaluation was performed using IBM SPSS Statistics version 22.0 (SPSS Inc., Chicago, IL, USA) and R software (version 3.5.1; http://www.Rproject.org). P values < 0.05 were considered significant.
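The random division into cohorts can be illustrated with a minimal Python sketch. It is not the authors' procedure, which used random numbers in Microsoft Excel; the function name `split_cohort`, the seed, and the placeholder patient IDs are assumptions introduced for illustration, and only the cohort sizes (2826 and 936) come from the text.

```python
import random

def split_cohort(patient_ids, n_train, seed=0):
    """Shuffle patient IDs reproducibly and split them into training and validation lists."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # assumed seed; the paper does not report one
    rng.shuffle(ids)
    return ids[:n_train], ids[n_train:]

# Placeholder IDs 0..3761 stand in for the 3762 eligible patients.
train_ids, valid_ids = split_cohort(range(3762), n_train=2826)
print(len(train_ids), len(valid_ids))  # 2826 936
```

Fixing the seed makes the split reproducible, which a spreadsheet-based random draw does not guarantee unless the generated numbers are stored.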

Patients' characteristics
A total of 3762 eligible patients from the SEER database were enrolled and divided into the training cohort (n = 2826) and the internal validation cohort (n = 936). The patient characteristics were comparable between the two cohorts (Table 1). In the training cohort, stage II C-MAC patients who underwent surgical treatment were more often white (90.4%) and female (52.2%), and most had a tumor located in the proximal colon (77.8%), a tumor larger than 5 cm (56.9%), pT3 stage (75.4%), grade I/II disease (77.0%), a solitary tumor (67.1%), and treatment with subtotal colectomy (68.5%). Similar results were observed in the validation cohort (all p > 0.05). Positive serum CEA levels accounted for 29.3% and 30.7%, and PNI accounted for 3.4% and 3.2% in the training and validation cohorts, respectively. The median survival times for the training and validation cohorts were 36.00 (37.78 ± 23.006) and 36.79 (35.00 ± 22.919) months, respectively.

Identification of prognostic factors of patients in the training cohort
Univariate and multivariate Cox regression were performed to identify the potential prognostic factors for the OS of stage II C-MAC patients (Table 2). In the univariate analysis, a total of 5 factors, including age, pT stage, tumor number, serum CEA level, and PNI status, were significantly associated with OS in patients with stage II C-MAC who underwent surgical treatment (Table 2). These significant factors were then included in the multivariate analysis, which indicated that all five factors (age, pT stage, tumor number, serum CEA level, and PNI status) were independent prognostic factors for OS in patients with stage II C-MAC who underwent surgical treatment (Table 2). These results indicated that older age, higher pT stage, multiple tumors, elevated serum CEA level, and PNI were associated with poor OS in stage II C-MAC patients who underwent surgical treatment.

Development and validation of the prognostic nomogram for stage II C-MAC patients
To predict the OS of individual stage II C-MAC patients who underwent surgical treatment, a nomogram was established based on the results of the multivariate Cox analysis (Fig. 2). In this nomogram, each selected variable was assigned a score according to its value, and the scores were summed to give a total score. The C-index of this nomogram was 0.689 (95% CI, 0.670-0.709) in the training cohort and 0.657 (95% CI, 0.624-0.690) in the validation cohort, which indicated acceptable discrimination.
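For readers unfamiliar with the C-index, the following Python sketch shows one common definition (Harrell's concordance index) on toy data. It is an illustration under stated assumptions, not the authors' R code: the function name `harrell_c_index` and the toy values are invented, and ties in survival time are simply treated as non-comparable.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs in which the patient with the
    higher risk score is the one who died earlier."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event (1)
            # strictly before patient j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy data (not from the study): months of follow-up, event flag, nomogram points.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_scores = [130, 95, 110, 100]
c_index = harrell_c_index(toy_times, toy_events, toy_scores)
print(c_index)  # 0.6
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives context for the reported values of 0.689 and 0.657.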
To further verify the survival prediction efficacy of the nomogram, we tested it in the training and validation cohorts. The calibration curves were highly consistent with the actual survival in predicting the 1-year (Fig. 3A, B), 3-year (Fig. 3C, D), and 5-year survival (Fig. 3E, F) of stage II C-MAC patients in both the training and validation cohorts, which indicated that this nomogram was well calibrated. Furthermore, the DCA results showed that the predicted curve and the actual observation curve were very close for 1-year (Fig. 4A, B), 3-year (Fig. 4C, D), and 5-year survival prediction (Fig. 4E, F), especially for 1-year survival, which supported its clinical application value.

Performance of the nomogram in stratifying risk of stage II C-MAC patients
To further validate the sensitivity and specificity of the nomogram and better predict individual disease risk and prognosis, patients in the training cohort were stratified into three risk subgroups according to the total prognostic scores calculated with the nomogram (Fig. 5A). The stratification is shown in Fig. 5B: low-risk (< 93.4), middle-risk (93.4-121.8), and high-risk (≥ 121.8) subgroups, according to the cut-off values determined by the X-tile software. Among the three subgroups, patients in the low-risk subgroup had better survival than patients in the high- and middle-risk subgroups in both the training and validation cohorts (Fig. 5C, D). These results indicated that our nomogram was effective in predicting the survival probability of stage II C-MAC patients who underwent surgical treatment.
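The cut-off rule above can be written as a small Python function. The thresholds 93.4 and 121.8 are taken from the text; the function and constant names are hypothetical, and a score of exactly 121.8 is assigned to the high-risk subgroup, following the "≥ 121.8" definition.

```python
LOW_CUTOFF = 93.4    # below this total score: low risk (from the text)
HIGH_CUTOFF = 121.8  # at or above this total score: high risk (from the text)

def risk_group(total_points):
    """Map a nomogram total score to the low-, middle-, or high-risk subgroup."""
    if total_points < LOW_CUTOFF:
        return "low"
    if total_points < HIGH_CUTOFF:
        return "middle"
    return "high"

for points in (80.0, 93.4, 121.8):
    print(points, risk_group(points))
```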

Additional interesting findings from the nomogram
Interestingly, the risk stratification of the nomogram yielded some additional findings in stage II C-MAC patients. Patients in the high-risk subgroup who received chemotherapy had better OS than those who did not receive chemotherapy (p < 0.05, Fig. 6A). However, in the middle- and low-risk subgroups, the Kaplan-Meier curves showed no significant survival difference between patients with and without chemotherapy (both p > 0.05, Fig. 6B, C). These results indicated that patients in the high-risk subgroup could benefit from chemotherapy, whereas patients in the low- and middle-risk subgroups did not gain a survival advantage.
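The Kaplan-Meier curves used for these comparisons are built from the product-limit estimator. The sketch below shows that estimator in plain Python on toy data; the function name and values are assumptions for illustration, and it does not reproduce the study's curves or the significance test behind the reported p values.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (product-limit estimator)."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect everyone whose follow-up ends at time t (event = 1, censored = 0).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed
    return curve

# Toy follow-up in months (not from the study): 1 = death observed, 0 = censored.
curve = kaplan_meier([6, 12, 12, 24, 30], [1, 1, 0, 1, 0])
print([(t, round(s, 3)) for t, s in curve])  # [(6, 0.8), (12, 0.6), (24, 0.3)]
```

In practice one such curve would be estimated per treatment arm within each risk subgroup and the arms compared with a test such as the log-rank test.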

Discussion
Increasing evidence suggests that clinicians should explore valuable prognostic factors and construct risk stratification models to evaluate prognosis and guide treatment decisions for CC patients [15][16][17][18][19]. However, the role of histological subtype is often overlooked, especially in MAC. Meanwhile, accurate prediction of survival for C-MAC patients is crucial for subsequent treatment decisions and long-term follow-up [16,20,21]. Such prediction is particularly valuable in stage II C-MAC, which is a very heterogeneous sub-population of CC [6,22]. To address this problem, we constructed and validated a specific nomogram integrating multiple clinicopathological factors, which could evaluate the survival probability for individual stage II C-MAC patients who underwent surgical treatment. To our knowledge, the present study provides the first nomogram to estimate the individual prognosis of stage II C-MAC patients, and it also provides risk stratification to identify patients who may need chemotherapy.
The clinicopathological variables used for this nomogram were routinely collected from the SEER database. This study found that age at diagnosis, pT stage, serum CEA level, tumor number, and PNI status were prognostic factors for the survival of stage II C-MAC patients after surgical treatment. All of these risk factors have been separately validated by previous studies. For example, some studies found that patients with C-MAC tumors generally had a higher pT stage and a higher rate of PNI, which might cause a higher incidence of local spreading and lead to lower resection rates [9,23,24]. Other studies revealed that an elevated preoperative serum CEA level was a risk factor for worse survival [8,25]. Meanwhile, the present study found that prognosis was strongly associated with age, which is consistent with other studies in patients with C-MAC who underwent surgical treatment [5,7,26,27]. A potential explanation is that elderly patients often have dysfunction, malnutrition, and comorbidity, which limit their ability to receive aggressive treatment. Multiple tumors were also considered a prognostic risk factor compared with a solitary tumor in C-MAC by some studies [6,8,28]. Based on these predictors, we constructed a novel nomogram model in the present study, which avoided the limitation of a single predictor and achieved individualized prediction. In addition, this nomogram was validated by discrimination and calibration in a separate internal validation cohort, and its potential clinical utility was checked by DCA curves. After these validations, the nomogram model showed good predictive accuracy for OS and was valuable for clinical decision-making in stage II C-MAC patients. Moreover, the risk factors used in the nomogram can be conveniently obtained by any medical center, which greatly expands its clinical application value.
In addition, another interesting finding was that the risk score stratification based on the nomogram could help determine the need for postoperative chemotherapy in stage II C-MAC patients. An unsolved dilemma for oncologists is whether to give postoperative chemotherapy to stage II C-MAC patients [6][7][8][9]17]. Postoperative chemotherapy is a double-edged sword in stage II C-MAC. Considering the toxicity of and resistance to chemotherapeutic agents, recent studies proposed that postoperative chemotherapy should be considered carefully in stage II CC patients [1,29,30]. In this study, chemotherapy was beneficial in the high-risk subgroup of stage II C-MAC patients, but it did not improve survival for low- and middle-risk patients. These findings could help oncologists identify the subset of stage II C-MAC patients who should receive postoperative chemotherapy, which could guide subsequent treatment choice and follow-up.
Given the above, this study confirmed that the simple nomogram was valuable for patients with stage II C-MAC and offered the following primary strengths. First, this nomogram was a good tool for providing accurate prognosis prediction for an individual stage II C-MAC patient. Second, as a simple and intuitive graphic calculator, this nomogram was based on readily available data of stage II C-MAC patients, which makes it convenient for clinical translation [18,31,32]. Thus, oncologists could rapidly evaluate prognosis and conveniently explain it to patients, and then work together with them to make subsequent treatment and follow-up decisions. Third, this nomogram was derived from the largest publicly available database and underwent a systematic assessment, making it more robust and reliable in the clinic [33,34]. In addition, stratifying patients by the prognostic scores provided better performance in individual survival prediction and chemotherapy management for stage II C-MAC patients.
However, there were still several limitations in this study, which primarily originated from the inherent weaknesses of the retrospective database, with inevitable bias and missing data. In addition, some specific information on C-MAC patients was lacking in the SEER database, such as MSI status and KRAS and BRAF mutations. Furthermore, although the nomogram was well validated in the internal validation cohort, further prospective multi-center validation is still needed.
Fig. 5 Survival curves stratified by the score calculated by the nomogram scoring system. A X-tile plot of the training cohort in the total risk score. B The cut-off point was highlighted by X-tile. C, D The survival curves showed that the high-risk group had poorer OS than the middle- and low-risk groups in the training (C) and validation cohort (D)

Conclusions
In conclusion, we established the first convenient nomogram to accurately predict the survival of stage II C-MAC patients who underwent surgical treatment. In addition, this nomogram could provide postoperative chemotherapy guidance for stage II C-MAC patients.
Fig. 6 The sub-stratified analysis for the long-term survival of the risk group according to chemotherapy. A The survival curves showed the high-risk group who received chemotherapy had better OS than that who did not receive chemotherapy. B, C The survival curves showed no differences in OS between chemotherapy and no chemotherapy patients in the middle-risk group (B) and low-risk group (C)