Construction and validation of a clinicopathologic signature for predicting the prognosis of stage II and III colorectal cancer


Background: Various clinical and pathological indicators affect the prognosis of stage II and III colorectal cancer (CRC). Nevertheless, few studies have systematically integrated these indicators to construct a signature for assessing prognostic risk.

Methods: Patients with stage II-III CRC who underwent R0 radical resection from 2009 to 2016 were included in this study. Clinical and follow-up data were collected for all patients. The study comprised an internal training cohort and an external validation cohort from two centers. Data processing and analysis were performed in R. Cox proportional hazards regression was used for univariate and multivariable analyses. The log-rank test was used to compare prognosis among groups.

Results: A total of 1200 eligible patients were included. Eight variables, including T stage, N stage, lymphatic and vascular infiltration, and preoperative CEA, CA125 and CA19-9, were included in the signature, and a nomogram was established for predicting overall survival (OS). The concordance index of the signature was 0.72. The 3-year and 5-year calibration curves based on the nomogram showed close agreement between predicted and observed outcomes. Survival differed significantly among the risk groups (p < 0.001). Patients with a low risk score had a 5-year OS rate of 77%, whereas patients with a high risk score had the worst 5-year OS rate (only 8%). Furthermore, the signature achieved similar performance in the external validation cohort.

Conclusions: The signature based on clinical and pathologic factors showed favorable accuracy for predicting the prognosis of CRC. Our signature may therefore inform clinical work, particularly in identifying high-risk stage II and III CRC.


Introduction
Colorectal cancer (CRC) is one of the most commonly diagnosed malignancies worldwide, with the third highest incidence and the second highest mortality rate among cancers, causing more than 50,000 deaths (1). Despite continuous advances in treatment, such as improvements in surgery and targeted drugs and the clinical application of immunotherapy, the prognosis of CRC remains unsatisfactory.
Even after resection, the 5-year survival rate of CRC was 65% (2). Postoperative recurrence and metastasis are the main factors causing death and affecting the prognosis of patients.
How to evaluate patient prognosis after surgery is a difficult clinical problem. Currently, the tumor-node-metastasis (TNM) staging system according to the 8th AJCC criterion is the main reference for adjuvant treatment and prognosis (3)(4)(5). Generally, patients with a higher stage have a worse prognosis. However, our previous study showed that the prognosis of stage II colon cancer may be worse than that of stage III colon cancer (6). This suggests that assessing prognosis based on TNM staging alone has shortcomings. In addition, the TNM system cannot provide all the information about a patient's status after surgery. It omits some indicators that are meaningful for the prognosis of CRC, such as high-risk factors, pathological variables and serum tumor markers. Patients with CRC liver metastasis and lymphatic invasion have poorer survival, especially when vascular invasion is also present (7). In addition, serum tumor markers, including CEA and CA19-9, are classic indicators closely related to the prognosis of CRC. Patients with elevated periprocedural CEA have a worse prognosis and a higher risk of recurrence than patients with normalized CEA (8, 9). The level of CA19-9 is usually normal in CRC. However, adding it to postoperative surveillance is recommended because of its significant prognostic value, especially in patients with BRAF mutations (10).
Therefore, accurate evaluation of the prognosis of CRC is a complex and multidimensional problem. The TNM system cannot cover all the indicators, and the significance of a single serum tumor marker for gastrointestinal tumors is limited. How to systematically integrate these indicators, analyze the weight of each marker, and develop a mathematical signature that accurately evaluates prognosis are the primary questions of this study.

Data Collection
This study was a two-center retrospective clinical study registered in the Chinese Clinical Trial Registry. Patient clinical data, including sex, age, tumor markers and pathological reports, were obtained mainly from medical histories and the electronic medical record department. Pathological staging was based on the 8th AJCC criterion for CRC. Preoperative tumor marker values were measured within one week before surgery. All patients were followed up according to current NCCN guidelines, including analysis of serum tumor markers, colonoscopy, chest X-ray and CT (or MRI). Patient follow-up data were updated by telephone, email and medical histories. Overall survival (OS) was defined as the time from surgery to death.

Statistical analysis
Data processing and data analysis were performed in the R language (version 3.4.4, https://www.rbloggers.com/r-3-4-4-released/). Missing values were estimated by multiple imputation using the "mice" package.
For the prognostic signature, we performed a univariate analysis of all variables by Cox proportional hazards regression. Then, the significant covariates (score test p < 0.05) were retained in the multivariate Cox proportional hazards regression. We performed stepwise and backward selection using the Akaike information criterion (AIC) to obtain the final signature. The discrimination of the signature was evaluated by the concordance index (C-index), which was corrected using 1,000 bootstrap resamples. Furthermore, patients were classified into three groups (high, intermediate, and low risk) based on the risk score calculated with the prognostic signature. The Kaplan-Meier (K-M) method was applied to compare survival curves for these three groups. Hazard ratios (HRs) and 95% confidence intervals (CIs) were estimated, and the log-rank test was used to assess the significance of differences. A nomogram was employed to predict the 3-year and 5-year patient survival rates. Additionally, calibration curves were drawn for 3-year and 5-year survival using the nomogram to compare predicted and observed outcomes. P values were two-sided, with p < 0.05 considered statistically significant.
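As an illustration of the discrimination measure described above, the following is a minimal sketch of Harrell's C-index for right-censored data, written in plain Python rather than the R code used in the study. The function name and the toy data are assumptions made for this example; pairs with tied follow-up times are simply skipped, and the bootstrap correction mentioned above is not included.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    times       -- follow-up time of each patient
    events      -- 1 if death was observed, 0 if the patient was censored
    risk_scores -- model output; a higher score means a higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: tied times are not compared
        # Order the pair so that patient a has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # The pair is comparable only if the shorter time ended in a death.
        if not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # the patient who died earlier had the higher score
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied scores count as half concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical, not from the study cohorts).
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_scores = [0.9, 0.7, 0.2, 0.4]
print(harrell_c_index(toy_times, toy_events, toy_scores))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk.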

Study design and patient data
Our study included two major databases, with an internal training cohort used to build the signature and an external cohort used to validate it. A total of 707 patients from Shanghai Jiao Tong University Affiliated Sixth People's Hospital were screened for the internal cohort to construct our signature.
An external cohort of 493 cases from the Sixth Affiliated Hospital of Sun Yat-sen University was used to validate our signature. In all, the two cohorts included 1200 patients (Fig. 1).
We collected clinical information and follow-up data from the patients in the internal training cohort. The characteristics of the training cohort are shown in Table 1. A total of 431 patients (61.0%) were male, and the others were female. The median age was 65. The median follow-up time was 47 months. The majority of patients were diagnosed with T4 stage (74.8%), while patients with T1 and T2 accounted for only 2.7%. Most patients had N0 stage (59.3%), and 288 patients had lymph node metastasis, including 170 cases with N1 and 118 cases with N2. The mean values of CEA, CA125 and CA19-9 were 13.97 ng/ml, 21.62 U/ml and 23.8 U/ml, respectively. At the time of follow-up, 495 patients were still alive and 212 patients had died. To select the variables suitable for inclusion in our signature, we performed univariate analyses.
Our results suggested that 9 variables were prognostic factors for OS, including T stage, N stage, pathological type, histologic differentiation, lymphatic infiltration, vascular infiltration, CEA, CA125 and CA19-9 (Table 2; p < 0.05).

Evaluation and determination of the accuracy and predictive power of the signature
To evaluate the prognosis of patients more intuitively, we developed a nomogram from the Cox regression model. Each variable in the nomogram had a weighted score, and the 3-year or 5-year survival outcome could be predicted from the sum of the scores (Fig. 2). To further examine the importance of these variables and calculate the risk score, we applied a nonparametric approach using a random survival forest. Logarithmic transformation was performed for CEA and CA125. Our results suggested that N stage had the largest influence on OS, with a positive VIMP of 0.7217, followed by vascular infiltration, tumor histologic differentiation, CA125, T stage, CEA and CA19-9 (Supplementary Fig. 1). The predictive accuracy of the OS signature in time-dependent ROC analysis was relatively high. The AUC of our signature based on the risk score was 0.761 at 3 years and 0.741 at 5 years (Fig. 3a and 3b).
The calibration curves for CRC based on our signature showed excellent agreement between predicted and observed outcomes for OS prediction at 3 years and 5 years (Fig. 3c and 3d). All these results indicated that our signature had good accuracy and predictive ability.
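The way a Cox-based nomogram turns a patient's covariates into a predicted survival probability can be sketched as follows. Under the proportional hazards model, the predicted survival is S(t | x) = S0(t) ^ exp(lp), where lp is the linear predictor (the weighted sum of covariates) and S0(t) is the baseline survival at time t; the total points of a nomogram are a rescaled version of lp. The coefficient values, covariate names and baseline survival below are hypothetical placeholders chosen for illustration and are not the fitted values of our signature.

```python
import math


def linear_predictor(coefficients, covariates):
    """Weighted sum of covariate values (Cox linear predictor)."""
    return sum(coefficients[name] * covariates[name] for name in coefficients)


def survival_probability(baseline_survival, lp):
    """Cox model prediction: S(t | x) = S0(t) ** exp(lp)."""
    return baseline_survival ** math.exp(lp)


# Hypothetical coefficients, NOT the fitted values reported in this study.
example_coefficients = {"n_stage": 0.5, "vascular_infiltration": 0.4, "log_cea": 0.2}
# Hypothetical patient: N1, vascular infiltration present, log(CEA) = 2.0.
example_patient = {"n_stage": 1, "vascular_infiltration": 1, "log_cea": 2.0}
# Hypothetical baseline 5-year survival for a patient with lp = 0.
example_baseline_5y = 0.85

lp = linear_predictor(example_coefficients, example_patient)
print(round(survival_probability(example_baseline_5y, lp), 3))
```

A patient with lp = 0 receives exactly the baseline survival, and each increase in lp lowers the predicted survival, which is why the risk score can be used directly to rank patients.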

Prognosis Among Groups With Different Risk Scores
We assessed patients according to the risk score obtained from the signature using these variables. Patients were then classified into three groups (high, intermediate, and low risk) by risk score cutoffs. Kaplan-Meier curves were used to compare survival differences. Compared with the low-risk group, the intermediate- and high-risk groups had hazard ratios of 3.28 (95% CI, 2.37-4.52) and 8.67 (95% CI, 5.86-12.80), respectively (Fig. 4, p < 0.0001). The 5-year OS rate of the low-risk group was 77%, much higher than that of the intermediate-risk group (46%) and the high-risk group (8%).
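The grouping and the survival estimates described above can be illustrated with a minimal sketch of risk stratification and the Kaplan-Meier estimator in plain Python. The cutoff values in the example call and the toy follow-up data are hypothetical; the actual cutoffs of our signature are not reproduced here.

```python
def risk_group(score, low_cutoff, high_cutoff):
    """Assign a patient to a risk group using two cutoffs on the risk score."""
    if score < low_cutoff:
        return "low"
    if score < high_cutoff:
        return "intermediate"
    return "high"


def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one death."""
    ordered = sorted(zip(times, events))
    at_risk = len(ordered)
    survival = 1.0
    curve = []
    i = 0
    while i < len(ordered):
        t = ordered[i][0]
        deaths = 0
        leaving = 0
        # Collect every patient whose follow-up ends at time t.
        while i < len(ordered) and ordered[i][0] == t:
            deaths += ordered[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored patients leave the risk set
    return curve


def survival_at(curve, t):
    """Read the step function: estimated survival probability at time t."""
    probability = 1.0
    for time, value in curve:
        if time > t:
            break
        probability = value
    return probability


# Toy follow-up data in months (hypothetical): event 1 = death, 0 = censored.
toy_curve = kaplan_meier([6, 12, 12, 24, 36, 60], [1, 0, 1, 1, 0, 1])
print(survival_at(toy_curve, 30))
print(risk_group(1.2, low_cutoff=0.5, high_cutoff=1.5))
```

In practice the estimator would be run once per risk group, and the value read at 60 months gives the 5-year OS rate of that group.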

Validation Of Our Signature In An External Cohort
Based on the previous results, our signature showed good accuracy. To verify whether it was applicable to other hospitals or centers, we collected data from an external cohort of 493 CRC patients for validation. Univariate analysis showed that all variables included in our OS signature except CA125 were prognostic factors (Supplementary Table 2, p < 0.05). The calibration curves suggested close agreement between predicted and observed outcomes for OS prediction at 2 years and 3 years (Fig. 5a and 5b). In addition, according to the scoring criteria of our signature, we divided the patients from the external cohort into three groups (high, intermediate, and low risk) and obtained similar results: OS differed significantly among the three groups (Supplementary Fig. 2, p < 0.001).
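The group comparison reported above relies on the log-rank test. As a hedged illustration, the sketch below implements the simpler two-group version of the test in plain Python (the study compared three groups, which requires the multi-group form with two degrees of freedom). The function name and the toy data are assumptions made for this example.

```python
import math


def logrank_two_groups(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi_square, p_value) with 1 degree of freedom."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_a = 0.0
    expected_a = 0.0
    variance = 0.0
    for t in event_times:
        # Patients still under observation just before time t, per group.
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)
        # Deaths occurring exactly at time t, per group.
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e and g == 1)
        n = n_a + n_b
        d = d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the number of deaths in group A.
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("variance is zero; the groups cannot be compared")
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


# Toy data (hypothetical): group A dies early, group B dies late.
chi2, p = logrank_two_groups([1, 2, 3, 4, 5], [1] * 5, [6, 7, 8, 9, 10], [1] * 5)
print(chi2, p)
```

Identical survival experience in both groups yields a statistic of 0 and a p value of 1, while well-separated groups yield a large statistic and a small p value.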

Website for predicting the prognosis of stage II and III CRC patients
Based on our signature of 1200 patients from two cohorts, we developed a website for predicting the prognosis of stage II-III CRC (http://115.28.66.83/liuyuan/coad.php).

Discussion
Recurrence and metastasis are characteristic of malignant gastrointestinal tumors and result in a poor prognosis. Current guidelines for postoperative treatment and patient follow-up are based on TNM stage, pathological factors, CEA and CA19-9. However, the guidelines do not provide individualized risk assessment for patients. We therefore conducted this study to complement the current guideline evaluation system. How to evaluate the risk of postoperative survival has always been an area of interest in clinical practice. In our study, 1200 patients from two cohorts were included. We systematically analyzed the effects of different variables on the prognosis of CRC and constructed a signature that predicted postoperative OS. The C-index was higher than 0.7, indicating good discrimination ability, and the calibration curves showed good agreement between predicted and observed outcomes.
These results indicated that our signature had promising accuracy and predictive ability. Furthermore, our validation results suggested that the signature was applicable beyond the training center and had significance for clinical application.
Multiple studies have focused on how to detect recurrence and metastasis and evaluate the survival of patients with gastrointestinal tumors. Similar to our study, Martin R. Weiser et al. generated a signature for predicting local recurrence from 1,320 CRCs by comprehensively assessing patient clinical characteristics. The nomogram showed that patients in the high-risk group had a higher probability of relapse than those in the low-risk group (11). A risk classification system based on the expression of Apc, Fhit, and Her2 and five pathological parameters accurately divided patients with gastric cancer into three classes (12). As reported by a previous study, four DNA methylation signatures could serve as independent prognostic factors and predicted the prognosis of gastric cancer patients (13). Zhou Z et al. constructed a mathematical model with autophagy-related genes and assessed the risk of recurrence and metastasis in CRC patients according to the expression level of 5 genes. The patients were divided into a high-risk group and a low-risk group, and the results showed a significant difference in survival between the two groups (14). A signature for risk estimation that takes genetic susceptibility into account also exists (15). Another study analyzed 735 specimens of stage II CRC and found that the combined expression of six miRNAs could evaluate the risk of recurrence, with patients in the low-risk group surviving longer than those in the high-risk group (16). Similarly, there was evidence that microRNA polymorphisms were closely associated with the prognosis of gastric cancer (17). The immunoscore is another powerful factor affecting the prognosis of gastrointestinal tumors. Many studies have shown that the density of CD3+ and CD8+ lymphocyte populations in the stroma and invasion margins of the tumor microenvironment could effectively predict the prognosis of patients with gastrointestinal tumors (18-21).
These signatures have different characteristics, and they require the measurement of additional markers. This is not only technically difficult but also increases the medical burden on patients. Therefore, the clinical application of these signatures remains limited at present. Our signature is more intuitive, integrating the TNM staging system, pathological parameters and tumor markers without the need for additional testing. These characteristics make our signature more convenient and easy to use in clinical practice. According to current guidelines, patients undergoing radical surgery for gastrointestinal tumors are usually followed up every 3-6 months (3,4,22,23). In addition, the treatment of gastrointestinal tumors after radical surgery still has some problems, especially in CRC (24,25). Three-month adjuvant chemotherapy showed no significant disadvantage compared with 6-month chemotherapy in low-risk stage III CRC patients (26
Availability of data and materials. The datasets in our study are available from the corresponding author on reasonable request. No administrative permission was required to access the raw data from our databases. The clinical data used in our study are not publicly available because they are being used in other unpublished studies.
Competing interests. The authors have declared that no conflict of interest exists.