Prognostic Prediction Models for Postoperative Patients with Stage I to III Colorectal Cancer: A Retrospective Study Based on Machine Learning Methods

DOI: https://doi.org/10.21203/rs.3.rs-1572496/v2

Abstract

Purpose To use patient, tumor, and treatment features to compare the performance of machine learning algorithms; to develop and validate models that predict overall, disease-free, recurrence-free, and distant metastasis-free survival; and to screen important variables to improve the prognosis of patients in clinical settings.

Methods More than 1000 colorectal cancer patients who underwent curative resection were grouped according to 4 endpoints and divided into training sets and testing sets (9:1). We applied 4 machine learning algorithms to predict 1-, 3-, and 5-year survival times. The area under the receiver operating characteristic curve (AUC) and average precision (AP) were our accuracy indicators. Important variables were screened by multivariate regression models. To achieve better prediction of longitudinal oncological outcomes, we performed 10-fold cross-validation for all models except the recurrence-free survival model (3-fold cross-validation). We iterated 3000 times after hyperparameter optimization and assessed the internal testing sets.

Results The best AP values were greater than 80%, except for the overall survival model (69.5%). The best AUCs were all greater than 0.70, except for the recurrence-free survival model (0.61). The models performed well. Variables that were widely correlated with prognoses, such as the TNM stage, were selected as important features; however, indirectly related indicators, such as Ki-67 level, were also selected.

Conclusion We constructed an independent, high-accuracy "white-box" machine learning system for predicting survival times. This system may help in determining management strategies for colorectal cancer patients and has future utility in personalized medicine and monitoring.

Introduction

Colorectal cancer (CRC) is characterized by high heterogeneity, aggressiveness, and high morbidity and mortality rates (Bray et al. 2018). The high mortality rate is due to the progression of the disease and inadequate treatment strategies (Koncina et al. 2020). Furthermore, overdiagnosis, overtreatment, false positives, false reassurance, uncertain findings, and complications are common and lead to an unnecessary psychological burden on patients (Kalager et al. 2018; Ma et al. 2019; Peery et al. 2018; Vermeer et al. 2017). Therefore, accurate prediction of the prognosis of CRC patients is vital when making clinical decisions. The American Joint Committee on Cancer (AJCC) classification system for CRC is the primary tool for predicting the prognosis of CRC and especially for making adjuvant chemoradiotherapy decisions (Dienstmann et al. 2017; Nagtegaal et al. 2011). However, the survival observations associated with the AJCC classifications for CRC patients have been reported to exhibit certain inconsistencies (Chu et al. 2016; Greene et al. 2002; Mo et al. 2018). Consequently, several systematic reviews (with or without meta-analyses) have been carried out to investigate and build prognostic models for CRC; some of these models include the TNM stage, whereas others do not (Beaton et al. 2013; Choi et al. 2015; Ha et al. 2017; Rekhraj et al. 2008). However, the outcomes of these models have been unsatisfactory, likely because of methodological limitations. Thus, we explored the possibility of using machine learning (ML) to build a prognostic model for CRC. ML is a branch of artificial intelligence (AI) in which a computer generates rules underlying or based on raw data (Mitchell, 2003); the use of ML in medicine has gradually become common (Grimm et al. 2022; Kim et al. 2022; Metsky et al. 2022; Xie et al. 2022). ML can be used to directly compare the accuracy of two or more quantitative tests for the same disease/condition (Tripepi et al. 2009).
Several rules for diagnosis and treatment have recently been formulated (H. Liang et al. 2019; R. Liang et al. 2019; Liu et al. 2020), and ML algorithms have been used to construct risk forecast models that predict the hazard ratio of adverse events at a certain point in time (D'Ascenzo et al. 2021; Liu et al. 2022) or independent of a specific time point (Yala et al. 2021). Patients in high-risk groups have been screened. However, due to the limitations of the models, it is not feasible to longitudinally predict when an event will occur. In other words, these models cannot indicate the specific time of event occurrence. In addition, they are "black-boxes", which reduces their clinical credibility.

In the present study, we used previously designed ML models as a basis to develop a new ML system for predicting the specific time of occurrence of oncological outcomes (death, tumor recurrence, and distant metastasis) in patients with stage I, II, and III CRC who underwent curative resection. This system could serve as a reference for clinicians when selecting treatment strategies for CRC patients. Furthermore, it might prompt communication on the purpose of treatments between doctors and patients as well as reduce the psychological burden on patients and their families. In addition, we screened the important variables that affect the outcomes to improve the clinical credibility of the ML models. Thus, the "black-box" feature of these models was eliminated. Finally, physicians who do not have the opportunity to apply ML to their work can use our research to make more accurate prognostic assessments and select management methods.

Materials And Methods

Case selection

This study was performed in line with the principles of the Declaration of Helsinki. Approval was granted by the Ethics Committee of the Affiliated Hospital of Qingdao University (reference number QYFYWZLL26957). The need for written informed consent was waived by the Ethics Committee of the Affiliated Hospital of Qingdao University due to the retrospective nature of the study. We retrospectively analyzed the data of patients who underwent curative operations for primary stage I, II, and III CRC at the Affiliated Hospital of Qingdao University from 2001 to 2020; the data were acquired through the hospital information system. A detailed flowchart is shown in Fig. 1.

Potential variables included age, sex, body mass index (BMI), hypertension (HP), diabetes mellitus (DM), chronic heart disease (CHD), smoking history, drinking history, family history of tumors, family history of gastrointestinal tumors, serum carcinoembryonic antigen (CEA) level, serum C-reactive protein (CRP) level, tumor position (ascending colon vs. transverse colon vs. descending colon vs. sigmoid colon vs. rectum), tumor differentiation grade, histological type, tumor size (diameter; 20 mm was the cutoff), perineural invasion (PNI), lymphovascular invasion (LVI), lesion amount (unifocal vs. multifocal), Ki-67 protein level, operation method (laparotomy vs. laparoscopy), lymph node ratio (LNR), and tumor node metastasis (TNM) stage. Additionally, disease status (recurrence vs. distant metastasis) was added to the disease-free survival (DFS) model. These characteristics mainly involved patient demographics, health, tumor characteristics, and treatment. There were no missing data.

Regression model selection

Best subset selection regression

To identify the optimal subset, a regularization technique was used. Because it is not feasible to fit all 2^p candidate models (where p is the number of predictors) in this experiment, regularization was necessary. Our work fitted only one model for each λ value, which greatly improved efficiency. With regard to the bias-variance trade-off, when the relationship between the response variable and the predictor variables is close to linear, the least squares estimate is close to unbiased but may have a high variance. Thus, a small change in the data can lead to large changes in the least squares coefficient estimates. Regularization optimized the bias-variance trade-off through proper selection of λ and normalization, thereby improving the model fit. Regularization of the coefficients also mitigated overfitting, which can be caused by multicollinearity.
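To illustrate why exhaustive subset search is impractical, the following minimal sketch counts the candidate models for p predictors. The function name `count_subset_models` is hypothetical and is not part of the study's R code; the value p = 23 is taken from the number of candidate features reported for the OS, RFS, and DMFS models.

```python
from math import comb

def count_subset_models(p: int) -> int:
    """Count every distinct subset of p predictors, including the empty model."""
    # Summing the number of subsets of each size k gives 2**p.
    return sum(comb(p, k) for k in range(p + 1))

# 23 candidate predictors, as reported for the OS, RFS, and DMFS models
n_models = count_subset_models(23)
print(n_models)  # 8388608, which equals 2**23
```

With 24 predictors (the DFS model) the count doubles, which is why a penalized fit over a grid of λ values is a far cheaper alternative.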

Ridge regression

In ridge regression, the normalization term was the sum of the squares of all coefficients, called the L2 norm; ridge regression minimizes RSS + λ(sum βj²), where RSS is the residual sum of squares. When λ increased, the coefficients decreased (tending to but never reaching 0). The advantage of ridge regression is that it can improve the prediction accuracy. Because ridge regression cannot shrink coefficients to exactly 0, and therefore cannot remove variables, least absolute shrinkage and selection operator (LASSO) regression was also conducted.
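The shrinkage behavior described above can be seen in the simplest possible case. The sketch below assumes a single predictor with no intercept, for which the minimizer of RSS + λβ² has the closed form β = Σxy / (Σx² + λ). The function name and the toy data are illustrative assumptions and do not come from the study.

```python
def ridge_coefficient(x, y, lam):
    """Minimizer of sum((y_i - beta * x_i)**2) + lam * beta**2 for one predictor."""
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    return sxy / (sxx + lam)

# Toy data (hypothetical), roughly following y = 2x
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-4.1, -1.9, 0.2, 2.1, 3.9]

# The coefficient shrinks toward 0 as lam grows but never reaches it.
for lam in (0.0, 1.0, 10.0, 100.0):
    print(lam, round(ridge_coefficient(x, y, lam), 4))
```

Because the denominator only grows with λ, the estimate approaches 0 without ever equaling it, so ridge regression alone cannot drop a variable.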

LASSO regression

Unlike ridge regression, LASSO regression uses the L1 norm (the sum of the absolute values of all feature weights) and minimizes RSS + λ(sum |βj|). This shrinkage penalty can reduce some feature weights to exactly 0, which greatly improves the interpretability of the model and represents an important advantage over ridge regression.
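A minimal sketch of how the L1 penalty produces exact zeros is given below, again assuming a single predictor with no intercept. In that case the minimizer of RSS + λ|β| is obtained by soft-thresholding Σxy at λ/2 and dividing by Σx². The function names and the toy data are illustrative assumptions, not the study's implementation.

```python
def soft_threshold(z, t):
    """Shrink z toward 0 by t, returning exactly 0.0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_coefficient(x, y, lam):
    """Minimizer of sum((y_i - beta * x_i)**2) + lam * abs(beta) for one predictor."""
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    return soft_threshold(sxy, lam / 2.0) / sxx

# Toy data (hypothetical), roughly following y = 2x
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-4.1, -1.9, 0.2, 2.1, 3.9]

# A sufficiently large lam sets the coefficient to exactly 0.
for lam in (0.0, 10.0, 50.0):
    print(lam, lasso_coefficient(x, y, lam))
```

Once λ/2 exceeds |Σxy|, the coefficient is exactly 0 and the variable is removed from the model, which is the variable-selection property the text refers to.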

LASSO cross-validation model

In this experiment, 10-fold cross-validation was used by default for LASSO regression. In κ-fold cross-validation, the data were divided into κ equally sized subsets. κ-1 subsets were used to fit the model, and the remaining subset served as the test set. The results of the κ fits were then combined (usually by taking the average, which is the default) to determine the final parameters. In this paper, each subset was used as a test set only once. Thus, κ-fold cross-validation was easy to apply, and the results included the value of each fit and the mean squared error (MSE) of the response.
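The splitting scheme described above can be sketched as follows. This is an illustration using only the Python standard library, not the R routine used in the study; the function names, the fixed random seed, and the `score_fold` callback are assumptions introduced for the example. The sample size of 849 is the DFS training-set size reported in Table 1.

```python
import random

def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]  # k subsets whose sizes differ by at most 1
    for i, test in enumerate(folds):
        # Every fold except the i-th one is used for fitting.
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

def cv_mean_score(n_samples, k, score_fold):
    """Average a per-fold score (for example, the test-set MSE) over the k folds."""
    scores = [score_fold(train, test) for train, test in k_fold_splits(n_samples, k)]
    return sum(scores) / len(scores)

# 10-fold split of 849 samples: each index appears in exactly one test set.
splits = list(k_fold_splits(849, 10))
print([len(test) for _, test in splits])
```

When the sample size is not divisible by κ, the subsets cannot be exactly equal, so this sketch lets their sizes differ by at most one.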

Outcome selection

The primary outcomes were four oncological endpoints: overall survival (OS), DFS, recurrence-free survival (RFS), and distant metastasis-free survival (DMFS). They were defined as the time from the date of surgery to the date of death, recurrence or distant metastasis, recurrence, and distant metastasis, respectively. The secondary outcomes were the important variables screened from each ML model.
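As an illustration of how such an endpoint can be turned into a classification target, the sketch below computes the interval from surgery to the event and assigns it to a time class. The class boundaries at 1, 3, and 5 years mirror the prediction horizons named in this study, but the exact binning rule, the class labels, and the function name are assumptions made for this example and are not stated by the authors.

```python
from datetime import date

def survival_class(surgery: date, event: date) -> str:
    """Bin the interval from surgery to the event into a time class (assumed bins)."""
    if event < surgery:
        raise ValueError("event date precedes surgery date")
    years = (event - surgery).days / 365.25
    if years <= 1:
        return "<=1 year"
    if years <= 3:
        return "1-3 years"
    if years <= 5:
        return "3-5 years"
    return ">5 years"

# Hypothetical patient: surgery on 1 January 2010, event on 1 January 2012
print(survival_class(date(2010, 1, 1), date(2012, 1, 1)))  # 1-3 years
```

The same function applies to all four endpoints; only the event that supplies the second date changes (death for OS, recurrence for RFS, and so on).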

Machine learning model training and validation

Figure 1 shows the process of set division (the average precision (AP) values were the best at this ratio). All the variables were screened through four multivariate regression methods. The most appropriate regression models were selected based on the MSE of the regression models (the MSE measures the difference between the predicted value and the true value; the smaller the MSE, the better the fit). Then, the corresponding number of variables and the specific variables were obtained. In addition, the highest scores were screened based on Bayesian information criterion (BIC) scores. Subsequently, the selected predictors were input into four machine learning classifiers: decision tree (coarse), support vector machine (SVM), k-nearest neighbors (KNN), and ensemble (optimal subset). Finally, OS, DFS, RFS, and DMFS were each classified and predicted.

Optimizing the classification models and configuring hyperparameters

The optimizer was configured for Bayesian optimization and stochastic optimization, and the maximum number of iterations was 3000. The hyperparameter configuration was as follows: the learning rate was set to 0.01, the initial configuration of the decision tree was Coarse Tree, the initial configuration of the SVM was Coarse Gaussian SVM, the initial configuration of the KNN was Coarse KNN, and the initial configuration of the ensemble was Subspace Discriminant. Other hyperparameters were left at their default optimized values.

Statistical analysis

The data analysis used four regression methods (best subset regression, ridge regression, LASSO regression, and LASSO cross-validation regression), which were implemented in R (version 4.1.1). These methods were used to perform variable screening on 23 features (24 features for DFS). Four machine learning algorithms were developed in MATLAB R2019b.

Results

Study population characteristics

The clinical and therapeutic characteristics of the study population are detailed in Table 1. In the OS model, 16.0% (166/1039) of the patients died within 1 year after surgery, and 26.0% (44/169) and 37.7% (302/802) of the patients exhibited recurrence and distant metastasis, respectively, within 1 year after surgery. After the 5-year follow-up, 9.5% (99/1039) of the patients were still alive; additionally, 4.1% (7/169) and 3.7% (30/802) of the patients were without tumor recurrence and distant metastasis, respectively.

Table 1

Baseline features of included cohorts

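All values are percentages of the corresponding set.

| Parameters | OS (N = 1039): training set (n = 1000) | OS: testing set (n = 39) | DFS (N = 874): training set (n = 849) | DFS: testing set (n = 25) | RFS (N = 169): training set (n = 160) | RFS: testing set (n = 9) | DMFS (N = 802): training set (n = 784) | DMFS: testing set (n = 18) |
|---|---|---|---|---|---|---|---|---|
| Age | | | | | | | | |
| > 60 | 56.4 | 69.2 | 58.0 | 152.0 | 61.3 | 55.6 | 57.5 | 61.1 |
| Sex | | | | | | | | |
| Male | 64.5 | 53.8 | 65.3 | 56.0 | 69.4 | 77.8 | 64.4 | 61.1 |
| BMI | | | | | | | | |
| < 18.5 | 3.7 | 0.0 | 3.8 | 0.0 | 3.1 | 0.0 | 3.8 | 0.0 |
| 18.5–23.9 | 45.8 | 56.4 | 43.6 | 36.0 | 42.5 | 77.8 | 43.0 | 44.4 |
| 24–27.9 | 36.0 | 30.8 | 39.2 | 13.0 | 40.0 | 11.1 | 40.6 | 38.9 |
| ≥ 28 | 14.5 | 12.8 | 13.4 | 12.0 | 14.4 | 11.1 | 12.6 | 16.7 |
| HP | | | | | | | | |
| Presence | 27.5 | 48.7 | 25.9 | 36.0 | 25.0 | 44.4 | 26.4 | 50.0 |
| DM | | | | | | | | |
| Presence | 13.4 | 17.9 | 12.6 | 16.0 | 15.0 | 11.1 | 12.6 | 22.2 |
| CHD | | | | | | | | |
| Presence | 10.3 | 17.9 | 9.3 | 0.0 | 6.3 | 22.2 | 9.7 | 5.6 |
| Smoking | | | | | | | | |
| Presence | 32.5 | 20.5 | 32.4 | 28.0 | 34.4 | 22.2 | 31.6 | 33.3 |
| Drinking | | | | | | | | |
| Presence | 28.7 | 20.5 | 29.4 | 28.0 | 33.1 | 11.1 | 29.3 | 27.8 |
| Family history of tumors | | | | | | | | |
| Presence | 14.1 | 17.9 | 14.7 | 20.0 | 16.9 | 22.2 | 13.8 | 27.8 |
| Family history of gastrointestinal tumors | | | | | | | | |
| Presence | 9.9 | 10.3 | 10.0 | 12.0 | 13.1 | 0.0 | 9.1 | 11.1 |
| CEA | | | | | | | | |
| High | 55.5 | 33.3 | 57.7 | 36.0 | 50.6 | 66.7 | 58.3 | 33.3 |
| CRP | | | | | | | | |
| High | 4.4 | 0.3 | 4.2 | 8.0 | 3.1 | 0.0 | 4.3 | 5.6 |
| Tumor position | | | | | | | | |
| Ascending colon | 13.0 | 28.2 | 11.3 | 24.0 | 9.4 | 33.3 | 11.4 | 22.2 |
| Transverse colon | 4.5 | 5.1 | 3.3 | 8.0 | 7.5 | 0.0 | 3.1 | 11.1 |
| Descending colon | 2.4 | 5.1 | 1.8 | 12.0 | 1.9 | 0.0 | 1.9 | 5.6 |
| Sigmoid colon | 15.8 | 7.7 | 18.3 | 48.0 | 18.8 | 55.6 | 18.2 | 55.6 |
| Rectum | 64.3 | 53.8 | 65.4 | 8.0 | 62.5 | 11.1 | 65.4 | 5.6 |
| Tumor differentiation grade | | | | | | | | |
| High | 0.9 | 2.6 | 0.4 | 0.0 | 0.6 | 0.0 | 0.3 | 0.0 |
| Moderate | 58.4 | 53.8 | 68.0 | 76.0 | 73.8 | 77.8 | 67.6 | 66.7 |
| Low | 40.7 | 43.6 | 31.7 | 24.0 | 30.0 | 22.2 | 32.1 | 33.3 |
| Histological type | | | | | | | | |
| AC | 67.9 | 76.9 | 76.7 | 80.0 | 69.4 | 77.8 | 77.4 | 83.3 |
| AMC | 9.3 | 2.6 | 7.4 | 4.0 | 11.3 | 11.1 | 7.1 | 0.0 |
| MA | 20.2 | 20.5 | 14.0 | 12.0 | 18.8 | 11.1 | 13.4 | 16.7 |
| SRCC | 1.8 | 0.0 | 1.1 | 0.0 | 0.6 | 0.0 | 1.0 | 0.0 |
| The others | 0.8 | 0.0 | 0.8 | 4.0 | 0.0 | 0.0 | 1.0 | 0.0 |
| Tumor size | | | | | | | | |
| > 20 mm | 28.2 | 33.3 | 30.0 | 20.0 | 41.9 | 55.6 | 29.0 | 16.7 |
| PNI | | | | | | | | |
| Presence | 49.3 | 41.0 | 51.9 | 52.0 | 49.4 | 22.2 | 52.3 | 50.0 |
| LVI | | | | | | | | |
| Presence | 44.3 | 51.3 | 44.8 | 56.0 | 33.1 | 66.7 | 46.9 | 66.7 |
| Lesion amount | | | | | | | | |
| Unifocal | 95.6 | 97.4 | 94.6 | 100.0 | 93.1 | 100.0 | 95.2 | 100.0 |
| Ki-67 protein level | | | | | | | | |
| High | 46.9 | 28.2 | 52.9 | 52.0 | 56.9 | 33.3 | 52.9 | 38.9 |
| Operation method | | | | | | | | |
| Laparotomy | 70.8 | 82.1 | 67.4 | 80.0 | 66.3 | 77.8 | 67.6 | 83.3 |
| LNR | | | | | | | | |
| ≤ 0.25 | 84.4 | 94.9 | 83.5 | 84.0 | 85.0 | 77.8 | 83.5 | 83.3 |
| 0.26–0.5 | 14.4 | 5.1 | 14.7 | 16.0 | 14.4 | 22.2 | 14.7 | 16.7 |
| 0.51–0.75 | 1.0 | 0.0 | 1.6 | 0.0 | 0.6 | 0.0 | 1.7 | 0.0 |
| > 0.75 | 0.2 | 0.0 | 0.1 | 0.0 | 0.0 | 0.0 | 0.1 | 0.0 |
| TNM stage | | | | | | | | |
| I | 6.9 | 2.6 | 5.5 | 4.0 | 8.8 | 0.0 | 4.7 | 5.6 |
| II | 29.2 | 28.2 | 27.6 | 32.0 | 38.1 | 33.3 | 25.4 | 27.8 |
| III | 63.9 | 69.2 | 66.9 | 64.0 | 53.1 | 66.7 | 69.9 | 66.7 |
| Disease status | | | | | | | | |
| Recurrence | | | 19.3 | 0.0 | | | | |
| Distant metastasis | | | 80.7 | 100.0 | | | | |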

OS overall survival, DFS disease-free survival, RFS recurrence-free survival, DMFS distant metastasis-free survival, BMI body mass index, HP hypertension, DM diabetes mellitus, CHD chronic heart disease, CEA carcinoembryonic antigen, CRP C-reactive protein, AC adenocarcinoma, AMC adenocarcinoma with mucus composition, MA mucinous adenocarcinoma, SRCC signet ring cell carcinoma, PNI perineural invasion, LVI lymphovascular invasion, LNR lymph node ratio, TNM stage tumor node metastasis stage

Important model variables

The detailed MSEs are shown in Table 2. For each endpoint, we chose the regression model with the lowest MSE. We chose best subset regression for the OS model, and the important variables were tumor differentiation grade, Ki-67 protein level, histological type, TNM stage, and serum CRP level (5 in total, Fig. 2). LASSO regression was used for the DFS model, and the 5 important variables were PNI, tumor differentiation grade, Ki-67 protein level, lesion amount, and TNM stage (Fig. 3 and Table 3). For RFS, we used ridge regression, and the four important features were DM, CHD, operation method, and age (Fig. 4 and Table 4). Best subset regression was chosen for the DMFS model, and PNI, Ki-67 protein level, tumor differentiation grade, TNM stage, and histological type were the 5 important indicators (Fig. 5).

Table 2

Comparisons of the MSEs of four regression models

| Regression models | MSE (OS) | MSE (DFS) | MSE (RFS) | MSE (DMFS) |
|---|---|---|---|---|
| Subset regression model | 0.4500633 | 0.3749525 | 0.7710959 | 0.4243934 |
| Ridge regression model | 0.4602181 | 0.3678196 | 0.7513997 | 0.4449402 |
| LASSO regression model | 0.5067479 | 0.3608976 | 0.8054069 | 0.4426869 |
| LASSO cross-validation model | 0.4890136 | 0.4112665 | 0.8090451 | 0.4822449 |

MSE mean square error, OS overall survival, DFS disease-free survival, RFS recurrence-free survival, DMFS distant metastasis-free survival, LASSO the least absolute shrinkage and selection operator
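The selection rule implied by Table 2 (the regression model with the smallest MSE is kept for each endpoint) can be sketched as follows. The MSE values are copied from Table 2; the dictionary layout and the function name `best_model_per_endpoint` are assumptions made for this illustration.

```python
# MSE values copied from Table 2
mse_table = {
    "Subset regression": {"OS": 0.4500633, "DFS": 0.3749525, "RFS": 0.7710959, "DMFS": 0.4243934},
    "Ridge regression": {"OS": 0.4602181, "DFS": 0.3678196, "RFS": 0.7513997, "DMFS": 0.4449402},
    "LASSO regression": {"OS": 0.5067479, "DFS": 0.3608976, "RFS": 0.8054069, "DMFS": 0.4426869},
    "LASSO cross-validation": {"OS": 0.4890136, "DFS": 0.4112665, "RFS": 0.8090451, "DMFS": 0.4822449},
}

def best_model_per_endpoint(table):
    """Return, for each endpoint, the regression model with the lowest MSE."""
    endpoints = list(next(iter(table.values())))
    return {e: min(table, key=lambda model: table[model][e]) for e in endpoints}

for endpoint, model in best_model_per_endpoint(mse_table).items():
    print(endpoint, model)
```

The result agrees with the choices reported in the text: subset regression for OS and DMFS, LASSO regression for DFS, and ridge regression for RFS.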

Table 3

Regression coefficients for each variable after LASSO regression on disease-free survival

| No. | Variable | Coefficient (s1) |
|---|---|---|
| 1 | PNI | 0.166095917 |
| 2 | Tumor differentiation grade | 0.145056847 |
| 3 | Ki-67 protein level | 0.125446209 |
| 4 | Lesion amount | 0.122986628 |
| 5 | TNM stage | 0.083527594 |
| 6 | Family history of tumors | 0.068736389 |
| 7 | Histological type | 0.068138058 |
| 8 | Family history of gastrointestinal tumors | 0.064282454 |
| 9 | Age | 0.052132288 |
| 10 | Sex | 0.049713378 |
| 11 | Serum CRP level | 0.049208782 |
| 12 | CHD | 0.038371955 |
| 13 | Operation method | 0.035456908 |
| 14 | BMI | 0.028557122 |
| 15 | LNR | 0.020353514 |
| 16 | Smoking | 0.018868793 |
| 17 | Tumor location | 0.015574714 |
| 18 | Serum CEA level | 0.012848902 |
| 19 | Drinking | 0.011499902 |
| 20 | Tumor size | 0.009350538 |
| 21 | LVI | 0.008701713 |
| 22 | HP | 0.006665916 |
| 23 | DM | 0.005961741 |
| 24 | Disease status | 0.002272804 |

LASSO the least absolute shrinkage and selection operator, PNI perineural invasion, TNM stage tumor node metastasis stage, CRP C-reactive protein, CHD chronic heart disease, BMI body mass index, LNR lymph node ratio, CEA carcinoembryonic antigen, LVI lymphovascular invasion, HP hypertension, DM diabetes mellitus

Table 4

Regression coefficients for each variable after ridge regression on recurrence-free survival

| No. | Variable | Coefficient (s1) |
|---|---|---|
| 1 | DM | 0.184923946 |
| 2 | CHD | 0.108369703 |
| 3 | Operation method | 0.103702878 |
| 4 | Age | 0.097713638 |
| 5 | Smoking | 0.097274712 |
| 6 | PNI | 0.080909316 |
| 7 | Histological type | 0.079653416 |
| 8 | Family history of tumors | 0.074394127 |
| 9 | Ki-67 protein level | 0.067737861 |
| 10 | Drinking | 0.067211567 |
| 11 | LNR | 0.065499595 |
| 12 | Family history of gastrointestinal tumors | 0.056467337 |
| 13 | Lesion amount | 0.047623412 |
| 14 | Serum CRP level | 0.042766487 |
| 15 | TNM stage | 0.037254881 |
| 16 | Serum CEA level | 0.033536644 |
| 17 | Tumor location | 0.02592845 |
| 18 | BMI | 0.020303689 |
| 19 | Sex | 0.0181275 |
| 20 | Tumor differentiation grade | 0.013301381 |
| 21 | Tumor size | 0.01124825 |
| 22 | LVI | 0.006359537 |
| 23 | HP | 0.005023189 |

DM diabetes mellitus, CHD chronic heart disease, PNI perineural invasion, LNR lymph node ratio, CRP C-reactive protein, TNM stage tumor node metastasis stage, CEA carcinoembryonic antigen, BMI body mass index, LVI lymphovascular invasion, HP hypertension

Model performance

The receiver operating characteristic (ROC) curves of the OS model obtained by the four ML algorithms are shown in Fig. 6 (Tree (AP: 67.8%; area under the ROC curve, AUC: 0.69), SVM (AP: 69.5%; AUC: 0.70), KNN (AP: 68.7%; AUC: 0.71), and Ensemble (AP: 67.2%; AUC: 0.73)). Regarding DFS, Fig. 7 shows the four ROC curves in detail (Tree (AP: 82.4%; AUC: 0.50), SVM (AP: 82.4%; AUC: 0.56), KNN (AP: 82.4%; AUC: 0.69), and Ensemble (AP: 82.4%; AUC: 0.72)). The obtained ROC curves for the RFS model are shown in Fig. 8 (Tree (AP: 82.0%; AUC: 0.45), SVM (AP: 82.0%; AUC: 0.61), KNN (AP: 82.0%; AUC: 0.58), and Ensemble (AP: 82.0%; AUC: 0.57)). Figure 9 shows the ROC curves for DMFS (Tree (AP: 82.1%; AUC: 0.49), SVM (AP: 82.8%; AUC: 0.61), KNN (AP: 82.8%; AUC: 0.71), and Ensemble (AP: 82.8%; AUC: 0.73)).
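For readers less familiar with the two indicators, the sketch below shows one common way to compute the AUC and the AP for a binary outcome from predicted scores. It is a plain-Python illustration under the assumption of a binary label (1 = event, 0 = no event); it is not the MATLAB routine used in this study, and the labels and scores are invented for the example.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive case."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits

# Hypothetical labels and classifier scores
labels = [1, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.2]
print(round(roc_auc(labels, scores), 3), round(average_precision(labels, scores), 3))
```

An AUC of 0.5 corresponds to a classifier that ranks positive and negative cases no better than chance, which is a useful reference point when reading the values above.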

Discussion

In this study, we developed and validated a promising ML system for predicting the times of oncological outcomes, and we screened the important variables for each ML model. This system represents a simple, practical, and easily accessible tool that clinicians can refer to when selecting treatments. In addition, our system has good tolerance for heterogeneous patients and does not require clear patient medical histories, which lowers the threshold for use. Our work differs from previous risk forecast models. We considered temporal dynamics and found that patients' prognoses could be understood longitudinally and in the long term. In addition, we screened uncommonly used indicators for predicting outcomes, which provides new insights for pathologists, basic science researchers, and even pharmacologists. Moreover, the adaptability and interpretability of our system promote its application in hospitals at different levels. Our work also demonstrated the feasibility of applying ML models to a large number of heterogeneous CRC patients.

Although several genetic and molecular markers have been proven to be correlated with patient prognoses (Hossain et al. 2021; Liu et al. 2022), we did not select them as potential variables. Because we aimed to build a clinically generalizable system that might be used in data-poor situations, the indicators we selected were clinically applicable and had small heterogeneity among patients. In addition, to avoid selection bias and limitations, we avoided the TNM stage-centric impasse by inputting a large number of potential variables into our work, allowing the ML algorithms to screen the important variables that performed best. Furthermore, there were no missing values in our database, which prevented bias caused by improper filling of missing values.

We aimed to select patients with a clear medical history before the operation who underwent curative initial treatment, so we excluded stage I, II, and III CRC patients who had neoadjuvant chemoradiotherapy (which may lead to a vague history). Patients with stage IV CRC who were not eligible for radical surgery were also excluded. Consequently, the model might also be a rough reference when physicians at higher-level hospitals are redesigning treatment strategies for patients from lower-level hospitals with unclear postoperative radiotherapy and chemotherapy histories. Treatment options for these patients are difficult to determine, and it is difficult for oncologists to obtain references from previous studies that stratify patients by chemotherapy regimens. Moreover, to avoid bias, we excluded patients who did not have endpoint data. When applying our system, patients predicted to not have outcomes would be classified as greater than 5 years (equivalent to oncological recurrence).

To better predict longitudinal oncological outcomes, we configured and optimized hyperparameters wherever possible. To better evaluate the fit of the models, we chose the classification predictors C statistics (to avoid the influence of the threshold value, we used the AUC) and AP, which corresponded to the nonparametric ML algorithms, to evaluate the accuracy of the prediction. In the process of screening variables, due to the influence of sample size, variables (the number of variables was greater than 20), and other factors, we chose methods based on multiple regression models. Evaluation indicators in the process of model selection included error sum of squares (SSE), MSE, and so on. However, the SSE value in this experiment was meaningless (SSE would inevitably increase when the sample size increased). We chose the MSE as the evaluation indicator. Since this experiment involved regression prediction based on small-sample data, there was no model overfitting state. Consequently, the default state of the system configuration was processed by us (when encountering no solution or a local optimal solution, it would be regarded as a convergence failure state) considering the development costs. We think the lower AUC for OS is a reasonable result given the biological complexity and the small sample size. Furthermore, to reduce bias, computer experts were blinded to the meaning of each indicator when building the ML models.

Extending the survival time is the shared goal of clinicians and oncology patients. Quantifying patient outcomes aids in shared decision making (Howard et al. 2020). Because of the heterogeneity of CRCs, physicians and patients must seriously consider the tradeoffs between adverse effects and benefits (Ganguli et al. 2022) when choosing a treatment strategy. It is possible to improve outcomes by closer follow-up or the administration of additional chemoradiotherapy to patients who are predicted to have poorer prognoses. Consequently, we suggest that patients who tend to have a shorter DMFS receive prophylactic chemotherapy or regional radiotherapy for the common metastatic sites of CRC described by Jiang B et al. (2021). Moreover, the identification of patients with better prognoses could reduce the cost of medical care and improve the level of humanistic care by reducing the psychological burden on patients and their families. Therefore, predictive tools such as our system are urgently needed in the clinic.

However, relying on the results output by "black box" systems for managing patients is not always acceptable to clinicians and patients (Pattarabanjird et al. 2022; Watson et al. 2019). The interpretability of models is vital, especially in biomedicine (Murdoch et al. 2019; Yu et al. 2018). To turn a "black box" system into a "white box" system, we identified the corresponding predictors for the different outcomes. TNM stage, the primary indicator for chemoradiotherapy decisions, was selected in the OS, DFS, and DMFS models, which made our models more credible. Moreover, indicators that had been widely found to correlate with prognoses, such as PNI (Knijn et al. 2016; Nikberg et al. 2016; Song et al. 2019), pathological type (Kim et al. 2013; Nitsche et al. 2013; Sheng et al. 2019; Wu et al. 2019), and tumor differentiation grade (Garrity et al. 2004), were also selected in the models, which further supported the credibility of our system. One of the potential benefits of using ML models is that the important variables are identified and less critical parameters are ignored. Several uncommon predictors were additionally included in our model, which provided new insights into predicting prognoses. The levels of CRP and Ki-67 were shown to be factors affecting prognoses, consistent with studies showing that high serum CRP levels are associated with higher postoperative complication rates (Domínguez-Comesaña et al. 2017; Muñoz et al. 2018) and that Ki-67 levels reflect the proliferative capacity of cells (Schlüter et al. 1993), especially the proliferative capacity of tumor cells (Duchrow et al. 1994; Starborg et al. 1996). Surprisingly, LVI was only modestly predictive. Our models also identified some predictors that were not previously considered to be directly linked to poor prognosis (unifocal vs. multifocal lesions and laparotomy vs. laparoscopy) (Barz et al. 2021; Chin et al. 2019; Fleshman et al. 2019; Hida et al. 2018).
These factors are more likely to be directly related to surgical trauma rather than survival time. For the RFS model, two chronic diseases, DM and CHD, were selected together with age, possibly due to insufficient sample size. In addition, we indirectly focused on elderly oncological patients who had baseline diseases.

Our findings showed the formidable predictive power of ML methods, particularly for heterogeneous diseases that are stratified by outcomes. ML has unique value in clinical applications; it can guide patient management, improve patient outcomes, and tailor treatment regimens, especially when resources are scarce (when only clinicopathological and surgical variables are available for analysis). However, although we obtained encouragingly high prediction accuracy and "white-box" results, more progress is needed before ML can be fully relied upon. In addition, in clinical practice, traditional performance measures such as the AUC must be translated into medically relevant measures to elucidate the patient-centric value of ML models. ML still has a long way to go.

Moreover, the limitations of our study must be noted. First, the sample used as input for the ML models was small (especially for RFS). When the data were input into the ML models for parameter optimization, the sensitivity of parameter adjustment could not be estimated due to the small sample size. Second, as a general dilemma of ML (Yu et al. 2018), the sample uniformity in the data could not be estimated, which could affect the final results. Furthermore, this study was based on a retrospective analysis. However, the patient data were obtained from a well-conceived and well-characterized cohort, which adds to the credibility of our results; thus, this study can serve as the basis for subsequent prospective studies. In addition, postoperative treatment information, such as specific radiotherapy and chemotherapy treatments, as well as detailed surgical methods, was not available in our database.

We believe that subsequent research could improve the accuracy of individual survival time prediction by employing other techniques, obtaining larger sample sizes, improving follow-up accuracy, and so on. Furthermore, it is necessary to add detailed studies on genomics and chemoradiotherapy regimens. Our next goal is to develop a predictive model system as an app and install it in a hospital system.

In conclusion, we successfully designed and validated clinicopathological-based ML prediction models. Our work might promote the application of precision treatments and improve clinical outcomes for CRC patients. We showed the potential of ML in improving the direction of treatment strategies.

Statements And Declarations

Acknowledgment The research was partially supported by the National Natural Science Foundation of China (Grant number 81802777).

Funding This work was supported by the National Natural Science Foundation of China (Grant number 81802777).

Competing Interests The authors have no relevant financial or non-financial interests to disclose.

Author Contributions All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Xiaolin Ji, Shuo Xu and Xiaoyu Li. The first draft of the manuscript was written by Xiaolin Ji and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Data Availability The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

Code Availability Commercially available MATLAB software was used to process data and develop methods. The code will be shared by the corresponding author upon request.

Ethics approval This study was performed in line with the principles of the Declaration of Helsinki. Approval was granted by the Ethics Committee of the Affiliated Hospital of Qingdao University (reference number QYFYWZLL26957).

Consent to participate The need for written informed consent was waived by the Ethics Committee of the Affiliated Hospital of Qingdao University due to the retrospective nature of the study.

Consent to publish Not Applicable.

References

  1. Barz C, Stöss C, Neumann P, Wilhelm D, Janssen K, Friess H, Nitsche U (2021) Retrospective study of prognosis of patients with multiple colorectal carcinomas: synchronous versus metachronous makes the difference. Int J Colorectal Dis 36(7), 1487–1498. https://doi.org/10.1007/s00384-021-03926-6
  2. Beaton C, Twine CP, Williams GL, Radcliffe AG (2013) Systematic review and meta-analysis of histopathological factors influencing the risk of lymph node metastasis in early colorectal cancer. Colorectal Dis 15(7), 788–797. https://doi.org/10.1111/codi.12129
  3. Bray F, Ferlay J, Soerjomataram I, Siegel RL, Torre LA, Jemal A (2018) Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 68(6), 394–424. https://doi.org/10.3322/caac.21492
  4. Chin C, Kuo Y, Chiang J (2019) Synchronous colorectal carcinoma: predisposing factors and characteristics. Colorectal Dis 21(4), 432–440. https://doi.org/10.1111/codi.14539
  5. Choi JY, Jung SA, Shim KN, Cho WY, Keum B, Byeon JS, Huh KC, Jang BI, Chang DK, Jung HY, Kong KA (2015) Meta-analysis of predictive clinicopathologic factors for lymph node metastasis in patients with early colorectal carcinoma. J Korean Med Sci 30(4), 398–406. https://doi.org/10.3346/jkms.2015.30.4.398
  6. Chu Q, Zhou M, Medeiros K, Peddi P, Kavanaugh M, Wu X (2016) Poor survival in stage IIB/C (T4N0) compared to stage IIIA (T1-2 N1, T1N2a) colon cancer persists even after adjusting for adequate lymph nodes retrieved and receipt of adjuvant chemotherapy. BMC Cancer 16, 460. https://doi.org/10.1186/s12885-016-2446-3
  7. D'Ascenzo F, De Filippo O, Gallone G, Mittone G, Deriu M, Iannaccone M, Ariza-Solé A, Liebetrau C, Manzano-Fernández S, Quadri G, Kinnaird T, Campo G, Simao Henriques J, Hughes J, Dominguez-Rodriguez A, Aldinucci M, Morbiducci U, Patti G, Raposeiras-Roubin S, Abu-Assi E, De Ferrari G (2021) Machine learning-based prediction of adverse events following an acute coronary syndrome (PRAISE): a modelling study of pooled datasets. Lancet 397(10270), 199–207. https://doi.org/10.1016/s0140-6736(20)32519-8
  8. Dienstmann R, Mason M, Sinicrope F, Phipps A, Tejpar S, Nesbakken A, Danielsen S, Sveen A, Buchanan D, Clendenning M, Rosty C, Bot B, Alberts S, Milburn Jessup J, Lothe R, Delorenzi M, Newcomb P, Sargent D, Guinney J (2017) Prediction of overall survival in stage II and III colon cancer beyond TNM system: a retrospective, pooled biomarker study. Ann Oncol 28(5), 1023–1031. https://doi.org/10.1093/annonc/mdx052
  9. Domínguez-Comesaña E, Estevez-Fernández S, López-Gómez V, Ballinas-Miranda J, Domínguez-Fernández R (2017) Procalcitonin and C-reactive protein as early markers of postoperative intra-abdominal infection in patients operated on colorectal cancer. Int J Colorectal Dis 32(12), 1771–1774. https://doi.org/10.1007/s00384-017-2902-9
  10. Duchrow M, Gerdes J, Schlüter C (1994) The proliferation-associated Ki-67 protein: definition in molecular terms. Cell Prolif 27(5), 235–242. https://doi.org/10.1111/j.1365-2184.1994.tb01421.x
  11. Fleshman J, Branda M, Sargent D, Boller A, George V, Abbas M, Peters W, Maun D, Chang G, Herline A, Fichera A, Mutch M, Wexner S, Whiteford M, Marks J, Birnbaum E, Margolin D, Larson D, Marcello P, Posner M, Read T, Monson J, Wren S, Pisters P, Nelson H (2019) Disease-free Survival and Local Recurrence for Laparoscopic Resection Compared With Open Resection of Stage II to III Rectal Cancer: Follow-up Results of the ACOSOG Z6051 Randomized Controlled Trial. Ann Surg 269(4), 589–595. https://doi.org/10.1097/sla.0000000000003002
  12. Ganguli R, Franklin J, Yu X, Lin A, Heffernan D (2022) Machine learning methods to predict presence of residual cancer following hysterectomy. Sci Rep 12(1), 2738. https://doi.org/10.1038/s41598-022-06585-x
  13. Garrity M, Burgart L, Mahoney M, Windschitl H, Salim M, Wiesenfeld M, Krook J, Michalak J, Goldberg R, O'Connell M, Furth A, Sargent D, Murphy L, Hill E, Riehle D, Meyers C, Witzig T (2004) Prognostic value of proliferation, apoptosis, defective DNA mismatch repair, and p53 overexpression in patients with resected Dukes' B2 or C colon cancer: a North Central Cancer Treatment Group Study. J Clin Oncol 22(9), 1572–1582. https://doi.org/10.1200/jco.2004.10.042
  14. Greene F, Stewart A, Norton H (2002) A new TNM staging strategy for node-positive (stage III) colon cancer: an analysis of 50,042 patients. Ann Surg 236(4), 416–421; discussion 421. https://doi.org/10.1097/00000658-200210000-00003
  15. Grimm L, Plichta J, Hwang E (2022) More Than Incremental: Harnessing Machine Learning to Predict Breast Cancer Risk. J Clin Oncol JCO2102733. https://doi.org/10.1200/jco.21.02733
  16. Ha GW, Kim JH, Lee MR (2017) Oncologic Impact of Anastomotic Leakage Following Colorectal Cancer Surgery: A Systematic Review and Meta-Analysis. Ann Surg Oncol 24(11), 3289–3299. https://doi.org/10.1245/s10434-017-5881-8
  17. Hida K, Okamura R, Sakai Y, Konishi T, Akagi T, Yamaguchi T, Akiyoshi T, Fukuda M, Yamamoto S, Yamamoto M, Nishigori T, Kawada K, Hasegawa S, Morita S, Watanabe M (2018) Open versus Laparoscopic Surgery for Advanced Low Rectal Cancer: A Large, Multicenter, Propensity Score Matched Cohort Study in Japan. Ann Surg 268(2), 318–324. https://doi.org/10.1097/sla.0000000000002329
  18. Hossain M, Chowdhury U, Islam M, Uddin S, Ahmed M, Quinn J, Moni M (2021) Machine learning and network-based models to identify genetic risk factors to the progression and survival of colorectal cancer. Comput Biol Med 135, 104539. https://doi.org/10.1016/j.compbiomed.2021.104539
  19. Howard F, Kochanny S, Koshy M, Spiotto M, Pearson A (2020) Machine Learning-Guided Adjuvant Treatment of Head and Neck Cancer. JAMA Netw Open 3(11), e2025881. https://doi.org/10.1001/jamanetworkopen.2020.25881
  20. Jiang B, Mu Q, Qiu F, Li X, Xu W, Yu J, Fu W, Cao Y, Wang J (2021) Machine learning of genomic features in organotropic metastases stratifies progression risk of primary tumors. Nat Commun 12(1), 6692. https://doi.org/10.1038/s41467-021-27017-w
  21. Kalager M, Wieszczy P, Lansdorp-Vogelaar I, Corley D, Bretthauer M, Kaminski M (2018) Overdiagnosis in Colorectal Cancer Screening: Time to Acknowledge a Blind Spot. Gastroenterology 155(3), 592–595. https://doi.org/10.1053/j.gastro.2018.07.037
  22. Kim M, Chen C, Wang P, Mulvey J, Yang Y, Wun C, Antman-Passig M, Luo H, Cho S, Long-Roche K, Ramanathan L, Jagota A, Zheng M, Wang Y, Heller D (2022) Detection of ovarian cancer via the spectral fingerprinting of quantum-defect-modified carbon nanotubes in serum by machine learning. Nat Biomed Eng 6(3), 267–275. https://doi.org/10.1038/s41551-022-00860-y
  23. Kim S, Shin S, Lee K, Kim H, Kim T, Kang D, Hur H, Min B, Kim N, Chung H, Roh J, Ahn J (2013) Prognostic value of mucinous histology depends on microsatellite instability status in patients with stage III colon cancer treated with adjuvant FOLFOX chemotherapy: a retrospective cohort study. Ann Surg Oncol 20(11), 3407–3413. https://doi.org/10.1245/s10434-013-3169-1
  24. Knijn N, Mogk S, Teerenstra S, Simmer F, Nagtegaal I (2016) Perineural Invasion is a Strong Prognostic Factor in Colorectal Cancer: A Systematic Review. Am J Surg Pathol 40(1), 103–112. https://doi.org/10.1097/pas.0000000000000518
  25. Koncina E, Haan S, Rauh S, Letellier E (2020) Prognostic and Predictive Molecular Biomarkers for Colorectal Cancer: Updates and Challenges. Cancers (Basel) 12(2). https://doi.org/10.3390/cancers12020319
  26. Liang H, Tsui B, Ni H, Valentim C, Baxter S, Liu G, Cai W, Kermany D, Sun X, Chen J, He L, Zhu J, Tian P, Shao H, Zheng L, Hou R, Hewett S, Li G, Liang P, Zang X, Zhang Z, Pan L, Cai H, Ling R, Li S, Cui Y, Tang S, Ye H, Huang X, He W, Liang W, Zhang Q, Jiang J, Yu W, Gao J, Ou W, Deng Y, Hou Q, Wang B, Yao C, Liang Y, Zhang S, Duan Y, Zhang R, Gibson S, Zhang C, Li O, Zhang E, Karin G, Nguyen N, Wu X, Wen C, Xu J, Xu W, Wang B, Wang W, Li J, Pizzato B, Bao C, Xiang D, He W, He S, Zhou Y, Haw W, Goldbaum M, Tremoulet A, Hsu C, Carter H, Zhu L, Zhang K, Xia H (2019) Evaluation and accurate diagnoses of pediatric diseases using artificial intelligence. Nat Med 25(3), 433–438. https://doi.org/10.1038/s41591-018-0335-9
  27. Liang R, Xie J, Zhang C, Zhang M, Huang H, Huo H, Cao X, Niu B (2019) Identifying Cancer Targets Based on Machine Learning Methods via Chou's 5-steps Rule and General Pseudo Components. Curr Top Med Chem 19(25), 2301–2317. https://doi.org/10.2174/1568026619666191016155543
  28. Liu C, Zhao J, Lu W, Dai Y, Hockings J, Zhou Y, Nussinov R, Eng C, Cheng F (2020) Individualized genetic network analysis reveals new therapeutic vulnerabilities in 6,700 cancer genomes. PLoS Comput Biol 16(2), e1007701. https://doi.org/10.1371/journal.pcbi.1007701
  29. Liu Z, Liu L, Weng S, Guo C, Dang Q, Xu H, Wang L, Lu T, Zhang Y, Sun Z, Han X (2022) Machine learning-based integration develops an immune-derived lncRNA signature for improving outcomes in colorectal cancer. Nat Commun 13(1), 816. https://doi.org/10.1038/s41467-022-28421-6
  30. Ma C, Teriaky A, Sheh S, Forbes N, Heitman S, Jue T, Munroe C, Jairath V, Corley D, Lee J (2019) Morbidity and Mortality After Surgery for Nonmalignant Colorectal Polyps: A 10-Year Nationwide Analysis. Am J Gastroenterol 114(11), 1802–1810. https://doi.org/10.14309/ajg.0000000000000407
  31. Metsky H, Welch N, Pillai P, Haradhvala N, Rumker L, Mantena S, Zhang Y, Yang D, Ackerman C, Weller J, Blainey P, Myhrvold C, Mitzenmacher M, Sabeti P (2022) Designing sensitive viral diagnostics with machine learning. Nat Biotechnol https://doi.org/10.1038/s41587-022-01213-5
  32. Mitchell TM (2003) Machine Learning. Machine Learning.
  33. Mo S, Dai W, Xiang W, Huang B, Li Y, Feng Y, Li Q, Cai G (2018) Survival Contradiction Between Stage IIA and Stage IIIA Rectal Cancer: A Retrospective Study. J Cancer 9(8), 1466–1475. https://doi.org/10.7150/jca.23311
  34. Muñoz J, Alvarez M, Cuquerella V, Miranda E, Picó C, Flores R, Resalt-Pereira M, Moya P, Pérez A, Arroyo A (2018) Procalcitonin and C-reactive protein as early markers of anastomotic leak after laparoscopic colorectal surgery within an enhanced recovery after surgery (ERAS) program. Surg Endosc 32(9), 4003–4010. https://doi.org/10.1007/s00464-018-6144-x
  35. Murdoch WJ, Singh C, Kumbier K, Abbasi-Asl R, Yu B (2019) Definitions, methods, and applications in interpretable machine learning. Proc Natl Acad Sci U S A 116(44), 22071–22080. https://doi.org/10.1073/pnas.1900654116
  36. Nagtegaal I, Quirke P, Schmoll H (2011) Has the new TNM classification for colorectal cancer improved care? Nat Rev Clin Oncol 9(2), 119–123. https://doi.org/10.1038/nrclinonc.2011.157
  37. Nikberg M, Chabok A, Letocha H, Kindler C, Glimelius B, Smedh K (2016) Lymphovascular and perineural invasion in stage II rectal cancer: a report from the Swedish colorectal cancer registry. Acta Oncol 55(12), 1418–1424. https://doi.org/10.1080/0284186x.2016.1230274
  38. Nitsche U, Zimmermann A, Späth C, Müller T, Maak M, Schuster T, Slotta-Huspenina J, Käser S, Michalski C, Janssen K, Friess H, Rosenberg R, Bader F (2013) Mucinous and signet-ring cell colorectal cancers differ from classical adenocarcinomas in tumor biology and prognosis. Ann Surg 258(5), 775–782; discussion 782–783. https://doi.org/10.1097/SLA.0b013e3182a69f7e
  39. Pattarabanjird T, McNamara C (2022) The clinicians’ perspectives on machine learning. Nat Cardiovasc Res 1(3), 189–190. https://doi.org/10.1038/s44161-022-00033-9
  40. Peery A, Cools K, Strassle P, McGill S, Crockett S, Barker A, Koruda M, Grimm I (2018) Increasing Rates of Surgery for Patients With Nonmalignant Colorectal Polyps in the United States. Gastroenterology 154(5), 1352–1360.e1353. https://doi.org/10.1053/j.gastro.2018.01.003
  41. Rekhraj S, Aziz O, Prabhudesai S, Zacharakis E, Mohr F, Athanasiou T, Darzi A, Ziprin P (2008) Can intra-operative intraperitoneal free cancer cell detection techniques identify patients at higher recurrence risk following curative colorectal cancer resection: a meta-analysis. Ann Surg Oncol 15(1), 60–68. https://doi.org/10.1245/s10434-007-9591-5
  42. Schlüter C, Duchrow M, Wohlenberg C, Becker M, Key G, Flad H, Gerdes J (1993) The cell proliferation-associated antigen of antibody Ki-67: a very large, ubiquitous nuclear protein with numerous repeated elements, representing a new kind of cell cycle-maintaining proteins. J Cell Biol 123(3), 513–522. https://doi.org/10.1083/jcb.123.3.513
  43. Sheng H, Wei X, Mao M, He J, Luo T, Lu S, Zhou L, Huang Z, Yang A (2019) Adenocarcinoma with mixed subtypes is a rare but aggressive histologic subtype in colorectal cancer. BMC Cancer 19(1), 1071. https://doi.org/10.1186/s12885-019-6245-5
  44. Song J, Yu M, Kang K, Lee J, Kim S, Nam T, Jeong J, Jang H, Lee J, Jung J (2019) Significance of perineural and lymphovascular invasion in locally advanced rectal cancer treated by preoperative chemoradiotherapy and radical surgery: Can perineural invasion be an indication of adjuvant chemotherapy? Radiother Oncol 133, 125–131. https://doi.org/10.1016/j.radonc.2019.01.002
  45. Starborg M, Gell K, Brundell E, Höög C (1996) The murine Ki-67 cell proliferation antigen accumulates in the nucleolar and heterochromatic regions of interphase cells and at the periphery of the mitotic chromosomes in a process essential for cell cycle progression. J Cell Sci 109(1), 143–153. https://doi.org/10.1242/jcs.109.1.143
  46. Tripepi G, Jager K, Dekker F, Zoccali C (2009) Diagnostic methods 2: receiver operating characteristic (ROC) curves. Kidney Int 76(3), 252–256. https://doi.org/10.1038/ki.2009.171
  47. Vermeer N, Snijders H, Holman F, Liefers G, Bastiaannet E, van de Velde C, Peeters K (2017) Colorectal cancer screening: Systematic review of screen-related morbidity and mortality. Cancer Treat Rev 54, 87–98. https://doi.org/10.1016/j.ctrv.2017.02.002
  48. Watson D, Krutzinna J, Bruce I, Griffiths C, McInnes I, Barnes M, Floridi L (2019) Clinical applications of machine learning algorithms: beyond the black box. BMJ 364, l886. https://doi.org/10.1136/bmj.l886
  49. Wu X, Lin H, Li S (2019) Prognoses of different pathological subtypes of colorectal cancer at different stages: A population-based retrospective cohort study. BMC Gastroenterol 19(1), 164. https://doi.org/10.1186/s12876-019-1083-0
  50. Xie C, Zhuang XX, Niu Z, Ai R, Lautrup S, Zheng S, Jiang Y, Han R, Gupta TS, Cao S, Lagartos-Donate MJ, Cai CZ, Xie LM, Caponio D, Wang WW, Schmauck-Medina T, Zhang J, Wang HL, Lou G, Xiao X, Zheng W, Palikaras K, Yang G, Caldwell KA, Caldwell GA, Shen HM, Nilsen H, Lu JH, Fang EF (2022) Amelioration of Alzheimer’s disease pathology by mitophagy inducers identified via machine learning and a cross-species workflow. Nat Biomed Eng 6(1), 76–93. https://doi.org/10.1038/s41551-021-00819-5
  51. Yala A, Mikhael P, Strand F, Lin G, Smith K, Wan Y, Lamb L, Hughes K, Lehman C, Barzilay R (2021) Toward robust mammography-based models for breast cancer risk. Sci Transl Med 13(578). https://doi.org/10.1126/scitranslmed.aba4373
  52. Yu M, Ma J, Fisher J, Kreisberg J, Raphael B, Ideker T (2018) Visible Machine Learning for Biomedicine. Cell 173(7), 1562–1565. https://doi.org/10.1016/j.cell.2018.05.056