Prognostic Prediction Models for Postoperative Patients with Stage I to III Colorectal Cancer: A Retrospective Study Based on Machine Learning Methods

DOI: https://doi.org/10.21203/rs.3.rs-1572496/v3

Abstract

Background

We aimed to utilize patient, tumor, and treatment features to develop and validate models predicting overall, disease-free, recurrence-free, and distant metastasis-free survival, to compare the performance of machine learning algorithms, and to screen important variables that could improve the prognosis of patients in clinical settings.

Methods

More than 1,000 colorectal cancer patients who underwent curative resection were grouped according to 4 survival times (further categorized with 3- and 5-year cutoffs) and divided into training sets and testing sets (9:1). Each 3-category survival time was predicted by 4 machine learning algorithms. The area under the receiver operating characteristic curve (AUC) and the average precision (AP) were our accuracy indicators. Vital parameters were screened by multivariate regression models. To achieve better prediction of the multicategorized survival times, we performed 10-fold cross-validation, except for the recurrence-free survival model (5-fold cross-validation). We iterated 1,000 times after hyperparameter optimization.

Results

The best AUCs were all greater than 0.90, except for the overall survival model (0.86). The best AP of the disease-free and distant metastasis-free survival models was 82.7%. The models performed well. Some of the important variables we screened were widely used predictors of colorectal cancer patients’ prognoses, while others were not. Regarding algorithm performance, Logistic Regression, Linear Discriminant Analysis, and Support Vector Machine were chosen for the recurrence-free and distant metastasis-free, overall, and disease-free survival models, respectively.

Conclusions

We constructed an independent, high-accuracy machine learning architecture with clarified important variables for predicting 3-categorized survival times. This architecture might be a vital reference when managing colorectal cancer patients.

Introduction

Colorectal cancer (CRC) is characterized by high heterogeneity, aggressiveness, and high morbidity and mortality rates [1]. The high mortality rate is due to the progression of the disease and inadequate treatment strategies [2]. Furthermore, overdiagnosis, overtreatment, false positives, false reassurance, uncertain findings, and complications are common and lead to unnecessary psychological burden on patients [3–6]. Therefore, accurate prediction of the prognosis of CRC patients is a vital reference when making clinical decisions. The American Joint Committee on Cancer (AJCC) classification system for CRC is the primary tool for predicting the prognosis of CRC and especially for making adjuvant chemoradiotherapy decisions [7, 8]. However, the survival observations associated with the AJCC classifications for CRC patients have been reported to exhibit certain inconsistencies [9–11]. Consequently, several systematic reviews (with or without meta-analyses) have been carried out to investigate and build prognostic models for CRC; some of these models include the TNM stage, while others do not [12–15]. However, the outcomes of these models have been unsatisfactory, likely due to methodological limitations.

Thus, we explored the possibility of using machine learning (ML) to build a prognostically predictive model for CRC. ML is a branch of artificial intelligence (AI) in which a computer generates rules underlying or based on raw data [16]; the use of ML in medicine has gradually become common [17–20]. ML can be used to directly compare the accuracy of two or more quantitative tests for the same disease/condition [21]. Several rules for diagnosis and treatment have recently been formulated [22–24], and ML algorithms have been used to construct risk forecast models that predict the hazard ratio of adverse events based on a certain time [25, 26] or predict the possible classification of double-classified/multiclassified endpoints based on a specific time [27]. However, these models cannot indicate the approximate time of the occurrence of specific oncological outcomes for CRC patients from the longitudinal angle. In other words, they were not feasible for predicting the multiclassified survival times of patients. In addition, the important variables of some of the models were unknown, which reduced their clinical credibility.

In the present study, we used previously designed ML models as a basis to develop a new ML architecture for predicting the 3-categorized occurrence time (with 3- and 5-year cutoffs) of 4 oncological outcomes (death, tumor recurrence/distant metastasis, tumor recurrence, and tumor distant metastasis) in patients with stage I, II, and III CRC who underwent curative resection. Our predictive angle is different from those of previous studies. This architecture could serve as a reference for clinicians when selecting treatment strategies for CRC patients. Furthermore, it might prompt communication between doctors and patients on the purpose of treatments as well as reduce the psychological burden on patients and their families. In addition, we screened the important variables affecting the outcomes and their order of importance; in other words, we showed the leading features that were initially unclarified, to improve the clinical credibility of the ML models. Finally, physicians who do not have the opportunity to apply ML to their work can use our research to make more accurate prognostic assessments and select management methods.

Materials And Methods

Case selection

This study was performed in line with the principles of the Declaration of Helsinki. Approval was granted by the Ethics Committee of the Affiliated Hospital of Qingdao University (reference number QYFYWZLL26957). The need for written informed consent was waived by the Ethics Committee of the Affiliated Hospital of Qingdao University due to the retrospective nature of the study. We retrospectively analyzed the data of patients who underwent curative operations for primary stage I, II, and III CRC at the Affiliated Hospital of Qingdao University from 2001 to 2020. Patients who received neoadjuvant chemoradiotherapy and patients with clarified noncancer-specific deaths were cautiously excluded; the postoperative adjuvant chemoradiotherapy history of our CRC patients was unclear. The data were acquired through the hospital information system. A detailed flowchart is shown in Fig. 1.

Potential variables included age, sex, body mass index (BMI), hypertension (HP), diabetes mellitus (DM), chronic heart disease (CHD), smoking history, drinking history, family history of tumors, family history of gastrointestinal tumors, serum carcinoembryonic antigen (CEA) level, serum C-reactive protein (CRP) level, tumor position (ascending colon vs. transverse colon vs. descending colon vs. sigmoid colon vs. rectum), tumor differentiation grade, histological type, tumor size (diameter, 20 mm was the cutoff), perineural invasion (PNI), lymphovascular invasion (LVI), lesion amount (unifocal vs. multifocal), Ki-67 protein level, operation method (laparotomy vs. laparoscopy), lymph node ratio (LNR), and tumor node metastasis (TNM) stage. Additionally, disease status (recurrence vs. distant metastasis) was added to the disease-free survival (DFS) model. These characteristics mainly involved patient demographics, health, tumor characteristics, and treatment. There were no missing data.

Regression model selection

Best subset selection regression

To identify the optimal subset, a regularization technique was used. Since it is not feasible to test all 2^p candidate models in this experiment, regularization was necessary. Our work considered fitting only one model for each λ value, which greatly improved efficiency. In the linear model, with regard to the bias–variance trade-off, when the relationship between the response variable and the predictor variables is close to linear, the least squares estimate is close to unbiased but may have high variance, so a small change in the data can lead to large changes in the least squares coefficient estimates. Regularization optimized this trade-off through proper λ selection and normalization, thereby improving the model fit. Regularizing the coefficients also addressed the overfitting caused by multicollinearity. Using regularization techniques [28], we kept all the features, regardless of whether there were few or many clinical features, but reduced the magnitude of the parameters attached to them. The specific modification changes the form of the loss function in LR to reduce the influence of a small number of clinical features.
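
As a rough illustration of why testing all 2^p subsets quickly becomes infeasible, the sketch below enumerates candidate subsets of a small, hypothetical feature list and scores each logistic fit by BIC. The file name, feature names, and outcome column are assumptions for illustration; the study's actual screening was performed in R.

```python
# A minimal sketch of exhaustive best-subset selection scored by BIC.
# The file name, feature names, and outcome column below are hypothetical;
# the study's actual screening was performed in R.
from itertools import combinations

import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("os_training_set.csv")                    # hypothetical file
candidates = ["tnm_stage", "ki67", "crp", "pni", "grade"]  # hypothetical names
y = df["event_within_3_years"]                             # hypothetical label

best_bic, best_subset = np.inf, None
for k in range(1, len(candidates) + 1):
    for subset in combinations(candidates, k):   # 2^p - 1 nonempty subsets
        X = sm.add_constant(df[list(subset)])
        fit = sm.Logit(y, X).fit(disp=0)
        if fit.bic < best_bic:                   # lower BIC = better trade-off
            best_bic, best_subset = fit.bic, subset

print(best_subset, best_bic)
```

With 5 candidate features this loop fits 31 models; with 23 features it would fit more than 8 million, which is why the regularized approaches below were used for screening.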

Ridge regression

In ridge regression, the normalization term is the sum of the squares of all coefficients, called the L2 norm; in this paper it was used to minimize RSS + λ∑βj². As λ increased, the coefficients decreased (tending toward but never reaching 0). The advantage of ridge regression is that it can improve prediction accuracy. Because coefficients of exactly 0 are not possible with ridge regression, the least absolute shrinkage and selection operator (LASSO) regression was also conducted.
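
To make the shrinkage behavior concrete, here is a small, self-contained sketch (with synthetic data standing in for the clinical features) showing that ridge coefficients shrink toward zero as λ grows but never become exactly zero.

```python
# A minimal sketch showing how ridge (L2) coefficients shrink toward zero as
# the penalty lambda grows but never become exactly zero; synthetic data stand
# in for the clinical features used in the study.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)

for lam in (0.01, 1.0, 100.0, 10000.0):
    coefs = Ridge(alpha=lam).fit(X, y).coef_
    # Coefficients get smaller as lambda increases, but none is exactly zero.
    print(f"lambda = {lam:g}: max |beta| = {np.abs(coefs).max():.3f}, "
          f"exact zeros = {int(np.sum(coefs == 0))}")
```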

LASSO regression

Different from ridge regression, LASSO regression uses the L1 norm (the sum of the absolute values of all feature weights) and minimizes RSS + λ∑|βj|. This shrinkage penalty can decrease some feature weights exactly to 0, which greatly improves the interpretability of the model and represents an important advantage over ridge regression and other regressions.
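
For contrast with the ridge sketch above, the following sketch (again with synthetic data and an illustrative penalty value) shows that a sufficiently strong L1 penalty drives some coefficients exactly to zero, so the retained features can be read off directly.

```python
# A minimal sketch contrasting LASSO (L1) with ridge: a sufficiently strong
# penalty drives some coefficients exactly to zero, so the retained features
# can be read off directly. The penalty value and synthetic data are illustrative.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

lasso = Lasso(alpha=5.0).fit(X, y)
print("retained feature indices:", np.flatnonzero(lasso.coef_))
print("coefficients set exactly to zero:", int(np.sum(lasso.coef_ == 0)))
```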

LASSO cross-validation model

In this experiment, 10-fold cross-validation was used for LASSO regression by default. In κ-fold cross-validation, the data were divided into κ subsets of equal size. κ−1 subsets were used to fit the model, and the remaining subset was treated as the test set. The results of the κ fits (usually the default average) were then combined to determine the final parameters. In this paper, each subset was used as a test set exactly once. κ-fold cross-validation was therefore straightforward to apply, and the results included the value of each fit and the mean squared error (MSE) of the response.
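
A minimal sketch of this step with scikit-learn's LassoCV, using synthetic data in place of the study's cohort: the per-fold MSE path and the selected λ are the quantities described above.

```python
# A minimal sketch of LASSO with 10-fold cross-validation (the study's default),
# using scikit-learn's LassoCV and synthetic data; the per-fold MSE path and the
# selected lambda are the quantities described in the text.
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

X, y = make_regression(n_samples=300, n_features=23, n_informative=5,
                       noise=10.0, random_state=0)

cv_lasso = LassoCV(cv=10, random_state=0).fit(X, y)
print("selected lambda:", cv_lasso.alpha_)
# mse_path_ has one column per fold; the mean over folds at each candidate
# lambda is the curve used to pick the final value.
print("mean CV MSE at the selected lambda:", cv_lasso.mse_path_.mean(axis=1).min())
```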

Outcome selection

The primary outcomes were the 3-categorized survival times (3- and 5-year were the cutoffs). The four survival times were overall survival (OS), DFS, recurrence-free survival (RFS), and distant metastasis-free survival (DMFS), defined as the time from the date of surgery to the date of the patient’s death, tumor recurrence/distant metastasis, tumor recurrence, and tumor distant metastasis, respectively. The secondary outcomes were the important variables screened from each ML model.
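
As an illustration of how the 3-category endpoints can be derived, the sketch below maps a survival time in months onto the three classes with cutoffs at 3 and 5 years; the column name and the handling of the exact boundaries are assumptions.

```python
# A minimal sketch of deriving the 3-category endpoint from a survival time,
# with cutoffs at 3 and 5 years; the column name and the handling of the exact
# boundaries are assumptions for illustration.
import pandas as pd

def categorize_survival(months: float) -> str:
    """Map a survival time in months to one of the three categories."""
    if months < 36:          # boundary handling is an assumption
        return "<3 years"
    if months <= 60:
        return "3-5 years"
    return ">5 years"

df = pd.DataFrame({"os_months": [14, 40, 75]})        # toy values
df["os_category"] = df["os_months"].apply(categorize_survival)
print(df)
```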

Machine learning model training and validation

Figure 1 shows the process of set division (average precision (AP) values were the best at this ratio). All the variables were screened through four multivariate regression methods. The most appropriate regression models were selected based on the MSE of the regression models (the difference between the predicted and true values; the smaller the MSE, the better the fit). The corresponding number of variables and the specific variables were then obtained. In addition, candidate models were screened based on Bayesian information criterion (BIC) scores. Subsequently, the selected predictors were input into three ML classifiers, Logistic Regression (LR), Linear Discriminant Analysis (LDA), and Support Vector Machine (SVM), and one ML estimator, K-Nearest Neighbors (KNN). Finally, OS, DFS, RFS, and DMFS were classified and predicted separately.
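
A minimal sketch of this training step, using synthetic data and default scikit-learn settings in place of the authors' pipeline: the four algorithms are fit on a 9:1 split and scored with a one-vs-rest macro AUC and the fraction of correct predictions.

```python
# A minimal sketch of the four algorithms compared in this study, trained on a
# 9:1 split of synthetic 3-class data; the data and default settings stand in
# for the authors' actual cohort and tuned configurations.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.1,
                                          stratify=y, random_state=0)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "LDA": LinearDiscriminantAnalysis(),
    "SVM": SVC(kernel="rbf", probability=True),
    "KNN": KNeighborsClassifier(),
}
for name, model in models.items():
    proba = model.fit(X_tr, y_tr).predict_proba(X_te)
    auc = roc_auc_score(y_te, proba, multi_class="ovr", average="macro")
    acc = (proba.argmax(axis=1) == y_te).mean()   # fraction predicted correctly
    print(f"{name}: AUC = {auc:.2f}, correct fraction = {acc:.1%}")
```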

Optimizing the classification models and configuring hyperparameters

The optimizer was configured for Bayesian optimization and stochastic optimization, with a maximum of 1,000 iterations. The hyperparameters were configured as follows: the learning rate was set to 0.01, the SVM was initialized as a Coarse Gaussian SVM, and the KNN was initialized as a Coarse KNN. The other hyperparameters kept their default optimized values. The OS, DFS, and DMFS datasets were configured with 10-fold cross-validation, while the RFS dataset was configured with 5-fold cross-validation.
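
The following sketch substitutes scikit-learn's RandomizedSearchCV for the authors' Bayesian/stochastic optimizer; the SVM search space, data, and iteration count are illustrative assumptions, while the cross-validation setup mirrors the 10-fold (or 5-fold for RFS) configuration described above.

```python
# A minimal sketch of hyperparameter tuning with cross-validation. The study
# used Bayesian/stochastic optimization with up to 1,000 iterations; here
# scikit-learn's RandomizedSearchCV stands in, and the SVM search space, data,
# and iteration count are illustrative assumptions.
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=800, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)

search = RandomizedSearchCV(
    SVC(kernel="rbf", probability=True),
    param_distributions={"C": loguniform(1e-2, 1e2),
                         "gamma": loguniform(1e-4, 1e0)},
    n_iter=100,              # the study allowed up to 1,000 iterations
    cv=10,                   # 10-fold CV; 5-fold was used for the RFS dataset
    scoring="roc_auc_ovr",
    random_state=0,
    n_jobs=-1,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```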

Proposed architecture

The technical architecture of this article combined the R language with ML models. The specific implementation passed the data processed in R to the ML models that had been built, and the architecture then produced the final predictions. In previous work, models were built only by processing data within the R environment, or prediction tasks were performed only within ML models. Our work combined the advantages of the R language with mature ML models and integrated them into the technical architecture of this article. Our data analysis used four regression methods (best subset regression, ridge regression, LASSO regression, and LASSO cross-validation regression), implemented in the R language (version 4.1.1), to perform variable screening on 23 features (24 features for DFS). Subsequently, four ML algorithms were developed in Python 3.8.
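
A minimal sketch of the R-to-Python handoff, assuming the R screening step exports its results as CSV files; the file names and column layout are hypothetical, not the study's actual outputs.

```python
# A minimal sketch of the R-to-Python handoff, assuming the R screening step
# exports its results as CSV files; the file names and column layout below are
# hypothetical, not the study's actual outputs.
import pandas as pd

data = pd.read_csv("processed_cohort.csv")          # written by the R step
screened = pd.read_csv("screened_variables.csv")    # one row per kept variable

X = data[screened["variable"].tolist()]             # keep screened features only
y = data["outcome_category"]                        # 3-category survival label
print(X.shape, y.value_counts().to_dict())
```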

Results

Study population characteristics

The clinical and therapeutic characteristics of the study population are detailed in Table 1. In addition, we performed χ2 tests/Fisher's exact tests to investigate the variable distributions between the training and testing sets of the four models (OS, DFS, RFS, and DMFS models) and found almost no statistically significant differences in baseline characteristics between the training and testing sets of each model. In the OS model, 66.3% (689/1039) of the patients died within 3 years after surgery. Within 3 years after surgery, 81.9% (716/874), 81.7% (138/169), and 82.7% (663/802) of the patients exhibited recurrence/distant metastasis, recurrence, and distant metastasis, respectively. After the 5-year follow-up, 18.5% (192/1039) of the patients were still alive; additionally, 3.9% (34/874), 4.7% (8/169), and 3.4% (27/802) of the patients were without tumor recurrence/distant metastasis, recurrence, and distant metastasis, respectively.
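
A minimal sketch of this baseline comparison, with toy data standing in for the cohort: each categorical variable's training/testing contingency table is tested with a chi-square test, falling back to Fisher's exact test for sparse 2×2 tables.

```python
# A minimal sketch of the training-versus-testing baseline comparison: each
# categorical variable is tested with a chi-square test on its contingency
# table, falling back to Fisher's exact test for sparse 2x2 tables. The data
# below are toy values, not the study cohort.
import pandas as pd
from scipy.stats import chi2_contingency, fisher_exact

def compare_sets(train, test):
    """Return a p-value comparing one variable's distribution across the sets."""
    values = list(train) + list(test)
    groups = ["train"] * len(train) + ["test"] * len(test)
    table = pd.crosstab(values, groups)
    if table.shape == (2, 2) and (table.values < 5).any():
        return fisher_exact(table.values)[1]      # exact test for sparse 2x2
    return chi2_contingency(table.values)[1]

p = compare_sets(["man"] * 60 + ["woman"] * 40,   # training set of one variable
                 ["man"] * 7 + ["woman"] * 3)     # testing set of the same variable
print(f"p = {p:.3f}")
```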

Table 1

Baseline features of included cohorts

| Parameters | OS (N = 1039), training set (%, n = 935) | OS, testing set (%, n = 104) | DFS (N = 874), training set (%, n = 787) | DFS, testing set (%, n = 87) | RFS (N = 169), training set (%, n = 152) | RFS, testing set (%, n = 17) | DMFS (N = 802), training set (%, n = 722) | DMFS, testing set (%, n = 80) |
|---|---|---|---|---|---|---|---|---|
| Age, > 60 | 57.6 | 50.0 | 57.9 | 56.3 | 61.2 | 58.5 | 56.4 | 68.8 |
| Sex, Man | 63.6 | 65.4 | 64.8 | 66.7 | 69.1 | 76.5 | 64.1 | 66.3 |
| BMI, < 18.5 | 3.6 | 2.9 | 3.9 | 1.1 | 3.3 | 0.0 | 3.5 | 6.3 |
| BMI, 18.5–23.9 | 45.8 | 50.0 | 43.6 | 41.4 | 44.1 | 47.1 | 42.7 | 46.3 |
| BMI, 24–27.9 | 36.0 | 33.7 | 39.3 | 42.5 | 40.1 | 23.5 | 41.6 | 31.3 |
| BMI, ≥ 28 | 14.5 | 13.5 | 13.2 | 14.9 | 12.5 | 29.4 | 12.3 | 16.3 |
| HP, Absence | 29.2 | 21.2 | 26.3 | 34.5 | 26.3 | 23.5 | 28.1 | 16.2 |
| DM, Absence | 12.5 | 13.7 | 12.8 | 11.5 | 12.5 | 35.3 | 13.3 | 8.8 |
| CHD, Absence | 11.2 | 4.8 | 8.9 | 11.5 | 7.9 | 0.0 | 9.8 | 7.5 |
| Smoking, Absence | 30.8 | 32.2 | 32.3 | 32.2 | 33.6 | 35.3 | 30.7 | 40.0 |
| Drinking, Absence | 26.0 | 28.7 | 29.0 | 33.3 | 32.9 | 23.5 | 28.7 | 35.0 |
| Family history of tumors, Absence | 15.4 | 14.1 | 14.7 | 16.1 | 16.4 | 23.5 | 14.5 | 10.0 |
| Family history of gastroenterology tumors, Absence | 12.5 | 9.6 | 10.2 | 9.2 | 11.2 | 23.5 | 9.1 | 8.8 |
| CEA, High | 55.4 | 48.1 | 58.6 | 56.9 | 52.0 | 47.1 | 56.8 | 66.3 |
| CRP, High | 4.4 | 3.8 | 4.3 | 4.6 | 3.3 | 0.0 | 4.3 | 5.0 |
| Tumor position, Ascending colon | 13.7 | 12.5 | 11.9 | 9.2 | 9.9 | 17.6 | 11.9 | 8.8 |
| Tumor position, Transverse colon | 4.8 | 1.9 | 2.9 | 8.0 | 6.6 | 11.8 | 3.3 | 2.5 |
| Tumor position, Descending colon | 2.0 | 6.7 | 1.5 | 6.9 | 1.3 | 5.9 | 1.7 | 5.0 |
| Tumor position, Sigmoid colon | 16.7 | 22.1 | 18.8 | 21.8 | 21.7 | 11.8 | 18.3 | 26.3 |
| Tumor position, Rectum | 62.8 | 56.7 | 64.8 | 54.0 | 60.5 | 52.9 | 64.8 | 57.5 |
| Tumor differentiation grade, High | 1.1 | 0.0 | 0.4 | 0.0 | 0.7 | 0.0 | 0.3 | 0.0 |
| Tumor differentiation grade, Moderate | 57.5 | 64.4 | 67.7 | 72.4 | 68.4 | 82.4 | 67.0 | 72.5 |
| Tumor differentiation grade, Low | 41.4 | 35.6 | 31.9 | 27.6 | 30.9 | 17.6 | 32.7 | 27.5 |
| Histological type, AC | 68.9 | 62.5 | 77.0 | 74.7 | 73.0 | 41.2 | 77.8 | 75.0 |
| Histological type, AMC | 8.4 | 14.4 | 7.1 | 9.2 | 11.8 | 5.9 | 6.4 | 12.5 |
| Histological type, MA | 20.2 | 20.2 | 13.7 | 16.1 | 14.5 | 52.9 | 13.6 | 12.5 |
| Histological type, SRCC | 1.6 | 2.9 | 1.1 | 0.0 | 0.7 | 0.0 | 1.1 | 0.0 |
| Histological type, The others | 0.9 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.1 | 0.0 |
| Tumor size, > 20 mm | 29.1 | 22.1 | 29.6 | 31.0 | 42.8 | 41.2 | 28.0 | 35.0 |
| PNI, Absence | 48.7 | 51.9 | 51.2 | 58.6 | 46.7 | 58.8 | 51.8 | 56.3 |
| LVI, Absence | 44.9 | 41.3 | 45.4 | 42.5 | 34.2 | 41.2 | 46.5 | 55.0 |
| Lesion amount, Unifocal | 95.7 | 95.2 | 94.9 | 93.1 | 93.4 | 94.1 | 95.4 | 93.8 |
| Ki-67 protein level, High | 46.6 | 42.3 | 52.9 | 52.9 | 55.3 | 58.8 | 53.0 | 48.8 |
| Operation method, Laparotomy | 71.4 | 69.2 | 67.9 | 66.7 | 67.8 | 58.8 | 68.4 | 63.7 |
| LNR, ≤ 0.25 | 85.5 | 78.8 | 83.9 | 80.5 | 84.2 | 88.2 | 83.4 | 85.0 |
| LNR, 0.26–0.5 | 13.5 | 19.2 | 14.4 | 18.4 | 15.1 | 11.8 | 14.7 | 15.0 |
| LNR, 0.51–0.75 | 0.9 | 1.9 | 1.7 | 1.1 | 0.7 | 0.0 | 1.8 | 0.0 |
| LNR, > 0.75 | 0.2 | 0.0 | 0.1 | 0.0 | 0.0 | 0.0 | 0.1 | 0.0 |
| TNM stage, I | 6.8 | 5.8 | 5.5 | 5.7 | 6.6 | 23.5 | 4.8 | 3.8 |
| TNM stage, II | 28.9 | 31.7 | 27.6 | 28.7 | 42.1 | 0.0 | 24.9 | 30.0 |
| TNM stage, III | 64.3 | 62.5 | 67.0 | 65.5 | 51.3 | 76.5 | 70.2 | 66.3 |
| Disease status, Recurrence | – | – | 18.7 | 19.5 | – | – | – | – |
| Disease status, Distant metastasis | – | – | 81.3 | 80.5 | – | – | – | – |

OS overall survival, DFS disease-free survival, RFS recurrence-free survival, DMFS distant metastasis-free survival, BMI body mass index, HP hypertension, DM diabetes mellitus, CHD chronic heart disease, CEA carcinoembryonic antigen, CRP C-reactive protein, AC adenocarcinoma, AMC adenocarcinoma with mucus composition, MA mucinous adenocarcinoma, SRCC signet ring cell carcinoma, PNI perineural invasion, LVI lymphovascular invasion, LNR lymph node ratio, TNM stage tumor node metastasis stage

Important model variables

Important variables varied by study outcome. The detailed MSEs are shown in Table 2. We chose subset regression for the OS model; the most important variable for predicting patients’ death was tumor differentiation grade, and tumor differentiation grade, Ki-67 protein level, histological type, TNM stage, and serum CRP level were the leading features for predicting multicategorized OS (5 in total, Additional file 1). LASSO regression was used for the DFS model; the 5 important indicators were PNI, tumor differentiation grade, Ki-67 protein level, lesion amount, and TNM stage, and the most important variable was PNI (Table 3 and Additional file 2). For RFS, we used ridge regression; DM was the leading feature, and the four important indicators were DM, CHD, operation method, and age (Table 4 and Additional file 3). Subset regression was chosen for the DMFS model; PNI, Ki-67 protein level, tumor differentiation grade, TNM stage, and histological type were the 5 vital features, and PNI was the most important variable (Additional file 4). The number of important variables and the most vital features of each model are shown in detail (the detailed orders of importance are shown in Tables 2 to 4 and in Additional files 1 to 4).
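
As an illustration of how an order of importance such as Tables 3 and 4 can be obtained, the sketch below ranks variables by the absolute values of their coefficients from a penalized fit; the model, penalty, and feature names are stand-ins rather than the study's actual R output.

```python
# A minimal sketch of how an order of importance like Tables 3 and 4 can be
# read off a fitted penalized model: rank variables by the absolute value of
# their coefficients. The model, penalty, and feature names are stand-ins for
# the study's actual R-based output.
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

feature_names = [f"feature_{i}" for i in range(24)]     # hypothetical names
X, y = make_regression(n_samples=400, n_features=24, n_informative=5,
                       noise=10.0, random_state=0)

model = Lasso(alpha=1.0).fit(X, y)
ranking = (pd.Series(np.abs(model.coef_), index=feature_names)
           .sort_values(ascending=False))
print(ranking.head(5))     # the leading variables, analogous to Tables 3 and 4
```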

Table 2

Comparisons of the MSEs of four regression models

| Regression models | MSE (OS) | MSE (DFS) | MSE (RFS) | MSE (DMFS) |
|---|---|---|---|---|
| Subset regression model | 0.4500633 | 0.3749525 | 0.7710959 | 0.4243934 |
| Ridge regression model | 0.4602181 | 0.3678196 | 0.7513997 | 0.4449402 |
| LASSO regression model | 0.5067479 | 0.3608976 | 0.8054069 | 0.4426869 |
| LASSO cross-validation model | 0.4890136 | 0.4112665 | 0.8090451 | 0.4822449 |

MSE mean square error, OS overall survival, DFS disease-free survival, RFS recurrence-free survival, DMFS distant metastasis-free survival, LASSO the least absolute shrinkage and selection operator

Table 3

Regression coefficients for each variable after LASSO regression on disease-free survival

| No. | Variable | Coefficient (s1) |
|---|---|---|
| 1 | PNI | 0.166095917 |
| 2 | Tumor differentiation grade | 0.145056847 |
| 3 | Ki-67 protein level | 0.125446209 |
| 4 | Lesion amount | 0.122986628 |
| 5 | TNM stage | 0.083527594 |
| 6 | Family history of tumors | 0.068736389 |
| 7 | Histological type | 0.068138058 |
| 8 | Family history of gastrointestinal tumors | 0.064282454 |
| 9 | Age | 0.052132288 |
| 10 | Sex | 0.049713378 |
| 11 | Serum CRP level | 0.049208782 |
| 12 | CHD | 0.038371955 |
| 13 | Operation method | 0.035456908 |
| 14 | BMI | 0.028557122 |
| 15 | LNR | 0.020353514 |
| 16 | Smoking | 0.018868793 |
| 17 | Tumor location | 0.015574714 |
| 18 | Serum CEA level | 0.012848902 |
| 19 | Drinking | 0.011499902 |
| 20 | Tumor size | 0.009350538 |
| 21 | LVI | 0.008701713 |
| 22 | HP | 0.006665916 |
| 23 | DM | 0.005961741 |
| 24 | Disease status | 0.002272804 |

LASSO the least absolute shrinkage and selection operator, PNI perineural invasion, TNM stage tumor node metastasis stage, CRP C-reactive protein, CHD chronic heart disease, BMI body mass index, LNR lymph node ratio, CEA carcinoembryonic antigen, LVI lymphovascular invasion, HP hypertension, DM diabetes mellitus

Table 4

Regression coefficients for each variable after ridge regression on recurrence-free survival

| No. | Variable | Coefficient (s1) |
|---|---|---|
| 1 | DM | 0.184923946 |
| 2 | CHD | 0.108369703 |
| 3 | Operation method | 0.103702878 |
| 4 | Age | 0.097713638 |
| 5 | Smoking | 0.097274712 |
| 6 | PNI | 0.080909316 |
| 7 | Histological type | 0.079653416 |
| 8 | Family history of tumors | 0.074394127 |
| 9 | Ki-67 protein level | 0.067737861 |
| 10 | Drinking | 0.067211567 |
| 11 | LNR | 0.065499595 |
| 12 | Family history of gastrointestinal tumors | 0.056467337 |
| 13 | Lesion amount | 0.047623412 |
| 14 | Serum CRP level | 0.042766487 |
| 15 | TNM stage | 0.037254881 |
| 16 | Serum CEA level | 0.033536644 |
| 17 | Tumor location | 0.02592845 |
| 18 | BMI | 0.020303689 |
| 19 | Sex | 0.0181275 |
| 20 | Tumor differentiation grade | 0.013301381 |
| 21 | Tumor size | 0.01124825 |
| 22 | LVI | 0.006359537 |
| 23 | HP | 0.005023189 |

DM diabetes mellitus, CHD chronic heart disease, PNI perineural invasion, LNR lymph node ratio, CRP C-reactive protein, TNM stage tumor node metastasis stage, CEA carcinoembryonic antigen, BMI body mass index, LVI lymphovascular invasion, HP hypertension

Model performance

The discriminative performance of the four models for the 3-classified study outcomes was expressed by the area under the receiver operating characteristic curve (AUC) and the AP. The receiver operating characteristic (ROC) curves of the OS model obtained by the four ML algorithms are shown in Fig. 2 (LR (AUC: 0.85; AP: 67.8%), LDA (AUC: 0.86; AP: 64.5%), KNN (AUC: 0.76; AP: 63.5%), and SVM (AUC: 0.81; AP: 65.7%)). Regarding DFS, Fig. 3 shows the four ROC curves in detail (LR (AUC: 0.95; AP: 82.3%), LDA (AUC: 0.95; AP: 82.1%), KNN (AUC: 0.91; AP: 80.7%), and SVM (AUC: 0.95; AP: 82.7%)). The ROC curves obtained for the RFS model are shown in Fig. 4 (LR (AUC: 0.96; AP: 79.6%), LDA (AUC: 0.90; AP: 74.3%), KNN (AUC: 0.88; AP: 80.9%), and SVM (AUC: 0.93; AP: 81.6%)). Figure 5 shows the ROC curves for DMFS (LR (AUC: 0.90; AP: 82.1%), LDA (AUC: 0.89; AP: 81.9%), KNN (AUC: 0.83; AP: 81.0%), and SVM (AUC: 0.88; AP: 82.7%)).

Algorithm performance

Specific to each algorithm and the characteristics of the different ML models (OS, DFS, RFS, and DMFS models), the algorithms showed varied predictive performance. We applied the 4 algorithms in each model and present their results in detail. LR achieved an AUC of 0.96 when predicting RFS; therefore, among the four ML models, LR had the best predictive performance for the RFS model. Additionally, LR performed well in all models. The best AUC of LDA was 0.95 when classifying the occurrence time of tumor recurrence/distant metastasis, so among our ML models the best prediction performance of LDA was for the DFS model. KNN attained its highest AUC of 0.91 when predicting DFS; therefore, among the four ML models, KNN also performed best for the DFS model. In addition, given the best AUC (0.95) of SVM when classifying DFS, SVM also predicted the DFS model best.

The final algorithms selected for the specific models were as follows: for the OS model, we chose LDA for prediction, whose optimized AUC was 0.86. Due to the large sample size, the AUCs of LDA, LR, and SVM were equal (0.95) for the DFS model, and we used SVM, which had a more stable performance, for prediction. For the RFS model, LR was chosen due to its AUC (0.96). LR was also selected for the DMFS model, with an AUC of 0.90.

Discussion

In this study, we developed and validated a promising ML architecture for predicting the 3-classified occurrence time (with 3- and 5-year cutoffs) of 4 oncological outcomes and screened the important variables for each ML model, which were categorized by different oncological outcomes. The four outcomes were patient’s death, tumor recurrence/distant metastasis, tumor recurrence, and tumor distant metastasis. This architecture represents a simple, practical, and easily accessible tool that clinicians can refer to when selecting treatments. In addition, our architecture has good tolerance for heterogeneous patients and does not require clear patient medical histories, which lowers the threshold for use. Our work is different from previous studies. We cut off the survival times, predicted them as multicategorized endpoints, and understood patients’ prognoses longitudinally through our results. Our ML models were designed based on specific oncological outcomes to predict the possible (multiclassified) occurrence time. In addition, we screened important indicators for each survival time, and some of them are not commonly taken as leading predictors of CRC patients’ prognoses, which provides new insights for pathologists, basic science researchers, and even pharmacologists. Moreover, the adaptability and interpretability of our architecture promote its application in hospitals at different levels. Our work also demonstrated the feasibility of applying ML models to a large number of heterogeneous CRC patients.

Although several genetic and molecular markers have been proven to be correlated with patient prognoses [25, 29], we did not select them as potential variables because we aimed to build a clinically generalizable architecture that might be used in data-poor situations; the indicators we selected were clinically applicable and had small heterogeneity among patients. In addition, to avoid selection bias and limitations, we avoided the TNM stage-centric impasse by inputting a large number of potential variables into our work, allowing the ML algorithms to screen the important variables that performed best. Furthermore, there were no missing values in our database, which prevented bias caused by improper filling of missing values. We controlled potential confounding factors as much as possible. We randomly grouped the patients into training sets and testing sets to avoid selection biases. Furthermore, we explored the differences in baseline characteristics between the two sets and found almost no significant differences. Therefore, we did not further explore the potential confounding factors. However, related investigations are valuable and need to be conducted in the future.

We sought to choose patients with a clear medical history before the operation who underwent curative initial treatment, so we excluded stage I, II, and III CRC patients who had received neoadjuvant chemoradiotherapy (which may lead to a vague history). Patients with stage IV CRC who were not eligible for radical surgery were also excluded. However, the history of postoperative adjuvant chemoradiotherapy in our patients was unclear. Consequently, the model might be a rough reference when physicians at higher-level hospitals are redesigning treatment strategies for patients from lower-level hospitals with unclear postoperative radiotherapy and chemotherapy histories. Treatment options for these patients are difficult to determine, and it is difficult for oncologists to obtain references from previous studies that stratify patients by chemotherapy regimens. Moreover, to avoid bias, we excluded patients who did not have endpoint data. When applying our architecture, patients predicted not to have outcomes would be classified as greater than 5 years (equivalent to oncological recurrence). To further avoid the bias caused by using time durations as endpoints, we cautiously excluded patients with long follow-up intervals and patients with clarified noncancer-specific deaths. In addition, the number of patients with tumor recurrence was the smallest; therefore, after strict 9:1 splitting, the sample size for testing in the RFS model was still small.

To better predict the 3-categorized survival times for the various oncological outcomes, we configured and optimized hyperparameters wherever possible. To better evaluate the fit of the models, we chose the classification metrics of the C statistic (to avoid the influence of the threshold value, we used the AUC) and the AP, which correspond to the nonparametric ML algorithms, to evaluate prediction accuracy. In the process of screening variables, due to the influence of the sample size, the number of variables (greater than 20), and other factors, we chose methods based on multiple regression models. Evaluation indicators in the model selection process included the error sum of squares (SSE), the MSE, and so on. However, the SSE was meaningless in this experiment (SSE inevitably increases as the sample size increases), so we chose the MSE as the evaluation indicator. Since this experiment involved regression prediction based on small-sample data, the models did not overfit. Consequently, considering development costs, we processed the default state of the system configuration (cases with no solution or only a local optimal solution were regarded as convergence failures). We think the lower AUC for OS is a reasonable result given the biological complexity and the small sample size. Furthermore, to reduce bias, the computer experts were blinded to the meaning of each indicator when building the ML models.

There were also performance differences between training and testing for each of the four classifiers. When the number of calculations was small, the advantages of LR were more obvious. When the sample size and the number of variables were large, even with regularization techniques, the performance of LR still degraded due to overfitting. Therefore, in this experiment, LR showed the best performance for predicting the RFS training set, while its OS prediction performance was relatively reduced. In the DFS prediction process in our experiments, because the DFS sample did not follow a Gaussian distribution, the predictive performance of LDA [30] was the best and exceeded that of the other three ML algorithms [31]. When the distance metric was configured as the Euclidean distance, the KNN training set performed best among the four ML algorithms for DFS prediction in our work. When SVM predicted the DFS training set in this article, the algorithm was very robust, and because the DFS dataset is a small-to-medium dataset, SVM performed better than the other algorithms when the kernel function was configured with the default value. The results obtained by the four ML models on the training set were approximately 5% higher than those on the test set, and the overall gap was within a reasonable range [16].

Taking survival time as a categorical variable and making more precise predictions to obtain an approximate time for the occurrence of oncological outcomes also indirectly reflects the potential application of our models in precision medicine. A more accurate prediction of possible prognoses would translate into more precise formulations of treatment therapies and patient management strategies. Extending the survival time is the shared goal of clinicians and oncology patients. Quantifying patient outcomes aids in shared decision making [32]. Because of the heterogeneity of CRCs, physicians and patients must seriously consider the tradeoffs between adverse effects and benefits [33] when choosing a treatment strategy. It is possible to improve outcomes by closer follow-up or the administration of additional chemoradiotherapy to patients who are predicted to have poorer prognoses. Consequently, we suggest that patients who tend to have a shorter DMFS receive prophylactic chemotherapy or regional radiotherapy for the common metastatic sites of CRC described by Jiang B et al. [34]. Moreover, the identification of patients with better prognoses could reduce the cost of medical care and improve the level of humanistic care by reducing the psychological burden on patients and their families. Therefore, predictive tools such as our architecture are urgently needed in the clinic.

However, referring to results output by models whose vital parameters are unclear is not always acceptable to clinicians and patients when managing patients [35, 36]. The interpretability of models is vital, especially in biomedicine [37, 38]. To turn models with unclear important parameters into models with clarified important parameters, we screened out the corresponding predictors and showed their order of importance for the different outcomes. TNM stage, the primary indicator for chemoradiotherapy decisions, was screened out in the OS, DFS, and DMFS models, which made our models more credible. Moreover, indicators that have been widely found to correlate with prognoses, such as PNI [39–41], pathological type [42–45], and tumor differentiation grade [46], were also selected in the models, which further confirmed the credibility of our architecture. One of the potential benefits of using ML models is that the important variables are identified and less critical parameters are ignored. Several predictors that have not been widely used as important predictors of CRC patients’ prognoses were additionally included in our models and provide new insights into predicting prognoses. The levels of CRP and Ki-67 were shown to be factors affecting prognoses, consistent with studies showing that high serum CRP levels are associated with higher postoperative complication rates [47, 48] and that Ki-67 levels reflect the proliferative capacity of cells [49], especially tumor cells [50]. Surprisingly, LVI was only modestly predictive. Our models also identified some predictors that were not previously considered to be directly linked to poor prognosis (unifocal vs. multifocal lesions and laparotomy vs. laparoscopy) [51–54]. These factors are more likely to be directly related to surgical trauma than to survival time. For the RFS model, two chronic diseases, DM and CHD, were selected together with age, possibly due to the insufficient sample size. In addition, we indirectly focused on elderly oncological patients with baseline diseases.

Our findings showed the formidable predictive power of ML methods, particularly for heterogeneous diseases stratified by outcomes. ML has unique value in clinical applications; it can guide patient management, improve patient outcomes, and tailor treatment regimens, especially when resources are scarce (when only clinicopathological and surgical variables are available for analysis). The Cox proportional-hazards model and the multivariate linear model used in ML are considered similar: when relying on the computer for calculation, the function parameters used are those of the multiple LR model, and the optimization parameter β applied in the Cox proportional-hazards model is consistent with multiple LR. However, compared with the Cox proportional-hazards model [55], ML still has advantages. When selecting vital parameters, the Cox proportional-hazards model commonly reports independent prognostic factors and compares their predictive value more indirectly, while ML finds important factors and compares their importance more reliably and directly. When building models and determining their performance, the Cox proportional-hazards model commonly takes a specific time point and builds a double-/multiclassified risk stratification on it, whereas our ML models predicted the 3-categorized occurrence time of oncological outcomes, in other words, patients’ survival times from the longitudinal angle. Moreover, the Cox proportional-hazards model performs best with fewer variables than ML, which can accommodate a very large number of variables [56, 57]. In addition, the ML and Cox proportional-hazards models differ in the meaning of the AUC. In this article, the AUC is an evaluation index for the ML models (like the AP), which differs from the AUC obtained from the traditional Cox proportional-hazards model. The AP = (TP + TN)/(TP + TN + FP + FN) in our article refers to the percentage of correct prediction results in the total sample and specifically refers to the ratio of correctly 3-classified survival times (OS, DFS, RFS, and DMFS) to the corresponding sample size of the datasets. In the ML models, we binarized the 3 categories, and the AUC solution was the same as the conventional solution; the final AUC value was obtained from the end of the ML model [58] as the average of the predictions for the 3 categories in each dataset [59]. However, although we obtained encouragingly high prediction accuracy and results with transparent important variables, more progress is needed before ML can be fully relied upon. In addition, in clinical practice, traditional performance measures such as the AUC must be translated into medically relevant measures to elucidate the patient-centric value of ML models. ML still has a long way to go.
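
To make the two indicators concrete, the sketch below computes the article's AP, i.e., the fraction of correct predictions, (TP + TN)/(TP + TN + FP + FN), and a one-vs-rest macro-averaged AUC over the three binarized categories; the labels and predicted probabilities are toy values, not study results.

```python
# A minimal sketch of the two indicators as described in the text: the
# article's AP, (TP + TN) / (TP + TN + FP + FN), i.e. the fraction of correct
# predictions, and a one-vs-rest AUC averaged over the 3 binarized categories.
# The labels and predicted probabilities are toy values.
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 1, 2, 1, 0, 2, 2, 1])               # 3-category labels
proba = np.array([[0.7, 0.2, 0.1], [0.2, 0.6, 0.2], [0.1, 0.2, 0.7],
                  [0.3, 0.5, 0.2], [0.6, 0.3, 0.1], [0.2, 0.3, 0.5],
                  [0.4, 0.2, 0.4], [0.5, 0.4, 0.1]])

ap = np.mean(proba.argmax(axis=1) == y_true)              # correct / total
auc = roc_auc_score(y_true, proba, multi_class="ovr", average="macro")
print(f"AP = {ap:.1%}, AUC = {auc:.2f}")
```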

Moreover, the limitations of our study must be noted. First, the sample size used to collect data for the ML models was not very large (especially for RFS). When the data were input into the ML models for parameter optimization, the sensitivity of parameter adjustment could not be estimated due to the small sample size. Second, as a dilemma of ML [37], the sample uniformity in the data could not be estimated, which could affect the final results. Furthermore, this study was conducted based on a retrospective analysis. However, the patient data were obtained from a well-conceived and well-characterized cohort, which adds to the credibility of our results; thus, this study can serve as the basis for subsequent prospective studies. In addition, postoperative treatment information, such as specific radiotherapy and chemotherapy treatments as well as detailed surgical methods, was not available in our database.

We believe that subsequent research could improve the accuracy of individual survival time prediction by employing other techniques, obtaining larger sample sizes, improving follow-up accuracy, and so on. Furthermore, it is necessary to add detailed studies on genomics and chemoradiotherapy regimens. Our next goal is to develop a predictive system as an app and install it in a hospital system.

Conclusions

We successfully designed and validated clinicopathological-based ML prediction models. Our work might promote the application of precision treatments and improve clinical outcomes for CRC patients. We showed the potential of ML in improving the direction of treatment strategies.

Declarations

Funding This work was supported by the National Natural Science Foundation of China (Grant number 81802777).

Competing Interests The authors have no relevant financial or non-financial interests to disclose.

Author Contributions All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Xiaolin Ji, Shuo Xu and Xiaoyu Li. The first draft of the manuscript was written by Xiaolin Ji and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Data Availability The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

Code Availability Open source Python 3.8 was used to process data and develop methods. The code will be shared by the respective authors upon request.

Ethics approval This study was performed in line with the principles of the Declaration of Helsinki. Approval was granted by the Ethics Committee of the Affiliated Hospital of Qingdao University (reference number QYFYWZLL26957).

Consent to participate The need for written informed consent was waived by the Ethics Committee of the Affiliated Hospital of Qingdao University due to the retrospective nature of the study.

Consent to publish Not Applicable.

Acknowledgment This work was supported by the National Natural Science Foundation of China (Grant number 81802777).

References

  1. Bray F, Ferlay J, Soerjomataram I, Siegel R, Torre L, Jemal A: Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2018, 68(6):394-424.
  2. Koncina E, Haan S, Rauh S, Letellier E: Prognostic and Predictive Molecular Biomarkers for Colorectal Cancer: Updates and Challenges. Cancers 2020, 12(2).
  3. Vermeer N, Snijders H, Holman F, Liefers G, Bastiaannet E, van de Velde C, Peeters K: Colorectal cancer screening: Systematic review of screen-related morbidity and mortality. Cancer Treat Rev 2017, 54:87-98.
  4. Kalager M, Wieszczy P, Lansdorp-Vogelaar I, Corley D, Bretthauer M, Kaminski M: Overdiagnosis in Colorectal Cancer Screening: Time to Acknowledge a Blind Spot. Gastroenterology 2018, 155(3):592-5.
  5. Ma C, Teriaky A, Sheh S, Forbes N, Heitman S, Jue T, Munroe C, Jairath V, Corley D, Lee J: Morbidity and Mortality After Surgery for Nonmalignant Colorectal Polyps: A 10-Year Nationwide Analysis. Am J Gastroenterol 2019, 114(11):1802-10.
  6. Peery A, Cools K, Strassle P, McGill S, Crockett S, Barker A, Koruda M, Grimm I: Increasing Rates of Surgery for Patients With Nonmalignant Colorectal Polyps in the United States. Gastroenterology 2018, 154(5):1352-60.e3.
  7. Nagtegaal I, Quirke P, Schmoll H: Has the new TNM classification for colorectal cancer improved care? Nat Rev Clin Oncol 2011, 9(2):119-23.
  8. Dienstmann R, Mason M, Sinicrope F, Phipps A, Tejpar S, Nesbakken A, Danielsen S, Sveen A, Buchanan D, Clendenning M, Rosty C, Bot B, Alberts S, Milburn Jessup J, Lothe R, Delorenzi M, Newcomb P, Sargent D, Guinney J: Prediction of overall survival in stage II and III colon cancer beyond TNM system: a retrospective, pooled biomarker study. Ann Oncol 2017, 28(5):1023-31.
  9. Mo S, Dai W, Xiang W, Huang B, Li Y, Feng Y, Li Q, Cai G: Survival Contradiction Between Stage IIA and Stage IIIA Rectal Cancer: A Retrospective Study. J Cancer 2018, 9(8):1466-75.
  10. Chu Q, Zhou M, Medeiros K, Peddi P, Kavanaugh M, Wu X: Poor survival in stage IIB/C (T4N0) compared to stage IIIA (T1-2 N1, T1N2a) colon cancer persists even after adjusting for adequate lymph nodes retrieved and receipt of adjuvant chemotherapy. BMC cancer 2016, 16:460.
  11. Greene F, Stewart A, Norton H: A new TNM staging strategy for node-positive (stage III) colon cancer: an analysis of 50,042 patients. Ann Surg 2002, 236(4):416-21, discussion 21.
  12. Beaton C, Twine CP, Williams GL, Radcliffe AG: Systematic review and meta-analysis of histopathological factors influencing the risk of lymph node metastasis in early colorectal cancer. Colorectal Dis 2013, 15(7):788-97.
  13. Rekhraj S, Aziz O, Prabhudesai S, Zacharakis E, Mohr F, Athanasiou T, Darzi A, Ziprin P: Can intra-operative intraperitoneal free cancer cell detection techniques identify patients at higher recurrence risk following curative colorectal cancer resection: a meta-analysis. Ann Surg Oncol 2008, 15(1):60-8.
  14. Choi JY, Jung SA, Shim KN, Cho WY, Keum B, Byeon JS, Huh KC, Jang BI, Chang DK, Jung HY, Kong KA: Meta-analysis of predictive clinicopathologic factors for lymph node metastasis in patients with early colorectal carcinoma. J Korean Med Sci 2015, 30(4):398-406.
  15. Ha GW, Kim JH, Lee MR: Oncologic Impact of Anastomotic Leakage Following Colorectal Cancer Surgery: A Systematic Review and Meta-Analysis. Ann Surg Oncol 2017, 24(11):3289-99.
  16. Mitchell TM: Machine Learning. Machine Learning 2003.
  17. Grimm L, Plichta J, Hwang E: More Than Incremental: Harnessing Machine Learning to Predict Breast Cancer Risk. J Clin Oncol 2022:JCO2102733.
  18. Metsky H, Welch N, Pillai P, Haradhvala N, Rumker L, Mantena S, Zhang Y, Yang D, Ackerman C, Weller J, Blainey P, Myhrvold C, Mitzenmacher M, Sabeti P: Designing sensitive viral diagnostics with machine learning. Nat Biotechnol 2022.
  19. Xie C, Zhuang X-X, Niu Z, Ai R, Lautrup S, Zheng S, Jiang Y, Han R, Gupta TS, Cao S, Lagartos-Donate MJ, Cai CZ, Xie LM, Caponio D, Wang WW, Schmauck-Medina T, Zhang J, Wang Hl, Lou G, Xiao X, Zheng W, Palikaras K, Yang G, Caldwell KA, Caldwell GA, Shen HM, Nilsen H, Lu JH, Fang EF: Amelioration of Alzheimer’s disease pathology by mitophagy inducers identified via machine learning and a cross-species workflow. Nat Biomed Eng 2022, 6(1):76-93.
  20. Kim M, Chen C, Wang P, Mulvey J, Yang Y, Wun C, Antman-Passig M, Luo H, Cho S, Long-Roche K, Ramanathan L, Jagota A, Zheng M, Wang Y, Heller D: Detection of ovarian cancer via the spectral fingerprinting of quantum-defect-modified carbon nanotubes in serum by machine learning. Nat Biomed Eng 2022, 6(3):267-75.
  21. Tripepi G, Jager K, Dekker F, Zoccali C: Diagnostic methods 2: receiver operating characteristic (ROC) curves. Kidney Int 2009, 76(3):252-6.
  22. Liang R, Xie J, Zhang C, Zhang M, Huang H, Huo H, Cao X, Niu B: Identifying Cancer Targets Based on Machine Learning Methods via Chou's 5-steps Rule and General Pseudo Components. Curr Top Med Chem 2019, 19(25):2301-17.
  23. Liu C, Zhao J, Lu W, Dai Y, Hockings J, Zhou Y, Nussinov R, Eng C, Cheng F: Individualized genetic network analysis reveals new therapeutic vulnerabilities in 6,700 cancer genomes. PLoS Comput Biol 2020, 16(2):e1007701.
  24. Liang H, Tsui B, Ni H, Valentim C, Baxter S, Liu G, Cai W, Kermany D, Sun X, Chen J, He L, Zhu J, Tian P, Shao H, Zheng L, Hou R, Hewett S, Li G, Liang P, Zang X, Zhang Z, Pan L, Cai H, Ling R, Li S, Cui Y, Tang S, Ye H, Huang X, He W, Liang W, Zhang Q, Jiang J, Yu W, Gao J, Ou W, Deng Y, Hou Q, Wang B, Yao C, Liang Y, Zhang S, Duan Y, Zhang R, Gibson S, Zhang C, Li O, Zhang E, Karin G, Nguyen N, Wu X, Wen C, Xu J, Xu W, Wang B, Wang W, Li J, Pizzato B, Bao C, Xiang D, He W, He S, Zhou Y, Haw W, Goldbaum M, Tremoulet A, Hsu C, Carter H, Zhu L, Zhang K, Xia H: Evaluation and accurate diagnoses of pediatric diseases using artificial intelligence. Nat Med 2019, 25(3):433-8.
  25. Liu Z, Liu L, Weng S, Guo C, Dang Q, Xu H, Wang L, Lu T, Zhang Y, Sun Z, Han X: Machine learning-based integration develops an immune-derived lncRNA signature for improving outcomes in colorectal cancer. Nat Commun 2022, 13(1):816.
  26. D'Ascenzo F, De Filippo O, Gallone G, Mittone G, Deriu M, Iannaccone M, Ariza-Solé A, Liebetrau C, Manzano-Fernández S, Quadri G, Kinnaird T, Campo G, Simao Henriques J, Hughes J, Dominguez-Rodriguez A, Aldinucci M, Morbiducci U, Patti G, Raposeiras-Roubin S, Abu-Assi E, De Ferrari G: Machine learning-based prediction of adverse events following an acute coronary syndrome (PRAISE): a modelling study of pooled datasets. Lancet 2021, 397(10270):199-207.
  27. Yala A, Mikhael P, Strand F, Lin G, Smith K, Wan Y, Lamb L, Hughes K, Lehman C, Barzilay R: Toward robust mammography-based models for breast cancer risk. Sci Transl Med 2021, 13(578).
  28. Wei T, Hon YC, Ling L: Method of fundamental solutions with regularization techniques for Cauchy problems of elliptic operators. ENG ANAL BOUND ELEM 2007, 31(4):373-85.
  29. Hossain M, Chowdhury U, Islam M, Uddin S, Ahmed M, Quinn J, Moni M: Machine learning and network-based models to identify genetic risk factors to the progression and survival of colorectal cancer. Comput Biol Med 2021, 135:104539.
  30. Hoffman M, Blei DM, Bach FR: Online Learning for Latent Dirichlet Allocation. International Conference on Neural Information Processing Systems 2010.
  31. Blei DM, Ng A, Jordan MI: Latent dirichlet allocation. J MACH LEARN RES 2003.
  32. Howard F, Kochanny S, Koshy M, Spiotto M, Pearson A: Machine Learning-Guided Adjuvant Treatment of Head and Neck Cancer. JAMA Netw Open 2020, 3(11):e2025881.
  33. Ganguli R, Franklin J, Yu X, Lin A, Heffernan D: Machine learning methods to predict presence of residual cancer following hysterectomy. Sci Rep 2022, 12(1):2738.
  34. Jiang B, Mu Q, Qiu F, Li X, Xu W, Yu J, Fu W, Cao Y, Wang J: Machine learning of genomic features in organotropic metastases stratifies progression risk of primary tumors. Nat Commum 2021, 12(1):6692.
  35. Pattarabanjird T, McNamara C: The clinicians’ perspectives on machine learning. Nat Cardiovasc Res 2022, 1(3):189-90.
  36. Watson D, Krutzinna J, Bruce I, Griffiths C, McInnes I, Barnes M, Floridi L: Clinical applications of machine learning algorithms: beyond the black box. BMJ 2019, 364:l886.
  37. Yu M, Ma J, Fisher J, Kreisberg J, Raphael B, Ideker T: Visible Machine Learning for Biomedicine. Cell 2018, 173(7):1562-5.
  38. Murdoch WJ, Singh C, Kumbier K, Abbasi-Asl R, Yu B: Definitions, methods, and applications in interpretable machine learning. Proc Natl Acad Sci U S A 2019, 116(44):22071-80.
  39. Nikberg M, Chabok A, Letocha H, Kindler C, Glimelius B, Smedh K: Lymphovascular and perineural invasion in stage II rectal cancer: a report from the Swedish colorectal cancer registry. Acta Oncol 2016, 55(12):1418-24.
  40. Song J, Yu M, Kang K, Lee J, Kim S, Nam T, Jeong J, Jang H, Lee J, Jung J: Significance of perineural and lymphovascular invasion in locally advanced rectal cancer treated by preoperative chemoradiotherapy and radical surgery: Can perineural invasion be an indication of adjuvant chemotherapy? Radiother Oncol 2019, 133:125-31.
  41. Knijn N, Mogk S, Teerenstra S, Simmer F, Nagtegaal I: Perineural Invasion is a Strong Prognostic Factor in Colorectal Cancer: A Systematic Review. Am J Surg Pathol 2016, 40(1):103-12.
  42. Wu X, Lin H, Li S: Prognoses of different pathological subtypes of colorectal cancer at different stages: A population-based retrospective cohort study. BMC gastroenterol 2019, 19(1):164.
  43. Sheng H, Wei X, Mao M, He J, Luo T, Lu S, Zhou L, Huang Z, Yang A: Adenocarcinoma with mixed subtypes is a rare but aggressive histologic subtype in colorectal cancer. BMC cancer 2019, 19(1):1071.
  44. Kim S, Shin S, Lee K, Kim H, Kim T, Kang D, Hur H, Min B, Kim N, Chung H, Roh J, Ahn J: Prognostic value of mucinous histology depends on microsatellite instability status in patients with stage III colon cancer treated with adjuvant FOLFOX chemotherapy: a retrospective cohort study. Ann Surg Oncol 2013, 20(11):3407-13.
  45. Nitsche U, Zimmermann A, Späth C, Müller T, Maak M, Schuster T, Slotta-Huspenina J, Käser S, Michalski C, Janssen K, Friess H, Rosenberg R, Bader F: Mucinous and signet-ring cell colorectal cancers differ from classical adenocarcinomas in tumor biology and prognosis. Ann Surg 2013, 258(5):775-82, discussion 82-3.
  46. Garrity M, Burgart L, Mahoney M, Windschitl H, Salim M, Wiesenfeld M, Krook J, Michalak J, Goldberg R, O'Connell M, Furth A, Sargent D, Murphy L, Hill E, Riehle D, Meyers C, Witzig T: Prognostic value of proliferation, apoptosis, defective DNA mismatch repair, and p53 overexpression in patients with resected Dukes' B2 or C colon cancer: a North Central Cancer Treatment Group Study. J Clin Oncol 2004, 22(9):1572-82.
  47. Domínguez-Comesaña E, Estevez-Fernández S, López-Gómez V, Ballinas-Miranda J, Domínguez-Fernández R: Procalcitonin and C-reactive protein as early markers of postoperative intra-abdominal infection in patients operated on colorectal cancer. Int J Colorectal Dis 2017, 32(12):1771-4.
  48. Muñoz J, Alvarez M, Cuquerella V, Miranda E, Picó C, Flores R, Resalt-Pereira M, Moya P, Pérez A, Arroyo A: Procalcitonin and C-reactive protein as early markers of anastomotic leak after laparoscopic colorectal surgery within an enhanced recovery after surgery (ERAS) program. Surg Endosc 2018, 32(9):4003-10.
  49. Schlüter C, Duchrow M, Wohlenberg C, Becker M, Key G, Flad H, Gerdes J: The cell proliferation-associated antigen of antibody Ki-67: a very large, ubiquitous nuclear protein with numerous repeated elements, representing a new kind of cell cycle-maintaining proteins. J Cell Biol 1993, 123(3):513-22.
  50. Starborg M, Gell K, Brundell E, Höög C: The murine Ki-67 cell proliferation antigen accumulates in the nucleolar and heterochromatic regions of interphase cells and at the periphery of the mitotic chromosomes in a process essential for cell cycle progression. J Cell Sci 1996, 143-53.
  51. Barz C, Stöss C, Neumann P, Wilhelm D, Janssen K, Friess H, Nitsche U: Retrospective study of prognosis of patients with multiple colorectal carcinomas: synchronous versus metachronous makes the difference. Int J colorectal dis 2021, 36(7):1487-98.
  52. Chin C, Kuo Y, Chiang J: Synchronous colorectal carcinoma: predisposing factors and characteristics. Colorectal Dis 2019, 21(4):432-40.
  53. Fleshman J, Branda M, Sargent D, Boller A, George V, Abbas M, Peters W, Maun D, Chang G, Herline A, Fichera A, Mutch M, Wexner S, Whiteford M, Marks J, Birnbaum E, Margolin D, Larson D, Marcello P, Posner M, Read T, Monson J, Wren S, Pisters P, Nelson H: Disease-free Survival and Local Recurrence for Laparoscopic Resection Compared With Open Resection of Stage II to III Rectal Cancer: Follow-up Results of the ACOSOG Z6051 Randomized Controlled Trial. Ann Surg 2019, 269(4):589-95.
  54. Hida K, Okamura R, Sakai Y, Konishi T, Akagi T, Yamaguchi T, Akiyoshi T, Fukuda M, Yamamoto S, Yamamoto M, Nishigori T, Kawada K, Hasegawa S, Morita S, Watanabe M: Open versus Laparoscopic Surgery for Advanced Low Rectal Cancer: A Large, Multicenter, Propensity Score Matched Cohort Study in Japan. Ann Surg 2018, 268(2):318-24.
  55. Poortmans P, Collette S, Kirkove C, Van Limbergen E, Budach V, Struikmans H, Collette L, Fourquet A, Maingon P, Valli M, De Winter K, Marnitz S, Barillot I, Scandolaro L, Vonk E, Rodenhuis C, Marsiglia H, Weidner N, van Tienhoven G, Glanzmann C., Kuten A, Arriagada R, Bartelink H, Van den Bogaert W, EORTC Radiation Oncology and Breast Cancer Groups: Internal Mammary and Medial Supraclavicular Irradiation in Breast Cancer. N Engl J Med 2015, 373(4):317-27.
  56. Lee S, Han S, Shim JH, Kim SY, Won HJ, Shin YM, Kim PN, An J, Lee D, Kim KM, Lim YS, Chung YH, Lee YS, Lee HC: A Patient-Based Nomogram for Predicting Overall Survival after Radiofrequency Ablation for Hepatocellular Carcinoma. J Vasc Interv Radiol 2015, 26(12):1787-94.e1.
  57. Gunter MJ, Murphy N, Cross AJ, Dossus L, Dartois L, Fagherazzi G, Kaaks R, Kühn T, Boeing H, Aleksandrova K, Tjønneland A, Olsen A, Overvad K, Larsen SC, Redondo Cornejo ML, Agudo A, Sánchez Pérez MJ, Altzibar JM, Navarro C, Ardanaz E, Khaw KT, Butterworth A, Bradbury KE, Trichopoulou A, Lagiou P, Trichopoulos D, Palli D, Grioni S, Vineis P, Panico S, Tumino R, Bueno-de-Mesquita B, Siersema P, Leenders M, Beulens JWJ, Uiterwaal CU, Wallström P, Nilsson LM, Landberg R, Weiderpass E, Skeie G, Braaten T, Brennan P, Licaj I, Muller DC, Sinha R, Wareham N, Riboli E: Coffee Drinking and Mortality in 10 European Countries: A Multinational Cohort Study. Ann Intern Med 2017, 167(4):236-47.
  58. Rasmussen CE, Williams C: Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning). The MIT Press 2005.
  59. Goldberg DE: Genetic Algorithms in Search, Optimization, and Machine Learning. Queen's University Belfast 2010.