DOI: https://doi.org/10.21203/rs.3.rs-1572496/v1
Purpose To compare the performance of machine learning algorithms built on patient, tumor, and treatment features; to develop and validate models that predict overall, disease-free, recurrence-free, and distant metastasis-free survival; and to screen important variables in order to improve the prognosis of colorectal cancer patients in clinical settings.
Methods More than 1000 colorectal cancer patients who underwent curative resection were grouped according to 4 endpoints and divided into training and testing sets (9:1). We applied 4 machine learning algorithms to predict 1-, 3-, and 5-year survival. The area under the receiver operating characteristic curve (AUC) and average precision (AP) served as our accuracy indicators. Important parameters were screened by multivariate regression models. To better predict longitudinal oncological outcomes, we performed 10-fold cross-validation for all models except the recurrence-free survival model (3-fold cross-validation). After hyperparameter optimization, we ran 3000 iterations and assessed the models on the internal testing sets.
Results The best AP values were all greater than 80% except for the overall survival model (69.5%), and the best AUCs were all greater than 0.70 except for the recurrence-free survival model (0.61); overall, the models performed well. Variables widely known to correlate with prognosis, such as TNM stage, were selected as important features; however, indirectly related indicators, such as the Ki-67 level, were also selected.
Conclusion We constructed an independent, high-accuracy "white-box" machine learning system for predicting survival times. This system may help clinicians determine management strategies for colorectal cancer patients and has future utility in personalized medicine and monitoring.
Colorectal cancer (CRC) is characterized by high heterogeneity, aggressiveness, and high morbidity and mortality rates (Bray et al. 2020). The high mortality rate is due to disease progression and inadequate treatment strategies (Koncina et al. 2020). Furthermore, overdiagnosis, overtreatment, false positives, false reassurance, uncertain findings, and complications are common and impose an unnecessary psychological burden on patients (Kalager et al. 2018; Ma et al. 2019; Peery et al. 2018; Vermeer et al. 2017). Accurate prediction of the prognosis of CRC patients is therefore vital when making clinical decisions. The American Joint Committee on Cancer (AJCC) classification system for CRC is the primary tool for predicting prognosis and, especially, for making adjuvant chemoradiotherapy decisions (Dienstmann et al. 2017; Nagtegaal et al. 2011). However, the survival observations associated with the AJCC classifications for CRC patients have been reported to exhibit certain inconsistencies (Chu et al. 2016; Greene et al. 2002; Mo et al. 2018). Consequently, several systematic reviews (with or without meta-analyses) have been carried out to investigate and build prognostic models for CRC; some of these models include the TNM stage, others do not (Beaton et al. 2013; Choi et al. 2015; Ha et al. 2017; Rekhraj et al. 2008). The outcomes of these models have nevertheless been unsatisfactory, likely owing to methodological limitations. Thus, we explored the possibility of using machine learning (ML) to build a prognostic model for CRC. ML is a branch of artificial intelligence (AI) in which a computer derives rules from raw data (Mitchell, 2003), and its use in medicine has gradually become common (Grimm et al. 2022; Kim et al. 2022; Metsky et al. 2022; Xie et al. 2022). ML can be used to directly compare the accuracy of two or more quantitative tests for the same disease or condition (Tripepi et al. 2009).
Several rules for diagnosis and treatment have recently been formulated (H. Liang et al. 2019; R. Liang et al. 2019; Liu et al. 2020), and ML algorithms have been used to construct risk forecast models that predict the hazard ratio of adverse events either at a certain point in time (D'Ascenzo et al. 2021; Liu et al. 2022) or independent of a specific time point (Yala et al. 2021); such models have been used to screen patients in high-risk groups. However, due to their limitations, these models cannot longitudinally predict when an event will occur; in other words, they cannot indicate the specific time of event occurrence. In addition, they are "black boxes", which reduces their clinical credibility.
In the present study, we used previously designed ML models as a basis to develop a new ML system for predicting the specific time of occurrence of oncological outcomes (death, tumor recurrence, tumor distant metastasis) in patients with stage I, II, and III CRC who underwent curative resection. This system could serve as a reference for clinicians when selecting treatment strategies for CRC patients. Furthermore, it might facilitate communication between doctors and patients about the goals of treatment and reduce the psychological burden on patients and their families. In addition, we screened the important variables that affect each outcome to improve the clinical credibility of the ML models, thereby eliminating their "black-box" character. Physicians who do not have the opportunity to apply ML to their work can also use our research to make more accurate prognostic assessments and select management methods.
Case selection
This study was performed in line with the principles of the Declaration of Helsinki. Approval was granted by the Ethics Committee of the Affiliated Hospital of Qingdao University (reference number QYFYWZLL26957). The need for informed consent was waived given that this was a retrospective study. We retrospectively analyzed the data of patients who underwent curative operations for primary stage I, II, and III CRC at the Affiliated Hospital of Qingdao University from 2001 to 2020; the data were acquired through the hospital information system. A detailed flowchart is shown in Fig. 1.
Potential variables included age, sex, body mass index (BMI), hypertension (HP), diabetes mellitus (DM), chronic heart disease (CHD), smoking history, drinking history, family history of tumors, family history of gastrointestinal tumors, serum carcinoembryonic antigen (CEA) level, serum C-reactive protein (CRP) level, tumor position (ascending colon vs. transverse colon vs. descending colon vs. sigmoid colon vs. rectum), tumor differentiation grade, histological type, tumor size (diameter, with 20 mm as the cutoff), perineural invasion (PNI), lymphovascular invasion (LVI), lesion amount (unifocal vs. multifocal), Ki-67 protein level, operation method (laparotomy vs. laparoscopy), lymph node ratio (LNR), and tumor node metastasis (TNM) stage. Additionally, disease status (recurrence vs. distant metastasis) was added to the disease-free survival (DFS) model. These characteristics mainly involved patient demographics, health, tumor characteristics, and treatment. There were no missing data.
Best subset selection regression
To identify the optimal subset of predictors, a regularization technique was used. Exhaustively fitting all 2^p candidate models was not feasible in this experiment, which made regularization necessary: our approach fit only one model for each λ value, greatly improving efficiency. Regarding the bias-variance trade-off in the linear model, when the relationship between the response and the predictors is close to linear, the least squares estimates are close to unbiased but may have high variance, so a small change in the data can produce large changes in the estimated coefficients. Regularization optimizes this trade-off through proper λ selection and normalization, thereby improving model fit. Regularizing the coefficients also mitigates the overfitting caused by multicollinearity.
Ridge regression
In ridge regression, the normalization term is the sum of the squares of all coefficients, called the L2 norm; the quantity minimized in this paper was RSS + λΣβj². As λ increases, the coefficients shrink toward 0 but never reach it. The advantage of ridge regression is that it can improve prediction accuracy. Because ridge regression cannot set coefficients exactly to 0, we also conducted least absolute shrinkage and selection operator (LASSO) regression.
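As an illustration of this shrinkage behavior (a NumPy sketch on synthetic data under our own naming, not the R implementation used in the study), the closed-form ridge estimate minimizing RSS + λΣβj² can be written as:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge estimate minimizing RSS + lam * sum(beta_j^2).
    Closed form: beta = (X'X + lam*I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = X @ np.array([2.0, -1.0, 0.5, 0.0]) + rng.normal(size=100)

# As lambda grows, the coefficients shrink toward (but never exactly reach) 0.
for lam in (0.0, 10.0, 1000.0):
    beta = ridge_fit(X, y, lam)
    print(f"lambda={lam:7.1f}  sum(beta^2)={np.sum(beta**2):.4f}")
```

The printed squared norm of the coefficient vector decreases monotonically with λ, while no coefficient is driven exactly to zero, which is the contrast with LASSO drawn below.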
LASSO regression
Unlike ridge regression, LASSO regression uses the L1 norm (the sum of the absolute values of all feature weights) and minimizes RSS + λΣ|βj|. This shrinkage penalty can drive feature weights exactly to 0, which greatly improves the interpretability of the model and represents an important advantage over ridge regression and other regressions.
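For intuition, in the special case of an orthonormal design matrix the LASSO solution reduces to soft-thresholding the ordinary least squares coefficients. The sketch below (our own Python illustration, not the paper's R code) shows how small weights are set exactly to 0:

```python
import numpy as np

def soft_threshold(beta_ols, lam):
    """LASSO solution for an orthonormal design matrix: shrink each OLS
    coefficient toward 0 by lam, setting it exactly to 0 once |beta| <= lam."""
    return np.sign(beta_ols) * np.maximum(np.abs(beta_ols) - lam, 0.0)

beta_ols = np.array([2.5, -1.2, 0.4, -0.1])
print(soft_threshold(beta_ols, 0.5))  # the two small coefficients become exactly 0
```

This exact-zeroing is what makes LASSO useful for variable screening: features whose weights are thresholded to 0 drop out of the model entirely.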
LASSO cross-validation model
In this experiment, 10-fold cross-validation was used for LASSO regression by default. In κ-fold cross-validation, the data are divided into κ equal-sized subsets; κ-1 subsets are used to fit the model, and the remaining subset is treated as the test set. The results of the κ fits (usually averaged by default) are then combined to determine the final parameters. In this paper, each subset was used as a test set exactly once. κ-fold cross-validation was thus very easy to use, and the results included the value of each fit and the mean squared error (MSE) of the response.
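The procedure can be sketched in Python (an illustrative NumPy version with our own naming, not the R code used in the study), here wrapped around a plain least squares fit on synthetic data:

```python
import numpy as np

def kfold_mse(X, y, k=10, seed=0):
    """kappa-fold cross-validation for least squares: each of the k subsets
    serves as the test set exactly once; returns the per-fold and mean MSE."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    mses = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        mses.append(float(np.mean((X[test] @ beta - y[test]) ** 2)))
    return mses, float(np.mean(mses))

# Synthetic data for illustration only.
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=120)
fold_mses, mean_mse = kfold_mse(X, y, k=10)
print(mean_mse)
```

Averaging the per-fold MSEs gives the cross-validated error used to pick the final parameters, as described above.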
Outcome selection
The primary outcomes were four oncological endpoints: overall survival (OS), DFS, recurrence-free survival (RFS), and distant metastasis-free survival (DMFS). These were defined as the time from the date of surgery to the date of death, recurrence or distant metastasis, recurrence, and distant metastasis, respectively. The secondary outcomes were the important variables screened from each ML model.
Machine learning model training and validation
Figure 1 shows the process of set division (average precision (AP) values were best at this ratio). All variables were screened through four multivariate regression methods. The most appropriate regression models were selected based on their MSE (the mean squared difference between the predicted and true values; the smaller the MSE, the better the fit), which determined the number of variables retained and the specific variables. In addition, models with the best Bayesian information criterion (BIC) scores were selected. Subsequently, the selected predictors were input into four machine learning classifiers: decision tree (coarse), support vector machine (SVM), K-nearest neighbors (KNN), and ensemble (optimal subset). Finally, OS, DFS, RFS, and DMFS were separately classified and predicted.
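As a minimal illustration of one of these classifiers (our own Python sketch on toy data; the study itself used MATLAB's implementations), K-nearest-neighbors prediction can be written as:

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=5):
    """K-nearest-neighbors classification: label each test point by majority
    vote among its k closest training points (Euclidean distance)."""
    preds = []
    for x in X_test:
        dist = np.linalg.norm(X_train - x, axis=1)
        nearest_labels = y_train[np.argsort(dist)[:k]]
        preds.append(int(np.bincount(nearest_labels).argmax()))
    return np.array(preds)

# Toy data: two well-separated clusters standing in for outcome classes.
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0, 0.5, (30, 2)), rng.normal(4, 0.5, (30, 2))])
y_train = np.array([0] * 30 + [1] * 30)
X_test = np.array([[0.1, -0.2], [4.2, 3.9]])
print(knn_predict(X_train, y_train, X_test, k=5))  # one point per class
```

In the actual pipeline, the feature vectors would be the regression-screened predictors and the labels the binned survival-time classes.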
Optimizing the classification models and configuring hyperparameters
The optimizer was configured for Bayesian optimization and stochastic optimization, and the maximum number of iterations was 3000. The hyperparameters were configured as follows: the learning rate was 0.01, the initial decision tree configuration was Coarse Tree, the initial SVM configuration was Coarse Gaussian SVM, the initial KNN configuration was Coarse KNN, and the initial ensemble configuration was Subspace Discriminant. The remaining hyperparameters kept their default optimized values.
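The stochastic half of such a search can be sketched as a simple random search over a configuration space (a generic Python illustration with a toy objective; the study used MATLAB's optimizer, and all names below are ours):

```python
import numpy as np

def random_search(objective, space, n_iter=3000, seed=0):
    """Stochastic hyperparameter optimization: sample random configurations
    and keep the one with the lowest objective (e.g., validation error)."""
    rng = np.random.default_rng(seed)
    best_cfg, best_val = None, np.inf
    for _ in range(n_iter):
        cfg = {name: choices[rng.integers(len(choices))]
               for name, choices in space.items()}
        val = objective(cfg)
        if val < best_val:
            best_cfg, best_val = cfg, val
    return best_cfg, best_val

# Toy objective: pretend validation error is minimized at k=7, lr=0.01.
space = {"k": [1, 3, 5, 7, 9], "lr": [0.001, 0.01, 0.1]}
obj = lambda cfg: (cfg["k"] - 7) ** 2 + abs(cfg["lr"] - 0.01)
best_cfg, best_val = random_search(obj, space, n_iter=3000)
print(best_cfg, best_val)
```

Bayesian optimization differs in that it models the objective and proposes configurations adaptively rather than sampling uniformly, but the loop structure (propose, evaluate, keep the best) is the same.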
The data analysis used four regression methods (best subset regression, ridge regression, LASSO regression, and LASSO cross-validation regression), implemented in the R language (version 4.1.1). These methods were used to screen 23 candidate features (24 for the DFS model). The four machine learning algorithms were developed in MATLAB R2019b.
Study population characteristics
The clinical and therapeutic characteristics of the study population are detailed in Table 1. In the OS model, 16.0% (166/1039) of the patients died within 1 year after surgery, and 26.0% (44/169) and 37.7% (302/802) of the patients exhibited recurrence and distant metastasis, respectively, within 1 year after surgery. After the 5-year follow-up, 9.5% (99/1039) of the patients were still alive; additionally, 4.1% (7/169) and 3.7% (30/802) of the patients remained free of tumor recurrence and distant metastasis, respectively.
Table 1 Characteristics of the study population (values are percentages): OS (N = 1039), DFS (N = 874), RFS (N = 169), DMFS (N = 802)

Parameters | OS training (n = 1000) | OS testing (n = 39) | DFS training (n = 849) | DFS testing (n = 25) | RFS training (n = 160) | RFS testing (n = 9) | DMFS training (n = 784) | DMFS testing (n = 18)
---|---|---|---|---|---|---|---|---
Age | | | | | | | |
> 60 | 56.4 | 69.2 | 58.0 | 152.0 | 61.3 | 55.6 | 57.5 | 61.1
Sex | | | | | | | |
Man | 64.5 | 53.8 | 65.3 | 56.0 | 69.4 | 77.8 | 64.4 | 61.1
BMI | | | | | | | |
< 18.5 | 3.7 | 0.0 | 3.8 | 0.0 | 3.1 | 0.0 | 3.8 | 0.0
18.5–23.9 | 45.8 | 56.4 | 43.6 | 36.0 | 42.5 | 77.8 | 43.0 | 44.4
24–27.9 | 36.0 | 30.8 | 39.2 | 13.0 | 40.0 | 11.1 | 40.6 | 38.9
≥ 28 | 14.5 | 12.8 | 13.4 | 12.0 | 14.4 | 11.1 | 12.6 | 16.7
HP | | | | | | | |
Presence | 27.5 | 48.7 | 25.9 | 36.0 | 25.0 | 44.4 | 26.4 | 50.0
DM | | | | | | | |
Presence | 13.4 | 17.9 | 12.6 | 16.0 | 15.0 | 11.1 | 12.6 | 22.2
CHD | | | | | | | |
Presence | 10.3 | 17.9 | 9.3 | 0.0 | 6.3 | 22.2 | 9.7 | 5.6
Smoking | | | | | | | |
Presence | 32.5 | 20.5 | 32.4 | 28.0 | 34.4 | 22.2 | 31.6 | 33.3
Drinking | | | | | | | |
Presence | 28.7 | 20.5 | 29.4 | 28.0 | 33.1 | 11.1 | 29.3 | 27.8
Family history of tumors | | | | | | | |
Presence | 14.1 | 17.9 | 14.7 | 20.0 | 16.9 | 22.2 | 13.8 | 27.8
Family history of gastrointestinal tumors | | | | | | | |
Presence | 9.9 | 10.3 | 10.0 | 12.0 | 13.1 | 0.0 | 9.1 | 11.1
CEA | | | | | | | |
High | 55.5 | 33.3 | 57.7 | 36.0 | 50.6 | 66.7 | 58.3 | 33.3
CRP | | | | | | | |
High | 4.4 | 0.3 | 4.2 | 8.0 | 3.1 | 0.0 | 4.3 | 5.6
Tumor position | | | | | | | |
Ascending colon | 13.0 | 28.2 | 11.3 | 24.0 | 9.4 | 33.3 | 11.4 | 22.2
Transverse colon | 4.5 | 5.1 | 3.3 | 8.0 | 7.5 | 0.0 | 3.1 | 11.1
Descending colon | 2.4 | 5.1 | 1.8 | 12.0 | 1.9 | 0.0 | 1.9 | 5.6
Sigmoid colon | 15.8 | 7.7 | 18.3 | 48.0 | 18.8 | 55.6 | 18.2 | 55.6
Rectum | 64.3 | 53.8 | 65.4 | 8.0 | 62.5 | 11.1 | 65.4 | 5.6
Tumor differentiation grade | | | | | | | |
High | 0.9 | 2.6 | 0.4 | 0.0 | 0.6 | 0.0 | 0.3 | 0.0
Moderate | 58.4 | 53.8 | 68.0 | 76.0 | 73.8 | 77.8 | 67.6 | 66.7
Low | 40.7 | 43.6 | 31.7 | 24.0 | 30.0 | 22.2 | 32.1 | 33.3
Histological type | | | | | | | |
AC | 67.9 | 76.9 | 76.7 | 80.0 | 69.4 | 77.8 | 77.4 | 83.3
AMC | 9.3 | 2.6 | 7.4 | 4.0 | 11.3 | 11.1 | 7.1 | 0.0
MA | 20.2 | 20.5 | 14.0 | 12.0 | 18.8 | 11.1 | 13.4 | 16.7
SRCC | 1.8 | 0.0 | 1.1 | 0.0 | 0.6 | 0.0 | 1.0 | 0.0
The others | 0.8 | 0.0 | 0.8 | 4.0 | 0.0 | 0.0 | 1.0 | 0.0
Tumor size | | | | | | | |
> 20 mm | 28.2 | 33.3 | 30.0 | 20.0 | 41.9 | 55.6 | 29.0 | 16.7
PNI | | | | | | | |
Presence | 49.3 | 41.0 | 51.9 | 52.0 | 49.4 | 22.2 | 52.3 | 50.0
LVI | | | | | | | |
Presence | 44.3 | 51.3 | 44.8 | 56.0 | 33.1 | 66.7 | 46.9 | 66.7
Lesion amount | | | | | | | |
Unifocal | 95.6 | 97.4 | 94.6 | 100.0 | 93.1 | 100.0 | 95.2 | 100.0
Ki-67 protein level | | | | | | | |
High | 46.9 | 28.2 | 52.9 | 52.0 | 56.9 | 33.3 | 52.9 | 38.9
Operation method | | | | | | | |
Laparotomy | 70.8 | 82.1 | 67.4 | 80.0 | 66.3 | 77.8 | 67.6 | 83.3
LNR | | | | | | | |
≤ 0.25 | 84.4 | 94.9 | 83.5 | 84.0 | 85.0 | 77.8 | 83.5 | 83.3
0.26–0.5 | 14.4 | 5.1 | 14.7 | 16.0 | 14.4 | 22.2 | 14.7 | 16.7
0.51–0.75 | 1.0 | 0.0 | 1.6 | 0.0 | 0.6 | 0.0 | 1.7 | 0.0
> 0.75 | 0.2 | 0.0 | 0.1 | 0.0 | 0.0 | 0.0 | 0.1 | 0.0
TNM stage | | | | | | | |
I | 6.9 | 2.6 | 5.5 | 4.0 | 8.8 | 0.0 | 4.7 | 5.6
II | 29.2 | 28.2 | 27.6 | 32.0 | 38.1 | 33.3 | 25.4 | 27.8
III | 63.9 | 69.2 | 66.9 | 64.0 | 53.1 | 66.7 | 69.9 | 66.7
Disease status (DFS model only) | | | | | | | |
Recurrence | | | 19.3 | 0.0 | | | |
Distant metastasis | | | 80.7 | 100.0 | | | |
OS overall survival, DFS disease-free survival, RFS recurrence-free survival, DMFS distant metastasis-free survival, BMI body mass index, HP hypertension, DM diabetes mellitus, CHD chronic heart disease, CEA carcinoembryonic antigen, CRP C-reactive protein, AC adenocarcinoma, AMC adenocarcinoma with mucus composition, MA mucinous adenocarcinoma, SRCC signet ring cell carcinoma, PNI perineural invasion, LVI lymphovascular invasion, LNR lymph node ratio, TNM stage tumor node metastasis stage
Important model variables
The detailed MSEs are shown in Table 2. We chose subset regression for the patient OS model, and the important variables were tumor differentiation grade, Ki-67 protein level, histological type, TNM stage, and serum CRP level (5 in total, Fig. 2). LASSO regression was used for the DFS model, and the 5 vital variables were PNI, tumor differentiation grade, Ki-67 protein level, lesion amount, and TNM stage (Fig. 3 and Table 3). For RFS, we used ridge regression, and the four important features were DM, CHD, operation method, and age (Fig. 4 and Table 3). Subset regression was chosen for the DMFS model, and PNI, Ki-67 protein level, tumor differentiation grade, TNM stage, and histological type were the 5 important indicators (Fig. 5).
Table 2 MSEs of the four regression models for each endpoint

Regression model | OS | DFS | RFS | DMFS
---|---|---|---|---
Subset regression model | 0.4500633 | 0.3749525 | 0.7710959 | 0.4243934
Ridge regression model | 0.4602181 | 0.3678196 | 0.7513997 | 0.4449402
LASSO regression model | 0.5067479 | 0.3608976 | 0.8054069 | 0.4426869
LASSO cross-validation model | 0.4890136 | 0.4112665 | 0.8090451 | 0.4822449

MSE mean square error, OS overall survival, DFS disease-free survival, RFS recurrence-free survival, DMFS distant metastasis-free survival, LASSO the least absolute shrinkage and selection operator
No. | Variable (DFS model) | Weight (s1)
---|---|---
1 | PNI | 0.166095917 |
2 | Tumor differentiation grade | 0.145056847 |
3 | Ki-67 protein level | 0.125446209 |
4 | Lesion amount | 0.122986628 |
5 | TNM stage | 0.083527594 |
6 | Family history of tumors | 0.068736389 |
7 | Histological type | 0.068138058 |
8 | Family history of gastrointestinal tumors | 0.064282454 |
9 | Age | 0.052132288 |
10 | Sex | 0.049713378 |
11 | Serum CRP level | 0.049208782 |
12 | CHD | 0.038371955 |
13 | Operation method | 0.035456908 |
14 | BMI | 0.028557122 |
15 | LNR | 0.020353514 |
16 | Smoking | 0.018868793 |
17 | Tumor location | 0.015574714 |
18 | Serum CEA level | 0.012848902 |
19 | Drinking | 0.011499902 |
20 | Tumor size | 0.009350538 |
21 | LVI | 0.008701713 |
22 | HP | 0.006665916 |
23 | DM | 0.005961741 |
24 | Disease status | 0.002272804 |
LASSO the least absolute shrinkage and selection operator, PNI perineural invasion, TNM stage tumor node metastasis stage, CRP C-reactive protein, CHD chronic heart disease, BMI body mass index, LNR lymph node ratio, CEA carcinoembryonic antigen, LVI lymphovascular invasion, HP hypertension, DM diabetes mellitus
No. | Variable (RFS model) | Weight (s1)
---|---|---
1 | DM | 0.184923946 |
2 | CHD | 0.108369703 |
3 | Operation method | 0.103702878 |
4 | Age | 0.097713638 |
5 | Smoking | 0.097274712 |
6 | PNI | 0.080909316 |
7 | Histological type | 0.079653416 |
8 | Family history of tumors | 0.074394127 |
9 | Ki-67 protein level | 0.067737861 |
10 | Drinking | 0.067211567 |
11 | LNR | 0.065499595 |
12 | Family history of gastrointestinal tumors | 0.056467337 |
13 | Lesion amount | 0.047623412 |
14 | Serum CRP level | 0.042766487 |
15 | TNM stage | 0.037254881 |
16 | Serum CEA level | 0.033536644 |
17 | Tumor location | 0.02592845 |
18 | BMI | 0.020303689 |
19 | Sex | 0.0181275 |
20 | Tumor differentiation grade | 0.013301381 |
21 | Tumor size | 0.01124825 |
22 | LVI | 0.006359537 |
23 | HP | 0.005023189 |
DM diabetes mellitus, CHD chronic heart disease, PNI perineural invasion, LNR lymph node ratio, CRP C-reactive protein, TNM stage tumor node metastasis stage, CEA carcinoembryonic antigen, BMI body mass index, LVI lymphovascular invasion, HP hypertension
Model performance
The receiver operating characteristic (ROC) curves of the OS model obtained with the four ML algorithms are shown in Fig. 6: Tree (AP: 67.8%; area under the ROC curve (AUC): 0.69), SVM (AP: 69.5%; AUC: 0.70), KNN (AP: 68.7%; AUC: 0.71), and Ensemble (AP: 67.2%; AUC: 0.73). For DFS, Fig. 7 shows the four ROC curves in detail: Tree (AP: 82.4%; AUC: 0.50), SVM (AP: 82.4%; AUC: 0.56), KNN (AP: 82.4%; AUC: 0.69), and Ensemble (AP: 82.4%; AUC: 0.72). The ROC curves for the RFS model are shown in Fig. 8: Tree (AP: 82.0%; AUC: 0.45), SVM (AP: 82.0%; AUC: 0.61), KNN (AP: 82.0%; AUC: 0.58), and Ensemble (AP: 82.0%; AUC: 0.57). Figure 9 shows the ROC curves for DMFS: Tree (AP: 82.1%; AUC: 0.49), SVM (AP: 82.8%; AUC: 0.61), KNN (AP: 82.8%; AUC: 0.71), and Ensemble (AP: 82.8%; AUC: 0.73).
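For reference, both metrics can be computed directly from predicted scores. The NumPy sketch below (our illustration on toy labels, independent of the MATLAB pipeline) implements the AUC as a rank statistic and the AP as precision averaged over the positive cases:

```python
import numpy as np

def auc(y_true, scores):
    """AUC via the Mann-Whitney statistic: the probability that a randomly
    chosen positive case is scored higher than a randomly chosen negative."""
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    wins = ((pos[:, None] > neg[None, :]).sum()
            + 0.5 * (pos[:, None] == neg[None, :]).sum())
    return wins / (len(pos) * len(neg))

def average_precision(y_true, scores):
    """AP: the mean of precision@k taken at the rank of each positive case."""
    y = y_true[np.argsort(-scores)]
    precision_at_k = np.cumsum(y) / np.arange(1, len(y) + 1)
    return precision_at_k[y == 1].mean()

y = np.array([1, 0, 1, 0])
s = np.array([0.9, 0.8, 0.7, 0.6])
print(auc(y, s), average_precision(y, s))  # 0.75 and (1 + 2/3) / 2
```

Unlike a fixed-threshold accuracy, both measures are computed over the full ranking of scores, which is why they were chosen here to avoid the influence of any single threshold.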
In this study, we developed and validated a promising ML system for predicting the times of oncological outcomes, and we screened the important variables for each ML model. This system is a simple, practical, and easily accessible tool that clinicians can consult when selecting treatments. In addition, our system tolerates heterogeneous patients well and does not require complete patient medical histories, which lowers the threshold for use. Our work differs from previous risk forecast models: we considered temporal dynamics and found that patients' prognoses could be understood longitudinally and over the long term. In addition, we screened uncommonly used indicators for predicting outcomes, which provides new insights for pathologists, basic science researchers, and even pharmacologists. Moreover, the adaptability and interpretability of our system promote its application in hospitals at different levels. Our work also demonstrated the feasibility of applying ML models to a large number of heterogeneous CRC patients.
Although several genetic and molecular markers have been shown to correlate with patient prognoses (Hossain et al. 2021; Liu et al. 2022), we did not select them as potential variables, because we aimed to build a clinically generalizable system usable in data-poor situations; the indicators we selected were clinically applicable and showed little heterogeneity among patients. In addition, to avoid selection bias and the TNM stage-centric impasse, we input a large number of potential variables into our work and allowed the ML algorithms to screen the variables that performed best. Furthermore, there were no missing values in our database, which prevented the bias caused by improper imputation of missing values.
We deliberately selected patients with a clear preoperative medical history who underwent curative initial treatment, and we therefore excluded stage I, II, and III CRC patients who had received neoadjuvant chemoradiotherapy (which may lead to a vague history). Patients with stage IV CRC, who were not eligible for radical surgery, were also excluded. Consequently, the model might also serve as a rough reference when physicians at higher-level hospitals redesign treatment strategies for patients referred from lower-level hospitals with unclear postoperative radiotherapy and chemotherapy histories. Treatment options for these patients are difficult to determine, and it is difficult for oncologists to obtain references from previous studies that stratify patients by chemotherapy regimen. Moreover, to avoid bias, we excluded patients without endpoint data. When applying our system, patients predicted not to experience an outcome would be classified as event-free for greater than 5 years (equivalent to oncological recurrence).
To better predict longitudinal oncological outcomes, we configured and optimized hyperparameters wherever possible. To evaluate model fit, we chose classification metrics corresponding to the nonparametric ML algorithms: the C statistic (reported as the AUC to avoid the influence of the threshold value) and the AP. In the variable screening process, given the sample size, the number of variables (more than 20), and other factors, we chose methods based on multivariate regression models. Candidate evaluation indicators for model selection included the error sum of squares (SSE) and the MSE; however, the SSE was meaningless in this experiment (it inevitably increases as the sample size increases), so we chose the MSE as the evaluation indicator. Since this experiment involved regression prediction on small-sample data, the models did not reach an overfitting state. Considering development costs, we kept the system's default configuration (treating the absence of a solution or a local optimal solution as a convergence failure). We consider the lower AUC for OS a reasonable result given the biological complexity and the small sample size. Furthermore, to reduce bias, the computer experts were blinded to the meaning of each indicator when building the ML models.
Extending the survival time is the shared goal of clinicians and oncology patients. Quantifying patient outcomes aids in shared decision making (Howard et al. 2020). Because of the heterogeneity of CRCs, physicians and patients must seriously consider the tradeoffs between adverse effects and benefits (Ganguli et al. 2022) when choosing a treatment strategy. It is possible to improve outcomes by closer follow-up or the administration of additional chemoradiotherapy to patients who are predicted to have poorer prognoses. Consequently, we suggest that patients who tend to have a shorter DMFS receive prophylactic chemotherapy or regional radiotherapy for the common metastatic sites of CRC described by Jiang B et al. (2021). Moreover, the identification of patients with better prognoses could reduce the cost of medical care and improve the level of humanistic care by reducing the psychological burden on patients and their families. Therefore, predictive tools such as our system are urgently needed in the clinic.
However, referring to the results output by "black box" systems when managing patients is not always accepted by clinicians and patients (Pattarabanjird et al. 2022; Watson et al. 2019). The interpretability of models is vital, especially in biomedicine (Murdoch et al. 2019; Yu et al. 2018). To turn our "black box" system into a "white box" system, we screened out the corresponding predictors for the different outcomes. TNM stage, the primary indicator for chemoradiotherapy decisions, was selected in the OS, DFS, and DMFS models, which makes our models more credible. Moreover, indicators widely found to correlate with prognosis, such as PNI (Knijn et al. 2016; Nikberg et al. 2016; Song et al. 2019), pathological type (Kim et al. 2013; Nitsche et al. 2013; Sheng et al. 2019; Wu et al. 2019), and tumor differentiation grade (Garrity et al. 2004), were also selected, further supporting the credibility of our system. One potential benefit of using ML models is that the important variables are identified while less critical parameters are ignored. Several uncommon predictors were additionally included in our models, providing new insights into predicting prognoses. The levels of CRP and Ki-67 were shown to affect prognoses, consistent with studies showing that high serum CRP levels are associated with higher postoperative complication rates (Domínguez-Comesaña et al. 2017; Muñoz et al. 2018) and that Ki-67 levels reflect the proliferative capacity of cells (Schlüter et al. 1993), especially tumor cells (Duchrow et al. 1994; Starborg et al. 1996). Surprisingly, LVI was only modestly predictive. Our models also identified some predictors not previously considered directly linked to poor prognosis (unifocal vs. multifocal lesions and laparotomy vs. laparoscopy) (Barz and Stöss 2021; Chin et al. 2019; Fleshman et al. 2019; Hida et al. 2018).
These factors are more likely to be directly related to surgical trauma than to survival time. For the RFS model, two chronic diseases, DM and CHD, were selected together with age, possibly due to the insufficient sample size. This finding also indirectly draws attention to elderly oncological patients with baseline diseases.
Our findings demonstrate the formidable predictive power of ML methods, particularly for heterogeneous diseases stratified by outcomes. ML has unique value in clinical applications: it can guide patient management, improve patient outcomes, and tailor treatment regimens, especially when resources are scarce (i.e., when only clinicopathological and surgical variables are available for analysis). However, although we obtained encouragingly high prediction accuracy and "white-box" results, more progress is needed before ML can be fully relied upon. In clinical practice, traditional performance measures such as the AUC must also be translated into medically relevant measures to elucidate the patient-centric value of ML models. ML still has a long way to go.
Moreover, the limitations of our study must be noted. First, the sample size used for the ML models was small (especially for RFS); consequently, the sensitivity of parameter adjustment could not be estimated during parameter optimization. Second, as is a general dilemma of ML (Yu et al. 2018), the uniformity of the samples in the data could not be estimated, which could affect the final results. Furthermore, this study was based on a retrospective analysis; however, the patient data were obtained from a well-conceived and well-characterized cohort, which adds to the credibility of our results, and this study can thus serve as the basis for subsequent prospective studies. In addition, postoperative treatment information, such as specific radiotherapy and chemotherapy regimens and detailed surgical methods, was not available in our database.
We believe that subsequent research could improve the accuracy of individual survival time prediction by employing other techniques, obtaining larger sample sizes, improving follow-up accuracy, and so on. Furthermore, it is necessary to add detailed studies on genomics and chemoradiotherapy regimens. Our next goal is to develop a predictive model system as an app and install it in a hospital system.
In conclusion, we successfully designed and validated clinicopathology-based ML prediction models. Our work might promote the application of precision treatments and improve clinical outcomes for CRC patients, and it shows the potential of ML to improve the direction of treatment strategies.
Funding This work was supported by the National Natural Science Foundation of China (Grant number 81802777).
Competing Interests The authors have no relevant financial or non-financial interests to disclose.
Author Contributions All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Xiaolin Ji, Shuo Xu and Xiaoyu Li. The first draft of the manuscript was written by Xiaolin Ji and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Data Availability The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
Code Availability Commercially available MATLAB software was used to process data and develop methods. The code will be shared by the corresponding author upon request.
Ethics approval This study was performed in line with the principles of the Declaration of Helsinki. Approval was granted by the Ethics Committee of the Affiliated Hospital of Qingdao University (reference number QYFYWZLL26957).
Consent to participate The need for informed consent was waived given that this was a retrospective study.
Consent to publish The need for consent was waived given that there were no individual person’s data in any form contained in this study.