In this study, we developed and validated a promising ML system for predicting the timing of oncological outcomes, and we identified the important variables for each ML model. This system is a simple, practical, and easily accessible tool that clinicians can consult when selecting treatments. In addition, our system tolerates heterogeneous patients well and does not require complete patient medical histories, which lowers the threshold for use. Our work differs from previous risk forecast models: we considered temporal dynamics and showed that patients' prognoses can be understood longitudinally and over the long term. We also identified uncommonly used indicators for predicting outcomes, providing new insights for pathologists, basic science researchers, and even pharmacologists. Moreover, the adaptability and interpretability of our system favor its application in hospitals at different levels. Our work also demonstrates the feasibility of applying ML models to a large number of heterogeneous CRC patients.
Although several genetic and molecular markers have been shown to correlate with patient prognoses (Hossain et al. 2021; Liu et al. 2022), we did not select them as candidate variables, because we aimed to build a clinically generalizable system usable in data-poor settings; the indicators we selected are routinely available in the clinic and show little heterogeneity among patients. In addition, to avoid selection bias, we sidestepped the TNM stage-centric impasse by supplying a large number of candidate variables and letting the ML algorithms select those that performed best. Furthermore, our database contained no missing values, which prevented the bias that improper imputation of missing values can introduce.
We restricted the cohort to patients with a clear preoperative medical history who underwent curative initial treatment, so we excluded stage I, II, and III CRC patients who had received neoadjuvant chemoradiotherapy (which can obscure the history). Patients with stage IV CRC who were not eligible for radical surgery were also excluded. Consequently, the model may also serve as a rough reference when physicians at higher-level hospitals redesign treatment strategies for patients referred from lower-level hospitals with unclear postoperative radiotherapy and chemotherapy histories. Treatment options for these patients are difficult to determine, and oncologists can rarely draw on previous studies that stratify patients by chemotherapy regimen. Moreover, to avoid bias, we excluded patients without endpoint data. When applying our system, patients predicted not to experience an outcome are classified as having an outcome time greater than 5 years (e.g., no oncological recurrence within 5 years).
To better predict longitudinal oncological outcomes, we configured and optimized hyperparameters wherever possible. To evaluate model fit, we chose two classification metrics suited to the nonparametric ML algorithms: the C statistic (we used the AUC to avoid the influence of any single threshold value) and AP. In the variable-screening step, given the sample size, the number of candidate variables (more than 20), and other constraints, we chose methods based on multiple regression models. Candidate evaluation indicators for model selection included the error sum of squares (SSE) and the MSE; however, SSE was uninformative in this experiment because it inevitably grows as the sample size increases, so we chose MSE as the evaluation indicator. Because this experiment involved regression prediction on small-sample data, we observed no model overfitting. Considering development costs, we kept the default system configuration and treated cases with no solution, or only a local optimum, as convergence failures. We consider the lower AUC for OS a reasonable result given the biological complexity and the small sample size. Furthermore, to reduce bias, the computer experts who built the ML models were blinded to the meaning of each indicator.
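The metric choices above can be illustrated with a minimal sketch on synthetic data (the numbers are illustrative, not from our cohort): AUC and AP are threshold-free classification metrics, and with a fixed per-sample error, SSE grows linearly with the sample size while MSE stays constant, which is why MSE is the fairer indicator here.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score, mean_squared_error

rng = np.random.default_rng(0)

# Threshold-free classification metrics: AUC and AP on noisy synthetic scores.
y_true = rng.integers(0, 2, size=200)
y_score = y_true * 0.5 + rng.normal(0, 0.5, size=200)
auc = roc_auc_score(y_true, y_score)
ap = average_precision_score(y_true, y_score)

# Why SSE is misleading across sample sizes: with the same constant
# residual (0.3), SSE grows 10x when n grows 10x, while MSE is unchanged.
for n in (50, 500):
    y = rng.normal(0, 1, size=n)
    pred = y + 0.3
    sse = np.sum((y - pred) ** 2)           # 0.09 * n -> 4.5, then 45.0
    mse = mean_squared_error(y, pred)        # 0.09 in both cases
    print(f"n={n}: SSE={sse:.2f}, MSE={mse:.2f}")
```

This is only a demonstration of the metric behavior, not our modeling pipeline.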
Extending survival time is the shared goal of clinicians and oncology patients, and quantifying patient outcomes aids shared decision making (Howard et al. 2020). Because of the heterogeneity of CRCs, physicians and patients must carefully weigh the tradeoffs between adverse effects and benefits when choosing a treatment strategy (Ganguli et al. 2022). Outcomes might be improved by closer follow-up or additional chemoradiotherapy for patients predicted to have poorer prognoses. Consequently, we suggest that patients predicted to have a shorter DMFS receive prophylactic chemotherapy or regional radiotherapy targeting the common metastatic sites of CRC described by Jiang et al. (2021). Moreover, identifying patients with better prognoses could reduce the cost of medical care and improve humanistic care by reducing the psychological burden on patients and their families. Predictive tools such as our system are therefore urgently needed in the clinic.
However, clinicians and patients do not always accept managing patients according to the output of "black box" systems (Pattarabanjird et al. 2022; Watson et al. 2019); the interpretability of models is vital, especially in biomedicine (Murdoch et al. 2019; Yu et al. 2018). To turn our "black box" system into a "white box" system, we identified the predictors corresponding to each outcome. TNM stage, the primary indicator for chemoradiotherapy decisions, was selected in the OS, DFS, and DMFS models, which makes our models more credible. Moreover, indicators widely reported to correlate with prognosis, such as PNI (Knijn et al. 2016; Nikberg et al. 2016; Song et al. 2019), pathological type (Kim et al. 2013; Nitsche et al. 2013; Sheng et al. 2019; Wu et al. 2019), and tumor differentiation grade (Garrity et al. 2004), were also selected, further supporting the credibility of our system. One potential benefit of ML models is that they identify the important variables and ignore less critical parameters. Several uncommon predictors were additionally included in our models, providing new insights into prognosis prediction. CRP and Ki-67 levels emerged as prognostic factors, consistent with studies showing that high serum CRP levels are associated with higher postoperative complication rates (Domínguez-Comesaña et al. 2017; Muñoz et al. 2018) and that Ki-67 levels reflect the proliferative capacity of cells (Schlüter et al. 1993), especially tumor cells (Duchrow et al. 1994; Starborg et al. 1996). Surprisingly, LVI was only modestly predictive. Our models also identified some predictors not previously considered directly linked to poor prognosis, namely unifocal vs. multifocal lesions and open surgery vs. laparoscopy (Barz and Stöss 2021; Chin et al. 2019; Fleshman et al. 2019; Hida et al. 2018).
These factors are more likely related to surgical trauma than to survival time. In the RFS model, two chronic diseases, DM and CHD, were selected together with age, possibly owing to the limited sample size; this result also indirectly draws attention to elderly oncological patients with baseline diseases.
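The variable-ranking idea behind our "white box" approach can be sketched with permutation importance on synthetic data. The feature names below (TNM_stage, CRP, etc.) are hypothetical stand-ins, the data are simulated so that only the first two features carry signal, and the random forest is an illustrative choice rather than our exact model.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(42)
n = 300
# Hypothetical predictors; only TNM_stage and CRP drive this simulated outcome.
names = ["TNM_stage", "CRP", "Ki67", "LVI", "age"]
X = rng.normal(size=(n, len(names)))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.5, n) > 0).astype(int)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
imp = permutation_importance(model, X, y, n_repeats=10, random_state=0)

# Rank predictors by mean importance drop; informative features rise to the top.
for name, score in sorted(zip(names, imp.importances_mean), key=lambda t: -t[1]):
    print(f"{name}: {score:.3f}")
```

In this simulation the truly informative features rank above the noise features, mirroring how the models selected TNM stage, PNI, and differentiation grade while discounting less critical parameters.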
Our findings demonstrate the formidable predictive power of ML methods, particularly for heterogeneous diseases stratified by outcome. ML has unique value in clinical applications: it can guide patient management, improve patient outcomes, and tailor treatment regimens, especially when resources are scarce (i.e., when only clinicopathological and surgical variables are available for analysis). However, although we obtained encouragingly high prediction accuracy and "white-box" results, more progress is needed before ML can be fully relied upon. In clinical practice, traditional performance measures such as the AUC must also be translated into medically relevant measures to elucidate the patient-centric value of ML models; ML still has a way to go.
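One way to translate an AUC into clinician-facing numbers is to fix an operating point on the ROC curve and report sensitivity, specificity, and PPV at that cutoff. The sketch below uses synthetic scores and an assumed 90% sensitivity requirement; the data and target are illustrative, not from our study.

```python
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(1)
# Synthetic risk scores: positives score higher on average than negatives.
y_true = rng.integers(0, 2, 400)
y_score = y_true * 0.8 + rng.normal(0, 0.6, 400)

fpr, tpr, thresholds = roc_curve(y_true, y_score)
# Choose the first threshold achieving at least 90% sensitivity, then report
# the specificity and PPV a clinician would see at that cutoff.
idx = np.argmax(tpr >= 0.90)
cutoff = thresholds[idx]
pred = y_score >= cutoff
sens = (pred & (y_true == 1)).sum() / (y_true == 1).sum()
spec = (~pred & (y_true == 0)).sum() / (y_true == 0).sum()
ppv = (pred & (y_true == 1)).sum() / max(pred.sum(), 1)
print(f"cutoff={cutoff:.2f} sensitivity={sens:.2f} specificity={spec:.2f} PPV={ppv:.2f}")
```

A patient-centric report would then state, for example, how many predicted-high-risk patients per hundred truly recur, rather than quoting the AUC alone.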
Moreover, the limitations of our study must be noted. First, the sample size used to train the ML models was small (especially for RFS), so the sensitivity of parameter tuning could not be estimated. Second, as is a general dilemma of ML (Yu et al. 2018), the uniformity of the data sample could not be assessed, which could affect the final results. Furthermore, this study was a retrospective analysis; however, the patient data came from a well-conceived and well-characterized cohort, which adds to the credibility of our results, and the study can therefore serve as the basis for subsequent prospective studies. In addition, postoperative treatment information, such as specific radiotherapy and chemotherapy regimens and detailed surgical methods, was not available in our database.
We believe that subsequent research could improve the accuracy of individual survival time prediction by employing other techniques, enrolling larger samples, and improving follow-up accuracy. Detailed studies of genomics and chemoradiotherapy regimens are also needed. Our next goal is to develop the predictive system as an app and deploy it in a hospital information system.
In conclusion, we successfully designed and validated clinicopathology-based ML prediction models. Our work may promote the application of precision treatments and improve clinical outcomes for CRC patients, and it demonstrates the potential of ML to inform treatment strategies.