Machine learning-based prediction of surgical benefit in borderline resectable and locally advanced pancreatic cancer

Surgery represents a primary therapeutic approach for borderline resectable and locally advanced pancreatic cancer (BR/LAPC). However, BR/LAPC lesions are highly heterogeneous, and not all BR/LAPC patients who undergo surgery benefit from it. The present study aims to employ machine learning (ML) algorithms to identify those who would benefit from primary tumor surgery. We retrieved clinical data of patients with BR/LAPC from the Surveillance, Epidemiology, and End Results (SEER) database and classified them into surgery and non-surgery groups based on primary tumor surgery status. To reduce confounding, propensity score matching (PSM) was employed. We hypothesized that patients who underwent surgery and whose cancer-specific survival (CSS) exceeded the median CSS of those who did not undergo surgery benefited from surgical intervention. Clinical and pathological features were utilized to construct six ML models, and model effectiveness was compared through measures such as the area under the curve (AUC), calibration plots, and decision curve analysis (DCA). We selected the best-performing algorithm (i.e., XGBoost) to predict postoperative benefits. The SHapley Additive exPlanations (SHAP) approach was used to interpret the XGBoost model. Additionally, prospectively collected data from 53 Chinese patients were used for external validation of the model. According to the results of the tenfold cross-validation in the training cohort, the XGBoost model yielded the best performance (AUC = 0.823, 95% CI 0.707–0.938). The internal (74.3% accuracy) and external (84.3% accuracy) validation supported the generalizability of the model. The SHAP analysis highlighted important factors related to postoperative survival benefits in BR/LAPC, with age, chemotherapy, and radiation therapy being the top three factors.
By integrating ML algorithms and clinical data, we have established an efficient model to facilitate clinical decision-making and assist clinicians in selecting the population that would benefit from surgery.


Introduction
The incidence of pancreatic cancer (PC), one of the most lethal cancers, has been rising over the last few decades. The latest data released by the American Cancer Society (ACS) revealed that the number of newly identified cases of PC in the USA was 62,210 in the year 2022. Among all malignant tumors, PC ranked tenth for incidence in males and eighth in females (Siegel et al. 2022). The main clinical characteristics of PC are insidious onset, high rate of drug resistance, lack of effective treatment, early recurrence, and metastasis (Chen et al. 2020; Sung et al. 2021). Thus, at the time of initial diagnosis, the majority of patients present with
Leiming Zhang and Zehao Yu have contributed equally to this work and share first authorship.
The criteria for resectability of PC have been varying in the guidelines proposed by different academic organizations, and there is no unified opinion among experts worldwide regarding the resectability of PC. The latest National Comprehensive Cancer Network (NCCN) guidelines classify PC (T4, Nx, M0) into two major categories based on whether the tumor invades large blood vessels beyond 180°: borderline resectable pancreatic cancer (BRPC) and locally advanced unresectable pancreatic cancer (LAPC) (Tempero et al. 2021). Due to the high risk of postoperative complications and the complexity of the procedure, BR/LAPC patients are often in a therapeutic gray area where the feasibility of the procedure is uncertain and individualized treatment planning is required for each patient (Freelove and Walling 2006;Tsujimoto et al. 2019). Previous studies have demonstrated that multimodal therapies, such as radiation therapy, chemotherapy, and surgical resection, can achieve better treatment outcomes and improve survival and prognosis for patients with BR/LAPC (Li et al. 2019;Yoo et al. 2019). However, when treating BR/LAPC patients, personalized treatment plans must be developed that fully consider factors such as the location and distribution of the lesions, the extent of involvement with adjacent organs, and the patient's overall health status. Moreover, the available literature indicates that imaging modalities may have limitations in identifying individuals who will derive benefit from surgery (Kang et al. 2018). Thus, there is an urgent requirement for a precise and dependable tool to predict the subset of the population that would benefit the most from surgical intervention.
Machine learning (ML), one of the primary tools in data mining, offers greater flexibility and scalability than traditional statistical methods (Ngiam and Khor 2019). By learning from data obtained from existing medical tests or investigations on patients, ML builds risk models for predicting disease, grading disease severity, and assessing disease prognosis. Clinicians can leverage this tool to consider more evidence and provide personalized predictions. Therefore, this study employed diverse ML algorithms to construct prognostic models for patients undergoing resection of BR/LAPC, aiming to identify the most effective predictive model. By utilizing ML techniques, this research endeavors to provide robust support for identifying BR/LAPC patients who would benefit from surgery.

Data source and extraction
This study strictly adhered to the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) statement to ensure transparent and comprehensive reporting of its findings. To obtain our data, we utilized the highly respected and comprehensive Surveillance, Epidemiology, and End Results (SEER) database. The SEER database contains cancer incidence and survival data from 18 different registries, covering more than a quarter of the overall US population. It is noteworthy that the SEER database garners data from both academic and non-academic hospitals, thereby providing a broad representation of the American populace as a whole.
Patients were eligible if they had histopathologically confirmed pancreatic ductal adenocarcinoma (PDAC) with tumor growth extending outside the pancreas and involving major blood vessels nearby (T4, AJCC 8th Edition). Patients meeting any of the following exclusion criteria were ineligible for this study: (a) age below 18 years; (b) non-pathological diagnosis; (c) diagnosis confirmed only by autopsy or death; (d) non-primary tumor; (e) postoperative period ≤ 30 days; (f) evidence of metastasis; (g) missing or insufficient clinical information.
To further verify the accuracy of the model, we prospectively collected patient data from those who underwent surgery for BR/LAPC at Ningbo University Affiliated Li Huili Hospital between December 2016 and February 2018.

Propensity score matching
Propensity score matching (PSM) is a statistical approach extensively employed to remedy selection bias or confounding variables in observational studies. Its fundamental notion involves constructing a propensity score that predicts the probability of acquiring a particular treatment or exposure based on the baseline characteristics of an individual. PSM aims to minimize confounding effects and reduce the bias in the estimated treatment effect by matching individuals with similar propensity scores. The study divided the samples into two categories based on whether the patients underwent surgery, namely the surgical group and the non-surgical group. Furthermore, 1:1 PSM on the logit scale was performed to ensure comparable characteristics between patients in both groups.
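The matching step can be illustrated with a minimal sketch. It assumes that propensity scores have already been estimated (for example by logistic regression) and performs greedy 1:1 nearest-neighbour matching without replacement on the logit scale. The caliper of 0.2, the patient identifiers, and the scores are hypothetical and are not taken from this study, which does not report its matching algorithm in this detail.

```python
import math


def logit(p):
    """Log-odds of a propensity score p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def match_one_to_one(treated, controls, caliper=0.2):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    treated, controls: dicts mapping patient id -> propensity score.
    caliper: maximum allowed distance on the logit scale (assumed value).
    Returns a list of (treated_id, control_id) pairs.
    """
    available = {cid: logit(p) for cid, p in controls.items()}
    pairs = []
    # Process treated patients from the highest score down, since they
    # usually have the fewest comparable controls.
    for tid, p in sorted(treated.items(), key=lambda kv: kv[1], reverse=True):
        if not available:
            break
        target = logit(p)
        best = min(available, key=lambda cid: abs(available[cid] - target))
        if abs(available[best] - target) <= caliper:
            pairs.append((tid, best))
            del available[best]  # each control is used at most once
    return pairs


# Hypothetical propensity scores for illustration only.
surgery = {"s1": 0.62, "s2": 0.30, "s3": 0.90}
no_surgery = {"n1": 0.28, "n2": 0.60, "n3": 0.10, "n4": 0.33}
pairs = match_one_to_one(surgery, no_surgery)
```

In this toy example, patient s3 has no control within the caliper and is left unmatched, which mirrors how matching can shrink the surgical group.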
We hypothesized that patients who received primary tumor resection and whose cancer-specific survival (CSS) exceeded the median CSS of those who did not (8 months, as obtained after PSM adjustment) benefited from surgery. On this basis, patients in the surgical group were assigned to either the benefit or the non-benefit group.
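The labelling rule can be sketched as follows. The 8-month threshold comes from the text; the survival times are hypothetical. The sketch assumes that a patient with exactly 8 months of survival falls in the non-benefit group and does not handle censoring, neither of which is specified here.

```python
NON_SURGICAL_MEDIAN_CSS = 8  # months, median CSS of the matched non-surgical group


def label_benefit(css_months, threshold=NON_SURGICAL_MEDIAN_CSS):
    """Label a resected patient by comparing CSS with the threshold.

    Assumption: survival equal to the threshold counts as non-benefit.
    """
    return "benefit" if css_months > threshold else "non-benefit"


# Hypothetical CSS times (months) for six resected patients.
surgical_css = [3, 8, 9, 15, 22, 6]
labels = [label_benefit(m) for m in surgical_css]
benefit_share = labels.count("benefit") / len(labels)
```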

Establishment and evaluation of the predictive model
Given that the variables included in this study were recoded as categorical variables, differences in categorical variables were assessed using chi-square tests or Fisher's exact tests. To minimize potential confounding, we used the variables that demonstrated statistical significance (p < 0.05) in both univariate and multivariate logistic regression as candidate predictors for the ML models. Patients undergoing primary tumor resection were randomly divided into a training set and an internal test set in a 7:3 ratio. The training set was used to train six ML algorithms to predict the postoperative benefit of BR/LAPC: extreme gradient boosting classifier (XGBoost), complement naive Bayes (ComplementNB), random forest (RF), k-nearest neighbor algorithm (kNN), support vector machine (SVM), and logistic regression (LR). The test set was used to evaluate them. To ensure the robustness of our models, we used tenfold cross-validation as our resampling method and tuned the hyperparameters using grid search. Cross-validation gives a more reliable estimate of model performance by averaging metrics over multiple trials. We used the validation folds to fine-tune the model parameters, while the test set was used to evaluate the final performance. To evaluate the clinical value of our predictive models, we assessed three aspects of model quality, namely discriminative power, calibration, and clinical utility. First, we quantified the discriminative power of the models using receiver operating characteristic (ROC) curve analysis. Subsequently, we used calibration plots to assess how far the models' predicted probabilities deviated from the observed outcomes.
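Discriminative power is summarized by the area under the ROC curve. As a minimal sketch, the AUC can be computed as the probability that a randomly chosen benefit patient receives a higher predicted score than a randomly chosen non-benefit patient, with ties counted as one half. The labels and scores below are hypothetical, and this pairwise formulation is only one of several equivalent ways to obtain the AUC.

```python
def roc_auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    y_true: 1 for the benefit group, 0 for the non-benefit group.
    y_score: predicted probability of benefit.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities.
y = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = roc_auc(y, scores)  # 8 of 9 pairs are ranked correctly
```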
The clinical effectiveness of ML algorithms was evaluated using decision curve analysis (DCA) to compute the net benefit across various threshold probabilities. Moreover, we assessed the six models for confusion matrix metrics, including average precision (AP), accuracy, sensitivity, specificity, and F1 score.
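The net benefit used in DCA can be written out in a few lines. The sketch below uses the standard definition, net benefit = TP/n - (FP/n) × pt/(1 - pt) at threshold probability pt, and compares the model with a treat-all strategy. The outcomes and predicted probabilities are hypothetical.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted probability >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit when every patient is treated regardless of the model."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical outcomes (1 = benefit) and predicted probabilities.
y = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]
probs = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.75, 0.1]
nb_model = net_benefit(y, probs, threshold=0.5)
nb_all = net_benefit_treat_all(y, threshold=0.5)
```

A decision curve is obtained by repeating this calculation over a grid of thresholds; the treat-none strategy has a net benefit of zero at every threshold.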
Additionally, we used data from 53 patients at Ningbo University Affiliated Li Huili Hospital in China as an external validation cohort to further evaluate the applicability of our ML model.

Model interpretation
SHAP (SHapley Additive exPlanations) was utilized to provide an explanation for the best-performing ML model. SHAP is an ML explanation method that brings the concept of the Shapley value, which measures the contribution of each member to a coalition, into ML. It calculates the contribution of each feature to the model output for a given sample; these contributions sum to the difference between that sample's prediction and the average prediction, providing a global explanation of the model along with the benefit and risk factors for each prediction. This makes the model's final prediction more understandable and interpretable, enabling us to evaluate the model's reliability and stability and increase our confidence in its output. A comprehensive flowchart is illustrated in Fig. 1.
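The Shapley value underlying SHAP can be illustrated by exact enumeration over feature coalitions. This is a didactic sketch, not the algorithm used in this study: practical SHAP implementations rely on much faster approximations or model-specific methods, because the enumeration below grows exponentially with the number of features. The value function and its numbers are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values for a coalition value function.

    value_fn: maps a frozenset of feature names to a model output.
    features: list of all feature names.
    """
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            for subset in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                s = frozenset(subset)
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi


def toy_value(coalition):
    """Hypothetical model output given which features are 'known'."""
    out = 0.0
    if "age" in coalition:
        out += 0.2
    if "chemotherapy" in coalition:
        out += 0.1
    if "chemotherapy" in coalition and "radiation" in coalition:
        out += 0.1  # interaction term shared between the two features
    return out


names = ["age", "chemotherapy", "radiation"]
phi = shapley_values(toy_value, names)
```

The three values sum to the difference between the full-coalition output and the empty-coalition output (0.4 here), which is the additivity property that lets SHAP decompose an individual prediction.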

Selection of study cohort and propensity score matching
A total of 8011 patients with BR/LAPC were identified from the SEER database between 2004 and 2015. After applying the screening criteria, data from 6425 patients were extracted. The eligible patients were divided into surgical and non-surgical groups based on whether or not the primary tumor site was resected; 696 (10.83%) of these patients underwent surgical treatment. Before PSM, significant differences in variables such as age, marital status, tumor site, tumor differentiation grade, tumor size, N stage, mode of spread, surrounding organ invasion, radiotherapy, and chemotherapy were observed between the surgical and non-surgical groups (Table 1, p < 0.05), indicating an imbalance in baseline characteristics that may impact subsequent findings. After matching patients in the surgical group 1:1 with those in the non-surgical group, there were no significant differences in covariates between the groups (Table 2, p > 0.05). After PSM, 451 patients remained in each group, and the average propensity score was comparable between the two groups (Fig. 2).

Survival analysis and key variables after PSM
Kaplan-Meier (KM) survival analysis and log-rank test were performed on the post-PSM population. The prognostic impact of primary tumor resection was investigated by comparing the median CSS between the PS-matched groups, with a sample size of 451 in each group. Patients who underwent surgery had a significantly higher median CSS (15 months, 95% CI 13-17) than those in the non-surgical group (8 months, 95% CI 7-9). We assumed that resection of the primary tumor confers a benefit on patients with BR/LAPC, provided their survival time exceeds the median CSS (8 months) of those who did not receive this surgical intervention. Among the cohort of patients who underwent resection, 294 (65.19%) survived beyond the median CSS. According to our hypothesis, patients who underwent resection and survived longer than 8 months were categorized as the beneficial group, while those who survived less than 8 months were classified as the non-beneficial group (Table 3). As shown in Table 4, univariate and multivariate logistic regression analyses were conducted to clarify the potential prognostic factors related to surgical benefits. Based on our analysis, the following
Fig. 1 Steps involved in developing the models and a flowchart outlining the study procedure. a This figure outlines the process of obtaining data from the SEER databases and utilizing PSM to minimize inter-group disparities across 13 study variables. Additionally, an external validation cohort of 53 Chinese patients with BR/LAPC was included for further confirmation. b This figure outlines the development and testing, including internal and external validation, of six machine learning algorithms. c An explanation of the optimal model and the ranking of the importance of feature variables
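The median CSS used to define the benefit threshold is read off the Kaplan-Meier curve. A minimal sketch of the product-limit estimate follows, taking the median as the earliest time at which estimated survival falls to 0.5 or below. The follow-up times and event indicators are hypothetical, and confidence intervals and the log-rank test are omitted.

```python
def km_median(times, events):
    """Kaplan-Meier median survival time.

    times: follow-up in months.
    events: 1 = cancer-specific death observed, 0 = censored.
    Returns the first time with estimated survival <= 0.5, or None
    if the curve never reaches 0.5.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            if survival <= 0.5:
                return t
        at_risk -= removed  # deaths and censored patients leave the risk set
    return None


# Hypothetical cohort of eight patients.
months = [3, 5, 8, 8, 12, 15, 20, 24]
died = [1, 1, 1, 0, 1, 1, 0, 1]
median_css = km_median(months, died)
```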

Model performance
After identifying the eight variables mentioned above, six ML algorithms (i.e., XGBoost, ComplementNB, SVM, kNN, RF and LR) were used to predict the postoperative benefit of BR/LAPC. In the tenfold cross-validation, the patients in the training set (n = 316) were randomly divided into training and validation subsets in a 9:1 ratio, and validation was repeated ten times. Moreover, calibration plots and DCA were conducted on validation sets to evaluate the performance of these prediction models (Fig. 3b, c). These two types of plots showed a strong correspondence between the predicted and observed probabilities for the six ML models, and all models achieved a net clinical benefit compared with an all-or-nothing treatment plan. Overall, we selected XGBoost as the final predictive model. We then validated the performance of the established XGBoost model on the test dataset (n = 135). The effectiveness of XGBoost in the test set was evaluated based on the following specific indicators: AUC = 0.828, 95% CI 0.756-0.900 (Fig. 4a), accuracy = 0.743, sensitivity = 0.679, specificity = 0.888, PPV = 0.648, NPV = 0.846 and F1 score = 0.663. The calibration curve was generated by plotting the predicted probabilities against the observed proportion of events in the test dataset (Fig. 4b).
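The tenfold partition of the 316 training patients can be sketched as follows. The helper name and the random seed are illustrative assumptions; the study does not report which implementation produced its folds. Each patient index appears in exactly one validation fold, so each round trains on roughly nine tenths of the data.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (training, validation) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seed is an arbitrary choice
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


splits = list(k_fold_indices(316))
sizes = [len(validation) for _, validation in splits]
```

With 316 patients, six folds hold 32 patients and four hold 31, so the 9:1 ratio is approximate rather than exact.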
The DCA curves showed that the model had a maximum net benefit of 0.375 and that the useful threshold probability range was 0.08-0.75, highlighting the potential value of using the model to inform clinical decision-making in this range (Fig. 4c). Considering that the AUC in the test set is comparable to that in the validation set, the model does not appear to be overfitted, and the XGBoost model can be used for the classification task on this dataset.

Model interpretation and feature importance
SHAP was used to explain the XGBoost model (Fig. 5a). The SHAP plot illustrated the effect of each feature on the ML algorithm by displaying, for each sample, the impact of that feature on the predicted output value. The hue of the SHAP plot indicated whether each feature had a positive or negative impact on the model prediction results, while the horizontal position of the graph indicated the value of the feature. The SHAP value then showed the magnitude of each feature's impact on the model output. In the SHAP plot, points with higher feature values tended to be associated with a greater contribution to the model output. In addition, the feature importance ranking based on permutation importance for XGBoost can be seen in Fig. 5b.
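Permutation importance, used for the ranking in Fig. 5b, measures how much a performance metric drops when the values of one feature are shuffled across patients. The sketch below uses accuracy as the metric and a toy rule-based model; the feature names, data, number of repeats, and seed are hypothetical and do not reproduce the study's XGBoost model.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model prediction equals the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(model, rows, labels, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling each feature column."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = {}
    for feature in rows[0]:
        drops = []
        for _ in range(n_repeats):
            shuffled = [r[feature] for r in rows]
            rng.shuffle(shuffled)
            # Replace only the shuffled feature; other features stay intact.
            permuted = [dict(r, **{feature: v}) for r, v in zip(rows, shuffled)]
            drops.append(baseline - accuracy(model, permuted, labels))
        importances[feature] = sum(drops) / n_repeats
    return importances


# Hypothetical binary features; the toy model looks at age only.
pairs = [(1, 1), (1, 0), (0, 1), (0, 0), (1, 1), (0, 0), (1, 0), (0, 1)]
rows = [{"age_under_65": a, "married": m} for a, m in pairs]
labels = [r["age_under_65"] for r in rows]


def toy_model(row):
    return row["age_under_65"]


importances = permutation_importance(toy_model, rows, labels)
```

A feature the model ignores gets an importance of zero, while shuffling a feature the model relies on lowers accuracy.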

Prospective validation
To investigate the generalizability and reliability of XGBoost across different datasets, prospectively collected data from 53 Chinese patients who underwent surgery with an initial diagnosis of BR/LAPC were used for external validation. We grouped the cohort using the XGBoost algorithm into the predicted surgical benefit group and the predicted surgical non-benefit group and compared their survival using Kaplan-Meier survival analysis. As shown in Fig. 6, the median survival times for patients in the predicted surgical benefit group (n = 34) and predicted surgical non-benefit group (n = 19) were 22 (95% CI 17-39) and 6 (95% CI 3-8) months, respectively (p < 0.05). In addition, the XGBoost model proposed in this study achieved 84.31% accuracy on the external validation dataset with a sensitivity and specificity of 0.762 and 0.900, respectively. These results suggest that the established model may be applicable to other cohorts.
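The accuracy, sensitivity, and specificity reported for the validation cohorts all derive from the confusion matrix. A minimal sketch with hypothetical labels follows; it assumes that both classes occur among the true labels and the predictions, so that no denominator is zero.

```python
def classification_metrics(y_true, y_pred):
    """Confusion-matrix metrics for binary labels (1 = benefit)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Hypothetical observed and predicted labels for ten patients.
observed = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(observed, predicted)
```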

Discussion
The prognosis of BR/LAPC is exceedingly poor, characterized by a low likelihood of long-term survival. The standard approach for managing localized disease typically entails a multimodal treatment regimen consisting of chemotherapy, radiation therapy, and surgical resection (Tempero et al. 2021). Multidisciplinary treatment (MDT)-based combination therapy can improve the prognosis and survival outcomes of BR/LAPC patients, increase the 5-year survival rate and median survival, and increase the overall survival (OS) to 35-60 months for those who can undergo R0 resection after treatment (Reames et al. 2021). However, even with aggressive treatment, the five-year overall survival (OS) rate for BRPC/LAPC is still less than 15%, and not all patients benefit from surgery (Rawla et al. 2019). Currently, the clinical assessment and prediction of the comprehensive therapeutic efficacy for BR/LAPC primarily rely on imaging examinations. However, this method can only evaluate and predict efficacy based on the changes of tumor morphology, surrounding vascular relationships, and affected lymph nodes (Eisenhauer et al. 2009), without precision in assessing and predicting therapeutic effects. Related studies have shown that this method overestimates tumor unresectability (White et al. 2001). Despite radiology predicting persistent unresectability, Ferrone et al. demonstrated that 92% of the studied patients were able to successfully have their tumors removed while avoiding tumoral involvement at the resection margin and have a better prognosis (Ferrone et al. 2015). Therefore, further research and development of more accurate evaluation and prediction methods is needed to guide clinical treatment selection and optimize treatment outcomes. In the present study, we analyzed the baseline characteristics of a total of 6425 BR/LAPC patients from the SEER databases. 
To assess whether primary tumor resection prolongs survival, we utilized PSM to balance the variables between the surgical and non-surgical groups and reduce selection bias. The PSM process involved building a predictive model that calculated the probability of surgery for each patient based on a range of covariates, such as age, gender, and disease severity, and then pairing patients in the surgical and non-surgical groups based on this propensity score. This method effectively minimized the differences in the distribution of potential confounding variables between the two groups, allowing for a more accurate comparison of survival outcomes. According to the median CSS of the non-surgical group (8 months), the surgical group was divided into two subgroups: a surgical benefit group and a surgical non-benefit group. Subsequently, we trained and validated six ML models to predict postoperative benefit for BR/LAPC patients, revealing four important findings. First, all six ML models achieved high AUC values of > 0.75. Second, after comparing the performance of the six ML-based models, XGBoost exhibited the best predictive performance. Third, based on the XGBoost model, the relative importance of the variables was ranked with age as the most important, followed by chemotherapy and radiotherapy. Fourth, internal and external validation allowed us to test the generalizability of the XGBoost algorithm, and the results indicated that the model had the potential to facilitate personalized treatment planning for BR/LAPC patients.
In addition to age, the implementation of chemotherapy and radiation regimens was found to be the most crucial variable in this study, which highlighted the critical role of radiochemotherapy in a comprehensive multidimensional treatment model for BR/LAPC. With advancements in treatment concepts and therapeutic technologies, neoadjuvant radiochemotherapy has gained significant importance in the comprehensive management of PC and its clinical value has been explicitly recommended in authoritative guidelines (Cloyd et al. 2019;Cohen et al. 2005;Moertel et al. 1981). In neoadjuvant therapy for BR/LAPC, there are several reasons for its rationale (Versteijne et al. 2016). Firstly, as it targets tissue that has not been dissected and is well-oxygenated, neoadjuvant radiochemotherapy can maximize the potential benefits of both radiation and chemotherapy, compared to adjuvant therapy. Secondly, it can shrink tumor volume, improve the feasibility of surgical resection, and decrease involvement of regional lymph nodes, thereby mitigating the risk of local recurrence. Thirdly, by extirpating adjacent structures that are infiltrated by the tumor, neoadjuvant radiochemotherapy can reduce staging and augment the proportion of R0 reconstruction, and extensive lymph node dissection, which significantly augment the complexity of the surgery. These interventions are exclusively achievable in large pancreatic centers and require collaboration among multidisciplinary teams (MDTs) (Christians et al. 2014). However, according to a consensus statement from ISGPS (International Study Group for Pancreatic Surgery) (Hartwig et al. 2014), extended resections are associated with higher rates of perioperative complications and no significant difference in overall survival compared to standard resections. 
This could be due to the fact that pancreatectomy with arterial resection for PC is associated with a total operative mortality rate as high as 10-20%, which limits the potential benefits of tumor resection itself. This indicates that "the scalpel ≠ omnipotence", as surgery alone cannot address the highly invasive tumor biology of PC (Katz and Varadhachary 2019). Therefore, the implementation of a comprehensive multidimensional treatment approach is crucial in the management of PC, where neoadjuvant radiochemotherapy plays a significant role.
As far as we know, there is currently no reliable ML model that can accurately predict the postoperative prognosis of BR/LAPC. This is because the prognosis of BR/LAPC depends on multiple factors, including tumor stage, biological characteristics, treatment plan, and the patient's overall health status. Although some recent studies have found certain molecules that are related to the prognosis of PC (Friess et al. 1998; Jin et al. 2021; Yu et al. 2010), prognostic indicators at the molecular level are not as convenient and practical as easily obtainable clinical pathological indicators. In this study, we established a high-performance ML model through eight readily available clinical pathological indicators, and confirmed the generalization ability of the model through internal and external validation. ML in this study involved heterogeneous data with regional differences, racial differences, treatment method differences related to years, and different combinations of treatment methods. This is an innovative attempt to help us establish a relatively stable model from complex data.
Fig. 6 Kaplan-Meier analysis for overall survival of Chinese BR/LAPC according to the predicted results
However, this study has several limitations to be considered. Firstly, the study outcomes may be affected by potential bias as a result of utilizing retrospective data. Secondly, specific biochemical parameters such as CA199 were not included in the SEER databases, which warrants further investigation. Additionally, the study did not provide information on the sequence of chemotherapy and surgery. Moreover, due to the potential differences in clinical practices across different countries and regions, the external validation cohort only included Chinese patients. Therefore, the generalizability of the ML model to other countries and populations remains unclear and requires further exploration.

Conclusions
In this study, we built models with six different ML algorithms and compared them using metrics such as AUC, calibration plots, and DCA to select the best-performing model. Subsequently, we tested the generalizability of the selected model through internal and external validation. The model developed in this study enables individualized prediction of postoperative benefit in BR/LAPC, and we believe it can serve as an important tool for identifying the population that may benefit from radical surgery. In conclusion, the severity of the lesion and the patient's physical condition should be fully considered when treating BR/LAPC patients, and personalized treatment plans should be formulated accordingly.