Developing prediction models for short-term mortality after surgery for colorectal cancer using a Danish national quality assurance database

The majority of colorectal cancer surgeries are performed electively, and treatment is often decided at the multidisciplinary team conference. Although the average 30-day mortality rate is low, there is substantial population heterogeneity from young, healthy patients to frail, elderly patients. The individual risk of surgery can vary widely, and tailoring treatment for colorectal cancer may lead to better outcomes. This requires prediction of risk that is accurate and available prior to surgery. Data from the Danish Colorectal Cancer Group database was transformed into the Observational Medical Outcomes Partnership Common Data Model. Models were developed to predict the risk of mortality within 30, 90, and 180 days after colorectal cancer surgery using only covariates decided at the multidisciplinary team conference. Several machine-learning models were trained, but due to superior performance, a Least Absolute Shrinkage and Selection Operator logistic regression was used for the final model. Performance was assessed with discrimination (area under the receiver operating characteristic and precision recall curve) and calibration measures (calibration in large, intercept, slope, and Brier score). The cohort contained 65,612 patients operated for colorectal cancer in the period from 2001 to 2019 in Denmark. The Least Absolute Shrinkage and Selection Operator model showed an area under the receiver operating characteristic for 30-, 90-, and 180-day mortality after colorectal cancer surgery of 0.871 (95% CI: 0.86–0.882), 0.874 (95% CI: 0.864–0.882), and 0.876 (95% CI: 0.867–0.883) and calibration in large of 1.01, 0.98, and 1.01, respectively. The postoperative short-term mortality prediction model showed excellent discrimination and calibration using only preoperatively known predictors.


Introduction
Colorectal cancer (CRC) is the third most common malignant neoplastic disease in the world, with an incidence of 1.8 million patients and 935,000 deaths per year [1]. The only definitive cure is surgery; however, exposure to surgery is also related to risk of adverse events related to morbidity, dependency, and ultimately mortality.
The balance between the short- and long-term beneficial effects of surgery and its potential harms is at the very core of any decision when scheduling patients for surgery. Correctly identifying patients who face a higher risk of surgery-related complications can lead to a more individually optimized treatment plan, facilitate shared decision-making, and potentially decrease perioperative morbidity and mortality through interventions before surgery [2]. The potential benefit of prehabilitation and optimization will most likely be greatest in patients with limited physical resources, who generally have an increased risk of adverse outcomes after surgery [2,3]. At the same time, a good prediction model that can identify patients with a very low risk of mortality can lead to accelerated treatment strategies both before and after surgery for these patients.
Although there are many clinical prediction models on short-term mortality, very few are actually used in a clinical context, and even fewer include only preoperative information, which is a prerequisite for them to be used before surgery, e.g., in the multidisciplinary team (MDT) setting [4,5].
We aimed to develop a prognostic clinical prediction model for short-term postoperative mortality after colorectal cancer surgery, including only predictors known at the MDT conference, in order to address the unmet need for risk assessment prior to surgery. These predictors can be covariates describing previous diseases, scores such as the American Society of Anesthesiologists (ASA) score, or covariates that are decided at the MDT conference, such as surgery type. Our main goal was to assess the performance of this model using calibration and discrimination metrics.

Data sources
In Denmark, access to register data does not require ethical approval. Processing of health register data was filed under the research inventory of Region Zealand (record number: REG-047-2020). Quality assurance data from the Danish Colorectal Cancer Group's database (DCCG) were obtained from The Danish Clinical Quality Program. DCCG is a national quality assurance database for diagnosis and treatment of patients with CRC [6]. The registry contains data on all patients diagnosed with CRC who have had contact with a surgical department in Denmark since May 1st 2001, and it covers > 95% of patients [7]. The database contains detailed information on demography, cancer diagnosis, surgery, oncological treatment, and patient outcomes [6] for over 76,000 patients. Data from DCCG were transformed into the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), and Danish vocabularies were translated into standard vocabularies [8,9].

Study population
We included patients over the age of 18 from the DCCG database who underwent CRC surgery from May 1st 2001 to December 31st 2019.

Outcomes
We assessed all-cause mortality during the times at risk (TAR), 30, 90, and 180 days after surgery.
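As an illustration of how such outcome labels can be derived, the minimal sketch below flags death from any cause within each time at risk. The function name, the example dates, and the choice to count the day of surgery as day 0 with an inclusive window are assumptions made for illustration; they are not taken from the study's implementation.

```python
from datetime import date

# Times at risk (TAR) in days, as stated in the text.
TIMES_AT_RISK = (30, 90, 180)


def died_within(surgery_date, death_date, tar_days):
    """Return True if death from any cause occurred within tar_days of surgery.

    death_date is None for patients with no recorded death.
    Assumption: the day of surgery is day 0 and the window is inclusive.
    """
    if death_date is None:
        return False
    days_after_surgery = (death_date - surgery_date).days
    return 0 <= days_after_surgery <= tar_days


# Hypothetical patient: surgery on 1 March 2015, death 45 days later.
surgery = date(2015, 3, 1)
death = date(2015, 4, 15)
labels = {tar: died_within(surgery, death, tar) for tar in TIMES_AT_RISK}
print(labels)
```

In this hypothetical case the patient counts as an event for the 90- and 180-day outcomes but not for the 30-day outcome.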

Included covariates
For development of the model, all concepts in DCCG [6,10] were mapped to OMOP-CDM standard concepts. Covariates from DCCG include demographic, diagnostic, pathological, and perioperative data. Categorical covariates were converted using one-hot encoding, so that each category was represented as a separate binary covariate. All covariates that contributed to the models are reported in Supplementary Table 1, including a description of the original source covariate.
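To make the encoding step concrete, the following minimal sketch expands one categorical covariate into binary indicator covariates. The helper `one_hot`, the record layout, and the column name `asa_score` are hypothetical and serve only as an illustration; the study itself relied on the OHDSI tooling for this step.

```python
def one_hot(records, column):
    """Expand one categorical column into binary indicator columns."""
    # One indicator per observed category, in a stable (sorted) order.
    levels = sorted({record[column] for record in records})
    encoded = []
    for record in records:
        # Keep every other field unchanged and drop the categorical column.
        row = {key: value for key, value in record.items() if key != column}
        for level in levels:
            row[f"{column}={level}"] = 1 if record[column] == level else 0
        encoded.append(row)
    return encoded


# Hypothetical patients with an ASA score recorded as a category.
patients = [
    {"patient_id": 1, "asa_score": "I"},
    {"patient_id": 2, "asa_score": "III"},
    {"patient_id": 3, "asa_score": "II"},
]
encoded = one_hot(patients, "asa_score")
for row in encoded:
    print(row)
```

Each patient ends up with exactly one indicator set to 1 among the columns derived from the same source covariate.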

Statistical analysis
The open-source tool ATLAS, provided by the Observational Health Data Science and Informatics (OHDSI) community [9], was used for model development. Data were randomly split into a training set used for model development, containing 75% of patients, and a test set used for internal validation, containing 25% of patients. Models were trained on the training set, using threefold cross-validation to optimize hyperparameter settings [9]. The ATLAS version used was 2.9.0, and models were trained using R v. 4.0.0 with the "PatientLevelPrediction" package v. 4.3.7 and Anaconda3 v. 4.4.0 with Python v. 3.6.10. We trained Least Absolute Shrinkage and Selection Operator (LASSO) logistic regression, decision tree, random forest, gradient boosting machine, K-nearest neighbor, multilayer perceptron neural network, and AdaBoost models. We assessed the discrimination of the best-performing model using the area under the receiver operating characteristic curve (AUROC) [11] and the area under the precision-recall curve (AUPRC), and its calibration using calibration-in-large, calibration slope, calibration intercept, and Brier score [12].
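As a minimal sketch of how some of these performance measures can be computed from predicted risks and observed outcomes, the plain-Python functions below implement AUROC, Brier score, and a calibration-in-large ratio. They are illustrative and are not the PatientLevelPrediction implementation. In particular, defining calibration-in-large as observed incidence divided by mean predicted risk is an assumption, and the toy data are hypothetical.

```python
def auroc(y_true, y_prob):
    """Probability that a random event is ranked above a random non-event.

    Ties in predicted risk count as half a win.
    """
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = 0.0
    for p_event in events:
        for p_non_event in non_events:
            if p_event > p_non_event:
                wins += 1.0
            elif p_event == p_non_event:
                wins += 0.5
    return wins / (len(events) * len(non_events))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def calibration_in_large(y_true, y_prob):
    """Observed incidence divided by mean predicted risk (assumed definition)."""
    observed = sum(y_true) / len(y_true)
    expected = sum(y_prob) / len(y_prob)
    return observed / expected


# Hypothetical outcomes (1 = died within the time at risk) and predicted risks.
y_true = [0, 0, 1, 0, 1, 0, 0, 1]
y_prob = [0.05, 0.10, 0.80, 0.20, 0.60, 0.30, 0.15, 0.25]

print(round(auroc(y_true, y_prob), 3))
print(round(brier_score(y_true, y_prob), 3))
print(round(calibration_in_large(y_true, y_prob), 3))
```

Under the assumed definition, a calibration-in-large value above 1 means that the model underpredicts risk on average, and a value below 1 means that it overpredicts.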
We trained and tested both a simple model containing only sex and age as predictors and a more complex model containing all covariates available in DCCG that are known preoperatively or decided at the MDT conference, to assess whether adding more clinical granularity improved performance. The models cannot differentiate between a missing and a negative value of a covariate, which means that there is a risk of misclassification of patients with missing data. However, age and sex are mandatory fields in OMOP-CDM and will therefore never be missing.
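The consequence of not distinguishing missing from negative values can be sketched as follows. The covariate names and the record layout are hypothetical; the sketch only assumes that a binary covariate is set to 1 when it is recorded as present and to 0 otherwise.

```python
# Hypothetical binary covariates, in a fixed order.
COVARIATES = ["age_over_80", "asa_score_3", "diabetes"]


def to_feature_vector(recorded_positive):
    """Build a 0/1 vector from the set of covariates recorded as present."""
    return [1 if name in recorded_positive else 0 for name in COVARIATES]


# Patient A: diabetes was explicitly recorded as absent.
patient_a = {"positive": {"asa_score_3"}, "recorded_negative": {"diabetes"}}
# Patient B: diabetes status was never recorded (missing).
patient_b = {"positive": {"asa_score_3"}, "recorded_negative": set()}

vector_a = to_feature_vector(patient_a["positive"])
vector_b = to_feature_vector(patient_b["positive"])

# Both patients receive the same input, so the model cannot tell them apart.
print(vector_a, vector_b, vector_a == vector_b)
```

If patient B in fact had diabetes, the model would treat that patient as not having it, which is the misclassification risk described above.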
The study adheres to the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guidelines [13].

Participants
A total of 65,612 patients underwent surgery for colorectal cancer from 2001 to 2019, of whom 53.4% were female; the median age was 71 years. Most of the patients (86.9%) underwent elective surgery. The incidence of 30-day mortality was 5.42%, 90-day mortality was 8.53%, and 180-day mortality was 11.42%. Patient characteristics can be viewed in Table 1.

Model development
Out of the models tested, the LASSO logistic regression performed best in the three times at risk with excellent discrimination (Figs. 1 and 2a-c) and good calibration (Fig. 3a-c).

Model performance
Discrimination of the LASSO regression model is presented as AUROC and AUPRC in Table 4 and Fig. 2a-c, and calibration is shown in Table 4 and Fig. 3a-c. The top 15 positively and negatively weighted covariates from the 30-day mortality model are listed in Tables 2 and 3. For all three LASSO mortality models, we found excellent discrimination, with AUROCs of 0.871, 0.874, and 0.876 for 30-, 90-, and 180-day mortality, respectively. The AUPRCs of 0.35, 0.44, and 0.54, respectively, should be considered good: an AUPRC should be larger than the incidence of the outcome, and here the AUPRC is seven-, five-, and fivefold higher (Table 1). Calibration was also excellent, with calibration-in-large from 0.98 to 1.02, calibration slope very close to 1, and calibration intercept and Brier score near 0 for all times at risk (Table 4), which was supported by the weak calibration plots in Fig. 3a-c. When comparing the complex models with the models using only age and sex as covariates, we saw markedly better performance in the complex models, which was excellent in terms of discrimination (AUROC > 0.8) and good in terms of calibration. The simple models showed only moderate to fair discrimination (AUROC > 0.6 and > 0.7), and their AUPRC, although still higher than the event incidence, was markedly lower. Calibration measures differed less between the simple and complex models than discrimination measures did, as seen in Table 4.
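The comparison of AUPRC with outcome incidence can be reproduced as a short worked calculation. A classifier that ranks patients at random is expected to reach an AUPRC equal to the outcome incidence, so the ratio of the two indicates how far the model lies above that baseline. The sketch below uses the whole-cohort incidences reported under Participants; this is an assumption, since the incidences in the test set may differ slightly, so the ratios are approximate.

```python
# Whole-cohort incidence per time at risk (days), from the Participants section.
incidence = {30: 0.0542, 90: 0.0853, 180: 0.1142}
# AUPRC of the LASSO models per time at risk, from the text.
auprc = {30: 0.35, 90: 0.44, 180: 0.54}

# Ratio of AUPRC to the baseline expected from random ranking.
ratio = {tar: auprc[tar] / incidence[tar] for tar in incidence}

for tar, value in ratio.items():
    print(f"{tar}-day mortality: AUPRC is {value:.1f} times the incidence")
```

With these inputs the ratios are roughly 6.5, 5.2, and 4.7, in line with the approximately seven-, five-, and fivefold differences reported.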

Discussion and conclusion
We trained prediction models for short-term mortality after colorectal cancer surgery, based solely on covariates available at the MDT conference, that showed excellent discrimination and good calibration. Discrimination in terms of AUROC ranged from 0.871 to 0.876 and AUPRC from 0.35 to 0.54, and calibration is shown in Fig. 3a-c. Compared to models based on only age and sex as predictors, the data-driven prediction models showed vastly better performance. Based on the calibration plots, the model slightly underpredicts risk for patients with more than 50% risk of mortality. All predictors used in the prediction model could be available at a preoperative MDT conference. The risk factors for short-term postoperative mortality are aligned with the current literature, namely, that high age, high ASA score, exploratory procedures, and poor tumor differentiation were risk factors for mortality [14]. We found that predictors such as young age, low World Health Organization Performance Status (WHO PS), low ASA score, and slightly overweight body mass index (BMI) were associated with a lower risk of death during the time at risk.
We found the incidence of postoperative mortality to be 5.42%, 8.53%, and 11.42% for the 30-day, 90-day, and 180-day times at risk, respectively. Although this may be considered high in today's context, it is important to emphasize that the model is based on data from 2001 to late 2019. During this time in Denmark, 30-day mortality for elective colorectal cancer surgery decreased from 7.3% (2001) [14] to 1.4% (2018) [15]. Similar studies in France, England, and the Netherlands have shown a 30-day mortality of 5-5.8% in 2006-2008 [16-18] and a subsequent decrease in 30-day mortality to 1.2% and 90-day mortality to 4.6% in 2017 [18], which is similar to the development in Denmark during the same time. This is likely due to the change in surgical approach from primarily open to primarily laparoscopic surgery [19] and the implementation of enhanced recovery after surgery (ERAS) regimens [20]. The model includes both elective and emergency surgery, and it is well known that the mortality rate for emergency surgery is considerably higher, with an incidence of 15.8% in 2018 [15].
Designing prediction models targeted for clinical use is not a new phenomenon. The most well-known surgical risk assessment tool is the American College of Surgeons National Surgical Quality Improvement Program (ACS NSQIP) surgical risk calculator. Discriminative accuracy for 30-day mortality showed an AUROC of 0.944 and a Brier score of 0.011 [21]; however, in validation for colorectal cancer patients, performance was somewhat lower, with an AUROC of 0.86 and a Brier score of 0.018. Comparably, Fazio et al. designed a 30-day mortality model after colorectal surgery with an AUROC of 0.801 [22], and van der Sluis et al. created the Identification of Risk in Colorectal Surgery score with an AUROC [...] [24]. Generally, most previous studies do not include as many performance metrics, and a majority show no calibration measures such as calibration-in-large, calibration intercept, and calibration slope. The Brier score has previously been criticized for not being an optimal measure of performance and calibration in clinical models [25], and other parameters such as calibration-in-large are considered essential for external validation [26]. Therefore, we decided to report all performance measures in order to provide full transparency and optimal interpretation of model performance. All of the four studies above defined the covariates of the model [...]
We view our model as a tool to estimate mortality risk and tailor different patient treatment trajectories. This is because the current treatment guidelines for colorectal cancer lead some patients to overtreatment and some to undertreatment, both with unnecessarily high risk for the patient. The model should be viewed as a decision-support tool rather than a decision-making tool, where the individual patient risks should be put into context by experienced clinicians and fuel multidisciplinary treatment approaches.
Knowledge about individual risks of mortality shortly after surgery can support MDT conferences in making individualized treatment plans that take individual risk factors into consideration. This personalization of treatment to individual risk profiles may limit both over- and undertreatment and the consequences thereof.
A significant limitation of this study is the lack of external validation, which is essential for proving model generalizability and has been shown to improve clinicians' trust in the model and its predictions [27]. Also, due to the complexity of the treatment of colorectal cancer and the multitude of different variables in DCCG, some variables may be proxies for outcomes or actions in the patient course, which can lead to multicollinearity [28]. However, this is partly addressed by using LASSO logistic regression, whose penalty shrinks coefficients toward zero and thereby tends to downweight variables that carry largely redundant information [29].
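How an L1 penalty treats redundant predictors can be illustrated with a minimal sketch. For brevity it uses squared-error loss and coordinate descent rather than the penalized logistic regression used in this study, so it is a simplified stand-in and not the study's method. The function names, the penalty value, and the toy data, in which the second predictor is an exact copy of the first, are assumptions.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; values within the penalty become 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(columns, y, penalty, n_iter=50):
    """Minimize (1/2n) * ||y - X b||^2 + penalty * ||b||_1 by coordinate descent."""
    n = len(y)
    beta = [0.0] * len(columns)
    for _ in range(n_iter):
        for j, column in enumerate(columns):
            # Partial residual: outcome minus the fit of all predictors except j.
            partial = [
                y[i] - sum(beta[k] * columns[k][i]
                           for k in range(len(columns)) if k != j)
                for i in range(n)
            ]
            rho = sum(column[i] * partial[i] for i in range(n)) / n
            scale = sum(value * value for value in column) / n
            beta[j] = soft_threshold(rho, penalty) / scale
    return beta


# Toy data: x2 duplicates x1, and the outcome depends on x1 with slope 2.
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = list(x1)
y = [2.0 * value for value in x1]

beta = lasso_coordinate_descent([x1, x2], y, penalty=0.5)
print(beta)
```

In this toy example the penalty shrinks the first coefficient below its unpenalized value of 2 and sets the coefficient of the duplicated predictor to zero. With highly but not perfectly correlated predictors, which variable is retained can depend on the data and on the order of the updates.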
The strengths of this study are that the prediction model was developed on a large national database including more than 95% of all patients with the condition and that the model includes only data known prior to surgery, which makes it usable as clinical decision support in the preoperative setting. The use of OMOP-CDM eases future external validation and enrichment with data from other databases.
In conclusion, we found that designing a short-term postoperative mortality model for outcomes after colorectal cancer surgery using a data-driven approach and only covariates known prior to surgery is feasible and yields models with excellent discrimination and good calibration.
Acknowledgements
The authors thank The Danish Clinical Quality Program and Danish Colorectal Cancer Group for access to their outstanding data. We also sincerely thank the European Health Data Evidence Network (EHDEN), edenceHealth, and Computerome for contributions and support during the project. Lastly, we thank Peter Rijnbeek and Iannis Drakos for sparring and advice in the process.

Conflict of interest
The authors declare no competing interests.