Predicting postoperative delirium after hip arthroplasty for elderly patients using machine learning

Postoperative delirium (POD) is a common and severe complication in elderly hip-arthroplasty patients. This study aims to develop and validate a machine learning (ML) model that identifies essential features related to POD and predicts POD in elderly hip-arthroplasty patients. Electronic record data of elderly patients who received hip-arthroplasty surgery between January 2017 and April 2021 were collected as the dataset. The Confusion Assessment Method (CAM) was administered to the patients during their perioperative period. A feature selection method was employed as a filter to determine the leading features. Classical machine learning algorithms were trained with cross-validation, and the best-performing model was used to predict POD. The area under the curve (AUC), accuracy (ACC), sensitivity, specificity, and F1-score were calculated to evaluate predictive performance. A total of 476 elderly arthroplasty patients who received general anesthesia were included in this study. The final model, which combined mutual information (MI) feature selection with a linear binary classifier based on logistic regression (LR), achieved an encouraging performance (AUC = 0.94, ACC = 0.88, sensitivity = 0.85, specificity = 0.90, F1-score = 0.87) on a balanced test dataset. The model could predict POD with satisfactory accuracy and revealed important features associated with POD, such as age, Cystatin C, GFR, CHE, CRP, LDH, monocyte count, history of mental illness or psychotropic drug use, and intraoperative blood loss. Proper preoperative interventions for these factors could reduce the incidence of POD among elderly patients.


Daiyu Chen and Weijia Wang have contributed equally to this work and share first authorship.

Introduction
Hip arthroplasty is a commonly performed surgical intervention for treating serious degenerative arthritis or fractures in elderly patients [1,2]. As the population ages, the demand for hip arthroplasty continues to increase [3]. As one of the clinical challenges, postoperative delirium (POD) is a common and severe complication after hip arthroplasty, with a high incidence of 17.6% [3]. POD is a neurological complication characterized by an acute and fluctuating disturbance in attention and awareness [4]. Previous studies indicated that POD is related to poorer outcomes, such as prolonged length of stay, severe infection, poor functional recovery, cognitive decline, future institutionalization, and even higher mortality [4][5][6][7][8], and that it is not conducive to early functional exercise and rehabilitation [9].
Although the pathological changes underlying POD remain unclear, some studies asserted that the incidence of POD could be significantly decreased if clinicians identify high-risk elderly patients and intervene before surgery [10,11], and they claimed that addressing modifiable risk factors is one approach to reducing the incidence of POD [10]. Existing studies therefore investigated the risk factors of POD and found that gender, age, malnutrition, comorbidity and drug factors, operation duration, and others were related to POD [12][13][14][15][16][17][18][19]. Moreover, some laboratory test results, such as C-reactive protein (CRP), lactate dehydrogenase (LDH), cholinesterase (CHE), and Cystatin C, were risk factors for POD [20][21][22][23][24][25]. Based on these risk factors, various predictive models were explored to predict POD [21,22,26]. However, most existing studies used conventional statistical methods rather than machine learning (ML) methods. Recently, ML has attracted growing attention in medical research [27][28][29]. One study used ML methods to establish a model to predict POD. However, it focused only on patients undergoing microvascular decompression surgery and did not consider laboratory risk factors of POD [30]. Considering the high occurrence rate of POD in elderly patients after hip arthroplasty, there is a need to develop clinical predictive models using techniques such as ML.
Therefore, in this study, we employed ML models to predict POD in patients undergoing hip arthroplasty. We aim to assist clinicians in forecasting POD and implementing appropriate interventions to minimize its risk among elderly hip-arthroplasty patients.

Methods
This study was approved by the Ethics Committee of the First Affiliated Hospital of Chongqing Medical University (Ethics code: 2021-201) and was conducted according to the principles of the Declaration of Helsinki. Due to the retrospective nature of this study, the requirement for informed consent was waived by the same Ethics Committee. The overall workflow is illustrated in Fig. 1.

Patients
This study retrospectively reviewed elderly patients who received hip arthroplasty at the Department of Orthopedics of the First Affiliated Hospital of Chongqing Medical University between January 2017 and April 2021.
Patients who underwent hip arthroplasty were enrolled in our study if: (I) they were aged 60 years or older; (II) they were scheduled to undergo their first hip-arthroplasty surgery; (III) their surgery was elective and consisted of unilateral or bilateral hip arthroplasty; (IV) they underwent general anesthesia.
Patients were excluded from our study if: (I) they had incomplete medical records with more than one missing laboratory indicator; (II) they were diagnosed with preoperative delirium; (III) they received an unclear diagnosis of postoperative delirium.

Diagnosis of POD
Since January 2017, in collaboration with the Departments of Orthopedics, Nutrition, and Anesthesiology, we have established an Enhanced Recovery After Surgery (ERAS) team. Two experienced orthopedic nurses trained in the assessment of delirium joined the ERAS team to assess POD. They participated in the clinical evaluation of all joint arthroplasty patients and used the Confusion Assessment Method (CAM) to determine whether patients had delirium on the day before surgery and on the first, second, and third postoperative days.

Data collection
We collected preoperative and intraoperative clinical data and laboratory test results of arthroplasty patients. All records were carefully reviewed by experienced clinicians to avoid missingness. Following common preprocessing practice, the categorical features were encoded in 0/1 binary form, and the continuous features were standardized. The preoperative clinical data included gender, age, smoking (continuously or cumulatively for 6 months or more), drinking (continuously or cumulatively for 6 months or more), diabetes, hypertension, coronary heart disease (CHD), history of mental illness or psychotropic drug use [including all diseases combined with psychiatric manifestations, such as cerebral vascular accident (CVA), neurological conditions, and psychiatric disorders], arrhythmia (abnormality of the frequency, rhythm, origin, conduction velocity, or activation sequence of cardiac impulses), ejection fraction (EF), and left ventricular diastolic dysfunction (LVD, diagnosed according to the ASE/EACVI Guidelines and Standards) [31].
Intraoperative data included blood transfusion, intraoperative blood loss, operation duration, and surgical grade.
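
The preprocessing described above can be illustrated with a minimal sketch. It assumes that "standard scaled" means centering each continuous feature to zero mean and unit standard deviation; the helper names and the toy values are hypothetical and are not study data.

```python
from statistics import mean, pstdev

def encode_binary(values, positive):
    """Map a categorical column to 0/1; `positive` is the category coded as 1."""
    return [1 if v == positive else 0 for v in values]

def standard_scale(values):
    """Center to zero mean and scale to unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical toy records, for illustration only.
gender = ["F", "M", "F", "F"]
age = [62, 75, 81, 70]

gender_encoded = encode_binary(gender, positive="M")
age_scaled = standard_scale(age)
```

In practice the scaling parameters would be estimated on the training data and reused for the held-out data, so that no information from the test set leaks into training.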

Feature selection method and machine learning model
To identify the most significant features for predicting POD from the collected data, we proposed a two-stage framework consisting of feature selection followed by machine learning prediction. Correlation analysis was performed to investigate the correlations among features. We systematically investigated 9 feature selection methods and 11 classifiers, giving 99 combinations of feature selection method and classifier in total. In addition, for each feature selection method we considered every number of selected features from 2 to 39. After evaluation using cross-validation, we adopted the mutual information (MI) based method to select the most significant features among the 39 variables for predicting POD. MI is rooted in information theory. Generally speaking, MI measures how much knowing one variable reduces the uncertainty about another variable, namely their mutual dependence, which gives insight into the relationship from an information perspective. A larger MI indicates a stronger dependency [32]. More recently, MI has been widely employed as a preprocessing step before machine learning or deep learning (DL) [33][34][35]. We computed the MI value between each considered feature and the POD label, ranked the features by MI value, and took the top-ranked features as the selected factors for the subsequent binary classification of POD. We considered machine learning algorithms such as random forest, decision tree, support vector machine, AdaBoost, and several other methods to construct the predictive model. After cross-validation, we chose the model with the best performance, which was the binary classifier logistic regression (LR), a method widely adopted in classification tasks [36,37]. Because LR is a regression model, we calculated the variance inflation factor (VIF) to assess collinearity [38].
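
The MI-based ranking can be sketched as follows for discrete features. This is an illustrative plug-in estimator, not the authors' implementation: the source does not state which MI estimator was used, and continuous features would first need binning or a nearest-neighbor estimator (for example, scikit-learn's `mutual_info_classif`). The function names and toy data are hypothetical.

```python
from collections import Counter
from math import log

def mutual_information(xs, ys):
    """Plug-in mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts.
        mi += (c / n) * log(c * n / (count_x[x] * count_y[y]))
    return mi

def rank_features(features, labels, top_k):
    """Return the names of the top_k features with the highest MI to the label."""
    scores = {name: mutual_information(col, labels) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Hypothetical toy data: "a" copies the label, "b" is independent of it,
# and "c" is partially informative.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "a": [0, 0, 0, 0, 1, 1, 1, 1],
    "b": [0, 1, 0, 1, 0, 1, 0, 1],
    "c": [0, 0, 0, 1, 1, 1, 1, 1],
}
top_two = rank_features(features, labels, top_k=2)
```

A feature identical to a balanced binary label reaches the maximum MI of ln 2, while an independent feature scores zero, which is why "a" and "c" are retained in this toy case.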
The prediction model should be built from features with low collinearity. The ML models were trained using the train-validation set (POD = 66, non-POD = 370) and tested using the balanced test set (POD = 20, non-POD = 20). First, feature selection was conducted on the train-validation set to select the leading features. Then, the train-validation set was randomly divided into three equal subsets for cross-validation. In each round, the ML models were trained on two of the subsets and evaluated on the remaining validation subset. The model finally selected was the combination of ML method and selected features that achieved the highest mean AUC across the three validation sets. Before evaluation on the test set, we re-trained the selected model on the whole train-validation set.
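
The selection criterion (highest mean AUC over three validation folds) can be sketched as below. This is a minimal illustration under stated assumptions: the source does not say whether the folds were stratified, and `fit` and `predict` are hypothetical callables standing in for any of the candidate classifiers.

```python
import random

def three_fold_indices(n, seed=0):
    """Shuffle indices 0..n-1 and deal them into three near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::3] for i in range(3)]

def auc(scores, labels):
    """AUC as the probability that a positive case outranks a negative one.

    Ties count as half a win. Requires at least one case of each class.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def mean_validation_auc(fit, predict, X, y, seed=0):
    """Train on two folds, score the third, and average the three AUCs."""
    folds = three_fold_indices(len(y), seed)
    aucs = []
    for k in range(3):
        val = folds[k]
        train = [i for j in range(3) if j != k for i in folds[j]]
        model = fit([X[i] for i in train], [y[i] for i in train])
        scores = [predict(model, X[i]) for i in val]
        aucs.append(auc(scores, [y[i] for i in val]))
    return sum(aucs) / 3
```

Each candidate combination of feature subset and classifier would be passed through `mean_validation_auc`, and the combination with the highest value retained for re-training on the full train-validation set.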

Statistical analysis
All continuous variables were tested for normality. Continuous variables with a normal distribution were presented as mean ± standard deviation (SD), and variables with a non-normal distribution were reported as the median and interquartile range (IQR).
All computation, analysis, and visualization for the ML model were implemented in Python, with Matplotlib (3.4.3) among the packages employed. All code was deployed on a typical ML server (Intel® Xeon® Silver 4110 CPU at 2.10 GHz, Debian GNU/Linux 10) with an NVIDIA GeForce RTX 3090 GPU (CUDA 11.3). It is worth mentioning that a conventional computer, rather than the advanced server we used, is also sufficient to train and evaluate the ML models in an acceptable time.

Results
A total of 476 patients who underwent arthroplasty surgery in our hospital were included in the study. Among them, 86 patients (18.1%) suffered from POD. The characteristics of the patients with and without POD are summarized in Table 1.
We collected data from 476 patients after hip arthroplasty to assemble the dataset (POD = 86, non-POD = 390). After correlation analysis, the dataset was randomly divided into two independent subsets, namely one train-validation set (n = 436) and one test set (n = 40). To avoid bias introduced by an imbalanced test set, we randomly selected the same number of POD and non-POD cases to assemble the balanced test set (POD = 20, non-POD = 20).
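
The split described above can be reproduced in outline with the following sketch. Only the class counts come from the study; the function name, the seed, and the placeholder label vector are hypothetical.

```python
import random

def split_balanced_test(labels, n_per_class=20, seed=0):
    """Hold out n_per_class randomly chosen cases of each class as the test set."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    test = rng.sample(pos, n_per_class) + rng.sample(neg, n_per_class)
    held_out = set(test)
    train_val = [i for i in range(len(labels)) if i not in held_out]
    return train_val, test

# Placeholder labels with the class counts reported in the study
# (POD = 86, non-POD = 390); the ordering is arbitrary.
labels = [1] * 86 + [0] * 390
train_val, test = split_balanced_test(labels)
```

Holding out 20 cases per class leaves 66 POD and 370 non-POD cases, matching the train-validation set sizes given in the Methods.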
Correlation analysis was conducted on all the features. The correlations among features are illustrated as a heatmap in Fig. 2. We calculated the pairwise correlation coefficients (CC) for all the variables (39 features and the POD label). The correlations (max = 0.71, min = −0.32, mean = 0.04, SD = 0.13) indicated that there were no significant correlations among the considered features. We further calculated the VIF for the features (see SI Table 1). Most features showed insignificant collinearity: the VIF values of 31 of the 39 features were less than 5, indicating small collinearity among the features.
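
For reference, the VIF of feature j is 1 / (1 − R_j²), where R_j² is obtained by regressing feature j on the remaining features; equivalently, it is the j-th diagonal entry of the inverse of the feature correlation matrix. The sketch below uses that identity with standard-library Python only. It is illustrative and not the authors' code, and the function names are hypothetical.

```python
from statistics import mean, pstdev

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equally long numeric columns."""
    n = len(columns[0])
    z = []
    for col in columns:
        mu, sd = mean(col), pstdev(col)
        z.append([(v - mu) / sd for v in col])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting.

    Fails with a division by zero if the matrix is singular,
    that is, if some features are perfectly collinear.
    """
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def vif(columns):
    """VIF of each column, read off the diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]
```

With only two features whose correlation is r, both VIF values equal 1 / (1 − r²), so r = 0.8 gives about 2.78, below the threshold of 5 used above.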
To investigate the feature combination from all 39 variables, we computed the MI weights for all features. Figure 3 plots the normalized MI weights of the ten most significant features selected. Age was the most important feature, followed by Cystatin C, GFR, CHE, CRP, LDH, Mg2+, MONO, history of mental illness or psychotropic drug use, and intraoperative blood loss. Age has also been widely reported as a risk factor in recent studies. As shown in SI Table 1, the VIF value of Mg2+ is much larger than those of the other selected features. We trained and evaluated the classifiers and achieved the best cross-validation performance with the LR classifier. The LR classifier was trained using the selected features without Mg2+. The regression coefficients are summarized in SI Table 2.
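
As a consistency check on the reported test-set metrics, the sketch below computes sensitivity, specificity, accuracy, and F1-score from confusion-matrix counts. The counts used (17 true positives, 18 true negatives) are not reported in the source; they are inferred here as the only integer counts that give sensitivity = 0.85 and specificity = 0.90 on a test set of 20 POD and 20 non-POD cases.

```python
def classification_metrics(tp, fn, tn, fp):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # recall on POD cases
    specificity = tn / (tn + fp)          # recall on non-POD cases
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }

# Inferred counts (an assumption, see the note above), not reported values.
metrics = classification_metrics(tp=17, fn=3, tn=18, fp=2)
```

Under these inferred counts the accuracy is 35/40 = 0.875 and the F1-score is 34/39 ≈ 0.872, in line with the reported ACC = 0.88 and F1-score = 0.87.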

Discussion
POD is a severe complication of surgery in elderly patients, yet its pathophysiologic mechanism has not been sufficiently elucidated so far. Moreover, POD is more likely to occur after major surgeries such as hip arthroplasty [9]. Various studies suggested that pharmacological interventions may have no distinct effect on POD, but that managing some modifiable risk factors might help to avoid POD [10,39]. Hence, previous studies investigated risk factors of POD in patients undergoing neurosurgical procedures, cardiac surgery, orthopedic surgery, and vascular surgery [40][41][42][43]. Our study focused on POD in elderly patients undergoing hip arthroplasty and found that age, Cystatin C, GFR, CHE, CRP, LDH, MONO, history of mental illness or psychotropic drug use, and intraoperative blood loss were crucial features of POD.
Although our study excluded patients under 60 years, we found that age is still a critical POD feature among elderly patients. The results revealed that the incidence of POD increases with age, in agreement with existing studies in which age was also identified as a risk factor for POD [15,17,44]. As is well known, POD is an acute and reversible cognitive disorder [25]. Hence, patients of advanced age with cognitive dysfunction are potentially at risk for POD. Cognitive impairment is common among elderly patients, with a reported occurrence rate of 53.5% [45], and some of these patients need psychotropic drugs to control their symptoms. Patients with a history of mental illness or psychotropic drug use should therefore be monitored closely for the occurrence of POD. Current studies reveal that dementia, Alzheimer's disease, and other neurological disorders can also be considered risk factors for POD [46], and they suggest avoiding both the exacerbation of preoperative psychotic symptoms and the use of benzodiazepines [47,48]. In addition, dexmedetomidine might be considered to decrease the risk of POD [49,50].
Additionally, advanced age signifies the functional decline of multiple organs or systems, and transient decline or injury of organ function may also contribute to the development of POD. From a systems neuroscience perspective, cholinergic imbalances have commonly been connected with POD in recent years as neurocognitive research has developed [51]. Blockade of the muscarinic acetylcholine receptor can result in symptoms of POD such as hallucinations, confusion, and cognitive disorder [52]. Some studies have confirmed a decline in the acetylcholine content of the cerebrospinal fluid of POD patients [20]. Several studies also reported that the systemic inflammatory response could lead to POD [25]. Pro-inflammatory cytokines were detected in the cerebrospinal fluid of patients with POD after orthopedic surgery [53]. A rise in CRP was found to be a feature of POD, which has been confirmed in our previous studies [54]. Indeed, most hip-arthroplasty patients had an inflammatory condition, such as osteoarthritis, rheumatoid arthritis, or fracture, before the operation. The combination of preoperative and intraoperative inflammation may result in aberrant stress responses during arthroplasty, leaving patients vulnerable to POD [3]. Therefore, when patients are found to have elevated CRP preoperatively, clinicians should screen them for inflammatory conditions other than arthritis and prevent respiratory and urinary system-related infections accordingly, to reduce the preoperative inflammatory response. The monocyte count (MONO) is a potentially available peripheral marker of microglial activation in a real-world setting, indicating systemic activation of the mononuclear phagocyte system and central inflammation [55]. The majority of present studies consider central inflammation to be associated with POD. The invasive procedure activates the inflammatory cascade and drives the peripheral and central nervous systems to release pro-inflammatory mediators. This inflammation affects the neurofunction of the aging brain, which is thought to be a mechanism of the cognitive impairments associated with POD [56,57]. As a peripheral cell measure, MONO helps to indicate changes in the central nervous system, such as central inflammation, without accessing cerebrospinal fluid (CSF) or performing neuroimaging [58]. Another study supported that a decreased ability to release acetylcholine exacerbates systemic and cerebral inflammation [59], but the role that the relationship between acetylcholine imbalance and inflammation plays in the development of POD is not clear.
Aging is usually accompanied by decreased hepatic and renal function. GFR and Cystatin C reflect the renal function of patients. Decreased renal function prolongs the accumulation of narcotic drugs, which may be one reason for POD [23]. In the same way, some narcotic drugs are metabolized by the liver, and reduced hepatic function prolongs their duration of action. Hepatic function injury can also result in hyperammonemia, which modulates the impact of ammonia on the brain [60]. In general, when laboratory indicators suggest a decline in liver or kidney function, protection of organ function during the perioperative period should be considered. Many measures can be adopted, including changing the diet during the perioperative period, avoiding drugs that increase the burden on the liver or kidneys, and taking organ-protective measures during the operation.
Studies have shown that intraoperative blood loss is associated with POD [61,62], and increased blood loss implies a higher incidence of hypovolemia or low hemoglobin. Intraoperative blood loss may also reduce cerebral blood flow, further reducing the oxygen supply to the brain during anesthesia. Eventually, brain function becomes temporarily and reversibly impaired. However, it must be made clear that the intraoperative blood loss in our study represented only overt bleeding, and hidden hemorrhage was difficult to assess. Accordingly, the anesthesiologist should promptly assess the bleeding situation and decide whether to transfuse blood to improve the patient's circulatory status and thus reduce the negative effects of intraoperative blood loss.
In this study, we adopted an ML method to predict POD. To deal with the feature engineering issue, we proposed a two-stage ML framework, namely conducting feature selection to identify the leading features before applying ML classifiers. This approach has been adopted in other feature-based ML classification tasks in medical studies [63][64][65]. In feature selection, too many features might lead to poor interpretability and difficulty in clinical practice, while too few features might lose information and hinder the prediction performance of the ML methods. Therefore, different combinations of selected features and classifiers should be systematically investigated. Because the number of selected features was critical for the later analysis, we evaluated different numbers of features. The strengths of the ML approach were twofold. First, based on ML methods, we could combine feature selection and classification in one unified ML framework. Second, ML is a data-driven approach that can be further improved by training as more data accumulate. In these respects, our proposed ML framework for POD prediction offers advantages over conventional statistical methods. From a methodological perspective, it is also possible to first evaluate the correlations between features and POD status, select highly correlated features, calculate VIF values to evaluate collinearity, and then select the features with small VIF values for later classification. However, our work provided a unified ML framework for conducting feature selection and classification, an approach that is becoming common in feature-based clinical studies using ML [66][67][68][69].
This study still had some limitations. First, due to the limited data of a single center, this study does not include postoperative laboratory indicators, which makes it impossible to observe perioperative changes in elderly patients undergoing joint replacement. Second, we only used CAM to assess delirium during the first 3 days after the operation, whereas POD assessment should include other scales and cover the 7 days after the operation, so some POD cases may have been missed. Third, we only included patients who underwent general anesthesia, and the effect of anesthesia modality on postoperative delirium needs to be further explored. Fourth, we developed the MI feature selection method and LR classifier on a dataset of 476 patients (POD = 86, non-POD = 390), which is limited due to data availability. In the future, multicenter studies could be conducted to collect more data, especially POD samples. In fact, we have obtained a 3-year research project (CSTB2022NSCQ-MSX0854) to further collect data and investigate this topic. With more data, we could train the ML framework for better accuracy, generalization, and robustness of the prediction system. Finally, our model was developed using feature data only, which is single-modal data. It would be interesting to further include multi-modal data, such as medical images, to develop deep learning prediction models.

Conclusion
This study developed an ML framework to identify leading features of POD and achieved promising performance in predicting POD after hip arthroplasty in elderly patients. Preoperative ML-based POD prediction could support clinicians with computer-assisted clinical decision-making to prevent POD and improve prognosis.

Supplementary Information
The online version contains supplementary material available at https://doi.org/10.1007/s40520-023-02399-7.

Data availability
The datasets are available from the corresponding author on reasonable request.