Machine Learning Prediction of Length of Stay in Hand Injury Patients Undergoing Emergency Surgery


Background: In-hospital length of stay (LOS) is expected to increase as the complexity of hand injuries increases. This will strain healthcare systems, especially given reduced bed capacity and rising costs, so accurate prediction of LOS would support the planning of healthcare interventions. This research aims to develop a machine learning-based model to predict the length of hospital stay in patients with hand injuries.
Methods: Patients who underwent emergency surgery for hand injuries were selected from the hand surgery department of Wuhan Union Hospital. Prolonged LOS was defined as a LOS of nine days or more. Data were analyzed using Logistic Regression (LR), Naive Bayes (NB), Random Forest (RF), Support Vector Machine (SVM), and eXtreme Gradient Boosting (XGBoost) in R software with the glm function and the caret and randomForest packages. Prediction accuracy and area under the curve (AUC) were calculated.
Results: A retrospective cohort of 3858 patients with hand injuries was identified from 2016 to 2020. The AUC and accuracy in the test set were, respectively, LR (0.805, 0.760), NB (0.786, 0.715), RF (0.883, 0.809), SVM (0.821, 0.752), and XGBoost (0.851, 0.763). Multivariable regression indicated that longer operation time, lower hemoglobin (HGB), vascular injury, bone injury, general anesthesia, payment (commercial insurance, self-paying), and tourniquet use were linked to longer LOS. The random forest model performed best among these models and was developed into an encrypted web-based interface: https://union.shinyapps.io/PredictLOS/.
Conclusions: We showed that machine learning methods accurately predict LOS for hand injury patients. The model we developed can be used in clinical bed management and resource allocation.


Background
According to the US National Center for Health Statistics [1], injuries to the hands and wrists are common in emergency departments, accounting for 14.4 percent of 29.9 million injury-related emergency department visits nationwide in 2016. Due to the volume of injuries sustained annually, hand and wrist injuries rank first among the most expensive injury types, ahead of knee and lower limb fractures, hip fractures, and skull-brain injury [2]. The number of patients with hand injuries is also highly volatile because of factors such as weather [3] and calendar data [4]. Coupled with the limited number of beds, this means the Department of Hand Surgery must operate at full capacity during certain admission seasons, which makes allocating hospital resources very difficult.
LOS is a simple and commonly used metric for measuring resource allocation and costs in health care. Accurate prediction of LOS is critical for capacity management, resource planning, and staffing levels. Several factors have been studied as predictors or risk factors for LOS in trauma, including trauma severity factors [5,6] and inflammatory response factors [7][8][9]. In addition, a variety of statistical algorithms have been used to develop LOS prediction models, as shown in a review [10] that summarized 74 published studies of LOS in settings such as hand surgery [11,12], thermal burns [13], neonatal units [11,14], and intensive care units (ICU) [15]. Regression has been the most popular prediction method among these models, but it is gradually being supplanted by more sophisticated techniques such as machine learning [10].
Machine learning is a branch of artificial intelligence and is widely used in healthcare due to its advantages in discovering knowledge from high-dimensional data. In this report, we compare the performance of a regression model and four commonly used machine learning models in predicting LOS. We also developed an online web application based on the best model to support clinical decision making. To our knowledge, this is the first study to apply machine learning to LOS prediction in patients with hand injuries. The results will provide clinicians and policy makers with needed insights.

Study Population and Data Source
A total of 3858 patients with hand injuries who underwent emergency surgery at the Department of Hand Surgery, Wuhan Union Hospital between January 2016 and January 2020 were analysed. Patient data and attributes were extracted from electronic medical records (EMRs) by experienced physicians and trained nurses. The inclusion criteria were: (1) patients older than 18 years and younger than 90 years, (2) patients who underwent emergency surgery for hand and wrist injuries, (3) patients with complete clinical records, and (4) patients discharged after standard treatment procedures.

Variable definition and collection
A prolonged LOS was defined as nine days or more. Hospitalizations shorter than nine days were considered short LOS. Factors that may be relevant to LOS were included in this study: age, gender, history of alcohol abuse, history of smoking, diabetes mellitus, hypertension, insurance status, intraoperative findings, time from injury to surgery (time to surgery), and preoperative blood test results.
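The binary outcome defined above can be written down in a few lines. This is a minimal Python sketch rather than the authors' R code; the function name and the example stays are hypothetical, and only the nine-day threshold comes from the text.

```python
PROLONGED_LOS_DAYS = 9  # threshold stated in the text


def label_los(los_days):
    """Return 'prolonged' for stays of nine days or more, otherwise 'short'."""
    return "prolonged" if los_days >= PROLONGED_LOS_DAYS else "short"


# Hypothetical stays in days, for illustration only.
stays = [3, 8, 9, 15]
labels = [label_los(days) for days in stays]
print(labels)
```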
To avoid pain, patients with hand injuries are generally not examined further before surgery once the need for surgery has been determined. Therefore, intraoperative anatomic findings were selected as predictors, including bone injury, muscle injury, nerve injury, tendon injury, and vascular injury. Insurance status was categorized into three groups: commercial insurance, national medical insurance, and self-paying. Finally, white blood cell count (WBC), red blood cell count (RBC), neutrophil count (NE#), and hemoglobin (HGB) were selected from the preoperative blood test to assess the degree of blood loss and the inflammatory response of the body.

Feature Selection
Feature selection can reduce the dimension of the feature space and the complexity of modeling [16]. In this paper, we first excluded significantly collinear variables based on the correlation coefficient between each pair of variables. Next, we applied an embedded feature selection scheme that ranks features by the feature importance values derived from the Random Forest. The lower-ranked features were then removed one by one, and the performance of the remaining feature group was assessed with 10-fold cross-validation in each round. In addition, we performed the least absolute shrinkage and selection operator (LASSO) method with 10-fold cross-validation to further test the relationship between the selected features and the performance of the linear model.
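The first step, dropping collinear variables, might look like the following Python sketch. It is illustrative only: the 0.8 cutoff, the greedy keep-first rule, and the toy values are assumptions, since the paper does not state its correlation threshold or procedure in this detail.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def drop_collinear(columns, threshold=0.8):
    """Keep a column only if |r| with every already-kept column is below threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy data (hypothetical): RBC is nearly proportional to HGB, age is not.
columns = {
    "HGB": [120, 135, 110, 150, 142, 128],
    "RBC": [4.0, 4.5, 3.7, 5.0, 4.7, 4.3],
    "age": [25, 61, 40, 33, 58, 47],
}
kept = drop_collinear(columns)
print(kept)
```

Which member of a collinear pair is kept depends on column order here; a real analysis would usually decide that on clinical grounds.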

Logistic Regression Model
Logistic Regression (LR) is a standard generalized linear model used in data mining, automatic disease diagnosis, economic forecasting, and other broad applications [17]. It is essentially a conventional two-class model that assigns an object to a category from its vector of attributes. To classify the data, the model assumes that the outcome follows a Bernoulli distribution and estimates the parameters by maximizing the likelihood function with an iterative method such as gradient descent. In our study, a multivariable LR model was built using the glm function of the R package stats. Odds ratios were calculated for each risk factor by exponentiating the LR coefficients. Odds ratios less than 1.0 indicate a decreased risk, odds ratios greater than 1.0 indicate an increased risk, and a p-value of <0.05 was considered significant.
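The link between LR coefficients, odds ratios, and predicted probabilities can be illustrated with a short Python sketch. The coefficient values, feature names, and intercept below are hypothetical and are not the fitted values from this study.

```python
import math


def odds_ratio(beta):
    """Odds ratio for a one-unit increase in a predictor with coefficient beta."""
    return math.exp(beta)


def predict_probability(intercept, coefs, x):
    """Logistic model: P(prolonged LOS) = 1 / (1 + exp(-(b0 + sum(b_k * x_k))))."""
    z = intercept + sum(coefs[name] * x[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients, for illustration only.
coefs = {"vascular_injury": 0.7, "hgb_per_10_g_l": -0.1}
ratios = {name: odds_ratio(beta) for name, beta in coefs.items()}

patient = {"vascular_injury": 1, "hgb_per_10_g_l": 13.0}
probability = predict_probability(0.5, coefs, patient)
print(ratios, probability)
```

A positive coefficient gives an odds ratio above 1.0 (increased risk) and a negative one gives an odds ratio below 1.0, matching the interpretation in the text.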

Machine Learning Models
Four classical machine learning models with five-fold cross-validation were developed to predict LOS, namely Naive Bayes (NB), Random Forest (RF), Support Vector Machine (SVM), and XGBoost. All machine learning models were built using the randomForest, caret, and xgboost packages in the R programming language (version 3.3.1). As an extra precaution, the models were trained and tested by two team members (L.Z.Y. and K.W.) to ensure that the models were not inadvertently trained with withheld test data.
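As a rough illustration of five-fold cross-validation, the Python sketch below partitions sample indices into five folds and yields one train/validation pair per fold. It is not the caret implementation the authors used, and the interleaved fold assignment is an assumption made for brevity; in practice the indices would be shuffled first.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, validation_indices) for each of k folds."""
    # Fold i receives indices i, i + k, i + 2k, ...
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, validation


splits = list(k_fold_indices(10, k=5))
for train, validation in splits:
    print(len(train), validation)
```

Each sample is used for validation exactly once and for training in the other four rounds.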

RF
The random forest method is a machine learning technique that combines many decision trees into a single classification model. It generates a forest of decision trees by training each tree on a different sample of the training data and a different subset of splitting features. When predicting an unknown sample, each tree in the forest makes its own decision, and the class with the most votes is taken as the final classification result. This significantly increases prediction accuracy compared to a single decision tree.

NB
The NB classifier is a highly scalable supervised learning technique. It applies Bayesian reasoning, using probabilities to draw conclusions about the most likely class. It has been effectively implemented in various scientific areas and consistently performs well even when only a few variables are considered.

SVM
SVM is a machine learning approach that is based on statistical learning theory. A support vector machine aims to minimize generalization error by creating a hyperplane in a high-dimensional space and using a maximum margin to separate feature vectors belonging to distinct classes. When a support vector machine is used for linear classification, an (n-1)-dimensional hyperplane is used, where n is the dimension of the data.
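The majority-vote step of the random forest described above can be sketched in Python as follows. The three "trees" are hypothetical one-rule stand-ins with made-up thresholds, not trees fitted to the study data.

```python
from collections import Counter


def majority_vote(votes):
    """Return the class with the most votes (ties go to the class seen first)."""
    return Counter(votes).most_common(1)[0][0]


# Hypothetical stand-ins for fitted trees: each applies a single rule.
trees = [
    lambda p: "prolonged" if p["operation_time_min"] > 120 else "short",
    lambda p: "prolonged" if p["vascular_injury"] else "short",
    lambda p: "prolonged" if p["hgb_g_per_l"] < 110 else "short",
]


def forest_predict(patient):
    """Collect one vote per tree and return the winning class."""
    return majority_vote([tree(patient) for tree in trees])


patient = {"operation_time_min": 150, "vascular_injury": True, "hgb_g_per_l": 130}
prediction = forest_predict(patient)
print(prediction)
```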

XGBoost
XGBoost is one of the most extensively used machine learning classifiers in bioinformatics. It is based on a tree model that classifies using a boosting method. Regularization terms are added to the cost function to limit the model's complexity and prevent overfitting. Additionally, the XGBoost algorithm supports parallel computing, which significantly accelerates calculation.
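To make the regularization idea concrete, the Python sketch below computes the per-tree complexity penalty in the form given in the original XGBoost formulation, gamma times the number of leaves plus half of lambda times the sum of squared leaf weights. The leaf weights and hyperparameter values are hypothetical, and the settings used in this study are not reported.

```python
def tree_penalty(leaf_weights, gamma=1.0, lam=1.0):
    """Complexity penalty for one tree: gamma * T + 0.5 * lam * sum(w^2)."""
    n_leaves = len(leaf_weights)
    return gamma * n_leaves + 0.5 * lam * sum(w * w for w in leaf_weights)


# Hypothetical tree with two leaves.
penalty = tree_penalty([0.5, -0.5], gamma=1.0, lam=2.0)
print(penalty)
```

Adding this penalty to the training loss makes a split worthwhile only if it reduces the loss by more than the added complexity cost.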

Model evaluation and validation
We evaluated the predictive performance of the models in two ways: calibration plots, which show the agreement between the predicted and the observed likelihood, and ROC curves with AUC, which evaluate the classification performance of each model.
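Both evaluation tools can be sketched in plain Python. The AUC below uses the rank (Mann-Whitney) formulation, and the calibration helper compares mean predicted probability with the observed event rate per bin. The labels, scores, and the choice of two bins are hypothetical.

```python
def auc(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def calibration_bins(labels, scores, n_bins=2):
    """Return (mean predicted probability, observed event rate) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for y, s in zip(labels, scores):
        index = min(int(s * n_bins), n_bins - 1)
        bins[index].append((s, y))
    return [
        (sum(s for s, _ in b) / len(b), sum(y for _, y in b) / len(b))
        for b in bins
        if b
    ]


# Hypothetical test-set labels (1 = prolonged LOS) and predicted probabilities.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
model_auc = auc(labels, scores)
calibration = calibration_bins(labels, scores)
print(model_auc, calibration)
```

A well-calibrated model has the two numbers in each bin close to each other; a calibration plot draws these pairs against the diagonal.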

Basic Characteristics
A total of 3858 patients with hand injuries who had undergone emergency surgery were included in the study, and 20 features were extracted from the EMRs. The baseline information of the patients at the time of enrollment is shown in Table 1. Their mean age was 43.85 ± 12.48 years, and 18.7% were female. Short and prolonged LOS accounted for 40.6% and 59.4% of patients, respectively. Of note, there was a significant difference between patients with prolonged LOS and short LOS in terms of age (p = 0.048), operation time (p < 0.001), and time to surgery (p < 0.001). In addition, the proportions of low HGB, low RBC, diabetes mellitus, hypertension, general anesthesia, tourniquet use, and complicated injuries (bone, muscle, nerve, and vascular) were higher in the prolonged-LOS group than in the short-LOS group (all p < 0.05). The patients were randomly divided into a training group (n=3087) and a test group (n=771) (Table 2). The distribution of baseline characteristics was similar in the training and test sets, and no significant difference was found between the two cohorts, which supports the validity of the internal validation. Figure 1 illustrates the workflow used to build the machine learning models.
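A random split that reproduces the reported group sizes can be sketched in Python as follows. The 20% test fraction is inferred from the counts (771 of 3858) rather than stated in the paper, and the seed and function name are hypothetical.

```python
import random


def train_test_split(ids, test_fraction=0.2, seed=42):
    """Shuffle the ids reproducibly and hold out a fraction as the test set."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Patient ids are represented here simply as 0..3857.
train_ids, test_ids = train_test_split(range(3858))
print(len(train_ids), len(test_ids))
```

Fixing the seed keeps the split reproducible, which helps ensure the withheld test set stays the same across model comparisons.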

Feature Selection
We excluded RBC, NE#, and muscle injury based on the correlation coefficient, as they appeared to be collinear with other variables (Figure 2). The remaining 17 variables were entered into the random forest algorithm for feature selection. After five rounds of 10-fold cross-validation, we found that the model was generally stable when the number of variables was greater than 8, and that the average error rate reached its lowest value when the number of variables was reduced to 8 (Figure 3). Therefore, the first eight variables were selected in order of importance (Figure 4). In addition, we conducted LASSO regression for feature selection, and seven coefficients were determined as the best combination according to the 1SE rule (selecting the highest lambda whose error was within one standard error of the optimum) (Figure 5, Figure 6). Given the stable performance of the two models with more variables and considering the different characteristics of linear and nonlinear models, we decided to include the union of the variables selected by the two methods. Finally, nine variables were selected for model training: operation time, age, HGB, time to surgery, vascular injury, anesthesia, payment, bone injury, and tourniquet use.
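The 1SE rule mentioned above can be expressed in a few lines of Python. The cross-validation path below is hypothetical; only the rule itself (take the largest lambda whose error is within one standard error of the minimum) comes from the text.

```python
def lambda_1se(path):
    """path: list of (lambda, mean_cv_error, standard_error) tuples."""
    best = min(path, key=lambda row: row[1])
    limit = best[1] + best[2]  # minimum error plus its standard error
    return max(lam for lam, err, _ in path if err <= limit)


# Hypothetical cross-validation results along a lambda path.
cv_path = [
    (0.001, 0.525, 0.02),
    (0.01, 0.50, 0.02),
    (0.05, 0.515, 0.02),
    (0.1, 0.53, 0.02),
]
chosen_lambda = lambda_1se(cv_path)
print(chosen_lambda)
```

A larger lambda shrinks more coefficients to zero, so the rule trades a statistically negligible loss in accuracy for a sparser model.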

Model Comparisons
The predictive performance of the machine learning and LR models is shown in Table 3. All models performed well in terms of AUC, and the RF model had the highest AUC of 0.883 (Figure 7). Notably, the RF model also had good clinical applicability, with an accuracy of 80.93%, sensitivity of 84.03%, specificity of 78.82%, positive predictive value (PPV) of 73.06%, and negative predictive value (NPV) of 87.83% (Table 3). NB (AUC-ROC=0.786) significantly underperformed compared to the other algorithms.
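The five reported metrics all follow from a 2x2 confusion matrix, as the Python sketch below shows. The counts are an assumption: they are not reported in the paper and were back-calculated here so that they sum to the 771 test patients and reproduce the reported RF percentages after rounding.

```python
def classification_metrics(tp, fn, tn, fp):
    """Standard metrics from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Assumed counts (not from the paper), chosen to be consistent with Table 3.
metrics = classification_metrics(tp=263, fn=50, tn=361, fp=97)
rounded = {name: round(100 * value, 2) for name, value in metrics.items()}
print(rounded)
```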
The calibration plots (Figure 8) showed excellent agreement between the predicted probabilities and the observed LOS for the RF model, followed by the XGBoost and SVM models. The NB model tended to overestimate the LOS over the entire range of predicted probabilities compared to the other models. The LR model appeared to be well calibrated except in the lower-left region of the plot, where it under-predicted the probabilities.
Furthermore, the risk factors for LOS used to fit the LR model are summarized in Table 3. Longer operation time, lower HGB, vascular injury, bone injury, general anesthesia, payment (commercial insurance, self-paying), and tourniquet use increased the chances of prolonged LOS.

Predictive modeling interface
RF was the best-performing algorithm for predicting LOS and was developed into an encrypted web-based interface that can be accessed at https://union.shinyapps.io/PredictLOS/.

Discussion
Prolonged LOS correlates with overcrowding in inpatient units and, as a result, with lower quality of care and lower medical staff satisfaction [18]. Models that improve patient care while increasing efficiency for providers are urgently needed. This study investigated factors associated with LOS and developed the first ML model to predict LOS in hand injury patients undergoing emergency surgery.
Using computers to guide the treatment of critically ill patients is not a new concept. A study published in 1977 proposed computerized systems for monitoring ICU patients [19]. Compared to traditional methods, machine learning techniques can process multidimensional parameters simultaneously and are not constrained by the data distribution [20].
Many studies have applied machine learning to surgery [21][22][23], but only a few have used machine learning models to predict the length of hospital stay in trauma patients. Some studies used artificial neural networks to predict individual time to colorectal cancer surgery, with the highest AUC reaching 0.865 [22]. Houthooft et al. [24] trained a support vector machine model to forecast patient survival and length of stay using data from 14,480 patients. The model's AUC for predicting a prolonged length of stay was 0.82. However, they did not compare the predictive performance of multiple machine learning models.
Our study provides a novel strategy for predicting the length of hospital stay in hand injury patients with machine learning models. Based on the ROC plots for the four machine learning methods and the LR method, the RF model best predicted the LOS. The three most important features in the RF model were operation time, time to surgery, and age. In our study, longer operation time and older age were associated with prolonged LOS, possibly reflecting poorer physical condition, which is consistent with previous studies [25,26]. All the models were validated on internal data and showed good performance. The models might contribute to more efficient management in the hand surgery department, including patient costs and the waiting time for new admissions.
This study has some limitations. First, because it was a retrospective study, there is a potential for selection bias. Second, our prediction models were not validated in an external cohort, so their generalizability across centers has not been established. Thus, a prospective multi-center clinical study with a much larger sample size is required in the future.

Conclusions
The findings of this study reveal that machine learning algorithms, specifically RF, are excellent at predicting the length of stay for patients undergoing emergency surgery for hand injuries. These algorithms have significant clinical implications, as they offer patients and clinicians a means of predicting the length of stay following surgery. Specific efforts can be made using this information to reduce hospitalization costs, hence lowering expenditures for patients and contributing to capacity management, resource planning, and staffing levels.

Ethics approval and consent to participate
The study protocol was approved by the ethics committee of Wuhan Union Hospital of Huazhong University of Science and Technology.

Consent for publication
Informed consent was obtained from all participants.

Availability of data and materials
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Competing interests
The authors declare that they have no competing interests.

Funding
Not applicable.

Authors' contributions
Yuxiong Weng, Lizhao Yan, and Kan Wang designed the work. Lizhao Yan and Kan Wang analyzed datasets. Fangxing Ai, Yu Kang, and Nan Gao wrote this paper. All authors read and approved the final manuscript.

Figure 4
Variable importance of features included in random forest algorithm.

Figure 5
Feature selection in the LASSO model used tenfold cross-validation via minimum criteria.
Red-dotted vertical lines were drawn at the optimal values using the minimum criteria (minimizing the mean-squared error). The value 7 indicates that 19 features were reduced to 7 features with nonzero coefficients.

Figure 6
Coefficients of the multidimensional binary LR model.

Figure 7
ROC plots for four machine learning methods and the LR method.

Figure 8
Calibration curves of the four machine learning models and the LR model show the relationship between observed and predicted LOS. Data are presented as mean and 95% CIs.