Patient selection
We retrospectively collected patient data from two institutions. Patients who underwent their first burr hole surgery between June 2017 and March 2021 at Handa City Hospital were included in the internal cohort. Patient data from Japanese Red Cross Aichi Medical Center Nagoya Daini Hospital were used as the external validation dataset. All patients had a preoperative diagnosis of CSDH based on plain head CT or brain magnetic resonance imaging. Patients with a previous history of craniotomy, CSDH surgery, or shunt surgery, and those with subdural abscess, were excluded.
Operative procedure and perioperative management
All patients underwent hematoma evacuation via single burr hole craniostomy under local anesthesia. Three types of surgical procedures were used: irrigation, simple drainage, and a combination of the two. In the irrigation procedure, a flexible rubber tube was inserted into the hematoma cavity to drain the fluid and then irrigate the subdural space with normal saline solution. The simple drainage procedure consisted of placing a drainage tube in the subdural space. Whether to place a drainage tube after irrigation was left to the operator's discretion. Patients with good postoperative neurological status were discharged from the hospital on postoperative days 3–7. Patients with speech or physical difficulties were enrolled in an in-hospital rehabilitation program. Those who could not return home due to physical difficulties or comorbidities were transferred to a rehabilitation hospital or a long-term nursing facility.
Data collection
Clinical findings, including age, sex, medical history, neurological symptoms, preoperative CT findings, drainage tube placement, and surgical information, were collected through retrospective chart review. Baseline blood samples were routinely collected upon hospital admission. The laboratory tests performed varied at the attending neurosurgeon's discretion. The following hematoma characteristics were obtained from the preoperative CT: volume, thickness, and internal architecture. Hematoma volume was calculated using the method described by Won S-Y [16]. Hematoma thickness was measured on axial images. For bilateral hematomas, volumes were summed, as were thicknesses. The internal architecture of hematomas was classified into two groups based on previous reports on preoperative hematoma characteristics and recurrence [17]: one group comprised the hyperdensity, separated, laminar, gradation, and trabecular types of hematoma density on CT; the other comprised the hypodensity and isodensity types. Surgeons were classified as residents (<5 years of training) or others (>5 years of experience).
Functional outcomes were assessed at discharge using the modified Rankin scale (mRS) [18]. Although originally developed for patients with stroke, this scale is also used to evaluate difficulties in performing daily activities in patients with CSDH [19,20]. The mRS is an ordinal scale running from 0 (no symptoms) to 6 (death); the intermediate scores represent no significant disability (1), slight disability (2), moderate disability (3), moderately severe disability (4), and severe disability (5). Based on previous studies [3,21], this study defined a favorable functional outcome as an mRS score of 0–2. Conversely, an unfavorable functional outcome was defined as an mRS score of 3–6, indicating the inability to manage daily activities without assistance.
Statistical analyses
Statistical analyses were performed to examine the association between each patient parameter and the postoperative functional outcome. Univariate analysis was conducted using R version 4.1.2 (R Foundation for Statistical Computing, Vienna, Austria) and RStudio version 2021.09.0+35 (RStudio, Inc., Boston, MA). The Mann–Whitney U test was used to identify continuous perioperative variables associated with an unfavorable outcome, and Fisher's exact test was used to assess the categorical variables. P values <0.05 were considered statistically significant. To prevent data leakage, the results of these statistical analyses were not used in constructing the machine learning prediction models described below.
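As an illustrative sketch (the study ran these tests in R), the same univariate tests are available in Python via scipy.stats; the group values and contingency table below are hypothetical, not study data.

```python
from scipy.stats import mannwhitneyu, fisher_exact

# hypothetical ages for favorable vs. unfavorable outcome groups
age_favorable = [62, 70, 75, 68, 71, 66]
age_unfavorable = [80, 84, 79, 88, 82, 77]
u_stat, p_cont = mannwhitneyu(age_favorable, age_unfavorable)

# hypothetical 2x2 table: rows = drainage tube (yes/no),
# columns = outcome (favorable/unfavorable)
table = [[30, 10], [12, 14]]
odds_ratio, p_cat = fisher_exact(table)

print(p_cont < 0.05, p_cat < 0.05)
```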
Dataset creation
Data cleaning was conducted in R and RStudio. Demographic characteristics and laboratory data of the patients in the internal cohort were extracted as input variables for the internal dataset. Variables missing in more than 8% of subjects were excluded from the input features. The operative procedure and postoperative Goreisan usage were excluded from the predictors because this information was unavailable before surgery. Categorical variables were converted to binary features with one-hot encoding. For the external validation dataset, the same variables were collected from the patients in the external cohort.
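A minimal pandas sketch of these cleaning steps (the study used R); the column names and toy values are hypothetical, while the 8% missing-value threshold follows the text.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [78, 85, 69, 91],
    "sex": ["M", "F", "F", "M"],          # categorical variable
    "d_dimer": [1.2, 0.9, 0.8, 1.5],      # complete lab value
    "rare_lab": [np.nan, np.nan, np.nan, 2.0],  # mostly missing
})

# exclude variables missing in more than 8% of subjects
keep = df.columns[df.isna().mean() <= 0.08]
df = df[keep]

# convert categorical variables to binary features (one-hot encoding)
df = pd.get_dummies(df, columns=["sex"])
print(sorted(df.columns))
```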
Machine learning and prediction model development
Data splitting and normalization
The internal dataset was randomly split into training and test datasets at a ratio of 3:1 [22]. The split was stratified on the outcome and performed in Python version 3.9.2. Subsequently, the variables were normalized with StandardScaler (default settings) from the scikit-learn library.
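The split-and-normalize step can be sketched with scikit-learn as follows; the arrays are synthetic stand-ins for the internal dataset, with labels mimicking the reported 67%/33% outcome ratio.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = np.array([0] * 67 + [1] * 33)  # ~67% favorable / 33% unfavorable

# 3:1 split, stratified on the outcome label
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# fit the scaler on the training data only, then apply it to both splits
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
print(X_train.shape, X_test.shape)
```

Fitting the scaler on the training split alone keeps the test data unseen during preprocessing.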
Supervised machine learning approach
The machine learning framework was written in Python version 3.9.2 using standard machine learning libraries. It comprised three steps applied to the training dataset: feature selection, data balancing, and classifier training.
Variables were selected using two methods: a filter method and a wrapper method. As a standard filter method, univariate feature selection with SelectPercentile was conducted using the sklearn.feature_selection package. SelectPercentile ranks each variable by a univariate score and retains the variables in a specified top percentile; percentiles of 5, 10, 15, 20, 50, and 100 were applied. In addition, recursive feature elimination, a popular wrapper-type feature selection method, was used. It fits a model (random forest) and repeatedly eliminates the least important variables until the specified number of features is reached [23]. The recursive feature elimination process was cross-validated on the training dataset to identify the optimal number of features for building a machine learning model.
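The two feature-selection routes can be sketched with scikit-learn; the synthetic dataset and feature counts below are illustrative, not the study's.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV, SelectPercentile, f_classif

X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=5, random_state=0)

# filter method: keep the top 50% of features by a univariate F score
selector = SelectPercentile(f_classif, percentile=50)
X_filtered = selector.fit_transform(X, y)

# wrapper method: recursive feature elimination with cross-validation,
# fitting a random forest and discarding the weakest feature each round
rfecv = RFECV(RandomForestClassifier(n_estimators=50, random_state=0), cv=5)
rfecv.fit(X, y)
print(X_filtered.shape[1], rfecv.n_features_)
```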
The outcome variable of the dataset was imbalanced (favorable, 67%; unfavorable, 33%); therefore, two data balancing techniques were applied. One was undersampling of the majority class under the condensed nearest neighbor rule. The other was upsampling of the minority class with the Synthetic Minority Over-sampling Technique (SMOTE), a commonly used approach [24].
Four machine learning classifiers (logistic regression, support vector machine (SVM), random forest, and light gradient boosting machine (light GBM)) were trained to predict unfavorable outcomes after CSDH surgery. Their hyperparameters were optimized to maximize the area under the receiver operating characteristic curve (ROC-AUC) via grid search with 5-fold cross-validation. These algorithms were selected to compare machine learning methods with different learning styles (logistic regression: regression, SVM: instance-based, random forest: bagged decision trees, light GBM: boosted decision trees).
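The tuning procedure can be sketched for one of the four classifiers (SVM) with scikit-learn's GridSearchCV; the parameter grid and dataset are illustrative assumptions, not the study's settings.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# illustrative hyperparameter grid for the SVM
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01]}

# grid search with 5-fold cross-validation, maximizing ROC-AUC
search = GridSearchCV(
    SVC(probability=True),  # enables probability estimates
    param_grid,
    scoring="roc_auc",
    cv=5,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```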
Logistic regression is a traditional statistical method for dichotomous classification. It applies a regression algorithm to classification tasks by using an S-shaped (sigmoid) curve to map input values to outputs between 0 and 1, and it is now considered one of the basic machine learning algorithms.
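The S curve is the logistic (sigmoid) function, sigma(z) = 1 / (1 + exp(-z)), sketched here with numpy.

```python
import numpy as np

def sigmoid(z):
    # logistic function: maps any real input to the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# large negative inputs approach 0, large positive inputs approach 1
print(sigmoid(-10), sigmoid(0), sigmoid(10))
```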
SVM is another machine learning method used for binary classification. Its name is derived from support vectors, which are the points closest to the line or hyperplane dividing the dataset. The distance between the support vectors and line/hyperplane is called the margin. SVM creates a boundary that maximizes the margin, thereby separating the two classes.
Random forest, an ensemble of decision trees, is a robust machine learning method introduced in 2001 [25]. A large number of decision trees are created via random sampling of the dataset, and the majority vote over the trees' predictions determines the final predicted value. Random forest has a lower risk of overfitting than a single decision tree because aggregating multiple trees reduces the influence of variance and bias.
Light GBM is a relatively novel gradient-boosting decision tree algorithm developed in 2017 [26]. A gradient-boosting decision tree builds decision trees (weak learners) one by one, each minimizing the error of the previous model. Light GBM modifies the original gradient-boosting decision tree algorithm to reduce computation time while maintaining prediction accuracy.
Evaluation of the machine learning models
After the hyperparameters were fixed, the machine learning models were tested on the test portion of the internal dataset. This evaluation was conducted independently of the training process. ROC curve analysis was conducted to evaluate the discrimination ability of the models. The accuracy, sensitivity, specificity, and F1 score of each model were compared at the optimal cutoff point determined by Youden's index [27]. To interpret the algorithm predictions, the standardized beta coefficients of the logistic regression model were calculated; these coefficients describe the size and direction of the association between each predictor and the outcome. Moreover, the feature importances of the random forest and light GBM models were evaluated. Feature importance is a score assigned to each feature based on its contribution to the prediction model; here, the score was computed from the total impurity reduction of splits (Gini importance). For each of the four machine learning algorithms, the best-performing model was validated using the external validation dataset.
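The evaluation step (ROC-AUC, a Youden's-index cutoff, and the derived metrics) can be sketched with scikit-learn; the predicted probabilities below are synthetic, not model outputs from the study.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score, roc_curve

rng = np.random.default_rng(0)
y_true = np.array([0] * 50 + [1] * 50)
# hypothetical predicted probabilities from a trained model
y_prob = np.clip(y_true * 0.5 + rng.normal(0.25, 0.2, size=100), 0, 1)

auc = roc_auc_score(y_true, y_prob)
fpr, tpr, thresholds = roc_curve(y_true, y_prob)

# Youden's index J = sensitivity + specificity - 1 = tpr - fpr;
# the optimal cutoff maximizes J
best = np.argmax(tpr - fpr)
cutoff = thresholds[best]
y_pred = (y_prob >= cutoff).astype(int)

sensitivity = (y_pred[y_true == 1] == 1).mean()
specificity = (y_pred[y_true == 0] == 0).mean()
print(round(auc, 3), round(accuracy_score(y_true, y_pred), 3),
      round(f1_score(y_true, y_pred), 3))
```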
Ethical statement
This retrospective study was approved by the Ethics Review Committee of Nagoya University Graduate School of Medicine (2021-0442). Because this study was noninvasive, the committee waived the requirement for written informed consent from patients, and an opt-out method was adopted in accordance with Japanese ethics guidelines. A public notice regarding this study was posted on the websites of Handa City Hospital and Japanese Red Cross Aichi Medical Center Nagoya Daini Hospital. This research was conducted in accordance with the Declaration of Helsinki as revised in 2013.