In summary, the repeated machine learning models used in the present study showed accuracy ranging from 0.65 to 0.69 and AUROC ranging from 0.73 to 0.76 for the diagnosis of AKI without an available baseline SCr. The single machine learning models, on the other hand, showed accuracy ranging from 0.53 to 0.74 and AUROC ranging from 0.70 to 0.74 for the same task. These findings suggest that the repeated machine learning models exhibited superior accuracy and better predictive value for AKI diagnosis. In addition, while the single machine learning models did not exhibit better accuracy or predictive value in the balanced testing dataset (method 2, trial 2), they did show better performance in the testing dataset after exclusion of patients with uncertain AKI status (method 2, trial 3). Remarkably, when past SCr records were available, the computerized algorithm was superior in every index to both the repeated and the single machine learning models. Among the repeated machine learning models, all algorithms tested in this study showed similar performance, whereas among the single machine learning models, RF, XGBoost, and GB showed superior performance to the other algorithms tested.
Since the diagnosis of AKI is based on an increase in SCr within 7 days [17], computerized algorithms can accurately diagnose AKI in patients with an available baseline SCr or a recent SCr record. Nevertheless, for patients without such reference SCr values, the diagnosis of AKI is difficult for computerized algorithms and even for clinicians. In the present study, we tried to overcome this obstacle by using machine learning models to identify AKI events based on point-of-care features of patients presenting with abnormal SCr. Remarkably, all included patients had abnormal SCr values, so the function of our models is actually to distinguish AKI events from preexisting CKD. To date, the application of machine learning in the care of AKI has mainly focused on the prediction of AKI, so AKI prediction models with short time windows can serve as references for comparison with our AKI diagnostic models [19]. In an AKI prediction study in the all-care setting conducted by Cronin et al. in 2015, laboratory tests obtained from 5 days before to 48 hours after the admission date from more than 1.6 million hospitalizations were used for model training. They found that the models (LR, LASSO regression, RF) exhibited AUROCs of 0.746–0.758 for the prediction of in-hospital AKI events [20]. In another study, He et al. tested machine learning models for AKI prediction at different prediction time windows; their models exhibited AUROCs ranging from 0.720 to 0.764, with the best performance achieved when predicting AKI one day in advance [21]. A similar study by Cheng et al. tested different data collection time windows for the training datasets of AKI prediction models; the RF algorithm showed the best performance for AKI prediction 1–3 days in advance, with AUROCs of 0.765, 0.733, and 0.709, respectively [22]. Compared with these studies, our repeated machine learning models exhibited AUROCs ranging from 0.73 to 0.76, depending on the algorithm used, showing that repeated machine learning can achieve similar performance across different algorithms and is comparable to machine learning models trained on much larger datasets.
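To make the role of the reference SCr concrete, the following is a minimal sketch of a KDIGO-style, SCr-based computerized check. It assumes the standard KDIGO creatinine criteria (a rise of ≥0.3 mg/dL within 48 hours, or to ≥1.5 times a prior value within 7 days) and is not the exact algorithm used in the present study.

```python
from datetime import datetime, timedelta

def flag_aki(scr_history, index_time, index_scr):
    """Minimal KDIGO-style SCr check (a sketch, not the study's exact algorithm).

    scr_history: list of (timestamp, scr_mg_dl) measurements before index_time.
    Returns True if either KDIGO creatinine criterion is met.
    """
    for t, prior_scr in scr_history:
        lag = index_time - t
        # Criterion 1: absolute rise >= 0.3 mg/dL within 48 hours
        if lag <= timedelta(hours=48) and index_scr - prior_scr >= 0.3:
            return True
        # Criterion 2: rise to >= 1.5 times a prior (baseline) value within 7 days
        if lag <= timedelta(days=7) and index_scr >= 1.5 * prior_scr:
            return True
    return False  # no qualifying rise from a reference value; diagnosis remains uncertain

# Example: a prior SCr of 1.0 mg/dL three days earlier, now 1.8 mg/dL
print(flag_aki([(datetime(2023, 1, 1), 1.0)], datetime(2023, 1, 4), 1.8))  # True
```

Without any entry in scr_history, such a rule cannot fire, which is exactly the gap the point-of-care machine learning models are meant to fill.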
Although we compared the present AKI diagnostic models with AKI prediction models, differences between the two types of model do exist. In the work by Koyner et al., which was also conducted in the all-care setting, the authors tested models with and without the change in SCr from baseline. The results showed that excluding "change of SCr" from the input features did not affect the models' ability to predict AKI [23]. On the contrary, in the present study, SCr was an important input feature for the AKI diagnostic models, regardless of the algorithm used. The cause of this difference may be that AKI prediction relies more on the severity of comorbidities than on an existing abnormal SCr reading, whereas the SCr value at the point of care is an important feature for the identification of AKI events.
Among studies developing AKI prediction models, researchers have sought the best machine learning algorithm for predicting the risk of upcoming AKI events. In the AKI prediction model developed from a training dataset of 1.6 million hospitalizations by Cronin et al. in 2015, the performance of traditional LR and LASSO regression models was slightly superior to that of the RF model [20]. In the work of Kim et al. in 2021, which aimed to develop a continuous real-time prediction model for AKI events, a recurrent neural network algorithm was found to be the most suitable for predicting AKI events 48 hours in advance [24]. In the single machine learning models of the present study, we found that the RF, XGBoost, and GB algorithms exhibited superior performance for the diagnosis of AKI. Nonetheless, in the repeated machine learning models, the differences between algorithms were not obvious. This finding suggests that with repeated training, the performance of different machine learning algorithms may converge to a consistent level.
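As an illustration of how such algorithm comparisons are typically run, the sketch below trains a few candidate classifiers on a synthetic dataset and reports accuracy and AUROC for each. The dataset, features, and hyperparameters are placeholders and do not correspond to those of the present study; XGBoost is omitted to keep the example within scikit-learn.

```python
# Illustrative algorithm comparison on synthetic data (not the study's pipeline).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, accuracy_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "RF": RandomForestClassifier(n_estimators=300, random_state=0),
    "GB": GradientBoostingClassifier(random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    prob = model.predict_proba(X_te)[:, 1]          # predicted probability of AKI
    pred = (prob >= 0.5).astype(int)                # class label at a 0.5 threshold
    print(name,
          "accuracy=%.3f" % accuracy_score(y_te, pred),
          "AUROC=%.3f" % roc_auc_score(y_te, prob))
```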
In the work of Yue et al., a machine learning model was built for AKI prediction in patients with sepsis; the most important model features included urine output, mechanical ventilation, body mass index, estimated glomerular filtration rate, SCr, partial thromboplastin time, and blood urea nitrogen [25]. In addition to features directly related to renal function, features related to general disease severity also weighed heavily in that model of sepsis-related AKI. In the present model, which was designed for the all-care setting, features related to sepsis, such as lymphocyte fraction, WBC count, platelet count, pulse rate, SBP, and GOT, also played important roles. This finding suggests that in the all-care setting, sepsis is the most important cause of AKI in hospitalized patients.
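Feature rankings such as those discussed above are typically read off the fitted model itself. The sketch below shows one common way to inspect per-feature importance for a tree-ensemble classifier; the feature names and synthetic data are illustrative and do not reproduce the study's actual input set or ranking method.

```python
# Sketch of inspecting per-feature importance for a fitted tree ensemble.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

feature_names = ["SCr", "WBC", "lymphocyte_fraction", "platelet_count",
                 "pulse_rate", "SBP", "GOT"]                     # illustrative names
rng = np.random.default_rng(0)
X = rng.normal(size=(500, len(feature_names)))                   # synthetic data
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=500)) > 0         # synthetic outcome

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
for name, imp in sorted(zip(feature_names, rf.feature_importances_),
                        key=lambda pair: pair[1], reverse=True):
    print(f"{name}: {imp:.3f}")                                  # importance, descending
```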
Electronic diagnostic tools have been applied to decision support and electronic alert systems for AKI; these studies showed heterogeneous system designs and revealed mixed results [26]. In past studies, electronic AKI alert systems have shown acceptable accuracy and applicability [27, 28]. Furthermore, Hodgson and colleagues showed that their electronic AKI alert system reduced the incidence of hospital-acquired AKI and in-hospital mortality [29]. On the contrary, a study by Wilson et al. enrolling 6030 patients showed that an electronic AKI alert system did not reduce the risk of the primary outcome, with heterogeneity of effects across clinical centers [30]. The results of the present study suggest that while electronic diagnostic tools may improve the accuracy of AKI diagnosis, timely differential diagnosis and management are necessary to achieve better outcomes.
The main limitation of the present study was the relatively small sample size, especially that of the testing dataset. On the other hand, given the all-care setting of the present study, our machine learning models may be applicable to hospitalized patients admitted to both critical care units and general wards.
In conclusion, the machine learning models were able to diagnose AKI without available baseline SCr records. In addition, the machine learning models for AKI diagnosis in the present study showed superior accuracy to that of clinicians. We also found that the repeated machine learning models showed more consistent and superior performance compared with the single machine learning models. Remarkably, the computerized AKI diagnostic algorithm showed superior accuracy to the machine learning models when baseline SCr was available. As a result, these two approaches may be combined to build a more comprehensive electronic AKI diagnostic system in the future.
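A hypothetical sketch of such a combined system is shown below: when a baseline or recent SCr record exists, the SCr-based computerized check (the flag_aki sketch above) is applied; otherwise the point-of-care machine learning model takes over. The patient keys, function names, and the 0.5 decision threshold are illustrative assumptions, not the study's actual pipeline.

```python
def diagnose_aki(patient, ml_model, threshold=0.5):
    """Hypothetical combined workflow: computerized rule first, ML fallback.

    `patient` is assumed to be a dict with illustrative keys; `ml_model` is any
    fitted classifier exposing predict_proba.
    """
    if patient.get("scr_history"):  # baseline or recent SCr records available
        return flag_aki(patient["scr_history"],
                        patient["index_time"],
                        patient["index_scr"])
    # No reference SCr: fall back to the point-of-care machine learning model
    prob = ml_model.predict_proba([patient["point_of_care_features"]])[0, 1]
    return prob >= threshold
```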