In summary, the repeated machine learning models used in the present study showed accuracy ranging from 0.65 to 0.69 and AUROC ranging from 0.73 to 0.76 for the diagnosis of AKI without an available baseline SCr. The single machine learning models, on the other hand, showed accuracy ranging from 0.53 to 0.74 and AUROC ranging from 0.70 to 0.74 for the same task. These findings suggest that the repeated machine learning models exhibited superior accuracy and better predictive value for AKI diagnosis. In addition, while the single machine learning models did not exhibit better accuracy or predictive value in the balanced testing dataset (method 2, trial 2), they did show better performance in the testing dataset after exclusion of patients with uncertain AKI status (method 2, trial 3). Remarkably, when past SCr records were available, the computerized algorithm was superior in every index to both the repeated and the single machine learning models. Among the repeated machine learning models, all algorithms tested in this study showed similar performance, whereas among the single machine learning models, RF, XGBoost, and GB showed superior performance to the other algorithms tested.
Since the diagnosis of AKI is based on an increase in SCr within 7 days [17], computerized algorithms can accurately diagnose AKI in patients with an available baseline SCr or a recent SCr record. Nevertheless, for patients without such reference SCr values, the diagnosis of AKI is difficult for computerized algorithms and even for clinicians. In the present study, we tried to overcome this obstacle by using machine learning models to identify AKI events based on point-of-care features of patients presenting with abnormal SCr. Remarkably, all included patients had abnormal SCr values, so the function of our models is actually to distinguish AKI events from preexisting CKD. To date, the application of machine learning in the care of AKI has mainly focused on the prediction of AKI, so AKI prediction models with short time windows can serve as references for comparison with our AKI diagnostic models [19]. In an AKI prediction study in the all-care setting conducted by Cronin et al. in 2015, laboratory tests obtained from 5 days before to 48 hours after the admission date from more than 1.6 million hospitalizations were used for model training. They found that the models (LR, LASSO regression, RF) exhibited AUROCs of 0.746–0.758 for the prediction of in-hospital AKI events [20]. In another study, He et al. tested machine learning models for AKI prediction at different prediction time windows; their models exhibited AUROCs ranging from 0.720 to 0.764, with the best performance achieved when predicting AKI one day in advance [21]. A similar study by Cheng et al. tested different data collection time windows for the training datasets of AKI prediction models; the RF algorithm showed the best performance for AKI prediction 1–3 days in advance, with AUROCs of 0.765, 0.733, and 0.709, respectively [22]. Compared with these studies, our repeated machine learning models exhibited AUROCs ranging from 0.73 to 0.76, depending on the algorithm used, showing that repeated machine learning can achieve similar performance across different algorithms and is comparable to machine learning models trained on much larger datasets.
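To make the role of the reference SCr concrete, the following is a minimal sketch of a KDIGO-style, SCr-based computerized check. It assumes the standard KDIGO creatinine criteria (a rise of ≥0.3 mg/dL within 48 hours, or to ≥1.5 times a prior value within 7 days) and is not the exact algorithm used in the present study.

```python
from datetime import datetime, timedelta

def flag_aki(scr_history, index_time, index_scr):
    """Minimal KDIGO-style SCr check (a sketch, not the study's exact algorithm).

    scr_history: list of (timestamp, scr_mg_dl) measurements before index_time.
    Returns True if either KDIGO creatinine criterion is met.
    """
    for t, prior_scr in scr_history:
        lag = index_time - t
        # Criterion 1: absolute rise >= 0.3 mg/dL within 48 hours
        if lag <= timedelta(hours=48) and index_scr - prior_scr >= 0.3:
            return True
        # Criterion 2: rise to >= 1.5 times a prior (baseline) value within 7 days
        if lag <= timedelta(days=7) and index_scr >= 1.5 * prior_scr:
            return True
    return False  # no qualifying rise from a reference value; diagnosis remains uncertain

# Example: a prior SCr of 1.0 mg/dL three days earlier, now 1.8 mg/dL
print(flag_aki([(datetime(2023, 1, 1), 1.0)], datetime(2023, 1, 4), 1.8))  # True
```

Without any entry in scr_history, such a rule cannot fire, which is exactly the gap the point-of-care machine learning models are meant to fill.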
Although we compared the present AKI diagnostic models with AKI prediction models, differences between the two types of model do exist. In the work by Koyner et al., which was also conducted in the all-care setting, the authors tested models with and without the change in SCr from baseline. The results showed that excluding "change of SCr" from the input features did not affect the models' ability to predict AKI [23]. On the contrary, in the present study, SCr was an important input feature for the AKI diagnostic models, regardless of the algorithm used. The cause of this difference may be that AKI prediction relies more on the severity of comorbidities than on an existing abnormal SCr reading, whereas the SCr value at the point of care is an important feature for the identification of AKI events.
Among studies developing AKI prediction models, researchers have sought the best machine learning algorithm for predicting the risk of upcoming AKI events. In the AKI prediction model developed from a training dataset of 1.6 million hospitalizations by Cronin et al. in 2015, the performance of traditional LR and LASSO regression models was slightly superior to that of the RF model [20]. In the work of Kim et al. in 2021, which aimed to develop a continuous real-time prediction model for AKI events, a recurrent neural network algorithm was found to be the most suitable for predicting AKI events 48 hours in advance [24]. In the single machine learning models of the present study, we found that the RF, XGBoost, and GB algorithms exhibited superior performance for the diagnosis of AKI. Nonetheless, in the repeated machine learning models, the differences between algorithms were not obvious. This finding suggests that with repeated training, the performance of different machine learning algorithms may converge to a consistent level.
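As an illustration of how such algorithm comparisons are typically run, the sketch below trains a few candidate classifiers on a synthetic dataset and reports accuracy and AUROC for each. The dataset, features, and hyperparameters are placeholders and do not correspond to those of the present study; XGBoost is omitted to keep the example within scikit-learn.

```python
# Illustrative algorithm comparison on synthetic data (not the study's pipeline).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, accuracy_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "LR": LogisticRegression(max_iter=1000),
    "RF": RandomForestClassifier(n_estimators=300, random_state=0),
    "GB": GradientBoostingClassifier(random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    prob = model.predict_proba(X_te)[:, 1]          # predicted probability of AKI
    pred = (prob >= 0.5).astype(int)                # class label at a 0.5 threshold
    print(name,
          "accuracy=%.3f" % accuracy_score(y_te, pred),
          "AUROC=%.3f" % roc_auc_score(y_te, prob))
```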
In the work of Yue et al., a machine learning model was built for AKI prediction in patients with sepsis; the most important model features included urine output, mechanical ventilation, body mass index, estimated glomerular filtration rate, SCr, partial thromboplastin time, and blood urea nitrogen [25]. In addition to features directly related to renal function, features related to general disease severity also weighed heavily in that model of sepsis-related AKI. In the present model, which was designed for the all-care setting, features related to sepsis, such as lymphocyte fraction, WBC count, platelet count, pulse rate, SBP, and GOT, also played important roles. This finding suggests that in the all-care setting, sepsis is the most important cause of AKI in hospitalized patients.
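Feature rankings such as those discussed above are typically read off the fitted model itself. The sketch below shows one common way to inspect per-feature importance for a tree-ensemble classifier; the feature names and synthetic data are illustrative and do not reproduce the study's actual input set or ranking method.

```python
# Sketch of inspecting per-feature importance for a fitted tree ensemble.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

feature_names = ["SCr", "WBC", "lymphocyte_fraction", "platelet_count",
                 "pulse_rate", "SBP", "GOT"]                     # illustrative names
rng = np.random.default_rng(0)
X = rng.normal(size=(500, len(feature_names)))                   # synthetic data
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=500)) > 0         # synthetic outcome

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
for name, imp in sorted(zip(feature_names, rf.feature_importances_),
                        key=lambda pair: pair[1], reverse=True):
    print(f"{name}: {imp:.3f}")                                  # importance, descending
```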
Electronic diagnostic tools have been applied to decision support and electronic alert systems for AKI; these studies showed heterogeneous system designs and revealed mixed results [26]. In past studies, electronic AKI alert systems have shown acceptable accuracy and applicability [27, 28]. Furthermore, Hodgson and colleagues showed that their electronic AKI alert system reduced the incidence of hospital-acquired AKI and in-hospital mortality [29]. On the contrary, a study by Wilson et al. enrolling 6030 patients showed that an electronic AKI alert system did not reduce the risk of the primary outcome, with heterogeneity of effects across clinical centers [30]. The results of the present study suggest that while electronic diagnostic tools may improve the accuracy of AKI diagnosis, timely differential diagnosis and management are necessary to achieve better outcomes.
The main limitation of the present study was the relatively small sample size, especially that of the testing dataset. On the other hand, given the all-care setting of the present study, our machine learning models may be applicable to hospitalized patients admitted to both critical care units and general wards.
In conclusion, the machine learning models were able to diagnose AKI without available baseline SCr records. In addition, the machine learning models for AKI diagnosis in the present study showed superior accuracy to that of clinicians. We also found that the repeated machine learning models showed more consistent and superior performance compared with the single machine learning models. Remarkably, the computerized AKI diagnostic algorithm showed superior accuracy to the machine learning models when baseline SCr was available. As a result, these two approaches may be combined to build a more comprehensive electronic AKI diagnostic system in the future.
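A hypothetical sketch of such a combined system is shown below: when a baseline or recent SCr record exists, the SCr-based computerized check (the flag_aki sketch above) is applied; otherwise the point-of-care machine learning model takes over. The patient keys, function names, and the 0.5 decision threshold are illustrative assumptions, not the study's actual pipeline.

```python
def diagnose_aki(patient, ml_model, threshold=0.5):
    """Hypothetical combined workflow: computerized rule first, ML fallback.

    `patient` is assumed to be a dict with illustrative keys; `ml_model` is any
    fitted classifier exposing predict_proba.
    """
    if patient.get("scr_history"):  # baseline or recent SCr records available
        return flag_aki(patient["scr_history"],
                        patient["index_time"],
                        patient["index_scr"])
    # No reference SCr: fall back to the point-of-care machine learning model
    prob = ml_model.predict_proba([patient["point_of_care_features"]])[0, 1]
    return prob >= threshold
```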