Identification of Relevant Studies
A total of 540 articles were identified by searching three electronic databases. Of these, 105 were duplicates, and 384 were excluded during the initial screening of titles and abstracts. The full texts of the remaining 53 articles were reviewed in detail. Of these, 34 were excluded from the final analysis for the following reasons: abstract (n=15), review (n=11), clinical score (n=2), incomplete data (n=2), original text could not be obtained (n=3), and not pertinent to the topic (n=1; the topic of this article was automated identification of the electronic medical record). The remaining 19 studies were included in the final analysis (Figure 1).
Characteristics of Eligible Studies
The total number of subjects tested in the included studies was 304,076, with sample sizes ranging from 109 to 96,653 17-35.
Seventeen studies described the demographic characteristics of their study populations; mean age ranged from 37 to 71 years, and the percentage of males ranged from 16% to 88% 17,19-31,33-35.
The included studies were categorized by the type of surgery the participants received, including cardiothoracic surgery, any inpatient operative procedure, liver transplantation, and total knee arthroplasty 17-35.
The included studies reported the performance of the AI algorithms on a test dataset (internal validation); only four studies 21,26,27,34 reported the performance of external validation. Nine studies 21-25,28,32-34 built their AI algorithm on the gradient boosting machine (GBM), three studies 17,19,35 built random forest (RF)-based algorithms, three studies 20,27,29 built two types of artificial neural network (ANN)-based algorithms, one study 26 built a Bayesian network (BN)-based algorithm, one study 31 built a decision tree (DT)-based algorithm, one study 30 built an ensemble algorithm, and one study 18 developed a novel machine learning risk algorithm called MySurgeryRisk.
Fifteen studies applied the Kidney Disease Improving Global Outcomes (KDIGO) definition for AKI 17-19,21,22,24-27,29-34. Among these, some defined AKI using serum creatinine changes only, without adopting the urine output criteria 21,23,25,29,34. Two studies applied the Acute Kidney Injury Network (AKIN) criteria 20,23.
These characteristics (modifiers) were evaluated as potential sources of heterogeneity through subgroup analysis and meta-regression. Table 1 shows the detailed characteristics of the studies.
Methodological Quality of the Studies (Figure 2)
Among the 19 studies 17-35 in the final analysis, 4 studies 18,25,32,33 showed low risk of bias, 2 studies 26,29 showed unclear risk of bias, and 13 studies 17,19-24,26-28,30-33 showed high risk of bias.
Regarding the participants domain, the risk of bias was low in 18 studies 17-25,27-35 and unclear in one due to insufficient information describing the sampling method in external validation 26.
Concerning the predictors domain, we considered the risk of bias unclear in one study 31 because the details of the predictors were not reported.
In the outcome domain, 15 studies 17-19,21,22,24-27,29-34 applied the Kidney Disease Improving Global Outcomes (KDIGO) definition for AKI, but we considered the risk of bias unclear in five studies 21,22,24,29,34 because they used creatinine changes only. The risk of bias was high in one study 27 because only patients with severe AKI were enrolled. In addition, two studies 28,35, which used their own criteria for AKI, were also considered to have high risk of bias.
The most concerning issue seen in the analysis was the high risk of bias in the majority of the included studies (13/19). The risk of bias was considered high in 12 studies 17,19-23,27,28,30,31,34,35.
Overall, studies 17,19-24,26-28,30-33 with high risk of bias in at least one of the four domains were rated as having low methodological quality for the diagnostic test accuracy of artificial intelligence in predicting acute kidney injury during the perioperative period (Figure 2).
Diagnostic Test Accuracy of Artificial Intelligence for the Prediction of Acute Kidney Injury During the Perioperative Period
Figure 3 shows the paired forest plot of sensitivity and specificity with the corresponding 95% CIs for each study. The SROC curve, with its 95% confidence region, is shown in Figure 4. The following summary estimates were calculated using the HSROC model: sensitivity 0.77 (95% CI: 0.73 to 0.81), specificity 0.75 (95% CI: 0.71 to 0.80), positive likelihood ratio 3.2 (95% CI: 2.7 to 3.7), negative likelihood ratio 0.30 (95% CI: 0.26 to 0.35), and diagnostic odds ratio 10.7 (95% CI: 8.5 to 13.5). To investigate the clinical utility of AI, a Fagan nomogram was generated. Assuming a 50% prevalence of AKI during the perioperative period, the Fagan nomogram shows that the posterior probability of AKI was 76% if the test was positive and 23% if the test was negative (Figure 5).
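The arithmetic behind a Fagan nomogram can be sketched in a few lines. The example below is only an illustration of the standard odds-times-likelihood-ratio calculation, using the pooled likelihood ratios and the assumed 50% prevalence reported above; the function and variable names are hypothetical, and this is not the software used for the analysis.

```python
def post_test_probability(pre_test_prob: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_odds = pre_test_prob / (1.0 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Values taken from the pooled estimates reported in the text.
PREVALENCE = 0.50  # assumed pre-test probability of AKI
LR_POS = 3.2       # pooled positive likelihood ratio
LR_NEG = 0.30      # pooled negative likelihood ratio

# Probability of AKI after a positive and after a negative prediction.
p_aki_given_positive = post_test_probability(PREVALENCE, LR_POS)
p_aki_given_negative = post_test_probability(PREVALENCE, LR_NEG)

# The diagnostic odds ratio is the ratio of the two likelihood ratios.
dor = LR_POS / LR_NEG

print(f"P(AKI | positive) = {p_aki_given_positive:.2f}")  # 0.76
print(f"P(AKI | negative) = {p_aki_given_negative:.2f}")  # 0.23
print(f"DOR from LR+/LR-  = {dor:.1f}")                   # 10.7
```

With a pre-test probability of 50% the pre-test odds equal 1, so the post-test odds are simply the likelihood ratios themselves: 3.2/4.2 gives about 76% and 0.30/1.30 gives about 23%. Both values are probabilities of AKI, after a positive and after a negative prediction respectively.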
Exploring Heterogeneity with Meta-Regression and Subgroup Analysis
The shape of the SROC curve was symmetric (Figure 4). However, we observed a moderate positive correlation between the logit-transformed TPR and FPR (Spearman correlation coefficient=0.48), and the asymmetry parameter, β, had a significant P-value (P=0.036), indicating threshold heterogeneity among the studies.
No significant source of heterogeneity among the included studies was found in the joint meta-regression model (type of AI [P=0.58], number of included patients [P=0.22], type of surgery [P=0.17], methodological quality [P=0.93], external validation [P=0.69], definition of AKI [P=0.14]; Figure 6).
Table 2 shows the detailed results of the subgroup analysis exploring potential sources of between-study heterogeneity.
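As a rough illustration of how a threshold effect is screened for, the sketch below computes the Spearman correlation between logit(TPR) and logit(FPR) across studies. The per-study sensitivities and specificities are hypothetical placeholders, not data from the included studies, and the helper names are assumptions made for this example.

```python
import math


def logit(p: float) -> float:
    """Log-odds transform of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-study estimates, for illustration only.
sensitivity = [0.70, 0.75, 0.80, 0.85, 0.90]
specificity = [0.85, 0.80, 0.70, 0.75, 0.60]

logit_tpr = [logit(s) for s in sensitivity]
logit_fpr = [logit(1.0 - s) for s in specificity]

rho = spearman(logit_tpr, logit_fpr)
print(f"Spearman rho = {rho:.2f}")  # 0.90 for these placeholder values
```

A positive coefficient means that studies with higher sensitivity also tend to have higher false positive rates, which is the pattern expected when studies operate at different decision thresholds. Because the logit is monotonic, the rank correlation is the same with or without the transform.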
Sensitivity Analysis
After excluding one study at a time, the results (Figure 7) showed that every estimate remained within the 95% confidence interval; the combined DOR was 10.66 (95% CI: 8.47 to 13.40), indicating that the results of the meta-analysis were robust.
Publication Bias
Publication bias was assessed using Deeks' funnel plot for the prediction of AKI during the perioperative period (Figure 8). The plot was grossly symmetrical with respect to the regression line. Deeks' funnel plot asymmetry test showed no evidence of publication bias (P=0.62).