The systematic review was registered with the International Prospective Register of Systematic Reviews (PROSPERO; registration number CRD42021261676) and conducted in accordance with the PRISMA guideline for reporting of systematic reviews.21
Identification of studies
A comprehensive electronic search was carried out through PubMed, the Cochrane Library, Scopus and ClinicalTrials.gov (http://clinicaltrials.gov/) up until June 2021. The search terms used in PubMed were: (((heart OR cardiac OR aort* OR valv* OR thoracic) AND surg*) OR ‘valve replacement*’ OR ‘bypass*’ OR ‘CABG’ OR ‘extracorporeal circulation’ OR ‘on pump’ OR ‘Cardiac Surgical Procedures’) AND (‘neutrophil gelatinase-associated lipocalin’ OR NGAL OR ‘LCN2 protein, human’) AND (‘diagnostic accuracy’ OR ‘sensitivity’ OR ‘specificity’ OR PPV OR NPV OR ‘positive predictive value’ OR ‘negative predictive value’). A similar strategy was used in the Cochrane Library, Scopus and ClinicalTrials.gov. In addition, abstracts from meetings and the reference lists of eligible papers and related reviews were searched manually to identify additional relevant studies.
Inclusion and exclusion criteria
The inclusion criteria for studies were: (i) adult cardiac surgery cohort requiring cardiopulmonary bypass (CPB); (ii) measurement of plasma NGAL (pNGAL) for the early diagnosis of AKI (within 24 h) after cardiac surgery; (iii) provision of data from which true-positive (TP), false-positive (FP), false-negative (FN) and true-negative (TN) counts could be found or calculated; (iv) AKI clearly defined by acceptable methods, preferably the KDIGO, RIFLE or AKIN criteria;22-24 and (v) publication in English. Exclusion criteria were: (i) duplicate data reported in other studies; (ii) sample size of fewer than 25; (iii) timing of pNGAL measurement not clearly defined; (iv) inclusion of paediatric patients within the cohort; (v) more than 20% ‘off-pump’ patients included in the cohort; and (vi) insufficient diagnostic accuracy data available.
Study selection and data extraction
One reviewer (HSC) screened the titles and abstracts of all citations to judge eligibility based on the inclusion and exclusion criteria. For citations that could not be evaluated through the titles and abstracts, full texts were retrieved for thorough evaluation.
A second reviewer (JF) then checked all candidate citations for eligibility. Full-text copies of all potentially relevant reports were retrieved and assessed for inclusion by both reviewers (HSC and JF).
One reviewer (HSC) extracted the data from each study, and the second reviewer (JF) checked the extracted data for accuracy. Any discordance was then rechecked by the first reviewer. The following information was recorded from each selected study: (i) basic characteristics of studies: name of the first author, year of publication, sample size, country; (ii) characteristics of cohort: AKI diagnosis criteria, number of ‘off-pump’ patients, number of patients who developed AKI; (iii) measurement of NGAL: specimen type, analytical method, NGAL test cut-off and the timing of sample collection; (iv) the criteria for the diagnosis of AKI; (v) study outcomes: test sensitivity and specificity and/or true positive (TP), false positive (FP), false negative (FN), true negative (TN), positive predictive value (PPV), negative predictive value (NPV) and area under the receiver operating characteristic curve (AUROC).
Assessment of the risk of bias
The Quality Assessment of Diagnostic Accuracy Studies version 2 (QUADAS-2) tool was used to assess the risk of bias.25 The following items were evaluated: patient selection, interpretation of the index test, appropriateness and interpretation of the reference standard, flow of patients and timing of tests. The applicability of each study to the question under review was also assessed to consider whether the procedures employed in a study would differ significantly from those employed in real clinical practice. Each item was scored as low risk of bias, unclear risk of bias or high risk of bias. Two reviewers (HSC and JF) assessed the risk of bias.
Studies identified to pose a high risk of bias were not excluded from the meta-analysis, but the findings were instead interpreted in light of the bias, which is in keeping with good practice.
For each study, sensitivity, specificity, prevalence, PPV, NPV and the numbers of TP, FP, FN and TN cases were recorded. If a study did not report the TP, FP, FN and TN counts directly, they were calculated from the reported data using the following formulae and entered into a 2 × 2 table: sensitivity = TP/(TP + FN), specificity = TN/(FP + TN), and AKI + non-AKI = TP + FP + TN + FN.
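The back-calculation described above can be illustrated with a short sketch. It assumes that the numbers of AKI and non-AKI patients are known for the study; the function names and the example figures are hypothetical and are not taken from any included study. Rounding to whole patients is an assumption of this sketch, since reported sensitivities and specificities are usually rounded themselves.

```python
def reconstruct_two_by_two(sensitivity, specificity, n_aki, n_non_aki):
    """Recover the 2 x 2 table from sensitivity, specificity and group sizes."""
    # sensitivity = TP / (TP + FN), where TP + FN is the number of AKI patients
    tp = round(sensitivity * n_aki)
    fn = n_aki - tp
    # specificity = TN / (FP + TN), where FP + TN is the number of non-AKI patients
    tn = round(specificity * n_non_aki)
    fp = n_non_aki - tn
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


def predictive_values(cells):
    """Positive and negative predictive values from a 2 x 2 table."""
    ppv = cells["TP"] / (cells["TP"] + cells["FP"])
    npv = cells["TN"] / (cells["TN"] + cells["FN"])
    return ppv, npv


# Hypothetical study: 40 AKI and 120 non-AKI patients,
# reported sensitivity 0.80 and specificity 0.75.
cells = reconstruct_two_by_two(0.80, 0.75, n_aki=40, n_non_aki=120)
ppv, npv = predictive_values(cells)
print(cells, round(ppv, 3), round(npv, 3))
```

The four cells always sum to the total cohort size (AKI + non-AKI), which provides a simple consistency check on the reconstructed table.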
The diagnostic data were entered into Review Manager software (RevMan version 5.4, Nordic Cochrane Centre, Copenhagen) to generate forest plots of sensitivity and specificity. The odds ratio was used for the synthesis and presentation of results.
To estimate the summary values for sensitivity and specificity, and their 95% confidence and prediction regions, a random-effects meta-analysis, with the odds ratio (OR) as the effect measure, was performed using the hierarchical summary ROC (hSROC) model implemented in the METANDI command of STATA® software version 16.1 (StataCorp LP, College Station, TX, USA). This model is described in the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy for comparisons of test accuracy when there is variability in threshold between studies.26 It is preferred because it takes into account both sensitivity and specificity and the correlation between them, assumes that thresholds vary between studies, and incorporates variability both within and between studies. In the presence of heterogeneity, which was expected in this review, a random-effects meta-analysis weights the studies relatively more equally than a fixed-effect analysis.
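The statement that a random-effects analysis weights studies more equally can be illustrated with generic inverse-variance weights. The sketch below is a simplified univariate illustration only; it is not the bivariate hSROC model fitted by METANDI, and the within-study variances and the between-study variance (tau squared) are hypothetical values chosen for the example.

```python
def normalised_weights(variances, tau_squared=0.0):
    """Inverse-variance weights, scaled to sum to 1.

    tau_squared = 0 gives fixed-effect weights; a positive between-study
    variance gives random-effects weights.
    """
    raw = [1.0 / (v + tau_squared) for v in variances]
    total = sum(raw)
    return [w / total for w in raw]


# Hypothetical within-study variances: one large, one medium, one small study.
variances = [0.05, 0.20, 0.80]

fixed = normalised_weights(variances)
random_effects = normalised_weights(variances, tau_squared=0.5)

for label, weights in (("fixed", fixed), ("random", random_effects)):
    print(label, [round(w, 3) for w in weights])
```

Because the same tau squared is added to every study's variance, the largest study loses relative weight and the smallest gains it, which narrows the spread of weights.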
In accordance with the STATA requirements, meta-analyses were performed only when data from four or more studies were available. For studies that reported multiple time points, each was assigned to a group based on the timing of the pNGAL sample in relation to the cessation of CPB. These groups were <4 h, 4-8 h, 12 h and 24 h post-cessation of CPB. A separate meta-analysis was performed for each time point. Subgroup meta-analysis was also performed based on whether a point-of-care test (POCT) or a laboratory-based method was used.
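The grouping rule and the four-study minimum can be sketched as follows. The handling of boundary values (for example, whether a sample at exactly 8 h falls in the 4-8 h group) and of sampling times that match none of the four groups is an assumption of this sketch, as are the study identifiers used in the example.

```python
from collections import defaultdict

MIN_STUDIES = 4  # minimum number of studies required for a meta-analysis


def timepoint_group(hours_after_cpb):
    """Assign a sampling time (hours after cessation of CPB) to a group."""
    if hours_after_cpb < 0:
        return None
    if hours_after_cpb < 4:
        return "<4 h"
    if hours_after_cpb <= 8:  # assumption: 4 h and 8 h are both inclusive
        return "4-8 h"
    if hours_after_cpb == 12:
        return "12 h"
    if hours_after_cpb == 24:
        return "24 h"
    return None  # assumption: other time points are left ungrouped


def groups_eligible_for_pooling(records):
    """Return the groups with data from at least MIN_STUDIES distinct studies."""
    grouped = defaultdict(set)
    for study_id, hours in records:
        group = timepoint_group(hours)
        if group is not None:
            grouped[group].add(study_id)
    return {g: sorted(ids) for g, ids in grouped.items() if len(ids) >= MIN_STUDIES}


# Hypothetical (study, hours after CPB) pairs.
records = [("A", 2), ("B", 0), ("C", 3), ("D", 1), ("A", 6), ("B", 12), ("C", 24)]
print(groups_eligible_for_pooling(records))
```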
Heterogeneity was assessed by visual inspection of the forest plots and of the size of the prediction region in the hSROC plots. The I² statistic (index of variability) and the chi-squared test statistic were used to approximate the proportion of total variability in point estimates that could be attributed to heterogeneity, although the limitations of this approach are discussed. An I² >50% with a p-value <0.05 from the chi-squared test was considered indicative of moderate heterogeneity.27
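For reference, the I² statistic is conventionally derived from Cochran's Q (the chi-squared heterogeneity statistic) as I² = 100% × (Q − df)/Q, truncated at zero, where df is the number of studies minus one. The sketch below shows this standard univariate calculation; the function names and input values are hypothetical, and it does not reproduce the exact computation performed by the review software.

```python
def cochran_q(estimates, variances):
    """Cochran's Q: weighted sum of squared deviations from the pooled estimate."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))


def i_squared(q, n_studies):
    """I² as a percentage, truncated at 0 when Q does not exceed its degrees of freedom."""
    df = n_studies - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical study-level estimates (e.g. log odds ratios) and their variances.
estimates = [1.2, 2.5, 0.4, 3.1, 1.8]
variances = [0.10, 0.15, 0.20, 0.12, 0.25]

q = cochran_q(estimates, variances)
print(round(q, 2), round(i_squared(q, len(estimates)), 1))
```

The accompanying p-value comes from comparing Q with a chi-squared distribution on df degrees of freedom; it is omitted here to keep the sketch free of external dependencies.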
Studies identified as having a high risk of bias were removed one at a time from the meta-analysis to determine their effect on the summary points and heterogeneity.