External validation of simplified out-of-hospital cardiac arrest and cardiac arrest hospital prognosis scores in a Japanese population: a multicentre retrospective cohort study

Background The novel simplified out-of-hospital cardiac arrest (sOHCA) and simplified cardiac arrest hospital prognosis (sCAHP) scores used for prognostication of hospitalised patients have not been externally validated. Therefore, this study aimed to externally validate the sOHCA and sCAHP scores in a Japanese population. Methods We retrospectively analysed data from a prospectively maintained Japanese database (January 2012 to March 2013). We identified adult patients who had been resuscitated and hospitalised after intrinsic out-of-hospital cardiac arrest (OHCA) (n=2428, age ≥18 years). We validated the sOHCA and sCAHP scores with reference to the original scores in predicting 1-month unfavourable neurological outcomes (cerebral performance categories 3–5) based on the discrimination and calibration measures of area under the receiver operating characteristic curves (AUCs) and a Hosmer-Lemeshow goodness-of-fit test with a calibration plot, respectively. Results In total, 1985/2428 (82%) patients had a 1-month unfavourable neurological outcome. The original OHCA, sOHCA, original cardiac arrest hospital prognosis (CAHP) and sCAHP scores were available for 855/2428 (35%), 1359/2428 (56%), 1130/2428 (47%) and 1834/2428 (76%) patients, respectively. The AUCs of simplified scores did not differ significantly from those of the original scores, whereas the AUC of the sCAHP score was significantly higher than that of the sOHCA score (0.88 vs 0.81, p<0.001). The goodness of fit was poor in the sOHCA score (ν=8, χ2=19.1 and Hosmer-Lemeshow test: p=0.014) but not in the sCAHP score (ν=8, χ2=13.5 and Hosmer-Lemeshow test: p=0.10). Conclusion The performances of the original and simplified OHCA and CAHP scores in predicting neurological outcomes in successfully resuscitated OHCA patients were acceptable.
With the highest availability, similar discrimination and good calibration, the sCAHP score has promising potential for clinical implementation, although further validation studies to evaluate its clinical acceptance are necessary.


INTRODUCTION
Out-of-hospital cardiac arrest (OHCA) remains a global medical challenge. 1 2 The overall incidence of OHCA has been reported to be 80-110 cases per 100 000 population, with approximately 275 000 cases in Europe, 350 000 cases in the USA and 110 000 cases in Japan each year. [3][4][5] Survival rates after OHCA are improving, largely because of improvements in early recognition, community response and cardiopulmonary resuscitation (CPR) techniques; however, post-OHCA survival rates remain low and range from 0% to 18%. 6 7 Post-cardiac arrest patients often require long-term and expensive life support, and survivors often have disabling neurological sequelae, with the reported rate for survival with good neurological function ranging from 1.6% to 6.9%. 2 3 8 This situation leads to healthcare burdens on families and caregivers. 1 2 6 Therefore, prognostication of patients is important, as it aids in identifying patients who will benefit most from intensive care, the allocation of healthcare resources and counselling for patients' families. The OHCA score 9 and the cardiac arrest hospital prognosis (CAHP) score 10 were developed to support the prediction of neurological outcomes of patients who were hospitalised after resuscitation following a cardiac arrest using variables recorded immediately after admission. Both OHCA and CAHP scores are calculated using clinical and demographic variables that describe the cardiac arrest and biochemical measures obtained in hospital.
The OHCA and CAHP scores have both been validated. [9][10][11][12] However, the use of the no-flow interval limits their clinical application because estimating the time of collapse is challenging in unwitnessed OHCAs, which account for up to 40% of OHCAs. 13 14 Furthermore, the Utstein template no longer requires a no-flow time recording. 15 To address these limitations, Wang et al developed and validated the simplified OHCA (sOHCA) and simplified CAHP (sCAHP) scores, wherein no-flow time was excluded from the measures, in a prospective cohort of 412 OHCA patients. They reported that both scores had excellent discriminatory performance for predicting neurological outcome after return of spontaneous circulation (ROSC) and were comparable in accuracy to the original scores. 8 External validation is an important step prior to widespread implementation in clinical routine. However, to our knowledge, no study to date has externally validated sOHCA and sCAHP scores. We aimed to externally validate these predictive scores using the Japanese OHCA registry.

METHODS
Study design
This retrospective observational study evaluated data from a 2012 survey of survivors after OHCA in the Kanto region (SOS-KANTO), which was a multicentre observational study in the Kanto region of Japan. 16 Data from adult OHCA patients (≥18 years old) who were hospitalised after successful ROSC were analysed. Cases were excluded if they (1) involved extrinsic aetiology (trauma, burn, hypothermia, hanging, drowning, asphyxiation or drug overdose) or (2) had missing data on 1-month outcomes after cardiac arrest. Patients with extrinsic aetiology were excluded based on the original studies of predictive scores. 8 10 To decrease the uncertainty in our estimates, we included all available data that met the inclusion and exclusion criteria.
This study protocol was approved by the institutional review board of the Tokyo Metropolitan Bokutoh Hospital (approval number: 31-041). All data were deidentified prior to the analysis, and the need for informed consent was waived.

Patient and public involvement
This was a secondary analysis of an existing registry dataset. Patients or the public were not involved in the design, conduct, reporting or dissemination plans of our research.

Data collection and outcome measures
The SOS-KANTO Study comprised 16 452 cardiac arrest patients from 67 emergency medical centres between January 2012 and March 2013. 16 The emergency medical services (EMS) personnel collected data on age, sex, bystander-initiated CPR with or without automated external defibrillator (AED) use, cardiac arrest location, witnessed status, initial cardiac rhythm, various time points (eg, collapse, CPR initiation and ROSC), intravenous epinephrine use and 1-month neurological outcomes, using a form based on the Utstein reporting guidelines. The institutional researchers collected relevant in-hospital information, and neurological outcomes were reported using the following Cerebral Performance Category (CPC) Scale 17 : 1 (good cerebral performance), 2 (moderate cerebral disability), 3 (severe cerebral disability), 4 (coma or vegetative state) and 5 (death). Cardiac arrest was defined as the cessation of cardiac mechanical activities, confirmed by the absence of pulse and normal breathing. Physicians caring for the patients determined the aetiology of the cardiac arrest.
The original OHCA, sOHCA, original CAHP and sCAHP scores were calculated using the formulas detailed in table 1. [8][9][10] The no-flow interval was defined as the time from collapse to CPR initiation and the low-flow interval as the time from CPR initiation to ROSC. The primary outcome was the 1-month unfavourable neurological outcome, defined as a CPC score of 3-5.
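As an illustration of the OHCA formulas in table 1, the following is a minimal Python sketch, not the authors' code. The function and parameter names are hypothetical. The units (minutes for the intervals, µmol/L for creatinine and mmol/L for arterial lactate) are assumptions inferred from the minimum imputed values reported in the statistical analysis. The CAHP formulas are not reproduced in the text and are therefore not sketched.

```python
import math


def simplified_ohca_score(shockable, low_flow_min, creatinine_umol_l, lactate_mmol_l):
    """Simplified OHCA score: the original formula without the no-flow term."""
    score = -13.0 if shockable else 0.0           # initial recorded rhythm VF/VT
    score += 9.0 * math.log(low_flow_min)         # CPR initiation to ROSC
    score -= 1434.0 / creatinine_umol_l
    score += 10.0 * math.log(lactate_mmol_l)
    return score


def original_ohca_score(shockable, no_flow_min, low_flow_min,
                        creatinine_umol_l, lactate_mmol_l):
    """Original OHCA score: simplified score plus 6 x ln(no-flow interval)."""
    simplified = simplified_ohca_score(
        shockable, low_flow_min, creatinine_umol_l, lactate_mmol_l)
    return simplified + 6.0 * math.log(no_flow_min)  # collapse to CPR initiation


def unfavourable_outcome(cpc):
    """Primary outcome: Cerebral Performance Category 3 to 5."""
    return cpc >= 3


# Hypothetical patient, for illustration only.
simplified = simplified_ohca_score(True, 20.0, 100.0, 8.0)
original = original_ohca_score(True, 5.0, 20.0, 100.0, 8.0)
```

The two scores differ only by the no-flow term, which is why the simplified score remains calculable when the time of collapse is unknown.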

Statistical analysis
For descriptive statistics, numerical variables are presented as medians with IQRs, and categorical variables are presented as counts and percentages. We compared two groups using the Mann-Whitney U test for continuous variables and χ 2 or Fisher's exact tests for categorical variables, as appropriate. All statistical analyses were performed using EZR (Saitama Medical Center, Jichi Medical University, Saitama, Japan), 18 a graphical user interface for R (The R Foundation for Statistical Computing, Vienna, Austria) 19 with several add-on packages.
The external validity of the predictive scores was based on measures of discrimination (ability of the model to discriminate between favourable and unfavourable outcomes) and calibration (whether probabilities predicted using the model match observed probabilities). The area under the receiver operating characteristic curve (AUC) and 95% CI were used to assess discrimination. AUC values were compared using the paired DeLong test. We calculated sensitivity and specificity for predicting an unfavourable neurological outcome based on the cut-off value that provided a positive predictive value (PPV) >0.99, which was selected because medical futility has been traditionally defined in the literature as predicting an intervention that provides a <1% possibility of a favourable neurological outcome. 20 Calibration was assessed using the Hosmer-Lemeshow goodness-of-fit test in 10 groups. The extent of overestimation or underestimation relative to the observed and predicted rates was explored graphically using calibration plots.
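The discrimination measures described above can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the EZR/R procedure used in the study: the function names are hypothetical, the AUC is computed as a rank-based concordance probability, the cut-off is taken to be the lowest observed score whose PPV exceeds 0.99, and the toy data are invented. CIs and the paired DeLong test are not shown.

```python
def auc(scores, outcomes):
    """Probability that an unfavourable case (1) scores higher than a favourable one (0)."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)  # ties count half
    return wins / (len(pos) * len(neg))


def lowest_cutoff_for_ppv(scores, outcomes, min_ppv=0.99):
    """Lowest observed score t for which PPV of 'score >= t' exceeds min_ppv, else None."""
    for t in sorted(set(scores)):
        flagged = [y for s, y in zip(scores, outcomes) if s >= t]
        if sum(flagged) / len(flagged) > min_ppv:
            return t
    return None


def sensitivity_specificity(scores, outcomes, cutoff):
    """Sensitivity and specificity for predicting an unfavourable outcome at the cut-off."""
    tp = sum(1 for s, y in zip(scores, outcomes) if s >= cutoff and y == 1)
    fn = sum(1 for s, y in zip(scores, outcomes) if s < cutoff and y == 1)
    tn = sum(1 for s, y in zip(scores, outcomes) if s < cutoff and y == 0)
    fp = sum(1 for s, y in zip(scores, outcomes) if s >= cutoff and y == 0)
    return tp / (tp + fn), tn / (tn + fp)


# Invented toy data: higher score means worse predicted outcome.
toy_scores = [1, 2, 3, 4, 5, 6]
toy_outcomes = [0, 0, 1, 0, 1, 1]
toy_auc = auc(toy_scores, toy_outcomes)
toy_cutoff = lowest_cutoff_for_ppv(toy_scores, toy_outcomes)
toy_sens, toy_spec = sensitivity_specificity(toy_scores, toy_outcomes, toy_cutoff)
```

A cut-off chosen this way favours specificity by design, so low sensitivity at the selected threshold is expected rather than a defect of the score.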
Table 1 Formulas for original and simplified out-of-hospital cardiac arrest (OHCA) and cardiac arrest hospital prognosis (CAHP) scores (with permission)
Score: Formula
Original OHCA score: −13 (if the initial recorded rhythm is VF/VT) + 6 × ln (no-flow interval)* + 9 × ln (low-flow interval)† − 1434 / (creatinine)‡ + 10 × ln (arterial lactate)§
Simplified OHCA score: −13 (if the initial recorded rhythm is VF/VT) + 9 × ln (low-flow interval) − 1434 / (creatinine) + 10 × ln (arterial lactate)
Original CAHP score

In the main analysis, missing values were handled using the pairwise deletion method. Additionally, we performed a sensitivity analysis using imputed datasets. To obtain approximately unbiased estimates of the parameters, we performed multiple imputations to handle missing data in the analyses. After applying the above mentioned inclusion and exclusion criteria, we conducted a multiple regression analysis and generated 20 imputation datasets. For each imputation model, we included age, sex, cardiac arrest at home, witnessed status, bystander CPR, AED use by a bystander, initial shockable cardiac rhythm, response time, no-flow interval, low-flow interval, pH, lactate level, creatinine level, total dose of epinephrine during CPR and a 1-month CPC score from 3 to 5. Constraints for minimum imputed values were introduced to avoid errors in the predictive score calculation. Based on a previous study 9 and minimum values in the dataset, the lowest possible values for the no-flow interval, low-flow interval, lactate and creatinine were determined to be 0.5 min, 0.5 min, 0.5 mmol/L and 10 µmol/L, respectively. Predictive scores were estimated in each imputed dataset and were integrated based on Rubin's rules. 21 All statistical tests were two-tailed, and statistical significance was defined as a P value <0.05 or assessed using 95% CIs.
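The two mechanical steps in the imputation procedure, enforcing the minimum values and pooling per-dataset estimates with Rubin's rules, can be sketched as below. This is a hedged illustration rather than the study's R code: the names are hypothetical, the example estimates are invented, and the text does not state on which scale the estimates were pooled.

```python
# Minimum imputed values stated in the text (intervals in min, lactate in mmol/L,
# creatinine in umol/L). The dictionary keys are hypothetical names.
MIN_VALUES = {
    "no_flow_min": 0.5,
    "low_flow_min": 0.5,
    "lactate_mmol_l": 0.5,
    "creatinine_umol_l": 10.0,
}


def apply_floor(name, value):
    """Raise an imputed value to its floor so ln() and 1/creatinine stay defined."""
    return max(value, MIN_VALUES[name])


def pool_rubin(estimates, variances):
    """Pool m per-dataset estimates and their variances using Rubin's rules."""
    m = len(estimates)
    pooled = sum(estimates) / m
    within = sum(variances) / m                                   # mean within-imputation variance
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)  # between-imputation variance
    total = within + (1 + 1 / m) * between
    return pooled, total


# Invented estimates from three imputed datasets, for illustration only.
pooled_estimate, total_variance = pool_rubin([0.80, 0.82, 0.81], [0.0004, 0.0004, 0.0004])
```

The total variance exceeds the average within-dataset variance whenever the imputed datasets disagree, which is how the uncertainty due to missing data enters the pooled interval.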

RESULTS
The SOS-KANTO 2012 Study evaluated 16 452 patients, and 3335 patients were hospitalised after ROSC. After 907 patients (784 patients with extrinsic aetiology and 123 patients with unknown outcomes) were excluded, 2428 patients were deemed eligible for the analysis (figure 1). Table 2 shows the patients' baseline characteristics and the number of patients with missing values for each variable. The median age was 70 (IQR: 60-80) years, and 68% of the patients were men. Among 2428 patients, 1985 (82%) had unfavourable neurological outcomes 1 month after the OHCA. There were significant differences between patients with complete and incomplete data in terms of age, sex, witnessed status, non-cardiac aetiology, bystander-initiated CPR, call-to-hospital time, low-flow interval, pH, lactate levels, total dose of epinephrine, and CPC scores 3-5 at 1 month after cardiac arrest.

Discrimination of the predictive scores
The AUC of the original and simplified CAHP scores was 0.90 (95% CI 0.87 to 0.92) and 0.88 (95% CI 0.85 to 0.90), respectively. There was no significant difference between the AUCs of these two scores (p=0.47). The AUC was significantly higher for the sCAHP score than for the sOHCA score (p<0.001). The cut-off values based on a PPV >0.99 were as follows: 91 for the original OHCA score, 74 for the sOHCA score, 313 for the original CAHP score and 240 for the sCAHP score. Table 4 summarises the performance of each predictive model with cut-off values based on a PPV >0.99. The cut-off values identified 14/855 (2%) patients with the original OHCA score, 58/1359 (4%) patients with the sOHCA score, 53/1130 (5%) patients with the original CAHP score and 241/1834 (13%) patients with the sCAHP score. For predicting an unfavourable neurological outcome, the sensitivity and specificity of the predictive model with the cut-off values were 0.02 (95% CI 0.01 to 0.03) and 1.00 (95% CI 0.96 to 1.00), 0.05 (95% CI 0.04 to 0.06) and 0.99 (95% CI 0.97 to 1.00), 0.06 (95% CI 0.04 to 0.07) and 1.00 (95% CI 0.97 to 1.00) and 0.15 (95% CI 0.13 to 0.17) and 0.99 (95% CI 0.97 to 0.99) for the original OHCA, sOHCA, original CAHP and sCAHP scores, respectively.
Figure 3 shows the calibration plots, illustrating how the predictive scores performed in predicting the 1-month unfavourable neurological outcome in the study population. The Hosmer-Lemeshow goodness-of-fit test demonstrated that the sOHCA and original CAHP scores had poor model fits (ν=8, χ 2 =19.1 and p=0.014 and ν=8, χ 2 =23.8 and p=0.002, respectively), whereas the original OHCA and sCAHP scores had good model fits (ν=8, χ 2 =11.1 and p=0.196 and ν=8, χ 2 =13.5 and p=0.097, respectively). The calibration plots showed that the sOHCA and original CAHP scores overestimated the probability of the 1-month unfavourable neurological outcome in patients with relatively better prognosis.
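The Hosmer-Lemeshow results above can be related to their p values with a short standard-library sketch. It is illustrative only and is not the study's R implementation. The function names are hypothetical, the grouping uses equal-sized risk-ordered groups as an assumption, and the tail probability uses the closed form that holds for even degrees of freedom (here ν=8 for 10 groups).

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form is valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i          # builds (x/2)^i / i!
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(predicted, observed, groups=10):
    """Return (chi2, df, p) across risk-ordered groups.

    Assumes at least one patient per group and group mean risks strictly
    between 0 and 1; otherwise the denominator is zero.
    """
    pairs = sorted(zip(predicted, observed), key=lambda pair: pair[0])
    n = len(pairs)
    chi2 = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        expected = sum(p for p, _ in chunk)
        events = sum(y for _, y in chunk)
        chi2 += (events - expected) ** 2 / (expected * (1 - expected / size))
    df = groups - 2
    return chi2, df, chi2_sf_even_df(chi2, df)


# p values implied by the chi-square statistics reported in the text (df = 8).
reported = {"original OHCA": 11.1, "sOHCA": 19.1, "original CAHP": 23.8, "sCAHP": 13.5}
p_values = {name: chi2_sf_even_df(stat, 8) for name, stat in reported.items()}
```

Under these assumptions the computed tail probabilities agree with the reported p values to within the rounding of the χ2 statistics.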

DISCUSSION
Our analyses showed that both the original and simplified OHCA scores, as well as the CAHP scores, had similar acceptable discriminatory ability in predicting neurological outcome 1 month after OHCA in a Japanese cohort. Except for the sOHCA and original CAHP scores, these predictive scores calibrated well. Our findings of acceptable discriminatory performance of the original OHCA and CAHP scores were in agreement with those of previous internal and external validations. [9][10][11][12] The result of the sensitivity analysis was largely consistent with these findings. To our knowledge, this is the first study to externally validate the sOHCA and sCAHP scores.
Figure 2 Receiver operating characteristic curve of the predictive scores. Left: receiver operating characteristic curve of the original and simplified out-of-hospital cardiac arrest (OHCA) scores. There were no significant differences in area under the curve between the two scores (p=0.41). Middle: receiver operating characteristic curve of the original and simplified cardiac arrest hospital prognosis (CAHP) scores. There were no significant differences in area under the curve between the two scores (p=0.47). Right: receiver operating characteristic curve of the simplified OHCA and simplified CAHP scores. The area under the curve of the simplified CAHP score was significantly higher than that of the simplified OHCA score (p<0.001).

A clinically useful post-OHCA prognostic score should be calculable in a timely manner, using variables that are routinely collected. Delays in obtaining the results of blood analysis and calculating the score with variable transformation may limit the utility of the prognostic scores in the emergency department. For OHCA patients who are successfully resuscitated and admitted to the intensive care unit, these scores are easily applicable in routine clinical practice. However, these scores were unavailable in more than half of patients registered in our dataset, which may limit the utility of these scores as prognostic scores for use in research using routinely collected data. By removing the no-flow interval from the calculation of these predictive scores, data availability increased 1.6-fold. The proportion of unwitnessed OHCAs in our study (28%) was within the reported range of 11%-40%. [10][11][12][13][14] As no-flow interval data cannot be obtained in these unwitnessed OHCAs, our results showing significantly higher availability of sOHCA scores were to be expected. Besides facilitating higher data availability, a previous study suggested that removing the no-flow interval from the calculation could be advantageous in improving discrimination because it could decrease uncertainty caused through estimating the no-flow interval, especially in an area where the proportion of unwitnessed cardiac arrest is relatively high. 8 In the present study, discrimination did not improve, but it was maintained.
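The 1.6-fold figure follows directly from the availability counts reported in the Results. A small Python check, using only those counts (the variable names are hypothetical), is:

```python
TOTAL_PATIENTS = 2428

# Number of patients for whom each score could be calculated (from the Results).
available = {
    "original OHCA": 855,
    "sOHCA": 1359,
    "original CAHP": 1130,
    "sCAHP": 1834,
}

# Availability as a whole-number percentage of the eligible cohort.
availability_pct = {name: round(100 * n / TOTAL_PATIENTS) for name, n in available.items()}

# Fold increase in availability after removing the no-flow interval.
ohca_fold = available["sOHCA"] / available["original OHCA"]
cahp_fold = available["sCAHP"] / available["original CAHP"]
```

Both ratios round to 1.6, so the stated increase applies to the OHCA and CAHP score pairs alike.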
These results indicate that removing the no-flow interval from the calculation was acceptable in terms of discrimination. A validated predictive score assists clinical decisions and facilitates further clinical research through enabling researchers to compare outcomes between cohorts involving differing demographics and emergency care systems. Furthermore, this score could play a role in the decision-making process that clinicians and families engage in concerning continuing care for a critically ill patient. When the cut-off value that provides a PPV >0.99 was applied to OHCA patients who had ROSC and were admitted to hospital, the proportion of patients considered to have a CPC score of 3-5 was the highest when using the sCAHP score, in which futility was indicated for 13% of the patients. These results suggest that this predictive score facilitates the identification of futility in the care of OHCA patients after successful ROSC at the very beginning of their hospital treatment. We opted for a cut-off value that provided a PPV >0.99, based on a conventional definition of medical futility 20 ; however, there are controversial ethical issues surrounding the futility in relation to medical treatment, and clinical uptake may depend on other factors, including evaluation of acceptability to clinicians and patient groups.
Among the predictive scores assessed, the sCAHP score had the highest availability, and its discrimination and goodness of fit were comparable to or better than those of the other scores. Therefore, we consider that the sCAHP score is the most promising option for clinical implementation. We recognise that prognostication requires caution and should be multimodal whenever possible. Various clinical examinations, electrophysiological measurements, imaging studies and evaluation of biomarkers are undertaken to estimate prognosis after OHCA. 22 Predictive scores may reinforce the accuracy of these predictive factors and vice versa.

Limitations
First, the retrospective study design was prone to various biases. The original OHCA score was developed before the study period; therefore, it is possible that clinicians used that score during the study period, which possibly affected the results. Although the original CAHP score was defined later than the period of the current study, clinicians may have considered some of the factors incorporated into the CAHP score while making their clinical decisions. Therefore, self-fulfilling prophecy is a potential bias. Second, the reason for missing values was not clear. Although we addressed the missing values using the pairwise deletion method, this may have biased our results. Third, a 1-month CPC grade is a standard assessment for functional outcome after cardiac arrest; however, further studies that take long-term functional outcomes into account are warranted because the functional outcome may change over a long period after cardiac arrest. 23 Fourth, our assumption that the variables that are missing in our dataset are representative of the variables that will be unavailable to hospital clinicians was not tested. Fifth, there are no available data regarding the validity of the various measurements in the database. Finally, the dataset we used is relatively old, and outcomes after OHCA are related to both patient demographics and EMS systems. Further validation is desirable before implementing a predictive score into practice.

CONCLUSIONS
We externally validated the original OHCA and CAHP scores and the sOHCA and sCAHP scores. By removing the no-flow interval from the calculations, sOHCA scores increased their availability without significant reduction in discrimination. The sCAHP score has promising potential in predicting the outcome of patients who are successfully resuscitated and hospitalised after OHCA, although further validation studies are necessary.

Figure 3
Calibration plots for the predictive scores in predicting 1-month unfavourable functional outcome. Patients were grouped according to 10 percentile increments of predictive probability of unfavourable neurological outcome (cerebral performance categories 3-5). The predicted probabilities of unfavourable outcome in each group are plotted against the actually observed proportions. The line shows the points where the predicted and observed values match. CAHP, cardiac arrest hospital prognosis; OHCA, out-of-hospital cardiac arrest.