This study complied with the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis statement for reporting methods and results.10 Ethical approval for this study was obtained from the Kyoto University (Kyoto, Japan) ethics committee, which waived the requirement for informed consent because the retrospective study posed minimal risk to the patients (Approval ID: R1045).
Study design and source of data
This study was a secondary analysis of the Comprehensive Registry of In-Hospital Intensive Care for Out-of-Hospital Cardiac Arrest Survival (CRITICAL) and Japanese Association for Acute Medicine Out-of-Hospital Cardiac Arrest (JAAM-OHCA) registries, which record data from patients with OHCA transported to critical care medical centres or hospitals with emergency care departments. Prediction models for patients with cardiac arrest who achieved return of spontaneous circulation (ROSC) were previously developed and updated by modifying certain predictors and incorporating 2 additional years of data from the CRITICAL study.9 To develop and update the prediction model, we used data from the CRITICAL study (16 institutions) collected between January 2013 and December 2019. The CRITICAL study is a multicentre prospective registry in Osaka, Japan, of prehospital and in-hospital data related to OHCA treatments, the details of which have been previously reported and described.11
To validate the modified prediction model externally, we used data from 56 institutions in the JAAM-OHCA registry that participated in a protocol to record blood gas levels and detailed laboratory data of patients with OHCA after their arrival at the hospital between January 2014 and December 2019. The JAAM-OHCA registry is a multicentre, prospective registry across Japan, the details of which are described in Supplementary Appendix 1.12 Although the JAAM-OHCA registry encompasses some data from hospitals in the Osaka Prefecture where the CRITICAL study was conducted, these data were excluded from the validation set of this study. Hence, the validation set differed geographically from the derivation set and was not involved in the development of these models.
Population
The inclusion criteria for patients were: adults aged 18 years or older who experienced OHCA, were successfully resuscitated (defined as sustained palpable circulation with a spontaneous heartbeat for more than 30 s), and had been admitted to the intensive care unit (ICU) during the study period.
The exclusion criteria included patients with traumatic cardiac arrest, unknown initial rhythms, collapse witnessed by emergency medical service personnel, and those who did not receive cardiopulmonary resuscitation (CPR) from a physician upon hospital arrival or had no prehospital data.
Outcomes
We designated cerebral performance category (CPC) at 90 days as the neurological outcome measure with category 1 representing good cerebral performance, category 2 representing moderate cerebral disability, category 3 representing severe cerebral disability, category 4 representing coma or vegetative state, and category 5 representing death or brain death.13 A good outcome was defined as a CPC of 1 or 2, while a poor outcome was defined as a CPC of 3, 4, or 5. The treating physician was responsible for evaluating the CPC.
Predictors of outcome and data processing
Relevant candidates for predictive variables that hold clinical significance and can be promptly evaluated after ROSC were identified through a thorough analysis of the literature and expert clinical opinions. These candidates are consistent with those employed in the development of existing models.9 Two prediction models were created, each using a different set of variables. Model 1 consisted of patient demographics, pre-hospital information, and the initial rhythm upon hospital admission, whereas Model 2 included all variables, including in-hospital information at the time of ROSC and laboratory data available within 3 h of ROSC. All potential variables were selected from among those obtained in the hours following ROSC (Supplementary Table 1). Unlike existing models, this study treated the duration of the no-flow time as a continuous variable. Following an assessment of nonlinearity using restricted cubic splines,14 it was determined that a linear relationship with the outcome was a good approximation for continuous predictors, except for low-flow time. Because low-flow time exhibited a linear relationship with the outcome after logarithmic transformation, it was log-transformed and treated as a continuous variable. Continuous variables were standardised, whereas categorical variables were transformed into dummy variables.
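As an illustrative sketch, the preprocessing described above (log-transforming low-flow time, standardising continuous predictors, and dummy-coding categorical ones) might look as follows; the variable names and toy values are hypothetical, not taken from the registries.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for registry predictors
df = pd.DataFrame({
    "age": [62, 75, 58, 81],
    "low_flow_min": [8, 25, 4, 40],              # low-flow time in minutes
    "initial_rhythm": ["VF", "asystole", "PEA", "VF"],
})

# Log-transform low-flow time, then standardise continuous predictors
df["log_low_flow"] = np.log(df["low_flow_min"])
for col in ["age", "log_low_flow"]:
    df[col] = (df[col] - df[col].mean()) / df[col].std(ddof=0)

# Dummy-code the categorical predictor
X = pd.get_dummies(df[["age", "log_low_flow", "initial_rhythm"]],
                   columns=["initial_rhythm"], drop_first=True)
```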
Sample size calculation
Using Riley et al.'s criteria, with an expected R-squared value of 0.15 and an estimated prevalence of 15% for favourable outcomes, a total of 1317 participants were necessary.15 For external validation, over 200 events and non-events are needed to obtain accurate calibration estimates.16 In the current validation set, there were a minimum of 400 events and non-events, ensuring a sufficient sample size. Hence, both the development and validation sets surpassed the required estimates in terms of sample size.
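For intuition, one of Riley et al.'s criteria targets a global shrinkage factor S (typically 0.9) given an anticipated Cox-Snell R-squared. The sketch below implements only that single criterion; the candidate-parameter count p = 20 is an assumption for illustration, as it is not stated here, so the result need not match the paper's 1317.

```python
import math

def n_for_shrinkage(p, r2_cs, shrinkage=0.9):
    """Minimum n so that the expected uniform shrinkage is >= `shrinkage`
    (Riley et al. criterion based on anticipated Cox-Snell R-squared).
    """
    return math.ceil(p / ((shrinkage - 1) * math.log(1 - r2_cs / shrinkage)))

# p = 20 candidate parameters is an assumed value for this sketch
n_required = n_for_shrinkage(p=20, r2_cs=0.15)
```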
Missing values
We employed nonparametric missing value imputation using the "missForest" algorithm with the random forest method as part of our analysis.17 The random forest method is known for generating accurate point estimates by aggregating multiple regression trees through bootstrap aggregation, thereby reducing the risk of overfitting by combining estimates from multiple trees. This method demonstrated superior performance compared with other techniques.18, 19 In our study, 178 (5.3%) and 242 (5.7%) patients in the derivation and validation sets, respectively, were missing the main outcome values (i.e. CPC at 90 days) owing to loss to follow-up. However, CPC data at 30 days were available for these patients. To accurately predict the outcome measure, we used the available 30-day CPC data along with all predictors to impute the missing 90-day CPC data using the missForest algorithm.20 Additionally, any missing predictor values were imputed using all available predictors and outcomes.
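missForest is an R package; a rough Python analogue of its iterative random-forest imputation can be sketched with scikit-learn's IterativeImputer, as below. The toy data are invented for illustration and this is not the study's actual implementation.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor

# Toy continuous data with ~10% values set to missing
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
X[rng.random(X.shape) < 0.1] = np.nan

# Iterative imputation with random-forest base learners, in the spirit
# of missForest: each variable is regressed on the others in turn
imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=50, random_state=0),
    max_iter=5, random_state=0,
)
X_imp = imputer.fit_transform(X)
```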
Statistical analysis
Continuous variables are presented as medians with interquartile ranges (IQR). Categorical variables are presented as numbers and percentages.
All statistical analyses were conducted using R software version 4.2.1 (The R Foundation for Statistical Computing, Vienna, Austria).21 The level of significance was set at a two-sided P-value of less than 0.05.
Model development and internal validation
The model development strategy employed logistic regression with a lasso, which incorporated the L1 norm of the coefficients as a penalty term for the loss function. This approach imposes constraints on the coefficients, effectively selecting important predictors while reducing dimensionality and mitigating potential overfitting.22, 23 The coefficients for the final model were selected by choosing a conservative lambda value that was one standard error greater than the lowest cross-validated value.23, 24 The internal validity of the model was assessed using bootstrapping analysis, in which the data were resampled 1000 times. The confidence intervals (CIs) for the prediction accuracy measures of the models were determined using Harrell's bootstrapping bias correction and optimism correction method. This allowed for the computation of a 95% CI for the C-statistic of the model in the derivation set by utilising the 'predboot' package in R.25
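The lambda "one-standard-error" rule described above can be sketched as follows with L1-penalised logistic regression (note that scikit-learn parameterises the penalty as C = 1/lambda, so the strongest penalty within one standard error of the best cross-validated score corresponds to the smallest admissible C). The synthetic data stand in for the registry predictors; all sizes are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the derivation data
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)

Cs = np.logspace(-2, 2, 20)          # C = 1/lambda
means, ses = [], []
for C in Cs:
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=C)
    scores = cross_val_score(clf, X, y, cv=5, scoring="neg_log_loss")
    means.append(scores.mean())
    ses.append(scores.std(ddof=1) / np.sqrt(len(scores)))

best = int(np.argmax(means))
# Most-penalised model whose mean CV score is within 1 SE of the best
threshold = means[best] - ses[best]
C_1se = min(C for C, m in zip(Cs, means) if m >= threshold)
final_model = LogisticRegression(penalty="l1", solver="liblinear",
                                 C=C_1se).fit(X, y)
```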
External validation
External validation was performed by applying the constructed model to the validation set to assess predictive performance. C-statistics with 95% CIs were used to evaluate discriminatory ability, and Brier scores, defined as the average squared difference between predicted probabilities and observed outcomes, were calculated for each model to measure accuracy, with lower values indicating greater predictive accuracy.26 Model calibration was also assessed visually using a calibration plot of the predicted probability against the observed frequency of poor outcomes in the validation set.27 Additionally, we assessed the level of relatedness of the case-mix between the development and validation cohorts (Supplementary Appendix 2).7 Net-benefit values were also calculated and depicted in decision curves to determine the clinical usefulness of different models at appropriate thresholds for clinical use.28 Decision curve analysis involves plotting the "net-benefit" against "threshold probabilities," evaluating the clinical value of different models at various thresholds.