Early warning score using electronically-held data for patients discharged from intensive care units

Rationale. Intensive care units (ICUs) admit the most severely ill patients. Once these patients are discharged from the ICU to a step-down ward, they continue to have their vital signs monitored by nursing staff, with early warning score (EWS) systems being used to identify those at risk of deterioration. Objectives. We report the development and validation of an enhanced continuous scoring system for predicting adverse events, which combines vital signs measured routinely on acute care wards (as used by most EWSs) with a risk score of a future adverse event calculated on discharge from ICU. Methods. A modified Delphi process identified candidate variables commonly collected and stored in electronic records as the basis for a 'static' score of the patient's condition immediately after discharge from the ICU. L1-regularised logistic regression was used to estimate the in-hospital risk of a future adverse event. We then constructed a model of physiological normality using vital-sign data from the day of hospital discharge, which is combined with the static score and used continuously to quantify and update the patient's risk of deterioration throughout their hospital stay. Data from two NHS Foundation Trusts (UK) were used to develop and (externally) validate the model. Results. A total of 12,394 vital-sign measurements were acquired from the development cohort and 4,831 from the validation cohort. External validation of our model yielded an area under the receiver operating characteristic curve (AUROC) of 0.724 for predicting ICU re-admission or in-hospital death within 24h. It showed improved performance with respect to other competitive risk scoring systems, including the National EWS (NEWS, 0.653). Conclusions. We showed that a scoring system incorporating data from a patient's stay in ICU has better performance than commonly-used EWS systems based on vital signs alone.


Introduction
Patients hospitalised for acute conditions can suffer adverse events such as cardiac arrests or unplanned admission to a higher-acuity area. These events are usually preceded by changes in vital signs some hours before (1)(2)(3)(4)(5). Early detection of these changes might prevent some of the subsequent adverse events, so many simple Early Warning Score (EWS) systems have been developed and deployed clinically (6)(7)(8). EWS systems are now recommended to be used as part of routine care by the UK National Institute for Health and Clinical Excellence (9)(10)(11) and are widely used in other healthcare settings.
EWS systems initially used purpose-designed paper charts (12,13). An increasing number of healthcare providers now use electronic systems (14,15) to gather physiological data and calculate scores (16). With the increasing use of electronic patient records, there is scope to develop improved EWS systems that include many more variables than are currently used, potentially increasing predictive accuracy by using more complex algorithms to calculate scores (17).
We hypothesise that an EWS system could better predict adverse events if it included variables from a patient's electronic medical record. As a proof of concept, we studied patients discharged to acute care wards from intensive care units (ICUs). These patients have a detailed electronic patient record (EPR) containing data about their ICU stay, and typically have a high rate of adverse events after discharge to a ward. In addition, there is a large body of literature describing variables that are correlated with adverse events in these patients.
This study describes the development of an enhanced scoring system for predicting adverse events in patients discharged from ICUs which combines (i) routine vital signs measured on acute care wards (as used by most EWS systems) with (ii) a risk score of a future adverse event calculated on discharge from ICU. The study further validates the ability of the proposed scoring system and other published EWS systems to identify patients at risk of death or re-admission to ICU.

Materials and Methods
This study is reported in line with the TRIPOD statement (18).

Regulatory approvals
The project had UK Health Research Authority Confidentiality Advisory Group approval (ref:

Data sources
Data from two NHS Foundation Trusts (UK) were used in this study. The Oxford University Hospitals NHS Foundation Trust (OUH) has two adult general ICUs. The Royal Berkshire Hospital NHS Foundation Trust (RBH) in Reading, UK, has a single adult general ICU.
We used routinely-collected data, stored in the ICU computerised information systems (CIS). The OUH and RBH CIS contain all measurements of physiological status, and other relevant clinical information such as patient demographics; details of treatments and interventions; and laboratory test results recorded during a patient's ICU stay. An average of 655 data items are recorded daily for each patient.
All the ICUs in this study use a Philips Healthcare CIS (Philips Healthcare, Eindhoven, Netherlands). In addition we used the linked datasets submitted to the UK national comparative audit for ICUs, or ICNARC CMP (19).
During the study period, neither organisation had electronic recording of vital signs for patients in acute care wards outside the ICUs. Vital-sign data were collected prospectively for recruited patients in both organisations for the first 14 days after discharge from ICU. The physiological data from each patient's paper observation chart were entered into an electronic database. Data were double-entered in 55% of patients to confirm the accuracy of data entry.

Participants
All completed adult admissions to OUH ICUs from June 2006 to December 2015 and to the RBH ICU from April 2010 to December 2015 were used to develop the ICU discharge score. Admissions were only considered when (i) patients were discharged alive from ICU and (ii) the discharge status of the patient episode (or hospital admission) was known (or recorded). We excluded patients discharged from ICU for palliative care or transferred to another organisation. Vital-sign data on wards were acquired prospectively after discharge from ICU for patients in OUH between April 2013 and December 2014, and in RBH between October 2013 and December 2014.
After exclusion criteria were applied, we split the database into "development" and "validation" sets by organisation, such that (1) we developed and evaluated the model using data from all valid admissions within the OUH group using cross-validation methods, and (2) we performed an external validation of the model using data from all valid admissions within the RBH group.

Outcomes
The primary outcome was the first occurrence of either in-hospital death or readmission to the ICU. For the evaluation of EWS systems, we considered the compound outcome of in-hospital death or ICU re-admission within the next N hours of a vital-sign observation, in line with previous studies (20, 21). For the primary outcome, we set N = 24 hours. We have also evaluated the systems for different values of N (as detailed below). Secondary outcomes were in-hospital death or readmission to ICU, individually.
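The observation-level outcome described above can be illustrated with a minimal sketch. It assumes that each admission has at most one recorded first adverse event time and that an observation counts as positive when that event falls within N hours after it; the function name `label_observations` and the example timestamps are hypothetical and not taken from the study.

```python
from datetime import datetime, timedelta

def label_observations(obs_times, event_time, horizon_hours=24):
    """Label each observation 1 if the first adverse event follows within the horizon."""
    horizon = timedelta(hours=horizon_hours)
    labels = []
    for t in obs_times:
        # event_time is None when the admission had no death or ICU re-admission
        followed = (event_time is not None
                    and timedelta(0) <= event_time - t <= horizon)
        labels.append(1 if followed else 0)
    return labels

# Hypothetical ward observations and a hypothetical ICU re-admission time
obs = [datetime(2014, 1, 1, 8), datetime(2014, 1, 1, 14), datetime(2014, 1, 2, 14)]
event = datetime(2014, 1, 2, 20)

labels_24h = label_observations(obs, event, horizon_hours=24)
labels_48h = label_observations(obs, event, horizon_hours=48)
print(labels_24h, labels_48h)
```

Widening the horizon turns earlier observations into positives, which is why the systems are evaluated separately for each value of N.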

Predictors
We used a conceptual model in which we estimated the risk of a patient experiencing an adverse event at the point of discharge from ICU from variables recorded during their stay in the ICU. After discharge from ICU, the patient's risk of experiencing an adverse event within the following 24h was calculated using both ward-recorded vital signs and the risk calculated at ICU discharge.

ICU-based feature representation:
We used an evidence-based technique to select candidate variables to calculate the risk of deterioration at ICU discharge. We systematically reviewed studies reporting a significant (p < 0.05) association between a variable recorded during an ICU stay and either in-hospital death or ICU re-admission. The resulting list of candidate variables was reviewed by a panel of five clinical experts in a modified Delphi process who added other variables they expected to be predictive of adverse events. Candidate variables were then limited to those available in our electronic databases. These were either "static" variables (mainly based on demographic information) or time-varying variables recorded repeatedly throughout the patient's stay in the ICU (see Figure 1). To determine the risk of the future compound outcome after discharge from the ICU, we derived a total of 161 candidate features from all candidate variables, which were then used for building a prediction model.
Post-ICU feature representation:
All vital-sign observations recorded for 14 days after discharge from ICU were considered for analysis. Each set of vital signs includes heart rate, systolic blood pressure, respiratory rate, body temperature, neurological status assessment using the Alert-Verbal-Painful-Unresponsive (AVPU) scale, peripheral oxygen saturation (SpO2), a record of whether the patient was receiving supplemental oxygen at the time of the SpO2 measurement, and the date and time of the observation. Vital-sign measurements are typically recorded every 4 or 6h throughout the patient's stay on the ward (see Figure 1).
The final list of candidate variables and features, and procedures for pre-processing (including dealing with missing data) are further described [see Additional file 1].

Model development
To develop the risk scoring system, our approach assumes that at each vital-sign observation performed after discharge from ICU, the patient's current condition can be characterised (or represented) by a single risk estimate. Immediately after ICU discharge, the patient is assigned a risk score (RS 1 ) estimated from an ICU-based set of features, which is then updated using the abnormalities in their vital signs recorded during subsequent ward care (RS 2 ).
We therefore built the first model, RS 1 , on the development set, using the features derived from the variables acquired during the patients' stay in the ICU. We used an L1-regularised logistic regression model for predicting the compound outcome. For the second scoring system, RS 2 , we applied a one-class classification method (22) using the vital-sign data recorded after discharge from the ICU, as described previously (23, 24). Further details of the model development are available [see Additional file 1].
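To illustrate why L1 regularisation suits feature selection from a large candidate set, the following is a minimal sketch of L1-regularised logistic regression fitted by proximal gradient descent (soft-thresholding). It is not the implementation used in the study: the solver, the penalty strength `lam`, the step size and the toy data are all assumptions chosen for illustration.

```python
import math

def soft_threshold(x, t):
    """Proximal operator of the L1 penalty: shrink x towards zero by t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_l1_logistic(X, y, lam=0.1, step=0.1, n_iter=500):
    """Minimise mean log-loss + lam * ||w||_1 (intercept unpenalised)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * xi[j] / n
        b -= step * grad_b
        # gradient step followed by soft-thresholding sets weak weights to exactly zero
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad_w)]
    return w, b

# Toy data (hypothetical): feature 0 is informative, feature 1 is small noise
X = [[-2.0, 0.1], [-1.0, -0.1], [-0.5, 0.1], [0.5, -0.1], [1.0, 0.1], [2.0, -0.1]]
y = [0, 0, 0, 1, 1, 1]
weights, intercept = fit_l1_logistic(X, y, lam=0.1)
print(weights, intercept)
```

In this toy example the weight of the uninformative feature remains exactly zero, which mirrors how a subset of candidate features can be retained while the others are dropped.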
An overall risk score, the risk score index (RSI), was subsequently determined using a simple time-dependent linear combination of the two constituent risk scores. In this combination, a weighting parameter adjusts the weight of RS 1 with respect to the time since discharge from ICU, and the elapsed time (in hours) since the patient was discharged from ICU is capped at a maximum value. Further details of the model development and optimisation of the parameters are available [see Additional file 1].
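The exact formula and parameter values are not reproduced in this text, so the following is only a sketch of one possible time-dependent linear combination that matches the description. The linearly decaying weight, the parameter names `alpha` and `t_max`, and their default values are assumptions for illustration and should not be read as the published definition of RSI.

```python
def rsi(rs1, rs2, hours_since_discharge, alpha=0.5, t_max=72.0):
    """Hypothetical RSI: weight on RS1 decays linearly to zero at t_max hours."""
    # Cap the elapsed time so that the RS1 weight never becomes negative
    t = min(max(hours_since_discharge, 0.0), t_max)
    w = alpha * (1.0 - t / t_max)
    return w * rs1 + (1.0 - w) * rs2

# Illustrative values only: a high static risk and a low vital-sign risk
at_discharge = rsi(0.8, 0.2, hours_since_discharge=0)
mid_stay = rsi(0.8, 0.2, hours_since_discharge=36)
late_stay = rsi(0.8, 0.2, hours_since_discharge=200)
print(at_discharge, mid_stay, late_stay)
```

Under these assumptions the combined score starts as a blend of both scores and converges to the vital-sign score once the cap is reached.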

Model validation and statistical analysis
The discrimination of the first model, RS 1 , was assessed using the area under the curve of the receiver operating characteristic (AUROC) metric. Calibration was assessed using a goodness of fit test, the Hosmer-Lemeshow "C" statistic, the Brier score and Cox's calibration regression (25-27). The performance of this first model was examined both for the compound outcome and each adverse event (in-hospital death and ICU re-admission) individually. The ability of RS 1 to predict future adverse events at increasing intervals from ICU discharge was also examined by calculating the AUROC for future events by day after (ICU) discharge (up to 120 days).
The final model (RSI) was validated using the AUROC for the compound outcome of in-hospital death or ICU re-admission within the next N hours of a vital-sign measurement recorded after ICU discharge, in line with previous studies for evaluating EWS systems (20, 21). We evaluated the model for different values of N, with N = 12, 24, 36, 48, and 72 hours. We note that in this case the AUROC represents how well the scoring system RSI discriminates between observation sets followed by an adverse event and those with no subsequent adverse outcome within the next N hours. Therefore, the unit of analysis is a vital-sign set rather than a patient-admission, as used for the validation of the first model.
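The observation-level AUROC can be read as the proportion of (event, non-event) observation pairs in which the observation followed by an event receives the higher score, with ties counted as one half. A minimal sketch of that pairwise definition follows; the scores and labels are hypothetical.

```python
def auroc(scores, labels):
    """AUROC as the fraction of positive/negative pairs ranked correctly (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative observation")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical risk scores for five observation sets; label 1 = event within N hours
scores = [0.9, 0.8, 0.4, 0.7, 0.2]
labels = [1, 0, 0, 1, 0]
result = auroc(scores, labels)
print(round(result, 3))
```

The quadratic pairwise loop is used for clarity; a rank-based computation gives the same value more efficiently on large datasets.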
We also considered each individual adverse event separately. To understand better the feasibility of implementing the risk scoring systems in this setting, we also evaluated the burden of observation sets "triggered" by the risk scoring system for every correctly identified observation followed by an adverse event within 24h.
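The trigger burden can be sketched as the number of observation sets at or above a chosen score threshold divided by the number of those that were followed by an adverse event within 24h. The function name `trigger_burden`, the threshold and the example data are hypothetical, and the exact definition used in the study may differ.

```python
def trigger_burden(scores, labels, threshold):
    """Triggered observation sets per correctly identified observation (None if no hits)."""
    triggered = [y for s, y in zip(scores, labels) if s >= threshold]
    true_positives = sum(triggered)
    if true_positives == 0:
        return None  # no observation followed by an event was triggered
    return len(triggered) / true_positives

# Hypothetical scores; label 1 = observation followed by an event within 24h
scores = [0.9, 0.8, 0.4, 0.7, 0.2]
labels = [1, 0, 0, 1, 0]
burden = trigger_burden(scores, labels, threshold=0.5)
print(burden)
```

A value of 1.5, as in this toy example, would mean that three observation sets need review for every two that are truly followed by an event.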
We report the cross-validation results using the development dataset. We also report the external validation results using data from the RBH Trust. Confidence intervals were estimated using the percentile bootstrap method with 500 samples (28).
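A percentile bootstrap interval for the AUROC can be sketched as follows. The sketch assumes that individual observation sets are resampled with replacement and that resamples lacking either class are redrawn; whether the study resampled observations or patients is not stated here, and the seed and example data are hypothetical.

```python
import random

def auroc(scores, labels):
    """Pairwise AUROC with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auroc_ci(scores, labels, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUROC."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # keep only resamples containing both classes
            stats.append(auroc([scores[i] for i in idx], ys))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical scores and 24h outcome labels for ten observation sets
scores = [0.9, 0.8, 0.4, 0.7, 0.2, 0.6, 0.3, 0.85, 0.1, 0.5]
labels = [1, 0, 0, 1, 0, 1, 0, 1, 0, 0]
ci = bootstrap_auroc_ci(scores, labels)
print(ci)
```

Because repeated observations from one patient are correlated, resampling at patient level would be an alternative design choice that typically gives wider intervals.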

Comparison with published risk scoring systems
We compared the performance of our proposed risk scoring index, RSI, with that of each model individually (RS 1 and RS 2 ), and with that of a number of published and clinically-used EWS systems: the modified EWS, or MEWS (29), the standardised EWS, or SEWS (30), the National EWS, or NEWS (21), and our centile-based EWS, or CEWS (31). We detail the components and weightings of the individual EWS systems in Additional file 2.
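As an illustration of how an aggregate EWS is computed from one observation set, the following sketch scores the vital signs using the banding of the original NEWS chart (2012) as commonly reproduced. The thresholds are stated from general knowledge rather than from Additional file 2 and should be checked against that file; the function and parameter names are hypothetical.

```python
import math

def band_score(value, bands):
    """Return the score of the first band whose inclusive upper bound covers value."""
    for upper, score in bands:
        if value <= upper:
            return score
    raise ValueError("value not covered by bands")

# (inclusive upper bound, score) pairs in ascending order; assumed NEWS (2012) banding
RESP_RATE = [(8, 3), (11, 1), (20, 0), (24, 2), (math.inf, 3)]
SPO2 = [(91, 3), (93, 2), (95, 1), (math.inf, 0)]
TEMP_C = [(35.0, 3), (36.0, 1), (38.0, 0), (39.0, 1), (math.inf, 2)]
SYS_BP = [(90, 3), (100, 2), (110, 1), (219, 0), (math.inf, 3)]
HEART_RATE = [(40, 3), (50, 1), (90, 0), (110, 1), (130, 2), (math.inf, 3)]

def news_score(resp_rate, spo2, on_oxygen, temp_c, sys_bp, heart_rate, avpu):
    """Aggregate score for one vital-sign observation set."""
    total = band_score(resp_rate, RESP_RATE)
    total += band_score(spo2, SPO2)
    total += 2 if on_oxygen else 0          # any supplemental oxygen
    total += band_score(temp_c, TEMP_C)
    total += band_score(sys_bp, SYS_BP)
    total += band_score(heart_rate, HEART_RATE)
    total += 0 if avpu == "A" else 3        # V, P or U all score 3
    return total

normal = news_score(16, 98, False, 37.0, 120, 70, "A")
unwell = news_score(22, 93, True, 38.5, 95, 115, "V")
print(normal, unwell)
```

Unlike RSI, a score of this kind depends only on the current observation set and carries no information from the ICU stay.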

Results
The application of inclusion and exclusion criteria to patient-admissions in order to derive the final cohorts in both organisations included in this study is shown in Figure SM3. Table 1 provides a summary of characteristics of the development and validation cohorts derived from the ICU admissions. Both cohorts were similar in terms of demographic and administrative data. The compound outcome rate was slightly higher in the development dataset than that in the validation dataset, considering either all ICU admissions or the sub-group of post-ICU admissions. We also note that the number of in-hospital deaths is higher than the number of re-admissions to ICU. In the development cohort, 12,394 vital-sign observation sets were acquired from the 273 patients after ICU discharge, and 4,831 from the 136 patients in the validation cohort.
For the first model (RS 1), which estimates the risk of future adverse events immediately after ICU discharge, 45 of the 161 candidate features identified from the systematic review and expert opinion were retained in the final model. These are listed in Table SM3-1 [see Additional file 3]. They comprise largely measures of cardiac or respiratory physiology, renal and hepatic function, plasma electrolytes, measures of inflammation and measures of treatment intensity. Of note, artificial ventilation during the last 24h of ICU admission was associated with a lower risk of adverse outcome. This is because all the ICUs admit ventilated, elective, post-surgical patients who have a low risk of adverse events and are discharged within 24h of extubation. Calibration plots of the model for both development and validation sets are shown in Figure SM3-2 [see Additional file 3]. Table 2 summarises the performance for the combined outcome and for either adverse event (in-hospital death or ICU re-admission) considered individually.
Table 3 shows the performance of RSI and other baseline scoring systems for predicting observation sets followed by in-hospital death or ICU re-admission within the following 24h. The proposed scoring system showed better discrimination of adverse events within 24h than the other risk scoring systems considered in this study. Using the external validation dataset (RBH), RSI gave an AUROC of 0.724 (95% CI of 0.704-0.741), versus 0.653 (0.621-0.683) for NEWS, and 0.672 (0.648-0.695) for CEWS. Figure 2 shows the AUROC values of the risk scoring systems for predicting the compound outcome within 12, 24, 36, 48, and 72h of a vital-sign measurement. The proposed RSI system consistently shows superior discrimination for each derived outcome.

Discussion
There are many variants of the original EWS system (6-8), including systems designed for specific patient groups such as children (32, 33) and patients in high-dependency units (34). These systems typically use vital signs to determine the level of risk of an adverse event. Vital signs are used because they are easily-acquired variables, regularly measured in clinical practice, and have a long history of being used to track patients' progression over time. However, point estimates of the risk of future deterioration from single vital-sign measurement sets assume "normal" values that may not be appropriate for a hospital population, and ignore other data that can add to the precision and granularity of the risk estimate. The introduction of electronic recording of vital signs and electronic patient records means that the parsimony and simplicity required by paper-based systems are now less relevant.

Main findings and strengths
This was a proof-of-concept study for developing and validating an enhanced EWS that uses many more electronically held variables than the conventional vital signs, and which combines dynamic and static methods of risk evaluation, as typically used in other prediction disciplines such as meteorology (35) and imaging (36). We chose to study post-ICU patients because they have a detailed electronic record generated during their ICU stay, a high adverse event rate after ICU discharge, and because a significant body of literature exists on variables associated with adverse outcomes. However, the design principles we used in this study can be applied to any acute care patient group where sufficient data are available electronically.
As part of this work we developed a new scoring system for predicting in-hospital mortality and re-admission to ICU from data collected during a patient's ICU stay. A systematic review in 2013 (37) identified seven different published scoring systems that quantify the risk of mortality or re-admission at varying intervals after ICU discharge. However, only two studies verified their systems with an external (independent) validation dataset (38, 39). They obtained AUROC values similar to those found in this study.
RSI is a dynamic system that computes an updated risk estimate every time a new set of vital signs is recorded. It combines this dynamic risk estimate derived from the routinely-measured vital signs on the acute ward (RS 2) with a "static" risk estimate that is computed immediately after discharge from ICU using data from the ICU CIS (RS 1). As the performance of RS 1 is expected to worsen as the time from discharge from ICU increases because patients' conditions change (see Figure SM3-3), we used a time-dependent function to reduce its contribution over time, with the patient's risk score becoming increasingly determined by the vital-sign-based RS 2. The combined risk estimates as given by RSI improved the predictive power of our scoring system when compared with RS 2.
In this study, a compound outcome of in-hospital mortality and ICU re-admission was used as opposed to only using in-hospital mortality (20). Patients discharged from ICU with curative intent may die in hospital in spite of full, timely and appropriate care. They may also die because they do not receive the care they need in a timely fashion. Developing a model using in-hospital mortality as the sole outcome would limit the power of such a model, as appropriately-treated survivors, with a readmission to ICU, would not contribute to the adverse events. Hence, there would be a risk that the resulting model would predict inevitable, rather than preventable, deaths, as also noted by others (40, 41). Our main goal was to develop a system that would identify patients who were likely to respond to earlier intervention of higher acuity care. Whilst ICU re-admission does not capture all the appropriately treated survivors, there is no other marker readily available in hospital electronic records to identify these patients. A similar compound outcome has been used in validation studies of EWSs for the same reason (21, 31, 42). We further note that RSI was considerably better at predicting in-hospital mortality than ICU re-admission (see Table 3). This has also been noted in other studies reporting evaluation of EWS systems (21, 42, 43), where ICU admissions are less reliably identified than in-hospital death.

Limitations
This work has a number of limitations. We studied a very specific group of patients who were at high risk of in-hospital deterioration and who had detailed records of their ICU stay before they were discharged to the ward. We could only study the variables available in the ICU electronic records; hence, some of the candidate variables identified in the systematic review could not be included in the model. In addition, as vital-sign data had to be prospectively collected from consenting patients and transcribed from paper charts, the number of patients in the study, and therefore the number of adverse events, was limited.
We assessed the performance of the combined scoring system using the methodology commonly used to assess that of EWS systems. This uses AUROC measures based on paired vital-sign recordings and events within fixed time periods; i.e., the derived outcome of occurrence of an adverse event within (e.g.) 24h of a vital-sign measurement. Therefore, AUROC values represent the probability that any randomly-chosen observation followed within the chosen time period by in-hospital death or ICU re-admission has a higher risk score than any randomly-chosen observation not followed by an event in the same time period (44). That is, repeated measurements from the same patient were used for evaluating the performance of scoring systems, which assumes that the scores computed from each observation set for that patient are independent (which is the usual approach when evaluating EWS systems). However, this assumption may not hold in practice; i.e., a vital-sign measurement at one point in time is likely to be correlated with previous measurements. This, alongside the highly imbalanced dataset (in which the outcome occurs infrequently), gives rise to AUROCs with high values that are not truly comparable with AUROCs from single predictor/single outcome algorithms where the unit of analysis is a patient-admission (45).
An external (independent) dataset was used to validate the risk scoring systems. RSI did not perform as well as expected for the combined outcome. This was primarily because the combined system over-estimated the risk of adverse events in higher-risk patients, possibly reflecting a difference in the patient population at the two hospital sites (46). The ICUs at the OUH admit tertiary referral patients not seen in RBH; hence, risk associated with these patients captured in the scoring system built with the development dataset may not have added explanatory power for patient admissions in the validation dataset.
Finally, we note that the performance of our scoring system for predicting adverse events exceeds that of previously published EWS algorithms. This should, however, be interpreted with caution, as most of these systems were developed and validated on all hospital admissions to acute care areas, and our system was developed on a very specific population.

Conclusions
Scoring systems, such as EWSs, are used to identify hospitalised patients at risk of adverse events. In this study, we developed a bipartite score based on machine learning that encompasses the patient state at the time of ICU discharge, as well as vital signs recorded on the wards at the time the risk score is calculated. We showed that a scoring system incorporating data from a patient's stay in an ICU has better performance than typically-used EWS systems based on vital signs alone.

Ethical Approval and Consent to participate
The

Consent for publication
Not applicable.

Availability of data and materials
Fully anonymised individual participant data may be made available to researchers directly affiliated to an academic institution or charitable organisation who provide an independently-reviewed, methodologically sound proposal. The data will be available for a period starting six months after publication and ending one year later. Only data required to answer the primary question posed by the proposed research will be made available. Researchers should contact Professor Young in the first instance.
Other data and outputs from the project (Health Innovation Challenge Fund ref: HICF-0510-006) may be available on request.

Competing interests
JDY, PJW, and LT report grants from Wellcome Trust/UK Department of Health. PJW and LT report grants from National Institute for Health Research and personal fees from Sensyne Health, outside the submitted work.

Table 2.
Performance of the first model (RS 1) on both development and validation datasets for the combined outcome (or adverse event) of in-hospital death or readmission to ICU at any point after discharge from ICU. Performance for each event separately is also displayed. Cox's calibration regression: for a good calibration, the intercept should be close to 0, and the slope should be close to 1. AUROC denotes area under the receiver operating characteristic curve, shown with mean (standard deviation, SD).

Table 3.
Area under the receiver operating characteristic curve (AUROC), standard deviation and corresponding 95% confidence interval (CI) for the developed risk scoring index (RSI) and other competitive early warning score (EWS) systems, using adverse event (re-admission to ICU or in-hospital death) within 24h of an observation set as the (compound) outcome. Results are shown for each outcome independently using both development and validation datasets.

Figure 1.
Representation of a set of the variables acquired for an example patient included in our study pre (represented with circles) and post discharge from the ICU. Variables include vital signs (dark grey), laboratory tests (lighter grey) and interventions/treatments (white) performed. The grey vertical line marks the patient's ICU discharge timepoint. A more detailed electronic patient record is "generated" during the patient's ICU stay. We also note the different frequency of measurement of the vital signs in the ICU from that on the ward.

Figure 2.
Performance of the developed risk scoring system (RSI) and the other competitive scoring systems considered for predicting an adverse event (either readmission to ICU or in-hospital death) within 12, 24, 36, 48, and 72h of a vital-sign observation set. The left-hand panel shows the performance on the development dataset, the right-hand panel shows the performance on the validation dataset. The performance is represented with the mean AUROC (area under the receiver operating characteristic curve) values.