Setting & data
Our cohort of unplanned admissions is drawn from two acute hospitals approximately 65 kilometres apart in the Yorkshire & Humberside region of England – Scarborough Hospital (n ~ 300 beds) and York Hospital (YH) (n ~ 700 beds), both managed by York Teaching Hospitals NHS Foundation Trust. For this study, the two acute hospitals were combined into a single dataset and analysed collectively. Both hospitals record NEWS scores and vital signs electronically as part of the patient’s routine process of care (see Table S1).
We considered all adult (age ≥ 18 years) emergency medical admissions (excluding ambulatory care area patients) discharged (alive or deceased) during a 3-month period (11 March 2020 to 13 June 2020), with an electronic NEWS recorded within ± 24 hours of admission. This on-admission NEWS score is referred to as the index NEWS.
For each emergency admission, we obtained a pseudonymised patient identifier, patient’s age (years), gender (male/female), discharge status (alive/dead), admission and discharge date and time, diagnosis codes based on the 10th revision of the International Statistical Classification of Diseases (ICD-10) [11, 12], NEWS (including its subcomponents: respiratory rate [breaths per minute], temperature [°C], systolic pressure [mmHg], pulse rate [beats per minute], oxygen saturation [%], oxygen supplementation [yes/no], and alertness level [alert, voice, pain, unconscious]) [6, 13], blood test results (albumin [g/L], creatinine [µmol/L], haemoglobin [g/L], potassium [mmol/L], sodium [mmol/L], urea [mmol/L], and white cell count [10⁹ cells/L]), and the Acute Kidney Injury (AKI) score.
Table 1
Four risk scores for predicting the risk of mortality and sepsis, known as computer-aided risk scoring systems (CARSS)
| Computer-Aided Risk (CAR) score | NEWS data only (N) | NEWS and blood test results data (NB) |
| --- | --- | --- |
| Mortality (M) | CARM_N | CARM_NB |
| Sepsis (S) | CARS_N | CARS_NB |
We previously developed and externally validated four risk scores: 1) CARM_N, for predicting in-hospital mortality based on NEWS alone [10]; 2) CARM_NB, for predicting in-hospital mortality incorporating routine blood test results [7]; 3) CARS_N, for predicting sepsis based on NEWS alone [9]; and 4) CARS_NB, for predicting sepsis incorporating routine blood test results [8] (see Table 1). These four equations are collectively known as computer-aided risk scoring systems (CARSS) and are calculated using the index NEWS and blood test results. We excluded records where the index NEWS was not recorded within ± 24 hours of admission, where blood test results were not recorded within ± 96 hours, or where either was missing entirely (see Table S2).
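The record-exclusion rule above can be sketched as a simple timestamp check. This is a minimal illustration, not the study pipeline; the variable names and timestamps are hypothetical.

```python
from datetime import datetime, timedelta

def within_window(admission_time, recorded_time, hours):
    """True if the measurement was recorded within ±hours of admission."""
    return abs(recorded_time - admission_time) <= timedelta(hours=hours)

# Hypothetical admission with NEWS and blood-result timestamps
admission   = datetime(2020, 3, 15, 10, 0)
news_time   = datetime(2020, 3, 15, 20, 30)  # 10.5 h after admission
bloods_time = datetime(2020, 3, 18, 9, 0)    # 71 h after admission

# Retain the record only if the index NEWS falls within ±24 h
# and the blood test results within ±96 h of admission
include = (within_window(admission, news_time, 24)
           and within_window(admission, bloods_time, 96))
print(include)  # True
```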
Records with COVID-19 were identified by searching both primary and secondary ICD-10 codes for ‘U071’. We also linked positive laboratory results for COVID-19 swabs to an automated diagnostic coding entry in the patient electronic health record.
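The code-based identification amounts to a membership check across all diagnosis positions for an admission. A minimal sketch, assuming the codes are held as a list per admission (the example codes are hypothetical):

```python
def has_covid_code(diagnosis_codes):
    """True if ICD-10 code 'U071' appears in any primary or secondary position."""
    return "U071" in diagnosis_codes

print(has_covid_code(["J189", "U071"]))  # True
print(has_covid_code(["J189", "N390"]))  # False
```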
Statistical Analyses
We report discrimination and calibration statistics as performance measures for CARSS [14].
We determined the discrimination of CARSS using the concordance statistic (c-statistic), which gives the probability that a randomly selected patient who experienced the adverse outcome had a higher risk score than a randomly selected patient who did not. For a binary outcome (COVID-19/non-COVID-19), the c-statistic is the area under the Receiver Operating Characteristic (ROC) curve [15]. The ROC curve is a plot of sensitivity (true positive rate) versus 1 − specificity (false positive rate) across consecutive predicted risk thresholds. A c-statistic of 0.5 is no better than tossing a coin, whilst a perfect model has a c-statistic of 1. In general, values less than 0.7 are considered to show poor discrimination, values of 0.7 to 0.8 can be described as reasonable, and values above 0.8 suggest good discrimination [16].
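The probabilistic definition of the c-statistic can be computed directly by comparing risk scores across all outcome/non-outcome pairs. This is an illustrative sketch with made-up scores, not the study's implementation (which used the pROC library):

```python
from itertools import product

def c_statistic(scores, outcomes):
    """Concordance statistic: the probability that a randomly chosen patient
    with the outcome (1) has a higher risk score than a randomly chosen
    patient without it (0); tied scores count as 0.5."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one patient in each outcome class")
    concordant = sum((p > n) + 0.5 * (p == n) for p, n in product(pos, neg))
    return concordant / (len(pos) * len(neg))

# Hypothetical predicted risks and observed outcomes
scores   = [0.9, 0.8, 0.35, 0.3, 0.1]
outcomes = [1,   1,   0,    1,   0]
print(round(c_statistic(scores, outcomes), 3))  # 0.833
```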
Calibration measures a model's ability to generate predictions that are, on average, close to the observed outcomes, and can be readily seen on a scatter plot (y-axis: observed risk; x-axis: predicted risk), where perfect predictions lie on the 45° line. We internally validated and assessed the calibration of all the models using a bootstrapping approach [17, 18]. Overall statistical performance was assessed using the scaled Brier score, which incorporates both discrimination and calibration [14]. The Brier score is the mean squared difference between the observed outcome and the predicted risk of COVID-19; it is scaled by the maximum possible Brier score so that the scaled Brier score ranges from 0 to 100%, with higher values indicating superior models. The 95% confidence interval for the scaled Brier score was calculated using a bootstrap approach [19].
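The scaled Brier score and its percentile bootstrap interval can be sketched as follows. This assumes (as is conventional) that the maximum Brier score is that of a non-informative model predicting the event prevalence for everyone; the score is expressed here as a fraction of 1 rather than a percentage, and the data are hypothetical:

```python
import random

def scaled_brier(probs, outcomes):
    """Scaled Brier score: 1 - Brier / Brier_max, where Brier_max is the
    Brier score of a model predicting the prevalence for every patient."""
    n = len(outcomes)
    brier = sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / n
    prevalence = sum(outcomes) / n
    brier_max = prevalence * (1 - prevalence)
    return 1 - brier / brier_max

def bootstrap_ci(probs, outcomes, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap confidence interval for the scaled Brier score."""
    rng = random.Random(seed)
    n = len(outcomes)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [outcomes[i] for i in idx]
        if 0 < sum(ys) < n:  # skip degenerate resamples with one class only
            stats.append(scaled_brier([probs[i] for i in idx], ys))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]

# Hypothetical predicted risks and observed outcomes
probs = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
outcomes = [1, 1, 0, 1, 0, 0]
print(scaled_brier(probs, outcomes))
print(bootstrap_ci(probs, outcomes))
```

A perfect model scores 1, and a model that merely predicts the prevalence scores 0; poorly calibrated models can score below 0.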
We followed the STROBE guidelines to report the findings [20]. All analyses were undertaken using R [21] and Stata [22]. The 95% confidence interval for the c-statistic was computed using DeLong’s method as implemented in the pROC library [23].