Identifying adults with sepsis, bedside tools versus administrative data: a cohort study

Our objective was to determine how well the bedside severity scores qSOFA, NEWS, and SIRS predict 30-day mortality from onset of infection compared to the Sepsis-3 recommended diagnostic criteria of an increase in SOFA score of ≥ 2 as a consequence of infection. We then assessed the ability of routinely collected administrative data (ICD-10 codes and blood culture sampling) to identify patients with clinical sepsis. The overall purpose is to inform development of a robust proxy measure for sepsis surveillance at scale.


Introduction
Sepsis, redefined in 2016 as "life-threatening organ dysfunction caused by a dysregulated host response to infection", 1 is estimated to cause approximately 5.3 million deaths per year globally. 2 Survivors of sepsis are at increased risk of hospital readmission and long-term functional sequelae. 3,4 Sepsis-related morbidity, mortality and the consequent financial burden 5,6 have ignited an international focus on improving the prevention, recognition, and management of sepsis. 7 Evaluating the impact of changes intended to improve practice requires accurate surveillance of sepsis incidence and outcomes that is stable over time.
Sepsis surveillance has been complicated by both changes in definition and the number of clinical tools used to screen for sepsis. The Sepsis-3 1 consensus guidelines set out diagnostic criteria for sepsis as organ dysfunction, defined as an acute increase in total Sequential [Sepsis-Related] Organ Failure Assessment (SOFA) 8 score of at least two, as a consequence of infection. Using the Sepsis-3 definition in emergency departments and general wards is challenging since SOFA is not routinely used outside of critical care. Furthermore, SOFA is calculated only once per 24 hours, which limits its ability to detect sepsis early, and early detection is crucial to improving patient outcomes. 9 In recognition of the need for a bedside tool to identify patients with infection at risk of poor outcomes, the Sepsis-3 guidelines recommend the use of qSOFA (quick SOFA). 1 Other severity scores commonly in use include the National Early Warning Score (NEWS & NEWS2) 10,11 and Systemic Inflammatory Response Syndrome (SIRS) criteria. 12 However, these scores differ in their ability to predict mortality among people with infection, 13,14 so there is a need to establish how they compare to SOFA, the Sepsis-3 1 gold standard, to inform a clinically relevant, reliable measure for sepsis surveillance at scale.
The development of sepsis surveillance measures using clinical rather than administrative data is considered desirable; 15 however, the national use of electronic health records is not yet widespread even in high income countries. In the meantime, there is a need for an objective and reliable method of identifying people with sepsis at scale. The most commonly used administrative data in tracking epidemiological trends are the International Classification of Diseases, Tenth Revision (ICD-10) codes. 16 The use of ICD-10 codes to track sepsis is controversial due to changes in definition, trends in coding practice and the range of potentially appropriate codes. 17 Recent data for NHS England reported a doubling in the use of sepsis codes (A40, streptococcal sepsis; A41, other sepsis; & R57.2, septic shock) between 2016/17 and 2017/2018. 18 The sudden increase in use of these sepsis codes will be multifactorial, but such large changes in coding practice signal problems with using ICD-10 codes in isolation for sepsis surveillance and risk being misinterpreted as an absolute rise in sepsis incidence. 19 There is therefore a need for a clinically relevant, robust and resilient method to identify patients with sepsis which can be applied at scale, ideally from routinely collected administrative data. 19 The aims of this paper are:
Patients with a blood culture were matched 1:1 to patients who did not have a blood culture taken but were in the same ward within 24 hours of the blood culture sampling and had a similar length of stay (+/-2 days). All matched critical care admissions were included. A random sample of matched ward patients was obtained using a computer-generated list of random numbers.

Collection of Administrative Data
Administrative data were defined as data currently collected nationally which were relevant to sepsis and comprised: blood culture sampling, positive blood cultures and ICD-10 codes for infection and sepsis. The blood cultures were reviewed by a consultant in infectious disease (CM) and categorised as negative, positive or contaminated (Additional file 1).
All ICD-10 codes assigned for each admission were extracted from Trakcare®, the hospital administration system. A list of infection and sepsis codes (Additional file 2), informed by the literature, 20 was constructed, then reviewed and agreed by CM.
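As an illustration of how admissions might be flagged from the extracted codes, the sketch below checks an admission's ICD-10 codes against the three sepsis-explicit codes named in the Introduction (A40, A41 and R57.2). The function names, the input format and the dot-stripping normalisation are assumptions made for this example; the full infection code list in Additional file 2 is not reproduced here.

```python
# Sepsis-explicit ICD-10 codes named in the text (dots removed).
# The infection code list (Additional file 2) is not reproduced in this sketch.
SEPSIS_CODE_PREFIXES = ("A40", "A41", "R572")


def normalise(code):
    """Upper-case a code and drop dots and spaces, e.g. 'a41.9' -> 'A419'."""
    return code.upper().replace(".", "").replace(" ", "")


def has_sepsis_code(admission_codes):
    """Return True if any code for the admission starts with a sepsis prefix."""
    return any(
        normalise(code).startswith(SEPSIS_CODE_PREFIXES)
        for code in admission_codes
    )
```

Prefix matching means that subcodes such as A41.9 are captured, while neighbouring codes such as R57.0 are not.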

Score Calculation
Each patient with a suspected or confirmed source of infection plus antibiotic administration had the following scores calculated: qSOFA, SOFA, NEWS and SIRS. The scores were calculated as per the original tables published elsewhere. 1,8,10,12 Baseline SOFA score for ward patients was assumed to be zero, as per Sepsis-3. 1 A SOFA score of ≥ 2, consequent to infection, was considered to be indicative of sepsis in general ward patients whilst critical care patients required a change of SOFA score of ≥ 2, consequent to infection. A clinical cut-off score of ≥ 2 was applied to qSOFA and SIRS. Performance of NEWS was assessed using both the medium (NEWS ≥ 5) and high (NEWS ≥ 7) clinical risk thresholds. 10

Statistical Analysis
The performance characteristics of the severity scores were assessed for their ability to predict 30-day mortality from the onset of infection using the clinically adopted cut-off scores for each tool. Missing data were treated as normal values in score calculation. The area under the receiver operating characteristic curve (AUROC) and 95% confidence intervals were calculated for each severity score using both the clinically recommended cut-off values and across all values. The difference between the AUROCs across all values was assessed using DeLong's test. 21 The ability of administrative data to identify people with sepsis was assessed by calculation of performance characteristics of variables singly and in combination, and plotting of AUROC against the Sepsis-3 definition of sepsis. It was also assessed against the best performing severity score in predicting mortality: infection plus NEWS ≥ 7.
Descriptive statistics were used to compare differences between groups. Pearson's chi-square test was calculated for categorical variables. The Mann-Whitney U test was applied to assess differences between continuous variables in two groups. The Kruskal-Wallis test was used to assess differences among more than two groups on a single continuous variable. A p value of less than 0.05 was considered significant.
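The scoring rules described under Score Calculation can be sketched as follows. This is a minimal illustration, not the study's code: the function names and arguments are hypothetical, the qSOFA criteria are the published ones (respiratory rate ≥ 22/min, systolic blood pressure ≤ 100 mmHg, altered mentation), and taking GCS ≤ 13 as altered mentation is an assumption of this sketch. Missing observations score zero, reflecting the rule that missing data were treated as normal.

```python
def qsofa(resp_rate=None, systolic_bp=None, gcs=None, gcs_cutoff=13):
    """qSOFA (0-3); a missing (None) observation is treated as normal."""
    points = 0
    if resp_rate is not None and resp_rate >= 22:
        points += 1
    if systolic_bp is not None and systolic_bp <= 100:
        points += 1
    # Assumption: altered mentation is taken as GCS <= gcs_cutoff.
    if gcs is not None and gcs <= gcs_cutoff:
        points += 1
    return points


def meets_sepsis3(sofa, treated_for_infection, critical_care=False, baseline_sofa=0):
    """SOFA rise of >= 2 in a patient treated for infection.

    Ward patients are assumed to have a baseline SOFA of zero, so their
    total score is the rise; critical care patients use the supplied baseline.
    """
    if not treated_for_infection:
        return False
    baseline = baseline_sofa if critical_care else 0
    return (sofa - baseline) >= 2


def flags(qsofa_score, sirs_score, news_score):
    """Apply the clinical cut-offs described in the text."""
    return {
        "qSOFA>=2": qsofa_score >= 2,
        "SIRS>=2": sirs_score >= 2,
        "NEWS>=5": news_score >= 5,
        "NEWS>=7": news_score >= 7,
    }
```

Scoring missing values as normal can only lower a score, so any bias it introduces is towards under-detection.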
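The discrimination measures used in this analysis can be illustrated with a short, dependency-free sketch. It computes sensitivity and specificity at a cut-off, and the AUROC through its rank interpretation: the probability that a randomly chosen patient who died has a higher score than a randomly chosen survivor, with ties counted as one half. The function names and the toy data are hypothetical, and confidence intervals and DeLong's test are not reproduced here.

```python
def sens_spec(scores, died, cutoff):
    """Sensitivity and specificity of the rule `score >= cutoff` for death.

    Assumes at least one death and one survivor are present.
    """
    tp = sum(1 for s, d in zip(scores, died) if d and s >= cutoff)
    fn = sum(1 for s, d in zip(scores, died) if d and s < cutoff)
    tn = sum(1 for s, d in zip(scores, died) if not d and s < cutoff)
    fp = sum(1 for s, d in zip(scores, died) if not d and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def auroc(scores, died):
    """AUROC across all score values via pairwise comparison (ties = 0.5)."""
    pos = [s for s, d in zip(scores, died) if d]
    neg = [s for s, d in zip(scores, died) if not d]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example (invented values, not study data)
news = [2, 8, 5, 6, 3, 7]
died = [False, True, False, True, False, False]
print(sens_spec(news, died, cutoff=7))
print(auroc(news, died))
```

The pairwise form is quadratic in cohort size, which is acceptable for a cohort of a few hundred patients; a rank-based implementation would be preferable for larger data sets.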

Results
During the study period there were 11207 adult admissions ≥ 24hrs (Fig. 1). Of these, 1826 (16.3%) admissions had at least one blood culture sampled and 9381 (83.7%) admissions had no blood cultures.
Admissions with a blood culture were matched to admissions without a blood culture by clinical area at time of blood culture (+/-24hrs) and length of stay (+/-24hrs) to achieve a balanced cohort of patients with and without blood cultures. Due to the small number of critical care admissions the criteria were revised to matching by critical care area at time of blood culture (+/-96hrs) and length of stay (+/-48 hrs). We matched 898 ward admissions with a blood culture to 898 ward admissions without a blood culture (n = 1796). Within the critical care population, 35 blood culture admissions were matched to 35 admissions without a blood culture (n = 70).
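A 1:1 matching of this kind might be sketched as below. This is an illustration under stated assumptions, not the procedure actually used: the record fields (`id`, `area`, `time_h`, `los_h`), the function name and the greedy first-fit strategy are all hypothetical. For admissions without a blood culture, `time_h` stands for a time at which the patient was in that clinical area, which simplifies the criterion of being in the same area within the window.

```python
def match_one_to_one(cases, controls, time_tol_h=24, los_tol_h=24):
    """Greedily pair each case with one unused control.

    A control is eligible if it was in the same clinical area within
    `time_tol_h` hours of the case's blood culture and its length of
    stay differs by at most `los_tol_h` hours.
    """
    used = set()
    pairs = []
    for case in cases:
        for control in controls:
            if control["id"] in used:
                continue
            same_area = control["area"] == case["area"]
            close_in_time = abs(control["time_h"] - case["time_h"]) <= time_tol_h
            similar_los = abs(control["los_h"] - case["los_h"]) <= los_tol_h
            if same_area and close_in_time and similar_los:
                used.add(control["id"])
                pairs.append((case["id"], control["id"]))
                break
    return pairs
```

Under these assumptions, the revised critical care criteria correspond to calling the function with `time_tol_h=96` and `los_tol_h=48`.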
A final cohort of 1000 admissions (500 with and 500 without a blood culture) comprised a simple random sample of 930 index ward admissions (465 with a blood culture and their matched 465 non-blood culture admissions) and all eligible index critical care admissions (n = 70). Twenty-one admissions were excluded due to missing or inaccessible case notes (n = 20) or misclassification (n = 1), of which 4 were matched as critical care admissions and 17 as ward admissions. Their matched admissions were also excluded (n = 21), leaving a final cohort of 958 admissions (479 with and 479 without ≥ 1 blood culture).

Study population
The final cohort sample (n = 958) had a median age of 68 (IQR 52-79) and 54% were female. Of the total cohort, 630 were treated for infection and of those, 269 (42.7%) had sepsis as per Sepsis-3 criteria (see Table 1). People with sepsis were older (p < 0.001), had a longer hospital stay (p < 0.001) and were more likely to die (p < 0.001) compared to those with infection or no infection. Only 35.3% (n = 95) of patients meeting Sepsis-3 criteria had suspected or confirmed sepsis documented contemporaneously in their medical notes.
[...] between sepsis and infection only admissions (Table 2). Compared to the infection group, significantly more patients in the sepsis group had a NEWS ≥ 7 (p = 0.02) or qSOFA ≥ 2 (p < 0.001). The respiratory component of the SOFA score had a high number of missing values (n = 546) because ward patients rarely had an arterial blood gas obtained. Glasgow Coma Scale (GCS), used in qSOFA and SOFA, also had a high number of missing values (n = 467).

Performance characteristics of SOFA, qSOFA, NEWS and SIRS criteria
The performance characteristics of each severity score in predicting 30-day mortality from onset of infection were calculated using the usual clinical cut-off scores.
The performance characteristics of each tool were also calculated separately for ward and critical care patients. Of the 630 patients with either infection or sepsis, 571 were ward admissions and 59 were admitted to critical care. For ward patients (n = 571), NEWS (AUROC 0.80) predicted 30-day mortality better across all values than any other tool (SIRS, 0.71, p = 0.003; qSOFA, 0.71, p = 0.001; SOFA, 0.66, p < 0.001).

Performance characteristics of routine data to identify people with sepsis
Only 26 (9.7%) sepsis admissions had a sepsis ICD-10 code (A40, A41 or R57.2) (Table 4). Coding for infection was more frequent, with 65.1% of sepsis admissions being allocated at least one infection code. More than 80% of infection and sepsis codes allocated to infection and/or sepsis admissions were from three ICD-10 chapters: diseases of the respiratory system (J00-J99), certain infectious and parasitic diseases (A00-B99) and diseases of the genitourinary system (N00-N99).

Discussion
To the best of our knowledge this is the first study which investigates the performance of qSOFA, SIRS, NEWS and SOFA in predicting 30-day mortality of patients in a mixed hospital cohort. We found that the severity scores varied widely in their ability to identify people who met the Sepsis-3 1 criteria. NEWS ≥ 7 was a better predictor of 30-day mortality from onset of infection than SOFA, qSOFA or SIRS in patients with infection. When plotted across all values, however, it did not perform significantly better than SOFA.
Analysis of routinely collected administrative data revealed that only one in ten patients with clinical sepsis had an ICD-10 sepsis code allocated, so these codes performed poorly as a proxy measure for sepsis surveillance. Positive blood cultures also performed poorly; however, blood culture sampling alone demonstrated potential as one component of a robust measure for sepsis surveillance.
While the results of this study are limited by its retrospective, single-centre design, reviewing infection across the whole hospital admission rather than only at admission provides a complete picture of sepsis incidence within the cohort. The use of clinical data to identify people with sepsis avoided the bias of selecting patients on the basis of their ICD-10 codes and enabled assessment of coding patterns.
The high number of missing values for the respiratory component of SOFA, due to the absence of arterial blood gases in ward patients, and poor documentation of the Glasgow Coma Scale (GCS), used in qSOFA and SOFA, may have resulted in SOFA scores which under-represented the true acuity of the ward population. Other studies have also noted GCS as being frequently missing. 22 We considered replacing GCS with AVPU (Alert, Verbal, Pain, Unresponsive); however, although AVPU was more complete, only 6 out of 17 patients identified as having a GCS ≤ 13 had an abnormal AVPU. The limitations of AVPU in identifying people with altered mentation are reflected in its amendment within NEWS2 to ACVPU to include 'new confusion or delirium'. 11
Our findings are consistent with the broader literature demonstrating superior performance of NEWS over qSOFA and SIRS in predicting mortality. 13,23,24,25 Although other studies have reported SOFA as being superior to NEWS in predicting mortality, 14,26 this was not replicated in our data. Indeed, SOFA had poorer performance characteristics in our cohort than reported in other retrospective studies. 27,28 This may be explained by SOFA predicting sepsis-associated mortality less well in the ward population of our cohort and by its high level of missing data.
The suboptimal performance of qSOFA in predicting mortality, particularly in terms of sensitivity, is similar to that reported elsewhere. 29,30,31 Our study also found that, compared to NEWS, qSOFA did not perform well in identifying people who met the Sepsis-3 criteria. These findings add to the body of evidence which challenges the use of qSOFA over NEWS. 13 Similar to other studies, SIRS had a high sensitivity but poor specificity and as such is a poor predictor of those patients with infection at an increased risk of a poor outcome. 13,24
In line with other studies, 20,32 our research found under-coding for sepsis and consequent poor performance of the ICD-10 sepsis codes in identifying patients with sepsis. The growth in coding for sepsis observed in the literature has contributed to reporting of significant increases in documented sepsis incidence. 18,34,35 Such changes compromise the use of sepsis codes as a robust and reliable measure of sepsis incidence and underline the need for accurate sepsis epidemiology based on a combined measure which is resistant to bias. Our finding of a lack of sensitivity in sepsis codes and a lack of specificity in infection codes supports recent concerns about the quality of coding, its impact on epidemiological trends and its potential for misinformed influence on policy and practice. 35
Although no one combination of administrative data variables had an AUROC of > 0.7, the cut-off for acceptable discrimination, this study did identify nationally collected administrative data with better predictive accuracy for sepsis than the sepsis-explicit ICD-10 codes currently used. Blood culture sampling, for example, performed significantly better than ICD-10 sepsis codes in identifying patients meeting the Sepsis-3 criteria. As an administrative data variable, blood culture sampling is unaffected by trends in diagnostic coding but is susceptible to changes in sampling practice.
Further study is required to assess the effect that standardisation of blood culture sampling in sepsis has on its sensitivity and specificity as a measure. Positive blood cultures predictably performed poorly as a surrogate marker for sepsis and, as reported elsewhere, lack the sensitivity required for sepsis surveillance. 36
The development of a stable measure to identify people with sepsis from administrative data would make a significant contribution to both sepsis surveillance and, more broadly, to the decisions policy makers make in relation to resource allocation in a fiscally pressured system. It would also address the concerns raised about potential for over-coding and over-diagnosis. 35 Further investigation and validation of a combined measure is required in a larger, multicentre cohort.

Conclusions
Our results support the use of NEWS ≥ 7 to identify patients with infection at increased risk of death at 30 days. The use of sepsis-explicit ICD-10 codes to identify people with sepsis performed poorly in this cohort. Although complete alignment between routinely stored data and clinical diagnosis is considered unachievable, 37 a measure which is stable and performs well should continue to be pursued. Blood [...]

Cohort selection flow diagram