Development of risk prediction models to predict urine culture growth for patients with suspected urinary tract infection in the emergency department: protocol for an electronic health record study from a single UK hospital

Background: Urinary tract infection (UTI) is a leading cause of hospital admissions and is diagnosed based on urinary symptoms and microbiological cultures. Because culture results can take up to 72 hours to become available, and because routine diagnostics have limitations, many patients with suspected UTI are started on antibiotic treatment unnecessarily. Predictive models based on routinely collected clinical information may help clinicians to rule out a diagnosis of bacterial UTI in low-risk patients shortly after hospital admission, providing additional evidence to guide antibiotic treatment decisions.
Methods: Using electronic hospital records from Queen Elizabeth Hospital Birmingham (QEHB) collected between 2011 and 2017, we aim to develop a series of models that estimate the probability of bacterial UTI at presentation in the emergency department (ED) among individuals with suspected urinary tract infection syndromes. Predictions will be made during ED attendance and at different time points after hospital admission to assess whether predictive performance improves over time as more information about patient status becomes available. All models will be externally validated for expected future performance using QEHB data from 2018/19.

Background
UTI is a leading cause of hospital admissions [1], with a clinical spectrum that ranges from urosepsis and pyelonephritis to mild urinary symptoms, each of which merits different durations of antibiotic treatment or potentially no antibiotics at all [2,3]. The diagnosis of UTI syndromes is based on a combination of symptoms and microbiological culture of urine (bacteriuria) and/or blood (bacteraemia) [4]. Obtaining microbiological results introduces a bottleneck for evidence-based diagnosis, since cultures often take 48-72 hours to grow. In the meantime, patients are often treated with antibiotics. Previous studies have found that up to 50% of such antibiotic use is unnecessary [5][6][7]. A wide range of additional information is collected as part of routine hospital care, which may provide an opportunity to reduce the diagnostic uncertainty introduced by the delay in culture results. Stored within electronic health records (EHR), these auxiliary data may help to create risk prediction models that can be used to predict the likely culture result and identify patients who are highly unlikely to have bacterial UTI.

We are aware of very few studies that have used routine health data to predict bacteriuria in emergency department (ED) settings [8,9]. In a recent study, Taylor […]

In this study, we will expand on previously published work [8,9] and develop models that estimate the probability of bacterial UTI in UK patients who present with suspected UTI in the ED. The models will be developed and tested using data on individuals presenting to the ED at Queen Elizabeth Hospital Birmingham (QEHB). The EHR system at QEHB is ideally suited to this purpose, containing high-quality and detailed information on diagnoses, outcomes, investigations, vital signs, drug treatments and diagnostic coding dating back to 2011 [15]. Using these hospital records, our models aim to predict the probability that urinary pathogens will grow in urine and/or blood cultures collected during ED attendance. For admitted patients, additional predictions will be made at specific intervals throughout the first three days of their hospital stay to investigate whether information gathered during the inpatient stay, but before culture results are available, allows culture growth to be predicted with greater certainty. Finally, we will explore differences in model performance and clinical progression for important subpopulations, including the elderly and patients with a recorded alternative infective syndrome (e.g. pneumonia) at arrival or discharge, who do not require antibiotics for UTI but may need them to treat the other infection.

To develop the predictive models, we will use data from all eligible patients who attended the emergency department at QEHB between 1st November 2011 and 31st December 2017 (electronic recording of ED diagnoses at QEHB started after a system change at the end of October 2011).

Validation dataset
We will use data collected at QEHB between 1st January 2018 and 31st March 2019 to externally validate the model. Patients who were included in the development dataset due to an earlier attendance will be excluded from the validation dataset. We will further seek opportunities to undertake external validation in datasets from other hospitals, such as Heartlands Hospital (part of University Hospitals Birmingham NHS Foundation Trust) and University College London Hospitals NHS Foundation Trust.

All patients who attended the ED at QEHB within the study period and who had a urine sample submitted for microbiological testing within 24 hours of arrival are eligible for inclusion in the study. A window of 24 hours was chosen to account for discrepancies between when the sample was collected and when it was recorded in the laboratory system (particularly overnight). Patients enter the study at registration in the ED and exit the study at the earliest of the following: discharge, death, transfer to a different hospital, or availability of urine culture results.

Individuals aged <18 years, pregnant women, patients who were not admitted via the ED, and patients whose urine sample was submitted for culture but was not cultured due to standard laboratory protocols at QEHB (see Outcome section for details) will be excluded from the analysis.

The principal outcome of interest is microbiological growth (≥10⁴ colony-forming units (CFU)/mL). Only urine samples that were eventually cultured will be included in the analysis. […] procedures (UK Standards for Microbiology Investigations SMI B41: Investigation of urine; SMI B37: Investigation of blood cultures (for organisms other than Mycobacterium species)) [17]. The decision whether to culture a urine sample depends on cell counts performed in the laboratory.
Only urine samples with a white blood cell count or a bacterial count above a threshold value were cultured. At the start of the study, the threshold for proceeding to culture was a white cell count >40/µL or a bacterial count >4000/µL. Following the introduction of a revised standard operating procedure in the microbiology laboratory in October 2015, this was adjusted to a white cell count >80/µL or a bacterial count >8000/µL. Cell counts cannot be performed for urine samples of less than 4 mL or for samples too viscous to pass through the instrument; such samples are always cultured and are included in the analysis. Following standard procedure at QEHB, (heavy) mixed growth in the urine sample will be considered contamination, except where E. coli is present. In addition, samples with <10⁴ CFU/mL will be classified as positive if the same urinary pathogen is identified in a blood culture, implying urosepsis.

Predictors
We will consider a wide range of candidate predictors relating to characteristics of the urine sample, to a patient's clinical presentation at the start of and throughout the hospital stay, and to risk factors encoded in a patient's medical history (Table 1).
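
The eligibility, culture-screening and outcome rules described in the preceding paragraphs can be sketched as plain functions. This is a minimal, non-authoritative Python sketch. The field names and organism labels are hypothetical. The exact changeover date is assumed to be 1 October 2015, because the protocol gives only "October 2015". The order in which the mixed-growth and urosepsis rules are applied is one possible reading of the text.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Optional

# Assumption: the protocol only says "October 2015"; 1 October is a placeholder.
SOP_CHANGE = date(2015, 10, 1)
SIGNIFICANT_CFU_PER_ML = 10 ** 4


def is_eligible(age_years: float, pregnant: bool, admitted_via_ed: bool,
                hours_arrival_to_lab: float) -> bool:
    """Adults, not pregnant, ED route, urine logged by the lab within 24 h of arrival."""
    return (age_years >= 18 and not pregnant and admitted_via_ed
            and 0 <= hours_arrival_to_lab <= 24)


def should_culture(sample_date: date, wbc_per_ul: Optional[float],
                   bacteria_per_ul: Optional[float]) -> bool:
    """Laboratory screening rule; None means cell counts could not be performed."""
    if wbc_per_ul is None or bacteria_per_ul is None:
        return True  # <4 mL or too viscous: always cultured
    wbc_cut, bact_cut = (40, 4000) if sample_date < SOP_CHANGE else (80, 8000)
    return wbc_per_ul > wbc_cut or bacteria_per_ul > bact_cut


@dataclass(frozen=True)
class CultureResult:
    cfu_per_ml: float                  # colony-forming units per mL in urine
    mixed_growth: bool                 # (heavy) mixed growth reported
    urine_organisms: FrozenSet[str]    # hypothetical organism labels
    blood_organisms: FrozenSet[str]    # urinary pathogens grown from blood


def is_positive(result: CultureResult) -> bool:
    """Outcome label under one reading of the protocol's rules."""
    # Mixed growth is treated as contamination unless E. coli is present.
    if result.mixed_growth and "E. coli" not in result.urine_organisms:
        return False
    if result.cfu_per_ml >= SIGNIFICANT_CFU_PER_ML:
        return True
    # Below 10^4 CFU/mL, the same pathogen in blood implies urosepsis.
    return bool(result.urine_organisms & result.blood_organisms)
```

For example, `is_positive(CultureResult(5e3, False, frozenset({"Klebsiella"}), frozenset({"Klebsiella"})))` returns True under these rules, because the sub-threshold urine isolate is matched in blood.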

Feature engineering and selection
All continuous predictors will be winsorized at the 1st and 99th percentiles to account for outliers and normalised to lie within the range (0, 1]. Categorical predictors will be encoded using a full-rank encoding, with levels that contain few cases (<5%) combined. Predictors with zero variance will be excluded before analysis. For highly correlated predictors (Spearman's rank correlation coefficient > 0.9), one predictor of each pair will be removed before analysis based on clinical judgement. Similarly, predictors that are largely missing, and thus might not be expected to be available when the model is used in practice at QEHB, will be removed before fitting the models.
We will consider fractional polynomials with up to four degrees of freedom (i.e. 2 fractional polynomial terms) for each numerical predictor [20,21]. Once the best-fitting fractional polynomials have been determined, we will consider models with all predictors as well as parsimonious models selected via backward elimination based on Wald statistics pooled across imputed datasets using Rubin's rules [22].
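
The preprocessing and fractional polynomial steps can be illustrated with the sketch below. It winsorizes a predictor at its 1st and 99th percentiles, rescales it to (0, 1] and generates FP2 terms. It assumes the conventional FP power set {-2, -1, -0.5, 0, 0.5, 1, 2, 3}, where power 0 means log(x) and a repeated power p contributes both x^p and x^p·log(x). The offset `eps`, which keeps the minimum strictly above 0, is a hypothetical choice and is not stated in the protocol. Rescaling to (0, 1] matters here because log(x) and negative powers are undefined at 0.

```python
import math
import statistics
from itertools import combinations_with_replacement

# Conventional fractional polynomial power set; 0 denotes log(x).
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def winsorize(values, lower_pct=1, upper_pct=99):
    """Clip values to their lower/upper percentiles (needs at least 2 values)."""
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    lo, hi = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, lo), hi) for v in values]


def scale_to_unit_interval(values, eps=1e-3):
    """Map values into (0, 1]; eps (hypothetical) keeps the minimum above 0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo + eps) / (span + eps) for v in values]


def fp_power(x, p):
    """Single FP transformation, with the convention x^0 := log(x)."""
    return math.log(x) if p == 0 else x ** p


def fp2_terms(xs, p1, p2):
    """Two FP2 columns; a repeated power p gives x^p and x^p * log(x)."""
    first = [fp_power(x, p1) for x in xs]
    if p1 == p2:
        second = [fp_power(x, p2) * math.log(x) for x in xs]
    else:
        second = [fp_power(x, p2) for x in xs]
    return first, second


def candidate_fp2_powers():
    """All 36 FP2 power pairs (p1 <= p2) to compare, e.g. by deviance."""
    return list(combinations_with_replacement(FP_POWERS, 2))
```

Each of the 36 power pairs would be fitted and compared. The pair with the best fit is then carried forward into model selection.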

Baseline model in the ED
We will first develop a multivariable logistic regression model to predict bacterial growth in the urine and/or blood sample at the end of ED attendance. A prediction will be made for each patient based on the fitted value; this model will serve as a baseline comparison for all further models considered.
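
A multivariable logistic regression baseline of this kind can be sketched without external libraries. The sketch maximises the log-likelihood with batch gradient ascent. It is an illustrative stand-in only; a real analysis would use a standard GLM routine. The learning rate and iteration count are arbitrary assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def linear_predictor(beta, x):
    """beta[0] is the intercept; beta[1:] pair with the predictors in x."""
    return beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))


def fit_logistic(X, y, lr=0.5, n_iter=2000):
    """Unpenalised maximum likelihood via batch gradient ascent."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = yi - sigmoid(linear_predictor(beta, xi))
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def predict_proba(beta, X):
    """Fitted probability of bacterial growth for each patient."""
    return [sigmoid(linear_predictor(beta, xi)) for xi in X]
```

At the maximum-likelihood solution, the mean fitted probability equals the observed outcome prevalence. This is a quick sanity check on any baseline fit.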

Landmarking models at distinct time points after hospital admission
To investigate whether additional measurements in patients admitted to hospital improve the predictive power of our risk prediction models in this subpopulation, we will develop a set of landmarking logistic regression models [23]. These models predict the probability of bacterial growth in the ED urine sample at pre-defined times t = {0, 12, 24, 36, 48, 60} hours after the patient has left the emergency department and been admitted to a hospital ward. To do so, we require a value for each included predictor at time t. Since predictors are measured irregularly throughout the patient's hospital stay, we will first train a multivariate […] Values at time t will be estimated using the best linear unbiased predictors from the empirical Bayes posterior distribution of the random effects, conditional on past predictor measurements [23]. The estimated predictor values will then be fed to a logistic regression model that predicts the probability of microbiological growth in the ED sample after the patient has been observed for t hours. As a result, patients may have more than one prediction, one for each time t at which they were still part of the at-risk population. Only patients still admitted and without a culture result at time t will be considered at risk and included in the fitting and evaluation of the logistic regression model for time t.
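
The landmark dataset can be sketched as follows. For each t, the sketch keeps only admissions still in hospital and without a culture result at t. It then estimates the predictor value at t from measurements taken up to t. For simplicity, it uses a univariate random-intercept model with known variance components. Under that model the BLUP shrinks the patient's mean towards the population mean μ by the factor nσ_b²/(nσ_b² + σ_e²). This is a deliberate simplification of the multivariate mixed model described above. The `Admission` record and its fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

LANDMARKS_H = (0, 12, 24, 36, 48, 60)  # hours after ward admission


@dataclass
class Admission:
    patient_id: str
    exit_h: float                      # hours to discharge, death or transfer
    culture_result_h: Optional[float]  # hours until culture result; None if never
    measurements: List[Tuple[float, float]] = field(default_factory=list)  # (hour, value)


def at_risk(adm: Admission, t: float) -> bool:
    """Still admitted and no culture result yet at landmark t."""
    no_result = adm.culture_result_h is None or adm.culture_result_h > t
    return adm.exit_h > t and no_result


def blup_random_intercept(values, mu, sigma2_b, sigma2_e):
    """BLUP of mu + b_i in y_ij = mu + b_i + e_ij with known variances.

    The patient's mean is shrunk towards mu by n*s2b / (n*s2b + s2e);
    with no measurements the prediction is the population mean mu.
    """
    if not values:
        return mu
    n = len(values)
    ybar = sum(values) / n
    shrink = n * sigma2_b / (n * sigma2_b + sigma2_e)
    return mu + shrink * (ybar - mu)


def landmark_rows(admissions, mu, sigma2_b, sigma2_e):
    """One (patient, t, predictor estimate) row per landmark at which a patient is at risk."""
    rows = []
    for t in LANDMARKS_H:
        for adm in admissions:
            if not at_risk(adm, t):
                continue
            past = [v for h, v in adm.measurements if h <= t]
            rows.append((adm.patient_id, t, blup_random_intercept(past, mu, sigma2_b, sigma2_e)))
    return rows
```

A patient who stays for 30 hours without a culture result contributes rows at t = 0, 12 and 24. This mirrors the "one prediction per landmark at risk" design described above.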

Missing data
In EHR data, information is only recorded when events take place, and we cannot distinguish between cases in which a test or diagnosis was not made and cases in which it was made but not recorded. Consequently, if variables such as co-morbidities, procedures, admission records and test results are not recorded, it is fair to assume that these events did not take place. […] We will include all predictors as well as the prediction outcome in the imputation procedure and impute 5 datasets with 10 iterations per dataset (Table 2). Model training will be performed on the imputed development dataset. However, we cannot use the same imputation procedure to evaluate our models, since we expect predictors to also be missing during model deployment. When used in practice, our model must impute any missing data in real time before making a prediction, but at this point no outcome will yet be available for use in the imputation. This will tend to result in suboptimal imputations when the model is used in practice [25]. To obtain an honest estimate of the performance of our models, we will evaluate them on a second set of imputations that were fitted without using the outcome, emulating the situation in which the model will ultimately be used [26].

[…] differs between patient groups and increases, for example, with age. Whereas a urine sample might be sent for culture in many different patients "just in case", a clinically usable model to confirm or rule out suspected bacterial UTI needs to perform especially well in patients with urinary symptoms. In our main analysis, we will therefore validate our models in the subgroup of patients with a suspected ED diagnosis of lower UTI or pyelonephritis, and our final model will be chosen based on performance in this group.
This group differs from the training population, which will include all patients irrespective of ED diagnosis to increase sample size and provide our model with enough power to learn general relationships. In a secondary analysis, we will also evaluate the performance of our models in patients without an ED diagnosis of UTI, in different age groups, by sex and by outcome (i.e. discharge diagnosis, death, admission to ICU, length of stay). Finally, we will also consider training only on data from patients with a suspected ED diagnosis of lower UTI or pyelonephritis, to ensure that a heterogeneous training population is not obscuring important relationships in patients with suspected UTI.
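
Both the multiple imputation step and the Wald-based backward elimination rely on pooling each coefficient across the 5 imputed datasets with Rubin's rules. With m imputations, estimates Q_k and variances U_k, the pooled estimate is the mean Q̄. The total variance is T = Ū + (1 + 1/m)B, where B is the between-imputation variance. The Wald statistic is Q̄/√T. This sketch uses a normal approximation for the p-value for brevity. The usual choice would be Rubin's t reference distribution with the returned degrees of freedom. The function name is illustrative.

```python
import math
import statistics
from statistics import NormalDist


def rubins_rules(estimates, variances):
    """Pool one coefficient across m imputed datasets (Rubin's rules).

    Returns (pooled estimate, total variance, Wald statistic,
    Rubin's degrees of freedom, two-sided p-value from a normal approximation).
    """
    m = len(estimates)
    q_bar = statistics.fmean(estimates)
    u_bar = statistics.fmean(variances)          # mean within-imputation variance
    b = statistics.variance(estimates)           # between-imputation variance (m - 1 denominator)
    t = u_bar + (1 + 1 / m) * b                  # total variance
    r = (1 + 1 / m) * b / u_bar                  # relative increase in variance due to missingness
    df = (m - 1) * (1 + 1 / r) ** 2 if r > 0 else math.inf
    wald = q_bar / math.sqrt(t)
    p_approx = 2 * (1 - NormalDist().cdf(abs(wald)))
    return q_bar, t, wald, df, p_approx
```

During backward elimination, the predictor with the weakest pooled Wald statistic would be dropped and the models refitted in every imputed dataset. This repeats until a stopping rule is met; the rule itself is not specified here.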

Internal validation
Model performance in each scenario will be assessed via multiple metrics: AUROC, Brier score, area under the precision-recall curve (AUPRC), sensitivity and specificity. Sensitivity and specificity will be reported at the threshold on the ROC curve that jointly maximises them (i.e. maximises their sum, Youden's index). We will assess how well predicted and observed probabilities correspond within deciles of predicted risk (model calibration) by creating a calibration plot and estimating the calibration slope. An estimated slope > 1 indicates underfitting, whereas a slope < 1 indicates overfitting.
Evaluating the model only on the development dataset or a single validation dataset leads to optimistic estimates of the true model performance (henceforth called the apparent performance) [27]. To obtain a more reliable estimate of model performance, we will draw at least 100 bootstrap samples of the development dataset; where computation time allows, we will consider up to 1,000 bootstrap samples. All preprocessing and analysis steps, including missing data imputation, estimation of fractional polynomials, feature selection and model evaluation, will be carried out independently within each bootstrap sample to avoid any data leakage [28]. The result will be one final model per bootstrap sample. Evaluating each model on the bootstrap sample in which it was developed provides another estimate of the apparent performance, this time within the bootstrap. To estimate the magnitude of optimism in this bootstrapped apparent performance, we will also evaluate each bootstrapped model in the original development dataset (called test performance). The difference between the bootstrapped apparent performance and the test performance will be an estimate of model optimism.
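
The internal validation metrics and the bootstrap optimism correction can be sketched as follows. AUROC is computed as the Mann-Whitney probability that a random positive outranks a random negative; this is O(n²), which is fine for a sketch. The sensitivity/specificity point maximises Youden's J = sensitivity + specificity - 1. The corrected estimate subtracts the mean bootstrap optimism (bootstrap apparent minus test performance) from the apparent performance in the original data. This follows the usual Harrell-style correction, which is an assumption about how the optimism estimate will be applied. Function names are illustrative, and AUPRC is omitted for brevity.

```python
import statistics


def auroc(y, p):
    """Probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = 0.0
    for a in pos:
        for b in neg:
            wins += 1.0 if a > b else 0.5 if a == b else 0.0
    return wins / (len(pos) * len(neg))


def brier(y, p):
    """Mean squared difference between predicted risk and observed outcome."""
    return statistics.fmean((pi - yi) ** 2 for yi, pi in zip(y, p))


def youden_point(y, p):
    """(threshold, sensitivity, specificity) maximising Youden's J; needs both classes."""
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    best = (-1.0, None, None, None)
    for thr in sorted(set(p)):
        tp = sum(1 for yi, pi in zip(y, p) if pi >= thr and yi == 1)
        tn = sum(1 for yi, pi in zip(y, p) if pi < thr and yi == 0)
        sens, spec = tp / n_pos, tn / n_neg
        if sens + spec - 1 > best[0]:
            best = (sens + spec - 1, thr, sens, spec)
    return best[1:]


def optimism_corrected(apparent, boot_apparent, boot_test):
    """Apparent performance minus mean optimism (bootstrap apparent - test)."""
    optimism = statistics.fmean(a - t for a, t in zip(boot_apparent, boot_test))
    return apparent - optimism
```

For instance, with an apparent AUROC of 0.80 and two bootstrap replicates each showing an optimism of 0.04, the corrected AUROC is 0.76.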

External validation
The performance of the model in a new dataset will be evaluated using EHRs from patients with suspected UTI who were admitted to QEHB between 1st January 2018 and 31st March 2019. We will summarise average performance and calibration in this temporally independent sample.
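
Calibration in the temporal validation sample can be summarised by the calibration slope and calibration-in-the-large. The minimal sketch below assumes the usual definitions. The slope b comes from a logistic regression of the observed outcome on the logit of the predicted risk. Calibration-in-the-large is the intercept of the same regression with the slope fixed at 1, i.e. with the logit of predicted risk as an offset. Both are fitted by Newton-Raphson starting from zero, and predicted risks must lie strictly between 0 and 1. Function names are illustrative.

```python
import math


def _expit(z):
    """Numerically stable inverse logit."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def _logit(p):
    return math.log(p / (1.0 - p))


def calibration_slope(y, p_hat, n_iter=25):
    """Fit logit P(y=1) = a + b * logit(p_hat) by Newton-Raphson; returns (a, b)."""
    lp = [_logit(p) for p in p_hat]
    a = b = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for yi, xi in zip(y, lp):
            mu = _expit(a + b * xi)
            w = mu * (1.0 - mu)
            g0 += yi - mu              # score for the intercept
            g1 += (yi - mu) * xi       # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01    # zero if all predictions are identical
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def calibration_in_the_large(y, p_hat, n_iter=25):
    """Intercept a in logit P(y=1) = a + logit(p_hat), i.e. slope fixed at 1."""
    lp = [_logit(p) for p in p_hat]
    a = 0.0
    for _ in range(n_iter):
        g = h = 0.0
        for yi, xi in zip(y, lp):
            mu = _expit(a + xi)
            g += yi - mu
            h += mu * (1.0 - mu)
        a += g / h
    return a
```

Overconfident predictions (risks pushed too close to 0 or 1) yield a slope below 1, consistent with the overfitting interpretation given under internal validation.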

Discussion
The need to reduce inappropriate antibiotic prescribing in secondary care is widely acknowledged, but progress is thwarted by the lack of rapid and reliable diagnostic tests for bacterial infection. Risk prediction models using data contained within EHR offer a new approach to improving antibiotic prescribing decisions, by integrating clinical and demographic data with test results to stratify patients according to their likelihood of bacterial infection. However, diagnostic uncertainty represents a major obstacle to the application of risk prediction models for bacterial infection. Clinical infection syndromes often overlap, and diagnoses are often not confirmed by microbial culture. This makes it difficult to reliably distinguish infection from non-infectious conditions, and also to discriminate between clinical infection syndromes.
For these reasons, we have not attempted to develop a model that supports decisions around antibiotic initiation in the ED, recognising that few doctors will be willing to withhold antibiotics if patients are unwell and the diagnosis is uncertain. Instead, we have opted for a model that identifies patients who may benefit from early antibiotic cessation because they are at low risk of bacterial UTI. Descriptive analyses of patients whom the model categorises as being at low or high risk of bacterial UTI will identify the categories of patients who are most likely to be low risk, for example based on age, gender and UTI syndrome at presentation. This will be used in conjunction with expert clinical opinion to define a "low-risk" population of patients who have been treated with antibiotics for suspected UTI but are unlikely to benefit from antibiotic treatment. Individuals from this population sub-group will be asked to participate in a proof-of-concept trial and randomised either to stop antibiotics early or to continue antibiotics as per standard care. The trial will assess the safety and feasibility of early antibiotic cessation in these patients and lay the foundation for a future multi-centre trial. It will also demonstrate the potential use of EHR datasets to guide prescribing decisions.