All patients admitted to a single west London hospital with a laboratory diagnosis of SARS-CoV-2 between 1 March and 24 April 2020 (i.e. a high prevalence period) were identified. Patients were included if they were admitted to hospital and diagnosed with SARS-CoV-2 by real-time reverse transcriptase polymerase chain reaction (RT-PCR; proprietary Public Health England assay until 10 March 2020, then AusDiagnostics®, Australia, assay thereafter). No patients were excluded.
Inpatients had their symptoms and clinical course documented in their electronic healthcare record (EHR) by the admitting clinical team (Millennium: Cerner Corporation, Kansas City, Missouri). Demographic and clinical data were extracted retrospectively from the EHRs for all patients included in the analysis by the infectious diseases team. Patient outcomes were followed up until death or discharge.
The outcome was defined as death occurring during hospital admission for patients who were admitted with a laboratory-confirmed diagnosis of SARS-CoV-2.
Predictors were chosen in concordance with previously published literature10,14,15, and included demographic details (age and sex), comorbidities (chronic respiratory disease, obesity, hypertension, diabetes, ischaemic heart disease, cardiac failure, chronic liver disease, chronic kidney disease, and history of a cerebrovascular event), symptomatology (fever, cough, dyspnoea, myalgia, abdominal pain, diarrhoea and vomiting, confusion, collapse, and olfactory change), and the number of days of symptoms prior to admission. Length of hospital stay to discharge, or death, was recorded for all patients to allow for survival analysis in the Cox regression model. Smoking history and ethnicity were not included in the predictive models because 28.9% and 23.4% of patients, respectively, had missing data for these fields.
Age and number of days of symptoms prior to admission were continuous variables. All other predictors were encoded as binary features indicating presence or absence. Sex was converted to a binary feature where 0 and 1 represented male and female patients, respectively. Predictors were chosen such that they could be elicited on first contact with a healthcare worker. The intended use of both models is therefore outcome prediction based on clinical data available at admission.
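The encoding described above can be sketched as follows. This is a minimal illustration only: the record layout, the field names and the subset of binary predictors are hypothetical, since the structure of the study's EHR extract is not described.

```python
# Hypothetical field names; the study's actual extract schema is not described.
BINARY_FIELDS = ["hypertension", "diabetes", "fever", "cough"]  # illustrative subset


def encode_patient(record):
    """Turn one raw patient record (a dict) into a numeric feature dict."""
    features = {
        # Continuous predictors are kept as numbers.
        "age": float(record["age"]),
        "days_of_symptoms": float(record["days_of_symptoms"]),
        # Sex: 0 = male, 1 = female, as described in the text.
        # Assumption: any value other than "female" is mapped to 0.
        "sex": 1 if record["sex"].strip().lower() == "female" else 0,
    }
    # Every other predictor is a binary presence flag (absent field -> 0).
    for name in BINARY_FIELDS:
        features[name] = 1 if record.get(name, False) else 0
    return features


example = encode_patient({
    "age": 71, "days_of_symptoms": 5, "sex": "Female",
    "hypertension": True, "fever": True,
})
```

In this sketch a predictor that is not recorded is treated as absent, which is an assumption of the example rather than a statement about how the study handled undocumented fields.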
Patient baseline characteristics were described by mean and median for continuous variables and by frequency and proportion for categorical variables. Log-rank analysis was applied to the whole dataset to report unadjusted associations between each predictor and the outcome. Age was not normally distributed and was normalised by calculating its fractional ranks and then applying the inverse normal distribution function. We then used an independent samples t-test to compare age between outcome groups. Number of days of symptoms prior to hospital admission (NOD) was also not normally distributed, and a Mann-Whitney U test was carried out to compare NOD between outcome groups. Multivariable Cox regression analysis was then applied to contextualise the predictors in relation to each other.
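The rank-based normalisation of age described above can be sketched as follows. The exact fractional-rank formula used in the study is not stated; this example assumes (rank - 0.5) / n, which keeps every fraction strictly between 0 and 1, and assigns tied values their average rank. The function names are illustrative.

```python
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def inverse_normal_transform(values):
    """Map values to standard normal scores via their fractional ranks."""
    n = len(values)
    std_normal = NormalDist()
    # Assumed offset: (rank - 0.5) / n avoids fractions of exactly 0 or 1.
    return [std_normal.inv_cdf((r - 0.5) / n) for r in average_ranks(values)]


ages = [34, 58, 58, 71, 90]
scores = inverse_normal_transform(ages)
```

The transformed scores preserve the ordering of the original ages while following an approximately normal distribution, which is what makes a t-test on them reasonable.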
Cox regression predictive model
To create a predictive model for death in SARS-CoV-2, we randomly split the dataset into training (80%) and test (20%) sets. As others have demonstrated, the optimal proportion of the dataset partitioned for training depends on the full dataset size and classification accuracy, with higher accuracies and smaller dataset sizes requiring a larger majority of the data for training the model.16 A range of training set proportions was therefore trialled during the training phase of model development for both the Cox regression and ANN models. The training/test proportions yielding the highest average area under the receiver operating characteristic curve (AUROC) during training cross-validation were used in the testing phase, and their results are reported in this analysis. On the training set, we used a parsimonious model building approach using the clinically relevant demographic, comorbidity and symptomatology features identified. All predictors were initially included in a Cox regression model irrespective of whether they were significant in univariable log-rank analysis. Using k-fold cross-validation on the training set, we chose the model with the lowest Akaike information criterion (AIC) score and highest concordance index (c-index).17 Subsequently, predictors which were not significantly associated with death were removed using backwards elimination. This generated the list of predictors making up the predictive model. We then assessed the performance of the model by calculating the survival function at the third quartile of length of stay for patients in the test set, as length of stay was not normally distributed. Since predicting mortality is a binary classification problem, a standard threshold of 0.5 (50%) was applied to the predicted probability of death. For example, if the model predicts a patient-specific mortality of 60%, this is interpreted as a “positive prediction”, in that the patient is likely to die.
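The final step, converting a survival estimate at the third quartile of length of stay into a binary prediction, can be sketched as follows. The coefficients, the feature names and the constant baseline hazard are hypothetical values chosen for illustration; they are not the fitted model from this study, and a fitted Cox model would estimate the baseline survival from the data rather than assume a constant hazard. Treating a probability of exactly 0.5 as positive is also an assumption of the sketch.

```python
import math
from statistics import quantiles

# Hypothetical numbers for illustration; not the fitted model from the study.
COEFFICIENTS = {"age_decades": 0.40, "hypertension": 0.30}
BASELINE_DAILY_HAZARD = 0.002  # assumes a constant baseline hazard for simplicity


def predicted_mortality(features, horizon_days):
    """1 - S(t | x), with S(t | x) = S0(t) ** exp(linear predictor)."""
    linear_predictor = sum(COEFFICIENTS[k] * features[k] for k in COEFFICIENTS)
    baseline_survival = math.exp(-BASELINE_DAILY_HAZARD * horizon_days)
    return 1.0 - baseline_survival ** math.exp(linear_predictor)


def classify(features, horizon_days, threshold=0.5):
    """True means a positive prediction (the patient is predicted to die)."""
    return predicted_mortality(features, horizon_days) >= threshold


# Time horizon: third quartile of length of stay in a made-up test set.
test_lengths_of_stay = [2, 4, 5, 7, 9, 12, 20]
horizon = quantiles(test_lengths_of_stay, n=4)[2]

older = {"age_decades": 9.0, "hypertension": 1}
younger = {"age_decades": 4.0, "hypertension": 0}
older_positive = classify(older, horizon)
younger_positive = classify(younger, horizon)
```

With these illustrative numbers the older hypertensive patient falls above the 0.5 threshold and the younger patient falls below it.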
Accuracy, sensitivity, specificity, positive predictive value, and negative predictive value were computed. Using k-fold cross-validation on the whole dataset allowed calculation of a mean c-index with 95% confidence intervals (CI). Model calibration was assessed graphically using a calibration curve and numerically with a Brier score, which represents the mean squared error of a probabilistic forecast, with a lower score representing better calibrated predictions.18
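The classification metrics and the Brier score follow directly from their definitions, as in the sketch below. The outcome labels and predicted probabilities are made-up values for illustration, and the function names are not taken from the study's code.

```python
def classification_metrics(y_true, y_pred):
    """Confusion-matrix metrics for binary labels (1 = died, 0 = survived)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def brier_score(y_true, probabilities):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, probabilities)) / len(y_true)


# Made-up outcomes and predicted mortality probabilities.
y_true = [1, 0, 1, 0, 0, 1]
probabilities = [0.8, 0.3, 0.4, 0.6, 0.1, 0.9]
y_pred = [1 if p >= 0.5 else 0 for p in probabilities]  # 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
brier = brier_score(y_true, probabilities)
```

A Brier score of 0 corresponds to perfect probabilistic predictions, while a model that always predicts 0.5 scores 0.25.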
Artificial Neural Network predictive model
The dataset was again randomly split into training (80%) and test (20%) sets. To maximise network learning efficiency, feature-wise normalisation was used. Each feature in the input data was centred on 0 by subtracting the feature mean, and then scaled by dividing by the feature's standard deviation.19 The open-source TensorFlow machine learning library20 was used to construct the ANN. To optimise the model, we adjusted hyperparameters (the number and size of layers, batch size, dropout, and regularisation) using k-fold cross-validation on the training set. The ANN was designed to achieve maximal performance on cross-validation. Once the model architecture was established, we retrained the ANN on the entire training set, before finally validating its performance on the test set. We calculated the same performance metrics and assessed calibration in the same manner as for the Cox regression model. The performance profiles of the models were then compared, and an efficient implementation of DeLong's algorithm (which compares the areas under two or more correlated receiver operating characteristic curves) was used to compare the AUROC of the two models.21,22 Figure 1 summarises the model development and assessment methodology.
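Feature-wise normalisation can be sketched as follows. The text does not state which data the means and standard deviations were computed from; this example assumes they are estimated on the training set only and then reused for the test set, a common way to avoid leaking test information into training. The function names and the numbers are illustrative.

```python
from statistics import fmean, pstdev


def fit_scaler(rows):
    """Per-feature mean and standard deviation from the training rows."""
    columns = list(zip(*rows))
    means = [fmean(column) for column in columns]
    # A constant feature has zero spread; fall back to 1.0 to avoid dividing by 0.
    stds = [pstdev(column) or 1.0 for column in columns]
    return means, stds


def transform(rows, means, stds):
    """Centre each feature on 0 and scale it to unit standard deviation."""
    return [
        [(value - mean) / std for value, mean, std in zip(row, means, stds)]
        for row in rows
    ]


# Made-up rows of [age, days of symptoms].
train = [[60.0, 3.0], [70.0, 5.0], [80.0, 7.0]]
test = [[70.0, 9.0]]

means, stds = fit_scaler(train)            # statistics from training data only
train_scaled = transform(train, means, stds)
test_scaled = transform(test, means, stds)  # same statistics reused here
```

After scaling, each training feature has mean 0 and standard deviation 1, while test values are expressed on the same scale and need not be centred.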