In this study, we created a clinical score and ML models that accurately predicted the likelihood of a severe disease course (defined as requiring ICU admission at any stage of the disease and/or death from any cause during the study period) for SARS-CoV-2-positive patients at the largest hospital group in Switzerland during the country’s ‘first wave’ of the COVID-19 pandemic. An external validation using the larger ‘second wave’ patient cohort confirmed the prognostic value of the score and models, and thus their generalizability.
The most predictive risk factors were male sex, low hemoglobin (< 100 g/L), elevation of inflammatory parameters (CRP > 25 mg/L, leucocyte counts > 10 G/L), hyperglycemia (> 10 mmol/L), and impaired renal function (eGFR < 75 mL/min, sodium > 144 mmol/L). Since most of these parameters are commonly measured and readily available at presentation, the score can provide early guidance for patient triage and thereby help improve outcomes by enabling timely and targeted use of health care resources for patients at risk of a severe clinical course.
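As an illustration only, the cut-offs listed above can be written as simple threshold checks. The following Python sketch is not the published score: this section does not state how the individual components are weighted, so the unweighted count of fulfilled criteria, the `Presentation` record, and all field and function names are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Presentation:
    """Values available at presentation (hypothetical field names)."""
    male: bool
    hemoglobin_g_per_l: float
    crp_mg_per_l: float
    leukocytes_g_per_l: float
    glucose_mmol_per_l: float
    egfr_ml_per_min: float
    sodium_mmol_per_l: float


def risk_flags(p: Presentation) -> Dict[str, bool]:
    """Apply the cut-offs quoted in the text to one patient record."""
    return {
        "male_sex": p.male,
        "low_hemoglobin": p.hemoglobin_g_per_l < 100,
        "elevated_crp": p.crp_mg_per_l > 25,
        "leukocytosis": p.leukocytes_g_per_l > 10,
        "hyperglycemia": p.glucose_mmol_per_l > 10,
        "reduced_egfr": p.egfr_ml_per_min < 75,
        "hypernatremia": p.sodium_mmol_per_l > 144,
    }


def count_flags(p: Presentation) -> int:
    """Unweighted number of fulfilled criteria (an assumption, not the
    weighting of the actual score)."""
    return sum(risk_flags(p).values())


# Invented example values, not patient data.
example = Presentation(
    male=True,
    hemoglobin_g_per_l=95,
    crp_mg_per_l=40,
    leukocytes_g_per_l=8,
    glucose_mmol_per_l=6,
    egfr_ml_per_min=80,
    sodium_mmol_per_l=140,
)
print(count_flags(example))  # male sex, low hemoglobin, elevated CRP -> 3
```

In this invented example, three of the seven criteria are fulfilled. Any mapping from such a count to a risk estimate would have to come from the fitted score itself.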
Regarding the individual laboratory score components, an increase in inflammatory parameters as a predictor of severe disease is mechanistically plausible and well documented. [19–22] Similar to other infections, loss of glycemic control has been reported in COVID-19 patients (with elevated blood glucose increasing the risk of SARS-CoV-2 infection), and well-controlled diabetes mellitus has been reported to correlate with favorable outcomes. [23, 24] A systematic review and meta-analysis further corroborated these findings. [25] There is ample evidence that end-stage kidney disease and renal impairment (as reflected by eGFR in our analysis) are prognostic of more severe disease, with case fatality rates in the ICU of up to 50%. [26, 27] Similarly, electrolyte disorders such as hypernatremia have been linked to increased COVID-19-related mortality, possibly in relation to increased respiratory rate or dehydration from increased body temperature. [28, 29] Finally, a systematic review and meta-analysis recently discussed the role of anemia and changes in iron metabolism, reflected by the low-normal cut-off for hemoglobin in our analysis, in the pathophysiology and disease course of COVID-19. [30]
One of the hallmarks of COVID-19 is its disproportionately high mortality in the elderly, possibly due to multimorbidity [31, 32]. More detailed surveys report an increased likelihood of death following development of symptoms in the age groups < 30 years and > 65 years [33]. It has been speculated that younger patients with severe courses experience hyperinflammatory syndromes (often referred to as ‘cytokine storms’), an IL-6-driven over-reaction of the immune system to pathogens that results in multi-organ failure and is associated with high mortality [34]. Several studies identified high age and obesity, which are often connected to a reduced state of health, as risk factors for severe COVID-19 [35, 36]. Therefore, these parameters were not independent risk factors in our cohort despite statistically significant differences between the means, arguably because of their correlation with multimorbidity. To avoid overfitting due to intercorrelated parameters, we rejected age and obesity (as BMI) in favor of other correlated features that also explain additional cases.
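To make the notion of intercorrelated parameters concrete, the following Python sketch screens candidate features for strongly correlated pairs. It is a generic illustration and not the procedure used in this study: the section does not describe how correlation was assessed, so the use of the Pearson coefficient, the threshold of 0.7, the function names, and the synthetic toy values are all assumptions.

```python
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def correlated_pairs(
    features: Dict[str, List[float]], threshold: float = 0.7
) -> List[Tuple[str, str, float]]:
    """Return feature pairs whose absolute correlation reaches the
    threshold (0.7 is an arbitrary, assumed cut-off)."""
    names = sorted(features)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(features[first], features[second])
            if abs(r) >= threshold:
                pairs.append((first, second, r))
    return pairs


# Synthetic toy values for illustration only, not patient data.
toy = {
    "age": [30, 45, 60, 75, 90],
    "egfr": [110, 95, 80, 60, 45],
    "sodium": [140, 138, 143, 139, 141],
}
for first, second, r in correlated_pairs(toy):
    print(first, second, round(r, 2))
```

In the toy data only age and eGFR are flagged as a strongly correlated pair. Which member of such a pair to retain is a modelling decision; here, features that also explained additional cases were preferred over age and BMI.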
There is also a large body of evidence concerning underlying diseases as risk factors associated with critical disease and overall COVID-19 mortality [37–39]. We screened the EHRs for the presence of cardiovascular comorbidities (incl. arterial hypertension and stroke events), obesity, chronic pulmonary conditions, kidney disease, cancer, diabetes mellitus, and smoking status. Of those, only cardiovascular disease (overall, hypertension, coronary heart disease, and cardiomyopathy or congestive heart failure) and type II diabetes had prognostic value. However, none of those parameters was more predictive than male sex and laboratory values taken around the time of the first RT-PCR. Hyperglycemia, anemia, and impaired renal function are signs of, or risk factors for, deterioration and poor prognosis of these comorbidities in their own right, and may indicate that poorly controlled underlying diseases are more detrimental than the diseases themselves.
In light of the worldwide spread of SARS-CoV-2, which has led to a high rate of fatalities and a shortage of hospital beds in many countries, several attempts have been made to create predictive scores and models for the early identification of patients at high risk. In a regularly updated systematic review by Wynants et al. [40], 50 prognostic models were identified, including 23 for mortality and 8 for progression to severe disease. Frequently reported prognostic variables were sex, comorbidities, CRP, and creatinine. All models reported moderate to excellent predictive performance, but were judged as being at high risk of bias (e.g. due to exclusion of participants still in follow-up who had not developed the outcome by the end of the study, and use of the last available measurement instead of one at the time of intended use of the model), and none of them is currently recommended for use in clinical practice [40]. Recommendations for future investigations in this field include adequate inclusion/censoring and description of the study population, specification of the time horizon of the prediction, and structured reporting based on the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) guidelines [41] to enable independent validation, which we aimed to follow closely in this study.
Limitations of the current study include the small sample size and the exclusion of patients who rejected the IHG general research consent and of those with no laboratory data available. The 379 excluded individuals (65.7% of the total of 577 patients who did not reject the general research consent and tested positive) correspond mostly to patients seen in the ambulatory COVID-19 testing facility, where the general public has access to on-demand testing. Furthermore, the included patients were more often hospitalized, and comorbidities and risk factors might differ among individuals who could be treated as out-patients. There was no specific time-to-event analysis, particularly since the data generated in the first months of the pandemic were very heterogeneous and included external direct transfers to the ICU. The score and models therefore only speak to the probability of incurring a severe outcome, not to when this outcome will occur. Another limitation concerns the censoring of outcomes, given that there was no explicit follow-up. While it is conceivable that out-patients in particular could have deteriorated after discharge and presented elsewhere, the large catchment area of the IHG should mitigate this effect. Additionally, the case-fatality ratios of 12.6% and 7.2% in the training group (‘first wave’) and the prospective validation group (‘second wave’), respectively, are high compared to the Swiss national average (1.1%, https://covid19.bag.admin.ch). This hints at good coverage of outcomes in these retrospective analyses, along with the presence of an admission bias. We therefore suggest using the tools proposed here for projection of outcomes as discussed.