Using data from a large and recent prospective cohort of COVID-19 patients, a new prediction tool using a Bayesian network was built to predict the need for mechanical ventilation, transfer to an intensive care unit or death within 21 days of hospital admission.
A total of 32 clinical decision tools to predict the clinical course of patients admitted to hospital with COVID-19 have been proposed. Among these, 23 aimed to predict the risk of mortality and 8 were designed to predict progression to a more serious or critical illness. However, as noted by Wynants et al. [10], the proposed clinical prediction tools are poorly reported and are at high risk of selection bias. The information provided on the selection of patients and data, and on the methodology used, is insufficient to replicate these studies. Few studies report missing data and, even when they do, the authors do not explain how these values were replaced, which is crucial for the construction of clinical prediction tools. Internal validation is not always carried out or clearly presented; without careful internal validation, overfitting of the model to the data is to be expected, which means that the reported performance is probably optimistic [10]. Contrary to previously published models, the methodology used to develop the prediction model proposed herein follows the published recommendations concerning information on source data, the presentation of inclusion and exclusion criteria for the prospectively included population, the explanation of the judgment criterion, the management of missing data, the explanation of the model used, the methodology used for internal validation, and the model performance measures and their interpretations [16].
The internal validity tests confirmed that the performance of the PREDICT-COVID prediction tool described herein was satisfactory; with only 11 commonly used variables, the tool is easily usable in clinical practice. The performance of the logistic regression model, which is the method most commonly used in medicine to calculate the risk of an event according to exposure, was similar to that of the Bayesian model. However, Bayesian models have many advantages over logistic regression models. For instance, as they do not rely on any a priori hypothesis, explanatory variables can be collinear (e.g. white blood cell count and neutrophil to lymphocyte ratio), and as they are based on conditional probabilities, the associations between variables are taken into account even if they are not linear. In addition, Bayesian networks remain applicable when data are missing, which is frequent in clinical practice. This could be considered their most important advantage, since a logistic regression model cannot calculate a prediction score for an individual when even a single variable is absent. This explains, in part, why Bayesian models are increasingly used to develop risk prediction tools [17].
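To illustrate the point about missing data, the following minimal sketch contrasts the two approaches on a toy example. It is not the PREDICT-COVID model: the two binary variables (`crp_high`, `lymphopenia`), all probabilities and all regression coefficients are hypothetical values chosen for illustration, and the network is assumed to have the simplest possible structure, in which each variable depends only on the outcome. Under that assumption, a missing variable is marginalised out simply by leaving its term out of the product.

```python
from math import exp, prod

# Hypothetical prior P(outcome) and conditional tables P(variable = value | outcome).
# These numbers are illustrative only and are not taken from the study.
PRIOR = {"unfavourable": 0.3, "favourable": 0.7}
CPT = {
    "crp_high": {
        "unfavourable": {True: 0.8, False: 0.2},
        "favourable": {True: 0.4, False: 0.6},
    },
    "lymphopenia": {
        "unfavourable": {True: 0.7, False: 0.3},
        "favourable": {True: 0.3, False: 0.7},
    },
}


def posterior(evidence):
    """Return P(outcome | observed variables).

    Variables that are absent or set to None are marginalised out. With
    variables conditionally independent given the outcome, their terms
    sum to 1 and can be dropped from the product.
    """
    observed = {k: v for k, v in evidence.items() if v is not None}
    joint = {
        outcome: PRIOR[outcome]
        * prod(CPT[var][outcome][val] for var, val in observed.items())
        for outcome in PRIOR
    }
    total = sum(joint.values())
    return {outcome: p / total for outcome, p in joint.items()}


# Hypothetical logistic regression coefficients, for contrast only.
INTERCEPT = -1.5
COEF = {"crp_high": 1.2, "lymphopenia": 1.0}


def logistic_score(evidence):
    """Return the predicted risk; fails if any predictor is missing."""
    if any(evidence.get(var) is None for var in COEF):
        raise ValueError("logistic regression needs every predictor")
    z = INTERCEPT + sum(COEF[var] * float(evidence[var]) for var in COEF)
    return 1.0 / (1.0 + exp(-z))


complete = {"crp_high": True, "lymphopenia": True}
incomplete = {"crp_high": True, "lymphopenia": None}
print(posterior(complete)["unfavourable"])
print(posterior(incomplete)["unfavourable"])
```

With the incomplete record, `posterior` still returns a probability based on the observed variable alone, whereas `logistic_score` raises an error unless the missing value is first imputed. A network with dependencies between the explanatory variables would require a general inference algorithm rather than this shortcut.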
Selection bias is highly likely in the proposed clinical prediction tools. For example, Chinese clinical prediction tools were constructed using data from hospitalised populations that are much younger (less than 50 years of age) than those hospitalised in Europe or the USA, and whose mortality is much lower (less than 5%). Other clinical prediction tools were built using learning datasets of fewer than 100 patients. Furthermore, some clinical prediction tools were developed in particular populations and use laboratory or radiological data that are not routinely collected when a patient is admitted to hospital for COVID-19 (e.g. IL-6 [18], coronary calcifications [19]). It is likely that the prescription of specific laboratory or radiological exams by a trained physician is more predictive than the results of those exams per se. The very high mortality rate in these studies supports this hypothesis.
The clinical characteristics and clinical course of COVID-19 patients hospitalised in Lyon were similar to those reported in California [6], New York City [9], and Italy [5, 7, 8], which supports the generalisability of the proposed prediction tool. As reported in the analysis published by Wynants et al. [10], the most frequently reported diagnostic and prognostic predictors of COVID-19 are age, body temperature, lymphocyte count and lung imaging characteristics. Flu-like symptoms and neutrophil counts are frequently predictive in diagnostic models, while co-morbidities, sex, C-reactive protein and creatinine are frequent prognostic factors. In agreement, 7 of the 11 prognostic variables selected for the model proposed herein are among those cited in the other proposed models. Increasing age is reported as a risk factor for poor outcomes [8, 9, 20–23]; herein, patients with an unfavourable outcome were slightly but significantly older, but age was not retained in the prediction model. Similarly, elevated BMI has been reported to be a risk factor [22, 23], but this was not the case herein. This suggests that age and BMI may explain a higher rate of hospitalisation in COVID-19 patients rather than an unfavourable outcome. Another interesting point is that, among the 11 variables retained herein, the majority were laboratory parameters, and these represented 6 of the 7 most predictive variables. Among these biological parameters, only aspartate transaminase and prothrombin time are not among the parameters most frequently retained by other predictive tools.
The present study has some limitations. The endpoint was independent of the co-morbidities of the patient infected with SARS-CoV-2. This method was chosen because there is no consensus on how to formally attribute death to SARS-CoV-2, and co-morbidities certainly influence the patient's course. Furthermore, as recommended by Piccininni et al. [8], total mortality captures indirect deaths, such as those related to a healthcare system under crisis, yielding a more complete picture of the pandemic's consequences. The total number of explanatory variables was limited because of the combinatorial growth in model complexity, and was related to the number of subjects included in the learning dataset. After a classification based on variance reduction, only the 11 most relevant variables were selected, a cut-off that is somewhat arbitrary. Nevertheless, with 11 variables, the prediction model offers a good compromise between performance and ergonomics. As Lindsell et al. [24] point out, our model, which can be used as both a prognostic and predictive model, does not guide the health care team on the effectiveness of treatment, which is more informative than a simple prognosis. The main limitation of the present study is that, although the tool was carefully developed from specifically collected data and controlled by internal monitoring, it has yet to be externally validated using similar data sources, and its effectiveness has yet to be demonstrated in a pragmatic trial.