Practical Clinical and Radiological Models to Diagnose COVID-19 on Chest CT in a French Multicentric Emergency Population


Our aim was to develop practical models built with simple clinical-radiological features to facilitate COVID-19 diagnosis. To do so, 513 consecutive adult patients suspected of having COVID-19 from 15 emergency departments from 03/13/2020 to 04/14/2020 were included (244 [47.6%] with a positive RT-PCR). Chest CTs were immediately and prospectively analysed by on-call teleradiologists (OCTR) and systematically reviewed within one week by another senior teleradiologist. Each OCTR reading was concluded using a 5-point scale: normal, non-infectious, infectious non-COVID-19, indeterminate and highly suspicious of COVID-19. The senior reading reported the lesions’ semiology, distribution, extent and differential diagnoses. Multivariate stepwise logistic regression (Step-LR) and classification tree (CART) models to predict a positive RT-PCR were trained on 412 patients, validated on an independent cohort of 101 patients and compared with the OCTR performances (295 and 71 with available clinical data, respectively). Among the models built on radiological variables alone, the best performance was reached with the CART model (AUC = 0.92 versus 0.88 for OCTR), while Step-LR provided the highest AUC with clinical-radiological variables (0.93 versus 0.86 for OCTR). Hence, these two simple models, depending on the availability of clinical data, could be used by any radiologist to support their conclusion in case of COVID-19 suspicion.


Introduction
Coronavirus disease 2019 (COVID-19) is a viral disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). It was identified in Wuhan, China, in December 2019 and has rapidly spread worldwide 1 [3][4][5][6]. However, many hospitals do not have access to molecular testing, and the delay to get the results can be days long because of the lack of materials and technicians. Conversely, chest CT is routinely performed at most institutions and can provide a result, or at least a diagnostic orientation, in less than an hour. Previous studies have highlighted typical COVID-19 patterns consisting of peripheral, multifocal ground-glass opacities (GGO), with a sensitivity of 60-98% 3,5,7,8. Since March 2020, the French Society of Radiology (SFR) has published several educational webinars and a standardized report including a four-point scale to categorize the risk of COVID-19 on chest CT, namely: highly suspicious, compatible/indeterminate, not suspicious and normal.
In parallel, machine- and deep-learning models have been proposed to facilitate and even automate the diagnosis of COVID-19 on chest CT, but their implementation in practice requires time, mathematical and computer science skills, or graphics processing units [8][9][10][11][12][13]. Alternatively, Qin et al. have proposed a simple scoring system based on biological, clinical and radiological data with high performance (area under the ROC curve [AUC] = 0.91 in the independent validation cohort), but the added value to classical radiological assessment was not detailed 14.
Furthermore, some information required to run the model, notably a history of exposure or the leukocyte count, may not be systematically known by radiologists.
IMADIS Teleradiology is a French company dedicated to remote interpretation of emergency CT and MRI examinations. As of March 2020, IMADIS Teleradiology had partnerships with the emergency and radiological departments of 69 hospitals covering all French regions. During the coronavirus crisis, IMADIS has been widely involved in the diagnosis of COVID patients by remote interpretations of chest CT scans from partner centres. A standardized and dedicated workflow for patients suspected of having COVID-19 and COVID-19-dedicated webinars were thus specifically developed.
Based on our experience, the first aim of our study was to elaborate and validate practical and simple models that could be used by any radiologist in any institution, without any mathematical background, based on easily available clinical and radiological features. The second aim was to correlate the results of our models with those of the OCTR's standardized conclusions.

Study design
This prospective-retrospective observational multicentric study was approved by the French Ethics Committee for the Research in Medical Imaging (CERIM) review board (IRB CRM-2007-107) according to good clinical practices and applicable laws and regulations. All methods were performed in accordance with the relevant guidelines and regulations. Written informed consent was waived because of the retrospective nature of the study and because the analysed data were anonymized healthcare data.
IMADIS Teleradiology is a French company dedicated to remote interpretation of emergency CT and MRI examinations. As of March 2020, IMADIS Teleradiology had partnerships with the emergency and radiological departments of 69 hospitals covering all French regions. The panel of IMADIS teleradiologists consisted of 109 senior radiologists with at least 5 years of emergency imaging experience (mean length of practice, 7 years) and 55 junior radiologists (i.e., residents) with 3-5 years of emergency imaging experience (mean length of practice, 4 years). Teleradiologists were on-call in groups of at least two teleradiologists per night in each of the two interpretation centres (Bordeaux and Lyon, France). All radiological reports involving COVID-19 made by junior teleradiologists were collegially validated.
Our study included all consecutive adult patients from 03/13/2020 to 04/14/2020 from the 15/69 (21.7%) partner hospitals that regularly provided the RT-PCR results to IMADIS, as long as these patients fulfilled the following inclusion criteria: need for chest CT due to suspicion of COVID-19 according to a board-certified emergency physician, available chest CT, and available RT-PCR status (Figure 1, Supplemental Data 1).

Teleradiological Interpretation Protocol
The IMADIS teleradiology interpretation protocol met the French recommendations for teleradiology practice 15.
Reports and requests with clinical data for COVID-19 Chest CT image interpretation were received from the client hospitals at the IMADIS Teleradiology centres by using teleradiology software (ITIS; Deeplink Medical, Lyon, France).
The images were securely transferred over a virtual private network to a local picture archiving and communication system for interpretation (Carestream Health 12, Rochester, NY). Images were immediately interpreted by OCTRs.
CT examinations were systematically reviewed within one week following each on-call session by another senior teleradiologist (n = 15; mean length of practice, 12.1 years; mean number of reviews, 34 CTs) who was not involved during the on-call duty period and was blinded to the RT-PCR result and the first reader's report. All senior radiologists had a 2-hour-long e-learning session on chest CT findings in COVID-19, which became publicly available on April 7 (web-based e-learning developed by IMADIS radiologists, Deeplink Medical, Lyon, France and RiseUp, Paris, France: https://covid19-formation.riseup.ai/).
The RT-PCR results from throat swab samples contemporaneous with the emergency room visit were retrospectively collected from the patients' electronic medical records from each partner hospital.

Radiological Data
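At the end of the report, the OCTR had to propose a conclusion adapted from the SFR classification, as follows: (1) normal; (2) non-infectious; (3) infectious non-COVID-19; (4) indeterminate; and (5) highly suspicious of COVID-19. The radiological features reported at the senior review notably included: (j) adenomegaly (defined as lymph node with short axis > 10 mm); (k) bronchial wall thickening (further categorized as lobar/segmental or diffuse); (l) airway secretions; (m) tree-in-bud micronodules, and (n) pulmonary embolism.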
Images for each radiological feature can be found in Supplemental Data 2.

Statistical Analysis
Statistical analyses were performed using R (version 3.5.3, R Foundation for Statistical Computing, Vienna, Austria).
A p-value of less than 0.05 was deemed significant.
Univariate associations between clinical-radiological categorical variables and RT-PCR status were evaluated with Pearson χ² or Fisher exact tests, except for age, which was compared between the two groups with Student's t-test.
Classification and regression models are negatively affected by high correlations between explanatory variables; hence, correlations between variables were evaluated with Spearman's test. For each significantly correlated pair of dummy variables extracted from the same initial multilevel categorical variable, the variable with the lowest p-value at univariate analysis was selected for the multivariable modelling.
Next, the study population was randomly partitioned into a training cohort (n = 412/513, ≈ 80%) and a validation cohort (n = 101/513, ≈ 20%), with the same prevalence of RT-PCR positivity. We focused on two simple classifiers that do not require any computing interface to extract the probability for a positive RT-PCR, namely: classification and regression tree (CART, "rpart" package) and stepwise backward-forward binary logistic regression (Step-LR, minimizing the Akaike information criterion, "MASS" package). The models were built on the training cohort based on either (i) all dichotomized radiological variables or (ii) all dichotomized clinical + radiological variables, with a p-value < 0.05 at univariable analysis. The CART algorithm has a hyperparameter (i.e., a parameter that is set before the model building, while classical parameters are derived during the model building), named 'complexity', which controls the size of the tree and was selected through a cross-validation step in the training cohort so as to minimize the classification error rate. Next, the tree was pruned according to this optimal complexity hyperparameter. The minimal number of observations in a terminal node was set to 3, and the splitting criterion was the Gini index.
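As an illustration of the partitioning step, the following minimal Python sketch splits a cohort so that the prevalence of RT-PCR positivity is preserved in both parts. It is not the authors' code (the analysis was run in R); the function name `stratified_split`, the seed and the rounding rule are assumptions made for the example. Only the cohort sizes (244 positive and 269 negative patients, 412 for training) come from the text.

```python
import random


def stratified_split(labels, train_fraction, seed=0):
    """Split indices into training and validation sets, class by class,
    so that both sets keep (approximately) the same positive rate."""
    rng = random.Random(seed)  # fixed seed: an assumption, for reproducibility
    train_idx, valid_idx = [], []
    for value in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == value]
        rng.shuffle(idx)
        n_train = round(train_fraction * len(idx))
        train_idx.extend(idx[:n_train])
        valid_idx.extend(idx[n_train:])
    return sorted(train_idx), sorted(valid_idx)


# Cohort composition reported in the text: 244 RT-PCR+ and 269 RT-PCR- patients.
labels = [1] * 244 + [0] * 269
train_idx, valid_idx = stratified_split(labels, train_fraction=412 / 513)

train_prevalence = sum(labels[i] for i in train_idx) / len(train_idx)
valid_prevalence = sum(labels[i] for i in valid_idx) / len(valid_idx)
print(len(train_idx), len(valid_idx), train_prevalence, valid_prevalence)
```

Splitting each class separately is what keeps the two prevalences close to the overall 47.6%; a plain random split would only match it on average.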
Models were evaluated and compared between themselves and with the prospective conclusions made by the OCTRs on the validation cohort, according to AUC. Accuracy (number of correctly classified patients divided by the total number of patients), sensitivity, specificity, negative predictive value (NPV) and positive predictive value (PPV) were estimated after dichotomizing predicted probabilities per a cut-off of 0.5. All results were given with a 95% confidence interval (95%CI). AUCs were compared using the pairwise DeLong test ('pROC' package).
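The metrics listed above can be computed from predicted probabilities as in the minimal Python sketch below. The function name `diagnostic_metrics`, the convention that a probability equal to the cut-off counts as positive, and the toy data are assumptions made for the example; they are not taken from the study.

```python
def diagnostic_metrics(probabilities, truths, cutoff=0.5):
    """Dichotomize predicted probabilities and derive the usual 2x2-table metrics."""
    tp = fp = tn = fn = 0
    for p, y in zip(probabilities, truths):
        predicted = 1 if p >= cutoff else 0  # assumption: ties count as positive
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical toy data: 8 patients, 4 with a positive RT-PCR.
toy_probabilities = [0.9, 0.8, 0.6, 0.4, 0.3, 0.2, 0.7, 0.1]
toy_truths = [1, 1, 0, 1, 0, 0, 1, 0]
metrics = diagnostic_metrics(toy_probabilities, toy_truths)
print(metrics)
```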
Finally, we applied a decision curve analysis (DCA) to assess the clinical usefulness of the final models in the validation cohort. DCA consists of plotting the net benefit of applying the model for clinically reasonable risk thresholds compared with two alternative strategies: (i) to treat all patients as affected by COVID-19 or (ii) to treat none of the patients 16. Herein, the net benefit of our models refers to the correct identification of patients with a positive or a negative RT-PCR, and the risk threshold can be seen as the harm-to-benefit ratio or the risk at which patients are indifferent about COVID-19 17. Hence, a low risk threshold would correspond to patients who are particularly worried about the disease 18.
Table 1 summarizes the descriptive features of the study population. Overall, 513 patients were included, with a median age of 68.4 years (range: 18-100) and 241/513 women (47%). The prevalence of RT-PCR positivity was 244/513 (47.6%).

Multivariate models
The correlation matrix of the relevant dichotomized variables is shown in Figure 2. Analysis of the correlations between similar explanatory variables enabled the selection of 'presence of GGO' (over 'rounded GGO' and 'non-rounded GGO'), 'peripheral predominant location' (over 'central predominant location' and 'mixed predominant location'), 'fibrotic band consolidation' (over 'non-rounded consolidation' and 'presence of consolidation'), 'moderate to severe extension' (over 'low to severe extension' and 'severe extension'), and 'bronchial wall thickening' (over 'diffuse bronchial wall thickening' and 'focal/segmental bronchial wall thickening'). Thus, the total numbers of variables ultimately entered in the multivariate radiological and clinical-radiological models were set to 13 and 19 dichotomized variables, respectively. Figure 3 shows the final decision trees relying on radiological and clinical-radiological variables. Table 3 enables the calculation of the probability for RT-PCR+ according to the final Step-LR models relying on radiological and clinical-radiological variables.
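To make the calculation concrete, a binary logistic regression turns a weighted sum of the dichotomized variables into a probability through the logistic function. The Python sketch below shows that generic computation only: the intercept, the coefficients and the variable names are hypothetical placeholders chosen for the example and are not the values of the final Step-LR models, which are given in Table 3.

```python
import math


def logistic_probability(intercept, coefficients, features):
    """Return P(RT-PCR+) = 1 / (1 + exp(-(intercept + sum(coef * x))))."""
    score = intercept + sum(
        coefficient * features.get(name, 0)
        for name, coefficient in coefficients.items()
    )
    return 1.0 / (1.0 + math.exp(-score))


# HYPOTHETICAL coefficients, for illustration only (not the published model).
example_intercept = -1.0
example_coefficients = {
    "ggo": 1.5,
    "peripheral_predominant": 1.0,
    "bronchial_wall_thickening": -1.2,
}

# A patient with GGO and a peripheral predominant distribution,
# but no bronchial wall thickening (1 = present, 0 = absent).
example_patient = {"ggo": 1, "peripheral_predominant": 1, "bronchial_wall_thickening": 0}
probability = logistic_probability(example_intercept, example_coefficients, example_patient)
print(round(probability, 3))
```

A positive coefficient raises the predicted probability when the feature is present, and a negative one lowers it, which is what makes such a model readable without any software.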

Performances of the models and OCTRs' conclusions
To evaluate the performances of the models, the trained models were tested on the external validation cohort. Table 4 shows the resulting performances. In the subgroup of patients with an indeterminate conclusion (4), the probabilities for RT-PCR+ with the clinical-radiological Step-LR model were significantly higher for patients with RT-PCR+ than for patients with RT-PCR- (0.63 ± 0.28 versus 0.39 ± 0.27, respectively, p = 0.004). Figure 5 illustrates the potential application of this model for patients with an indeterminate/compatible conclusion. An Excel macro is provided in Supplementary Data 4 so that the interested reader can test the Step-LR models.
Regarding the outliers of the model with the highest AUC (i.e., clinical-radiological Step-LR), 3 out of the 32 patients (9.4%) with a negative RT-PCR in the validation cohort were classified positively by the model. Of these 3 patients, 2 were highly suspicious of COVID-19 and 1 was indeterminate according to the OCTRs. Conversely, 7 out of the 39 patients (17.9%) with a positive RT-PCR in the validation cohort were classified negatively by the model. Of these 7 patients, 4 had conclusions of (1), (2) or (3) according to the OCTRs.

Clinical usefulness of the final models in the validation cohort (Fig. 6)
The DCAs showed that the OCTRs' conclusion and the final models added more benefit than the 'treat all' approach above a risk threshold of approximately 0.05. The two final models added more net benefit than the 'treat all' strategy and the OCTRs' conclusion for threshold probabilities above 0.43.
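For readers unfamiliar with decision curves, the quantity plotted at each risk threshold is usually the net benefit, defined as the proportion of true positives minus the proportion of false positives weighted by threshold / (1 - threshold). The Python sketch below implements this standard definition; it is an illustration under that assumption, not the authors' R code, and the function names and toy data are hypothetical.

```python
def net_benefit(probabilities, truths, threshold):
    """Net benefit of treating patients whose predicted risk reaches the threshold."""
    n = len(truths)
    tp = sum(1 for p, y in zip(probabilities, truths) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probabilities, truths) if p >= threshold and y == 0)
    weight = threshold / (1.0 - threshold)  # harm-to-benefit ratio
    return tp / n - (fp / n) * weight


def net_benefit_treat_all(truths, threshold):
    """Net benefit of the 'treat all' strategy; 'treat none' is 0 by definition."""
    prevalence = sum(truths) / len(truths)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical toy data: 8 patients, 4 with a positive RT-PCR.
dca_probabilities = [0.9, 0.8, 0.6, 0.4, 0.3, 0.2, 0.7, 0.1]
dca_truths = [1, 1, 0, 1, 0, 0, 1, 0]

for threshold in (0.05, 0.25, 0.50):
    print(
        threshold,
        net_benefit(dca_probabilities, dca_truths, threshold),
        net_benefit_treat_all(dca_truths, threshold),
    )
```

At a very low threshold every patient is classified as positive, so the model and the 'treat all' strategy coincide; the curves separate only as the threshold rises.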

Discussion
In this study, we developed practical and ready-to-use models to predict the RT-PCR status from categorical clinical and radiological variables that are routinely available or assessable by radiologists without expertise in thoracic imaging or computer science and without additional blood samples. We purposely elaborated parsimonious models through a cautious variable selection to facilitate their application in practice. Our best models displayed high AUCs and accuracy on the external validation cohort and could be helpful to radiologists, notably to weigh indeterminate conclusions 3.
Our series represents a real-life multicentric emergency population during the COVID-19 pandemic with well-balanced proportions of negative and positive RT-PCR results, making it appropriate for the development of predictive models. The clinical and radiological variables correlating with the RT-PCR status in the univariate analysis were consistent with the literature, namely, fever, asthenia, oxygen saturation, GGO, non-rounded consolidation, fibrotic bands, and intralobular reticulations with bilateral, diffuse, basal predominant and peripheral distributions [19][20][21].
The aim of our models was to provide rapid assistance to non-specialized radiologists who may need support to conclude or modulate their reports confidently without delaying patient management. We purposely chose classification algorithms that are easily explainable and do not require additional computing time, i.e., a binary logistic regression and a classification tree. These two algorithms are often used for benchmarking purposes in machine learning before using more complex models. In preliminary exploratory analyses, we also tested other algorithms such as random forests or categorical boosted trees, which indeed showed slightly higher AUCs but could not be visually represented and explained to radiologists. Interestingly, the similar performances of our models in the training and validation cohorts suggest a lack of overfitting and good generalizability to new patients.
Improving COVID-19 diagnosis with deep learning frameworks has already been attempted with good results, but without remarkable added value compared with the diagnostic performances of the OCTRs or our models as calculated in the independent validation set. Indeed, the AUCs mostly ranged between 0.90 and 0.95 9,[11][12][13]. Other studies proposed practical scores such as the PSC-19, which relies on four variables (history of exposure, leukocyte count, peripheral lesion and crazy paving patterns) 12. However, this score requires a blood sample and could be of limited use when investigating a patient in a new COVID-19 cluster without proven exposure. Chen et al. also combined an explainable machine learning algorithm (i.e., penalized logistic regression) and bio-clinical-radiological variables 10. They built three models: bio-clinical alone, radiological alone and bio-clinical-radiological models. Surprisingly, the results of the final models in the validation cohort showed the highest performances when radiological variables were not taken into account (AUCs = 0.97, 0.81 and 0.94, respectively). Ridge and/or LASSO penalized logistic regressions were also investigated in our preliminary data exploration but did not show added value to Step-LR. While deep learning studies trained their models in large cohorts of hundreds of patients, it should be noted that these two practical studies did not exceed one hundred patients, which raises questions about their validity. In addition, the performances of standardized radiological conclusions were missing in all these studies involving artificial intelligence.
Our results also highlight the very good performances of radiologists in daily practice, which can be explained by the considerable increase in knowledge of the COVID-19 radiological presentations since the first peer-reviewed studies in January-February 2020, attained through open-source educational publications, webinars and recommendations issued by national and international radiological societies, which were immediately relayed to the teleradiologists working at IMADIS. Additionally, OCTRs could ask for their colleagues' advice when facing a complicated case, as they were never alone in one of our two emergency interpretation centres during on-call duty. Therefore, though our predictive models showed comparable performances with other machine- and deep-learning models, and slightly higher performances than OCTRs' conclusions, the gain was not striking, leading to no significant difference according to the DeLong test.
To illustrate potential clinical applications of the final models, we used DCAs, which are a popular alternative to cost-effectiveness studies 16. Regarding the two settings (radiological data alone and clinical-radiological data), we found similar shapes of the DCAs for the OCTRs' conclusion and the final models. Interestingly, at worst, the net benefits of the models were equivalent to those of OCTRs (for the intermediate risk threshold) and to the 'treat all' strategy (for the very low risk threshold). However, the machine learning models would be complementary to the OCTRs' conclusion and would improve the net benefit for patients from intermediate- to high-risk thresholds.
The measurements of performance in our models should also be considered in light of the disease prevalence during our period of inclusion (≈ 48%) and the nature of the study population 22. Indeed, although PPV and NPV are useful to rapidly sort patients with COVID-19 suspicion, they depend on this prevalence, which fluctuates depending on the country and public health measures.
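The dependence of PPV and NPV on prevalence follows directly from Bayes' rule and can be illustrated with the short Python sketch below. The sensitivity and specificity used here (0.85 each) and the 10% comparison prevalence are hypothetical round numbers chosen for the example, not results of the study; only the ≈ 48% prevalence comes from the text.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    true_negatives = specificity * (1.0 - prevalence)
    false_negatives = (1.0 - sensitivity) * prevalence
    ppv = true_positives / (true_positives + false_positives)
    npv = true_negatives / (true_negatives + false_negatives)
    return ppv, npv


# HYPOTHETICAL test characteristics, for illustration only.
assumed_sensitivity = 0.85
assumed_specificity = 0.85

ppv_study, npv_study = predictive_values(assumed_sensitivity, assumed_specificity, 0.48)
ppv_low, npv_low = predictive_values(assumed_sensitivity, assumed_specificity, 0.10)
print(round(ppv_study, 3), round(npv_study, 3))
print(round(ppv_low, 3), round(npv_low, 3))
```

With identical sensitivity and specificity, lowering the prevalence reduces the PPV and raises the NPV, which is why the predictive values reported for one period of the pandemic cannot be transposed directly to another.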
Our study has certain limitations. First, the RT-PCR is an imperfect gold standard, which may explain why no study has ever reached perfect performance. Indeed, the analysis of the outliers of the best model in the validation cohort with fully available clinical-radiological variables (n = 10, 3 false positives and 7 false negatives) demonstrated that 6 of these patients either could have received their chest CT before the disease manifestation or were false negative by RT-PCR. This finding stresses the risk of false negatives with chest CT due to the delay between the beginning of clinical symptoms and the appearance of the COVID-19 semiology on chest CT. Second, approximately 25-33% of clinical data were missing because they were collected in real-life emergency situations. Moreover, some radiological features were not evaluated because they were published after the beginning of the COVID-19 IMADIS workflow (for instance: multifocality, thickened vessels). Finally, although the OCTRs' conclusions were prospectively collected, the radiological features that were necessary to elaborate the predictive models were not collected in emergency situations, which could have led to the overestimation of the associations of these features with the RT-PCR status.
To conclude, we presented the largest French multicentric emergency cohort including prospective standardized reports following national recommendations. Our findings illustrate the high diagnostic performances of the OCTRs working in a teleradiological structure entirely dedicated to emergency imaging, which promoted continuous training. This setting enables us to propose free and practical models built on easily available data, which can be used by general radiologists to confidently conclude their reports in daily practice.

Table note: Regarding variables with more than two levels, the p-value in italics corresponds to the p-value considering all its levels. The p-values below are based on the Fisher or χ² test for this variable that was dichotomized according to the level of the line. Abbreviations: GGO: ground-glass opacities.
Figure legend (fragment): [...] the clinical-radiological model, the patient showed fever, myalgia, GGO, fibrotic bands, a GGO predominant pattern, a peripheral predominant distribution for abnormal acute findings and intralobular reticulations but no bronchial wall thickening §§. The probability for RT-PCR+ is calculated as follows: P(RT-PCR+) =
*: P < .05, **: P < .005, ***: P < .001. Significant results are highlighted in bold.