Development and Validation of a predictive model based on Radiomics to predict the short-term outcomes of patients with COVID-19

Juanjuan Xu, Wuhan Union Hospital
Mei Zhou, Wuhan Union Hospital
Zhilei Lv, Wuhan Union Hospital
Zhihui Wang, Wuhan Union Hospital
Tingting Liao, Wuhan Union Hospital
Yanliing Ma, Wuhan Union Hospital
Guorong Hu, Wuhan Union Hospital
Sufei Wang, Wuhan Union Hospital
Jin Gu, Wuhan Union Hospital
Zhengrong Yin, Wuhan Union Hospital
Yang Jin (whuhjy@126.com), Wuhan Union Hospital, https://orcid.org/0000-0003-2409-7073

biochemistry (including renal and hepatic function). CT findings included computer-aided objective radiomic features and direct findings to be interpreted by doctors (lung involvement ratio, uni-/bilateral pneumonia, central/peripheral lesion location, ground-glass opacity, patchy exudation, consolidation, white lung, pleural effusion). The extraction of radiomic features and the definition of directly interpreted CT findings are detailed in the ensuing sections.

Pathogen identification
The virus causing COVID-19 was detected by real-time RT-PCR using specific primers and probes. RNA was extracted from patient samples, including nasopharyngeal swabs or sputum. Patients were defined as infected with COVID-19 when the RT-PCR results were positive for two targets (open reading frame 1a or 1b, nucleocapsid protein) [13].

Evaluation of hematologic indicators
Basic laboratory tests included routine blood tests, CRP, PCT and serum biochemical tests (including renal and liver function). To characterize the effect of 2019-nCoV on the patients' immune system, the frequencies of immune cells, including CD3+ T lymphocytes, CD8+ T lymphocytes, CD4+ T lymphocytes, B lymphocytes and natural killer (NK) cells, were examined. Immunity-associated factors, including interleukin-2 (IL-2), IL-4, IL-6, IL-10, interferon γ (IFN-γ), tumor necrosis factor α (TNF-α), serum immunoglobulins and complement 3 and 4 (C3, C4), were also measured.

CT data acquisition
CT scans were performed at admission using several multislice detector CT scanners: LightSpeed VCT (General Electric Medical Systems, USA), Somatom Sensation (Siemens Healthcare), Somatom Definition (Siemens Healthcare), and Somatom Definition AS+ (Siemens Healthcare). Standard departmental protocols were used, with volumetric datasets acquired with or without contrast as clinically indicated [14]. All images were reconstructed into axial images at a 1.5/2-mm slice thickness at 1.5/2-mm intervals using lung and soft tissue algorithms.

Direct interpreted findings of chest CT
Direct CT findings were assessed by two experienced thoracic radiologists blinded to the clinical data. Disagreements were resolved by comparing notes and reaching a consensus. Direct interpreted features included the lesion-occupying ratio of the whole lung field (lung involvement ratio), lesion distribution (uni-/bilateral pneumonia, central/peripheral lesion location), lesion density (consolidation, patchy exudation, ground-glass opacity), pleural effusion and lymph node enlargement. Peripheral lesion location was defined as a predominantly subpleural distribution of lesions; any other distribution was defined as central. Lymph node enlargement was defined as a maximum short-axis lymph node diameter exceeding 1 cm [14].

Extraction of radiomic features
Radiomic features were extracted by mathematical calculations on image-based data matrices according to the formula for each feature. The two basic elements of feature extraction are matrices and formulas (the formulas are detailed in [https://pyradiomics.readthedocs.io/en/latest/features]). We extracted six categories of features: shape, histogram, and four high-order matrices transformed from the pixel matrix, namely the gray level co-occurrence matrix (GLCM), gray level run length matrix (GLRLM), gray level size zone matrix (GLSZM) and gray level dependence matrix (GLDM). Each radiomic feature was calculated on the corresponding matrix according to its specific formula. Shape and histogram features were based on the pixel matrix. The last four categories were calculated from the four corresponding high-order matrices. The features in these six categories are listed in e-Table 1. A total of 4327 features were extracted under four pixel size parameters.

Radiomic signature building
In the training cohort, we adopted the least absolute shrinkage and selection operator (LASSO) method for feature selection to identify the relevant features. Radiomic scores (Rad-scores) were calculated for each patient as a linear combination of the extracted features with their respective coefficients in the prediction model [15].
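The Rad-score calculation described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used in the study: the function name `rad_score`, the optional `intercept` argument, and the feature names and values in the example are all hypothetical. Only features that retain a non-zero coefficient after LASSO selection would appear in the coefficient table.

```python
from typing import Mapping


def rad_score(features: Mapping[str, float],
              coefficients: Mapping[str, float],
              intercept: float = 0.0) -> float:
    """Weighted sum of the selected features (a linear combination).

    `coefficients` holds only the features kept by the selection step;
    any other entries in `features` are ignored.
    """
    missing = [name for name in coefficients if name not in features]
    if missing:
        raise KeyError(f"missing feature values: {missing}")
    return intercept + sum(coefficients[name] * features[name]
                           for name in coefficients)


# Hypothetical names and numbers, for illustration only.
example_coefficients = {"feature_a": 0.5, "feature_b": -2.0}
example_features = {"feature_a": 4.0, "feature_b": 0.25, "unused": 9.0}
example_score = rad_score(example_features, example_coefficients, intercept=1.0)
```

With these made-up inputs the score is 1.0 + 0.5 × 4.0 − 2.0 × 0.25 = 2.5; the entry `unused` does not contribute because it has no coefficient.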

Nomogram
The nomogram was used to visually score each patient's parameters and then to compute the probability of the event from the patient's total score. To construct a highly accurate prediction model, we combined the variables filtered through the two aforementioned models to build nomograms [11]. A receiver operating characteristic (ROC) curve was generated from the validation dataset to validate the discriminative power of the nomogram. A calibration curve, plotting the observed probability against the predicted probability of poor outcomes, was used to evaluate the calibration of the nomogram [16].
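How a nomogram turns a logistic regression model into a point scale can be illustrated with the following sketch. It assumes the common convention in which the variable with the largest effect range spans 0 to 100 points; the text does not state which convention was used, and the variable names, coefficients, ranges and intercept below are hypothetical rather than values from this study.

```python
import math
from typing import Dict, Mapping, Tuple

# name -> (beta, lowest value on the axis, highest value on the axis)
Spec = Dict[str, Tuple[float, float, float]]


def _max_effect(spec: Spec) -> float:
    """Largest change in the linear predictor any single variable can cause."""
    effect = max(abs(beta) * (high - low) for beta, low, high in spec.values())
    if effect == 0:
        raise ValueError("at least one variable must have a non-zero effect")
    return effect


def _baseline(spec: Spec) -> float:
    """Linear predictor contribution when every variable scores 0 points."""
    return sum(min(beta * low, beta * high) for beta, low, high in spec.values())


def total_points(spec: Spec, patient: Mapping[str, float]) -> float:
    """Sum of per-variable points; the strongest variable spans 0-100."""
    scale = 100.0 / _max_effect(spec)
    total = 0.0
    for name, (beta, low, high) in spec.items():
        lowest = min(beta * low, beta * high)
        total += (beta * patient[name] - lowest) * scale
    return total


def probability(spec: Spec, intercept: float, points: float) -> float:
    """Map total points back to the linear predictor, then to a probability."""
    linear = intercept + _baseline(spec) + points * _max_effect(spec) / 100.0
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical two-variable model, for illustration only.
spec = {"age": (0.04, 20.0, 90.0), "marker": (1.2, 0.0, 1.0)}
patient = {"age": 55.0, "marker": 1.0}
points = total_points(spec, patient)
risk = probability(spec, intercept=-3.0, points=points)
```

Because the points are a rescaling of each variable's contribution to the linear predictor, the probability read from the total score equals the probability obtained by evaluating the logistic model directly.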

Statistical analysis
Categorical variables were presented as frequencies and percentages, and continuous variables were expressed as mean (standard deviation [SD]) if they were normally distributed or median (interquartile range [IQR]) if they were not. Proportions for categorical variables were compared using the χ² test or Fisher's exact test. Means for continuous variables were compared using the independent-samples t test when the data were normally distributed; otherwise, the Wilcoxon rank-sum test was used. Some indicators were converted to binary variables according to the optimal cutoff values determined by receiver operating characteristic (ROC) analyses. Variables with P < 0.05 were regarded as potential risk factors and were included in multivariable logistic regression analysis using the stepwise selection procedure with default settings. ROC curves were drawn to evaluate the discriminative power of the constructed models, and the difference between AUCs was compared using DeLong's test. All statistical analyses were performed using the SAS software package (version 9.4).
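The ROC-based steps above can be sketched in plain Python as follows. This is an illustration rather than the SAS procedure used in the study. In particular, the text does not say which criterion defined the "optimal" cutoff; the sketch assumes Youden's J (sensitivity + specificity − 1), and the scores and labels are made up.

```python
from typing import List, Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the chance that a positive case outscores a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    positives: List[float] = [s for s, y in zip(scores, labels) if y == 1]
    negatives: List[float] = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def best_cutoff(scores: Sequence[float],
                labels: Sequence[int]) -> Tuple[float, float, float]:
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J.

    A case is called positive when its score is >= the cutoff.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == 0)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both outcome classes are required")
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        sensitivity, specificity = tp / n_pos, tn / n_neg
        j = sensitivity + specificity - 1.0
        if best is None or j > best[0]:
            best = (j, cutoff, sensitivity, specificity)
    return best[1], best[2], best[3]


# Made-up scores and outcomes (1 = poor outcome), for illustration only.
demo_scores = [0.1, 0.4, 0.35, 0.8]
demo_labels = [0, 0, 1, 1]
demo_auc = roc_auc(demo_scores, demo_labels)
demo_cutoff = best_cutoff(demo_scores, demo_labels)
```

In this toy example three of the four positive-negative pairs are ranked correctly, giving an AUC of 0.75. Dichotomizing a continuous indicator at `demo_cutoff[0]` mirrors the conversion to binary variables described in the text.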

Results
Description of the population in training cohort
A total of 148 patients in the first cohort (Wuhan Union Hospital and West Union Hospital, from Jan 16 to Jan 31, 2020) and 264 patients in the second, validation cohort (Wuhan Union Hospital, West Union Hospital and Wuhan Central Hospital, from Feb 1 to Feb 24, 2020) were included in this study. The flowchart of the study design is shown in Fig. 1.
In the training set of 148 patients, 77 patients showed conspicuous improvement and the other 71 patients were deemed as having poor outcomes.
Univariable analyses were used for a preliminary comparison of patients with good prognosis and those with poor outcomes, and 24 of the 63 variables analyzed were found to be associated with patients' outcomes (Table 1). chronic diseases between patients with poor outcomes and those with good ones (Table 1).
The number of days from symptom onset to hospital admission differed significantly between the two groups (P = 0.0029), and a longer interval (> 6 days) correlated with poor prognosis. None of the symptoms was associated with patients' outcomes. CT examination showed that in 89 patients (60.1%) the lesions were located in either the central or the peripheral lung field, and in 59 patients (39.8%) the lesions were both centrally and peripherally situated (Table 1, e-Figure 1). One hundred and eleven patients (75%) had bilateral pneumonia. All indicators involving directly interpreted CT findings were linked to patients' outcomes, including Rad-score, lung involvement ratio, laterality (uni-/bilateral) of pneumonia, central/peripheral lesion location, consolidation, patchy exudation, ground-glass opacity, pleural effusion, and lymph node enlargement (all P < 0.05) (Table 1).
We analyzed the association between common immune cells, relevant indicators and patients' outcomes. The immune cells and other factors found to be associated with patients' outcomes included the counts of leucocytes, neutrophils, lymphocytes and eosinophils, as well as ESR, CRP, PCT, complement 3 (C3), D-dimer, IL-6, IL-10, ALT and albumin (all P < 0.05) (Table 1).
Their coefficients were 0.959222048, 0.968288021, 0.949917087, 0.005205795, 1027.865081 and 570.4715897, respectively (Fig. 2a, b). The Rad-score was defined as the linear combination of the extracted features with their respective coefficients. ROC analyses showed that the AUCs of the Rad-score for differentiating patients' outcomes were 0.76, 0.69 and 0.71 in the training dataset, the testing dataset and the entire patient cohort, respectively (Fig. 2c, d, e).

Selection of clinical parameters and CT findings associated with patients' short-term outcomes
We then entered the indicators with P values below 0.05 in the univariable analysis into multivariable regression analyses to assess the separate contribution of each parameter to the prediction of patients' outcomes. The results showed that CRP was an independent outcome predictor in the model. Among all directly interpreted CT features included in the multivariable analysis, lesion location (in both the central and peripheral fields) was the only CT feature with independent predictive value (OR: 16.22, 95% CI: 5.72-46.01, P < 0.0001) (Table 2). The Rad-score was also shown to be an independent predictor (OR: 27.66, 95% CI: 3.35-228.13, P = 0.002) (Table 2).

Construction of nomogram scoring system for short-term outcome prediction
To facilitate clinical application, we employed a nomogram scoring system that directly indicates the probability of poor prognosis in patients with COVID-19 from the total score. The nomogram was developed from the multivariable logistic regression analyses for predicting the short-term outcomes of COVID-19 patients in the training set, using the relevant parameters from the multivariable analysis. Age was deliberately incorporated into the model. Accordingly, four indicators (age, CRP, Rad-score, lesion location) were selected to construct the nomogram (Fig. 3a), and the scores of the variables are displayed in Table 2. The ROC analysis yielded an AUC of 0.880 (Fig. 3b), and the sensitivity and specificity at the optimal cut-off score of 77.50 were 81.25% and 87.27%, respectively.
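The odds ratios and 95% confidence intervals reported for the multivariable analysis relate to logistic regression coefficients through OR = exp(β) and CI = exp(β ± z·SE). The sketch below shows this relationship and, as a worked example, back-calculates an approximate β and standard error from the reported lesion-location result (OR 16.22, 95% CI 5.72-46.01). It assumes a Wald-type interval, which the text does not state explicitly; the function names are illustrative.

```python
import math
from typing import Tuple

# Standard normal quantile for a two-sided 95% interval.
Z_95 = 1.959963984540054


def odds_ratio_ci(beta: float, se: float,
                  z: float = Z_95) -> Tuple[float, float, float]:
    """Odds ratio with a Wald confidence interval from a logistic coefficient."""
    return (math.exp(beta),
            math.exp(beta - z * se),
            math.exp(beta + z * se))


def beta_se_from_or(odds_ratio: float, lower: float, upper: float,
                    z: float = Z_95) -> Tuple[float, float]:
    """Recover the coefficient and its standard error from a reported OR and CI.

    Assumes the interval is symmetric on the log scale (Wald interval).
    """
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2.0 * z)
    return beta, se


# Values reported in the text for lesion location.
beta, se = beta_se_from_or(16.22, 5.72, 46.01)
recovered = odds_ratio_ci(beta, se)
```

For the reported values this gives β of roughly 2.79 with a standard error of roughly 0.53, and the interval rebuilt from them agrees with the published bounds to within rounding, which is consistent with the Wald assumption.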
Predictive performance of the nomogram scoring system in the independent validation set
To validate the predictive performance of the constructed nomogram scoring system, 264 patients were enrolled from Feb 1 to Feb 24, 2020 in three hospitals. The nomogram scoring system showed discriminative power in the validation set comparable to that in the training set, as reflected by an AUC of 0.882 (95% CI: 0.833-0.920), a sensitivity of 88.76% and a specificity of 72.97% (Fig. 3e). We then used a calibration curve, which plots the observed probability against the predicted probability of poor outcomes, to evaluate the calibration of the nomogram. The ideal calibration curve is the diagonal line, on which the observed probability coincides with the predicted probability of the patient's short-term outcome. In our independent validation dataset, the scoring system showed good calibration, with a curve close to the diagonal line (Fig. 3d). Together, these results further confirmed the feasibility and accuracy of the model.
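The calibration curve described above can be approximated by grouping patients into bins of predicted probability and comparing the mean prediction in each bin with the observed event rate. The following is a minimal sketch under assumed choices (equal-width bins, a default of five bins); the text does not specify how the curve was constructed, and the data are invented.

```python
from typing import List, Sequence, Tuple


def calibration_curve(predicted: Sequence[float],
                      observed: Sequence[int],
                      n_bins: int = 5) -> List[Tuple[float, float, int]]:
    """Return (mean predicted probability, observed event rate, count) per bin.

    Bins are equal-width intervals of [0, 1]; empty bins are skipped.
    Points near the diagonal (mean prediction == observed rate) indicate
    good calibration.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(predicted, observed):
        if not 0.0 <= p <= 1.0:
            raise ValueError("predicted probabilities must lie in [0, 1]")
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((p, y))
    curve = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            rate = sum(y for _, y in members) / len(members)
            curve.append((mean_p, rate, len(members)))
    return curve


# Invented predictions and outcomes (1 = poor outcome), for illustration only.
demo_curve = calibration_curve([0.1, 0.3, 0.6, 0.9], [0, 1, 1, 1], n_bins=2)
```

Here the low-risk bin has a mean prediction of 0.2 against an observed rate of 0.5, and the high-risk bin a mean prediction of 0.75 against an observed rate of 1.0, so this toy model would underestimate risk in both bins.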
Among the demographic and epidemiological indicators, age was the sole statistically significant contributor, whereas chronic underlying disease was not, although it has generally been believed to be a prognostic factor [17]. A possible reason is that the number of subjects with chronic underlying diseases in our training set was small (only 19). No clinical symptom was associated with patients' outcomes, apart from the number of days from symptom onset to hospital admission, which is consistent with other studies [18]. Notably, all directly interpreted CT findings were linked to patients' outcomes, including Rad-score, lung involvement ratio and laterality (uni-/bilateral) of pneumonia, which further illustrates the significance of CT imaging in the prognostic evaluation of COVID-19. In the multivariable analysis, the CT-interpreted lesion location (central/peripheral distribution) and the Rad-score were independent predictive factors, whereas the other CT features (ground-glass opacity, patchy exudation, consolidation etc.) were not, even though the latter have been the main focus of other research evaluating the outcomes of patients with COVID-19 [19].
Recently, mathematical models based on multiple markers have been increasingly applied in the field of medicine. This approach combines a series of relevant parameters to generate a more accurate predictive model [20][21][22]. In this study, we constructed a predictive early warning model using the most significant indicators, based on the β-coefficients generated by multivariable logistic regression analysis. Moreover, the model takes the form of a nomogram scoring system, which makes it much more convenient for clinicians to use. This study integrated a total of 63 indicators, including not only the common indicators but also the radiological characteristics of chest CT and immunological indicators used in other studies, such as IL-6, IL-10, C3 and C4 [8].
To date, several studies have reported that patients with COVID-19 have decreased lymphocyte counts and increased serum inflammatory cytokine levels [18]. The inflammatory storm, which can cause overwhelming single or multiple organ failure, is believed to be an important cause of death in COVID-19 patients in severe or critical condition [9]. Similarly, we found that lymphocyte counts and IL-6 and IL-10 levels were correlated with patients' outcomes in our cohort.
However, across hospitals at all levels, the detection of IL-6 or IL-10 is not suitable for large-scale disease prevention or screening in an epidemic, because only a few people will be tested for the two cytokines. Indeed, only about 25% (37 of 148) of the patients in our training set received this test. For these reasons, apart from the basic indicator (age), we included the Rad-score plus two other indicators (CT features and CRP) to build the prognostic evaluation model for patients with COVID-19. Radiomics differs from direct CT features in that it entails complex calculations on CT images, and it has the potential to facilitate better clinical decision making. Direct CT findings are easier to judge, whereas radiomic features are better suited to characterizing the nature of a lesion than its extent, so the two could complement each other. The sensitivity and specificity of the model combining radiomic and direct CT features were 81.25% and 87.27%, respectively (cut-off score: 77.50). Consistently, the results in the independent validation set confirmed the validity and accuracy of the model (AUC: 0.882; sensitivity: 88.76%; specificity: 72.97%).
In conclusion, the nomogram we developed, based on four relevant variables, is easy to use and conducive to the early assessment of short-term outcomes of patients with COVID-19. Because the risk factors included in the prediction model are readily available clinically, the nomogram can be used effectively by physicians and in medical settings. Its application may help to start intervention early, minimize the likelihood of progression to serious illness and, ultimately, reduce mortality and ease the pressure on medical resources.

The protocol used in this project was reviewed and approved by the institutional review board of the Medical Ethics Committee of Union Hospital (NO.0036), and informed consent was waived by the Ethics Committee because of this special emergency.
Availability of data and material

Supplementary Files
This is a list of supplementary files associated with this preprint. Supplementarymaterial.docx