CT-based Radiomics Combined with Signs: A Valuable Tool to Help Physicians Discriminate COVID-19 and Other Viral Pneumonia

Background: In this COVID-19 pandemic, the differential diagnosis of different viral types of pneumonia is still challenging. We aimed to assess the classification performance of computed tomography (CT) signs and CT-based radiomics features for discriminating COVID-19 pneumonia from other viral pneumonia. Methods: A total of 181 patients with confirmed viral pneumonia (COVID-19: 89 cases, Non-COVID-19: 92 cases; training cohort: 126 cases; test cohort: 55 cases) were collected retrospectively in this study. Pneumonia signs and radiomics features were extracted from the initial unenhanced chest CT images to build independent and combined models. The predictive performance of the radiomics model and the combined model was evaluated using an intra-cross validation cohort. Diagnostic performance of the two models was assessed via receiver operating characteristic (ROC) analysis. Results: The combined model consisted of 3 significant CT signs and 14 selected features and demonstrated better discrimination performance between COVID-19 and Non-COVID-19 pneumonia than the radiomics model alone. For the radiomics model alone, the area under the ROC curve (AUC) was 0.904 (sensitivity, 85.5%; specificity, 84.4%; accuracy, 84.9%) in the training cohort and 0.866 (sensitivity, 77.8%; specificity, 78.6%; accuracy, 78.2%) in the test cohort. After combining CT signs and radiomics features, the AUC of the combined model for the training cohort was 0.956 (sensitivity, 91.9%; specificity, 85.9%; accuracy, 88.9%), while that for the test cohort was 0.943 (sensitivity, 88.9%; specificity, 85.7%; accuracy, 87.3%). Conclusion: CT-based radiomics combined with signs might be a potential method for distinguishing COVID-19 from other viral pneumonia with satisfactory performance.


Background
In December 2019, a highly infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection, known as coronavirus disease 2019 (COVID-19), broke out in Wuhan, China [1,2]. COVID-19 is spreading around the world at an alarming rate. As of May 9th, more than 3.9 million patients have been diagnosed with COVID-19, with more than 270,000 deaths. Early studies have shown that almost all COVID-19 patients have pneumonia [3,4]. However, pneumonia caused by other viral infections is also very common during the same period of the year, and the clinical symptoms are very similar [5][6][7].
Therefore, in this COVID-19 pandemic, the differential diagnosis of different viral types of pneumonia is difficult but highly important. Real-time reverse transcription-polymerase chain reaction (RT-PCR) is the gold standard for the diagnosis of viral pneumonia. However, recent reports have shown that RT-PCR detection of COVID-19 has low sensitivity [8], and the high false-negative rate limits the rapid identification of viral pneumonia by RT-PCR.
Currently, computed tomography (CT) can play an important role in the diagnosis and treatment of viral pneumonia [9,10]. Studies have shown that the imaging signs of viral pneumonia and bacterial pneumonia are different [11,12]. However, little is known about the differences in imaging signs between COVID-19 and other viral pneumonia. Moreover, the radiologist's diagnosis of viral pneumonia from imaging signs is a subjective assessment, and its accuracy depends on the doctor's diagnostic experience. Therefore, it is necessary to further develop a rapid, quantitative auxiliary diagnostic method.
Radiomics is a new quantitative analysis technology based on medical imaging, which can extract thousands of imaging features, including first-order statistical, shape, and second- or higher-order texture features. Previous studies have shown that radiomics has outstanding performance in tumor diagnosis, treatment effect evaluation, and prognosis prediction [13][14][15]. Recently, radiomics models based on deep learning have already been constructed to predict the prognosis of COVID-19 patients [16].
Chen et al. found that a radiomics model based on CT images is a feasible and promising method for monitoring poor prognostic outcomes in patients with COVID-19 [17][18]. In addition, radiomics has also been used to distinguish focal organizing pneumonia from peripheral lung adenocarcinoma [19]. Therefore, radiomics may be a potential tool for doctors to distinguish COVID-19 from other viral pneumonia.
Therefore, in the present study, we aimed to select significant chest CT signs and radiomics features that can effectively distinguish COVID-19 pneumonia from other viral pneumonia, and to determine whether a CT-based radiomics signature combined with CT signs could be used as a tool in the differentiation of COVID-19 and other viral pneumonia.

Patients
This retrospective study was approved by our institutional review board and patient consent was waived.

CT examination
HRCT examination: CT scanners with 16 or more detector rows (Siemens, Germany; Philips, the Netherlands; and GE, USA) were used. The patient was scanned in the supine position while holding his or her breath after inspiration. The scanning range was from the thoracic inlet to the costophrenic angles. Scanning parameters: detector collimation width 64×0.6 mm or 128×0.6 mm, tube voltage 120 kV, adaptive tube current, high-resolution algorithm reconstruction, reconstruction layer thickness 1 or 1.5 mm, and layer spacing 1.5 mm.

Chest CT signs analysis
Three Chinese radiologists were blinded to the RT-PCR results, all patient information, and the type of viral pneumonia. First, two experienced radiologists in the cardiothoracic group independently read the CT images. When their opinions were inconsistent, they discussed them and reached a consensus, which was reviewed and confirmed by the third, senior radiologist in the cardiothoracic group. The signs of the first CT examination after admission were analyzed. The CT imaging evaluation included lesion location (left upper lobe, left lower lobe, right upper lobe, right middle lobe and right lower lobe) and signs [GGO (ground-glass opacities), partial consolidation, consolidation (multifocal consolidation, focal consolidation), fibrous stripes, septal thickening, intralobular interstitial thickening, subpleural lines, crazy-paving pattern, tree-in-bud, bronchial wall thickening, bronchiectasis, air bronchogram, halo sign, reversed halo sign, mediastinal lymphadenectasis, pleural thickening, and pleural effusion] [9][10][11][12][20]. The window width and level were set to 1600/-600 HU.

CT image processing and volume of interest (VOI) segmentation
The Lung Kit software (GE Healthcare, Version LK2.2) was used for pneumonia lesion segmentation. All the CT images were first resampled to an isotropic voxel size of 1 mm × 1 mm × 1 mm. The five anatomic lung lobes were then automatically segmented. Next, the pneumonia lesion volume of interest (VOI) was automatically segmented, and the margin of the VOI was manually confirmed by an experienced thoracic radiologist. Distributed lesions were treated as a single VOI in the subsequent analysis steps.

Radiomics feature extraction and selection
A total of 1316 radiomics features were extracted from the segmented VOIs using the open-source Python package Pyradiomics [21]. The extracted radiomics features were categorized into five groups: (1)
The radiomics feature data were first preprocessed by replacing missing values with median values, followed by z-score normalization. The whole dataset was randomly divided into training and test cohorts at a ratio of 7:3. The radiomics features in the training set were then further screened for classification model construction. First, redundant collinear features were removed by correlation analysis at a cut-off value of 0.7. Then the features without statistically significant differences between the COVID-19 and Non-COVID-19 groups were excluded by the Mann-Whitney U test, with a significance level of p < 0.05. Univariate logistic analysis was used to select potential classification indicators with a p value less than 0.05. Next, the least absolute shrinkage and selection operator (LASSO) logistic regression method with 10-fold cross validation was applied for further feature selection and regularization to improve model accuracy and avoid overfitting. The minimum mean square error for model fitting among the 10 folds was used to determine the optimized lambda value. The remaining features with non-zero coefficients at that lambda value were kept for model construction.
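The first screening steps (median imputation, z-score normalization, and the correlation filter at 0.7) can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the code used in the study: the feature names and values are made up, the use of Pearson correlation is an assumption, and so is the greedy rule that keeps the first feature of each correlated pair, since the text does not say which correlation measure was used or which member of a pair was dropped.

```python
import math
from statistics import mean, median, pstdev

def impute_median(column):
    """Replace missing values (None) with the median of the observed values."""
    observed = [v for v in column if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in column]

def zscore(column):
    """Standardize a column to zero mean and unit (population) variance."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]

def pearson(x, y):
    """Pearson correlation coefficient between two equally long columns."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def drop_collinear(features, cutoff=0.7):
    """Greedy filter (assumed rule): keep a feature only if its absolute
    correlation with every already-kept feature does not exceed the cutoff."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= cutoff for k in kept):
            kept.append(name)
    return kept

# Hypothetical toy data: three features measured on five patients.
raw = {
    "feat_a": [1.0, 2.0, None, 4.0, 5.0],
    "feat_b": [2.1, 3.9, 6.0, 8.1, 9.9],  # nearly collinear with feat_a
    "feat_c": [5.0, 1.0, 4.0, 2.0, 3.0],
}
clean = {name: zscore(impute_median(col)) for name, col in raw.items()}
selected = drop_collinear(clean)
print(selected)
```

In this toy example `feat_b` is discarded because it is almost perfectly correlated with `feat_a`, while `feat_c` survives. The later steps (Mann-Whitney U test, univariate logistic analysis, LASSO) would then operate on the surviving features only.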

Classification model construction
The logistic regression model was constructed using the selected radiomics features to differentiate COVID-19 from Non-COVID-19, and the Radscore for each patient was calculated based on the regression coefficients. In addition, the independent predictors among the CT signs were selected using the Chi-square test (or Fisher exact test) and univariate and multivariate logistic regression. The selected CT signs were then combined with the radiomics features to construct the combined model using logistic regression. A nomogram of the combined model was also established.
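As a rough illustration of how a Radscore follows from regression coefficients, the sketch below assumes the common convention that the Radscore is the linear predictor (log-odds) of the logistic model; the text does not spell out its exact definition. The feature names, coefficients, and intercept are hypothetical and are not the fitted values from this study.

```python
import math

def radscore(features, coefficients, intercept):
    """Linear predictor of a logistic regression model (assumed Radscore)."""
    return intercept + sum(coefficients[name] * features[name]
                           for name in coefficients)

def probability(score):
    """Logistic function: map a log-odds score to a probability."""
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical coefficients for two z-scored radiomics features.
coefficients = {"feature_1": 0.8, "feature_2": -0.5}
intercept = 0.1

# Hypothetical patient with z-scored feature values.
patient = {"feature_1": 1.2, "feature_2": -0.4}

score = radscore(patient, coefficients, intercept)
p_covid = probability(score)
print(round(score, 2), round(p_covid, 3))
```

A combined model of the kind described here would add the selected CT signs as further terms (for example as 0/1 indicators) to the same linear predictor.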
The radiomics model and the combined model constructed on the training set were validated in the test cohort. Classification performance was evaluated by the receiver operating characteristic (ROC) curve. The area under the curve (AUC), accuracy, sensitivity and specificity were derived. In addition, calibration curves and decision curve analysis (DCA) curves were calculated to assess the models' classification performance and their clinical benefits.
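The evaluation metrics named above can be illustrated with a small, self-contained sketch. It computes the AUC as the probability that a positive case receives a higher score than a negative one, picks the cut-off that maximizes the Youden index (sensitivity + specificity - 1) on the training data, and then applies that fixed cut-off to held-out data. All labels and scores are invented toy values, and the study itself used the R package pROC rather than code like this.

```python
def auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly
    (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def confusion(labels, scores, cutoff):
    """Counts (tp, fn, fp, tn) when score >= cutoff is called positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cutoff)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < cutoff)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= cutoff)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < cutoff)
    return tp, fn, fp, tn

def youden_cutoff(labels, scores):
    """Cut-off with the largest Youden index; the lowest one wins on ties."""
    best_cut, best_j = None, -1.0
    for cut in sorted(set(scores)):
        tp, fn, fp, tn = confusion(labels, scores, cut)
        j = tp / (tp + fn) + tn / (tn + fp) - 1.0
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j

# Hypothetical training data (1 = COVID-19, 0 = Non-COVID-19).
train_y = [1, 1, 1, 1, 0, 0, 0, 0]
train_s = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
train_auc = auc(train_y, train_s)
cutoff, j = youden_cutoff(train_y, train_s)

# Hypothetical test data, evaluated at the cut-off fixed on the training set.
test_y = [1, 1, 0, 0]
test_s = [0.85, 0.35, 0.5, 0.2]
tp, fn, fp, tn = confusion(test_y, test_s, cutoff)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
accuracy = (tp + tn) / len(test_y)
print(train_auc, cutoff, sensitivity, specificity, accuracy)
```

Fixing the cut-off on the training data and reusing it unchanged on the test data avoids tuning the threshold on the cohort that is meant to provide an unbiased estimate.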

Statistical analysis
The continuous variables or ordinal variables were compared by t-test or Mann-Whitney U test. The distribution of different CT signs was compared by the Chi-squared test, or by the Fisher exact test when sample sizes were small. For ROC analysis, the cut-off value at the maximum Youden index in the training set was calculated for each model, and the confusion matrix, sensitivity, specificity and accuracy in the training and test cohorts were derived at that cut-off value. The Delong test was used to compare ROC curves between different models. All reported statistical significance levels were two-sided, with statistical significance set at p < 0.05. The statistical analyses were performed with SPSS Software (Version 25, IBM, Chicago, IL) and R software (Version 3.6.1, https://www.r-project.org). The main R packages involved were "glmnet" for logistic regression including LASSO regression, "pROC" for ROC analysis, and "rmda" for DCA analysis.
(Table 2). Table 2 shows the performance of the prediction model using GGO, intralobular interstitial thickening, and halo sign.
Extraction and selection of radiomics features and building of the radiomics prediction model
Predictive performance of the radiomics model and combined model

Clinical data
The radiomics features showed good performance in the training and test cohorts (Table 3). After combining the radiomics features and CT signs, the AUC values of the combined model were higher than those of the radiomics model alone in both cohorts (Figure 5a and b). The AUCs of the two models were compared by the Delong test, and the differences were statistically significant (training cohort p = 0.01506, test cohort p = 0.01506), which indicated the improved prediction performance of the combined model. The calibration curves of the two models demonstrated good agreement between the predicted and observed COVID-19 probability in both the training cohort and the test cohort (Figure 5c and d). Moreover, the accuracy, sensitivity, specificity, precision, positive prediction, and negative prediction of radiomics features combined with CT signs were higher than those of radiomics features alone (Table 3). The wide range of high-risk thresholds (0-0.75) of the DCA curves for the combined model, with standardized net benefits larger than 0.6, also indicated its clinical usefulness (Figure 5e and 5f).
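For readers unfamiliar with decision curve analysis, the quantity plotted on a DCA curve can be sketched as follows. The sketch uses the usual definition of net benefit at a risk threshold pt, NB = TP/N - (FP/N) × pt/(1 - pt), and assumes that "standardized net benefit" means net benefit divided by the prevalence, which is the convention we understand the "rmda" package to follow. The labels and probabilities are invented toy values.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit of treating every patient whose predicted probability
    is at or above the risk threshold (0 < threshold < 1)."""
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)

def standardized_net_benefit(labels, probs, threshold):
    """Net benefit divided by prevalence, so a perfect model scores 1."""
    prevalence = sum(labels) / len(labels)
    return net_benefit(labels, probs, threshold) / prevalence

# Hypothetical predicted probabilities (1 = COVID-19, 0 = Non-COVID-19).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
probs = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

nb = net_benefit(labels, probs, 0.5)
snb = standardized_net_benefit(labels, probs, 0.5)
print(nb, snb)
```

A decision curve repeats this calculation over a grid of thresholds and compares the model against the "treat all" and "treat none" strategies.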

Discussion
Considering the similar seasonal timing, clinical symptoms and CT findings of COVID-19 and other viral pneumonia, and the importance of differential diagnosis, this study systematically analyzed the differences in imaging signs and radiomics between them. Our research found that three signs and fourteen radiomics features are related to COVID-19 infection. In our study, the diagnostic performance of the radiomics model was better than the radiologists' subjective judgments. Moreover, the combined model, which was based on CT signs and radiomics features, distinguished COVID-19 from other viral pneumonia well and showed encouraging performance.
This study included 181 confirmed patients with viral pneumonia, and we compared the CT signs of COVID-19 and other viral pneumonia. We found that GGO, intralobular interstitial thickening and halo sign are more common in COVID-19 pneumonia than in other viral pneumonia, which is consistent with previous studies [20,22]. The performance of CT signs in distinguishing COVID-19 from other viral pneumonia is acceptable (AUC in the training cohort, 0.875; AUC in the test cohort, 0.812), which is consistent with Bai et al. (accuracy 60%-83%) [23]. The results of the radiologists' subjective evaluation showed that CT signs are of clinical value in identifying viral pneumonia, but there is still considerable overlap. Therefore, it is very challenging to distinguish the two types of disease through visual assessment.
Radiomics is the generation of minable high-throughput data through conversion of digital CT and MRI images [24]. In previous studies, radiomics had outstanding performance in the diagnosis, staging, prognosis, and treatment response prediction of tumors [13][14][15]. In addition, radiomics can give rise to a deeper understanding of the heterogeneity of pneumonia lesions [19,25,26]. Therefore, radiomics is theoretically a feasible method to distinguish COVID-19 pneumonia from other viral pneumonia. In our study, we selected 14 of the most predictive radiomics features, and most of them were filtered or transformed first-order or texture features. This might indicate that distinguishing between pneumonias with such highly overlapping imaging appearances requires features emphasized in the spatial or frequency domains, or that these higher-order features are relatively more stable. In clinical cancer research, radiomics features have been shown to reflect tumor invasiveness, malignancy, lymph node metastasis potential and other biological characteristics [27][28][29]. However, we speculate that the cause of CT image heterogeneity between COVID-19 and Non-COVID-19 may be different from that in tumors.
Subsequently, the radiomics prediction model was constructed. The performance of the classifier in the test group was 77.8% sensitivity, 78.6% specificity, 78.2% accuracy, 77.8% positive prediction, and 78.6% negative prediction. In addition, the ROC curve was used for performance evaluation. The AUC of the test group was 0.886, indicating good performance. To further improve the performance of the prediction model, we combined the radiologist's subjective visual assessment and computer radiomics features to construct the prediction model. The combined model was found to have higher sensitivity, specificity, accuracy and AUC. In addition, the calibration curve and decision curve showed that the reliability and stability of the combined prediction model were better.
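The five percentages reported for the test group can be checked for internal consistency by searching for confusion-matrix counts that reproduce them. The sketch below is such a check under stated assumptions: the counts are not given in the text and are back-calculated here, the only inputs are the rounded percentages above and the test cohort size of 55 from the abstract, COVID-19 is assumed to be the positive class, and "positive prediction" and "negative prediction" are read as positive and negative predictive values.

```python
N_TEST = 55  # test cohort size stated in the abstract

# Reported test-cohort percentages for the radiomics model.
REPORTED = {"sens": 77.8, "spec": 78.6, "acc": 78.2, "ppv": 77.8, "npv": 78.6}

def percentages(tp, fn, fp, tn):
    """Sensitivity, specificity, accuracy, PPV and NPV in percent (1 decimal)."""
    return {
        "sens": round(100.0 * tp / (tp + fn), 1),
        "spec": round(100.0 * tn / (tn + fp), 1),
        "acc": round(100.0 * (tp + tn) / (tp + fn + fp + tn), 1),
        "ppv": round(100.0 * tp / (tp + fp), 1),
        "npv": round(100.0 * tn / (tn + fn), 1),
    }

def consistent_counts(n, reported):
    """All (tp, fn, fp, tn) with n patients that reproduce every reported value."""
    matches = []
    for n_pos in range(1, n):
        n_neg = n - n_pos
        for tp in range(1, n_pos + 1):
            for tn in range(1, n_neg + 1):
                fn, fp = n_pos - tp, n_neg - tn
                got = percentages(tp, fn, fp, tn)
                if all(abs(got[k] - reported[k]) < 1e-9 for k in reported):
                    matches.append((tp, fn, fp, tn))
    return matches

matches = consistent_counts(N_TEST, REPORTED)
print(matches)
```

Under these assumptions a single table fits: 21 true positives, 6 false negatives, 6 false positives and 22 true negatives (27 COVID-19 and 28 Non-COVID-19 cases). This only shows that the reported percentages are mutually consistent with a 55-patient cohort; it says nothing about the AUC.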
This study has some limitations. First, as a retrospective study, there may be selection bias. However, the results of our preliminary study are encouraging and will be verified in future larger studies. In addition, because of the small number of cases of each individual type of other viral pneumonia, we did not compare the characteristics of the different viral pneumonias. Finally, the response of the lung to the virus is highly related to host factors. CT data alone cannot completely distinguish the type of viral pneumonia, and more clinical features and laboratory examination data need to be considered. Combined with more clinical data, the predictive model may be better at identifying viral pneumonia.
In conclusion, we determined the chest CT signs and radiomics features that distinguished COVID-19 from other viral pneumonia and developed an effective predictive model. Our research shows that CT signs and radiomics features are effective tools for distinguishing COVID-19 from other viral pneumonia, and can support more precise clinical diagnosis and treatment strategies for COVID-19.

Declarations
Authors' contributions

Figure 2
The selected features and their coefficients.