Development and validation of a CT radiomics signature for diagnosing COVID-19 pneumonia compared with CO-RADS


Objectives: To develop and validate a CT radiomics signature for diagnosing COVID-19 pneumonia and to compare it with a clinical model and the COVID-19 Reporting and Data System (CO-RADS). Methods: This two-center retrospective study enrolled 115 laboratory-confirmed COVID-19 patients with 1127 lesions and 435 non-COVID-19 pneumonia patients with 842 lesions. In study 1, a radiomics signature and a clinical model were developed and validated in the training and internal validation cohorts (patients/lesions, n = 379/1325 and n = 131/505, respectively) for identifying COVID-19 pneumonia. In study 2, the developed radiomics signature was tested in another independent cohort consisting only of viral pneumonia cases (n = 40/139) and compared with the clinical model and the CO-RADS approach. Predictive performance was assessed by receiver operating characteristic (ROC) curve analysis, calibration curves, and decision curve analysis (DCA). Results: Twenty-three texture features were selected to construct the radiomics model. The radiomics model outperformed the clinical model in diagnosing COVID-19 pneumonia, with an area under the ROC curve (AUC) of 0.98 and good calibration in the internal validation cohort. The radiomics model also performed better in the testing cohort in distinguishing COVID-19 from other viral pneumonia, with an AUC of 0.96 compared with 0.75 (P=0.007) for the clinical model, and 0.69 (P=0.002) or 0.82 (P=0.04) for two trained radiologists using the CO-RADS approach. The radiomics model reached a sensitivity of 0.90 and a specificity of 1.00 in the testing cohort. The DCA confirmed the clinical utility of the radiomics model. Conclusions: The proposed radiomics signature outperformed the clinical model and the CO-RADS approach for diagnosing COVID-19, and can facilitate rapid and accurate detection of COVID-19 pneumonia.


Introduction
The ongoing pandemic of coronavirus disease 2019 (COVID-19) caused by "severe acute respiratory syndrome coronavirus 2" (SARS-CoV-2) has become a global threat [1,2]. The high contagiousness of SARS-CoV-2 and its capacity to cause severe illness involving multiple organs have left many countries struggling to screen, diagnose, and treat patients with limited healthcare resources. As of June 20, a total of 8,663,270 confirmed cases and 460,012 deaths have been reported worldwide [3], and the numbers continue to grow. Reverse-transcription polymerase chain reaction (RT-PCR) for SARS-CoV-2 is regarded as the diagnostic gold standard, but its sensitivity varies from 59% to 71% depending on viral load and test sample quality [4,5]. Furthermore, the lengthy turnaround time and the shortage of RT-PCR kits delay treatment, which adds to the dilemma.
Chest CT imaging is a widely available, time-saving, and non-invasive approach for detecting COVID-19 pneumonia. Previous studies revealed that chest CT could serve as an efficient tool for diagnosing COVID-19 pneumonia with high sensitivity and for monitoring the disease course [4,6-8]. Recently, a multinational consensus statement from the Fleischner Society also stated that CT can serve as a major diagnostic tool when symptoms worsen or when RT-PCR kits are in short supply [9]. However, COVID-19 pneumonia shares imaging features with pneumonia caused by other pathogens, especially other viral pneumonia. The specificity of CT was relatively low when compared with RT-PCR results [4]. Quarantining patients whose final RT-PCR results were negative for COVID-19 increased the stress on limited healthcare resources. For distinguishing COVID-19 from other viral pneumonia on chest CT, high specificities but moderate sensitivities were reported among radiologists from different countries [10]. To facilitate the evaluation of COVID-19 pneumonia, a standardized assessment scheme for pulmonary involvement of COVID-19 named CO-RADS (COVID-19 reporting and data system) was developed to estimate the risk [11,12]. The subjective CO-RADS classification demonstrated high discriminatory power but only moderate to substantial agreement among observers. Hence, further measures are needed for more rapid and accurate diagnosis of COVID-19 to combat the current pandemic.
Radiomics involves high-throughput extraction of a large number of quantitative features from medical images, thereby converting image data into high-dimensional data that objectively and quantitatively describe lesion characteristics that may not be perceptible to the naked eye. The potential benefits of radiomics have been highlighted in improving diagnostic, prognostic, and predictive accuracy for cancers such as lung cancer and rectal cancer, as well as for non-neoplastic diseases [13-16]. To date, there are limited data on the value of chest CT-based radiomics in detecting COVID-19 pneumonia.
In the present study, we aimed to develop and validate a radiomics signature model for distinguishing COVID-19 from pneumonia of other etiologies by using real-world data collected during the COVID-19 outbreak in China. Additionally, the predictive performance of the radiomics model was tested in another independent viral pneumonia cohort in comparison with the clinical model and the CO-RADS grading approach.

Patients
This study was approved by the Institutional Ethics Committees of the two participating centers. The informed consent requirement was waived for this retrospective study. The workflow of this study is displayed in Figure 1. In study 1, clinical and CT data of 115 consecutive patients with laboratory-confirmed COVID-19 from Bengbu City (center I) as well as 1205 patients with respiratory symptoms from Xinhua Hospital affiliated to Shanghai Jiao Tong University School of Medicine (center II) were reviewed. Patients with confirmation of a common pathogen and disease improvement on follow-up CT after treatment were grouped as non-COVID-19 pneumonia patients. The exclusion criteria are shown in Figure 1. Consequently, 95 COVID-19 and 415 non-COVID-19 pneumonia patients were recruited and were semi-randomly allocated to the training and internal validation cohorts according to recruitment time. In study 2, another 40 patients with viral pneumonia who met the inclusion and exclusion criteria were included as an independent cohort to test the constructed models. In total, 115 COVID-19 and 435 non-COVID-19 pneumonia patients were enrolled in this study. Among the non-COVID-19 patients, 128 had confirmed viral infections, 195 mycoplasma infections, 5 chlamydia infections, 3 fungal infections, and 104 coinfections.

CT imaging acquisition and interpretation
All patients underwent non-enhanced chest CT examinations for detecting pneumonia with a 64-section multi-detector CT scanner. The detailed imaging parameters are provided in Appendix E1.
Initial CT images obtained before any treatment were interpreted in consensus by three radiologists, each with more than 9 years of experience in thoracic imaging. Disagreements among the observers were resolved by consulting another radiologist with more than 20 years of experience in thoracic imaging. All readers were blinded to the laboratory results.

Image segmentation and radiomics feature extraction
Three-dimensional (3D) segmentation of the entire volume of interest (VOI) of each lesion was performed manually and independently by two experienced radiologists using itk-SNAP 3.8.0 (www.itksnap.org). The outline of each lesion was delineated along its border on thick-section images in the lung window, excluding intralesional vessels, bronchi, necrosis, and cavitation (Figure 2). The interobserver and intraobserver reproducibility of radiomics feature extraction was evaluated using intraclass correlation coefficients (ICC). An ICC of 0.81 to 1.00 was interpreted as almost perfect agreement, 0.61 to 0.80 as substantial agreement, and 0.41 to 0.60 as moderate agreement [14].
Radiomics features were extracted from the VOIs by using pyradiomics version 3.0.0 [18] (http://www.radiomics.io/pyradiomics.html). Six classes of radiomics features were extracted. In addition, two image filters, wavelet and Laplacian of Gaussian, were each applied to the original image. In total, 14 different image types were used for extracting radiomics features. Details of image segmentation and radiomics feature extraction are provided in Appendix E2.

Development of clinical and radiomics models
For the clinical model, univariate and multivariate logistic regression analyses were applied in the training cohort to select independent clinical and radiological predictors for diagnosing COVID-19 pneumonia. A clinical nomogram was then developed with the selected variables. For the radiomics signature model, minimum redundancy and maximum relevance (mRMR) and least absolute shrinkage and selection operator (LASSO) logistic regression were used to select the best-performing radiomics features in the training cohort. A Radscore was calculated for each lesion, and the mean Radscore (mRadscore) of all lesions in each patient was used for predicting COVID-19 pneumonia.
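The lesion-to-patient aggregation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the Radscore is the usual LASSO linear predictor (intercept plus weighted sum of the selected features), and the function names, feature names, and coefficient values are hypothetical. The actual signature is given in Appendix E5.

```python
from collections import defaultdict
from statistics import mean


def radscore(features, coefficients, intercept):
    """Linear predictor of a LASSO logistic model (assumed form of the Radscore).

    features and coefficients are dicts keyed by feature name; only features
    with a nonzero coefficient contribute to the score.
    """
    return intercept + sum(coefficients[name] * features[name] for name in coefficients)


def mean_radscore(lesion_scores):
    """Average lesion-level Radscores per patient (the mRadscore).

    lesion_scores is an iterable of (patient_id, radscore) pairs.
    """
    per_patient = defaultdict(list)
    for patient_id, score in lesion_scores:
        per_patient[patient_id].append(score)
    return {patient_id: mean(scores) for patient_id, scores in per_patient.items()}


# Hypothetical example: two lesions for patient "p1", one for patient "p2".
example = mean_radscore([("p1", 1.0), ("p1", 3.0), ("p2", -0.5)])
```

Averaging gives every lesion equal weight regardless of its volume; whether a volume-weighted mean would behave differently is not addressed in the study.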

Internal validation and clinical utility of clinical and radiomics models
The predictive performance of the clinical and radiomics models was assessed by receiver operating characteristic (ROC) curve analysis, from which the areas under the curve (AUCs), accuracies, sensitivities, and specificities were obtained. The predictive performance of the models was then further tested in the internal validation cohort. Calibration curves were plotted to assess the goodness-of-fit of the clinical and radiomics models.
Decision curve analysis (DCA) was implemented to evaluate the net benefits of the prediction models at different threshold probabilities in the validation cohort.
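For readers unfamiliar with the AUC, the sketch below computes it from its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. It is a generic illustration with made-up scores, not the R code used in the study, and the function name is an assumption.

```python
def roc_auc(y_true, y_score):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    y_true holds 1 for COVID-19 and 0 for non-COVID-19; y_score holds the
    model output (for example, the mRadscore). Ties count as 0.5.
    """
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Illustrative scores only: three of the four positive/negative pairs are ordered correctly.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```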
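The net benefit plotted in a decision curve follows the standard definition, true positives per patient minus false positives per patient weighted by the odds of the threshold probability. The sketch below shows that calculation together with the "treat-all" reference strategy; it is an assumption-labeled illustration with hypothetical function names and toy data, not the code used in this study.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of a model at one threshold probability (0 < threshold < 1)."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are penalized by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of classifying every patient as positive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# The "treat-none" strategy has a net benefit of 0 at every threshold.
toy_labels = [1, 1, 0, 0]
toy_probs = [0.9, 0.6, 0.4, 0.1]
model_nb = net_benefit(toy_labels, toy_probs, 0.5)
all_nb = net_benefit_treat_all(toy_labels, 0.5)
```

Note that the weighting term is undefined at a threshold of exactly 1.0, so decision curves are evaluated on thresholds strictly below 1.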

Predictive performance of radiomics model in distinguishing COVID-19 from other viral pneumonia compared with clinical model and CO-RADS
Another independent cohort including 20 COVID-19 patients and 20 patients with other viral pneumonia was used to test the discriminatory power of the clinical model, the radiomics model, and CO-RADS. CO-RADS comprises 6 levels of suspicion for pulmonary involvement of COVID-19, in addition to CO-RADS 0, not interpretable (scan technically insufficient for assigning a score) [11]. Detailed information on each level is provided in Appendix E3.
The CO-RADS categories for the 40 patients were assigned independently by two experienced radiologists who were familiar with CO-RADS and blinded to the laboratory results. The interobserver agreement was assessed by using the Cohen kappa test, where 0.41-0.60 indicated moderate agreement, 0.61-0.80 substantial agreement, and 0.81-1.00 almost perfect agreement [19].
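As an illustration of the agreement statistic, the sketch below computes an unweighted Cohen kappa for two readers and maps it to the bands quoted above. Whether the study used the unweighted or a weighted form for the ordinal CO-RADS categories is not stated, so the unweighted form here is an assumption, and the function names and ratings are hypothetical.

```python
from collections import Counter


def cohen_kappa(reader1, reader2):
    """Unweighted Cohen kappa for two equally long lists of category labels."""
    n = len(reader1)
    observed = sum(a == b for a, b in zip(reader1, reader2)) / n
    counts1, counts2 = Counter(reader1), Counter(reader2)
    # Chance agreement from the two readers' marginal category frequencies.
    expected = sum(counts1[k] * counts2[k] for k in set(counts1) | set(counts2)) / (n * n)
    return (observed - expected) / (1 - expected)


def agreement_label(kappa):
    """Interpretation bands as quoted in the text."""
    if kappa >= 0.81:
        return "almost perfect"
    if kappa >= 0.61:
        return "substantial"
    if kappa >= 0.41:
        return "moderate"
    return "below moderate"


# Hypothetical ratings for four scans.
kappa = cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0])
```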

Statistical analysis
Comparisons of patient characteristics between the COVID-19 and non-COVID-19 pneumonia groups were performed with the independent two-sample t test, Mann-Whitney U test, and chi-squared test or Fisher's exact test in SPSS 23.0 (IBM). Other statistical analyses were performed with R software (version 3.6.1, http://www.R-project.org). The AUCs were compared by the DeLong test [20]. A two-sided P<0.05 indicated a statistically significant difference.

Patient characteristics
The clinical and radiological features of the 550 patients in the training, internal validation, and testing cohorts are presented in Tables 1 and 2. For clinical features, there were significant differences in age, cough, white blood cell count, neutrophil ratio, and lymphocyte count in the training and internal validation cohorts. In the testing cohort, however, only C-reactive protein showed a significant difference. For radiological features, lesion location and distribution were significantly different between the COVID-19 and non-COVID-19 groups in all cohorts.

Features selection and development of clinical and radiomics models
Eight clinical-radiological features were selected for building the clinical model in the training dataset: age, gender, neutrophil ratio, lymphocyte count, location, distribution, reticulation, and CT score. The developed clinical nomogram is shown in Figure 3.
For the radiomics model, 1218 radiomics features were extracted from 783 lesions in 66 COVID-19 patients and 542 lesions in 313 non-COVID-19 patients in the training dataset. The interobserver and intraobserver reproducibility of radiomics feature extraction was satisfactory, with ICCs ranging from 0.7139 to 0.9999 and from 0.7130 to 0.9999, respectively. Twenty-three wavelet-transformed texture features with nonzero coefficients were selected to construct the radiomics signature model (Appendix E4 and Figure S1). The developed radiomics signature is presented in Appendix E5. The Radscore of each lesion is depicted in Figure S2.

Validation and clinical utility of clinical and radiomics models
The AUC of the radiomics model developed in the training cohort was 1.00. Favorable performance was also observed in the internal validation cohort, where the radiomics model outperformed the clinical model in diagnosing COVID-19 pneumonia, with an AUC of 0.98 compared with 0.83. The sensitivity and specificity of the radiomics model were 0.91 and 0.94, respectively. The AUCs, accuracies, sensitivities, and specificities of the clinical and radiomics models in the training and internal validation cohorts are presented in Table 3. The ROC analysis results are displayed in Figure 4.
Calibration curves showed that the radiomics model demonstrated better agreement between the predicted and actual probabilities of COVID-19 in both datasets (Figure S3). DCA revealed that the radiomics prediction model was more beneficial than the clinical model, as well as the "treat-all-patients" or "treat-none" strategies, across threshold probabilities from 0.0 to 1.0 (Figure 5).

Predictive performance of clinical model, radiomics model, and CO-RADS category in distinguishing COVID-19 from other viral pneumonia
In the testing cohort, the radiomics model outperformed the clinical model in distinguishing COVID-19 from other viral pneumonia, with an AUC of 0.96 compared with 0.75 (P=0.02) (Figure 6). In addition, the radiomics model also performed better than two trained radiologists using CO-RADS. The AUC of the radiomics model was significantly higher than the 0.69 of radiologist 1 (P=0.002) and the 0.82 of radiologist 2 (P=0.04) (Figure 6). The AUCs, accuracies, sensitivities, and specificities of the clinical model, radiomics model, and CO-RADS in the testing cohort are presented in Table 4. The interobserver agreement between the two radiologists was moderate, with a kappa value of 0.53.
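The testing-cohort figures reported for the radiomics model in this paper (sensitivity 0.90, specificity 1.00, accuracy 0.95) can be checked against one another with a short calculation. The confusion-matrix counts below are not reported in the text; they are inferred from the reported rates and the cohort composition of 20 COVID-19 and 20 other viral pneumonia patients, so they should be read as an assumption.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Inferred counts: 18 of 20 COVID-19 patients detected, no false positives
# among the 20 patients with other viral pneumonia.
sensitivity, specificity, accuracy = diagnostic_metrics(tp=18, fn=2, tn=20, fp=0)
```

With these counts, 38 of the 40 patients are classified correctly, which matches the reported accuracy.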

Discussion
In this study, we developed and validated a CT-based radiomics model for diagnosing COVID-19 pneumonia and compared its predictive performance with that of a clinical model and of two trained radiologists applying CO-RADS. Our results revealed that the radiomics model outperformed the clinical model in identifying COVID-19 pneumonia in the training, internal validation, and testing cohorts, both against infections with common pathogens and against selected viral infections. The proposed radiomics model achieved favorable performance, with AUC values of 1.00, 0.98, and 0.96 as well as high sensitivities and specificities in the three cohorts. Furthermore, the developed radiomics model was also superior to CO-RADS in discriminating COVID-19 from other viral pneumonia, with a sensitivity and specificity of 0.90 and 1.00.
Rapid and accurate diagnosis of COVID-19 is crucial for early intervention and healthcare allocation during the ongoing outbreak. Previous studies explored the clinical and imaging features of COVID-19 to facilitate the diagnosis of COVID-19 pneumonia, revealing that fever and/or cough, normal or decreased white blood cell count, decreased lymphocyte count, and GGO lesions in the peripheral and posterior lungs on CT images could aid in screening highly suspicious patients [6,21-23]. However, consolidation lesions may be more common depending on the time interval from symptom onset, and atypical features including fibrous stripes and irregular solid nodules were also reported in subsequent studies, which complicates the diagnosis [8,24]. Our study found that older age, male gender, normal neutrophil ratio, decreased lymphocyte count, bilateral location, peripheral distribution, and reticulation on CT, as well as CT score, were independent predictors for distinguishing COVID-19 from non-COVID-19 pneumonia in the training cohort, which is in accordance with the above studies.
Nevertheless, the predictive performance of the clinical model was not satisfactory, with an AUC of 0.83 and a sensitivity of 0.63 in the internal validation dataset. Variable sensitivities and specificities for identifying COVID-19 subjectively from clinical and radiological features were also found in previous studies [4,5,10].
When the clinical model was evaluated for discriminating COVID-19 from other viral pneumonia in the testing dataset, its discriminatory power decreased further, with an AUC and sensitivity of 0.75 and 0.60. In prior investigations comparing chest CT with RT-PCR results, the sensitivity of CT in identifying COVID-19 pneumonia was estimated to be as high as 98%, but the specificity was only 25% in an analysis of 1014 patients [4,5]. In contrast, when radiologists from different countries distinguished COVID-19 from viral pneumonia on chest CT, the sensitivity was reported to be moderate but the specificity was high [10]. Even with the recently recommended CO-RADS, which had a reported high discriminatory power (AUC 0.91) in identifying COVID-19 [11], the AUC, sensitivity, and specificity in our study for distinguishing COVID-19 from other viral pneumonia were not satisfactory: 0.69, 0.80, and 0.55, respectively, for one trained radiologist who was familiar with CO-RADS, and 0.82, 0.90, and 0.65, respectively, for the other. The moderate interobserver agreement also did not favor accurate diagnosis of COVID-19. Therefore, a more objective approach is urgently needed to improve the current diagnostic accuracy for COVID-19 pneumonia.
Recently, artificial intelligence (AI) using deep learning technology has demonstrated good performance in improving the diagnosis of COVID-19, with sensitivities ranging from 0.67 to 0.90 and specificities from 0.83 to 0.96 [25-27]. However, deep learning requires a large amount of training data, which limited its timely application given the sporadic COVID-19 cases in most parts of China. More clinical implementations are warranted to test AI systems and make them widely available. Radiomics, another machine learning approach that has developed rapidly in recent years, is widely available through open-source software, and a radiomics signature is easy to use. Its potential for diagnosing lesions and predicting their outcomes has been shown in prior reproducible investigations [14,15], as well as in our previous studies on predicting preoperative synchronous distant metastasis in patients with rectal cancer [28,29]. In this study, 23 textural features were selected to build the radiomics model, and the proposed model performed well not only in the training cohort but also in the two validation cohorts, with AUCs of 1.00, 0.98, and 0.96, respectively. High sensitivities and specificities were observed: 1.00 and 0.97 in the training cohort, and 0.91 and 0.94 in the internal validation cohort.
It has been reported that imaging findings overlap between COVID-19 and other viral infections, such as SARS-CoV and MERS-CoV coronavirus pneumonia, as well as H1N1, H5N1, influenza, and so on [6,23,30]. It is therefore not difficult to understand why the textural features outperformed the other extracted morphological features and the first-order statistical features based on histogram analysis. Textural features encode the relationships between nearby voxels within VOIs, reflecting intralesional heterogeneity. This is the advantage of radiomics: it can transform conventional medical images into quantitative, high-dimensional data for analysis [31,32]. To further test the robustness of our developed radiomics model, we enrolled an independent validation cohort of patients with viral infections to assess the diagnostic performance. The AUC, accuracy, sensitivity, and specificity were satisfactory, with values of 0.96, 0.95, 0.90, and 1.00, respectively. Compared with the clinical model and CO-RADS for identifying COVID-19 pneumonia, the AUC of the radiomics model was significantly higher. The high sensitivity and specificity can not only help to select patients highly suspicious for COVID-19 for timely management, but also help to exclude negative patients, relieving the stress on the healthcare system.
Our study has several limitations. First, the sample size of our retrospective study was relatively small. Prospective investigation with a larger sample size from more centers will be required to validate our proposed model. Second, since we enrolled non-COVID-19 pneumonia patients with laboratory confirmation of the pathogen from blood tests and improvement of pneumonia on follow-up CT scans after treatment, few bacterial infection cases were available owing to the lack of bacterial culture. Third, center II is a general hospital with a strong pediatric medical center, so many children with mycoplasma infections were included in our study. The median age of the non-COVID-19 patients was significantly lower than that of the COVID-19 patients, so selection bias may exist. However, our non-COVID-19 pneumonia cases were consecutively enrolled from real-world data in our center, and children have also been shown to be susceptible to COVID-19 and therefore clearly need rapid and accurate differential diagnosis.
In summary, our preliminary study demonstrated that a chest CT-based radiomics signature model outperformed the clinical model and CO-RADS in diagnosing COVID-19 pneumonia. The quantitative lesion characteristics derived from the proposed radiomics model can facilitate rapid and accurate diagnosis as well as timely management of COVID-19.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Conflict of interest
One of the authors (SD) is an employee of GE Healthcare. The remaining authors of this manuscript declare no relationships with any companies whose products or services may be related to the subject matter of the article.

Figure 1 The workflow of this study