Multi-View Whole Lung Radiomics Quantitative Analysis of COVID-19 Pneumonia Based on Machine Learning

Background: Quantitative imaging and radiomics can enable non-invasive disease diagnosis. This study aimed to evaluate whole-lung radiomics features from different views for predicting coronavirus disease 2019 (COVID-19), and to investigate new radiomics features. Methods: A total of 75 patients were retrospectively enrolled from December 1, 2019 to December 31, 2020. Both lungs were segmented by an unsupervised hybrid image segmentation approach. Radiomics features of the transverse, coronal and sagittal planes were extracted separately. After feature selection with the least absolute shrinkage and selection operator (LASSO), three radiomics models based on the key radiomics features were built by machine learning. In addition, radiomics models for the different feature categories were constructed with the particle swarm optimization-deep extreme learning machine (PSO-DELM). Predictive accuracy, sensitivity, specificity and area under the receiver operating characteristic curve (AUC) were used to evaluate the performance of these radiomics models. Results: The training and test cohorts had similar distributions of age and pneumonia type. In the training cohort, 13 (transverse plane), 4 (coronal plane) and 8 (sagittal plane) selected features were used to construct the radiomics models. The PSO-DELM radiomics models in the transverse, coronal and sagittal planes showed favorable performance in the test cohort (AUC = 0.9444, 0.8636 and 0.9444, respectively). Among the radiomics features, the phase congruency feature showed stable predictive performance (AUC > 0.9) on the three planes. Conclusions: Multi-view whole-lung radiomics features could effectively differentiate COVID-19 from other types of pneumonia. Phase congruency may be explored as a radiomics biomarker for the identification of pulmonary diseases. Merging radiomics features into PSO-DELM is a promising direction for future research in medical radiology and deep learning.


Introduction
With the outbreak of coronavirus disease 2019 (COVID-19), the disease spread rapidly in China and the rest of the world and became a worldwide epidemic. It has resulted in more than 12,102,328 confirmed cases of COVID-19 and more than 551,046 deaths worldwide [1]. Of infected individuals, 80% were asymptomatic or manifested only mild symptoms [2]. CT scanning is an effective non-invasive diagnostic method that has been shown to have higher sensitivity than existing reverse transcription-polymerase chain reaction (RT-PCR) kits [3]. Multiple, bilateral, patchy ground-glass opacities have been described as the characteristic feature of COVID-19 infection on CT images [4][5]. The lesions have a mainly subpleural distribution and are extremely similar to those of other types of pneumonia [6].
However, these radiological findings depend on the subjective experience of clinicians and radiologists. A rapid quantitative analysis method based on machine learning would therefore help clinicians and radiologists to diagnose COVID-19.
Radiomics comprises high-throughput quantitative CT imaging features and has the potential to generate radiomics biomarkers that support accurate diagnosis and treatment [7][8][9]. A variety of studies have reported the potential of radiomics in cancer for evaluating resectability, treatment response and prognosis [10][11][12][13][14][15]. Radiomics features are useful diagnostic biomarkers that can provide accurate quantitative information for clinical data [16]. This was a first step in the development of radiomics combined with intelligent algorithms. Radiomics features might therefore provide an effective tool for the diagnosis of COVID-19 [17,18]. Basic CT features have proved useful for diagnosing COVID-19, but CT radiomics biomarkers still need to be developed as predictive biomarkers for distinguishing COVID-19. Most published studies used a CT radiomics model based on the transverse plane to distinguish COVID-19, with the lesions segmented using professional software or manually delineated by radiologists [17,18]. This is very labor-intensive and time-consuming. Few studies have so far reported on radiomics features of the whole lung that is automatically segmented and analyzed from multiple views. The distribution of ground-glass opacities, number of lesions, maximum lesion range and lobe involvement can be observed not only in the transverse plane of the lung, but also in the coronal and sagittal planes. Radiomics features of different planes could therefore reflect richer radiomics information for radiologists. The aims of this study were (1) to quantitatively analyze multi-view whole-lung radiomics in differentiating COVID-19 from other types of pneumonia by using machine learning methods, (2) to discover new radiomics biomarkers correlated with COVID-19, and (3) to assess the predictive ability of radiomics models before diagnosis and individual treatment.
Materials And Methods
... means and gray-level co-occurrence matrix (Fig. 1). The ROI segmentation method is explained in detail in section 2 of Appendix A1 and was performed in MATLAB (version 2020a; The MathWorks Inc., Natick, MA, USA). The two radiologists assessed the segmentation results and reached a consensus by discussion. A total of 210 radiomics features were extracted separately from the ROI of the transverse plane, coronal plane and sagittal plane. These features were divided into 10 categories: (a) Hu invariant moments (HU), (b) gray-level co-occurrence matrix (GLCM), (c) local entropy, (d) statistics features including shape, size, extremes and so on, (e) gray-level run-length matrix (GLRLM), (f) entropy, (g) Gabor wavelet filter (Gabor), (h) Hessian matrix (Hessian), (i) histogram of oriented gradients (HOG), and (j) phase congruency (Phase). The feature extraction process is described in detail in section 3 of Appendix A1 and was also performed in MATLAB.
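As a minimal sketch of the multi-view idea (not the MATLAB pipeline of Appendix A1), the Python fragment below takes the three orthogonal slices through one voxel of a toy volume and computes a gray-level co-occurrence matrix on the transverse slice. The `volume[z][y][x]` layout, the four gray levels, the horizontal pixel offset and all function names are assumptions made for illustration only.

```python
from math import log2


def plane_slices(volume, z, y, x):
    """Return the transverse, coronal and sagittal slices through voxel (z, y, x).

    Assumes the volume is a nested list indexed as volume[z][y][x].
    """
    transverse = [row[:] for row in volume[z]]                  # fixed z
    coronal = [plane[y][:] for plane in volume]                 # fixed y
    sagittal = [[row[x] for row in plane] for plane in volume]  # fixed x
    return transverse, coronal, sagittal


def glcm(image, levels, dy=0, dx=1):
    """Normalised, symmetric co-occurrence matrix for the pixel offset (dy, dx)."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pair in both directions
    total = sum(map(sum, counts))
    if total == 0:
        raise ValueError("no pixel pairs exist for this offset")
    return [[v / total for v in row] for row in counts]


def glcm_contrast(p):
    """Contrast: co-occurrence probabilities weighted by squared level difference."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


def glcm_entropy(p):
    """Shannon entropy (in bits) of the co-occurrence probabilities."""
    return -sum(v * log2(v) for row in p for v in row if v > 0)


# Toy volume with 2 slices of 3 x 3 pixels and gray levels 0..3 (hypothetical data).
volume = [
    [[0, 0, 1], [0, 1, 1], [2, 2, 3]],
    [[0, 1, 1], [1, 1, 2], [2, 3, 3]],
]
transverse, coronal, sagittal = plane_slices(volume, z=0, y=1, x=2)
p = glcm(transverse, levels=4)
print(glcm_contrast(p), glcm_entropy(p))
```

The same two texture functions can be applied unchanged to the coronal and sagittal slices, which is the sense in which one feature definition yields three view-specific values.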

Radiomics signature construction
Because the data were high-dimensional, the least absolute shrinkage and selection operator (LASSO) method was used to reduce the dimension; this method is commonly used to select optimal features in other radiomics studies of pneumonia [18][19]. The optimal features were selected from the radiomics features of the training cohort for each of the three planes. The "glmnet" package in R (v3.5.3; R Development Core Team) was used for LASSO logistic regression.
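The selection step can be illustrated with a small, self-contained sketch of L1-penalised logistic regression solved by proximal gradient descent. It is not the glmnet implementation used in the study (glmnet uses coordinate descent); the toy data, the penalty `lam`, the step size and the function names are illustrative assumptions. Features whose coefficients are shrunk exactly to zero are dropped.

```python
from math import exp


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Proximal operator of the L1 penalty: shrink v towards zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def lasso_logistic(X, y, lam, step=0.5, iters=500):
    """Minimise mean log-loss + lam * ||w||_1; the intercept b is not penalised."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        preds = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) for row in X]
        errs = [pi - yi for pi, yi in zip(preds, y)]
        grad_b = sum(errs) / n
        grad_w = [sum(e * row[j] for e, row in zip(errs, X)) / n for j in range(p)]
        b -= step * grad_b
        # Gradient step followed by soft-thresholding (ISTA update).
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad_w)]
    return w, b


# Hypothetical toy data: feature 0 separates the classes, while features 1 and 2
# are constructed to carry no class information.
X = [
    [-2.0, 1.0, 1.0], [-2.0, -1.0, -1.0], [-1.0, 1.0, -1.0], [-1.0, -1.0, 1.0],
    [1.0, 1.0, -1.0], [1.0, -1.0, 1.0], [2.0, 1.0, 1.0], [2.0, -1.0, -1.0],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

weights, intercept = lasso_logistic(X, y, lam=0.05)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
print("selected feature indices:", selected)
```

In this toy setting only the informative feature keeps a non-zero coefficient; raising `lam` far enough would eventually remove it as well, which is why the penalty strength has to be tuned rather than fixed in advance.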

Construction of multiple radiomics predictive models
We validated and evaluated the performance of the radiomics biomarkers on the validation and test cohorts from three different views. In the training cohort, the optimal radiomics features were used to construct separate radiomics models with the support vector machine (SVM), particle swarm optimization-deep extreme learning machine (PSO-DELM) [20] and back propagation neural network (BP). To investigate the role of the different categories of radiomics features, we also constructed category-specific radiomics models and analyzed the predictive performance of these features in the three planes. The parameter settings were tuned on the validation cohort and are presented in section 4 of Appendix A1. Predictive accuracy, sensitivity, specificity and the area under the receiver operating characteristic curve (AUC) were used to evaluate the performance of the radiomics models. All processing was executed in MATLAB 2020a.
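How particle swarm optimization is coupled to the DELM is specified in [20] and Appendix A1 rather than in this section. As a rough sketch of the optimization half only, the Python fragment below runs a basic particle swarm over two parameters. The objective `surrogate_validation_error` is a hypothetical stand-in for "train a model with these parameters and return its validation error", and the inertia and acceleration constants are common textbook choices, not the study's settings.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic particle swarm optimiser; returns (best_position, best_value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # stay in bounds
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def surrogate_validation_error(params):
    """Hypothetical smooth error surface with its minimum at (1, -2)."""
    a, b = params
    return (a - 1.0) ** 2 + (b + 2.0) ** 2


best, best_val = pso_minimize(surrogate_validation_error, [(-5.0, 5.0), (-5.0, 5.0)])
print(best, best_val)
```

In a real pipeline the objective would be evaluated on the validation cohort, so each call is far more expensive than in this toy surface and the swarm size and iteration count become the main cost drivers.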

Statistical analysis
Statistical analysis was implemented using R (version 3.5.3). Continuous variables were expressed as mean ± standard deviation or as median and interquartile range (IQR) according to the results of the normality test, whereas categorical variables were expressed as absolute number and percentage. The chi-square test was used to compare differences in patients' clinical factors (categorical variables) between the two cohorts. Independent-sample t-tests or Wilcoxon tests were used to compare patients' clinical factors (continuous variables) and the various radiomics features between the two groups. A two-sided P value < 0.05 was considered statistically significant.
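For a 2 × 2 table, the chi-square comparison can be written out by hand. The following is a minimal sketch that computes the uncorrected Pearson statistic and its P value for one degree of freedom; the counts are hypothetical, and R's `chisq.test` applies Yates' continuity correction to 2 × 2 tables by default, so its output can differ from this sketch.

```python
from math import erfc, sqrt


def chi_square_2x2(table):
    """Uncorrected Pearson chi-square test for a 2 x 2 table of counts.

    Returns (statistic, p_value) with one degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # For 1 degree of freedom, P(chi2 > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)).
    p_value = erfc(sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts: rows = cohort (training, test), columns = pneumonia type.
stat, p_value = chi_square_2x2([[20, 5], [5, 20]])
print(stat, p_value, p_value < 0.05)
```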

Clinical characteristics of the patients
The characteristics of the training and validation cohorts are shown in Table 1.

Predictive Ability Of Radiomics Models
The optimal features were used to build the radiomics model for distinguishing COVID-19. We built and evaluated three radiomics models to select the best one; their performance, described by accuracy, sensitivity, specificity and AUC, is shown in Table 3. To investigate the different categories of radiomics features on the different planes, we used PSO-DELM models to construct and validate the predictive ability of each feature category, as shown in Table 4 and Fig. 3.
Note. AUC (area under the receiver operating characteristic curve) values were calculated by applying the confusion matrix. Models: SVM = support vector machine, BP = back propagation-neural network, PSO-DELM = particle swarm optimization-deep extreme learning machine.
Note. HU = Hu invariant moment; GLCM = gray-level co-occurrence matrix; Statistics = statistics features including shape, size, extremes and so on; GLRLM = gray-level run-length matrix; Gabor = Gabor wavelet filter; Hessian = Hessian matrix; Locent = local entropy; HOG = histogram of oriented gradients; Phase = phase congruency. Acc = accuracy; Sen = sensitivity; Spe = specificity.
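The four reported metrics can all be derived from a confusion matrix. The sketch below assumes one reading of the table note, namely that the AUC is taken at a single operating point, where the ROC curve through (0, 0), (1 − specificity, sensitivity) and (1, 1) has area (sensitivity + specificity) / 2. The labels are hypothetical and are not the study's test cohort.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fn, tn, fp) for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp, fn, tn, fp


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity and single-threshold AUC."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": sensitivity,
        "specificity": specificity,
        # Area under the two-segment ROC curve defined by one operating point.
        "auc": (sensitivity + specificity) / 2.0,
    }


# Hypothetical cohort: 9 positive and 8 negative cases, one positive case missed.
y_true = [1] * 9 + [0] * 8
y_pred = [1] * 8 + [0] + [0] * 8
metrics = classification_metrics(y_true, y_pred)
print({name: round(value, 4) for name, value in metrics.items()})
```

With these counts the sketch gives accuracy 16/17 ≈ 0.9412, sensitivity 8/9 ≈ 0.8889, specificity 1 and AUC ≈ 0.9444. A threshold-free AUC computed from continuous model scores would generally differ from this single-point value.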

Discussion
In our study, we quantitatively analyzed multi-view whole-lung radiomics to distinguish COVID-19 from other types of pneumonia by using machine learning methods. The radiomics model based on PSO-DELM could effectively differentiate COVID-19 from other types of pneumonia before diagnosis. Phase congruency features were used as radiomics features for the first time and may be considered as radiomics biomarkers for predicting COVID-19.
Previous studies showed that COVID-19 is significantly associated with CT imaging characteristics such as bilateral lung involvement, peripheral or diffuse distribution, ground-glass opacity, maximum lesion range, number of lesions, lobe involvement, and hilar and mediastinal lymph node enlargement [21][22]. Almost all of these CT characteristics (except mediastinal lymph node enlargement) are visible in the lung window, so the whole lung viewed in different planes could present richer radiomics information on COVID-19 to radiologists. Most current studies have focused on the radiomics features of the lesions, which were segmented manually or by deep learning approaches [17,21]. Manual segmentation is costly for research and diagnosis when clinical resources are scarce. Deep learning methods, in turn, need a large number of labels, and the precision of their segmentation needs to be improved (DSI: 0.778) [21]. In our study, we segmented the lung in an unsupervised manner, without manual operation or imaging software; the segmentation performance was as good as that of the classical region growing method (DSI: 0.9230 vs 0.9092), as detailed in Appendix A1. Fang et al. [17] segmented the lesion area using imaging software, and the AUC of their SVM-based radiomics model for differentiating COVID-19 was 0.826. In our study, the radiomics model based on PSO-DELM (AUC 0.9444) performed better than the radiomics model based on SVM (AUC 0.5556). For the transverse and sagittal planes, the sensitivity of the PSO-DELM radiomics model was higher than 0.8889, which suggests that whole-lung radiomics features can accurately assess COVID-19.
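Assuming that DSI denotes the Dice similarity index, the overlap between an automatic segmentation and a reference mask can be sketched as follows. The two small binary masks are hypothetical and serve only to make the arithmetic visible.

```python
def dice_index(mask_a, mask_b):
    """Dice similarity index 2|A ∩ B| / (|A| + |B|) for two binary 2D masks."""
    flat_a = [bool(v) for row in mask_a for v in row]
    flat_b = [bool(v) for row in mask_b for v in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for a, b in zip(flat_a, flat_b) if a and b)
    size_sum = sum(flat_a) + sum(flat_b)
    # Two empty masks agree perfectly by convention.
    return 1.0 if size_sum == 0 else 2.0 * intersection / size_sum


# Hypothetical masks: automatic segmentation versus a reference delineation.
automatic = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]
reference = [[0, 1, 1], [0, 1, 0], [0, 0, 0]]
print(dice_index(automatic, reference))
```

Here the masks share 3 pixels out of 4 and 3 respectively, so the index is 6/7; a value of 1 means identical masks and 0 means no overlap.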
Non-invasive disease diagnosis can be realized with imaging radiomics, but the 2D image features studied previously were extracted only from the transverse plane [23]. 2D radiomics features extracted from the transverse plane, or 3D radiomics features extracted from the voxels of lesions, could not describe the radiomics features of COVID-19 comprehensively, whereas the sagittal, coronal and transverse planes together could contain the radiomics information of the lesions more completely (including shape, size and location). In this study, we found that Hessian matrix features were captured on all three planes, which indicates that these features can represent gray-level gradient changes from different views. Table 4 shows that the prediction accuracy of these radiomics features reached 0.9412 on all three planes. Entropy and HOG features, which describe shape and texture, were captured on the transverse and sagittal planes, but entropy features performed better than HOG features on the transverse plane (accuracy 0.9412 vs 0.8824).
GLCM features and phase congruency features were both captured on the transverse and coronal planes, but our study showed that phase congruency features predicted better than GLCM features on the different planes, as shown in Table 4 and Fig. 3. To our knowledge, few studies have reported the use of phase congruency features in radiomics. Phase congruency is invariant to image contrast and can extract effective and reliable texture features under different illumination conditions [24]. It responds to features at any phase angle, unlike gradient-based feature detectors, which can only extract step features. It has usually been used in palm print authentication, face representation, image segmentation, and so on [24][25][26][27][28]. In our study, the phase congruency feature performed favorably in differentiating COVID-19 from other types of pneumonia on the three planes (all AUC = 0.9444). It may be considered as a diagnostic biomarker for other lung diseases.
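The contrast invariance mentioned above can be checked on a one-dimensional toy signal. The sketch below uses the basic definition of phase congruency as local energy divided by the sum of component amplitudes, with plain Fourier components instead of the band-pass filter banks used in practical 2D implementations. It is a simplified illustration under those assumptions, not the feature extraction of Appendix A1.

```python
import cmath
from math import pi


def phase_congruency_1d(signal, eps=1e-12):
    """Phase congruency |sum_k a_k(x)| / sum_k |a_k(x)| at every sample x.

    a_k(x) is the complex Fourier component of frequency k evaluated at x;
    the DC term is excluded so that a constant offset does not contribute.
    """
    n = len(signal)
    freqs = range(1, (n + 1) // 2)  # positive frequencies below Nyquist
    coeffs = [
        sum(s * cmath.exp(-2j * pi * k * x / n) for x, s in enumerate(signal))
        for k in freqs
    ]
    amplitude_sum = sum(abs(c) for c in coeffs)
    values = []
    for x in range(n):
        local_energy = abs(
            sum(c * cmath.exp(2j * pi * k * x / n) for k, c in zip(freqs, coeffs))
        )
        values.append(local_energy / (amplitude_sum + eps))
    return values


# Square wave with edges next to samples 0 and 32 (hypothetical test signal).
square = [1.0] * 32 + [-1.0] * 32
pc = phase_congruency_1d(square)
pc_high_contrast = phase_congruency_1d([3.0 * v for v in square])

print(pc[0], pc[16])  # near an edge versus the middle of a flat region
print(max(abs(a - b) for a, b in zip(pc, pc_high_contrast)))
```

Scaling the signal multiplies the local energy and the amplitude sum by the same factor, so the ratio is unchanged; the measure is high where the components are in phase (next to the edges) and low in the flat regions.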
In this study, we constructed three radiomics models to identify COVID-19. In the independent test cohort, the prediction accuracy, sensitivity, specificity and ...
There were several limitations to this study. First, it was a retrospective study using a small sample from three hospitals with no external test data, so the small number of COVID-19 cases may lead to overfitting and limit the robustness of the predictions. Second, we did not segment the lesions in the lung parenchyma, because few accurate automatic segmentation technologies are available so far; the extracted radiomics features may therefore not express all of the radiomics information. Third, phase congruency features were used as radiomics features for the first time, and there is as yet no further evidence of their usefulness in other diseases. In addition, only 2D radiomics features were used in our study; 3D radiomics features could improve the predictive ability of our models.

Conclusions
In conclusion, our study showed that multi-view whole-lung radiomics features could effectively predict COVID-19 by using machine learning methods. Phase congruency features were proposed for the first time to differentiate COVID-19 from other types of pneumonia. The radiomics models based on PSO-DELM showed favorable performance in predicting COVID-19. We hope that our first trial of new radiomics features will help to diversify diagnostic biomarkers, and that employing deep learning to analyze the radiomics features of the lung can assist clinicians in the diagnosis of lung diseases.

Consent for publication
Not applicable.

Availability of data and materials
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Competing interests
The authors have no conflicts of interest to declare.