Dataset
Developing any diagnostic tool requires enough data to train and refine it. To overcome the scarcity of X-ray images of COVID-19 patients, we collected images from three open sources, which gave us enough data to train and test the proposed network. The dataset consists of human chest X-ray images taken with widely available X-ray machines. Class imbalance is one of the challenges in training such networks, so we took class balance into account when preparing the dataset images. In total, we used 1828 chest X-ray images. The first of the three sources was the dataset of Dr. Cohen, who collected images from public sources without violating patient privacy.
We extracted 241 chest X-ray images of COVID-19 patients from Dr. Cohen's dataset [30]. These images were acquired in different views: posteroanterior (PA), anteroposterior (AP), anteroposterior with the patient lying down (AP Supine), and lateral (L). The second source is the Kaggle platform, which provided 79 chest X-ray images of COVID-19 patients [31]. Ten images appeared in both sources, so the duplicates were deleted, bringing the total number of COVID-19 chest X-ray images to 310. The third source, also from Kaggle, contains a broad set of chest X-ray images of patients with pneumonia and of healthy people [32]. From it we took 864 chest X-ray images of pneumonia patients, comprising 467 images of bacterial pneumonia and 397 images of viral pneumonia, as well as 654 chest X-ray images of healthy people. We divided the dataset into a training set and a testing set, using 27% of the images to train the proposed model and 73% to test it, as shown in Table 1.
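The image counts above can be checked with a few lines of arithmetic. The sketch below only re-adds the figures quoted in this section (the constant and function names are ours, not the authors'); it assumes, as the text states, that the ten duplicates are COVID-19 images shared by the Cohen and Kaggle sources.

```python
# Per-source image counts as quoted in the text (names are illustrative).
COHEN_COVID = 241    # Dr. Cohen's dataset [30]
KAGGLE_COVID = 79    # Kaggle COVID-19 images [31]
DUPLICATES = 10      # images present in both COVID-19 sources
BACTERIAL = 467      # Kaggle pneumonia dataset [32], bacterial pneumonia
VIRAL = 397          # same dataset, viral pneumonia
NORMAL = 654         # same dataset, healthy people


def dataset_composition():
    """Return per-class image counts after removing cross-source duplicates."""
    covid = COHEN_COVID + KAGGLE_COVID - DUPLICATES
    pneumonia = BACTERIAL + VIRAL
    return {"COVID-19": covid, "Pneumonia": pneumonia, "Normal": NORMAL}


if __name__ == "__main__":
    counts = dataset_composition()
    print(counts, "total =", sum(counts.values()))
```

The totals agree with the text: 310 COVID-19, 864 pneumonia, and 654 normal images, 1828 in all.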
Table 1 Database details
| | COVID-19 | Pneumonia (viral + bacterial) | Normal |
|---|---|---|---|
| Train | 84 | 233 | 176 |
| Test | 226 | 631 | 478 |
| Train + Test | 310 | 864 | 654 |
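The paper does not say how individual images were assigned to the training and test sets. One common way to obtain per-class counts like those in Table 1 is a stratified split: shuffle each class separately and take a fixed number of images from each for training. The sketch below assumes exactly that; `stratified_split` and the placeholder file names are hypothetical, and the per-class training counts are taken directly from Table 1.

```python
import random

# Per-class (train, test) counts copied from Table 1.
TABLE_1 = {
    "COVID-19": (84, 226),
    "Pneumonia": (233, 631),
    "Normal": (176, 478),
}


def stratified_split(items_by_class, n_train_by_class, seed=0):
    """Shuffle each class independently; the first n_train items go to training.

    Hypothetical helper: the paper does not describe its splitting procedure.
    Returns two lists of (item, label) pairs.
    """
    rng = random.Random(seed)
    train, test = [], []
    for label, items in items_by_class.items():
        items = list(items)  # copy so the caller's list is not reordered
        rng.shuffle(items)
        n_train = n_train_by_class[label]
        train += [(x, label) for x in items[:n_train]]
        test += [(x, label) for x in items[n_train:]]
    return train, test


if __name__ == "__main__":
    # Placeholder file names standing in for the real images.
    items = {label: [f"{label}_{i:04d}.png" for i in range(tr + te)]
             for label, (tr, te) in TABLE_1.items()}
    n_train = {label: tr for label, (tr, _) in TABLE_1.items()}
    train, test = stratified_split(items, n_train, seed=42)
    print(len(train), len(test), round(len(train) / (len(train) + len(test)), 4))
    for label, (tr, te) in TABLE_1.items():
        print(label, round(tr / (tr + te), 3))
```

With the Table 1 counts this yields 493 training and 1335 test images. Every class keeps roughly 27% of its images for training (84/310 ≈ 0.271, 233/864 ≈ 0.270, 176/654 ≈ 0.269), which suggests, but does not prove, that the original split was stratified by class.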
Experiments
In this study, we performed experiments to diagnose and classify COVID-19 from chest X-ray images. The experiments used two versions of the dataset: one with two categories (COVID-19 and Normal) and one with three categories (COVID-19, Normal, and Pneumonia). In both cases, the data were divided into 27% training data and 73% test data. To evaluate the efficiency and stability of the proposed model, each experiment was repeated five times on both versions of the data. The model was trained with stochastic gradient descent (SGD) using a learning rate of 0.001, momentum of 0.9, a batch size of 32, and 30 epochs.
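The optimizer settings can be illustrated with a minimal sketch. It assumes the update rule documented for Keras' `tf.keras.optimizers.SGD` with `nesterov=False` (velocity = momentum × velocity − learning_rate × gradient, then weight += velocity), applied here to a toy one-dimensional loss rather than the actual network. It also counts mini-batches per epoch from the Table 1 training sizes; that the two-class experiment trains only on the COVID-19 and Normal training images (84 + 176 = 260) is our assumption. In Keras itself, the stated configuration would presumably correspond to `SGD(learning_rate=0.001, momentum=0.9)` with `model.fit(..., batch_size=32, epochs=30)`.

```python
import math

LEARNING_RATE = 0.001
MOMENTUM = 0.9
BATCH_SIZE = 32
EPOCHS = 30


def sgd_momentum_step(w, v, grad, lr=LEARNING_RATE, momentum=MOMENTUM):
    """One SGD-with-momentum update (classical momentum, no Nesterov term)."""
    v = momentum * v - lr * grad
    return w + v, v


def steps_per_epoch(n_train, batch_size=BATCH_SIZE):
    """Number of mini-batches per epoch; the last batch may be smaller."""
    return math.ceil(n_train / batch_size)


if __name__ == "__main__":
    # Toy objective f(w) = w**2 with gradient 2*w, only to show the update rule.
    w, v = 1.0, 0.0
    for _ in range(100):
        w, v = sgd_momentum_step(w, v, grad=2 * w)
    print(f"w after 100 steps: {w:.4f}")

    # Training-set sizes derived from Table 1 (two-class size is an assumption).
    for name, n in [("3-class", 84 + 233 + 176), ("2-class", 84 + 176)]:
        spe = steps_per_epoch(n)
        print(name, n, "images:", spe, "steps/epoch,", spe * EPOCHS, "steps total")
```

With momentum 0.9 the toy weight falls to roughly 0.1 after 100 steps, whereas plain SGD at the same learning rate would leave it near 0.82. The three-class run needs 16 mini-batches per epoch (the last holding 13 images) and the two-class run 9.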
This study was implemented in Python using the Keras package with a TensorFlow backend, on an Intel(R) Core(TM) i7-5700HQ CPU @ 2.70 GHz (8 CPUs). The experiments also used an NVIDIA GTX 970M GPU (8 GB) and 16 GB of RAM.
Fig. 1 shows the classification loss and accuracy curves for the training and testing stages.
Fig. 1 shows that the training loss decreases rapidly: it falls to approximately 0.1 within the first five epochs and keeps decreasing until it reaches nearly zero after 25 epochs. The test loss decreases less steeply, which is expected because the test data were not seen by the proposed model during training. The accuracy curves show only a slight difference between training and testing accuracy, which indicates that the proposed CCBlock model generalizes well.
To evaluate the proposed CCBlock model, a confusion matrix was computed for each run, as shown in Fig. 2 and Fig. 3. The results show that the proposed model diagnoses COVID-19 efficiently and stably against the Normal and Pneumonia categories, with an average accuracy of 98.52% in the two-category setting and 95.34% in the three-category setting. The sensitivity, specificity, and accuracy of each of the five runs are listed in Table 2 for three categories and in Table 3 for two categories.
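The section reports sensitivity, specificity, and accuracy without defining them. The standard definitions in terms of true/false positives and negatives ($TP$, $FP$, $TN$, $FN$) are given below. Treating COVID-19 as the positive class and all other categories as negative is our assumption; it is consistent with Table 3, where every run's accuracy lies between its sensitivity and specificity. For a multi-class confusion matrix $C$ (rows: true class, columns: predicted class), overall accuracy is the trace divided by the total count, which would explain why the three-class accuracy in Table 2 is lower than both COVID-19 metrics: it also counts confusions between Normal and Pneumonia.

```latex
\begin{aligned}
\text{Sensitivity} &= \frac{TP}{TP + FN}, \qquad
\text{Specificity} = \frac{TN}{TN + FP}, \\[4pt]
\text{Accuracy}_{\text{binary}} &= \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\text{Accuracy}_{\text{multi-class}} = \frac{\sum_{k} C_{kk}}{\sum_{i,j} C_{ij}}.
\end{aligned}
```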
Table 2 Sensitivity, Specificity, and accuracy for three categories
| Run | Sensitivity (%) | Specificity (%) | Accuracy (%) |
|---|---|---|---|
| Run 1 | 98.21 | 98.94 | 95.21 |
| Run 2 | 99.10 | 98.72 | 95.43 |
| Run 3 | 99.10 | 99.15 | 95.43 |
| Run 4 | 96.85 | 99.36 | 95.13 |
| Run 5 | 99.10 | 98.72 | 95.51 |
| Average | 98.47 | 98.98 | 95.34 |
Table 3 Sensitivity, Specificity, and accuracy for two categories
| Run | Sensitivity (%) | Specificity (%) | Accuracy (%) |
|---|---|---|---|
| Run 1 | 98.67 | 98.54 | 98.58 |
| Run 2 | 98.23 | 98.54 | 98.44 |
| Run 3 | 98.67 | 97.70 | 98.01 |
| Run 4 | 98.66 | 98.95 | 98.86 |
| Run 5 | 98.67 | 98.74 | 98.72 |
| Average | 98.58 | 98.49 | 98.52 |
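The averages in Tables 2 and 3 are plain means over the five runs. The sketch below recomputes them from the per-run values and also reports the sample standard deviation, one simple way to quantify the run-to-run stability mentioned in the text; the paper itself reports only the averages, and the names `summarize`, `THREE_CLASS`, and `TWO_CLASS` are ours.

```python
from statistics import mean, stdev

# (sensitivity, specificity, accuracy) per run, copied from Tables 2 and 3.
THREE_CLASS = [
    (98.21, 98.94, 95.21),
    (99.10, 98.72, 95.43),
    (99.10, 99.15, 95.43),
    (96.85, 99.36, 95.13),
    (99.10, 98.72, 95.51),
]
TWO_CLASS = [
    (98.67, 98.54, 98.58),
    (98.23, 98.54, 98.44),
    (98.67, 97.70, 98.01),
    (98.66, 98.95, 98.86),
    (98.67, 98.74, 98.72),
]
METRICS = ("Sensitivity", "Specificity", "Accuracy")


def summarize(runs):
    """Return {metric: (mean, sample standard deviation)} across runs."""
    columns = zip(*runs)  # transpose rows (runs) into columns (metrics)
    return {name: (mean(col), stdev(col)) for name, col in zip(METRICS, columns)}


if __name__ == "__main__":
    for title, runs in [("Three classes", THREE_CLASS), ("Two classes", TWO_CLASS)]:
        print(title)
        for name, (m, s) in summarize(runs).items():
            print(f"  {name}: {m:.2f} +/- {s:.2f}")
```

Rounded to two decimals, the recomputed means match the "Average" rows of both tables.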
Table 2 shows that the proposed model is effective at diagnosing COVID-19 and distinguishing it from the Normal and Pneumonia classes. The highest accuracy obtained was 95.51%; however, we report the average over the five runs, 95.34%, as the accuracy of the model. To further test the efficacy of the proposed model in diagnosing and classifying COVID-19, we evaluated CCBlock on the second dataset, which contains X-ray images of people with COVID-19 and of uninfected people. Here the highest accuracy was 98.86%; again taking the average over the five runs as the accuracy of the proposed model, we obtained an average accuracy of 98.52% (Table 3).
In machine learning, and in classification problems in particular, a confusion matrix gives a more precise view of an algorithm's performance, because it shows, for each category, how many samples were confused with each of the other categories. The main diagonal of the matrix holds the samples that were classified correctly, while the off-diagonal elements count the samples that were misclassified as another category. In Fig. 3, we computed the confusion matrix for each run on the first dataset, which includes the three categories (COVID-19, Normal, and Pneumonia). Fig. 3 shows that the proposed model is highly capable of distinguishing COVID-19 from the other categories; the highest per-class accuracy, 98%, was recorded for COVID-19. The Train Run-1 matrix shows the classification results of the proposed model on the training data for the three categories in the first run.
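To make the diagonal/off-diagonal reading concrete, here is a minimal pure-Python sketch that builds a confusion matrix from true and predicted labels and derives per-class accuracy, overall accuracy, and one-vs-rest sensitivity and specificity. The class order, the toy labels, and all helper names are hypothetical; the paper does not publish its predictions. We also assume that the per-class "accuracy" read off a confusion-matrix figure is the diagonal entry divided by its row total (per-class recall).

```python
LABELS = ("COVID-19", "Normal", "Pneumonia")


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Count matrix: rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_accuracy(matrix):
    """Diagonal over row sum: share of each true class classified correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


def overall_accuracy(matrix):
    """Correct predictions (the diagonal) over all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)


def one_vs_rest(matrix, k):
    """Sensitivity and specificity of class k, all other classes counted as negative."""
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                        # class k predicted as something else
    fp = sum(row[k] for row in matrix) - tp         # other classes predicted as k
    tn = sum(sum(row) for row in matrix) - tp - fn - fp
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Toy labels only; these are NOT the paper's predictions.
    y_true = ["COVID-19", "COVID-19", "COVID-19", "Normal", "Normal", "Pneumonia", "Pneumonia"]
    y_pred = ["COVID-19", "COVID-19", "Pneumonia", "Normal", "COVID-19", "Pneumonia", "Normal"]
    cm = confusion_matrix(y_true, y_pred)
    for label, row in zip(LABELS, cm):
        print(f"{label:>9}", row)
    print("per-class:", per_class_accuracy(cm))
    print("overall:", overall_accuracy(cm))
    print("COVID-19 sensitivity/specificity:", one_vs_rest(cm, 0))
```

In this toy example the COVID-19 row is `[2, 0, 1]`, so two of three COVID-19 images sit on the diagonal and one is misclassified as Pneumonia; the Normal image predicted as COVID-19 is the single false positive for that class.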
Fig. 3 shows the confusion matrices of the tests performed on the second dataset, which includes two categories (COVID-19 and Normal). Because this study focuses on COVID-19, it is important to test how well the proposed CCBlock model distinguishes COVID-19 patients from uninfected people. The confusion matrices show that the proposed model achieved a COVID-19 diagnostic accuracy of 99.55%. The Train Run-1 matrix shows the classification results of the proposed model on the training data for the two categories in the first run.
Previous studies have shown that deep neural networks can diagnose COVID-19 and distinguish it from other categories. However, the proposed CCBlock model outperformed those studies in both the two-category and three-category settings, achieving higher accuracy than previously reported, as shown in Table 4.
Table 4 Comparison between the proposed model and previous studies
| Study | Type of Images | Number of Cases | Method Used | Accuracy 2-classes (%) | Accuracy 3-classes (%) |
|---|---|---|---|---|---|
| Ioannis et al. [18] | Chest X-ray | 224 COVID-19(+), 700 Pneumonia, 504 Healthy | VGG-19 | - | 93.48 |
| Tulin et al. [19] | Chest X-ray | 125 COVID-19(+), 500 No-Findings; 125 COVID-19(+), 500 Pneumonia, 500 No-Findings | DarkCovidNet | 98.08 | 87.02 |
| Wang and Wong [20] | Chest X-ray | 53 COVID-19(+), 5526 COVID-19(-), 8066 Healthy | COVID-Net | - | 92.4 |
| Hemdan et al. [21] | Chest X-ray | 25 COVID-19(+), 25 Normal | COVIDX-Net | 90.0 | - |
| Narin et al. [22] | Chest X-ray | 50 COVID-19(+), 50 COVID-19(-) | Deep CNN ResNet-50 | 98 | - |
| Sethy and Behra [23] | Chest X-ray | 25 COVID-19(+), 25 COVID-19(-) | ResNet50 + SVM | 95.38 | - |
| Zheng et al. [26] | Chest CT | 313 COVID-19(+), 229 COVID-19(-) | UNet + 3D Deep Network | 90.8 | - |
| Wang et al. [27] | Chest CT | 195 COVID-19(+), 258 COVID-19(-) | M-Inception | 82.9 | - |
| Xu et al. [28] | Chest CT | 219 COVID-19(+), 224 Viral pneumonia, 175 Healthy | ResNet + Location Attention | - | 86.7 |
| Ying et al. [29] | Chest CT | 777 COVID-19(+), 708 Healthy | DRE-Net | 86 | - |
| Proposed study (CCBlock) | Chest X-ray | 310 COVID-19(+), 654 Healthy, 864 Pneumonia (viral & bacterial) | VGG-16 + CCBlock | 98.86 | 95.51 |