This section describes the materials and methods used in this study. Figure 2 shows the block diagram of the proposed methodology.
5.1 Dataset
The original dataset, published by Kermany et al. [17], consists of three main folders (training, testing, and validation), each containing two subfolders: one with pneumonia chest x-ray images and one with normal chest x-ray images. A total of 5,852 anterior-posterior chest x-ray images were carefully selected from retrospective cohorts of pediatric patients between 1 and 5 years old [15]. The file names of all pneumonia chest x-ray images include the label bacteria or virus, and these labels were used to split the pneumonia folder into two subfolders: viral pneumonia and bacterial pneumonia. Because the original validation and testing sets were small, and in order to balance the proportion of data assigned to each set, the original data categories were merged and the entire dataset was re-partitioned into training, validation, and testing sets in proportions of 70%, 15%, and 15%, respectively. A total of 4,097 images were allocated to the training set, 877 images to the validation set to improve the validation accuracy, and 878 images to the testing set to test the system during the K-fold process.
Table 1 The distribution of images used in the system.

| Case | Number of Training Images | Number of Validation Images | Number of Testing Images | Total Number of Images |
|---|---|---|---|---|
| Normal | 1,107 | 237 | 237 | 1,581 |
| Pneumonia Bacterial | 1,945 | 416 | 417 | 2,778 |
| Pneumonia Viral | 1,045 | 224 | 224 | 1,493 |
| Total | 4,097 | 877 | 878 | 5,852 |
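The 70%/15%/15% re-partitioning described above can be illustrated with a minimal sketch of a class-wise (stratified) split. The function name `stratified_split`, the fixed seed, and the rounding rule are assumptions made for illustration; the paper does not state how the split was implemented, so per-class counts may differ from Table 1 by an image or so.

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Split sample indices into train/validation/test sets, class by class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # ASSUMPTION: sizes are rounded per class; the remainder goes to test.
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]
    return train, val, test

# Class sizes taken from the total column of Table 1.
labels = ["normal"] * 1581 + ["bacterial"] * 2778 + ["viral"] * 1493
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Splitting each class separately keeps the class proportions roughly equal across the three sets, which is one plausible way to obtain a distribution like the one in Table 1.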
5.2 CNN Architecture
The modified CNN model used in this work was initially proposed by Alqudah [18]. Any CNN model consists of two major stages: the feature extraction stage and the classification stage. Each stage contains a set of layers. Each feature extraction layer takes the previous layer's output as its input and passes its own output to the next layer, while the classification stage layers are placed at the end of the CNN model [9, 10, 18]. Like any classifier, the classifier layer requires individual feature vectors as input to perform its computations. Figure 3 shows the modified model used in this work, and Table 2 lists its layer details.
Table 2 Layers Information for Proposed CNN Architecture.

| # | Layer | Information | # | Layer | Information |
|---|---|---|---|---|---|
| 1 | Input Layer | Size: 64 × 64 | 9 | Maxpol_2 | Kernel Size: 2 × 2; Stride: 2 × 2 |
| 2 | Conv_1 | Number of Filters: 32; Kernel Size: 3 × 3; Activation: RELU | 10 | Conv_3 | Number of Filters: 32; Kernel Size: 3 × 3; Activation: RELU |
| 3 | Batch_Norm_1 | Number of Channels: 32 | 11 | Batch_Norm_3 | Number of Channels: 32 |
| 5 | Maxpol_1 | Kernel Size: 2 × 2; Stride: 2 × 2 | 13 | Maxpol_3 | Kernel Size: 2 × 2; Stride: 2 × 2 |
| 6 | Conv_2 | Number of Filters: 16; Kernel Size: 3 × 3; Activation: RELU | 14 | Conv_4 | Number of Filters: 32; Kernel Size: 3 × 3; Activation: RELU |
| 7 | Batch_Norm_2 | Number of Channels: 16 | 15 | Batch_Norm_4 | Number of Channels: 32 |
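To make the effect of the layers in Table 2 on the spatial resolution concrete, the following sketch traces the feature-map size from the 64 × 64 input through the listed convolution and pooling layers. It assumes that the 3 × 3 convolutions use stride 1 and padding 1, which Table 2 does not state; batch normalization layers are omitted because they do not change the spatial size. The helper names are hypothetical.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer along one axis."""
    return (size + 2 * padding - kernel) // stride + 1

# (name, kernel, stride, padding) for the layers listed in Table 2.
# ASSUMPTION: 3 x 3 convolutions use stride 1 and padding 1; the table
# gives kernel size and stride only for the pooling layers.
layers = [
    ("Conv_1", 3, 1, 1),
    ("Maxpol_1", 2, 2, 0),
    ("Conv_2", 3, 1, 1),
    ("Maxpol_2", 2, 2, 0),
    ("Conv_3", 3, 1, 1),
    ("Maxpol_3", 2, 2, 0),
    ("Conv_4", 3, 1, 1),
]

def trace_sizes(input_size, layers):
    """Return (layer name, output size) pairs for a square input."""
    sizes = []
    size = input_size
    for name, kernel, stride, padding in layers:
        size = conv_output_size(size, kernel, stride, padding)
        sizes.append((name, size))
    return sizes

sizes = trace_sizes(64, layers)
for name, size in sizes:
    print(f"{name}: {size} x {size}")
```

Under these assumptions, each 2 × 2 pooling layer with stride 2 halves the resolution, so the three pooling layers reduce the 64 × 64 input to 8 × 8 feature maps.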
5.3 Deep Feature Extraction using CNN
The modified CNN model (AOCT-Net), initially proposed and designed by Alqudah [18], was retrained on the chest x-ray image dataset and then used for deep feature extraction. This CNN model was originally designed to classify OCT images into five classes; in this research it is used as a deep feature extractor for chest x-ray images. In this paper, the fully connected (FC) layer is used as the feature extraction layer. This layer precedes the classification layer (softmax classifier), so it produces feature vectors containing three features, each of which describes one of the classes [19, 20]. Such a feature extraction technique is efficient and can extract deep, selective features that are highly representative of the input data, especially when the CNN is well designed [18]. The number of features extracted by this method equals the number of classes, with each feature responsible for representing a certain class. The feature space extracted using this method consists of an array of features of size (N × C), where N represents the number of input samples (signals or images) and C is the number of classes [19].
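As a rough illustration of how the FC layer output forms the feature array, the sketch below applies one fully connected layer to a few flattened activation vectors and stacks the results, one row per image and one column per class. The function names, weights, and input values are hypothetical toy values, not parameters of the trained AOCT-Net.

```python
def fc_features(activations, weights, biases):
    """Apply one fully connected layer: one output value per class."""
    return [
        sum(w * a for w, a in zip(class_weights, activations)) + bias
        for class_weights, bias in zip(weights, biases)
    ]

def extract_feature_matrix(batch, weights, biases):
    """Stack the per-image FC outputs into a list of rows (images x classes)."""
    return [fc_features(activations, weights, biases) for activations in batch]

# Toy values (hypothetical): 2 images, 4 flattened activations each, 3 classes.
batch = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 3.0]]
weights = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
biases = [0.0, 0.5, -1.0]

features = extract_feature_matrix(batch, weights, biases)
print(features)
```

Each row of `features` is the three-value vector that would be passed to a downstream classifier in place of the softmax layer.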
5.4 Class Activation Mapping (CAM)
CAM is used to visualize which image regions the CNN relies on for feature extraction. The class probabilities predicted by the trained CNN for a single image are mapped back through the final convolutional layer of the network onto the input image to highlight the discriminative regions specific to each class [20]. The CAM for a specific class results from the activation map of the last ReLU (Rectified Linear Unit) layer of the CNN, which usually follows the final convolutional layer and precedes the fully connected layer. Using this method, we can determine how much each activation contributes to the final score of that particular class. It therefore makes it possible to identify the areas within an image that are class specific, before the softmax layer produces the probability predictions [20].
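The idea of weighting activations by their contribution to a class score can be sketched as follows. This is a minimal illustration that assumes the common CAM formulation, in which the map for a class is a weighted sum of the final feature maps; the function names and toy values are hypothetical, and the upsampling of the map to the input image size is omitted.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of the final feature maps for one class.

    feature_maps: list of K maps, each an H x W list of lists.
    class_weights: list of K weights linking each map to the class score.
    """
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(feature_maps, class_weights):
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * fmap[y][x]
    return cam

def normalize(cam):
    """Rescale a map to [0, 1] so it can be overlaid on the input image."""
    flat = [value for row in cam for value in row]
    low, high = min(flat), max(flat)
    span = (high - low) or 1.0  # avoid division by zero for a constant map
    return [[(value - low) / span for value in row] for row in cam]

# Toy example (hypothetical values): two 2 x 2 feature maps.
feature_maps = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 2.0]]]
cam = normalize(class_activation_map(feature_maps, class_weights=[0.5, 1.0]))
print(cam)
```

Locations with values close to 1 in the normalized map are the ones that contribute most to the score of the chosen class.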
5.5 Classification Stage
After feature extraction, a classifier is needed to find the corresponding class for every input test image. In the literature, different classification algorithms have been used to accomplish this task, such as SVM, KNN, and ANN. In this research paper, SVM and KNN classifiers have been trained using 10-fold cross-validation to generalize the classification model.
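A minimal sketch of how 10-fold cross-validation partitions the data is given below. The function name `k_fold_indices`, the shuffling, and the seed are assumptions for illustration; the paper does not describe the exact partitioning procedure.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    for i in range(k):
        held_out = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, held_out

splits = list(k_fold_indices(n_samples=100, k=10))
for fold_number, (training, held_out) in enumerate(splits, start=1):
    print(fold_number, len(training), len(held_out))
```

Each sample is held out exactly once, so averaging the scores over the ten folds gives an estimate of how well the classifier generalizes to unseen data.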
5.5.1 Support Vector Machine (SVM) Classifier
SVM is one of the best-known and most widely used supervised machine learning algorithms. It was originally developed for classifying data into two categories and was later extended to multiclass classification [20]. During training, the SVM uses a specified training partition of the data to build a hyperplane model, which is then used to predict the classes of the testing partition. The main idea of the SVM is to find the best hyperplane that separates the training dataset into two classes; this hyperplane maximizes the margin between the hyperplane and the nearest data points [21]. Since its introduction, SVM has been successfully applied to a wide range of medical applications, including breast cancer diagnosis [21], melanoma skin cancer [23], and histopathological slice recognition [24].
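The margin criterion can be illustrated with a small sketch that compares two separating hyperplanes on toy data. It does not train an SVM; it only computes the quantity that SVM training maximizes. The function names and data points are hypothetical.

```python
import math

def decision_value(w, b, x):
    """Signed score w.x + b; its sign gives the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def geometric_margin(w, b, points, labels):
    """Smallest signed distance from any labelled point to the hyperplane."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(y * decision_value(w, b, x) / norm for x, y in zip(points, labels))

# Toy 2-D data (hypothetical): labels are +1 or -1.
points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
labels = [1, 1, -1, -1]

# Two candidate hyperplanes that both separate the data.
margin_a = geometric_margin((1.0, 1.0), -2.0, points, labels)
margin_b = geometric_margin((1.0, 0.0), -1.0, points, labels)
print(margin_a, margin_b)
```

Both hyperplanes classify every toy point correctly, but the first leaves a larger gap to the nearest points, so it is the one a margin-maximizing classifier would prefer.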
5.5.2 K-Nearest Neighbor (KNN) Classifier
KNN is a well-known and widely used supervised machine learning algorithm that assigns each input sample to a category according to the labeled samples closest to it [20]. The KNN algorithm can be used for two main types of problems: classification and regression. KNN is simple, lazy, non-parametric, and instance-based [25]. For these kinds of problems, the input data vector consists of the feature data (space), while the output contains the class membership, which is obtained by majority vote over the classes of the sample's neighbors. The majority vote is applied to weights representing the distance between each feature space point and the center of mass of the input data vector [20, 25].
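The majority vote can be sketched as follows. This minimal example assumes plain Euclidean distance and an unweighted vote among the k nearest training samples, which is simpler than the distance-weighted variant described above; the function name and the three-feature toy vectors are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_features, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest neighbors."""
    distances = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Toy 3-feature vectors (hypothetical values), one feature per class.
train_features = [
    (0.9, 0.1, 0.0), (0.8, 0.1, 0.1),  # normal
    (0.1, 0.8, 0.1), (0.2, 0.7, 0.1),  # bacterial
    (0.1, 0.1, 0.8), (0.0, 0.2, 0.8),  # viral
]
train_labels = ["normal", "normal", "bacterial", "bacterial", "viral", "viral"]

prediction = knn_predict(train_features, train_labels, query=(0.15, 0.75, 0.10))
print(prediction)
```

Because KNN is a lazy learner, there is no training step: all the work happens at prediction time, when the stored samples are ranked by distance to the query.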
5.6 Performance Evaluation
Any AI-based system must be evaluated on its performance on new data. To evaluate the performance of the proposed hybrid system, the original annotations of the chest x-ray images were compared with the annotations generated by the system for the same images. Based on these annotations, the accuracy, sensitivity, precision, and specificity were calculated. These measures indicate how precisely the chest x-ray images are diagnosed [26]. To compute these measures, four statistical values are first computed: true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN) [27, 28]. Using these values, the measures are computed as follows: