The modified CNN model used in this work was initially proposed by Alqudah [18]. Any CNN model consists of two major stages: the feature extraction stage and the classification stage. Each stage contains a set of layers. Each feature extraction layer takes the previous layer's output as its input and passes its own output to the next layer, while the classification layers are placed at the end of the CNN model [9, 10, 18]. Like any classifier, the classification layer requires individual feature vectors as input to perform its computations. Figure 3 shows the modified model that has been used, and Table 2 shows its details.
Table 2. Layer information for the proposed CNN architecture.

| # | Layer | Information | # | Layer | Information |
|---|---|---|---|---|---|
| 1 | Input Layer | Size: 64×64 | 9 | Maxpol_2 | Kernel Size: 2×2; Stride: 2×2 |
| 2 | Conv_1 | Number of Filters: 32; Kernel Size: 3×3; Activation: ReLU | 10 | Conv_3 | Number of Filters: 32; Kernel Size: 3×3; Activation: ReLU |
| 3 | Batch_Norm_1 | Number of Channels: 32 | 11 | Batch_Norm_3 | Number of Channels: 32 |
| 5 | Maxpol_1 | Kernel Size: 2×2; Stride: 2×2 | 13 | Maxpol_3 | Kernel Size: 2×2; Stride: 2×2 |
| 6 | Conv_2 | Number of Filters: 16; Kernel Size: 3×3; Activation: ReLU | 14 | Conv_4 | Number of Filters: 32; Kernel Size: 3×3; Activation: ReLU |
| 7 | Batch_Norm_2 | Number of Channels: 16 | 15 | Batch_Norm_4 | Number of Channels: 32 |
5.3 Deep Feature Extraction using CNN
The modified CNN model (AOCT-Net), initially proposed and designed by Alqudah [18], was retrained on the chest X-ray image dataset and then used for deep feature extraction. This CNN model was originally designed to classify OCT images into five classes; in this research it is used to extract deep features from chest X-ray images. In this paper, the fully connected (FC) layer is used as the feature extraction layer. This layer precedes the classification layer (SoftMax classifier), so it produces feature vectors containing three features, each of which describes one of the classes [19, 20]. This feature extraction technique is efficient and can extract deep, selective features that are highly representative of the input data, especially when the CNN is well designed [18]. The number of features extracted by this method equals the number of classes, with each feature responsible for representing one class. The feature space extracted in this way is an array of size N × C, where N is the number of input samples (signals or images) and C is the number of classes [19].
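As a minimal sketch of this idea, the snippet below treats the output of a fully connected layer as the feature vector, so that a batch of inputs yields one row per input and one feature per class. The weights, bias values, toy activations, and the helper name `fc_features` are hypothetical and chosen only for illustration; they are not taken from AOCT-Net.

```python
def fc_features(activations, weights, bias):
    """Return FC-layer outputs (one value per class) for each input.

    activations: one flattened vector per input, taken from the last
                 feature-extraction layer
    weights:     one weight vector per class
    bias:        one bias value per class
    """
    features = []
    for row in activations:
        features.append([
            sum(w * a for w, a in zip(class_w, row)) + b
            for class_w, b in zip(weights, bias)
        ])
    return features


# Hypothetical toy values: 2 inputs, 4 activations each, 3 classes.
activations = [[1.0, 0.0, 2.0, 1.0],
               [0.0, 3.0, 1.0, 0.0]]
weights = [[1.0, 0.0, 0.0, 0.0],
           [0.0, 1.0, 0.0, 0.0],
           [0.0, 0.0, 1.0, 1.0]]
bias = [0.0, 0.5, -1.0]

# One row per input, one column per class.
features = fc_features(activations, weights, bias)
```

The resulting rows can then be passed to a separate classifier in place of the softmax layer.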
5.4 Class Activation Mapping (CAM)
CAM is used to visualize which image regions the CNN relies on for feature extraction. The class probabilities that the trained CNN predicts for a single image are mapped back to the final convolutional layer of the network to highlight the discriminative regions specific to each class [20]. The CAM for a specific class results from the activation map of the last ReLU (Rectified Linear Unit) layer of the CNN, which follows the final convolutional layer and usually precedes the fully connected layer. Using this method, we can determine how much each activation contributes to the final score of that class. It therefore makes it possible to identify the class-specific areas within an image before the softmax layer, which produces the probability predictions [20].
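The contribution of each activation to a class score can be sketched as a weighted sum of the final feature maps. The snippet below follows the commonly used CAM formulation, which assumes global average pooling followed by a linear layer; whether the network used here has exactly that structure is an assumption, and the function name and toy values are hypothetical.

```python
def class_activation_map(feature_maps, class_weights):
    """Weighted sum of K feature maps (each H x W) for one class."""
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(feature_maps, class_weights):
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * fmap[y][x]
    return cam


# Hypothetical toy input: K = 2 feature maps of size 2 x 2.
feature_maps = [
    [[1.0, 0.0],
     [0.0, 2.0]],
    [[0.0, 1.0],
     [1.0, 0.0]],
]
class_weights = [0.5, 2.0]  # weights linking each map to the chosen class

# Larger values mark regions that contribute more to the class score.
cam = class_activation_map(feature_maps, class_weights)
```

In practice the resulting map is typically upsampled to the input resolution and overlaid on the image as a heat map.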
5.5 Classification Stage
After feature extraction, a classifier is needed to find the corresponding class for every input test image. In the literature, different classification algorithms have been used for this task, such as SVM, KNN, and ANN. In this research, SVM and KNN have been trained using 10-fold cross-validation to improve the generalization of the classification model.
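The following is a minimal sketch of how k-fold cross-validation partitions the data, assuming k = 10 as stated in the text. The helper `k_fold_indices` is hypothetical, and no shuffling or stratification is applied here, although a real experiment would likely need both.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Example: 25 samples split into 10 folds.
folds = list(k_fold_indices(25, k=10))
```

Each sample appears in the test partition exactly once, so every image is used for both training and evaluation across the ten rounds.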
5.5.1 Support Vector Machine (SVM) Classifier
SVM is one of the best-known and most widely used supervised machine learning algorithms. It was originally designed to classify data into two categories and was later extended to multiclass classification [20]. During training, SVM uses a specified training partition of the data to build a hyperplane model, which is then used to predict the classes of the testing partition. The basic idea of SVM is to find the hyperplane that best separates the training dataset into two classes, namely the one that maximizes the margin between the hyperplane and the nearest data points [21]. Since its introduction, SVM has been successfully applied to a wide range of medical applications, including breast cancer diagnosis [21], melanoma skin cancer [23], and histopathological slice recognition [24].
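For reference, the maximum-margin idea is commonly written as the following optimization problem for the linearly separable two-class case. The symbols (weight vector $\mathbf{w}$, offset $b$, training samples $\mathbf{x}_i$ with labels $y_i \in \{-1, +1\}$, and sample count $n$) are standard notation introduced here for illustration and do not appear in the original text.

```latex
\min_{\mathbf{w},\, b} \; \frac{1}{2}\lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad
y_i \left( \mathbf{w}^{\top} \mathbf{x}_i + b \right) \ge 1,
\qquad i = 1, \dots, n
```

Under this formulation the margin width equals $2 / \lVert \mathbf{w} \rVert$, so minimizing $\lVert \mathbf{w} \rVert$ maximizes the margin.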
5.5.2 K-Nearest Neighbor (KNN) Classifier
KNN is a well-known and widely used supervised machine learning algorithm that assigns input data to categories based on the labels of nearby training samples [20]. The KNN algorithm can be used for two main types of problems: classification and regression. KNN is simple, lazy, non-parametric, and instance-based [25]. For classification problems, the input data vector consists of the feature data (space), while the output data vector contains the class membership, which is obtained by a majority vote over the classes of its neighbors. The majority vote is applied to weights that represent the distance between each feature-space point and the center of mass of the input data vector [20, 25].
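The majority-vote step can be sketched as follows. This is a minimal illustration that assumes Euclidean distance and an unweighted vote, which may differ from the configuration used in this work; the function name, the class labels "A", "B", and "C", and the toy feature vectors are hypothetical.

```python
from collections import Counter
import math


def knn_predict(train_features, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    distances = [
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    ]
    # Sort by distance only, so the k closest training samples come first.
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical 3-feature vectors, one feature per class.
train_features = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.1, 0.9, 0.0],
                  [0.0, 0.8, 0.2], [0.1, 0.1, 0.9]]
train_labels = ["A", "A", "B", "B", "C"]

predicted = knn_predict(train_features, train_labels, [0.85, 0.15, 0.05], k=3)
```

Because KNN is a lazy learner, no model is fitted in advance; all distance computations happen at prediction time.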
5.6 Performance Evaluation
Any AI-based system must be evaluated on its performance with new data. To evaluate the performance of the proposed hybrid system, the original annotations of the chest X-ray images have been compared with the annotations generated by the system for the same images. Based on these annotations, the accuracy, sensitivity, precision, and specificity have been calculated. These measures indicate how precisely the chest X-ray images are diagnosed [26]. To compute these measures, four statistical values are computed: true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN) [27, 28]. Using these values, the mentioned measures have been computed as follows: