Artificial Intelligence Framework for Efficient Detection and Classification of Pneumonia Using Chest Radiography Images

Abstract


Background
Ancient Romans and Greeks used many terms, such as peripleumoniacon, pleurisy, and peripneumony, to describe an illness covering many conditions that are now known as pneumonia. Later, in the 19th century, Laennec distinguished pleurisy from pneumonia, and Rokitansky subsequently recognized bronchopneumonia and lobar pneumonia as different pathological entities [1,2]. Today, in the US alone, more than 1 million pneumonia patients are hospitalized, with approximately 50,000 deaths. Globally, around 450 million people are infected with pneumonia per year, of whom 4 million die [2]. Currently, the best available and most commonly used technology for diagnosing pneumonia is chest x-ray imaging, which plays a vital role in the daily clinical care of pneumonia patients [2,3].
The development of computerized technologies, especially AI with its different branches, has made the detection and diagnosis of diseases more accurate and medical decisions more precise and effective [4,5]. The detection and classification of pneumonia has gained the attention of researchers, who have proposed different ML and DL techniques for pneumonia detection with acceptable accuracy [3,5,6]. However, one of the most challenging aspects of detecting pneumonia from chest x-ray images is that the process depends completely on the availability of expert radiologists who can correctly detect pneumonia, and it requires laboratory tests to differentiate the viral type from the bacterial type [3,6,7]. In traditional detection of pneumonia, expert radiologists search the chest x-ray image for white spots that indicate an infection, in addition to the bright areas that represent pneumonia fluids in the lungs [8,9]. However, because x-ray images have a limited color distribution (gray levels), they consist only of shades of black and white, which makes it challenging to decide whether there is a pneumonia infection in the lungs [6,9]. Figure 1 shows chest x-ray images with normal, viral pneumonia, and bacterial pneumonia conditions.
In the current research work, a hybrid artificial intelligence methodology that combines deep learning and machine learning techniques is proposed. In this methodology, a CNN is used for feature extraction, instead of conventional methods, to detect and classify pneumonia. The CNN was applied using different input image sizes to find the optimal size, and the extracted features were then fed into two different machine learning algorithms (SVM and KNN), in addition to the softmax classifier, which is the default classifier for a CNN. The original contributions of this work include proposing a new CNN architecture to detect pneumonia from chest x-ray images, rather than AlexNet, GoogleNet, ResNet, or DenseNet, and using the proposed architecture to extract automated deep features and pass them to machine learning algorithms to detect and classify pneumonia.

Literature Review
Deniz et al. [10] provided a deep learning-based method for pneumonia detection from chest x-ray images. They proposed a methodology with modified contrast images to increase the accuracy of the model. They used digital image processing techniques to enhance the ROI on which the ResNet-based CNN model focuses during training, and reported an accuracy of 78.73%. Okeke et al. [11] proposed another deep learning approach that was efficient in pneumonia classification. The authors designed a new dense CNN architecture consisting of only 8 layers, compared different input image sizes, and found that the best input size was 200×200×3, which scored a validation accuracy of 93.73%. Benjamin et al. [12] proposed a supervised learning technique for pneumonia detection. The technique was based on a DenseNet-based deep learning architecture, trained using the adaptive method on 5606 random images with a size of 32×32 because of memory limitations. The authors reported an AUC of 0.609 for the proposed network. Saraiva et al. [13] proposed a neural network-based methodology for pneumonia detection using both MLP and CNN. The authors used K-fold cross-validation to train the models and achieved accuracies of 94.40% and 92.16% for the CNN and MLP, respectively. Gu et al. [14] proposed a methodology for the classification of pneumonia chest x-ray images from three datasets, JSRT, MC, and Guangzhou Women and Children's Medical Center, with a total of 4892 images. The authors first segmented the regions of interest (ROI), namely the right and left lungs, using an FCN model based on a pre-trained AlexNet model. After the lungs were segmented, different types of features were extracted, including wavelet features, handcrafted features, HOG features, GLCM features, and DCNN features. Finally, these features were fed separately to an SVM, and the best result was obtained with the DCNN features, with an overall accuracy of 80.48%.
Rahman et al. [15] proposed a transfer learning-based methodology for the classification and detection of pneumonia (both viral and bacterial). The authors compared different predesigned CNN architectures to find the one with the highest performance. The results showed that DenseNet201 performed best among all the architectures used: it scored an accuracy of 98% for normal vs pneumonia, 93.3% for normal vs viral pneumonia vs bacterial pneumonia, and 95% for viral pneumonia vs bacterial pneumonia.
Rajaraman et al. [16] proposed another transfer learning-based methodology for the classification and detection of viral and bacterial pneumonia. The methodology was tested on two datasets, baseline and cropped ROI, using four different architectures (customized VGG16, sequential, inception, and residual). The customized VGG16 outperformed all other architectures and achieved accuracies of 95.7%, 93.6%, and 91.7% for normal vs pneumonia, normal vs viral pneumonia vs bacterial pneumonia, and viral pneumonia vs bacterial pneumonia, respectively, using the baseline dataset. Using the cropped ROI dataset, it achieved accuracies of 96.2%, 93.6%, and 91.8% for the same three tasks, respectively.

Results And Discussion
The hardware environment for testing the proposed algorithm was a desktop computer with an Intel Core i7-6700 at 3.4 GHz and 16 GB of RAM, and the code was executed in a parallel environment. First, the system was tested with different input image sizes to select the size that extracts the best features for distinguishing between the three classes. In this stage, the data was separated into two subsets to feed all CNNs: 70% of the data was used for training and the other 30% for validation. Figure 4 shows the training accuracy and loss for the different input image sizes, while Table 3 shows the performance evaluation of the four input sizes. Figure 5 shows the class activation maps for different cases using the last ReLU layer of the CNN architecture. The CAM shows that the system selected the proper ROI to distinguish between the three cases correctly.
Based on the results shown in Table 3, the best input image size was 64×64, so this size was chosen for further processing and for feature extraction from the FC layer. Figure 6 (A) shows the feature space extracted from the whole dataset for two classes (Normal and Pneumonia), while Figure 6 (B) shows the feature space extracted from the whole dataset for the three classes (Normal, Bacterial Pneumonia, and Viral Pneumonia).
As shown in Figure 6 A, pneumonia can be detected easily and accurately using these features because the classes are well separated. However, dividing pneumonia into bacterial and viral types makes it more difficult to classify them with high accuracy because their features overlap, as shown in Figure 6 B. This overlap of feature spaces requires a classifier that can discriminate the classes effectively, which cannot be accomplished by the SoftMax classifier. So, after feature extraction, the extracted features were fed to two different types of classifiers (KNN and SVM). Because of the overlap between the bacterial and viral features, the classifiers were trained using 10-fold cross-validation.
K-fold cross-validation was used to generalize the classifier models and fit them well to the data. It divides the data (feature space) into K groups and ensures that each of the K folds is used once as a testing set. The KNN and SVM classifiers were trained using 10-fold cross-validation, and the obtained models were saved for further analysis of the results. For further analysis of the extracted features and the classifiers, more performance evaluation metrics (Sensitivity, Specificity, and Precision) were calculated from the generated confusion matrices. Based on these values, the classifiers performed well and were able to discriminate between normal and pneumonia cases and between the two types of pneumonia, bacterial and viral. Table 4 shows the performance evaluation of the two classifiers in distinguishing between the three classes (Normal vs Bacterial Pneumonia vs Viral Pneumonia), where the SVM classifier slightly outperformed the KNN classifier; i.e., both classifiers were suitable for the extracted features, with high classification efficiency. As an extended analysis of the classifiers, performance evaluation measures (Accuracy, Sensitivity, Specificity, and Precision) were calculated for each data class. Table 5 shows the performance evaluation of the three classes for each classifier. The lowest performance of the system was for classifying viral pneumonia, while the highest was for the Normal class. In general, as seen in Figures 7 and 8 and Tables 4 and 5, the methodology and classifiers used achieve high performance for the detection and classification of pneumonia from chest x-ray images, and they outperform the CNN with the Softmax classifier, which had a maximum accuracy of 80.07%, as seen in Table 3. The performance evaluation values of both Bacterial Pneumonia and Viral Pneumonia were lower than those of the Normal class because it is a difficult task to detect and classify pneumonia solely from chest x-ray images, and doing so requires knowledge of disease pathology and human anatomy.
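The per-class measures discussed above can be derived from a multiclass confusion matrix by treating each class in turn as the positive class (one-vs-rest). The following is a minimal sketch of that computation, not the authors' code; the function name `per_class_metrics` and the example matrix are hypothetical, and the numbers are made up for illustration rather than taken from Tables 4 or 5.

```python
def per_class_metrics(cm):
    """Compute one-vs-rest metrics for each class of a square confusion matrix.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    results = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # true class c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, true class differs
        tn = total - tp - fn - fp
        results.append({
            "accuracy": (tp + tn) / total,
            "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
            "specificity": tn / (tn + fp) if tn + fp else 0.0,
            "precision": tp / (tp + fp) if tp + fp else 0.0,
        })
    return results


# Hypothetical 3-class confusion matrix (rows: true, columns: predicted).
# The values are illustrative only and are NOT the paper's results.
example_cm = [
    [50, 0, 0],    # Normal
    [0, 40, 10],   # Bacterial pneumonia
    [0, 5, 45],    # Viral pneumonia
]
metrics = per_class_metrics(example_cm)
```

In this toy matrix the errors occur only between the two pneumonia classes, which mirrors the kind of overlap described for the bacterial and viral features.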

Conclusion And Future Work
In this research study, a new artificial intelligence system was proposed and evaluated for the detection and classification of bacterial pneumonia and viral pneumonia, as well as normal cases, using chest x-ray images. The hybrid artificial intelligence model was built using a CNN model that was pretrained on other medical images (OCT images). The proposed methodology differs from other methods in the literature, which depend heavily on transfer learning using pretrained CNN architectures and modified versions of them only. The results of the current study indicate that the proposed hybrid system outperformed previous systems and was able to detect and classify pneumonia efficiently (accuracy of 94%) from chest x-ray images. In the future, this work will be extended and enhanced to detect other pulmonary diseases using chest x-ray images.

Materials And Methods
In this section, the materials and methods used in this research paper are discussed in detail. Figure 2 shows the block diagram of the proposed methodology.

Dataset
The original dataset, published by Kermany et al. [17], consists of three main folders: training, testing, and validation. Inside each folder there are two subfolders, one containing pneumonia chest x-ray images and the other containing normal chest x-ray images. A total of 5,852 anterior-posterior chest x-ray images were carefully chosen from retrospective pediatric patients between 1 and 5 years old [15]. All pneumonia chest x-ray images were labeled as bacteria or virus, and these labels were used to split the pneumonia folder into two subfolders: viral pneumonia and bacterial pneumonia. Because the validation and testing sets were small, and in order to balance the proportions of data across the dataset, the original data categories were combined and the entire dataset was rearranged into training, validation, and testing sets with proportions of 70%, 15%, and 15%, respectively. A total of 4,097 images were allocated to the training set, 877 images to the validation set to improve the validation accuracy, and 878 images to the testing set to test the system during the K-fold process.
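The 70%/15%/15% rearrangement can be illustrated with a short sketch. The paper does not state how the split was performed, so the random shuffle, the fixed seed, and the rounding rule below (round the training count up, the validation count down, and give the remainder to testing) are assumptions; that rounding rule is chosen only because it reproduces the reported counts for 5,852 images. The function names are hypothetical.

```python
import math
import random


def split_counts(n_total, train_frac=0.70, val_frac=0.15):
    """Return (train, validation, test) sizes for a dataset of n_total items."""
    n_train = math.ceil(train_frac * n_total)   # assumed rounding: up
    n_val = math.floor(val_frac * n_total)      # assumed rounding: down
    n_test = n_total - n_train - n_val          # remainder goes to testing
    return n_train, n_val, n_test


def split_items(items, seed=0):
    """Shuffle items reproducibly and cut them into three disjoint subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train, n_val, _ = split_counts(len(shuffled))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


counts = split_counts(5852)
train_ids, val_ids, test_ids = split_items(range(5852))
```

A class-stratified split would additionally keep the proportion of normal, bacterial, and viral images equal across the three subsets; whether the authors did so is not stated.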

CNN Architecture
The modified CNN model used in this work was initially proposed by Alqudah [18]. Any CNN model consists of two major stages: the feature extraction stage and the classification stage. Each stage contains a set of layers. Each feature extraction layer takes the previous layer's output as its input and passes its own output to the next layer as input, while the classification stage layers are placed at the end of the CNN model [9,10,18]. The classifier layer requires individual feature vectors as input to perform its computations, like any classifier. Figure 3 shows the modified model that was used, while Table 2 shows the details of the modified CNN model.

Deep Feature Extraction using CNN
The modified CNN model (AOCT-Net), initially proposed and designed by Alqudah [18], was retrained on the chest x-ray image dataset and then used for deep feature extraction. This CNN model was designed for the classification of OCT images into five different classes and is used in this research to extract deep features from chest x-ray images. In this paper, the FC layer is used as the feature extraction layer. This layer precedes the classification layer (SoftMax classifier); i.e., it produces feature vectors containing three features, each of which describes one of the classes [19,20]. Such a feature extraction technique is very efficient and can extract very deep and selective features that are highly representative of the input data, especially when the CNN is well designed [18]. The number of features extracted by this method equals the number of classes, and each feature is responsible for representing a certain class. The feature space extracted by this method is an array of features ($N \times C$), where $N$ is the number of input samples (signals or images) and $C$ is the number of classes [19].
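To make the shape of this feature space concrete, the sketch below computes the outputs of a fully connected layer for a small batch and stacks them into a matrix with one row per image and one column per class. It is an illustration under assumptions, not the AOCT-Net implementation: the weights, biases, and hidden activations are hypothetical toy values, and the softmax is shown only to contrast the raw FC features with the probabilities the default classifier would produce from them.

```python
import math


def fc_features(hidden, weights, biases):
    """Fully connected layer: returns one output (feature) per class."""
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(weights, biases)]


def softmax(features):
    """Convert FC outputs into class probabilities (numerically stable)."""
    m = max(features)
    exps = [math.exp(f - m) for f in features]
    s = sum(exps)
    return [e / s for e in exps]


# Hypothetical toy values: 4 hidden activations feeding 3 class outputs.
W = [[0.5, -0.2, 0.1, 0.0],
     [-0.3, 0.8, 0.0, 0.2],
     [0.1, 0.1, -0.4, 0.6]]
b = [0.0, 0.1, -0.1]
hidden_batch = [[1.0, 0.0, 2.0, 1.0],
                [0.0, 1.0, 0.0, 3.0]]

# Feature matrix: one row per input image, one column per class.
feature_matrix = [fc_features(h, W, b) for h in hidden_batch]
probabilities = [softmax(f) for f in feature_matrix]
```

In the hybrid approach described here, it is the rows of `feature_matrix`, rather than the softmax probabilities, that would be passed to the SVM and KNN classifiers.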

Class Activation Mapping (CAM)
CAM is used to visualize how the CNN localizes the image regions it targets for feature extraction. The probability predicted by the trained CNN for each class of a single image is mapped back to the final convolutional layer of the respective network and over the input image to highlight the discriminative regions that are specific to each class [20]. The CAM for a specific class results from the activation map of the last ReLU (Rectified Linear Unit) layer of the CNN, which usually precedes the fully connected layer or follows the final convolutional layer. Using this method, we can determine how much each activation contributes to the final score of that particular class. It therefore allows distinguishing the areas within an image that determine the class specificity prior to the softmax layer, which produces the probability predictions [20].
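One common way to compute such a map is the weighted sum of the last-layer activation maps, where each map is weighted by its contribution to the score of the chosen class. The sketch below follows that classic formulation as an assumption; the paper does not give the exact weighting it used, and the activation values, weights, and function names here are hypothetical.

```python
def class_activation_map(activations, class_weights):
    """Weighted sum of K activation maps (each H x W) for one class.

    activations: list of K maps, each a list of H rows with W values.
    class_weights: list of K weights linking each map to the class score.
    """
    height = len(activations[0])
    width = len(activations[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(activations, class_weights):
        for y in range(height):
            for x in range(width):
                cam[y][x] += weight * fmap[y][x]
    return cam


def normalize(cam):
    """Rescale a map to [0, 1] so it can be overlaid on the input image."""
    flat = [v for row in cam for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in cam]
    return [[(v - lo) / (hi - lo) for v in row] for row in cam]


# Hypothetical 2 x 2 activations from two channels after a ReLU layer.
maps = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]
weights_for_class = [0.5, 1.0]
cam = class_activation_map(maps, weights_for_class)
heatmap = normalize(cam)
```

In practice the resulting low-resolution map is upsampled to the input image size before being overlaid; that step is omitted here.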

Classification Stage
After feature extraction, a classifier is needed to find the corresponding class for every input test image. In the literature, different types of classification algorithms have been used to accomplish this task, such as SVM, KNN, and ANN. In this research paper, SVM and KNN were trained using 10-fold cross-validation to generalize the classification model.
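The fold construction behind K-fold cross-validation can be sketched as follows. This is a plain, unstratified version written for illustration; whether the authors stratified the folds by class is not stated, and the function name, seed, and sample count are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Example with a hypothetical 25 feature vectors and 10 folds.
splits = list(k_fold_indices(25, k=10))
```

A classifier would be fitted on `train_idx` and scored on `test_idx` for each of the ten pairs, and the ten scores would then be averaged.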

Support Vector Machine (SVM) Classifier
SVM is one of the best-known and most widely used supervised machine learning algorithms. It was originally used for classifying data into two categories and was later extended to multiclass classification [20]. During training, the SVM uses a specified training partition of the data to build a model, represented by a hyperplane, that is used to predict the classes of the new testing partition. The main idea of the SVM is to find the best hyperplane that separates the training dataset into two classes, namely the hyperplane that maximizes the margin between itself and the nearest data points [21]. Since its introduction, SVM has been successfully applied to a wide range of medical applications, including breast cancer diagnosis [21], melanoma skin cancer [23], and histopathological slice recognition [24].
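The margin-maximization idea can be written compactly for the linearly separable two-class case. The notation below is introduced here for illustration and is not taken from the paper: $\mathbf{x}_i$ is a feature vector, $y_i \in \{-1, +1\}$ is its label, and $(\mathbf{w}, b)$ defines the hyperplane $\mathbf{w}^\top \mathbf{x} + b = 0$. The paper does not state which kernel or soft-margin setting was used, so this is only the textbook hard-margin form.

```latex
\min_{\mathbf{w},\, b} \; \frac{1}{2}\,\lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad
y_i \left( \mathbf{w}^\top \mathbf{x}_i + b \right) \ge 1,
\qquad i = 1, \dots, n .
```

Under these constraints the distance from the hyperplane to the nearest training point is $1 / \lVert \mathbf{w} \rVert$, so minimizing $\lVert \mathbf{w} \rVert$ maximizes the margin.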

K-Nearest Neighbor (KNN) Classifier
KNN is a widely used supervised machine learning algorithm that assigns input data to categories according to the labels of nearby training samples [20]. The KNN algorithm can be used for two main types of problem: classification and regression. KNN is simple, lazy, non-parametric, and instance-based [25]. For these problems, the input data vector must consist of the feature data (space), while the output data vector contains the class membership, which is obtained by majority vote over the classes of its neighbors. The majority voting technique is applied to the weights representing the distance between each feature space point and the center of mass of the input data vector [20,25].
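The neighbor-voting step can be sketched in a few lines. This is a minimal illustration that assumes Euclidean distance and an unweighted majority vote, which may differ from the distance weighting used by the authors; the function name, the value of `k`, and the three-dimensional feature vectors are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of query by majority vote among its k nearest neighbors."""
    # Sort training samples by Euclidean distance to the query point.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # On a tie, Counter returns the label that was encountered first,
    # i.e. the one belonging to the closest neighbor.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical three-feature vectors, one feature per class.
train_x = [[2.0, 0.1, 0.0], [1.8, 0.2, 0.1],
           [0.1, 2.0, 0.9], [0.0, 1.7, 1.1],
           [0.2, 1.0, 2.0], [0.1, 0.9, 1.8]]
train_y = ["normal", "normal", "bacterial", "bacterial", "viral", "viral"]

label_a = knn_predict(train_x, train_y, [0.1, 1.8, 1.0])
label_b = knn_predict(train_x, train_y, [1.9, 0.1, 0.0])
```

Because KNN is a lazy learner, there is no training step beyond storing `train_x` and `train_y`; all computation happens at prediction time.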

Performance Evaluation
Any AI-based system must be evaluated for its performance on new data. To evaluate the performance of the proposed hybrid system, the original annotations of the chest x-ray images were compared with the annotations generated by the system for the same images. Based on these annotations, the accuracy, sensitivity, precision, and specificity were calculated. These measures indicate how precisely the chest x-ray images are diagnosed [26]. To compute them, four statistical values are computed: true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN) [27,28].
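Then, using these values, the measures were computed as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

$$\text{Specificity} = \frac{TN}{TN + FP}$$

$$\text{Precision} = \frac{TP}{TP + FP}$$

Figure 5
The Class Activation Mapping (CAM) for Three Cases (Normal, Bacterial Pneumonia, and Viral Pneumonia) using Last ReLU Layer.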

Figure 7 A and B show the Confusion Matrix for the KNN classifier and the ROC of the same classifier, while Figure 8 A and B show the Confusion Matrix for the SVM classifier and the ROC of the same classifier.

Figures

Figure 1

Figure 2

Figure 3

Figure 4

Table 3
Validation Performance Evaluation for Different Input Image Sizes.

Table 4
Performance Evaluation for the Used Two Classifiers.

Table 5
Performance Evaluation of Each Data Class for the Used Two Classifiers.

Table 1
The distribution of images used in the system.

Table 2
Layers Information for Proposed CNN Architecture.