Deep learning techniques for prediction of pneumonia from lung CT images

Pneumonia is caused by viruses and bacteria and affects one or both lungs. It is a dangerous disease that causes a large number of deaths worldwide. Early detection of Pneumonia is the best way to improve a patient's chance of survival. The disease can be detected from X-ray or computed tomography (CT) lung images using deep learning techniques. This research paper provides medical practitioners with a solution for classifying the impact of the infection as high-risk, medium-risk or low-risk among the tested population, using deep learning techniques such as convolutional neural networks (CNN), artificial neural networks (ANN) and recurrent neural networks with long short-term memory (LSTM) cells. We examined 3000 CT images of confirmed Pneumonia patients and achieved accuracies of 98–99%. The performance of the classifiers is evaluated using the confusion matrix, accuracy, F-measure, precision and recall. The results show that deep learning affords a fitting tool for fast screening of Pneumonia, discovering high-risk patients and protecting them through suitable medical remedies.


Introduction
The main objective of this research paper is to predict the stages of Pneumonia cases accurately, in a cost-effective manner and within a short time span, using LSTM. To predict the severity of the disease we used deep learning architectures and tested the results against five CNN architectures. Due to time constraints we have conducted the experiment with these five CNN architectures, and in future we plan to extend it to classify the stages of severity.
Artificial Intelligence (AI) is a promising tool and has contributed significantly to predicting Pneumonia from lung CT images [1]. The conventional imaging workflow depends completely on human labor, whereas AI facilitates secure, precise and proficient imaging resolution without human participation. AI plays a vital role in cracking complex problems in various fields, such as engineering, medicine and psychology [2][3].
Modern applications empowered with AI take X-ray and CT-scan images of the lungs and segment the affected region from the provided lung images. Apart from prediction, the proposed model also identifies the stages of severity. This demonstrates how extensively, and to what level, AI plays a crucial role in the advancement of healthcare organizations worldwide [7]. AI tools have been utilized to help physicians predict the severity of the virus in less time and with high accuracy.
On the other hand, an additional dilemma that physicians and researchers need to deal with is the large volume of data, called big data. Big data in the healthcare sector plays a major role in clinical management, particularly in the field of radiology, by using Deep Learning (DL) techniques [13]. DL is a division of Machine Learning (ML); it consists of several layers of Artificial Neural Networks (ANN), so as to provide a diverse analysis of the data fed into the input layers. In this paper, we propose a Deep Convolutional Neural Network (DCNN), an advanced tool to assist the physician in identifying and predicting the stages of Pneumonia using CT scan images of the lungs. The proposed study is based on a residual network (ResNet): it is assembled from multiple parallel layers with varying kernel sizes to sense global and local features, with residual connections passing information to further layers. The model is trained with 3000 CT lung images to predict the severity of Pneumonia. In addition, the results were compared with other popular DL algorithms. The DCNN identifies the severity of the disease accurately in an extremely short time and helps in early diagnosis. Also, chest X-ray is cost effective when compared to other radiographic images.

Organization of the Paper
The research paper is organized as follows. Section 2 summarizes the literature survey of Pneumonia prediction. Section 3 presents the proposed architecture. Section 4 elaborates the dataset, experimental setup and performance measures of the proposed model. Section 5 concludes and outlines future enhancements.

Literature Survey
The study and prediction of Pneumonia have been seriously considered due to its severe effects. It is hard to identify exposed personnel, since symptoms cannot be determined instantly. AI is an alternative tool to predict this lung disease earlier than the usual time-consuming methods. Even though there are plentiful studies on this prediction, this paper pays attention to ML and DL in predicting and diagnosing patients infected by Pneumonia through CT scan images of the lung. ML is the subset of AI that trains machines with statistical models in order to learn and detect patterns from training data. The test data are then compared against the pattern obtained from the training data to measure the accuracy.
The utilization of ML methodologies has rapidly increased in the healthcare field [1][2][3][4]. Deep Learning is a subset of ML which produces higher accuracy when compared to other ML techniques. The Convolutional Neural Network (CNN) has been found to be the most promising technique, producing good results compared to other DL models [5][6].
X-ray apparatus is used to scan parts of the body when people are affected by lung diseases, injuries and bone dislocations, whereas CT scans are superior X-ray devices used by doctors for identifying internal organ damage [4]. X-rays are preferred because they are cost effective and involve less radiation exposure when compared with CT scans. Neural networks are computer systems [7][8][9][10] that try to emulate some of the functions of living organisms. This means that they are made up of elements that mimic (in basic functions) the behavior and organization of the organism. The human brain can learn from experience and generalize from previous input to completely new input to predict an outcome. A neural network gains experience by analyzing data (training) to determine behavior rules [11][12][13], based on which it can make predictions about new cases.

Pneumonia Prediction module
The X-ray images of the lung are given as input to the model. The images are pre-processed using the pre-processing module. To remove unnecessary details from the chest X-rays, we perform resizing, shuffling and normalization over the images in order to extract the information needed for further processing. Data augmentation such as rotation and scaling was also carried out. The images were resized to 224 * 224 pixels. Then the pre-processed X-ray images were used to train Recurrent Neural Networks (RNN).
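As an illustration, the resize and normalization steps can be sketched in plain Python (a minimal sketch assuming a grayscale image stored as a list of pixel rows; a real pipeline would use an image library and also apply the rotation and scaling augmentation):

```python
def resize_nearest(img, out_h, out_w):
    # Nearest-neighbour resize of a 2-D pixel grid (list of lists).
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w]
             for c in range(out_w)] for r in range(out_h)]

def normalize(img):
    # Scale pixel intensities into the [0, 1] range.
    flat = [p for row in img for p in row]
    lo, hi = min(flat), max(flat)
    return [[(p - lo) / (hi - lo) for p in row] for row in img]

# Upsample a 2x2 grid to 4x4; downsampling to 224x224 works the same way.
small = resize_nearest([[10, 20], [30, 40]], 4, 4)
```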
In a conventional neural network, the outputs and inputs are independent of each other. To solve this problem we use an RNN, in which the output from stride k-1 is fed as input to the k-th stride. RNNs are widely used because of their hidden layer, which plays a vital role in remembering the sequence of information. It remembers the sequence of calculated information using a "memory". The same weights and biases are applied to all the hidden layers, which diminishes the complication of growing parameters. Here x, h and y indicate the input, hidden state and output, while W_hx, W_hh and W_yh indicate the weights applied between the input and hidden layer, between successive hidden states, and between the hidden layer and output. This is shown in Figure 2. The output obtained from the last stride is compared with the actual output and the error is calculated; the error (gradient) is then back-propagated through the network to adjust the weights accordingly to obtain the expected result. An RNN is not able to remember long sequences of information. Thus the flow of the gradient in an RNN leads to two major problems: vanishing gradient and exploding gradient. The gradient is computed by recurrent multiplication of derivatives, so if the generated gradient is too small it causes a vanishing gradient, and if it is greater than a threshold it leads to an exploding gradient.
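The recurrence described above can be written, in a toy scalar form with hypothetical weights, as:

```python
import math

def rnn_step(x_t, h_prev, w_hx, w_hh, b):
    # One recurrent stride: the previous hidden state h_{t-1}
    # is fed back in alongside the current input x_t.
    return math.tanh(w_hx * x_t + w_hh * h_prev + b)

# The same weights are reused at every stride of the unrolled sequence.
h = 0.0
for x in [0.5, -0.2, 0.9]:
    h = rnn_step(x, h, w_hx=0.8, w_hh=0.5, b=0.0)
```

Because the same w_hh is multiplied into the gradient at every stride during back-propagation, long sequences drive that repeated product toward zero (vanishing) or infinity (exploding), which is exactly the problem motivating LSTM and GRU.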
To overcome the above problems, we move to LSTM and GRU.

Long Short-Term Memory:
LSTM can remember long sequences of information with the help of gates. These gates control the flow of information within the network by eliminating unnecessary information from preceding steps and feeding the needed information to the succeeding steps. The architecture of LSTM is shown in Figure 3. The LSTM consists of three gates: forget gate, input gate and output gate. These three gates determine the cell state (acting as memory), which is the central part of the LSTM. Information is added to or removed from the cell based on the gates. x_t denotes the current input. C_t indicates the content of the latest cell state and C_t-1 denotes the cell state of the previous LSTM unit. h_t denotes the current output and h_t-1 represents the previous LSTM unit's output. W_f, W_i and W_o are the weights applied to the forget gate, input gate and output gate, and b_f, b_i and b_o are the biases applied to the forget gate, input gate and output gate. It includes two functions: (1) the sigmoid function (σ) and (2) the tanh function. The sigmoid activation function is used in all three gates and converts the output value to stay in the range 0 to 1. Similarly, the tanh activation function converts the output to fall in the range -1 to 1 and is used in the input and output gates. Using these functions the network can learn for itself which data are important and which are not.
Therefore the gates play a vital role in deciding which information is to be kept and which is to be discarded.

Forget gate:
The inputs to the forget gate are h_t-1 and x_t. Depending on the value of the inputs, the sigmoid function decides the fate of C_t-1. If the output of σ is closer to 0, the information is useless and hence discarded from C_t-1; if it is closer to 1, the information is useful and hence kept in C_t-1 [35]. The equation for the forget gate is given in (1).
Input Gate: This gate is used to update the cell state. The inputs to the sigmoid function of the input gate are h_t-1 and x_t. The same inputs are applied to the tanh function. The outputs obtained from both functions are multiplied, and the sigmoid function finalizes what should be kept from the tanh output. The equations of the input gate are given in (2) and (3).
Output Gate: The sigmoid function gets two inputs, the current input (x_t) and the output of the preceding hidden layer (h_t-1). The updated cell state is given as input to the tanh function. The outputs from the sigmoid and tanh functions are multiplied to obtain the next hidden layer's input. The equations are given in (4) and (5).
Cell State: The previous cell state C_t-1 is multiplied by the output of the forget gate, which decides whether to keep the cell state as it was or to discard it. The result is then added to the output of the input gate in order to update the cell state to its new value C_t. The equation of the cell state is stated in (6).
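A toy scalar version of the gate equations (1)–(6) above, with hypothetical weight and bias values (a real LSTM uses weight matrices and vector states), could look like:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    # W maps each gate to an (input weight, hidden weight) pair; b maps to a bias.
    f_t = sigmoid(W['f'][0] * x_t + W['f'][1] * h_prev + b['f'])    # forget gate, eq. (1)
    i_t = sigmoid(W['i'][0] * x_t + W['i'][1] * h_prev + b['i'])    # input gate, eq. (2)
    g_t = math.tanh(W['g'][0] * x_t + W['g'][1] * h_prev + b['g'])  # candidate values, eq. (3)
    o_t = sigmoid(W['o'][0] * x_t + W['o'][1] * h_prev + b['o'])    # output gate, eq. (4)
    c_t = f_t * c_prev + i_t * g_t    # new cell state, eq. (6)
    h_t = o_t * math.tanh(c_t)        # new hidden output, eq. (5)
    return h_t, c_t

# One step with hypothetical weights, starting from empty state.
W = {g: (0.5, 0.25) for g in ('f', 'i', 'g', 'o')}
b = {g: 0.0 for g in ('f', 'i', 'g', 'o')}
h1, c1 = lstm_step(1.0, 0.0, 0.0, W, b)
```

Note how the cell state update is additive: the gradient can flow through c_t without repeated multiplication, which is what lets the LSTM remember long sequences.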

Gated Recurrent Unit (GRU):
GRU is similar to LSTM, but it uses only two gates, (1) the reset gate and (2) the update gate, whereas LSTM uses three. Since GRU uses fewer gates it is a bit faster than LSTM, with a slight variation in performance. The update gate is similar to the input and forget gates of LSTM. The reset gate decides to what extent the past information has to be discarded. Unlike LSTM, GRU does not use a cell state or an output gate.
The equations of the GRU cell are stated below in (9) to (12):
r_t = sigm(W_xr x_t + W_hr h_t-1 + b_r) (9)
z_t = sigm(W_xz x_t + W_hz h_t-1 + b_z) (10)
h̃_t = tanh(W_xh x_t + W_hh (r_t ∘ h_t-1) + b_h) (11)
h_t = z_t ∘ h_t-1 + (1 - z_t) ∘ h̃_t (12)
where r_t, z_t, x_t and h_t are the reset gate, update gate, input vector and output vector, respectively; W and b indicate the weight factors and biases; sigm and tanh denote the sigmoid and tangent activation functions; and ∘ denotes element-wise multiplication.
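Equations (9)–(12) can be sketched in the same toy scalar form (hypothetical weights; a real GRU uses matrices and element-wise products):

```python
import math

def sigm(z):
    return 1.0 / (1.0 + math.exp(-z))

def gru_step(x_t, h_prev, W, b):
    # Scalar form of equations (9)-(12); W maps gate -> (input weight, hidden weight).
    r_t = sigm(W['r'][0] * x_t + W['r'][1] * h_prev + b['r'])       # reset gate, eq. (9)
    z_t = sigm(W['z'][0] * x_t + W['z'][1] * h_prev + b['z'])       # update gate, eq. (10)
    h_tilde = math.tanh(W['h'][0] * x_t
                        + W['h'][1] * (r_t * h_prev) + b['h'])      # candidate, eq. (11)
    return z_t * h_prev + (1.0 - z_t) * h_tilde                     # output, eq. (12)

# One step with hypothetical weights, starting from an empty hidden state.
W = {g: (0.5, 0.25) for g in ('r', 'z', 'h')}
b = {g: 0.0 for g in ('r', 'z', 'h')}
h1 = gru_step(1.0, 0.0, W, b)
```

With three gate computations instead of four and no separate cell state, each GRU step does less work than an LSTM step, which accounts for the speed difference noted above.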
We trained the model using two architectures of RNN, LSTM and GRU. The features obtained from both architectures were independently used to predict Pneumonia. If the result of the prediction is positive, then we detect the severity for the patient using their CT scan, since a CT scan provides more detail than X-ray images. If the prediction of Pneumonia is negative, then there is no need to take a CT scan. The severity of Pneumonia is identified using Convolutional Neural Networks (CNN) with several architectures: LeNet-5, AlexNet, VGG-16, GoogLeNet and ResNeXt-50. The results obtained from the individual architectures are tabulated and compared to detect the severity of the disease.

Severity Prediction Module:
When Pneumonia is confirmed for a patient, we recommend that they take a CT scan, which is then pre-processed. The processed image is analyzed using a Convolutional Neural Network (CNN). The parameters of the CNN models were determined without human intervention by grid search, an added advantage of using CNN models. The CNN architecture mimics the neural pattern of the human brain. It consists of several layers; each layer has a set of neurons that analyze a portion of the image. A CNN compares the image portion by portion. The portions it looks for are called filters or features. It extracts the image features and converts them to a lower dimension with no loss of image characteristics. To do so, it has the following layers.

Input Layer
The CT scan image is given as input to the input layer. Before feeding it to the input layer, the image is converted to a single-dimensional column matrix. For example, if the image dimension is 32 * 32, we reshape it into a single column of 1024 * 1. If there are n training samples, the input dimension is (1024, n).
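The reshaping step can be sketched as follows (a minimal example with a hypothetical 32 * 32 pixel grid):

```python
def to_column(img):
    # Flatten a 2-D pixel grid into one column vector, row by row.
    return [pixel for row in img for pixel in row]

# A 32 * 32 image becomes a 1024-element column, as described in the text.
img = [[r * 32 + c for c in range(32)] for r in range(32)]
col = to_column(img)
```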

Convolution Layer
The image features are extracted using this layer; hence it is also called the feature extractor layer. It is mainly used to extract important features from the input image. This layer contains numerous filters that perform the convolution operation shown in Figure 5. Performing a dot product between an image patch (a portion of the input image with the same size as the filter) and the filter is termed the convolution operation. The output of this operation is a single integer of the expected output dimensions. For example, if the dimension of the input image is 6*6 and the size of the filter is 3*3, then the expected output dimension is 4*4. We glide the filter over the succeeding portions of the input image until we cover the entire image. The output obtained from this layer becomes the input to the succeeding layer. This layer also includes the ReLU (Rectified Linear Unit) activation function, which converts negative values to zero; it is applied to the output obtained from the convolution operation.

Pooling Layer
This layer reduces the dimension of the image obtained from the convolution layer and is also called down-sampling. It is mainly used to reduce the computational overhead. We can perform either average pooling (which takes the average of the pixels) or max pooling (which takes the maximum of the pixels). For example, if the dimension of the input image is 4*4, max pooling converts it to 2*2. This is shown in Fig. 6.
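A minimal sketch of the convolution, ReLU and max pooling operations described above (plain Python on nested lists; real layers operate on tensors with many filters in parallel):

```python
def conv2d_valid(img, kernel):
    # 'Valid' convolution: glide the filter over the image,
    # taking a dot product at each position.
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(img) - kh + 1
    out_w = len(img[0]) - kw + 1
    return [[sum(img[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)] for r in range(out_h)]

def relu(fmap):
    # ReLU activation: negative values become zero.
    return [[max(0, v) for v in row] for row in fmap]

def max_pool2(fmap):
    # 2x2 max pooling with stride 2 halves each spatial dimension.
    return [[max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
             for c in range(0, len(fmap[0]) - 1, 2)]
            for r in range(0, len(fmap) - 1, 2)]
```

A 6*6 input with a 3*3 filter yields a 4*4 feature map, and 2x2 max pooling then reduces it to 2*2, matching the dimensions given in the text.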

Fully Connected Layer (FC)
The FC block has three layers: the FC input layer, the FC layer and the FC output layer (softmax layer). The input layer converts the output obtained from the pooling layer into a single vector. The FC layer applies weights and biases to the features generated by the input layer. The softmax layer generates the probabilities used for multiclass prediction.
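The softmax layer described above can be sketched as:

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability, exponentiate, normalize.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# The raw FC outputs (logits) become class probabilities summing to 1.
probs = softmax([2.0, 1.0, 0.1])
```

The class with the largest logit always receives the largest probability, which is what makes the layer suitable for multiclass severity prediction.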

Architectures of CNN:
To predict the severity of Pneumonia we have used the five most popular architectures of CNN, described below. The results are shown in detail in Table 3. The severity of the disease was indicated based on the level of infection in the chest scan and the clinical data obtained from the hospital. The performance graphs for the RNN architectures are shown in Fig. 9, (a): GRU model and (b): LSTM model, while the performance graph for the CNN architecture models is shown in (c). The loss function for each model was calculated using the cross-entropy (log) loss function.
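The cross-entropy (log) loss mentioned above reduces, for a single sample, to the negative log-probability assigned to the true class; a minimal sketch:

```python
import math

def cross_entropy(probs, true_idx):
    # Log loss for one sample: perfect confidence in the true class gives loss 0,
    # and the loss grows as the predicted probability of the true class shrinks.
    return -math.log(probs[true_idx])
```

For example, a prediction of [0.9, 0.1] for a class-0 sample incurs a smaller loss than [0.5, 0.5], so minimizing this loss pushes the model toward confident, correct probabilities.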

Conclusion And Future Enhancement
The chest images of Pneumonia-positive cases were obtained and trained using the LSTM and GRU architectures of the RNN algorithm. It was found that LSTM produces very good results for the classification of Pneumonia-positive and -negative cases. The results obtained by LSTM are tabulated in Table 1, but it takes more time to extract features. GRU takes less time than LSTM for extracting features, but it has some performance degradation compared to LSTM. The performance metrics evaluated were specificity, sensitivity, accuracy and F1-score: with LSTM we obtained 99.2, 99.3, 99.2 and 98.9, and with GRU 98.8, 98.9, 98.7 and 97.6, respectively.
Then the CT scans of the confirmed Pneumonia cases were obtained and evaluated using various CNN architectures: LeNet, AlexNet, VGG-16, GoogLeNet and ResNeXt-50. The following performance metrics were evaluated for the above CNN architectures: sensitivity, specificity, accuracy and AUC. The experiment was conducted using 1020 CT scans of the chest. The obtained results are tabulated in Table 2. Among them, ResNeXt-50 produced excellent results with sensitivity: 98.77%, specificity: 99.26%, accuracy: 99.63% and AUC: 99.7%. In this pandemic situation, we trust that the projected

Figure. 1
Figure 1 depicts the architecture of the proposed model. It consists of two modules, namely the Pneumonia prediction module and the severity prediction module.

LeNet-5:
This was introduced in 1998. It has two convolution layers, sub-sampling (pooling) layers and three FC layers, and contains nearly 60,000 parameters. The size of the convolution filter is 5 * 5, and there are 6 * (5 * 5 + 1) = 156 weights in total, where the + 1 indicates the bias. Each pixel in convolution layer C1 is connected to 5 * 5 pixels and 1 bias, resulting in 156 * 28 * 28 = 122304 total connections.

AlexNet:
It was introduced in 2012. AlexNet consists of five convolution layers and three fully connected layers, with nearly 60 million parameters. The input image size is 224 * 224 * 3. It is passed to the first convolution layer with an 11*11 filter size, 5*5 in layer two and 3*3 in layers three to five. A max pooling window of size 3*3 with stride 2 is used, followed by three fully connected layers. It contains ten times more convolution layers than LeNet. Dropout layers resolved the overfitting issue, and max pooling reduced the network size.

VGG-16 (Visual Geometry Group):
This architecture was developed in 2014 and was the first runner-up of ILSVRC 2014. It has thirteen convolution layers and three FC layers, with about 138 million parameters. VGGNet improves training time by diminishing the number of parameters (training variables) in the convolution layers, thereby learning faster than AlexNet. The filter sizes of the convolution and max pool layers are 3*3 and 2*2 with stride two.

GoogLeNet (Inception):
It was the winner of ILSVRC 2014 and named after Prof. Yann LeCun's LeNet. Compared to VGGNet it has a lower error rate [49-50]. It was released in several versions: V1 (2014), V2 & V3 (2015) and V4 (2016). V1 consists of 1*1, 3*3 and 5*5 convolution layers along with a max pooling layer; in total it contains 22 layers with 5 million parameters and an error rate of 6.67%. V2: to reduce the parameters of V1, batch normalization was applied, and the 5*5 convolution layers were converted to 3*3 convolution layers; this ultimately reduces cost while increasing accuracy, with an error rate of 4.8%. V3 consists of 48 layers and the parameters increase to 24 million; it produces high accuracy, reducing the error rate to 3.58%, roughly half the error rate of V1. V4 was introduced in 2016 with the

Figures

Table 2
Performance Metrics of Various CNN Architectures
A dataset of 1020 CT scans was used for the result analysis. Among them, 816 were used for training and 214 for testing. The datasets were pre-processed by reshaping to 224*224. The datasets were trained and tested with different CNN architectures: LeNet, AlexNet, GoogLeNet, VGG-16 and ResNeXt-50. The following performance metrics were evaluated on the training and testing datasets.
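The metrics in the table follow the standard confusion-matrix definitions; a small sketch with hypothetical counts (not the paper's actual confusion matrix):

```python
def metrics(tp, tn, fp, fn):
    # Sensitivity (recall): fraction of actual positives detected.
    sensitivity = tp / (tp + fn)
    # Specificity: fraction of actual negatives correctly rejected.
    specificity = tn / (tn + fp)
    # Accuracy: fraction of all cases classified correctly.
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy

# Hypothetical counts for a 100-scan test split.
sens, spec, acc = metrics(tp=8, tn=90, fp=1, fn=1)
```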