Prediction and Detection of COVID-19 from Chest X-Rays using Transfer Learning based Deep Convolutional Neural Networks

With the ongoing outbreak of the COVID-19 global pandemic, the research community still struggles to develop early and reliable prediction and detection mechanisms for this infectious disease. The commonly used RT-PCR test is not readily available in areas with limited testing facilities, and it lags in performance and timeliness. This paper proposes a deep transfer learning-based approach to predict and detect COVID-19 from digital chest radiographs. In this study, three pre-trained convolutional neural network-based models (VGG16, ResNet18, and DenseNet121) have been fine-tuned to detect COVID-19 infected patients from chest X-rays (CXRs). The most efficient model is further used to identify the affected regions using an unsupervised gradient-based localization technique. The proposed system uses a classification approach (normal vs. COVID-19 vs. pneumonia vs. lung opacity) using three supervised classification algorithms followed by gradient-based localization. The training, validation and testing of the system are performed using 21165 CXR images (10192 normal, 1345 pneumonia, 3616 COVID-19, and 6012 lung opacity). Simulation and evaluation results are presented using standard performance metrics, viz., accuracy, sensitivity, and specificity.


Introduction
COVID-19 is a deadly disease caused by a newly recognized coronavirus, SARS-CoV-2, which was first reported to infect humans in December 2019. According to a report published by the World Health Organization (WHO), it spreads principally among humans through the droplets produced when infected persons speak, cough, or sneeze. As the droplets are too heavy to travel far, they cannot carry the virus from person to person without close contact [1]. Although the exact duration is unknown, one study has estimated that the virus can remain viable in the air for up to 3 hours, on copper for up to 4 hours, and on plastic and stainless steel for up to 72 hours. However, these estimates are not yet agreed upon by the health research community and are currently under investigation. COVID-19 attacks the lungs and damages the tissues of an infected person. At an early stage, some people may show no symptoms, while most have fever and cough as the core symptoms. Secondary symptoms may include body aches, sore throat, and headache.
At present, COVID-19 cases are increasing daily due to the lack of quick prediction and detection methods. Quick and accurate detection of the virus is a significant challenge for doctors and health professionals worldwide in reducing the death rate caused by this virus. The standard confirmatory clinical test for detecting COVID-19, the Reverse Transcription-Polymerase Chain Reaction (RT-PCR) test, is manual, complex, and time-consuming [2]. The limited availability of test kits and domain experts in hospitals, together with the rapid increase in the number of infected patients, necessitates an automatic screening system that can act as a second opinion for expert physicians to quickly identify infected patients who require immediate isolation and further clinical confirmation.
In recent days, the alternative testing approach using chest radiography (X-ray) is getting popular among medical practitioners [3] due to its wide availability in almost all parts of the world. The major challenge in this imaging approach is to distinguish other lung-related diseases like pneumonia and lung opacity from COVID-19. This open challenge needs a reliable screening mechanism for accurate prediction and detection of COVID-19.
Recently, machine learning and its allied domains, such as expert systems and deep learning, have been successfully applied to predicting and diagnosing COVID-19 and other diseases [4]. Because symptoms change, an expert system-based COVID-19 predictor may not produce accurate predictions. Further, such approaches cannot identify the affected area in the lungs. In this context, imaging techniques processed through deep learning models are a better substitute for the above-mentioned diagnostic methods [7].
The significant contributions of this paper are listed below.
i. A transfer learning-based deep CNN framework is proposed for efficient feature extraction from COVID-19 patients' chest X-rays.
ii. Identification of infected areas in the lungs is proposed using an unsupervised localization technique.
iii. An in-depth evaluation of the system is carried out considering standard performance metrics.
The rest of the paper is organized as follows. Section 2 presents the related work. Section 3 describes the problem formulation and proposed framework. Section 4 briefly outlines the CNN models used in this article. The detailed methodology is explained in section 5 followed by results and observations in section 6. The paper concludes in section 7.

Related Work
Chest X-Ray (CXR) is an important, non-invasive clinical adjunct that plays an essential role in the preliminary investigation of different pulmonary abnormalities. It can act as an alternative screening modality for the detection of COVID-19 or validate the related diagnosis [15]. Expert radiologists interpret the CXR images to look for infectious lesions associated with COVID-19. Earlier studies reveal that infected patients exhibit distinct visual characteristics in CXR images. However, the manual interpretation of these subtle visual characteristics is challenging and requires domain expertise. Moreover, the exponential increase in the number of infected patients makes it difficult for radiologists to complete the diagnosis in time.
Deep learning with CNNs has been used in disease diagnosis, such as cancer diagnosis, through image classification. In [16], the authors proposed two fully convolutional residual networks to produce segmentation, feature extraction and classification results from skin lesion images. A lesion index calculation unit was used to refine the classification results. The deep learning frameworks achieved good accuracy (0.912) in cancer diagnosis. The proposed method was tested on 108 patients and gave good results at both the slice and patient levels. However, the system could use 3D CNNs and other deep learning methods to obtain better cancer diagnosis.
Esteva et al. [17] demonstrated skin cancer classification with a pre-trained Inception V3 CNN model on 129,450 clinical skin images, including 3374 dermatoscopic images. The CNN was trained end-to-end from the images, using only pixels and disease labels as inputs. The overall CNN accuracy was 72.1 ± 0.9% (mean ± s.d.), whereas two dermatologists achieved accuracies of 65.56% and 66.0% on a subset of the validation set.
Owing to the above-cited success and scope of deep learning-based frameworks for image analysis, their use in analyzing the chest X-rays of COVID-19 patients has grown significantly in recent times [5,6]. In [7], the authors proposed a deep uncertainty-aware transfer learning-based framework using four CNN models, namely VGG16, DenseNet121, ResNet50, and Inception-ResNetV2, for COVID-19 prediction and detection. The features extracted by the CNN models are then passed to multiple classification techniques. The results show that SVM and multi-layer perceptron perform best. In [8], the PA views of chest X-ray images of COVID-19 patients are analyzed using deep CNN models, viz., InceptionV3, ResNeXt and Xception, and the claimed prediction accuracy is about 97.97%. In [9], a deep learning-based approach uses a pre-trained ResNet101 CNN, with clinically available COVID-19 patients' X-rays as the training dataset and data from mutually exclusive confirmed patients as the testing dataset, and reports a prediction accuracy of 71.9%.
In [10], a modified deep CNN model is proposed by combining Xception and ResNet50V2, with a claimed average accuracy of 99.50%. A deep learning-based help alert system for high-risk COVID-19 patients is proposed in [11], utilizing a 3D densely connected CNN model. In [12], a mobile application is developed using a deep lightweight neural network that takes chest X-rays as input for COVID-19 screening and radiological trajectory prediction. An iteratively pruned deep learning model ensemble is proposed in [13] for COVID-19 detection from chest X-rays, with a claimed accuracy of 99.01%.
Motivated by the scope and limitations of the existing works discussed above on the use of chest X-ray images for COVID-19 prediction and identification, the following section formulates the problem and describes the proposed framework.

Problem Formulation & Proposed Framework
In this section, the COVID19 prediction and identification problem is mapped to a multi-class classification problem and the corresponding transfer learning based deep CNN framework is described.

COVID19 Prediction as Multi-class Classification Problem
We aim to classify a digital frontal-view chest x-ray image into the following classes: COVID-19, Lung Opacity, Normal, and Viral Pneumonia. It can be viewed as a multi-class classification problem, and therefore we have used Multinomial Logistic Regression loss as our loss function.
For a single example image $i$, the loss $L_i$ can be calculated by computing the softmax for the correct class' score, $S_{y_i}$ [14]:
$$\mathrm{softmax}(S_{y_i}) = \frac{e^{S_{y_i}}}{\sum_j e^{S_j}}$$
followed by the negative log-likelihood:
$$L_i = -\log\left(\mathrm{softmax}(S_{y_i})\right)$$
where $S_j$ is the $j$-th entry of the score vector output by the model. Intuitively, the softmax function output can be interpreted as probabilities, as it squashes the class scores to a range between zero and one. Furthermore, the class with the highest probability is considered the predicted class.
The above formulation is applicable only when a single image is considered, but while training a batch of images is considered as input. Mathematically, the loss for a batch is nothing but the average of losses for each image in the batch. Moreover, this is what the models try to minimize when they are being trained.
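The loss $L$ for a batch with $n$ images can be formulated as shown below [14]:
$$L = \frac{1}{n} \sum_{i=1}^{n} L_i = -\frac{1}{n} \sum_{i=1}^{n} \log\left(\frac{e^{S_{y_i}}}{\sum_j e^{S_{i,j}}}\right)$$
where $S_{y_i}$ is the $i$-th image's correct class' score, and $S_i$ is the $i$-th image's score vector with entries $S_{i,j}$.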
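As an illustration of the per-image and batch losses described above, the following is a minimal sketch in plain Python. The function names and the example scores are hypothetical and chosen only for illustration; a deep learning library would normally provide this loss as a built-in.

```python
import math

def softmax(scores):
    """Convert a list of raw class scores into probabilities."""
    # Subtracting the maximum score improves numerical stability
    # and does not change the result.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def example_loss(scores, correct_class):
    """Negative log-likelihood of the correct class for one image."""
    return -math.log(softmax(scores)[correct_class])

def batch_loss(batch_scores, labels):
    """Average of the per-image losses over a batch."""
    losses = [example_loss(s, y) for s, y in zip(batch_scores, labels)]
    return sum(losses) / len(losses)

# Hypothetical scores for two images over the four classes.
batch_scores = [[2.0, 0.5, 0.1, -1.0], [0.0, 0.0, 0.0, 0.0]]
labels = [0, 3]
print(batch_loss(batch_scores, labels))
```

When all four scores are equal, every class receives probability 0.25, so the per-image loss equals log 4 regardless of the correct class; this is the loss of an uninformed four-class model.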
Using the above problem formulation, the following section depicts the proposed transfer learning based approach and its relevance.

Transfer Learning-based Framework
Transfer learning is a technique that focuses on reusing the knowledge gained from one task to perform another similar task. This process significantly reduces the time required for training, as the pre-trained weights already contain vital information, making it a time and resource-efficient method. As the pretrained models are trained on large datasets, using this technique to train small datasets helps overcome the limited data barriers, like in our case.
We considered three such pre-trained models for performing the task of image classification on chest X-rays. The considered models are VGG16, ResNet18 and DenseNet121 [18]. These models are pre-trained on ImageNet [19], a dataset of over fourteen million images; the subset used for pre-training covers a thousand different classes. Although the models are not trained on medical imagery, their complex feature extraction capabilities are crucial for classifying medical imagery. The classifier/output layer of each pre-trained model distinguishes a thousand classes, whereas our problem has only four classes (COVID-19, Lung Opacity, Normal, and Viral Pneumonia). Hence it was replaced with a four-class classifier, as shown in Figure 1. The rest of the learned weights were transferred as is, to be used as an initialization point for the models. This approach treats transfer learning as a kind of weight initialization scheme.
After the models' weights were initialized using transfer learning, they were trained end-to-end (all layers) using the Adam optimizer [24] with the following hyper-parameters: $\beta_1 = 0.9$, $\beta_2 = 0.999$ and learning rate $3 \times 10^{-5}$. The batch size was set to 32, and the models were trained for 25 epochs. For each model, the epoch with the lowest validation loss was used for the further stages. This special case of transfer learning is often known as fine-tuning. It can be defined as the process of using pre-trained models as an initialization point and then fine-tuning or tweaking the model's weights to make it perform a similar task.
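To make the role of these hyper-parameters concrete, the sketch below applies the standard Adam update rule to a single scalar parameter. It is not the training code used in this work: the function name, the toy objective, and the stabilizing constant `eps = 1e-8` are assumptions made for illustration.

```python
import math

def adam_step(param, grad, m, v, t, lr=3e-5, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a scalar parameter and return (param, m, v)."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for step t
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# Toy objective (assumed for illustration): minimize f(w) = (w - 1)^2 from w = 0.
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 4):
    grad = 2 * (w - 1)
    w, m, v = adam_step(w, grad, m, v, t)
print(w)
```

On the first step the bias-corrected ratio has magnitude close to one, so the parameter moves by roughly the learning rate irrespective of the gradient's scale. A small learning rate such as $3 \times 10^{-5}$ therefore keeps the fine-tuned weights close to their pre-trained initialization.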
The following section briefly discusses the CNN models that are used for feature extraction in the proposed transfer learning-based framework.

VGG16
VGG16 follows a homogeneous architecture; that is, it uses only 3x3 convolutions with stride one and padding one, followed by 2x2 max-pooling with stride two, throughout the architecture. The main idea was that smaller filters mean fewer parameters to learn. The VGG design showed that stacking multiple smaller convolution filters in sequence yields the same effective receptive field as a single larger convolution filter. VGG16 was one of the six proposed VGG configurations and consists of thirteen convolution layers with nonlinear rectification units and max-pooling layers, followed by three densely connected layers, resulting in a total of roughly 138 million trainable parameters.
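The receptive-field argument and the parameter count can be checked with a short calculation. The sketch below assumes the standard VGG16 layer widths with the original 1000-class ImageNet classifier; the helper names and the channel width of 64 in the first comparison are illustrative choices.

```python
def stacked_receptive_field(num_layers, kernel=3):
    """Receptive field of `num_layers` stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel - 1)

def conv_weights(kernel, channels, num_layers=1):
    """Weights (biases ignored) of stacked convs with `channels` in and out."""
    return num_layers * kernel * kernel * channels * channels

C = 64  # hypothetical channel width
print(stacked_receptive_field(2))  # two 3x3 convs cover the same field as one 5x5
print(conv_weights(3, C, num_layers=2), conv_weights(5, C))

# VGG16 parameter count, assuming the standard layer widths and a
# 1000-class classifier on 224x224 inputs (7x7x512 features before the FCs).
conv_channels = [(3, 64), (64, 64), (64, 128), (128, 128),
                 (128, 256), (256, 256), (256, 256),
                 (256, 512), (512, 512), (512, 512),
                 (512, 512), (512, 512), (512, 512)]
fc_sizes = [(512 * 7 * 7, 4096), (4096, 4096), (4096, 1000)]
conv_params = sum(9 * cin * cout + cout for cin, cout in conv_channels)
fc_params = sum(fin * fout + fout for fin, fout in fc_sizes)
vgg16_params = conv_params + fc_params
print(vgg16_params)
```

Under these assumptions, two stacked 3x3 layers need fewer weights than a single 5x5 layer with the same receptive field, and the fully connected layers account for most of the roughly 138 million parameters.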

ResNet18
ResNet [20] won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2015. This architecture was designed to ensure that a deep model performs at least as well as a shallower one. This was achieved by introducing identity connections between layers, which form residual blocks within the model. Structurally, a residual network is a stack of many residual blocks, and each residual block has two 3x3 convolution layers. Periodically, the number of filters is doubled and the feature maps are downsampled spatially using stride two. At the end there is a global average pooling layer followed by a single linear layer.
The authors found that deeper networks perform better once identity connections are introduced. The reason is that the residual blocks provide a direct path to earlier layers, resulting in easier gradient flow through the network and thereby mitigating the vanishing gradient problem. ResNet18 is a ResNet variant with 18 layers and a total of roughly 11.174 million trainable parameters.
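The effect of an identity connection can be illustrated with a deliberately simplified residual block that omits convolutions, batch normalization and activations. The function names are hypothetical, and the layer-count arithmetic assumes the standard ResNet18 layout of eight residual blocks.

```python
def residual_block(x, residual_fn):
    """Return F(x) + x, where the identity connection adds the input back."""
    fx = residual_fn(x)
    return [f + xi for f, xi in zip(fx, x)]

def zero_branch(x):
    """A residual branch whose weights are all zero, so F(x) = 0."""
    return [0.0 for _ in x]

x = [0.5, -1.2, 3.0]
# With a zero residual branch each block reduces to the identity mapping,
# so stacking additional blocks leaves the signal unchanged.
out = x
for _ in range(8):
    out = residual_block(out, zero_branch)
print(out == x)

# Layer count, assuming the standard ResNet18 layout: one stem convolution,
# eight residual blocks with two 3x3 convolutions each, and one linear layer.
resnet18_layers = 1 + 8 * 2 + 1
print(resnet18_layers)
```

Because a block can fall back to the identity simply by driving its residual branch towards zero, a deeper stack can in principle match a shallower one, which is the design goal stated above.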

DenseNet121
DenseNet [21] received the Best Paper Award at CVPR 2017. DenseNet stands for Densely Connected Convolutional Networks. These networks consist of dense blocks and transition blocks. Within a dense block, each layer is connected to every subsequent layer in a feedforward fashion; in other words, each layer receives the feature maps of all preceding layers as input. These interconnections alleviate the vanishing gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters.
The set of layers between two adjacent dense blocks is referred to as a transition block. Transition blocks reduce the dimensions of the feature maps between dense blocks. They consist of batch normalization, nonlinear rectification, a 1x1 convolution, and a 2x2 average-pool with stride two. DenseNet121 was one of the four proposed DenseNet configurations, with a total of 121 layers and roughly 7.98 million trainable parameters.
The CNN models discussed above are fine-tuned within the proposed transfer learning-based framework, following the systematic methodology described in detail below.
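The bookkeeping behind dense and transition blocks can be sketched as follows. The growth rate of 32, the block sizes (6, 12, 24, 16), the 64 initial channels and the halving of channels in each transition block are the standard DenseNet121 settings; they are not stated in the text above and are assumed here.

```python
GROWTH_RATE = 32                # feature maps added by each dense layer (assumed)
BLOCK_LAYERS = [6, 12, 24, 16]  # dense layers per dense block (assumed)
INIT_CHANNELS = 64              # channels after the initial convolution (assumed)

def densenet_channels(block_layers, growth_rate, init_channels):
    """Track the channel count at the output of every dense block."""
    channels = init_channels
    per_block = []
    for i, n_layers in enumerate(block_layers):
        # Each layer concatenates `growth_rate` new maps to all earlier ones.
        channels += n_layers * growth_rate
        per_block.append(channels)
        if i < len(block_layers) - 1:
            channels //= 2      # transition block: 1x1 conv halves the channels
    return per_block

print(densenet_channels(BLOCK_LAYERS, GROWTH_RATE, INIT_CHANNELS))

# Depth: initial conv + two convs per dense layer + one conv per
# transition block + the final classifier layer.
depth = 1 + 2 * sum(BLOCK_LAYERS) + (len(BLOCK_LAYERS) - 1) + 1
print(depth)
```

The channel count grows linearly within a block because of concatenation, and the transition blocks keep it in check; the depth formula recovers the 121 layers that give the configuration its name.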

Dataset Exploration
We use the COVID-19 Radiography Database [22] obtained from Kaggle. It has a total of 21165 digital chest radiographs (or X-rays) belonging to 4 different classes (COVID-19, Lung Opacity, Normal, and Viral Pneumonia). The per-class CXR image count is plotted in Figure 2.
The frontal-view chest X-ray images from the dataset are RGB images with 299x299 pixels each. A few sample images from the dataset can be seen in Figure 3. Each RGB image has the same pixel values across the three channels (Red, Green and Blue), representing a grayscale image as RGB.

Dataset Preparation and Sampling
This section briefly discusses the various operations done on the dataset before it is ready to be used.

Splitting the Dataset
The COVID-19 Radiography Database [22] was randomly split into train, validation and test sets. We chose 98% of the images for the train set, 1% for the validation set and the remaining 1% for the test set. The resulting image distribution after the split is shown in Table 1.

Data Pre-Processing
First, we downsampled the images in the dataset. All images were resized from 299x299 pixels to 224x224 pixels using bilinear interpolation, because the pre-trained models used for fine-tuning were trained on images with 224x224 pixels. Next, the input data was normalized. Normalization shifts and scales the values to a standard scale without losing information, which makes convergence faster while training the model [16]. In our case, the dataset is normalized to the image distribution on which the pre-trained models were trained. In transfer learning, resizing and normalizing the inputs to the same scale the network was originally trained on is one of the foremost steps.
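The normalization step can be sketched for a single pixel as follows. The per-channel mean and standard deviation below are the values commonly used with ImageNet pre-trained models; the text does not list the exact constants, so they are an assumption, as are the function names.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)  # assumed per-channel means (R, G, B)
IMAGENET_STD = (0.229, 0.224, 0.225)   # assumed per-channel standard deviations

def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return tuple((value / 255.0 - m) / s for value, m, s in zip(rgb, mean, std))

def denormalize_pixel(norm, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Invert normalize_pixel, showing that no information is lost."""
    return tuple(round((value * s + m) * 255.0) for value, m, s in zip(norm, mean, std))

gray = (128, 128, 128)  # a grayscale pixel stored as RGB, as in the dataset
normalized = normalize_pixel(gray)
print(normalized)
print(denormalize_pixel(normalized))
```

Although the three channels of a chest X-ray carry identical values, they differ after normalization because each channel is shifted and scaled by its own statistics. The transformation is invertible, which is why it does not lose information.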

Handling Imbalanced Dataset & Sampling
From Figure 2, it is noticeable that the dataset is highly imbalanced. The majority class (Normal) has almost eight times as much data as the minority class (Viral Pneumonia). This can lead the model to ignore the minority class entirely, as the training loss of the minority class gets masked by that of the majority class. To deal with this, we used Random Oversampling, which is the process of randomly duplicating images from the minority classes.
In our case, it is achieved by using the concept of Weighted Random Sampling [23]. This method assigns a weight to each image, which determines the probability with which that image is sampled during training. The weight of each image is set to the reciprocal of the image count of its class in the dataset. This way, the classes with lower image counts get a higher weight, and the sampling probability of images from those classes increases. This leads to image duplication and thus achieves random oversampling.
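The weighting scheme can be illustrated with the class counts of the full dataset. This is a sketch using Python's standard library rather than the sampler used in training; it uses whole-dataset counts as a stand-in for the training split, and the variable names and seed are illustrative.

```python
import random
from collections import Counter

CLASS_COUNTS = {"Normal": 10192, "Lung Opacity": 6012,
                "COVID-19": 3616, "Viral Pneumonia": 1345}

# One label per image, standing in for the real dataset.
labels = [name for name, count in CLASS_COUNTS.items() for _ in range(count)]

# Weight of each image = reciprocal of the image count of its class.
weights = [1.0 / CLASS_COUNTS[label] for label in labels]

# Total sampling probability per class under these weights.
class_mass = {name: 0.0 for name in CLASS_COUNTS}
for label, weight in zip(labels, weights):
    class_mass[label] += weight
total = sum(class_mass.values())
class_prob = {name: mass / total for name, mass in class_mass.items()}
print(class_prob)

# Draw one epoch's worth of samples with replacement.
rng = random.Random(0)  # fixed seed, chosen arbitrarily
sampled = rng.choices(labels, weights=weights, k=len(labels))
print(Counter(sampled))
```

Every class ends up with the same total sampling probability, so each one contributes about a quarter of the drawn images. Since Viral Pneumonia has only 1345 distinct images, its images are necessarily drawn several times per epoch, which is the duplication that realizes random oversampling.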

Transfer learning based Fine-Tuning
Fine-tuning is a way of applying transfer learning. As already discussed in Section 3, fine-tuning takes a model that has already been trained for one task and then tunes its weights to make it perform a second, similar task. We fine-tuned our networks using PyTorch's built-in libraries. As a first step, the classifier was adapted: the pre-trained models were trained on a dataset with 1000 classes, which means the last fully connected layer has 1000 nodes. To make them work on our problem, the last layer of each model is replaced with a layer of 4 nodes to classify the 4 respective classes (i.e., COVID-19, Lung Opacity, Normal and Viral Pneumonia). The dataset was randomly split into a train set (21005 images), a validation set (80 images) and a test set (80 images). The train set was used for training the model, whereas the validation set was used to check whether the model was learning. The test set was later used to evaluate the results. In the second step, model design and hyper-parameter selection, the considered pre-trained models are VGG16, ResNet18 and DenseNet121, with categorical cross-entropy as the loss function, as given in Section 3, and Adaptive Moment Estimation (Adam) [24] as the optimizer. The best-performing hyper-parameters were a learning rate of 0.00003, a batch size of 32 and 25 epochs. The fine-tuning results are shown in Figure  ,

Detection using Gradient-based Localization
Detection/Localization is the process of detecting or locating regions in an image for a specific class or a set of classes. In our case, it can be used to detect infected regions in the lungs given a frontal-view chest radiograph. To accomplish that, we used Grad-CAM, short for "Gradient-weighted Class Activation Mapping", a technique for producing visual explanations from large CNN-based models. It highlights the image regions that contribute most to a model's final prediction [25]. Unlike supervised localization, which requires labeled localization data, this method needs no such annotations. It uses the gradient information from the last convolutional layer to provide insights into the parts of an image that influence a model's output, making it an unsupervised gradient-based localization technique.
To obtain the class activation localization map for a given class/label $c$ (either the predicted label or an arbitrary label), we first compute the gradient of the class score $y^c$ with respect to the feature maps $A^k$ of the last convolutional layer. These gradients flowing back are global average-pooled to obtain the neuron importance weights $\alpha_k^c$ for the target class [25]:
$$\alpha_k^c = \frac{1}{Z} \sum_i \sum_j \frac{\partial y^c}{\partial A_{ij}^k}$$
where $Z$ is the number of spatial locations in a feature map. The localization map is the rectified linear combination of the feature maps, weighted by these importance weights:
$$L^c_{\text{Grad-CAM}} = \mathrm{ReLU}\left(\sum_k \alpha_k^c A^k\right)$$
This results in a coarse heatmap visualization for the given class label. We apply nonlinear rectification (ReLU) to the linear combination because we are only interested in the features that positively influence the class of interest. Without ReLU, the class activation map highlights more than required and achieves low localization performance. We can use this heatmap to visually verify where the CNN is looking, and at the same time it serves as the localization output. An example localization attempt is shown in Figure 7.
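The three steps described above (global average pooling of the gradients, a weighted combination of the feature maps, and ReLU) can be sketched in plain Python on nested lists. The function name and the tiny 2x2 inputs are hypothetical; in practice the feature maps and gradients would be taken from the trained network.

```python
def grad_cam(feature_maps, gradients):
    """Compute a Grad-CAM heatmap from K feature maps and their gradients.

    Both arguments are nested lists of shape [K][H][W].
    """
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    # Global-average-pool the gradients: one importance weight per feature map.
    alphas = [sum(sum(row) for row in g) / (height * width) for g in gradients]
    heatmap = []
    for i in range(height):
        row = []
        for j in range(width):
            # Weighted linear combination of the feature maps at (i, j).
            value = sum(a * fmap[i][j] for a, fmap in zip(alphas, feature_maps))
            row.append(max(value, 0.0))  # ReLU keeps only positive evidence
        heatmap.append(row)
    return alphas, heatmap

# Hypothetical 2x2 feature maps and class-score gradients for K = 2 channels.
A = [[[1.0, 0.0], [0.0, 2.0]],
     [[0.0, 3.0], [1.0, 0.0]]]
dY_dA = [[[0.5, 0.5], [0.5, 0.5]],
         [[-1.0, -1.0], [-1.0, -1.0]]]
alphas, cam = grad_cam(A, dY_dA)
print(alphas)
print(cam)
```

In this toy input the second feature map has a negative importance weight, so the locations where it is active are suppressed to zero by the ReLU and only the regions supported by the first map remain. The resulting map has the spatial size of the last convolutional layer and would be upsampled to the image size before being overlaid on the X-ray.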

Results & Observations
The performance of the proposed system is evaluated under the standard convention that the evaluation metrics for a multi-class classification model can be computed by mapping the problem to binary classification, in which the class of interest is treated as "positive" and the remaining classes as "negative". The standard components considered are True Positive (TP), True Negative (TN), False Positive (FP) and False Negative (FN). From these components, the following performance metrics are computed: accuracy, precision, recall and F1-score.
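The mapping from a multi-class confusion matrix to per-class binary metrics can be sketched as follows. The confusion matrix below is invented purely to exercise the code and does not reproduce any result of this paper; the function name is also illustrative.

```python
def one_vs_rest_metrics(confusion, class_index):
    """Compute binary metrics for one class from a multi-class confusion matrix.

    confusion[i][j] = number of images with true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    tp = confusion[class_index][class_index]
    fn = sum(confusion[class_index]) - tp
    fp = sum(confusion[i][class_index] for i in range(n)) - tp
    tn = total - tp - fn - fp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / total
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical 4-class confusion matrix (rows: true class, columns: predicted).
confusion = [[18, 1, 1, 0],
             [2, 17, 1, 0],
             [0, 1, 19, 0],
             [0, 0, 0, 20]]
for k in range(4):
    print(k, one_vs_rest_metrics(confusion, k))
```

Recall computed this way is the same quantity as sensitivity, and the per-class accuracy counts true negatives as correct, so it can remain high even for a class with modest precision and recall.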
The training and testing accuracy for the different models is shown in Table 2. Further, the performance of each model on the different classes (COVID-19, Lung Opacity, Normal, Viral Pneumonia) is shown in Table 3. The respective confusion matrices are shown in Figure 8, Figure 9, and Figure 10. The best result for classifying COVID-19 was achieved with ResNet18 and DenseNet121, with 100% accuracy, which is 12.1% higher than that reported in [7]. Also, as shown in Figure 7, the detection of the affected area is best with ResNet18 and DenseNet121. The improvement is attributed to the skip connections and hierarchical feature extraction in ResNet18 and DenseNet121.

Conclusions
Because the currently used RT-PCR test is not sufficiently reliable and timely for predicting COVID-19, a transfer learning-based deep CNN approach is presented in this study to predict and detect COVID-19 from chest X-ray images. Three pre-trained convolutional neural network-based models (VGG16, ResNet18, and DenseNet121) have been fine-tuned to detect COVID-19 infected patients from chest X-rays. The most efficient model is further used to identify the affected regions using an unsupervised gradient-based localization technique. The proposed system uses a classification approach (normal vs. COVID-19 vs. pneumonia vs. lung opacity) using three supervised classification algorithms followed by gradient-based localization. Random sampling helps to a great extent in dealing with imbalanced data. The transfer learning-based framework is useful in dealing with a small dataset and in speeding up the training process. Simulation results show that ResNet18 and DenseNet121 perform better than VGG16 due to their skip connections and their ability to extract features in a hierarchical manner.
Future work will focus on an ensemble learning-based framework for model optimization and on larger, evolving datasets, using embarrassingly parallel and accelerated training and computation on graphical processing units (GPUs) to speed up the overall performance of the deep learning models.