Deep_Saliency: A Deep Learning-Based Saliency Approach to Detect Covid-19 through X-ray Images

Covid-19, a small virus, has wreaked havoc across the world. The pandemic has already claimed over 4 lakh lives. The tests to detect a Covid-19 positive case take time and are costly. Moreover, the ability of the virus to mutate surprises doctors every day. The present paper proposes a saliency-based model called Deep_Saliency. The model works on chest x-rays of healthy, unhealthy, and Covid-19 patients. An x-ray repository of Covid-19, available in the public domain, is taken for the study. Deep_Saliency uses visual, disparity, and motion saliency to create a feature dataset from the x-rays. The collected features are trained and tested using a Long Short-Term Memory (LSTM) network. A predictive analysis is performed using the x-ray of a new patient to confirm a Covid-19 positive case. The first objective of the paper is to detect Covid-19 positive cases from x-rays. The other objective is to provide a benchmark dataset of biomarkers. The proposed work achieved an accuracy of 96.66%.


I Introduction
A challenge for all researchers today is to detect Covid-19 positive cases. With more than 21.26 million patients spread over 213 countries, the pandemic has brought the world to its knees. More than 0.761 million people have already lost their lives [1]. Scientists working round the clock have not been able to find a vaccine for this virus in six months. Preventive measures and strong immunity have helped 35 lakh patients recover. A vaccine may take another month or so to arrive but will only be commercially available by year's end. The dynamic nature of the virus has increased the difficulties of doctors and scientists. With no medicine and no vaccine available, the only way out is to prevent it from spreading. It is therefore important to detect and isolate positive patients. A test conducted on a patient is no guarantee that the patient will not turn positive in the next few days, or even the next day.
The test for Covid-19 involves the study of RNA. The procedure is lengthy and takes at least 24 hours. Rapid detection kits have failed to serve the purpose [2]. Asymptomatic individuals are another threat: such patients do not show any symptom of Covid-19. Reverse Transcription Polymerase Chain Reaction (RT-PCR) depends on swabs from the nose and mouth, and these swabs risk contamination while the samples are collected. A big issue with RT-PCR testing is that it depends on patented products, and many manufacturers cannot quickly develop these to meet the shortfall. Any glitch would further reduce RT-PCR's reliability, which is already plagued with precision problems. An overview [2] of the available RT-PCR kits for Covid-19 reveals that some are only 90% sensitive and 96% accurate. This could be just 66 to 80 per cent in real-world conditions, which means one in every three patients would be falsely tested as negative.
Considering the challenges faced by doctors and scientists, the authors of this paper propose a Deep_Saliency approach to detect corona-positive cases. The aim is to detect the patient at a very early stage. A simple chest x-ray can immediately flag a Covid-19 suspect so that doctors can isolate the patient. Immediate isolation would help stop community spread of the virus.
For the research work, the dataset is taken from the x-ray datasets available in the public domain. The x-rays are of normal, pneumonia (viral and bacterial), and Covid-19 positive and negative patients. For better results, the chest x-rays have been taken from benchmark datasets; references and details are given in the experimental setup section of the paper. The objectives of the paper are to extract the salient features from the images and predict a Covid-19 positive case. To extract the features, visual saliency, disparity, and motion saliency are calculated. A fusion map is generated using these features to obtain a Group Feature Map (GFM) for the Long Short-Term Memory (LSTM) model. Before proceeding further, first understand what saliency is and why it was chosen along with motion and disparity. Saliency is the component that pops out first to the human eye in an image. There are two types of visual saliency detection techniques: the top-down approach and the bottom-up approach. The top-down approach is used to identify important objects like faces, whereas the bottom-up approach analyzes pixel values, computes a saliency value for each pixel (e.g., high pixel contrast), and groups highly salient pixels into regions. Based on depth, texture, luminance, and color contrast, a novel stereoscopic saliency detection framework with a new fusion method, named the graph-based saliency detection algorithm, was proposed in [4]. The algorithm in [5] was designed to represent an image as a fully connected graph; this model achieved an ROC of 98%. It was further elaborated in [6] how mathematics and biology could play an important role in image saliency detection: a mathematical model and understanding of a scene quickly break a complex problem into a simpler version which can be analyzed easily. As saliency was aimed at stereoscopic images, a motion estimation method was developed in [7] which provided the motion vectors; it also demonstrated a full-search algorithm taking a significantly smaller amount of time. Saliency considers left and right images when it comes to stereoscopic (or 3D) images. A method was proposed in [8] to detect saliency by the amount an object differed from its neighborhood; the approach was easy to implement and could be used in many image and video content analysis applications. For improving visual attention in machines, an element similar to a neuron in the brain was designed in [9]; Koch used a bottom-up saliency model for this work. Various methods [10][11][12][13][14][15][16][17][18][19][20] were proposed for saliency detection, but when it came to multiple objects the performance was not as expected. A new method in [21] combined local features, regional features, edges, and multiple scales. It did help to overcome the issue of multiple-object detection, but the cost was too high and it did not perform well on videos. To overcome this, a 2D model on 2D images was proposed in [22], but it only exploited depth contrast. Further, an automatic detection system in [23] introduced a method that automatically detected salient image regions in stereoscopic videos. The method was improved in [24] and tested on two publicly available stereoscopic datasets, and further improved in [25].
Though saliency was aimed at stereoscopic images and videos, this has not limited researchers: it has found applications in the medical field as well. To detect breast cancer, a derivative-based feature saliency model was designed in [28]; it could highlight suspicious regions in mammograms. For leakage detection in diabetic and malarial retinopathy, a saliency map was used in [29]. Wireless capsule endoscopy (WCE) with saliency was used to detect ulcers in [30], and was further combined with k-means clustering to classify abnormalities in [31]. Feature extraction is a must for any classification algorithm to run successfully. Saliency with a Convolutional Neural Network (CNN) is used to segment ultrasound heart images in [32]. Exploiting saliency's ability to detect boundaries and connectivity, optic disc segmentation is proposed in [33].
The analysis of previous work on saliency indicates that it is a good choice for medical images. Coming back to Covid-19 and the proposed model: as mentioned above, RT-PCR in a real-world setting may give just 66% to 80% accuracy. Covid-19 is a chest-related disease, so improving chest x-ray diagnosis would result in higher accuracy. Deep learning approaches have been studied and explored for the diagnosis of Covid-19 [34][35][36][37][38][39][40]. Covid-Net, proposed in [34], claims to achieve an accuracy of 80%. Inspired by Covid-Net, a patch-based convolutional neural network approach is proposed in [41]. CovidGAN, proposed in [42], faced a similar problem of a small dataset for a Convolutional Neural Network (CNN); its authors produced synthetic Covid x-ray images using their approach. The claimed accuracy is 95%, but it cannot be fully trusted as the dataset used is synthetic, or manufactured to be more precise. The models proposed up until now have not been able to achieve more than 80% accuracy, due to the limited dataset.
To improve the accuracy, the paper proposes to use a fusion map (saliency + disparity + motion) to generate a Group Feature Map (GFM). The GFM is the input to the deep learning model. CNN has been the first choice of researchers when it comes to classification, but, as seen in [41,42], such models have not been able to achieve more than 80% accuracy. Keeping this in mind, a CNN + LSTM model is designed for classification and prediction. The LSTM's forget gate is expected to retain the features for improved classification. The highlights of the study are:
1. The saliency map gives the salient features, but for image classification more detailed features are required for better accuracy. The disparity map improves the luminosity of the features. The motion map is drawn between the original image and the saliency map, which captures the minutest differences between neighbouring pixels. The resulting fusion map has more visible features.

II Methodology
The aim is to minimize the objective function given in fn (1).

 
The objective function is expanded and explained in the following section, followed by brief steps and algorithms.

Objective functions explained
The fusion map is designed by the authors by merging the visual saliency map, the disparity map, and the motion saliency map.

1.1.1 Steps involved
i. Compute feature maps from an x-ray image. Low-level features are derived from the color, intensity, and orientation channels. The features extracted using GBVS are inspired by the biologically based filters proposed in [4]. To create a set of scale-space images, each input image is first filtered by a series of Gaussian filters at different scales. A cross-scale collection of difference images is then obtained by interpolation and subtraction between the scaled versions of the image. Intensity and color features are derived, based on contrast, from each difference image. Local orientation information is also extracted at each scale using oriented Gabor pyramids. In all, 42 feature maps are calculated: 6 for intensity, 12 for color, and 24 for orientation.
ii. Activation maps are computed from the feature maps using a center-surround subtraction approach. A graph is built by treating all pixel positions within the features detected in the previous step as nodes; the dissimilarity between nodes determines the weight of an edge of the graph. To evaluate the dissimilarity between a node and its neighbouring nodes, the equilibrium distribution is computed.
iii. Activation maps are normalized using a global mean-maximum scheme. Saliency regions are clustered, depending on the distribution, and used as the final saliency map.
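The center-surround subtraction of step ii can be sketched as follows. This is a minimal NumPy illustration under stated assumptions (a Gaussian blur at two scales stands in for the full scale-space filtering, and only a single intensity channel is shown; the paper's experiments were run in Matlab with the complete GBVS pipeline):

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with reflect padding (NumPy only)."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2.0 * sigma**2))
    kernel /= kernel.sum()
    padded = np.pad(img.astype(float), radius, mode="reflect")
    # convolve rows, then columns
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")

def center_surround(img, center_sigma=1.0, surround_sigma=4.0):
    """Center-surround feature map: absolute difference between a
    fine-scale (center) and a coarse-scale (surround) version of the image."""
    return np.abs(gaussian_blur(img, center_sigma) -
                  gaussian_blur(img, surround_sigma))
```

In GBVS the differences are computed across multiple scales, orientations, and channels (42 maps in all); the sketch shows only the basic fine-minus-coarse contrast on one channel.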

Steps
To get the disparity map, as shown in Fig. 3, a matching procedure is executed. Every pixel in the original image and the corresponding pixel in the GBVS feature map image are taken, and the disparity map (the distance between the pixel values) is computed. In the disparity map, brighter shades indicate more shift and less distance from the GBVS feature map, while darker shades indicate less shift and more distance.

Mathematical Formulation
To calculate disparity, a local optimization technique is used. It calculates the disparity of each pixel independently, using the selected pixel's single matching cost. Local optimizations typically yield precise estimates of the disparity in textured regions. "Winner-Take-All" (WTA) [45] is a basic local optimization technique. WTA optimization consists of selecting, for each pixel $(x, y)$, the disparity $d$ that minimizes the matching cost:

$d(x, y) = \arg\min_{d} C(x, y, d)$

where $C(x, y, d)$ is the matching cost of assigning disparity $d$ to pixel $(x, y)$. For motion, two consecutive frames are related by $g(x, y) = f(x + \Delta x, y + \Delta y) \approx f(x, y) + \Delta x\, f_x(x, y) + \Delta y\, f_y(x, y)$, which is the first-order Taylor series approximation. The optical shift can be found by solving the minimization problem

$\min_{\Delta x, \Delta y} \; E(\Delta x, \Delta y) = \sum_{x, y} \big[ f(x, y) + \Delta x\, f_x(x, y) + \Delta y\, f_y(x, y) - g(x, y) \big]^2$

Since this is a linear least-squares problem, the optimal $\Delta x$ and $\Delta y$ can be determined by setting the derivatives of the objective function to zero. Thus we have $\partial E / \partial \Delta x = 0$ and $\partial E / \partial \Delta y = 0$, and consequently we can set up the following system of linear equations:

$\begin{pmatrix} \sum f_x^2 & \sum f_x f_y \\ \sum f_x f_y & \sum f_y^2 \end{pmatrix} \begin{pmatrix} \Delta x \\ \Delta y \end{pmatrix} = \begin{pmatrix} \sum f_x (g - f) \\ \sum f_y (g - f) \end{pmatrix}$

Solving this system of linear equations gives the optimal solution. Note that for the Taylor approximation to be valid, we implicitly assume that the displacement $(\Delta x, \Delta y)$ is small. The resultant map is a fusion map of saliency, disparity, and motion. Figure 6 shows a fusion map; the source image is the chest x-ray of a Covid-19 positive patient.
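The WTA rule can be sketched in a few lines. Using the per-pixel absolute difference as the single matching cost $C(x, y, d)$ is an assumption made here for illustration; the paper does not fix a particular cost function:

```python
import numpy as np

def wta_disparity(left, right, max_disp):
    """Winner-Take-All disparity: for each pixel, pick the shift d that
    minimizes the cost C(x, y, d) = |left(x, y) - right(x - d, y)|."""
    costs = np.stack([np.abs(left - np.roll(right, d, axis=1))
                      for d in range(max_disp + 1)])
    return np.argmin(costs, axis=0)  # disparity map: one winning d per pixel
```

Practical systems usually aggregate the cost over a small window for robustness; the per-pixel form above matches the "single matching cost" described in the text.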
1.5 Group Feature Map

$GFM = \{ F_0, F_1, \dots, F_{n-1} \}$   fn (6)

Here, GFM is the Group Feature Map, $n$ is the number of images in the folder, and $F_i$ is the fusion map of the $i$-th image created using fn (5). GFM is a collection of all the fusion maps, saved in a separate folder.
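The construction of the GFM in fn (6) can be sketched as below. The equal-weight average used as the fusion rule is an illustrative assumption, not the authors' exact fusion function fn (5):

```python
import numpy as np

def normalize(m):
    """Rescale a map to [0, 1] before fusing."""
    m = m.astype(float)
    span = m.max() - m.min()
    return (m - m.min()) / span if span > 0 else np.zeros_like(m)

def fusion_map(saliency, disparity, motion, weights=(1/3, 1/3, 1/3)):
    """One fusion map per image: weighted combination of the three maps.
    Equal weights are an assumption made for this sketch."""
    ws, wd, wm = weights
    return (ws * normalize(saliency) + wd * normalize(disparity)
            + wm * normalize(motion))

def group_feature_map(triples):
    """GFM = {F_0, ..., F_(n-1)}: the collection of fusion maps for a
    folder of n images, as in fn (6)."""
    return [fusion_map(s, d, m) for (s, d, m) in triples]
```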

III Experimental Setup
There are two parts to any software-based experimental setup: hardware and dataset. The model is tested using Matlab.

Model and implementation
The training and testing split is given in fn (7). TP is the number of correctly detected ailments; FN is the number of ailments that were not identified or were wrongly identified. FP is the number of cases identified as ailments that are not; in simple terms, they are falsely identified. TN is the number of non-ailments correctly identified. The F1 score is used for early stopping of the classification.

V Results
The quantitative comparison results of the fusion map are given in Table 1. In this table, Model 2 represents the fusion, by linear combination, of the 2D saliency detection model in [26] and the depth model in (Wang et al., 2013 [22]); Model 3 (HOU'S) represents the saliency detection model obtained by linear combination of the 2D saliency detection model in (Hou and Zhang, 2007 [27]) and the depth model in (Wang et al., 2013 [22]). From this table, we can see that the proposed model performs better than the existing fusion models.

VI Conclusion
Since the pandemic outbreak, the authors have been looking for datasets; clinical datasets have been uploaded as of Feb 2020. The datasets were good for analyzing the severity of the outbreak and for understanding the virus. The number of cases around the world was rising at high speed, more testing was performed, and complete lockdowns were enforced in some countries. Europe saw deserted streets for the first time, with everyone staying at home, but the cases continued to shoot up. The number is rising even today, and there is still no cure. Isolation is the only solution until a vaccine is out. But who should be isolated? Do the tests give accurate results? Covid-19 has the ability to mutate; in such a scenario, how accurate can the tests be?
The proposed model is an attempt by the authors to detect Covid-19 at an early stage using chest x-rays. The model has achieved an accuracy of 96.66%; it could have been better had more images been available. The available datasets are raw; some images even contain medical equipment. The proposed work is able to identify any virus or bacteria in the lungs. The saliency, motion, disparity, and fusion maps would work as biomarkers for doctors and researchers.
The dataset generated would act as a benchmark dataset and will soon be uploaded to the public domain. RT-PCR claims to be 96% accurate, but in a real-world scenario the expected accuracy is just 80%, as there is always a chance that the swab taken as a sample is contaminated by a foreign element.

VII Competing interests
There is no competing interest.

Figure 1.1: The two parrots pop up first to our eyes, not the background.

Figure 1.2: A general working model of LSTM.

Figure 2: A general working model of Graph-Based Visual Saliency (GBVS).

Figure 4: The original image, the GBVS map drawn from the original image, and the salient parts detected.

Figure 3: Different stages in the saliency map.

Figure 5: Motion Map

1.3.1 Mathematical formulation
Motion estimation is done using a combination of a block matching algorithm and an optical flow algorithm. For optical flow estimation, consider two consecutive frames $f(x, y)$ and $g(x, y)$. We have $g(x, y) = f(x + \Delta x, y + \Delta y)$.
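Under the small-displacement assumption, the first-order expansion of $g(x, y) = f(x + \Delta x, y + \Delta y)$ leads to a 2×2 linear system for the shift, as derived in the disparity section. A minimal NumPy sketch of that least-squares solve (a single global shift is estimated here, rather than the per-block estimates used with block matching):

```python
import numpy as np

def estimate_shift(f, g):
    """Estimate a single global shift (dx, dy) between frames f and g by
    linearizing g = f(x + dx, y + dy) and solving the normal equations."""
    fx = np.gradient(f, axis=1)  # spatial derivative along x (columns)
    fy = np.gradient(f, axis=0)  # spatial derivative along y (rows)
    ft = g - f                   # temporal difference between the frames
    A = np.array([[np.sum(fx * fx), np.sum(fx * fy)],
                  [np.sum(fx * fy), np.sum(fy * fy)]])
    b = np.array([np.sum(fx * ft), np.sum(fy * ft)])
    return np.linalg.solve(A, b)  # (dx, dy)
```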

Figure (9): Normal x-ray images. Figure (9) shows the saliency, disparity, motion, and fusion maps. Four sample images are shown in the table to give an idea of these maps on normal x-rays.

Figure (12): Comparative analysis of time consumed by different algorithms.

Figure 11: COVID-19

2. Covid-19 is found to mutate; it has changed more than 20 times so far, which makes the virus difficult to detect. The fusion map is able to retrieve the salient features more precisely. Classifiers work well on benchmark datasets, but in the case of Covid-19 there is limited data availability and the data is in raw form. An image is normalized and pre-processed before its features are passed to the classifier. The fusion map overcomes the issue of visibility of salient features, and the results obtained support this claim.
3. The dataset has data from day 1 to day 30 for a few patients, which enables the deep learning network to train itself better. The model is CNN+LSTM: LSTM works well on large datasets, while CNN works better on smaller datasets. The forget gate of the LSTM enables the network to retain features for a period of time, which helps detect a Covid-19 positive case at an early stage.
4. The proposed model is expected to help doctors and scientists to (a) detect a Covid-19 positive patient through chest x-rays at a very early stage, and (b) verify the authenticity of an RT-PCR report.
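The role of the forget gate described in point 3 can be illustrated with a single LSTM cell step. This is a hedged NumPy sketch with illustrative names and sizes, not the CNN+LSTM model trained in the paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b stack the parameters of the forget (f),
    input (i), candidate (g), and output (o) gates, in that order."""
    n = h_prev.size
    z = W @ x + U @ h_prev + b
    f = sigmoid(z[0:n])        # forget gate: fraction of old cell state kept
    i = sigmoid(z[n:2*n])      # input gate: fraction of new candidate admitted
    g = np.tanh(z[2*n:3*n])    # candidate cell state
    o = sigmoid(z[3*n:4*n])    # output gate
    c = f * c_prev + i * g     # the forget gate retains features over time
    h = o * np.tanh(c)
    return h, c
```

With the forget gate saturated open and the input gate closed, the cell state passes through almost unchanged, which is the "retaining features for a period of time" behaviour described above.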
This is the final objective: to predict Covid-19 using the chest x-ray of a new patient. The algorithm, given in the following section, explains clearly how the entire model works.
The deep learning model based on LSTM requires an activation function. For accuracy, the True Negative (TN), False Negative (FN), True Positive (TP), and False Positive (FP) counts are measured. These are further used to calculate precision, recall, accuracy, and F1 score. The mathematical formulation used is:

$Precision = \frac{TP}{TP + FP}, \quad Recall = \frac{TP}{TP + FN}$

$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}, \quad F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}$
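These standard definitions can be checked with a few lines of code (the counts used below are illustrative, not the paper's confusion matrix):

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, accuracy, and F1 score from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, accuracy, f1
```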

Table 1 :
The AUC and CC comparison.

Table 2 :
Different Classification algorithms using NIH dataset

Table 2
helped in choosing the LSTM-CNN model for the proposed Deep_Saliency. The dataset does not contain any Covid-19 images. To select the correct optimizer and activation function, the code was run again on the NIH dataset using different activation functions and optimizers.

Table 4
shows the classification results using the local approach [42] and the proposed model. An accuracy of 96.66% is good considering the limited Covid-19 dataset.

Table 7 :
Comparative analysis on the basis of Precision

Tables 6
and 7 present a comparative analysis of the proposed model with the existing models proposed for Covid-19. The results give an accuracy of 96.66% in the case of Covid-19. As the LSTM takes more time, a comparison is drawn between different classifiers.