Deep Learning Approach to Recognition of Novel COVID-19 Using CT Scans and Digital Image Processing

COVID-19 was declared a global pandemic by the World Health Organization (WHO) in March 2020. With more than 31.3 million confirmed cases and over 965 thousand deaths recorded as of September 2020, it has inflicted catastrophic damage worldwide. The aim of this study is to develop an algorithm based on artificial intelligence (AI) and image processing techniques to identify COVID-19 patients with the aid of chest CT scan images. This study used a CT scan image dataset that is publicly available to researchers on Kaggle. We randomly extracted 27% of the positive CT (pCT) images and 11% of the negative CT (nCT) images from the original dataset. In the testing process, 120 test subjects drawn from both nCT and pCT were used to validate the algorithm. Based on the experimental findings, the proposed COVID-19 detection algorithm shows promising results for the identification of COVID-19 patients, with 90.83% accuracy at an average precision of 0.905.


Introduction
Coronaviruses are a large family of viruses that can cause serious illness in humans. The first reported major epidemic was Severe Acute Respiratory Syndrome (SARS) [1] in 2003, while the second severe outbreak, Middle East Respiratory Syndrome (MERS) [2,15], began in Saudi Arabia in 2012. The latest outbreak of coronavirus disease was announced in late December 2019. This new virus is highly infectious and has spread rapidly across the globe. On January 30, 2020, when it had spread to 18 countries, the World Health Organization (WHO) declared the outbreak a Public Health Emergency of International Concern (PHEIC) [3]. The disease was named 'COVID-19' by the World Health Organization on February 11, 2020 [4]. As of September 2020, the WHO reported 31.3 million confirmed cases and over 965 thousand deaths in 213 countries. Figure 1 shows the confirmed global cases of COVID-19 as of September 2020. The disease has spread rapidly around the globe since it was first identified and has become an international concern. An analysis performed by Jiang et al. [5] found that the worldwide death rate of COVID-19 is 4.5%. The death rate is 8.0% for patients aged 70-79 years and 14.8% for patients over 80 years. Patients over 50 years of age with chronic diseases are at the highest risk, and it is critically important to find a way to detect the illness before it becomes serious.
As the COVID-19 epidemic has become a global pandemic, real-time analysis of epidemiological data is required to prepare society for better disease response plans. The virus that causes COVID-19 belongs to the same family as SARS-CoV and MERS-CoV; infections range from the common cold to severe respiratory disease, and the initial symptoms include trouble breathing, exhaustion, fever, and dry cough. Shan et al. [9] developed a deep learning method for the segmentation and quantification of infected regions and the entire lung using chest CT images. A total of 249 COVID-19 patients and 300 new COVID-19 patients were used for validation in their study. They used the Dice similarity coefficient and achieved 91.6% accuracy.
Sachin Sharma [10] from the Institute of Advanced Research of India studied the role of machine learning techniques in obtaining important insights, such as whether a lung CT scan can serve as a first screening test or an alternative to RT-PCR. Training and testing were carried out using custom vision software based on Microsoft Azure machine learning techniques. An accuracy of nearly 91% was reached, although some false indications were found in the analysis.
Harmon [11] and her research team from the USA trained a number of deep learning algorithms on a multinational cohort of 1,280 patients to localize the parietal pleura/lung parenchyma, followed by classification of COVID-19 pneumonia. They achieved 90.8% accuracy, with 84% sensitivity and 93% specificity.
Xavier [12] evaluated the performance of artificial intelligence methods for detecting COVID-19 using chest X-ray and CT scan images. A total of 363 patients were included by combining two different data sources; 191 patients were COVID-19 positive and the rest were healthy subjects. The accuracy of the proposed system reached 90.9% on the 121 testing samples. A high false detection rate was observed in some experiments. A limited range of test samples was used for the study in each geographical location.
[11] Advantages: high accuracy (90.8%); a larger, multinational dataset was used for training and testing, covering different geographical locations of the world. Limitations: model training was limited to patients with positive RT-PCR testing and COVID-19 related pneumonia on chest CT.
[12] Advantages: high accuracy (91.8%); a larger dataset was used for the study. Limitations: the authors also state that it is unknown whether the tested procedures can be used to diagnose asymptomatic patients.

CT Image Dataset
This study used the CT scan image dataset from Kaggle [14], which is publicly available to researchers. The dataset consists of three types of CT images obtained from Union Hospital (HUST-UH) and Liyuan Hospital (HUST-LH): non-informative CT (NiCT) images, positive CT (pCT) images, and negative CT (nCT) images. We randomly extracted 27% of the pCT data and 11% of the nCT data for this study. In the testing process, 120 of the extracted nCT and pCT images were used to validate the method. Table 2 describes the dataset used in the study. All images in the dataset were originally sized 512×512 pixels. In terms of lung changes, various lung appearances were observed in COVID-19 positive patients. Figure 3 illustrates chest CT scan images of the lungs of COVID-19 positive and healthy test subjects.
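The random extraction step described above can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the function name `sample_fraction`, the fixed seed, the rounding rule, and the file lists are all assumptions introduced here, and the list sizes are placeholders rather than the real dataset sizes.

```python
import random


def sample_fraction(items, fraction, seed=0):
    """Return a random subset holding `fraction` of `items` (rounded to nearest)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)            # fixed seed so the draw is repeatable
    k = round(len(items) * fraction)     # number of items to keep
    return rng.sample(list(items), k)    # sampling without replacement


# Hypothetical file lists; the sizes are placeholders, not the dataset's.
pct_files = [f"pCT_{i:04d}.png" for i in range(1000)]
nct_files = [f"nCT_{i:04d}.png" for i in range(1000)]

pct_subset = sample_fraction(pct_files, 0.27)   # 27% of the positive images
nct_subset = sample_fraction(nct_files, 0.11)   # 11% of the negative images
print(len(pct_subset), len(nct_subset))
```

Sampling without replacement keeps every extracted image unique, and fixing the seed makes the subset reproducible across runs.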

Proposed CNN Architecture
As the initial phase of this study, chest CT scan images of COVID-19 subjects and normal healthy subjects were collected and stored on the computer. We then performed image pre-processing steps, such as image cropping and image resizing, to extract the effective pulmonary regions before using the dataset.
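A minimal sketch of the cropping and resizing steps is given below, using plain Python lists so that it runs without any imaging library. The crop window (a centered 400×400 region), the nearest-neighbour interpolation, and the helper names `center_crop` and `resize_nearest` are assumptions made for illustration; the text does not state how the pulmonary region was located or which interpolation was used. The 512×512 source size and the 200×200 target size are taken from the paper.

```python
def center_crop(image, crop_h, crop_w):
    """Cut a crop_h x crop_w window from the middle of a 2-D list image."""
    h, w = len(image), len(image[0])
    if crop_h > h or crop_w > w:
        raise ValueError("crop window is larger than the image")
    top = (h - crop_h) // 2
    left = (w - crop_w) // 2
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize of a 2-D list image to out_h x out_w."""
    h, w = len(image), len(image[0])
    return [
        [image[r * h // out_h][c * w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


# Synthetic 512x512 grayscale image standing in for one CT slice.
slice_512 = [[(r + c) % 256 for c in range(512)] for r in range(512)]

cropped = center_crop(slice_512, 400, 400)      # crop size is an assumption
net_input = resize_nearest(cropped, 200, 200)   # matches the 200x200 input layer
print(len(net_input), len(net_input[0]))
```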
Convolutional Neural Networks (CNNs) are a versatile method commonly used for image classification. Their hierarchical structure and powerful image feature extraction make the CNN a sophisticated model for image classification. The proposed CNN architecture is composed of two stages, a feature learning stage and a classification stage, as shown in Fig. 4.
The feature learning stage consists of two convolutional layers and two pooling layers. The first convolutional layer uses 3×3 convolutional filters for initial feature extraction. The resulting features are passed to the first pooling layer, which consists of 2×2 max-pooling filters. The features extracted by the first convolution and pooling layers are then passed to the second convolution and pooling layers, which likewise consist of 3×3 convolutional filters and 2×2 max-pooling filters. In the classification stage, the feature score matrix is passed to a fully connected stage made up of three fully connected layers with 500, 100, and 2 artificial neurons, respectively. Finally, the softmaxLayer was used to obtain probabilities and classify input samples, indicating whether the test subjects are COVID-19 positive or negative. In this model, we used an input layer of size 200×200.
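To make the layer stack concrete, the sketch below traces the spatial size of the feature maps through the two convolution and pooling layers and shows a softmax over the two output neurons. It is only an illustration under stated assumptions: the convolutions are taken to have stride 1 and no padding, the pooling stride is taken to equal the 2×2 window, and the number of filters per layer is left out because the text does not give it. The helper names are hypothetical.

```python
import math


def conv_out(size, kernel, stride=1, padding=0):
    """Output width of a convolution (assumed stride 1, no padding)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window, stride=None):
    """Output width of max-pooling (assumed stride equal to the window)."""
    stride = window if stride is None else stride
    return (size - window) // stride + 1


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)                            # subtract max for stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


layers = [
    ("conv1 3x3", conv_out, 3),
    ("pool1 2x2", pool_out, 2),
    ("conv2 3x3", conv_out, 3),
    ("pool2 2x2", pool_out, 2),
]

size = 200                                        # 200x200 input layer
trace = [("input", size)]
for name, fn, k in layers:
    size = fn(size, k)
    trace.append((name, size))

for name, s in trace:
    print(f"{name:10s} {s} x {s}")

fc_sizes = [500, 100, 2]                          # fully connected layers
probs = softmax([1.2, -0.4])                      # hypothetical output scores
print(fc_sizes, [round(p, 3) for p in probs])
```

Under different padding or stride choices the intermediate sizes change, so the trace should be read as one possible configuration rather than the one used in the study.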

The model hyperparameters are properties that control the whole training process. They include the variables that determine the structure of the network and the variables that determine how the network is trained. The Stochastic Gradient Descent with Momentum (SGDM) optimizer was used as the solver for training the network. The ReLU activation function was used to activate the nodes. The initial learning rate was 0.1, with a learn rate drop factor of 0.01, and training ran for a maximum of 20 epochs. We used mini-batches of 20 observations in each iteration.
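The SGDM update can be illustrated with a short sketch. The momentum value of 0.9 is an assumption, since the text does not report it, and the quadratic objective is a toy stand-in for the network loss. The learn rate drop schedule is not modeled because the drop period is not stated.

```python
def sgdm_step(weight, velocity, grad, lr, momentum=0.9):
    """One SGD-with-momentum update; returns (new_weight, new_velocity)."""
    # The velocity keeps a decaying memory of past gradients.
    velocity = momentum * velocity - lr * grad
    return weight + velocity, velocity


# Toy objective f(w) = w**2 with gradient 2*w, standing in for the loss.
w, v = 5.0, 0.0
initial_lr = 0.1                      # initial learning rate from the text
for _ in range(300):
    w, v = sgdm_step(w, v, grad=2.0 * w, lr=initial_lr)

print(abs(w) < 1e-3)
```

The momentum term smooths the noisy mini-batch gradients, which is the usual reason for preferring SGDM over plain stochastic gradient descent.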

Results And Analysis
The proposed design was tested with 120 randomly selected nCT and pCT chest images from the extracted dataset. Figure 5 illustrates COVID-19 positive test subjects whose lungs are filled with hazy areas. The extent of the hazy areas indicates the level of COVID-19 infection in the patient. These subjects were identified as COVID-19 positive by the clinical trials. Figure 6 depicts a sample of the healthy test subjects used in this study. In these images, the lungs appear clear and fewer gray spots are detected, which suggests that the test subject is negative for COVID-19. These healthy test subjects were identified as COVID-19 negative by the clinical trials.
A classification model was developed for training. A total of 20 epochs and 2000 iterations were run during the training process in order to reach optimal model parameters. The accuracy curve of the training process is shown in Figure 7. Based on the training results, the average classification accuracy across mini-batches was 94.25%, and the mini-batch classification accuracy reached its maximum at epoch 8.
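The reported training length can be cross-checked with simple arithmetic. The sketch below assumes that each epoch is one full pass over the training images and that no partial mini-batch is used; the mini-batch size of 20 is the value given with the hyperparameters. The resulting image count is an inference under those assumptions, not a figure stated in the text.

```python
epochs = 20               # reported number of epochs
total_iterations = 2000   # reported number of iterations
mini_batch_size = 20      # observations per iteration

# Iterations needed to pass once over the training set.
iterations_per_epoch = total_iterations // epochs

# Implied number of training images seen in one epoch.
images_per_epoch = iterations_per_epoch * mini_batch_size

print(iterations_per_epoch, images_per_epoch)
```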
According to the mini-batch loss curve shown in Figure 8, the mini-batch loss for multi-class classification decreased from 0.695 to 0.0977 by the end of the 20 epochs. Based on the test results, COVID-19 positive subjects were classified with probabilities in the range 0.65-1.00, while healthy (COVID-19 negative) subjects received probabilities in the range 0.10-0.40. Figure 9 illustrates the experimental results for the 120 COVID-19 positive and healthy (COVID-19 negative) test subjects. A confusion matrix, also known as an error matrix, is a representation of the performance of an algorithm. It is commonly used in machine learning, typically in supervised learning. The entries in the confusion matrix were calculated from the coincidence matrix using the following definitions:
True Negative (TN) is the number of correct predictions that an instance is negative.
True Positive (TP) is the number of correct predictions that an instance is positive.
False Positive (FP) is the number of incorrect predictions that an instance is positive.
False Negative (FN) is the number of incorrect predictions that an instance is negative.
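The four definitions above translate directly into code. The sketch below counts TP, TN, FP, and FN from predicted probabilities and derives accuracy, precision, and recall from them. The probabilities and labels are made-up toy values, not the study's test results, and the 0.5 decision threshold is an assumption.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def summarize(tp, tn, fp, fn):
    """Derive accuracy, precision, and recall from the four counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical model outputs and ground-truth labels (toy data only).
probs = [0.92, 0.71, 0.35, 0.15, 0.66, 0.45, 0.88, 0.20]
labels = [1, 1, 1, 0, 0, 0, 1, 0]

threshold = 0.5                                   # assumed decision threshold
preds = [1 if p >= threshold else 0 for p in probs]

tp, tn, fp, fn = confusion_counts(labels, preds)
metrics = summarize(tp, tn, fp, fn)
print(tp, tn, fp, fn, metrics)
```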
Mean Absolute Error (MAE) measures the error between paired observations that express the same phenomenon. Equation 3 represents the relationship between the real data and the predicted data. An MAE below 0.200 is considered good, and the MAE of the proposed system was 0.095. The low MAE therefore supports the accuracy of the proposed model for the identification of COVID-19 using chest CT scan images. The Root Mean Square Error (RMSE) of the system was calculated as 0.149. The low RMSE likewise suggests high accuracy of the proposed algorithm.
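For reference, the standard definitions of the two error measures are sketched below: MAE is the mean of |actual - predicted| and RMSE is the square root of the mean of (actual - predicted) squared. The sample values are hypothetical and are not the study's predictions; whether the study computed the errors on probabilities or on hard labels is not stated, so probabilities are assumed here.

```python
import math


def mean_absolute_error(actual, predicted):
    """MAE: average of the absolute differences between paired values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_square_error(actual, predicted):
    """RMSE: square root of the average squared difference."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)


# Hypothetical labels (1 = positive) and predicted probabilities.
actual = [1, 1, 0, 0]
predicted = [0.9, 0.7, 0.2, 0.4]

mae = mean_absolute_error(actual, predicted)
rmse = root_mean_square_error(actual, predicted)
print(round(mae, 4), round(rmse, 4))
```

Because squaring weights large errors more heavily, the RMSE is never smaller than the MAE on the same data, which is consistent with the reported values of 0.149 and 0.095.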

Conclusion
The outbreak of coronavirus disease 2019 (COVID-19) is a worldwide epidemic that has a significant effect not only on people's health but also on the global economy. Pneumonia caused by the coronavirus typically shows hazy spots on the outer edges of the lungs, a pattern that machine learning methods can use for early coronavirus identification.
In this paper, we addressed the role of artificial intelligence (AI) techniques in identifying the novel COVID-19 using chest CT scan images of coronavirus patients. Training and testing were carried out using the dataset published by Ning and  This study has some limitations: the validation data of the CT dataset were collected from one geographical region, which may not be representative of COVID-19 patients in other geographic areas.
In our future work, we will extend the algorithm to quantify the severity of other types of pneumonia using transfer learning and validate the results using data obtained from many geographical regions of the world.
Model validation results of the COVID-19 positive and healthy test subjects
Figure 10 Confusion matrix of the test results