How Adversarial Attacks Affect Deep Neural Networks Detecting COVID-19?

Considering the global crisis of Coronavirus infection (COVID-19), novel approaches that enable quick and accurate diagnosis are required. Deep Neural Networks (DNNs) have shown outstanding capabilities in classifying various data types, including medical images, which makes them suitable for building practical automatic diagnosis systems. DNNs can therefore help the healthcare system reduce patient waiting times. However, despite the acceptable accuracy and low false-negative rate of DNNs in medical image classification, they are vulnerable to adversarial attacks, in which perturbed inputs lead the model to misclassification. This paper investigates the effect of these attacks on five commonly used neural networks: ResNet-18, ResNet-50, Wide ResNet-16-8 (WRN-16-8), VGG-19, and Inception v3. Four adversarial attacks, namely Fast Gradient Sign Method (FGSM), Projected Gradient Descent (PGD), Carlini and Wagner (C&W), and Spatial Transformations Attack (ST), were used in this investigation. Average accuracy on test images was 96.7% and decreased to 41.1%, 25.5%, 50.1%, and 56.3% under FGSM, PGD, C&W, and ST, respectively. The results indicate that ResNet-50 and WRN-16-8 were generally less affected by the attacks. Therefore, applying defence methods to these two models can enhance their performance against adversarial perturbations.


I. INTRODUCTION
The outbreak of COVID-19 provided new opportunities for machine learning (ML). Applying ML can help the healthcare system diagnose quickly, accurately, and reliably. ML models can diagnose COVID-19 by learning its symptoms. Fever, dry cough, and lung involvement are general signs of this viral disease [1]. Therefore, analysis of lung images is used alongside the Reverse Transcription Polymerase Chain Reaction (RT-PCR) test as a diagnostic method. ML could accelerate COVID-19 diagnosis while maintaining accuracy and a low false-negative rate by utilizing a wide range of data such as medical images and cough recordings [2], [3].
Convolutional neural networks (CNNs) are architectures specialized for image processing and are used in medical image classification [4]. IDx-DR is the first U.S. Food and Drug Administration (FDA)-approved medical diagnosis system designed to detect diabetic retinopathy [5]. ML-based diagnosis systems have been widely used to identify lung involvement from CT-Scan or X-Ray images [6], [7]. Given the capabilities of image classification in rapid diagnosis, several datasets containing lung X-Ray/CT-Scan images were introduced. In addition, Cohen [8].
Despite all the advances made to increase the precision and accuracy of Deep Neural Networks (DNNs), these networks can easily be fooled, putting the practical deployment of deep learning systems in jeopardy [9]. An adversarial attack is a procedure that crafts inputs leading to false classification. ML models can be fooled even by changing a single pixel [10]. Depending on the application of the trained DNN, attacks can have dire consequences. Chernikova et al. studied the effects of such attacks on self-driving cars, in which the predicted steering angle varies when input images are manipulated [11]. In other research, Li et al. studied fooling CNN-based medical diagnosis systems [12]. Hirano et al. investigated universal adversarial perturbations (UAPs) on three medical image classification tasks (skin cancer, referable diabetic retinopathy, and pneumonia) over seven model architectures [13].
Adversarial attacks can be non-targeted or targeted. In a targeted attack, the adversarial perturbation pushes the prediction toward a predetermined class. Another categorization is based on the attacking kernel's access to the model. Attacking a model with knowledge of its parameters, gradients, and training process is called a white-box attack. In contrast, a black-box attack has access only to the model's inputs and outputs [14].
In this study, popular DNN models in medical image classification, including Residual Networks (RN), Wide Residual Networks (WRN), VGG networks, and Inception v3, were investigated.
The rest of the paper is organized in three sections. In Section II, the implemented architectures and attacks are explained. Results are reported and discussed in Section III. Finally, we conclude our findings in the last section.

II. MATERIALS AND METHODS
The overall procedure of this investigation is shown in Figure 1 and fully described in the following subsections.

A. Dataset
The Curated Dataset for COVID-19 provides 1281 COVID-19, 3270 normal, 1656 viral-pneumonia, and 3001 bacterial-pneumonia X-Rays collected from 15 publicly available datasets [15]. All images are normalized, standardized, and resized to 255x255 pixels with one color channel, so the final shape is 255x255x1. We used the normal and COVID-19 sets to train the models for binary classification.

B. DNN Models
Five DNN architectures with different designs, depths, and numbers of parameters were implemented to investigate the effects of adversarial perturbations on their accuracy. RN-18, RN-50, WRN-16-8, VGG-19, and Inception v3 were selected due to their popularity in medical image classification [16], [17]. WRN-L-k denotes a network with L layers that is k times wider than a regular RN, and it outperforms common RNs [18]. The number of parameters (trainable and non-trainable) is listed in Table I to compare the complexity of the models.

C. Adversarial Attack Kernels
The Adversarial Robustness Toolbox (ART) provides a diverse range of attacks, and its implementations are used in this study [19].
We focused our investigation on four state-of-the-art attacks: Fast Gradient Sign Method (FGSM), Projected Gradient Descent (PGD), Carlini and Wagner (C&W), and Spatial Transformations Attack (ST). The perturbation in the image produced by each attack may not be noticeable to the human eye, but it can easily fool the DNN model. The FGSM attack uses the gradients of the loss function with respect to the input image to create an adversarial image that maximizes the loss [20].
$$I_{adv} = I + \epsilon \times \operatorname{sign}\left(\nabla_I L(\theta, I, y)\right) \tag{1}$$
where $I_{adv}$ is the final adversarial image crafted from the original image $I$, $\epsilon$ ensures that the perturbations are small, $\nabla_I$ is the gradient with respect to the input, $L$ is the loss function, $y$ is the label, and $\theta$ denotes the model parameters.
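The single-step update described above can be illustrated with a minimal sketch. It assumes a toy logistic-regression classifier in place of a DNN, so that the input gradient has a closed form. The weights `w`, bias `b`, and input `x` are hypothetical values chosen for illustration, and this is not the ART implementation used in the study.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b, x, y):
    """Binary cross-entropy of a logistic model on a single input."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def input_gradient(w, b, x, y):
    """Gradient of the loss with respect to the input x (not the weights)."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    """One FGSM step: move every input component by eps along the gradient sign."""
    grad = input_gradient(w, b, x, y)
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

# Hypothetical toy model and input (assumptions for illustration only).
w, b = [2.0, -1.0, 0.5], 0.1
x, y = [0.2, 0.4, 0.6], 1
x_adv = fgsm(w, b, x, y, eps=0.1)

print("loss before:", loss(w, b, x, y))
print("loss after: ", loss(w, b, x_adv, y))
```

Every component moves by exactly `eps`, so the perturbation is bounded in the infinity norm while the loss on the true label increases.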
The PGD attack, in turn, is an iterative version of FGSM [21].
$$I^{t}_{adv} = P_{\alpha}\left(I^{t-1} + \epsilon \times \operatorname{sign}\left(\nabla_I L(\theta, I^{t-1}, y)\right)\right) \tag{2}$$
where $P(\cdot)$ is the projection function and $I^{t}_{adv}$ is the adversarial image at the $t$-th step. PGD uses a random start $I^{0} = I + U(-\alpha, \alpha)$, where $U$ is the uniform distribution.
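The iteration in Eq. (2) can be sketched in the same toy setting. The sketch reads the notation as follows, which is an assumption: `alpha` is the radius of the infinity-norm ball used both for the random start and for the projection, and `eps` is the per-step size. The logistic-regression model, its parameters, and the step count are hypothetical, and the code is not the ART implementation.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b, x, y):
    """Binary cross-entropy of a logistic model on a single input."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def input_gradient(w, b, x, y):
    """Gradient of the loss with respect to the input x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def pgd(w, b, x, y, eps, alpha, steps, seed=0):
    """Iterated sign-gradient steps, each followed by projection onto the alpha-ball around x."""
    rng = random.Random(seed)
    # Random start: I^0 = I + U(-alpha, alpha)
    x_adv = [xi + rng.uniform(-alpha, alpha) for xi in x]
    for _ in range(steps):
        grad = input_gradient(w, b, x_adv, y)
        # Gradient-sign step of size eps
        x_adv = [xa + eps * sign(g) for xa, g in zip(x_adv, grad)]
        # Projection: clip each component back into [x - alpha, x + alpha]
        x_adv = [min(max(xa, xi - alpha), xi + alpha) for xa, xi in zip(x_adv, x)]
    return x_adv

# Hypothetical toy model and input (assumptions for illustration only).
w, b = [2.0, -1.0, 0.5], 0.1
x, y = [0.2, 0.4, 0.6], 1
x_adv = pgd(w, b, x, y, eps=0.02, alpha=0.1, steps=20)

print("loss before:", loss(w, b, x, y))
print("loss after: ", loss(w, b, x_adv, y))
```

The projection keeps the total perturbation within `alpha` however many steps are taken, which is what distinguishes PGD from simply repeating FGSM.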
The C&W attack takes a different approach: it replaces the loss function $L(\cdot)$ in Eq. (2) [22].
where $Z$ denotes the logits with respect to $y$ or $y_{target}$, and $\tau$ is a parameter that controls the confidence of the attack.
Unlike the attacks mentioned above, the ST attack is based on a grid search over possible translations and rotations to find the optimal attack parameters [23]. The rotation range for this study was constrained to between -30 and 30 degrees. In most cases, the perturbations introduced by the implemented attacks (except ST) are not visible to the human eye; Figure 2 illustrates the kernel of each attack. Due to the white-box nature of FGSM, PGD, and C&W, the kernel varies with the model and hyper-parameters. All implemented attacks use their default settings in the ART library with the infinity norm ($L_{\infty}$).
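To illustrate the role of the logits and the confidence parameter in the C&W attack, the sketch below implements one commonly used untargeted margin objective. This particular form is an assumption and may differ from the expression used by the authors or by the ART defaults. The function name `cw_margin` and the example logits are hypothetical.

```python
def cw_margin(logits, y, tau=0.0):
    """Untargeted C&W-style margin: true-class logit minus the best other logit,
    floored at -tau. An attacker minimizes this value."""
    best_other = max(z for i, z in enumerate(logits) if i != y)
    return max(logits[y] - best_other, -tau)

# Correctly classified input: the margin is positive, so the attack still has work to do.
print(cw_margin([2.0, 1.0, 0.5], y=0))

# Misclassified input: the margin saturates at -tau once the wrong class
# leads by at least tau, so larger tau demands a more confident misclassification.
print(cw_margin([1.0, 3.0], y=0, tau=0.5))
```

A positive value means the model still predicts the true class. Driving the value down to `-tau` means another class leads by at least `tau` in logit space.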

III. RESULTS AND DISCUSSION
All of the models were implemented from scratch in a Python 3.7 environment using Kaggle notebooks. Models were trained for 200 epochs with a batch size of 64 and a 10% validation split using the Stochastic Gradient Descent (SGD) optimizer. Of the 4551 images, 3640 were used to train the models and 911 were used as a test set, while 10% of the training set (364 images) was used for validation during training. Table II reports the accuracy of the models in classifying the original test images and the adversarial images obtained by feeding the benign test set to the FGSM, PGD, C&W, and ST attacking kernels. Figure 3 illustrates the accuracy of each model on the five evaluation sets. The average test accuracy across the five DNN models was 96.7%, and it decreased when testing with adversarial images to a minimum of 26.1% under the PGD attack. VGG-19, one of the most commonly used models in medical image classification, was affected by this simple attack, while RN-50 showed moderate resistance. Given the iterative nature of PGD, this attack is more effective than FGSM, while the C&W attack, with its different kernel, could reduce the accuracy of the classifiers by more than 40%.
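The split sizes reported above can be checked with a short calculation. The 20% test fraction and the round-up convention for the test set are assumptions inferred from the reported counts, not settings stated in the text. The number of batches per epoch further assumes that the validation images are held out of the training batches.

```python
import math

n_covid, n_normal = 1281, 3270       # class sizes from the dataset description
total = n_covid + n_normal           # 4551 images in the binary task

test_fraction = 0.2                  # assumption: inferred from 911 / 4551
n_test = math.ceil(total * test_fraction)
n_train = total - n_test

val_fraction = 0.10                  # stated validation split
n_val = round(n_train * val_fraction)

batch_size = 64                      # stated batch size
# assumption: validation images are excluded from the training batches
batches_per_epoch = math.ceil((n_train - n_val) / batch_size)

print(total, n_train, n_test, n_val, batches_per_epoch)
```

Under these assumptions the calculation reproduces the 3640/911 split and the 364 validation images reported in the text.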
Unlike the other attacks, the implemented ST attack rotates images by -30 to 30 degrees. The results indicate that even a slight rotation can reduce classification accuracy. Training medical image classification models with data augmentation that includes rotated images is therefore essential.
Overall, the PGD attack was more effective than the other three attacks. The results also demonstrate that WRN-16-8 and RN-50 showed more resistance than the other models. Therefore, securing these models with adversarial retraining can result in a more solid and reliable classification model.
Our findings suggest that selecting a DNN model based only on its accuracy on a benign test set could result in a deceivable system. Further evaluations should be considered to check the security, generality, and reliability of the model. According to Table I, the number of parameters of VGG-19 did not help this model withstand attacks; therefore, more research is needed to evaluate how layer types, activation functions, and filter sizes fare against adversarial attacks.

IV. CONCLUSIONS
Using ML to classify medical images has been widely investigated by various researchers. One of the main threats to using these models in real applications is adversarial images. In this study, five models were evaluated under four of the most common attacks. The investigation demonstrated that obtaining acceptable accuracy and Y is not an appropriate criterion for selecting one model over other architectures, and that adversarial perturbations could lead to severe problems in such systems. Therefore, adversarial training and model robustness are required for developing practical medical image classification systems. We hope that our findings help researchers enhance the security level of their models and increase awareness of the need to develop DNNs with various defense methods.