Improving diversity and quality of adversarial examples in adversarial transformation network

This paper proposes PatternAttack to mitigate two major issues of the Adversarial Transformation Network (ATN): the low diversity and the low quality of its adversarial examples. To address the first issue, this research proposes a pattern-based stacked convolutional autoencoder that generalizes ATN. The proposed autoencoder supports different patterns such as the all-pixel pattern, the object boundary pattern, and the class model map pattern. To address the second issue, this paper presents an algorithm that improves the quality of adversarial examples in terms of L 0 -norm and L 2 -norm. This algorithm employs adversarial pixel ranking heuristics such as JSMA and COI to prioritize adversarial pixels. To demonstrate the advantages of the proposed method, comprehensive experiments have been conducted on the MNIST and CIFAR-10 datasets. Concerning the first issue, the proposed autoencoder generates diverse adversarial examples. Concerning the second issue, the proposed algorithm significantly improves the quality of adversarial examples: in terms of L 0 -norm, it reduces hundreds of adversarial pixels to a single adversarial pixel; in terms of L 2 -norm, it reduces the average distance considerably. These results show that the proposed method can generate high-quality and diverse adversarial examples in practice.


Introduction
Convolutional Neural Network (CNN) is usually applied to classify images (Sultana et al. 2019). A CNN is trained on a labelled dataset consisting of images and their labels to learn important pixels and predict the label of new images. A CNN could achieve high accuracy on the training set and the test set. However, in practice, the accuracy could be significantly lower than expected (Moosavi-Dezfooli et al. 2015; Pei et al. 2017; Su et al. 2017). A reasonable explanation for this issue is that the training process only focuses on the correctness of the CNN in terms of accuracy, precision, or F1-score. Meanwhile, in real-world situations, input data could contain perturbation, which rarely exists in the training set and the test set. Attackers could craft perturbed inputs to cause trained models to behave unexpectedly. Therefore, it is important to evaluate the behavior of a CNN in the presence of perturbation.
Robustness is one of the popular measurements to evaluate the quality of a CNN in the presence of perturbation (IEEE 1990; Carlini and Wagner 2016; Baluja and Fischer 2017; Zhang et al. 2019). To assess the robustness of a CNN, an adversarial attack is a popular approach which attempts to modify a correctly predicted image. There are two main types of attack: the untargeted attack and the targeted attack (Akhtar et al. 2021). In both cases, the output is an adversarial example classified as a target label, which is any label except the ground truth label. The main difference between the two types lies in the target label: it is not fixed in the untargeted attack, while it is a specific label in the targeted attack.
In the targeted attack, an autoencoder-based attack is a promising approach to generate adversarial examples. This approach was first proposed in the Adversarial Transformation Network (ATN) (Baluja and Fischer 2017). The general idea is that an autoencoder is trained from an attacked model, a set of correctly predicted inputs, and a target label. The main advantage of ATN is that the trained autoencoder could be reused to generate adversarial examples from new correctly predicted inputs with extremely low computational cost. However, ATN has two major disadvantages: the low diversity and the low quality of its adversarial examples.
Firstly, ATN usually generates low-diversity adversarial examples. This method modifies all pixels to produce adversarial examples from correctly predicted inputs. However, some specific regions of correctly predicted inputs could be modified to generate adversarial examples while keeping the remaining regions unchanged. For example, on the MNIST dataset (Lecun et al. 1998b), some regions of the correctly predicted inputs may contain small noises due to the low quality of cameras. These noises could look like dust to human eyes. The authors of DeepXplore (Pei et al. 2017) claim that testing domain-specific constraints is important; they add black rectangles to arbitrary regions of correctly predicted inputs to generate adversarial examples.
Secondly, ATN usually generates low-quality adversarial examples. The purpose of ATN is to generate adversarial examples close to the correctly predicted inputs, where the difference is measured using L 2 -norm. These distances are small if the adversarial examples are close to their corresponding correctly predicted inputs. A stacked convolutional autoencoder can be utilized to generate adversarial examples from the correctly predicted inputs. However, because some layers of the stacked convolutional autoencoder may be non-linear, such as ReLU activations, the adversarial examples may be significantly different from their corresponding correctly predicted inputs. In the worst case, perturbations are added to all pixels of the correctly predicted inputs, and the corresponding adversarial examples could be very different from them. Therefore, it is worthwhile to improve the quality of adversarial examples generated by ATN.
This paper proposes a method named Pattern-based Adversarial Attack (PatternAttack) to address these two issues. Firstly, to deal with the low diversity of adversarial examples, a generalized ATN is proposed in PatternAttack. Similar to ATN, the generated adversarial examples should be close to the correctly predicted inputs in terms of L 2 -norm. The generalized ATN aims to modify a set of specific pixels, which is defined by a pattern. This research suggests several patterns, including the all-pixel pattern, the object boundary pattern, and the class model map pattern. Among these patterns, the all-pixel pattern is the one used in ATN. Secondly, to improve the quality of adversarial examples, an optimizer algorithm is proposed in PatternAttack. The proposed optimizer is able to improve the quality of adversarial examples in terms of both L 0 -norm and L 2 -norm. The main idea of the proposed optimizer is to restore redundant adversarial pixels in an adversarial example to their original values in the corresponding correctly predicted input. An adversarial pixel is redundant if it does not contribute to the decision of the attacked model, i.e., the predicted label. This research proposes to use JSMA (Papernot et al. 2015) and COI (Gopinath et al. 2019) to rank pixels in terms of their impact on the decision of the attacked model. As a result, the L 0 -norm and L 2 -norm between the improved adversarial example and its correctly predicted input could be reduced significantly.
Addressing the two problems of ATN brings three significant benefits. Firstly, using different patterns in the generalized ATN could generate more evidence about the robustness of a CNN. For example, if a machine learning tester wants to check whether the object boundary pixels are prone to adversarial attack, they could use the object boundary pattern. The proposal suggests three types of patterns; however, the tester could define additional patterns. Secondly, the proposed optimizer could be used to improve the quality of any adversarial attack. Currently, many adversarial attacks are proposed to generate high-quality adversarial examples; in many cases, their generated outputs are not as good as expected. With the proposed optimizer, the choice of adversarial attack becomes less important. For example, the tester could use FGSM (Goodfellow et al. 2015) to attack, then use the proposed optimizer to enhance the output of FGSM. Thirdly, generating various adversarial examples could play an important role in building an adversarial defense. One of the most common approaches is to build a defense and put it in front of the CNN (Aldahdooh et al. 2021). Whenever a new image (i.e., adversarial or clean) is sent to the CNN, it is first sent to the defense to (i) transform it into a valid image or (ii) check the validity of the input image. For example, DefenseVAE (Li and Ji 2020) analyzes the insights of adversarial examples generated by FGSM, BIM, CW, etc., and builds a variational autoencoder-based defense.
Briefly, the contributions of the paper are as follows:
• We propose the generalized ATN, a pattern-based adversarial attack for CNN.
• We propose an optimizer to improve the quality of adversarial examples generated by any adversarial attack.
The rest of this paper is organized as follows. Section 2 delivers the overview of related research on adversarial example generation. Section 3 provides the background of CNN and the targeted attack. The overview of PatternAttack is shown in Sect. 4. Next, Sect. 5 presents the experiment to demonstrate the advantages of PatternAttack. The discussion is presented in Sect. 6. Finally, the conclusion is described in Sect. 7.

Related works
This section presents an overview of related research. Firstly, some well-known adversarial example generation methods for CNN are presented. Secondly, outstanding research on finding the saliency map of CNN is discussed.
Pattern attack shows the pattern of the possible modified pixels. The comparable methods support the all-pixel pattern, such as FGSM and CW, or the n-most important pixel pattern, such as DeepCheck and Nguyen-Pham. Meanwhile, our proposal could modify a set of pixels satisfying a specific pattern such as the all-pixel pattern, the object boundary pattern, and the class model map pattern. The purpose of using various patterns is to clarify the robustness of the attacked model or to create various adversarial inputs for building defenses against adversarial attacks.
Optimization enhances the quality of adversarial examples. The comparable methods do not support this capability. These methods usually minimize the distance between the generated adversarial examples and their correctly predicted inputs to achieve high-quality outputs. However, it is hard to generate optimal adversarial examples because the objective function of a CNN is non-linear. By contrast, the proposal could recognize redundant adversarial perturbations in a generated adversarial example and remove them.
For generalization, if a method could generate new adversarial examples from new correctly predicted inputs based on the result of previous attacks, it has the generalization ability. Only ATN and the proposal support this ability: from new correctly predicted inputs, both can generate adversarial examples by reusing a trained autoencoder with low computational cost.
Distance metric describes the L p -norm that the adversarial attack uses. L 0 aims to minimize the number of modified pixels. L 2 aims to minimize the Euclidean distance between the generated adversarial examples and their corresponding input images. Using L ∞ usually does not lead to better adversarial examples in terms of L 0 and L 2 .
Pixel modification is the mechanism to modify a correctly predicted input. DeepCheck uses an SMT solver such as Z3 (Moura and Bjørner 2008); however, the performance of this method is quite poor. Later, Nguyen-Pham proposes a heuristic solver to improve the performance of DeepCheck. Both DeepCheck and Nguyen-Pham could modify one pixel of a correctly predicted input. The other methods use a gradient descent-based technique to modify pixels rather than a solver.

Saliency map generation
Saliency map generation draws great interest from research groups. A saliency map describes the impact of pixels on the classification of CNN (Simonyan et al. 2013; Gu and Tresp 2019) and usually has the same size as the input of CNN. Recent research groups have proposed different methods to generate saliency maps. Simonyan et al. (2013) first propose a method to generate a saliency map, known as the class model map. This map could be used to explain how CNN classifies a set of images as a specific label. Similar to the idea of Simonyan et al. (2013), Zeiler and Fergus (2013) use a multi-layered deconvolutional network to produce a saliency map. Springenberg et al. (2014) propose guided backpropagation, which is better than the gradient-based technique proposed in Simonyan et al. (2013). Papernot et al. (2015) propose the Jacobian-based saliency map attack to construct an image-specific class saliency map of an input image. Cao et al. (2015) present a novel feedback convolutional neural network architecture to capture the high-level semantic concepts of an image, then project the obtained information into saliency maps. Zhang et al. (2016) introduce excitation backprop to address the limitations of the gradient-based technique. Fong and Vedaldi (2017) attempt to find the portion of an image which has the largest impact on the decision of CNN. Unlike other saliency map generation methods, their method explicitly modifies the correctly predicted input to generate an adversarial example; the modification could be interpretable to human observers. Dabkowski and Gal (2017) develop a fast saliency detection method to train a model that predicts a saliency map for any correctly predicted input with a low computational cost. Yu et al. (2018) propose a method to predict the saliency map of an input image with low computational cost.
The relationship between saliency maps and adversarial examples is discussed in many works. Fong and Vedaldi (2017) and Gu and Tresp (2019) claim that saliency maps can be used to explain the classification of adversarial examples. Tsipras et al. (2019) find that robust models have interpretable saliency maps. Etmann et al. (2019) quantify the relationship between the saliency map and the correctly predicted input by analyzing their alignment. They conclude that the more linear a CNN is, the stronger the connection between its robustness and this alignment.
Due to the existence of the relationship between the saliency map and adversarial example, our research utilizes the class model map (Simonyan et al. 2013) as a pattern in the generalized ATN. By generating a class model map of a correctly predicted input set, the generalized ATN could find top-n important pixels which have the largest impact on the classification of CNN. The generalized ATN then modifies these important pixels to generate adversarial examples.
Convolutional neural network

The activation function of the k-th layer, denoted by θ k , is either linear or non-linear. The most popular activation functions are softmax, sigmoid, hyperbolic tangent, ReLU, and leaky ReLU. The activation function of the output layer is softmax. Given an input image x ∈ R n , a typical CNN M can be represented as M(x) = (θ h−1 • θ h−2 • · · · • θ 1 )(x) ∈ R k , where • presents the connection between consecutive layers, k is the number of classes, L 0 is the input layer, and L h−1 is the output layer. The input image has the dimension (width, height, channel). If the input image is monochrome, the size of channel is one. If the input image is RGB, the size of channel is three, corresponding to the red, green, and blue channels.
For example, Fig. 1 illustrates the architecture of LeNet-5 (Lecun et al. 1998a). This CNN model is designed for handwritten and machine-printed character recognition. This model comprises 8 layers. Convolutional layers are labelled C i , where i is the layer index. Sub-sampling layers and fully-connected layers are labelled S i and F i , respectively. The dimension of the input image is (32, 32, 1). Layer C 1 is a convolutional layer with 6 feature maps of size 28 × 28. Layer S 2 is a sub-sampling layer with 6 feature maps of size 14 × 14. Layer C 3 is a convolutional layer with 16 feature maps of size 10 × 10. Layer S 4 is a sub-sampling layer with 16 feature maps of size 5 × 5. Layer C 5 is a convolutional layer with 120 feature maps. The size of a feature map in this layer is 1 × 1. Layer C 5 is then flattened into fully-connected layer F 6 with 84 hidden units. The output layer has 10 outputs in which each output corresponds to a digit.
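The feature-map sizes above follow standard convolution arithmetic, which can be verified with a small helper (ours, not from the paper):

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or sub-sampling layer:
    floor((size - kernel + 2*pad) / stride) + 1."""
    return (size - kernel + 2 * pad) // stride + 1

# LeNet-5 walk-through from the text: 32x32 input, 5x5 convolutions,
# and 2x2 sub-sampling with stride 2.
sizes = [32]
for kernel, stride in [(5, 1), (2, 2), (5, 1), (2, 2), (5, 1)]:
    sizes.append(conv_out(sizes[-1], kernel, stride))
print(sizes)  # [32, 28, 14, 10, 5, 1]
```

The successive sizes match the 28, 14, 10, 5, and 1 reported for layers C 1 through C 5.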
Let X be the training set containing a set of images. Let ∂y j /∂x i be the gradient of the j-th output neuron with respect to the pixel x i . The ground truth label of x is denoted by y true .

Targeted attack
The key idea of a targeted attack method is that perturbations are added to a correctly predicted input to generate an adversarial example. Let x′ be an adversarial example generated by the attack. This adversarial example is classified as a target label y* ≠ y true . The output neuron corresponding to the target label is called the target neuron.
Definition 2 (Mandatory perturbation) A perturbation vector ζ is mandatory if and only if x is predicted correctly by the attacked model and (x + ζ ) produces an adversarial example classified as the target label y * .

Distance metric
Given a correctly predicted input x, its adversarial example x′ should be similar to the corresponding correctly predicted input. L p -norm is one popular way to measure the difference between a correctly predicted input and its corresponding adversarial example. L p -norm is defined as follows:

L p (x, x′) = ( Σ i |x i − x′ i | p ) 1/p (1)

The popular metrics of L p -norm are L 0 -norm (p = 0), L 2 -norm (p = 2), and L ∞ -norm (p = ∞). L 0 -norm is known as the Hamming distance, which is interpreted as the number of different pixels when comparing x and x′. L 0 (x, x′) ≈ 0 if and only if x and x′ are very similar. L 2 -norm is known as the Euclidean distance; it is calculated as the Euclidean distance from x to x′. L ∞ -norm measures the maximum change in the value of pixels over every axis of the coordinate.
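As an illustration, the three metrics can be computed for a flattened image pair as follows (a minimal numpy sketch; the helper name is ours):

```python
import numpy as np

def lp_distances(x, x_adv):
    """Compute L0, L2, and L-infinity distances between an image and
    its adversarial example (both given as flat float arrays)."""
    diff = x_adv.astype(float) - x.astype(float)
    l0 = int(np.count_nonzero(diff))        # number of modified pixels
    l2 = float(np.sqrt(np.sum(diff ** 2)))  # Euclidean distance
    linf = float(np.max(np.abs(diff)))      # largest single-pixel change
    return l0, l2, linf

x = np.array([0.0, 0.5, 1.0, 0.2])
x_adv = np.array([0.0, 0.75, 1.0, 0.2])  # one pixel changed by 0.25
print(lp_distances(x, x_adv))  # (1, 0.25, 0.25)
```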
There is a wide range of attack methods using L p -norm. Targeted FGSM (Goodfellow et al. 2015) and BIM (Kurakin et al. 2016) generate adversarial examples under L ∞ -norm, Carlini-Wagner L 2 (Carlini and Wagner 2016) minimizes L 2 -norm, and JSMA (Papernot et al. 2015) minimizes L 0 -norm.

Success rate
In addition to L p -norm, the success rate is a common metric to evaluate the effectiveness of an adversarial example generation method. Let sr(M) ∈ [0, 1] be the success rate of an adversarial example generation method used to attack a CNN model M. Ideally, an adversarial example generation method should produce a set of adversarial examples with a small L p -norm and achieve a high success rate. The success rate of an adversarial example generation method is computed as follows:

sr(M) = (1 / |X attack |) Σ x ∈ X attack 1(argmax(M(x′)) = y*) (2)

where X attack is a set of correctly predicted inputs, x′ is an adversarial example generated by modifying an attacking sample in X attack , and 1 is an indicator function. Function 1(argmax(M(x′)) = y*) returns 1 if the model M classifies x′ as the target label y*, and returns 0 otherwise.
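The success rate can be sketched as follows (`model` and `attack` are toy placeholders, not the paper's API):

```python
import numpy as np

def success_rate(model, attack, X_attack, y_target):
    """Fraction of correctly predicted inputs whose adversarial example
    is classified as the target label."""
    hits = sum(int(np.argmax(model(attack(x))) == y_target) for x in X_attack)
    return hits / len(X_attack)

# Toy example: the "model" reads the label from the first pixel, and the
# "attack" simply overwrites that pixel with the target label.
model = lambda v: np.eye(3)[int(v[0])]
attack = lambda v: np.concatenate(([2.0], v[1:]))
X = [np.array([0.0, 1.0]), np.array([1.0, 0.0])]
print(success_rate(model, attack, X, 2))  # 1.0
```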

Autoencoder
Generally, an autoencoder consists of one encoder and one decoder (Bengio et al. 2006; Masci et al. 2011). Autoencoder variants differ in the architecture of the encoder and decoder, in the computation of the loss, or in the noise inserted into the network. In the context of image reconstruction, the input and the output of the autoencoder are called the input image and the reconstructed image, respectively. This research uses a stacked convolutional autoencoder to generate adversarial examples. The conventional autoencoders, including the sparse autoencoder and the stacked autoencoder, are presented in detail below.

Sparse autoencoder
A sparse autoencoder has one input layer, one hidden layer, and one output layer; it is the simplest autoencoder. This autoencoder takes an input image x in ∈ R m×1 . In the encoder stage, this autoencoder maps the input image to a latent representation z ∈ R n×1 , in which n < m:

z = f (W 1 T · x in + b 1 ) (3)

where b 1 ∈ R n×1 is the bias vector of the hidden layer, W 1 ∈ R m×n is the weight matrix between the input layer and the hidden layer, and f (.) is an activation function. In the decoder stage, the latent representation z is then transformed back into the image space as follows:

x out = g(W 2 T · z + b 2 ) (4)

where x out is the reconstructed image, g(.) is an activation function, b 2 ∈ R m×1 is the bias vector of the output layer, and W 2 ∈ R n×m is the weight matrix between the hidden layer and the output layer. The reconstructed image should be similar to the input image. To fulfil this requirement, the popular objective function of this autoencoder is:

min Σ x in ∈ X ||x in − x out || 2 2 (5)
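A forward pass of this autoencoder can be sketched in numpy (the shapes follow the text; choosing the sigmoid for f and g is our assumption):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def sparse_autoencoder_forward(x_in, W1, b1, W2, b2):
    """One forward pass of a sparse autoencoder.
    Shapes follow the text: x_in is (m,1), W1 is (m,n), W2 is (n,m)."""
    z = sigmoid(W1.T @ x_in + b1)   # encoder: latent code, shape (n,1)
    x_out = sigmoid(W2.T @ z + b2)  # decoder: reconstruction, shape (m,1)
    return z, x_out

rng = np.random.default_rng(0)
m, n = 6, 2
x_in = rng.random((m, 1))
W1, b1 = rng.standard_normal((m, n)), np.zeros((n, 1))
W2, b2 = rng.standard_normal((n, m)), np.zeros((m, 1))
z, x_out = sparse_autoencoder_forward(x_in, W1, b1, W2, b2)
loss = float(np.sum((x_in - x_out) ** 2))  # squared reconstruction error
```

Training would minimize this squared reconstruction error over the whole training set.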

Stacked autoencoder
The stacked autoencoder is an extension of the sparse autoencoder. A stacked autoencoder has more than one hidden layer.
In the encoder stage, each layer is larger than the next layer. In the decoder stage, each layer is smaller than the next layer. Figure 2 shows an example of a stacked autoencoder. The input layer has 5 neurons. In the encoder stage, there are 2 hidden layers with 4 neurons and 3 neurons, and the latent representation has 2 neurons. In the decoder stage, there are 2 hidden layers with 3 neurons and 4 neurons. The output layer has the same size as the input layer.

Stacked convolutional autoencoder
The stacked autoencoder does not focus on learning the image structure because the input image is flattened into a k-dimensional vector. Each dimension represents a pixel of the input image. For example, on the MNIST dataset, a (28, 28, 1) image is flattened into a 784-dimensional vector. Because the stacked autoencoder does not consider the image structure, this type of autoencoder does not reconstruct the input image effectively (Masci et al. 2011). To mitigate this issue, the stacked convolutional autoencoder is proposed. In the encoder stage, a layer can be convolutional, down-sampling, or fully-connected. In the decoder stage, a layer can be deconvolutional, up-sampling, or fully-connected. The typical objective function of the stacked convolutional autoencoder is identical to Eq. 5. Figure 3 illustrates an example of a stacked convolutional autoencoder. The input image is a monochrome image with dimension (28, 28, 1). In the encoder stage, this image is fed into a convolutional layer with a stride of 2. The resulting layer Conv1 consists of 32 feature maps of size 14 × 14. At the end of the encoder, the layer is flattened into a fully-connected layer of size 10 × 1, which is the latent representation. In the decoder stage, this latent representation is fed into a fully-connected layer FC, an up-sampling layer Reshape, and deconvolutional layers DeConv3, DeConv2, and DeConv1 to reconstruct the input image.

Box-constrained L-BFGS

Szegedy et al. (2014) propose box-constrained L-BFGS to generate adversarial examples by solving the following box-constrained optimization problem:

min x′ β · ||x′ − x|| 2 2 + f (M(x′), y*) subject to x′ ∈ [0, 1] n (6)

where f is a function to compute the difference between M(x′) and the target label y*, and β is the weight to balance the two terms. A common choice of f is cross-entropy.

Targeted FGSM

Goodfellow et al. (2015) propose targeted FGSM to generate adversarial examples which are similar to the correctly predicted inputs in terms of L ∞ -norm. Targeted FGSM generates adversarial examples by modifying all pixels:

x′ = x − β · sign(∇ x CE(y*, M(x))) (7)

where function sign(.) returns the sign of each element of the gradient and β is a positive value to shift the values of all pixels in x.
The main disadvantage of targeted FGSM is that its effectiveness is sensitive to the value of β. If β is large, the success rate of targeted FGSM could be high, but the adversarial examples could be very different from the corresponding correctly predicted inputs in terms of L p -norm. Conversely, if β is very small, the success rate of targeted FGSM could be low, although the adversarial examples could be very similar to the corresponding correctly predicted inputs in terms of L p -norm.
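Targeted FGSM can be sketched as follows (`grad` stands in for the gradient of the loss toward the target label, which a real model would supply):

```python
import numpy as np

def targeted_fgsm(x, grad, beta):
    """Targeted FGSM sketch: step against the sign of the gradient of
    CE(y*, M(x)) with respect to x, then clip to the valid pixel range."""
    x_adv = x - beta * np.sign(grad)
    return np.clip(x_adv, 0.0, 1.0)

x = np.array([0.5, 0.5, 0.0])
grad = np.array([1.0, -1.0, -1.0])
print(targeted_fgsm(x, grad, 0.1))  # [0.4 0.6 0.1]
```

Every pixel moves by exactly β, which is why a large β inflates the L p distance while a small β may fail to flip the label.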

Carlini-Wagner L 2
Inspired by box-constrained L-BFGS, Carlini and Wagner (2016) propose a method to minimize L 2 -norm. An adversarial example is generated by solving the following equation:

min x′ ||x′ − x|| 2 2 + β · f (x′) (8)

where f is defined as:

f (x′) = max( max{Z (x′) i : ∀i ≠ y*} − Z (x′) y* , −t ) (9)

where β is the weight to balance the two terms, t is used to control the confidence of the output, Z (.) returns the pre-softmax values of the output layer, term max{Z (x′) i : ∀i ≠ y*} is the largest pre-softmax value except for the target neuron, and term Z (x′) y* is the pre-softmax value of the target neuron.
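The f term can be illustrated on toy pre-softmax values (a sketch; the helper name is ours):

```python
import numpy as np

def cw_f(z, y_target, t=0.0):
    """The f term of Carlini-Wagner L2 on pre-softmax values z: it stays
    positive while some non-target neuron exceeds the target neuron, and
    is clipped at -t once the attack succeeds with confidence t."""
    other = np.max(np.delete(z, y_target))  # largest non-target logit
    return max(other - z[y_target], -t)

print(cw_f(np.array([3.0, 1.0, 2.0]), 1))  # 2.0: attack has not succeeded
print(cw_f(np.array([1.0, 3.0, 2.0]), 1))  # -0.0: target neuron dominates
```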

Adversarial transformation network
Baluja and Fischer (2017) introduce ATN to generate adversarial examples based on a stacked convolutional autoencoder. The input and output of ATN are correctly predicted inputs and adversarial examples, respectively. The authors suggest using L 2 -norm to compute the distance between the adversarial examples and their corresponding correctly predicted inputs. The loss function of ATN is as follows:

min x′ β · ||x′ − x|| 2 2 + ||M(x′) − r α (M(x), y*)|| 2 2 (10)

where β is a weight to balance the two terms and function r α (.) re-ranks M(x) with the expectation that the modification on the correctly predicted input x is the least:

r α (y, y*) = n(y′), where y′ k = α · max(y) if k = y* and y′ k = y k otherwise (11)

where α is greater than 1 and n(.) normalizes a vector into an array of probabilities. However, this method does not support modifying specific portions of a correctly predicted input to generate an adversarial example. Instead, this method adds perturbation to all pixels of the correctly predicted input. Adding perturbation to all pixels could be considered a pattern. This paper generalizes ATN to support different patterns, in which a pattern defines a set of modified pixels.
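The reranking function r α can be sketched as follows (our reading of the ATN formulation):

```python
import numpy as np

def rerank(y, y_target, alpha=1.5):
    """Sketch of ATN's r_alpha: set the target entry to alpha * max(y),
    then renormalize into a probability vector (the n(.) in the text)."""
    y = np.asarray(y, dtype=float).copy()
    y[y_target] = alpha * np.max(y)
    return y / np.sum(y)

y = np.array([0.7, 0.2, 0.1])  # model output: label 0 is most likely
print(rerank(y, 2, alpha=2.0))  # the target label 2 now has the largest mass
```

Because α > 1, the target entry always becomes the maximum, so the autoencoder is pushed toward outputs classified as the target label while the rest of the distribution stays close to M(x).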

Pattern-based adversarial transformation network
To generate high-quality and diverse adversarial examples, PatternAttack consists of two components: the generalized ATN and the adversary optimizer. The illustration of PatternAttack is shown in Fig. 4. In the first phase, the generalized ATN is applied to produce a set of adversarial examples based on patterns. However, these adversarial examples might not be of high quality since they could contain redundant adversarial pixels. Therefore, in the second phase, the proposed optimizer is employed to improve the quality of these adversarial examples. The proposed optimizer identifies redundant adversarial pixels and restores them to their original values. After restoring these pixels, the resulting adversarial examples are still classified as the target label; additionally, they are more similar to the correctly predicted inputs than the prior adversarial examples.

Generalized ATN
In the first phase of PatternAttack, the generalized ATN is applied to produce a set of adversarial examples. The architecture of the generalized ATN is based on a stacked convolutional autoencoder. These adversarial examples must satisfy a specific pattern. The objective of the generalized ATN satisfies two requirements. The first requirement is that the adversarial examples should be as similar to the correctly predicted inputs as possible. The second requirement is that the labels of the adversarial examples are the target label y*. Based on these two requirements, the objective of the generalized ATN is defined as follows:

min x′ (1 − ε) · L 2 (γ (P, x), res) 2 + ε · CE(y*, M(x′)) (12)

where P is the pattern, ε is the weight to balance the two terms, CE is cross-entropy, and res is the reconstructed image computed by ae(γ (P, x)) (ae is the autoencoder being trained). The function γ (P, x) creates a pattern map of x satisfying P. Specifically, if the pixel x i ∈ x does not satisfy the pattern P, the value of the corresponding element in the pattern map is zero. Otherwise, the value of the corresponding element is x i :

γ (P, x) i = x i if x i satisfies P, and 0 otherwise (13)

The adversarial example x′ is computed as x′ = x − γ (P, x) + γ (P, res). This equation replaces the pixels in x satisfying the pattern P with the new values returned by the autoencoder ae. All pixels not satisfying the pattern P remain unchanged. This research uses three patterns:
• All-pixel pattern: all pixels could be modified to generate adversarial examples. ATN uses this pattern to generate adversarial examples.
• Object boundary pattern: all pixels located at the edge of objects in correctly predicted inputs could be modified. For example, on the MNIST dataset (Lecun et al. 1998b), objects are digits.
• Class model map pattern (Simonyan et al. 2013): Let I be a class model map of a set of correctly predicted inputs classified as label y true . The dimension of I is the same as the dimension of the correctly predicted inputs. The class model map I is generated by minimizing the following objective function:

min I −S c (I) + λ · ||I|| 2 2 (14)

where S c is the score of the label y true , λ is a hyper-parameter, and I is initialized as a zero image. The authors use backpropagation to find I. An example of training a class model map is illustrated in Fig. 5. This class model map is trained on 1,000 correctly predicted inputs classified as label 3, taken from the MNIST dataset. From the leftmost side to the rightmost side, the class model map is captured after 10, 20, 30, and 40 epochs. The value of λ is chosen as 0.2. The darker a pixel in I, the larger its impact on the output neuron corresponding to the label y true .
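The class model map optimization can be sketched with a toy differentiable score (a hedged illustration; `score_grad`, `lam`, and `lr` are our stand-ins, not the paper's implementation):

```python
import numpy as np

def class_model_map(score_grad, shape, lam=0.2, lr=0.1, epochs=40):
    """Sketch of the class model map optimization: start from a zero image
    and follow the gradient of the class score S_c while shrinking I with
    the L2 penalty weighted by lambda."""
    I = np.zeros(shape)
    for _ in range(epochs):
        I += lr * (score_grad(I) - 2.0 * lam * I)  # ascend S_c, decay by lambda
    return I

# Toy linear score S_c(I) = w . I, whose gradient is the constant w.
w = np.array([1.0, -1.0])
I = class_model_map(lambda I: w, shape=w.shape)
# Pixels that push the score up turn positive, the others negative.
```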
Compared to Eq. 10, Eq. 12 has the following differences:
• The weight between terms: in our objective function, the weights are (1 − ε) for the first term and ε for the second term. In ATN, the first term has the weight β and the second term has the weight 1. Essentially, our weights are similar to the weights of ATN; we set up this type of weight to give a better intuition about the relative importance of the terms.
• Cross-entropy: in the second term of Eq. 10, ATN uses L 2 -norm, while PatternAttack uses the cross-entropy function. This leads to better optimization of Eq. 12.
• Pattern: in Eq. 10, all pixels could be modified. In Eq. 12, only pixels satisfying a pattern P could be modified.
• Function r α : this function is used in Eq. 10, while Eq. 12 does not use it. We only consider this function an optional choice: we found that using r α does not produce a better result than not using it when attacking CNN models based on patterns. Instead of using the function r α , this paper uses a general strategy by proposing an optimizer, which improves the quality of adversarial examples in terms of L 0 -norm and L 2 -norm.
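The pattern map γ and the assembly of the adversarial example described above can be sketched as follows (a minimal numpy illustration; representing a pattern as a binary mask is our assumption):

```python
import numpy as np

def pattern_map(P, x):
    """gamma(P, x): keep the pixels selected by the pattern, zero the rest.
    Here a pattern is assumed to be a binary mask of the same shape as x."""
    return np.where(P.astype(bool), x, 0.0)

def assemble_adversary(x, res, P):
    """x' = x - gamma(P, x) + gamma(P, res): only the pixels satisfying
    the pattern take the autoencoder's reconstructed values."""
    return x - pattern_map(P, x) + pattern_map(P, res)

x = np.array([1.0, 2.0, 3.0])
res = np.array([9.0, 9.0, 9.0])  # stand-in for ae(gamma(P, x))
P = np.array([1, 0, 1])          # pattern selects the first and last pixel
print(assemble_adversary(x, res, P))  # [9. 2. 9.]
```

The middle pixel is outside the pattern, so it keeps its original value regardless of the reconstruction.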

Adversarial example improvement
The adversarial examples generated by the generalized ATN might contain redundant perturbation. Hence, this section presents the proposed optimizer to improve the quality of adversarial examples in terms of L 0 -norm and L 2 -norm. We empirically observed that a large number of the generated adversarial pixels are redundant.

Algorithm 1 Improve the quality of adversarial examples in terms of L 0 -norm and L 2 -norm
Input: correctly predicted input x, adversarial example x′, target label y*, CNN M, step α, threshold δ, and decay rate t
Output: An improved adversarial example
1: while α > δ do
2:   S ← COMPUTE_DIFF_FEATURES(x, x′)
3:   S′ ← RANK_FEATURES(S)
4:   #block ← |S′|/α
5:   for b ← 0 to #block − 1 do
6:     start ← b · α
7:     end ← start + α
8:     if end > |S′| then
9:       end ← |S′|
10:    end if
11:    x clone ← x′
12:    x′ start...end−1 ← x start...end−1
13:    if argmax(M(x′)) ≠ y* then
14:      x′ ← x clone
15:    end if
16:    if CONVERGED(r) then
17:      return x′
18:    end if
19:  end for
20:  α ← α/t
21: end while
22: return x′

This research proposes Algorithm 1 to improve the quality of adversarial examples in terms of L 0 -norm and L 2 -norm. The inputs of this algorithm are an adversarial example x′, the correctly predicted input x of x′, a target label y*, an attacked model M, a step size α ≥ 1, a positive threshold δ < α, and a decay rate t ≥ 1. The output of the algorithm is an improved adversarial example. While α is greater than δ and the restored rate has not converged, the algorithm detects adversarial pixels by comparing x and x′, and stores these pixels in a set S (line 2). After that, the adversarial pixels of S are ranked by applying adversarial pixel ranking heuristics (line 3) and stored in S′. The first elements of S′ tend to have a low impact on the classification of the adversarial example x′; the last elements of S′ tend to affect the decision of the attacked model significantly. The proposed algorithm then sequentially restores the adversarial pixels in S′. A consecutive run of these adversarial pixels is called a block.
For each block from position start to position end, where the size of the block is the step α (lines 7-11), the algorithm updates the adversarial example x' by restoring the block (line 12). If the restoration changes the classification, it is undone and the block's pixels are reverted to their adversarial values (lines 13-15). After a full pass over the blocks, α is divided by t (line 20).
Algorithm 1 terminates when α is not greater than the threshold δ (line 1) or when the restored rate (denoted by r) converges (line 16). The restored rate r at the i-th modification is defined with respect to x' i, the improved adversarial example at the i-th modification. At the p-th modification, the restored rate r converges if and only if the improvement of the adversarial example has been less than β for the last k predictions, where k ≥ 1. The value of the step α, which must be greater than or equal to 1, plays an important role in the performance of the proposed optimizer. If α equals one, the proposed optimizer calls the prediction M(x') at least #block times. The cost of a single prediction call is expensive, and when #block is large, as in the case of all-pixel pattern, the total prediction cost could be extremely high. Therefore, the step α should be large enough to decrease this computational cost.
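The block-wise restoration loop described above can be sketched as follows. This is a simplified reconstruction rather than the paper's exact Algorithm 1: the restored-rate convergence check is omitted, and the ranking step is left as the identity in place of a JSMA or COI heuristic.

```python
import numpy as np

def improve_adversary(x, x_adv, target, predict, alpha=6, delta=0, t=2):
    """Restore adversarial pixels block by block, keeping only the
    restorations that preserve the target classification."""
    x, x_adv = x.ravel().copy(), x_adv.ravel().copy()
    while alpha > delta:
        diff = np.flatnonzero(x_adv != x)          # adversarial pixels (line 2)
        if diff.size == 0:
            break
        order = diff  # placeholder for a JSMA/COI ranking (line 3)
        for start in range(0, order.size, alpha):  # one block of size alpha
            block = order[start:start + alpha]
            saved = x_adv[block].copy()
            x_adv[block] = x[block]                # restore the block (line 12)
            if predict(x_adv) != target:           # attack broken?
                x_adv[block] = saved               # revert (lines 13-15)
        alpha //= t                                # decay the step (line 20)
    return x_adv
```

With a decaying step, early passes cheaply discard large low-impact blocks, and later passes refine the result pixel by pixel.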
Adversarial Pixel Ranking Heuristics: The purpose of the adversarial pixel ranking heuristics (line 3) is to estimate the impact of an adversarial pixel on the target neuron. The impact of adversarial pixel x i on the y*-th output neuron is denoted by s y* (x i). Adversarial pixel x i has a higher impact on the output neuron y* than adversarial pixel x j if and only if s y* (x i) > s y* (x j).
For a pixel x i of a correctly predicted input, there are several well-known pixel ranking heuristics such as JSMA (Papernot et al. 2015) and COI (Gopinath et al. 2019). Concerning JSMA, Papernot et al. (2015) propose two heuristics, which this research calls JSMA+ and JSMA−. These two ranking heuristics compute the scores of pixels of the correctly predicted input. Specifically, JSMA+ assigns a non-zero score to a pixel x i that should be increased to generate an adversarial example. By contrast, JSMA− assigns a non-zero score to a pixel x i that should be decreased to generate an adversarial example. Concerning COI, the score of a pixel x i is the product of its intensity and its gradient.
However, the above heuristics were not designed to rank adversarial pixels. In this research, we apply them in a different context: we compute the scores of adversarial pixels rather than the scores of pixels of the correctly predicted inputs. The score of an adversarial pixel x i is then defined analogously for each heuristic, i.e., JSMA+, JSMA−, and COI.
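As an illustration, the COI-style score of an adversarial pixel (intensity times gradient) and the resulting increasing-order ranking could look like this. The array layout and function names are our own; in practice, the gradients would be taken from the attacked model with respect to the target neuron.

```python
import numpy as np

def coi_scores(x_adv, grads, adv_idx):
    # COI: the score of pixel i is its intensity times the gradient of
    # the target neuron with respect to that pixel.
    return x_adv[adv_idx] * grads[adv_idx]

def rank_increasing(x_adv, grads, adv_idx):
    # Low-impact pixels come first, since the optimizer restores them first.
    return adv_idx[np.argsort(coi_scores(x_adv, grads, adv_idx))]
```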

Dataset
The experiments are conducted on the MNIST dataset (Lecun et al. 1998b) and the CIFAR-10 dataset (Krizhevsky et al. 2009) to demonstrate the advantages of PatternAttack. These two datasets are commonly used to evaluate the robustness of CNN models. The MNIST dataset is a collection of handwritten digits. It has 60,000 images in the training set and 10,000 images in the test set, each of dimension (28, 28, 1). There are 10 labels representing the digits from 0 to 9. The CIFAR-10 dataset is a collection of images for general-purpose image classification. It consists of 50,000 images in the training set and 10,000 images in the test set, each of dimension (32, 32, 3). Each pixel is a combination of three colours: red, green, and blue. There are 10 labels: aeroplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck.

Attacked models
This research trains LeNet-5 (Lecun et al. 1998a) and AlexNet (Krizhevsky et al. 2017) on the MNIST dataset and the CIFAR-10 dataset. LeNet-5 was proposed to recognize handwritten digits such as those in the MNIST dataset. AlexNet was introduced for general-purpose image classification such as the CIFAR-10 dataset, and is considered one of the most influential CNN architectures. The accuracies of the models trained with these architectures are shown in Table 2. As can be seen, LeNet-5 produces better results than AlexNet on the MNIST dataset, while AlexNet outperforms LeNet-5 on the CIFAR-10 dataset.

An attack m → n is implemented by a stacked convolutional autoencoder that reconstructs correctly predicted inputs labelled m into adversarial examples labelled n. To evaluate attacks m → n, the experiments split the MNIST dataset and the CIFAR-10 dataset into separate subsets. Let M i be a subset of the MNIST dataset and C i be a subset of the CIFAR-10 dataset; for simplicity, each subset is called a dataset. Based on the purpose of the evaluation, the experiments define three groups of datasets, namely G train, G val, and G new. Group G train = {M train, C train} is a portion of the training sets used to train the autoencoders; each dataset of G train consists of 1,000 correctly predicted inputs labelled m. Group G val = {M val, C val} is a portion of the training sets not used to train the autoencoders; each dataset of G val contains 4,000 correctly predicted inputs labelled m. Group G new = {M new, C new} comes from the test sets of the attacked models. M new has 1,000 correctly predicted inputs labelled m. Due to the limited accuracy of the two CNNs on the CIFAR-10 test set, C new has 862 and 759 correctly predicted inputs labelled m for AlexNet and LeNet-5, respectively.
Group G val and Group G new are used to evaluate the generalization ability of the autoencoders trained on Group G train. For each CNN model, there are 90 possible attacks in total (i.e., 10 original labels × 9 target labels). However, this research selects some representative attacks to make a comprehensive comparison. Specifically, on the MNIST dataset, the attack is 9 → 7 for AlexNet and 9 → 4 for LeNet-5; because the shape of 9 is very different from 7 and 4, attacking images of the digit 9 is likely hard. On the CIFAR-10 dataset, the attack is truck → horse for AlexNet and truck → deer for LeNet-5; because the shape of a truck is very different from that of a horse or a deer, attacking truck images is likely hard.
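The construction of one dataset of a group (e.g., M train) can be sketched as follows. The function name and the toy classifier used in the example are our own; the essential point is that a dataset keeps only correctly predicted inputs of the original label m.

```python
import numpy as np

def build_dataset(images, labels, predict, m, n_samples):
    """Keep inputs whose ground-truth label is m AND that the attacked
    model also classifies as m (i.e., correctly predicted inputs)."""
    keep = [i for i, img in enumerate(images)
            if labels[i] == m and predict(img) == m]
    return images[np.array(keep[:n_samples])]
```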

Architecture of generalized ATN
It is difficult to automatically select an architecture of a stacked convolutional autoencoder that attacks a CNN model well. Instead, the architecture of the stacked convolutional autoencoder should be chosen manually based on the experience of attackers. In this research, for each dataset, the experiments use one stacked convolutional autoencoder for all attacked models. These stacked convolutional autoencoders are shown in Tables 3 and 4. Group G train is used to train the autoencoders. Each autoencoder is trained for up to 500 epochs with a batch size of 256. The experiment uses an early stopping strategy that terminates the training when there is no improvement in the loss over subsequent epochs. After the training process, correctly predicted inputs are fed into the trained autoencoders.

Adversary improvement
The proposed optimizer is used to improve the quality of adversarial examples in terms of the L0-norm and the L2-norm. It uses four main parameters to adjust its performance: the step α, the threshold δ, the decay rate t, and the ranking heuristic. In our experiments, δ is set to 0 and the decay rate t is set to 2.
Step α: By analyzing the results for different steps α, we observed that an appropriate value of α has a positive impact on the performance of the proposed optimizer, while a value that is too large or too small might slow it down. Based on the empirical data, we suggest α = 6 for attacking models on the MNIST dataset and α = 30 for the CIFAR-10 dataset.
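The effect of α on the prediction cost can be illustrated by the block count per pass: under our reading of the algorithm, each pass makes one prediction call per block, with #block = ⌈|S'|/α⌉.

```python
def calls_per_sweep(n_adv_pixels, alpha):
    # One prediction call per block; #block = ceil(n_adv_pixels / alpha).
    return -(-n_adv_pixels // alpha)  # ceiling division
```

For an all-pixel pattern on MNIST (up to 784 adversarial pixels), α = 1 costs up to 784 prediction calls per pass, while α = 6 costs at most 131.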
Ranking heuristics: The experiment uses three ranking heuristics: COI, JSMA, and random. The random heuristic is used as a baseline. While COI and JSMA arrange adversarial pixels in increasing order of impact on the target neuron, the random heuristic sorts adversarial pixels randomly. For the MNIST dataset, applying COI or JSMA tends to make the average restored rate converge faster than the random heuristic. For the CIFAR-10 dataset, we observed that JSMA usually improves adversarial examples most effectively.

Configuration of compared methods
We compare PatternAttack with other state-of-the-art methods, including targeted FGSM, box-constrained L-BFGS, and Carlini-Wagner L2 (implementation: https://github.com/carlini/nn_robust_attacks). The weights in targeted FGSM and box-constrained L-BFGS are the same as in the generalized ATN. Carlini-Wagner L2 does not require choosing a weight manually; it uses a binary search algorithm to find the optimal weight, producing high-quality adversarial examples with a high success rate.

RQ1: Improvement in diversity issue
This section investigates the ability of the generalized ATN to improve the low diversity of adversarial examples. In order to generate diverse adversarial examples, the experiment applies different patterns including all-pixel pattern, object boundary pattern, and class model map pattern. Table 5 compares the average success rates of the generalized ATN and other comparable methods. There are two promising results.
Firstly, the generalized ATN could generate adversarial examples with more varied perturbations than the other methods. Specifically, all-pixel pattern achieves the best average success rates among the used patterns on both the MNIST dataset and the CIFAR-10 dataset; in most cases, its average success rate is above 99%. The average success rate of object boundary pattern ranges from 6.2% to 55.1%, and that of class model map pattern ranges from 9.4% to 68.3%. If the generalized ATN only used all-pixel pattern, like Carlini-Wagner L2, targeted FGSM, and L-BFGS, a large number of adversarial examples would be ignored.
Secondly, the generalized ATN could generate adversarial examples from new datasets with an average success rate close to that on the trained datasets. The trained datasets are in Group G train (i.e., M train and C train). The new datasets are in Group G val (i.e., M val and C val) and Group G new (i.e., M new and C new) and have the same classification purpose as the trained datasets. The maximum difference between the average success rate on the new datasets and the trained datasets is about 10%, which is acceptable. This means that if the trained autoencoders achieve a high average success rate on the trained datasets, they are likely to achieve similar average success rates on other similar datasets. This generalization ability of the generalized ATN is not supported by the comparable methods.

RQ2: Improvement in quality issue
This section investigates the ability of the proposed optimizer to enhance the quality of adversarial examples. The conclusion is that the proposed optimizer enhances the quality of adversarial examples significantly for the generalized ATN, targeted FGSM, and box-constrained L-BFGS, and slightly for the adversarial examples generated by Carlini-Wagner L2.
In the proposed optimizer, rather than improving the quality of adversarial examples individually, we put them in a batch and improve them at the same time to reduce the computational cost. Because the used heuristics produce approximately the same final results, the experiment reports the average reduction rate for simplicity. The average reduction rate is computed as (a − b)/a, where a is the average L p -norm distance without applying the ranking heuristics and b is the average L p -norm distance after applying them. Table 6 shows the average reduction rates of the L0-norm and the L2-norm when applying the proposed optimizer. Generally, both norms can be reduced by applying the proposed optimizer. In particular, concerning the L0-norm, we observed that the proposed optimizer could reduce a large number of adversarial pixels to a single adversarial pixel. For the generalized ATN, targeted FGSM, and box-constrained L-BFGS, the proposed optimizer enhances the quality of the generated adversarial examples significantly: it decreases the L0-norm by at most about 98% and the L2-norm by at most about 89%. For Carlini-Wagner L2, the proposed optimizer improves the quality of adversarial examples only slightly. This is explainable since Carlini-Wagner L2 usually generates adversarial examples with a minimum L2-norm: it uses a binary search algorithm to find the optimal weight in Eq. 8, which leads to a minimum L2-norm.
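The reported metric can be sketched as follows; the function and variable names are ours, and the L0 case is handled separately since it is a count rather than a norm.

```python
import numpy as np

def lp_reduction_rate(x, adv_before, adv_after, p):
    """Reduction rate (a - b) / a, where a and b are the L_p distances
    to the clean input before and after the optimizer runs."""
    if p == 0:
        a = np.count_nonzero(adv_before - x)  # number of adversarial pixels
        b = np.count_nonzero(adv_after - x)
    else:
        a = np.linalg.norm((adv_before - x).ravel(), ord=p)
        b = np.linalg.norm((adv_after - x).ravel(), ord=p)
    return (a - b) / a
```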
Examples of improved adversarial examples are shown in Fig. 6. The labels origin and adversary denote the correctly predicted inputs and their corresponding adversarial examples, respectively. The label difference shows, in white, the difference between the adversarial examples and their improved versions.

Table 7 shows the overall performance comparison (in seconds) between PatternAttack and the other methods. The trained autoencoders could generate adversarial examples from the new datasets (i.e., Group G val and Group G new) with a lower computational cost than from the trained datasets (i.e., Group G train). The computational cost on Group G train includes the training process, the adversarial example generation process, and the improvement process for adversarial examples. The computational cost on Group G val and Group G new includes only the adversarial example generation process and the improvement process. As can be seen, the overall performance of adversarial example generation on Group G val and Group G new is from 2 to 27 times faster than on Group G train. This means that machine learning testers only need to spend effort on training the autoencoders once; the trained autoencoders can be reused to generate adversarial examples from new correctly predicted inputs at an extremely low computational cost. By contrast, the other methods do not reuse information from previous adversarial example generations: every adversarial example is generated independently of the others. As a result, when dealing with a set of correctly predicted inputs, PatternAttack usually produces adversarial examples faster than the other methods.

Another promising finding is that PatternAttack mitigates the trade-off between the average success rate and the overall performance. Targeted FGSM and box-constrained L-BFGS are mostly worse than PatternAttack in both respects. Concerning Carlini-Wagner L2, although this method mostly achieves a 100% average success rate, as shown in Table 5, it consumes a large amount of computation, mostly from about 1,000 to around 3,000 seconds. By contrast, by applying all-pixel pattern, PatternAttack not only achieves average success rates close to those of Carlini-Wagner L2 but also requires a significantly smaller computational cost. In particular, applying all-pixel pattern is from 7.5 times (i.e., LeNet-5 with dataset M train) to 80 times (i.e., AlexNet with dataset M val) faster than Carlini-Wagner L2.

Discussion
Integral constraint: On the MNIST dataset and the CIFAR-10 dataset, the value of each pixel is an integer in the set D = {0, 1, ..., 255}. After normalization, the value of each pixel is in the set D' = {0, 1/255, 2/255, ..., 1}. The value of an adversarial pixel must lie in the valid set D or D'. However, similarly to targeted FGSM (Goodfellow et al. 2015) and Carlini-Wagner (Carlini and Wagner 2016), we ignore this requirement in the objective function of the stacked convolutional autoencoder in the experiments. Instead, the output of the stacked convolutional autoencoder is continuous in the range [0, 1] and is then normalized to the valid domain. The output x' of the stacked convolutional autoencoder is an adversarial example if and only if M(round(x' · 255)/255) = y*, where y* is the target label. We observed that ignoring the integral constraint of adversarial examples in the objective of the stacked convolutional autoencoder rarely affects the success rates of the attacks.
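The post-hoc validity check can be sketched as follows. The use of rounding to map the continuous output onto D' is our assumption about the normalization step, and the function name is illustrative.

```python
import numpy as np

def quantize_and_check(x_adv, predict, target):
    """Snap each continuous pixel in [0, 1] onto the valid set
    D' = {0, 1/255, ..., 1}, then re-check the target classification."""
    x_valid = np.round(x_adv * 255) / 255
    return x_valid, predict(x_valid) == target
```

The attack counts as successful only if the quantized image is still classified as the target label y*.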
The generation of a class model map: A class model map I of a label l is generated from a set of correctly predicted inputs classified as l. In Objective 14, the process of finding I is a training procedure. A major problem of this procedure is that the generated I could be a locally optimal class model map. This issue could be addressed by applying different gradient-based techniques such as Guided Backpropagation (Springenberg et al. 2014), Excitation Backprop, etc.

Conclusion
We have presented the PatternAttack method to generate high-quality and diverse adversarial examples. PatternAttack has two components: the generalized ATN and the adversary optimizer. Firstly, the generalized ATN is a pattern-based stacked convolutional autoencoder used to generate diverse adversarial examples for CNNs. We conducted experiments with three patterns: all-pixel pattern, object boundary pattern, and class model map pattern. Secondly, we introduced an optimizer to enhance the quality of adversarial examples. The main idea of the proposed optimizer is to restore redundant adversarial pixels based on their impact on the classification of the attacked model.
The comprehensive experiments on the MNIST dataset and the CIFAR-10 dataset have shown several promising results. In terms of adversarial diversity, the generalized ATN could generate diverse adversarial examples by using different patterns; notably, with all-pixel pattern, most of the attacks achieve an average success rate above 99%. In terms of adversarial quality, concerning the L0-norm, the proposed optimizer could reduce hundreds of adversarial pixels to a single adversarial pixel; concerning the L2-norm, it reduces the average distance considerably. The proposed optimizer enhances the quality of adversarial examples generated by targeted FGSM and box-constrained L-BFGS significantly, and by Carlini-Wagner L2 slightly. Additionally, we suggest using the adversarial pixel ranking heuristics COI and JSMA in the proposed optimizer, as they achieve better performance than the random heuristic. In the future, PatternAttack will be extended to support other distance metrics such as the L∞-norm. In addition, the research will make a more comprehensive comparison on other well-known datasets.