Imperceptible adversarial attacks against traffic scene recognition

Adversarial examples have begun to receive widespread attention owing to the potential damage they pose to the most popular DNNs. They are crafted from original images by embedding carefully calculated perturbations. In some cases, the perturbations are so slight that neither human eyes nor detection algorithms can notice them, and this imperceptibility makes the attacks more covert and dangerous. To investigate the invisible dangers in applications of traffic DNNs, we focus on imperceptible adversarial attacks on different traffic vision tasks, including traffic sign classification, lane detection and street scene recognition. We propose a universal logits-map-based attack architecture against image semantic segmentation and design two targeted attack approaches on it. All the attack algorithms generate micro-noise adversarial examples through iterative C&W optimization and achieve a 100% attack success rate with very low distortion: our experimental results show that the MAE (mean absolute error) of the perturbation noise in the traffic sign classifier attack is as low as 0.562, while that of the two semantic segmentation attacks is only 1.503 and 1.574, respectively. We believe that our research on imperceptible adversarial attacks offers a useful reference for the security of DNN applications.


Introduction
DNN applications have shown significant potential in computer vision. However, research in recent years has disclosed that DNNs are vulnerable to adversarial attacks (Szegedy et al. 2014; Goodfellow et al. 2015). With specially crafted perturbations added to raw images, adversarial examples can easily fool the models into wrong predictions. Adversarial examples can attack not only image classifiers (Szegedy et al. 2014; Kurakin et al. 2017; Carlini and Wagner 2017) but also semantic segmentation and object detection models (Arnab et al. 2018; Alassad et al. 2021; Merrer et al. 2020; Ma et al. 2020; Taheri et al. 2020). However, most existing attack algorithms introduce visible distortion or too much additional noise (Shen et al. 2018; Naseer et al. 2018; Boloor et al. 2020; Failed 2017), so they can easily be detected and thus lose their effect. It remains a large challenge to design adversarial examples that can fool not only detection algorithms but also human eyes. In this paper, we focus on imperceptible adversarial examples and study imperceptible adversarial attacks against different traffic scene recognition tasks, including traffic sign classification, lane detection and street scene segmentation. Figure 1 shows the three adversarial attacks. Our experimental results reveal that adversarial attacks can be implemented on various network models and that the attack effects can be designed arbitrarily. Moreover, the perturbation noise can be very small and imperceptible to both humans and detection algorithms. Our attacks are all white-box attacks (Ma et al. 2020; Goodfellow et al. 2015); the three networks involved are the well-known MobileNetV2 (Sandler et al. 2018), U-net (Ronneberger et al. 2015) and DeepLabV3+, which have been trained on the well-known datasets BelgiumTS (Timofte et al. 2014), Pascal VOC (Everingham et al. 2010) and KITTI (Geiger et al. 2012), respectively.
The contributions of this paper include: we study adversarial attacks on automatic traffic identification systems and reveal their potential security risks; we propose a universal logits-map-based attack architecture against semantic segmentation and design two targeted attack approaches on it.

Related work

Kurakin et al. proposed the basic iterative method (BIM) (Kurakin et al. 2017). They also proposed a targeted attack method based on BIM, the iterative least-likely class method (ILCM) (Kurakin et al. 2017). ILCM can fool models into predicting any pre-set class, even the least-likely one. Besides that, Carlini and Wagner presented a powerful attack approach (C&W) using a special objective function designed to suit optimization (Carlini and Wagner 2017); the approach is regarded as more robust while causing less distortion. Note that, at that stage, almost all approaches were designed for attacking image classifiers.
As attack research progressed, many studies began trying to transfer classifier attacks to other DNN models such as semantic segmentation. But due to architectural differences, most classifier attack algorithms cannot be applied directly to semantic segmentation models. It is commonly recognized that generating adversarial examples on semantic segmentation models is much harder than on classifiers (Arnab et al. 2018; Alassad et al. 2021; Osahor and Nasrabadi 2019; Klingner et al. 2020), because: (1) the structures of semantic segmentation models are usually more complicated than those of classifiers; (2) the attack purpose on semantic segmentation is flexible and needs more manipulation; (3) it is hard to set a success standard for an attack.
To investigate the feasibility and imperceptibility of adversarial attacks on various networks, we build three models for different applications of traffic scene recognition and attack them at the lowest distortion cost. Our experimental results show that all the adversarial examples are of good quality and hard to detect by either humans or detection algorithms.

Adversarial attack against traffic sign classifier
In this section, we investigate several adversarial attacks against a traffic sign classifier. We adopt MobileNetV2 pre-trained on the ImageNet dataset as our target network. To train the model to classify various traffic signs, we fine-tune the MobileNetV2 by unlocking the last five parameter layers for training. The training dataset is Belgium traffic sign (BelgiumTS) (Timofte et al. 2014), which contains 62 types of traffic signs and 7125 images in total. After 50 epochs, the model achieves 0.974 training accuracy and 0.887 validation accuracy, and we then conduct various adversarial experiments on it to defeat its classification strategies.

Fig. 1 Adversarial attacks against traffic scene recognition. The first branch is the adversarial attack experiment on traffic sign recognition, the second branch is the lane detection experiment, and the third branch is street scene recognition

Method
To generate adversarial traffic sign images, the C&W strategy is adopted in our program, which applies gradient descent to solve the following optimization problem:

\min_{I'} \; \|I' - I\|_2^2 + a \cdot f(I') \quad (1)

f(I') = \max\big( \max_{i \neq t} Z(I')_i - Z(I')_t, \; -\kappa \big) \quad (2)

where I' is the adversarial example of I, f(I') represents the objective function of the adversarial attack, Z(I')_i denotes the i-th logits value, and t is the target class. The hyper-parameter a adjusts the proportion between the L2-norm constraint and the attack strength, and κ ensures high confidence in the misled class. To search for the best combination of coefficients, we let a start at 1 and multiply it by 10 each time, and let κ start at 0 and increase by 5 each time; a = 100, κ = 20 are the final coefficient values, which give superior attack performance and ensure a 100% attack success rate (Carlini and Wagner 2017). In addition, to avoid repeatedly clipping to [0, 1] during optimization, we apply a variable w to substitute for I':

I' = \tfrac{1}{2}\big(\tanh(w) + 1\big) \quad (3)

Figure 2 shows the diagram of Formula (3); it indicates that this function protects white or bright-color areas from severe modification, thus keeping the visual quality of the whole image.
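As an illustration of the optimization above, the following is a minimal NumPy sketch of the C&W iteration with the tanh change of variable. A tiny linear model stands in for the real classifier's logits layer; the matrix `W`, the toy image `x0`, κ = 0.1 and the learning rate are illustrative choices scaled to the toy problem, not the paper's settings (only a = 100 matches the text).

```python
import numpy as np

# Toy stand-in for the classifier's logits layer: Z(x) = W @ x.
W = np.array([[2.0, 0.0, 0.0, 0.0],
              [0.0, 2.0, 0.0, 0.0],
              [0.0, 0.0, 2.0, 0.0]])
x0 = np.array([0.8, 0.3, 0.3, 0.5])      # toy "image" with pixels in [0, 1]
target = 1                               # desired (wrong) class; original is 0

a, kappa, lr = 100.0, 0.1, 0.01          # kappa and lr are toy-scaled choices
w = np.arctanh(2 * x0 - 1)               # change of variable: x' = (tanh(w)+1)/2

best, best_dist = None, np.inf
for _ in range(300):
    x_adv = (np.tanh(w) + 1) / 2
    z = W @ x_adv
    z_other = z.copy()
    z_other[target] = -np.inf
    j = int(np.argmax(z_other))          # strongest class other than the target

    # Track the least-distorted example that is already misclassified as target.
    if z[target] > z[j]:
        dist = np.sum((x_adv - x0) ** 2)
        if dist < best_dist:
            best, best_dist = x_adv.copy(), dist

    # Gradient of ||x'-x0||^2 + a*max(z_j - z_t, -kappa) w.r.t. x', then w.
    grad_x = 2 * (x_adv - x0)
    if z[j] - z[target] > -kappa:
        grad_x = grad_x + a * (W[j] - W[target])
    w -= lr * grad_x * 0.5 * (1 - np.tanh(w) ** 2)

print(int(np.argmax(W @ best)))          # → 1 (the target class)
```

Because the attack term switches off once the margin κ is reached, the iterate oscillates near the decision boundary; keeping the least-distorted successful example, as real C&W implementations do, makes the loop robust to that oscillation.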
To investigate the performance of the above method, the ILCM and L-BFGS algorithms are also tested on the same model. Figure 3 shows the adversarial instances generated by the different attack algorithms. All three methods are targeted attacks: we set the target class to ''Straight'' and stop the iterative optimization once the target class confidence reaches at least 0.9. Under the same attack conditions, C&W produces the smoothest noise fluctuations and the least distortion, so it shows the best quality.
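For comparison, ILCM can be sketched in a similar toy setting: it repeatedly steps against the sign of the cross-entropy gradient toward the target class until the target confidence reaches 0.9. The linear model `W`, the step size and the toy image are again illustrative assumptions, not the paper's configuration.

```python
import numpy as np

W = np.array([[4.0, 0.0, 0.0, 0.0],      # toy logits layer Z(x) = W @ x
              [0.0, 4.0, 0.0, 0.0],
              [0.0, 0.0, 4.0, 0.0]])
x = np.array([0.8, 0.3, 0.3, 0.5])       # initially predicted as class 0
target = 1
eps = 0.01                               # per-step attack strength (illustrative)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

onehot = np.eye(3)[target]
for _ in range(500):
    p = softmax(W @ x)
    if p[target] >= 0.9:                 # stop at 0.9 target-class confidence
        break
    grad = W.T @ (p - onehot)            # d cross-entropy(target) / dx
    x = np.clip(x - eps * np.sign(grad), 0.0, 1.0)
```

Unlike C&W, each step moves every pixel by the full ±ε, which is why ILCM tends to show coarser noise than C&W under the same stopping condition.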
To evaluate the perturbations of each algorithm more comprehensively, we randomly select 100 images from the BelgiumTS validation set and generate adversarial examples with the above methods. The images are considered in the [0, 255] range, and the target class is set by C_target = mod(C_correct + 30, 62), where C_correct is the correct class of the original image. Three criteria, mean squared error (MSE), mean absolute error (MAE) and the L-infinity norm (L∞), are introduced to measure the perturbation strength. MSE and MAE represent two different distances between the original image and the adversarial image, while L∞ indicates the maximum pixel modification. The smaller the values of these three criteria, the smaller the perturbation and the better the algorithm. The criteria are evaluated from two perspectives: the 0.9 Confidence Satisfied and the Lowest Threshold. For the 0.9 Confidence Satisfied, we set the learning rate (lr, for L-BFGS and C&W) or the attack strength factor (ε, for ILCM) to 0.01, and the adversarial example is iteratively optimized until it is identified as the target class with a confidence of at least 0.9. The Lowest Threshold means the least perturbation that misleads the model, which we search for by doubling lr (or ε) from 0.0001. Table 1 shows the results. On the Lowest Threshold, surprisingly, the MAEs of all methods are less than 1. Compared to the [0, 255] range of image values, the modification scale is very slight and hard to notice. C&W achieves both the lowest MSE and the lowest MAE, indicating the smallest amount of perturbation noise. Although it does not achieve the minimum L∞, adopting Formula (3) effectively avoids obvious noise in bright areas (i.e., eye-sensitive areas) and protects image quality.
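The three criteria and the target-class rule can be computed directly; the helper names below are ours, not from the paper.

```python
import numpy as np

def perturbation_stats(orig, adv):
    """MSE, MAE and L-infinity norm between two images in the [0, 255] range."""
    d = adv.astype(np.float64) - orig.astype(np.float64)
    return (d ** 2).mean(), np.abs(d).mean(), np.abs(d).max()

def target_class(c_correct, n_classes=62, offset=30):
    # C_target = mod(C_correct + 30, 62), as used for the 62 BelgiumTS classes.
    return (c_correct + offset) % n_classes

# Tiny worked example on a 2x2 "image".
orig = np.array([[100.0, 200.0], [50.0, 0.0]])
adv = orig + np.array([[0.5, -0.5], [1.0, 0.0]])
mse, mae, linf = perturbation_stats(orig, adv)
# mse = 0.375, mae = 0.5, linf = 1.0
```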
Inspired by the superiority of C&W, we develop our semantic segmentation attacks using the same optimization strategy. However, the C&W objective function is not directly applicable to segmentation models, so we must craft proper optimization objectives that consider both the attack intent and the features of segmentation networks. In the following, we put forward a universal attack architecture against semantic segmentation.

Semantic segmentation attack
In semantic segmentation attacks, non-targeted methods simply fool the models into producing unrecognizable maps, which are useless and meaningless, while targeted attacks produce plausible maps that look ''normal'' but carry covert destructive intent. Figure 4 shows four attack patterns. The first two rows show two non-targeted methods, which output chaotic and striped maps (Poursaeed et al. 2018). The third row shows a category-hidden attack, which outputs a ''normal'' map with little pedestrian information (Failed 2017). The fourth row shows an explicit attack that can indicate a wrong driving direction by painting black lines on the road image (Boloor et al. 2020). By contrast, the third method has less perturbation noise and its prediction map looks more normal, so it is the most difficult to detect.

Universal attack architecture
To support adversarial attacks on different models, we propose a universal adversarial architecture for semantic segmentation based on the targeted logits map, shown in Fig. 5. In the architecture, the bottom black flow line represents the regular semantic segmentation prediction. The pixel-wise logits map is extracted from it and modified into the targeted logits map (the yellow cuboid). The adversarial example generation frame is in the top dotted rounded rectangle; the red flow lines form a cycle in which the adversarial example is iteratively optimized by constantly reducing the loss between the two logits maps until the attack purpose is satisfied.
The key step of this attack architecture is introducing the targeted logits map. According to the purpose of the attack, the attacker tampers the normal prediction map into the desired target effect and, through iterative optimization, obtains adversarial examples that yield the desired output map. In the following sections, we propose two attack methods against traffic scene segmentation for different tasks. The C&W optimization strategy is adopted in both methods to generate adversarial examples, and the Tanh function described in Formula (3) is also used to improve their imperceptibility.
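The cycle in the architecture can be sketched end to end on a toy per-pixel ''segmentation'' model: extract the pixel-wise logits map, tamper one pixel's logits to flip its class, then iteratively optimize the input against the L2 distortion plus a weighted logits-map loss. The linear model, learning rate and four-pixel image are illustrative assumptions, not the paper's networks.

```python
import numpy as np

# Toy per-pixel segmentation "model": logits Z[p, c] = v[c] * img[p] + b[c].
v = np.array([4.0, -4.0])
b = np.array([0.0, 4.0])

def logits(img):
    return img[:, None] * v[None, :] + b[None, :]   # shape (pixels, classes)

img = np.array([0.8, 0.7, 0.3, 0.2])
Z = logits(img)
assert list(np.argmax(Z, axis=1)) == [0, 0, 1, 1]   # regular prediction map

# Craft the targeted logits map: copy the pixel-wise logits, then tamper
# pixel 1 so its two class logits are swapped (flipping its predicted class).
Z_t = Z.copy()
Z_t[1] = Z_t[1][::-1]

# Iteratively optimize the input: L2 distortion + a * logits-map loss.
a, lr = 100.0, 1e-4
adv = img.copy()
for _ in range(2000):
    diff = logits(adv) - Z_t                        # (pixels, classes)
    grad = 2 * (adv - img) + a * 2 * (diff * v[None, :]).sum(axis=1)
    adv -= lr * grad

pred = np.argmax(logits(adv), axis=1)               # → [0, 1, 1, 1]
```

Only the tampered pixel moves; the untouched pixels already match their own logits targets, so the distortion stays localized, which mirrors why logits-map attacks can remain imperceptible.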

Lane attack
In this section, we provide an adversarial attack method whose purpose is to deflect the detected lane. Both the target direction and the deflection angle can be set arbitrarily.
The state-of-the-art DeepLabV3+ network is adopted as our attack model. After being trained on the KITTI dataset (a famous road scene dataset) (Geiger et al. 2012), the model achieves 0.974 training accuracy and 0.921 validation accuracy. Then we perform the adversarial attack against it. We construct the targeted logits maps by shearing the original logits maps. Define Z_{x,y,c} to be the logits value at (x, y, c), where x, y are the pixel coordinates and c is the class channel. Z'_{x,y,c} represents the targeted logits at (x, y, c) and is set as:

Z'_{x,y,c} = Z_{x',y,c} \quad (4)

where x' is obtained with:

x' = \operatorname{clip}\big(x + y \tan\theta,\; 0,\; W-1\big) \quad (5)

Here θ is the attack angle (positive to the left, negative to the right) and W is the width of the original image. After constructing Z', the adversarial example can be generated by solving the C&W-style optimization problem:

\min_{w} \; \|I' - I\|_2^2 + a \sum_{x,y,c} \big( Z(I')_{x,y,c} - Z'_{x,y,c} \big)^2 \quad (6)

where a = 100, also obtained from the pretest, which implements most attacks within a configured number of iterations. Figure 6 demonstrates the attack against lane scene detection. We extract the pixel-wise logits map during prediction and shear it by 45 degrees to the left and to the right. Here, lr = 0.001, and the adversarial examples are generated after 300 iterations of optimization. The attacked results show obvious deflections of about ±45 degrees, yet the adversarial examples look the same as the raw images. Figure 7 shows another instance with ±60 degree deflection attacks.
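The shear construction of the targeted logits map can be sketched as follows, assuming our reading of the garbled formula, x' = clip(x + y·tanθ, 0, W−1); the toy logits map with a vertical ''lane'' channel is also an illustrative assumption.

```python
import numpy as np

def shear_logits(Z, theta_deg):
    """Build a targeted logits map Z' by shearing Z along the x axis.

    Z has shape (H, W, C). Each output row y samples column
    x' = clip(round(x + y * tan(theta)), 0, W - 1) of the original map,
    so a positive angle deflects the content toward the left.
    """
    H, W, C = Z.shape
    t = np.tan(np.radians(theta_deg))
    Zp = np.empty_like(Z)
    for y in range(H):
        xs = np.clip(np.round(np.arange(W) + y * t).astype(int), 0, W - 1)
        Zp[y] = Z[y, xs]
    return Zp

# Toy 4x5 logits map, 2 channels; the "lane" channel peaks in column 2.
Z = np.zeros((4, 5, 2))
Z[..., 0] = 1.0
Z[:, 2, 1] = 5.0

Zp = shear_logits(Z, 45.0)
lane_cols = [int(np.argmax(Zp[y, :, 1])) for y in range(3)]  # → [2, 1, 0]
```

After a 45 degree shear the lane peak drifts one column per row, i.e., the straight lane in the targeted map now slants left, which is exactly the prediction the optimization in Formula-style loss drives the adversarial example toward.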
One hundred random images from the KITTI validation set are tested with the same +45 degree attack. Table 2 shows the mean MSE, MAE and L∞, evaluated at two attack intensities: ordinary attack (lr = 0.001 with 300 iterations) and low-distortion attack (lr = 0.0001 with 1000 iterations). The two settings obtain similar attack results but introduce different perturbations. In fact, the perturbations under both settings are imperceptible: the smaller the learning rate, the smaller the perturbation, at the cost of more iterations. With lr = 0.0001, our method significantly misleads the lane detection model with a very slight perturbation whose average MAE is only 1.503.

Street scene recognition attack
In this section, we demonstrate another semantic segmentation attack, the people-hidden attack, which fools the model into detecting no people at all. The model adopted in this task is the U-net network (Ronneberger et al. 2015), and the training and testing datasets are created from the Pascal VOC2012 dataset (Everingham et al. 2010). The model is trained to identify five categories: people, cars, motorcycles, bicycles and background, where the background category represents all other objects. The trained model achieves 0.967 training accuracy and 0.863 validation accuracy. The attack purpose of this task is to hide all the people in the scene without affecting the identification of the other categories, so the targeted logits map Z' must be crafted in advance.
To hide the ''people,'' we could simply set the C_people channel of Z' to 0, but that would make it hard for the algorithm to converge and generate an adversarial example. Here, C_people denotes the ''people'' channel. Therefore, before zeroing it, we distribute the ''people'' logits to the other channels: each ''people'' value in the logits map is added to the channel that, at the same pixel, has the maximum value among all categories except ''people.'' Z' is obtained as follows:

Z'_{x,y,C_{\text{others}}} = Z_{x,y,C_{\text{others}}} \quad (7)

Z'_{x,y,C_{\max}} = Z_{x,y,C_{\max}} + Z_{x,y,C_{\text{people}}}, \qquad Z'_{x,y,C_{\text{people}}} = 0 \quad (8)

where C_others denotes all categories except people, and C_max is the channel selected to receive the ''people'' value, calculated as:

C_{\max} = \arg\max_{c \neq C_{\text{people}}} Z_{x,y,c} \quad (9)

With this modified Z', we conduct the street scene attack, whose optimization mechanism is similar to that of our lane attack. The termination condition of this task is that no pixel in the final prediction map belongs to the ''people'' category. Figure 8 shows two adversarial instances. Their final output maps demonstrate that all the ''people'' segmentation areas completely disappear, replaced by pixels of other categories. One hundred images containing people, selected from the validation set, are tested at two intensities: ordinary attack (lr = 0.001) and low-distortion attack (lr = 0.0001). Table 3 shows the mean MSE, MAE and L∞. It leads to the same conclusion: our people-hidden method can also imperceptibly fool street scene recognition. With the low-distortion attack, the mean MAE is only 1.574. All the experiments above are run on a workstation with an NVIDIA RTX 3000.
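The logits-redistribution step above can be sketched as a small NumPy routine; the three-category toy map and the function name are our illustrative assumptions.

```python
import numpy as np

def hide_category(Z, c_hidden):
    """Redistribute a hidden category's logits before zeroing them.

    Z: logits map of shape (H, W, C). At each pixel, the hidden category's
    logit is added to the strongest remaining channel (C_max), then the
    hidden channel is set to 0.
    """
    Zp = Z.copy()
    masked = Zp.copy()
    masked[..., c_hidden] = -np.inf
    c_max = np.argmax(masked, axis=-1)         # (H, W) strongest non-hidden class
    h, w = np.indices(c_max.shape)
    Zp[h, w, c_max] += Zp[..., c_hidden]       # Z'_{C_max} = Z_{C_max} + Z_{people}
    Zp[..., c_hidden] = 0.0                    # Z'_{people} = 0
    return Zp

# Toy 1x2 logits map with channels (background, car, people); people is c=2.
Z = np.array([[[1.0, 2.0, 5.0],                # "people" pixel
               [3.0, 1.0, 0.5]]])              # "background" pixel
Zt = hide_category(Z, c_hidden=2)
pred = np.argmax(Zt, axis=-1)                  # → [[1, 0]]: no "people" left
```

Adding the hidden logit to the runner-up channel keeps the targeted map close to the reachable logits, which is why this converges more easily than simply zeroing the channel.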

Conclusion
As a substitute for the human eye, DNNs are widely used in many recognition tasks. For highly critical applications, such as self-driving cars, safety is a major concern. However, recent research has shown that DNNs are vulnerable. In our scheme, arbitrary attack effects can be designed using the proposed universal adversarial architecture for semantic segmentation. Our experimental results reveal the vulnerability of DNNs from two points: (1) There is great flexibility in designing and implementing the purpose of adversarial attacks; attacks can be customized arbitrarily, which is a great threat to the security of system applications.
(2) The perturbation noise in an adversarial example may be very slight and thus difficult to detect by human eyes or detection algorithms, making such attacks more covert and dangerous. The ultimate goal of research on attacks is to improve the robustness and security of DNN application systems. Defending against imperceptible adversarial examples is a very challenging subject for our future work, and we will focus on two directions: (1) researching the integration of signal acquisition and recognition, because reducing the transmission and preprocessing links can reduce the risk of attack; (2) enhancing DNN defense performance by expanding defense training, so as to improve the robustness and security of these application systems.
Author contributions Yinghui Zhu is responsible for the collection of experimental data, and Yuzhen Jiang is responsible for the writing of the paper.

Ethics approval This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent All authors agree to submit this version and claim that no part of this manuscript has been published or submitted elsewhere.