When saliency goes off on a tangent: Interpreting Deep Neural Networks with nonlinear saliency maps

A fundamental bottleneck in utilising complex machine learning systems for critical applications has been not knowing why they do and what they do, thus preventing the development of any crucial safety protocols. To date, no method exist that can provide full insight into the granularity of the neural network's decision process. In the past, saliency maps were an early attempt at resolving this problem through sensitivity calculations, whereby dimensions of a data point are selected based on how sensitive the output of the system is to them. However, the success of saliency maps has been at best limited, mainly due to the fact that they interpret the underlying learning system through a linear approximation. We present a novel class of methods for generating nonlinear saliency maps which fully account for the nonlinearity of the underlying learning system. While agreeing with linear saliency maps on simple problems where linear saliency maps are correct, they clearly identify more specific drivers of classification on complex examples where nonlinearities are more pronounced. This new class of methods significantly aids interpretability of deep neural networks and related machine learning systems. Crucially, they provide a starting point for their more broad use in serious applications, where 'why' is equally important as 'what'.

This, however, does not have the desired effect. The masking itself introduces new information to the image, and that new information competes with the desired classification. Generally speaking, if more than 20% of the pixels have been masked, the image is invariably classified as a "piece of a puzzle" or equivalent, regardless of what is depicted in it.
We modify this approach two-fold, as described in the Methods section. First, we blur the non-selected pixels by subjecting them to a low-pass filter. We then enhance the selected in the direction of the gradient. The gradient is then re-calculated, and the procedure is repeated iteratively.
The motivation for the blurring step lies in the adverse effects of masking, as described above. Filtering is the simplest way to remove information content, without inadvertently introducing additional information, which can happen with masking. It is not the only such step; for instance, random perturbation would have a similar effect, but it would not be repeatable due to randomness.
The motivation for enhancement in the gradient direction comes from the simple gradient descent method; it is the simplest way to select a direction in which the selected class probability increases.
The key difference between our nonlinear saliency maps and linear saliency maps is the updating of the gradient at each step; modifying our method to re-use the same gradient, calculated at the original image, at each step, would revert to a linear saliency map. The difference between the linear and nonlinear saliency map at any given point is therefore directly linked to the local nonlinearity of the DNN at that point.
The specific optimization procedure described in this paper does not converge generally, due to irreversibility of the low-pass filtering operation. Once the local maximum is overshot by blurring salient pixels, it is impossible for the optimization to un-blur them in order to improve the approximation.
This lack of convergence does not pose a serious problem. Even though the maximum is never reached, the step parameters can be chosen so that the trajectory passes sufficiently close to it, before possibly subsequently diverging. Hence the only remaining step is to re-trace the trajectory and return the points on the trajectory that maximised the target class probability, as various approximations of the relevant maximum.
The results are striking. As shown in Figure 1, we can work with classes that score as low as 0.01% probability for a given image, and enhance the image to classify in such a class with probability higher than 95%. We thereby come as close to a pure class representation as is reasonably possible for a given image. By subsequently masking the non-highlighted pixels, we obtain the equivalent masked saliency map, but this time reflecting the full nonlinearity of the DNN, instead of only its tangent. Trajectories of nonlinear saliency maps are shown alongside of those applying the equivalent transformation according to the linear saliency map (where the gradient is calculated at the original image and never modified), in order to facilitate a direct comparison.
Looking at optimization trajectories in Figure 2, it is apparent that not all trajectories are equal. Some trajectories reach the pure class representation almost immediately from the original image, indicating that the initial gradient, calculated at the original image, is sufficient, and that nonlinearities of the DNN are relatively unimportant at that point. Other trajectories, on the other hand, only find a path to the pure class representation once 50% or 75% of the image has been blurred out. For these images, the nonlinearities of the DNN play a more important part in their classification, and the linear approximation fails to provide a correct interpretation.
More specifically, is apparent from the optimization trajectories that there is a threshold in the proportion of the image that has to be blurred before the maximum is reached. The blurring operator serves to destroy competing classes, thus enabling the target class to dominate the classification. The threshold corresponds to the proportion of the image contributing to the competing classification. Only when the competing classifications have been sufficiently annihilated by the blurring operator, the target class is able to come through.
The threshold value is not directly related to the class probability in the original classification. This is apparent in the examples of, on one hand, (h) goldfish (p = 0.27%; thr = 5 − 10%) and (b) whistle (p = 0.06%; thr = 5 − 10%) , and on the other (e) Border collie (p = 0.01%; thr = 50 − 75%) and (d) sombrero (p = 0.01%; thr = 75 − 95%) -different images with class probabilities of the order of 0.1% or less, but with thresholds ranging from 0 to 95%. Instead, as expected, the threshold value appears to be directly linked to the convexity of the DNN at the relevant point.
To verify this, we looked at direct comparisons between linear and nonlinear saliency maps in more detail. The comparison between the linear and nonlinear vanilla gradient saliency maps for two tentatively nonlinear examples (b) and (e) from Figures 1 and 2, is shown in Figure 3.
The image from example (e) is a fairly complex image containing two people and a dog in the front left of the image, another person and a dog to the back right, and several more people, parasols, cars and marquees in the background. The Xception DNN identified the image as being 99.98% English springer, 0.01% Border collie.
All the saliency maps correctly identified the two dogs in the image as being crucial to the classification. Linear saliency maps for either the English springer or Border collie classes select both dogs more or less equally at the masking ratio of 95%. According to the linear saliency maps, both dogs are roughly equally responsible for both classes, and the DNN is pretty clear that they are both English Springer Spaniels, or, significantly less likely, Border Collies.
Nonlinear saliency maps, however, show something quite different. While largely agreeing with linear saliency maps at low masking ratios, at 95% masking ratio they clearly show that the dog in the front left of the picture is identified as an English Springer Spaniel, while the dog in the back right is identified as a Border Collie. As it happens, the DNN is probably wrong on the latter point -to the untrained eye of the authors, the dog in the back right also appears to be an English Springer Spaniel. Whether the DNN is right or wrong is, however, in this particular context immaterial. The relevant point is that this classification can't be correctly interpreted without accounting for the local nonlinearity of the DNN through nonlinear saliency maps.
A similar pattern is seen in example (b). The image shows a hummingbird sat on a bird feeder. In this case, both linear and nonlinear saliency maps correctly identify the hummingbird, blurring out the bird feeder. On the second category, whistle, they, however, differ. The linear saliency map selects both the bird and the feeder as contributing to whistle, while the nonlinear saliency map selects only the feeder, masking the bird.
We can therefore conclude that nonlinear saliency maps do generally provide a different interpretation from that of linear saliency maps. In our randomly chosen examples, the interpretation given by nonlinear saliency maps appears to be more sensible than that given by linear saliency maps.
For comparison, we have included four further state-of-the art saliency map techniques for the same images and the same classes; namely, GradCAM [13], SmoothGrad [14] and Occlusion Sensitivity [15], all as implemented in the python package tf-explain [16].
Out the tested methods, GradCAM was the only one to identify separate parts of the images as pertaining to the hummingbird versus the whistle, and the English springer vs the border collie. Its specificity on the border collie was comparable to that of vanilla linear gradient, identifying the dog on the left as the English springer, and both dogs as border collies. On the hummingbird / whistle, GradCAM outperformed the vanilla linear gradient, and it was comparable to the nonlinear gradient. Neither SmoothGrad nor Occlusion Sensitivity showed a meaningful interpretation of the second-ranked class.
To further elaborate on this point, we have selectively blurred the 95%-salient pixels for each relevant class probabilities; that is, for example, in Figure 1, we have blurred the selected pixels that correspond to 95% saliency for swimming trunks, and looked at how this affects the class probability for the class swimming trunks. This was repeated for top two classes for each of the images. The results are shown in Figure 4.
As can be seen from Figure 4, selectively blurring nonlinearly salient pixels results in significantly lowering any class probability; the only exception to that are the classes swimming trunks and goldfish in Figure 1 g) and h), which are increased by such blurring. It is notable that neither of these two classifications rely on fine-scale detail, which might explain why blurring has the opposite of the desired effect.
Finally, we note that the optimization algorithm we used is a simple extension of gradient descent, which is a relatively unsophisticated optimizer. Different core optimization methods generally generate somewhat different trajectories. The direction of image enhancement is, however, only one part of the optimization procedure, with the other being the filtering operation. As noted above, the critical value governing the speed of convergence is the threshold in the proportion of image that is blurred, which is only weakly linked to the direction of image enhancement. Changing the optimizer therefore generally does not significantly affect the critical filtering threshold after which the maximum is reached, and it does not have a strong effect on the speed of convergence.
How important are nonlinear saliency maps for interpretability of machine learning models? That is, of course, at this stage still unclear. It is clear that fully accounting for nonlinearities in the model is of considerable importance for the task of interpretability. The class of methods that we describe is a step forward in this direction. It is able to do the same job as linear saliency maps where linear saliency maps are correct, and it provides an important, distinct change of direction where they are not.
The examples we presented come from computer vision and image classification, which is admittedly a task that often crosses the boundary into being to some extent subjective. Assigning complex images to simple, single classes is always somewhat arbitrary. The example image of two dogs could equally validly be classified as depicting people, a fair, or a lawn.
The true test of the methods presented here will come in real life applications which fall outside the scope of this study, such as medical image classification, material defect analysis and others.
Each image was classified by the DNN models. Instead of masking non-highlighted pixels, we subject them to a low-pass filter, effectively blurring the non-salient parts of the image.
We then enhance the highlighted pixels in the direction of the gradient. By iteratively repeating this step, we are able to selectively amplify the features that are important to classifying the image into any chosen class. The corresponding saliency map is then obtained by masking any blurred pixels.
Neither of those steps is sufficient in its own right; blurring pixels without gradient enhancement only increases the entropy of the image, effectively blurring the image to the extent that it can not be reliably classified as anything. Gradient enhancement without blurring, on the other hand, reaches the boundaries of the RGB box and stops there, without being able to add significantly to the classification.
The nonlinear saliency maps corresponding to any class was calculated by: 1. selecting the appropriate column for the desired class from the Jacobian of the output vecor with respect to the input tensor 2. for a given threshold, updating the image according to the logic next_img = np.clip( blur_below_level(prev_img, sensitivity_norm, L) + littleH * sensitivity, 0., 1.) 3. obtaining the saliency map using the related logic next_map = np.clip( mask_below_level(prev_img, sensitivity_norm, L) + littleH * sensitivity, 0., 1.) Here, sensitivity refers to the relevant column of the Jacobian, sensitivity norm to its pixel-wise norm (Euclidean norm over the RGB channel direction), blur below level performs the low pass filtering on pixels where sensitivity norm falls below the selected level L, mask below level covers the same pixels in a background colour (in all our examples, white) and littleH is a small number controling the size of the optimization step. The clipping operation ensures that all RGB values remain within the RGB range. The level parameter threshold L, and the distance parameter littleH control the step size in the optimization process.

Data Availability
The datasets analysed during the current study are available in the Imagenet ILSRVC 2017 repository, https://image-net.org.