
Understanding Deep Networks via Extremal Perturbations and Smooth Masks

Published 18 Oct 2019 in cs.CV, cs.LG, and stat.ML | arXiv:1910.08485v1

Abstract: The problem of attribution is concerned with identifying the parts of an input that are responsible for a model's output. An important family of attribution methods is based on measuring the effect of perturbations applied to the input. In this paper, we discuss some of the shortcomings of existing approaches to perturbation analysis and address them by introducing the concept of extremal perturbations, which are theoretically grounded and interpretable. We also introduce a number of technical innovations to compute extremal perturbations, including a new area constraint and a parametric family of smooth perturbations, which allow us to remove all tunable hyper-parameters from the optimization problem. We analyze the effect of perturbations as a function of their area, demonstrating excellent sensitivity to the spatial properties of the deep neural network under stimulation. We also extend perturbation analysis to the intermediate layers of a network. This application allows us to identify the salient channels necessary for classification, which, when visualized using feature inversion, can be used to elucidate model behavior. Lastly, we introduce TorchRay, an interpretability library built on PyTorch.

Citations (389)

Summary

  • The paper introduces extremal perturbations as a novel method to identify input regions that maximally affect network outputs.
  • It employs a robust area constraint with smooth masks to enforce fixed-size perturbations, enhancing the interpretability of saliency maps.
  • Quantitative evaluations on datasets like PASCAL VOC and COCO validate the method's improved precision in attributing deep network decisions.

Analysis of "Understanding Deep Networks via Extremal Perturbations and Smooth Masks"

In the paper "Understanding Deep Networks via Extremal Perturbations and Smooth Masks," the authors tackle the problem of attribution in deep learning models by introducing a novel concept termed extremal perturbations. Attribution here refers to identifying the parts of an input that are responsible for a model's output. Typically, attribution in neural networks is achieved using gradient-based methods that backtrack through a network's activations, resulting in saliency maps that highlight crucial regions in input images. However, these existing methods often validate the importance of regions a posteriori, lacking a solid theoretical foundation.

The authors identify shortcomings in conventional perturbation-based attribution methods and propose extremal perturbations as a solution. Unlike methods that balance multiple energy terms, such as model response, mask area, and smoothness, extremal perturbations focus solely on the perturbation's effect. They eliminate tunable hyper-parameters by fixing the perturbation area in advance, which clearly delineates the region responsible for an output: among all perturbations of that fixed size, the extremal one maximizes the effect on the network's response.
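The idea can be stated compactly. In a paraphrase of the paper's notation (the symbols below are a reconstruction, not a verbatim quote), a mask $m_a$ of area $a$ is extremal when it solves

```latex
m_a = \operatorname*{argmax}_{\,m \,:\, \|m\|_1 = a|\Omega|,\; m \in \mathcal{M}} \Phi(m \otimes x),
```

where $x$ is the input image, $m \otimes x$ the perturbed input, $\Phi$ the network's score for the class of interest, $|\Omega|$ the number of pixels, and $\mathcal{M}$ a family of sufficiently smooth masks. Extremality then means $\Phi(m_a \otimes x) \geq \Phi(m \otimes x)$ for every admissible mask $m$ of the same area, which is what makes the resulting region interpretable without extra trade-off weights.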

Technical Innovations and Methodology

The paper's key innovation lies in defining a perturbation as extremal if it maximally affects the network's output among all perturbations of a given size. The methodology introduces several techniques:

  • Area Constraint: A new ranking-based area loss enforces the perturbation size constraint robustly and efficiently. This is a significant technical contribution because it makes the area constraint stable to enforce during optimization.
  • Smooth Masks: The study proposes using a parametric family of smooth perturbations that guarantee a minimum level of smoothness, utilizing a smooth-max-convolution operator. This approach results in an interpretable optimization process where perturbation effects are studied as functions of their spatial extent.
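The ranking-based area loss can be illustrated with a small pure-Python sketch. The name `area_loss` and its signature are illustrative, not the TorchRay API; the idea, following the paper's description, is to sort the mask values in descending order and penalize their squared distance to a reference vector that is 1 for the top fraction of entries and 0 elsewhere, so the loss vanishes exactly for a binary mask of the requested area:

```python
def area_loss(mask_values, area_fraction):
    """Ranking-based area penalty (sketch).

    mask_values: flat list of mask entries in [0, 1].
    area_fraction: desired fraction of "on" pixels.
    Returns 0 exactly when the mask is binary with the requested area.
    """
    n = len(mask_values)
    k = int(round(area_fraction * n))            # number of pixels that should be 1
    ranked = sorted(mask_values, reverse=True)   # descending sort of mask entries
    reference = [1.0] * k + [0.0] * (n - k)      # ideal binary profile
    return sum((r - t) ** 2 for r, t in zip(ranked, reference))
```

For example, a binary mask covering exactly half the pixels incurs zero loss, while a mask that is "on" over too large an area, or one stuck at intermediate values, is penalized. Because the penalty depends only on the sorted values, it constrains the area without dictating where the mask may be placed.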

Furthermore, the extremal perturbation framework extends traditional analysis to intermediate layers, offering insights into salient channels necessary for classification. Such insights can reveal network behavior through visualization techniques like feature inversion.
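The extension to intermediate layers replaces the spatial mask over pixels with a mask over activation channels. The helper below is a hypothetical sketch of that mechanism (representing an activation tensor as a list of 2-D channel maps rather than a real framework tensor): channels whose mask value must stay near 1 to preserve the classification score are the salient ones.

```python
def mask_channels(activations, channel_mask):
    """Apply a per-channel perturbation to an intermediate activation.

    activations: list of channels, each a 2-D list of floats.
    channel_mask: one scalar in [0, 1] per channel; 0 suppresses the
    channel entirely, intermediate values attenuate it.
    """
    return [
        [[v * m for v in row] for row in channel]
        for channel, m in zip(activations, channel_mask)
    ]
```

Optimizing such a channel mask under an area-style constraint, then visualizing the surviving channels with feature inversion, is how the paper turns channel saliency into a human-inspectable explanation.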

Numerical Results

The authors evaluate their method quantitatively using the pointing game metric, achieving compelling results on datasets such as PASCAL VOC and COCO when compared with other attribution methods. The empirical findings suggest that extremal perturbations enable more precise region identification, leading to a better understanding of how evidence is integrated in neural network models.
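The pointing game is simple to state: a saliency map scores a "hit" when its maximum falls on the ground-truth object, and accuracy is the hit rate over the dataset. A minimal sketch, omitting the small tolerance radius the standard protocol allows around the maximum:

```python
def pointing_game_hit(saliency, gt_mask):
    """Return True if the saliency map's maximum lies on the object.

    saliency: 2-D list of attribution scores.
    gt_mask: 2-D list, 1 on the ground-truth object, 0 on background.
    """
    best = max(
        ((r, c) for r in range(len(saliency)) for c in range(len(saliency[0]))),
        key=lambda rc: saliency[rc[0]][rc[1]],
    )
    return gt_mask[best[0]][best[1]] == 1
```

The per-class accuracy is then hits / (hits + misses), which rewards methods that concentrate attribution mass on the correct object rather than merely producing diffuse heatmaps.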

Implications and Future Directions

The extremal perturbation approach presents an interpretable framework for assessing neural networks, potentially aiding domains such as model debugging, neural architecture search, and interpretability benchmarking. Moreover, it provides a foundation for future work exploring optimized visualization techniques that discern both spatial and channel-level features in networks.

Future research could explore deeper relationships between perturbations at various network layers and output certainty, scale this work beyond the image domain, or leverage extremal perturbations in scenarios requiring transparency, such as in healthcare. As the machine learning community continues to strive for more transparent and interpretable models, approaches such as extremal perturbations may play a vital role in practical implementation and broader applications.
