- The paper introduces Integrated Hessians, an extension of Integrated Gradients that quantifies pairwise feature interactions in deep networks.
- The method is grounded in axioms such as interaction completeness and self-completeness, which tie the interaction values directly to the model's output.
- Empirical evaluations show that Integrated Hessians outperforms existing interaction-detection methods and scales favorably, while remaining applicable to any differentiable neural network architecture.
An Expert Evaluation of "Explaining Explanations: Axiomatic Feature Interactions for Deep Networks"
The paper "Explaining Explanations: Axiomatic Feature Interactions for Deep Networks" addresses a significant challenge in the interpretability of deep neural networks (DNNs): the explanation of feature interactions. Feature attribution methods have predominantly focused on identifying which input features are crucial for a model's prediction, without necessarily elucidating the interactions between these features that may drive such predictions. This paper introduces a novel method called Integrated Hessians, which extends the existing Integrated Gradients technique to explain pairwise feature interactions within neural networks.
Integrated Hessians leverages an axiomatic foundation to overcome certain limitations of previous approaches, most notably their dependence on specific neural network architectures. Unlike methods such as Neural Interaction Detection, which applies only to feed-forward networks, or others that require Bayesian Neural Networks, Integrated Hessians is applicable to any differentiable neural network architecture. This generalizability, coupled with a theoretically rigorous basis, positions the technique as a versatile tool for neural network interpretability.
Theoretical Foundation
The formulation of Integrated Hessians is grounded in several axioms, including interaction completeness and self-completeness. Interaction completeness ensures that the sum of all pairwise interactions equals the difference between the model's output at the input and at the baseline, while self-completeness guarantees that a feature's interaction with itself reduces to its attribution value when it does not interact with any other feature. These axioms relate the interaction values directly to the model output, facilitating meaningful interpretation.
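Concretely, writing $\Gamma_{i,j}(x)$ for the interaction assigned to the feature pair $(i, j)$ and $x'$ for the baseline, interaction completeness requires (up to notation) that

$$\sum_{i} \sum_{j} \Gamma_{i,j}(x) = f(x) - f(x')$$

so the interaction values, like Integrated Gradients attributions, account for the full change in the model's output relative to the baseline.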
Furthermore, Integrated Hessians is obtained by applying Integrated Gradients to itself: the attribution assigned to one feature is in turn attributed to every other feature along a path through input space, yielding a second-order, path-integral extension of feature attributions. This construction resolves theoretical problems with existing methods, such as those based on the raw input Hessian, which is zero almost everywhere for piecewise-linear ReLU networks and therefore uninformative. The paper handles such networks by replacing ReLU with SoftPlus activations, which the authors show preserve the network's behavior while providing the smooth second derivatives that Integrated Hessians requires.
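Under these smoothness assumptions, the off-diagonal interaction values reduce to a double path integral of the Hessian; for $i \neq j$, up to notation,

$$\Gamma_{i,j}(x) = (x_i - x'_i)(x_j - x'_j) \int_{0}^{1} \int_{0}^{1} \alpha \beta \, \frac{\partial^2 f\bigl(x' + \alpha \beta (x - x')\bigr)}{\partial x_i \, \partial x_j} \, d\beta \, d\alpha$$

while the diagonal terms carry an additional first-order correction. In practice the double integral can be approximated with a Riemann sum, exactly as is done for Integrated Gradients. The sketch below illustrates this for a small PyTorch model; the function and parameter names are illustrative rather than taken from the authors' released code, and the diagonal correction term is omitted for brevity.

```python
import torch
import torch.nn as nn

def integrated_hessians(f, x, baseline, steps=32):
    """Approximate the pairwise interaction matrix Gamma(x) for a scalar-valued,
    twice-differentiable model f via a double Riemann sum over the path
    parameters (alpha, beta). Off-diagonal entries follow the formula above;
    the paper's extra first-order term for the diagonal is omitted here."""
    diff = x - baseline
    d = x.numel()
    gamma = torch.zeros(d, d)
    for a in range(1, steps + 1):
        alpha = a / steps
        for b in range(1, steps + 1):
            beta = b / steps
            point = baseline + alpha * beta * diff
            # Full (d, d) Hessian of f at the interpolated point.
            hess = torch.autograd.functional.hessian(f, point)
            gamma += alpha * beta * hess / (steps * steps)
    # Scale by (x_i - x'_i)(x_j - x'_j).
    return gamma * torch.outer(diff, diff)

# ReLU is swapped for SoftPlus so second derivatives exist and are smooth,
# mirroring the substitution described in the paper.
net = nn.Sequential(nn.Linear(4, 16), nn.Softplus(beta=10.0), nn.Linear(16, 1))
f = lambda z: net(z).squeeze()
x, baseline = torch.randn(4), torch.zeros(4)
print(integrated_hessians(f, x, baseline))
```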
Empirical Validation
The authors provide a robust set of empirical evaluations to substantiate their claims. Notably, they compare Integrated Hessians against four methods: a Monte Carlo estimation of the Shapley Interaction Index, Contextual Decomposition, Neural Interaction Detection, and input Hessian-based methods. Integrated Hessians consistently demonstrates superior performance in identifying interactions across various simulated datasets, reflecting its efficacy in practice beyond theoretical guarantees.
In evaluating computational efficiency, Integrated Hessians scales favorably. As the dimensionality of the input grows, it remains more tractable than the competing methods, largely because the required gradient computations parallelize efficiently on modern GPUs.
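As an illustration of that parallelization, all interpolation points along the path can be pushed through the network in a single batch and differentiated in one backward pass. The helper below is a hypothetical sketch (its name and signature are not from the paper), assuming a model that processes batch rows independently.

```python
import torch

def batched_path_gradients(f, x, baseline, steps=64):
    """Gradients of f at every interpolation point along the path, computed in
    one batched forward/backward pass. Summing the outputs is valid because
    each row of `points` contributes to exactly one output value."""
    alphas = torch.linspace(1.0 / steps, 1.0, steps).unsqueeze(1)  # (steps, 1)
    points = baseline + alphas * (x - baseline)                    # (steps, d)
    points.requires_grad_(True)
    (grads,) = torch.autograd.grad(f(points).sum(), points)
    return grads                                                   # (steps, d)
```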
Practical Implications and Future Trajectories
The introduction of Integrated Hessians presents practical implications for fields where understanding complex model behavior is crucial, such as healthcare and finance. By allowing more transparent insights into feature interactions, stakeholders can ensure that neural network-based decisions rely on reasonable and legally justifiable grounds, potentially enhancing trust and adoption of AI systems in critical applications.
The paper suggests directions for future research, including extending the technique to higher-order interactions (beyond pairs) and using interaction explanations as a debugging tool during neural network training. Moreover, the applicability of Integrated Hessians to a broad range of neural architectures matches the increasing complexity and diversity of models used in modern AI systems.
In conclusion, this paper provides a significant contribution to the field of model interpretability, offering a methodological advancement that promises to facilitate deeper insights into the otherwise opaque decision-making processes of deep networks. This advancement holds substantial promise for continued exploration into the nuanced mechanics of AI interpretability.