Explaining by Removing: A Unified Framework for Model Explanation
The paper, "Explaining by Removing: A Unified Framework for Model Explanation," addresses the complex landscape of model explanation methods in ML. The authors, Ian Covert, Scott Lundberg, and Su-In Lee, propose a comprehensive framework that streamlines the understanding of numerous approaches to model interpretability. The central concept is "removal-based explanations," which involve assessing the impact of withholding features on a model's predictions. This paper is critical in navigating the varied methodologies, offering a systematic approach to unify and compare 26 prevalent methods, including SHAP, LIME, and others.
Overview of the Framework
The framework characterizes model explanation methods along three dimensions, illustrated in the code sketch after this list:
- Feature Removal: How features are removed from the model, ranging from replacing them with default values to marginalizing them over a distribution.
- Model Behavior: The aspect of the model that is analyzed, such as an individual prediction, the loss for a single example, or the loss over the whole dataset.
- Summary Technique: How the influence of each feature is summarized, using techniques like Shapley values, which are derived from cooperative game theory.
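To make the three dimensions concrete, below is a minimal sketch of one possible combination: mean-imputation removal, a single predicted probability as the model behavior, and leave-one-out differences as the summary. It assumes a scikit-learn-style classifier with a predict_proba method and a NumPy background dataset; the function names are illustrative, not taken from the paper or any library.

```python
import numpy as np

def remove_features(x, kept_idx, background):
    """Feature removal: keep the features in kept_idx and replace the rest
    with their background means (one simple removal choice among many)."""
    x_masked = background.mean(axis=0).copy()
    x_masked[kept_idx] = x[kept_idx]
    return x_masked

def model_behavior(model, x, target_class):
    """Model behavior: here, the predicted probability of a single class."""
    return model.predict_proba(x.reshape(1, -1))[0, target_class]

def leave_one_out_summary(model, x, background, target_class):
    """Summary technique: leave-one-out differences, i.e. how much the
    prediction drops when each feature is withheld individually."""
    d = x.shape[0]
    all_idx = np.arange(d)
    full = model_behavior(model, remove_features(x, all_idx, background), target_class)
    scores = np.empty(d)
    for i in range(d):
        without_i = model_behavior(
            model, remove_features(x, np.delete(all_idx, i), background), target_class
        )
        scores[i] = full - without_i
    return scores

# Hypothetical usage with a fitted scikit-learn classifier `clf`:
# scores = leave_one_out_summary(clf, X_test[0], X_train, target_class=1)
```

Swapping in conditional sampling for the removal step, a loss function for the behavior, or Shapley values for the summary roughly recovers different existing methods within the same template; this modularity is precisely what the framework highlights.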
Key Contributions
- Unification of Methods: By systematically categorizing existing methods based on these three dimensions, the framework reveals inherent connections between seemingly disparate methodologies. This unification clarifies the literature and provides a basis for analyzing and comparing methods.
- Connections to Related Fields: The paper bridges model explanation with cognitive psychology, cooperative game theory, and information theory. In cognitive psychology, the notion of "subtractive counterfactual reasoning" (mentally simulating the absence of a factor) mirrors the feature removal principle. From an information-theoretic perspective, the authors show that explanations built on conditional feature distributions can be interpreted in terms of mutual information, clarifying how much information features convey about the response (see the worked relation after this list).
- Cooperative Game Theory: The framework casts many explanation techniques as cooperative games played over features, highlighting the role of Shapley values in fairly allocating credit for the model's behavior. The paper also examines which subsets of the Shapley axioms different summary techniques satisfy, clarifying their theoretical trade-offs (a brute-force Shapley computation is sketched below).
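To make the information-theoretic connection concrete, consider the case where removed features are marginalized out with their conditional distribution, the behavior of interest is the expected cross-entropy loss, and the model is assumed to be Bayes optimal. Under those (strong) assumptions, the value of observing a feature subset reduces to a mutual information, as sketched below.

```latex
% Expected cross-entropy loss of the Bayes-optimal prediction given X_S:
\mathbb{E}\big[-\log p(Y \mid X_S)\big] = H(Y \mid X_S)

% Define the value of observing the subset S as the negative expected loss:
v(S) = -\,H(Y \mid X_S)

% The gain over observing no features is then the mutual information:
v(S) - v(\varnothing) = H(Y) - H(Y \mid X_S) = I(Y; X_S)
```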
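The Shapley value summary itself can be written down directly for small games. The sketch below is the standard brute-force formula from cooperative game theory, not code from the paper, and it scales exponentially with the number of players (features); in a removal-based explanation, the value function would be built from the removal and behavior choices sketched earlier.

```python
import math
from itertools import combinations

def shapley_values(value_fn, n_players):
    """Exact Shapley values for a cooperative game.

    value_fn maps a tuple of player indices (order-insensitive) to a number;
    in a removal-based explanation, players are features and the value is the
    model behavior when only those features are observed. Exponential cost,
    so this is only an illustration for small games.
    """
    phi = [0.0] * n_players
    for i in range(n_players):
        others = [j for j in range(n_players) if j != i]
        for size in range(len(others) + 1):
            weight = (math.factorial(size) * math.factorial(n_players - size - 1)
                      / math.factorial(n_players))
            for subset in combinations(others, size):
                phi[i] += weight * (value_fn(subset + (i,)) - value_fn(subset))
    return phi

# Tiny illustrative game: each player contributes its own index, and players
# 0 and 1 earn a bonus of 2 when both are present.
def v(coalition):
    return sum(coalition) + (2.0 if 0 in coalition and 1 in coalition else 0.0)

print([round(p, 6) for p in shapley_values(v, 3)])  # [1.0, 2.0, 2.0]
```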
Practical Implications and Future Directions
The framework has practical implications for the design and evaluation of model explanation tools. With a systematic decomposition in hand, practitioners can more easily judge whether a method suits a particular application, for example by considering how it handles feature dependencies and how interpretable its attributions are. The framework also opens avenues for new methods that combine different choices along the three dimensions, adding flexibility and adaptability to how explanations are constructed.
Looking forward, the authors suggest that future work could develop better approximations of conditional feature distributions, which are crucial for making removal-based explanations more robust and interpretable. They also note that exploring user-centered explanation methods that better match human reasoning could enhance the effectiveness of model explanations in real-world applications.
In sum, the framework both elucidates existing methods and sets the stage for future innovation in ML interpretability, providing a methodical basis for understanding, comparing, and advancing model explanation techniques.