The many Shapley values for model explanation (1908.08474v2)

Published 22 Aug 2019 in cs.AI, cs.LG, and econ.TH

Abstract: The Shapley value has become a popular method to attribute the prediction of a machine-learning model on an input to its base features. The use of the Shapley value is justified by citing [16] showing that it is the *unique* method that satisfies certain good properties (*axioms*). There are, however, a multiplicity of ways in which the Shapley value is operationalized in the attribution problem. These differ in how they reference the model, the training data, and the explanation context. These give very different results, rendering the uniqueness result meaningless. Furthermore, we find that previously proposed approaches can produce counterintuitive attributions in theory and in practice---for instance, they can assign non-zero attributions to features that are not even referenced by the model. In this paper, we use the axiomatic approach to study the differences between some of the many operationalizations of the Shapley value for attribution, and propose a technique called Baseline Shapley (BShap) that is backed by a proper uniqueness result. We also contrast BShap with Integrated Gradients, another extension of Shapley value to the continuous setting.

Citations (564)

Summary

  • The paper challenges the presumed uniqueness of Shapley-value attributions by showing that different operationalizations produce markedly different results.
  • It introduces Baseline Shapley (BShap), an attribution method backed by its own uniqueness result, which yields more consistent and interpretable attributions than Conditional Expectation Shapley (CES).
  • Empirical validation on a diabetes prediction model shows that BShap avoids the inconsistencies that data sparsity and noise induce in CES.

Analysis of "The Many Shapley Values for Model Explanation"

Mukund Sundararajan and Amir Najmi's paper, "The Many Shapley Values for Model Explanation," examines how the Shapley value is applied to model explanation in machine learning. It focuses on the multiplicity of ways attributions can be derived, depending on how each method references the model, the training data, and the explanation context.

Key Contributions and Findings

The authors identify a central issue: there are multiple operational ways to compute Shapley values for model attributions, and they lead to very different outcomes. This multiplicity undermines the appeal to the Shapley value's uniqueness theorem from cooperative game theory as a justification for any one method.

The paper critically assesses these diverse Shapley frameworks by highlighting their potential to generate counterintuitive attributions. A notable example from the analysis is the assignment of non-zero attributions to features the model does not even reference. The authors scrutinize several existing approaches and propose the Baseline Shapley (BShap) method as a robust alternative backed by a well-founded uniqueness result.
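
To make this failure mode concrete, here is a minimal sketch (ours, not the paper's code or example): two perfectly correlated binary features, a model that reads only the first, and exact Shapley values computed under a baseline-style set function versus a conditional-expectation-style one. The feature names, toy distribution, and helper functions are illustrative assumptions.

```python
# Minimal sketch (ours, not the paper's): two perfectly correlated binary
# features, a model f(x) = x1 that never reads x2, and explicand (1, 1).
# A conditional-expectation set function (CES-style) credits the unused
# feature x2, while Baseline Shapley with baseline (0, 0) gives it zero.
from itertools import permutations

FEATURES = (0, 1)   # feature indices: 0 -> x1, 1 -> x2
P_ONE = 0.5         # P(x1 = x2 = 1) in the toy joint distribution

def f(x):
    return x[0]     # the model only references x1

def v_bshap(S, x=(1, 1), baseline=(0, 0)):
    """Features in S take explicand values; the rest take baseline values."""
    return f([x[i] if i in S else baseline[i] for i in FEATURES])

def v_ces(S):
    """E[f(X) | X_S = x_S] for explicand (1, 1) under perfect correlation."""
    return 1.0 if S else P_ONE   # conditioning on either feature pins down x1

def shapley(v, i):
    """Average marginal contribution of feature i over all orderings."""
    perms = list(permutations(FEATURES))
    return sum(v(set(p[:p.index(i)]) | {i}) - v(set(p[:p.index(i)]))
               for p in perms) / len(perms)

for name, v in (("BShap", v_bshap), ("CES", v_ces)):
    print(name, [round(shapley(v, i), 2) for i in FEATURES])
# BShap [1.0, 0.0]   -> the unused feature x2 gets exactly zero
# CES   [0.25, 0.25] -> x2 receives the same credit as x1
```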

Theoretical Insights

The paper takes an axiomatic approach to the distinct Shapley operationalizations, examining properties such as Dummy, Efficiency, Linearity, and Symmetry. The authors present instances where certain operationalizations, such as CES computed over the training-data distribution, fail to satisfy these axioms, leading to inconsistent and potentially misleading attributions.
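
For orientation, the definitions below are written in our own notation and are meant to be consistent with, not copied from, the paper's setup: the Shapley value of a set function v over the feature set N, the BShap set function built from an explicand x and baseline x', and the CES set function built from a data distribution D.

```latex
% Shapley value of feature i for a set function v over the feature set N
\phi_i(v) \;=\; \sum_{S \subseteq N \setminus \{i\}}
    \frac{|S|!\,(|N|-|S|-1)!}{|N|!}\,\bigl(v(S \cup \{i\}) - v(S)\bigr)

% BShap plays the game on the model itself, with explicand x and baseline x'
v_{x,x'}(S) = f\bigl(x_S \,;\, x'_{N \setminus S}\bigr)

% CES instead conditions a data distribution D on the explicand's values
v_{x,D}(S) = \mathbb{E}_{X \sim D}\bigl[\, f(X) \mid X_S = x_S \,\bigr]
```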

Furthermore, an axiomatic framework is established for BShap, demonstrating its compliance with essential properties such as Demand Monotonicity and Affine Scale Invariance. This positions BShap as a more theoretically sound option compared to some existing methods.
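
As a concrete sanity check on Affine Scale Invariance, the sketch below (our illustration; the brute-force bshap helper is hypothetical, not the paper's code) rescales one feature of a toy nonlinear model together with the explicand and baseline and confirms that the exact BShap attributions do not change.

```python
# Our illustration (not the paper's code): Affine Scale Invariance for exact
# BShap on a tiny two-feature model. Rescaling feature 0 via x0 -> a*x0 + b in
# the model, the explicand, and the baseline together leaves the attributions
# unchanged, because every coalition's value v(S) is unchanged.
from itertools import permutations

def bshap(f, x, baseline):
    """Exact BShap by enumerating all feature orderings (fine for tiny n)."""
    n = len(x)
    def v(S):
        return f([x[i] if i in S else baseline[i] for i in range(n)])
    perms = list(permutations(range(n)))
    return [sum(v(set(p[:p.index(i)]) | {i}) - v(set(p[:p.index(i)]))
                for p in perms) / len(perms)
            for i in range(n)]

f = lambda z: z[0] * z[1] + z[0] ** 2            # arbitrary nonlinear model
x, base = [2.0, 3.0], [0.5, 1.0]

a, b = 10.0, -4.0                                # affine rescaling of feature 0
g = lambda z: f([(z[0] - b) / a, z[1]])          # same model in rescaled units
x2, base2 = [a * x[0] + b, x[1]], [a * base[0] + b, base[1]]

print(bshap(f, x, base))     # [6.75, 2.5]
print(bshap(g, x2, base2))   # identical: [6.75, 2.5]
```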

Empirical Validation

The authors illustrate their theoretical claims through an empirical case study of a diabetes prediction model fit with Lasso regression. They observe marked variability in attributions across explicands for the different Shapley methods; notably, BShap provides more consistent attributions than CES, which is sensitive to data sparsity and noise.
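
The sketch below is not the paper's experimental code: it uses scikit-learn's bundled diabetes dataset as a stand-in, fits a Lasso model, and computes exact BShap attributions for one explicand against a mean baseline, exploiting the fact that BShap has a closed form for linear models.

```python
# Hedged sketch (not the paper's setup): Lasso on scikit-learn's diabetes data,
# with exact BShap attributions against a mean baseline. For a linear model,
# every marginal contribution of feature i is coef_i * (x_i - baseline_i), so
# the Shapley value equals that product directly.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso

X, y = load_diabetes(return_X_y=True)
model = Lasso(alpha=0.1).fit(X, y)

explicand = X[0]
baseline = X.mean(axis=0)

attributions = model.coef_ * (explicand - baseline)   # exact BShap, linear case

gap = model.predict(explicand[None, :])[0] - model.predict(baseline[None, :])[0]
print("prediction gap:     ", gap)
print("sum of attributions:", attributions.sum())     # matches the gap (efficiency)
```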

Implications and Future Directions

The investigation into Shapley value operationalizations has substantial implications for interpretable AI. It calls for a more nuanced understanding of how attributions are computed and of how the resulting attributions should be interpreted.

Practically, the introduction of BShap offers a method that respects key desiderata for attribution methods, potentially guiding the development of more reliable AI explanations. The interplay between BShap and CES also suggests rich avenues for further research, particularly in developing hybrid methods that balance the strengths of both techniques.

Future work should focus on refining the theoretical underpinnings of model explanation methods and on applying them to complex systems such as deep learning ensembles. Moreover, as the authors highlight, the impact of the choice of baseline and of the reference distribution used by CES warrants further exploration to understand its implications across real-world scenarios.

Conclusion

Sundararajan and Najmi's work provides a comprehensive exploration of the nuances of Shapley values in model explanation. By proposing and supporting BShap, they pave the way for more coherent and dependable model attributions, which are crucial for advancing fairness and transparency in AI systems. Their work also opens a dialogue on the evolving theoretical landscape of model interpretability, encouraging future research to build on these foundational insights.