Sampling Permutations for Shapley Value Estimation: An Analysis
The paper "Sampling Permutations for Shapley Value Estimation" by Rory Mitchell et al. addresses the complex challenge of estimating Shapley values, a game-theoretic approach widely used for interpreting machine learning models. Shapley values, originating from cooperative game theory, allocate payoffs equitably among players based on their contribution, and their exact computation is NP-hard. This necessitates the development of approximation methods to make Shapley value estimation feasible for intricate models.
Core Contributions
The paper makes several significant contributions to the field:
- Application of RKHS in Permutation Space: The authors extend reproducing kernel Hilbert space (RKHS) methods, typically used in continuous domains, to the discrete domain of permutations, employing several kernels over permutations, notably the Kendall, Mallows, and Spearman kernels. This framework characterizes "good" sample sets and guides their selection through kernel-based algorithms such as kernel herding and sequential Bayesian quadrature (SBQ); see the sketch after this list.
- Sampling via Hypersphere Connections: The authors exploit the relationship between permutations and the hypersphere $S^{d-2}$ to generate high-quality permutation samples. They introduce orthogonal spherical codes and Sobol sequence-based methods as practical and efficient sampling techniques.
- Experimental Evaluation: The paper reports empirical evaluations on tabular datasets, where gradient boosted decision trees and neural networks are analyzed. The proposed sampling methods, especially kernel herding and orthogonal spherical sampling, converge faster and reach lower RMSE than standard methods at the same sample budget. This is further corroborated by experiments on image data using convolutional networks, where the proposed techniques provide competitive accuracy while maintaining computational efficiency.
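To make the kernel machinery above concrete, the sketch below implements the Kendall and Mallows kernels on permutations, using their standard definitions in terms of discordant pairs (some presentations normalize the Mallows exponent by the number of pairs), together with a greedy kernel-herding selector over a candidate pool. The function names and the candidate-pool restriction are illustrative assumptions rather than the paper's exact algorithm; the simplification of the herding objective relies on the fact that the Mallows kernel depends only on the relative order of its two arguments, so its average against a uniformly random permutation is the same constant for every candidate.

```python
import numpy as np

def n_discordant(sigma, pi):
    """Number of item pairs ordered differently by the two rank vectors."""
    sigma, pi = np.asarray(sigma), np.asarray(pi)
    ds = np.sign(sigma[:, None] - sigma[None, :])
    dp = np.sign(pi[:, None] - pi[None, :])
    return int((ds * dp < 0).sum() // 2)  # each unordered pair is counted twice

def kendall_kernel(sigma, pi):
    """Kendall kernel: the Kendall tau correlation, in [-1, 1]."""
    d = len(sigma)
    return 1.0 - 4.0 * n_discordant(sigma, pi) / (d * (d - 1))

def mallows_kernel(sigma, pi, lam=1.0):
    """Mallows kernel exp(-lam * n_d); some variants normalize n_d by d*(d-1)/2."""
    return np.exp(-lam * n_discordant(sigma, pi))

def kernel_herding(candidates, m, lam=1.0):
    """Greedy kernel herding over a finite candidate pool of permutations.

    Since the expected Mallows kernel against a uniformly random permutation
    is the same for every candidate, each greedy step simply picks the
    candidate with the smallest summed kernel value against the samples
    chosen so far, i.e. the one least similar to the current sample set."""
    scores = np.zeros(len(candidates))
    chosen, nxt = [], 0  # start from an arbitrary candidate
    for _ in range(m):
        chosen.append(nxt)
        scores += [mallows_kernel(candidates[nxt], c, lam) for c in candidates]
        scores[nxt] = np.inf            # never re-select the same candidate
        nxt = int(np.argmin(scores))
    return [candidates[i] for i in chosen]

# Draw a pool of random permutations of 6 items and herd 10 diverse ones.
rng = np.random.default_rng(0)
pool = [rng.permutation(6) for _ in range(200)]
samples = kernel_herding(pool, m=10, lam=0.5)
print(kendall_kernel(samples[0], samples[1]))
```

The candidate-pool restriction only keeps the sketch self-contained; the quality of the herded set depends on how well the greedy argmax over all permutations is approximated.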
Key Findings and Implications
- Improvement over Monte Carlo Methods: The research demonstrates that the new approaches, such as orthogonal spherical codes and kernel herding, deliver significant improvements over traditional Monte Carlo sampling, which converges slowly over the factorially large permutation space.
- Discrepancy and Optimization: By defining a discrepancy for permutation samples in an RKHS, the paper offers a quantitative measure of sample quality that can be applied across different machine learning models and datasets.
- High-dimensional Problem Handling: Sampling strategies based on Sobol sequences show particular promise for high-dimensional Shapley value estimation, expanding the applicability of these methods to more complex machine learning models; a sketch combining Sobol-based permutation sampling with a kernel discrepancy score follows this list.
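The sketch below illustrates both points with assumed, simplified constructions: quasi-random permutations are obtained by taking the argsort of scrambled Sobol points (a stand-in for the paper's construction, which routes the points through the sphere $S^{d-2}$), and sample quality is scored with a discrepancy-style quantity. For the Kendall kernel, the expected kernel value against a uniformly random permutation is zero by symmetry, so the squared kernel discrepancy of a sample set against the uniform distribution reduces to the mean of its pairwise kernel matrix. The function names are hypothetical.

```python
import numpy as np
from scipy.stats import qmc

def sobol_permutations(d, m, seed=0):
    """Turn scrambled Sobol points in [0, 1]^d into permutations via argsort
    (a simplified stand-in for the paper's sphere-based construction)."""
    points = qmc.Sobol(d=d, scramble=True, seed=seed).random(m)  # m should be a power of 2
    return np.argsort(points, axis=1)

def mean_pairwise_kendall(perms):
    """Mean pairwise Kendall kernel of a sample of rank vectors; against a
    uniform target this equals the squared kernel discrepancy."""
    signs = np.sign(perms[:, :, None] - perms[:, None, :])      # shape (m, d, d)
    d = perms.shape[1]
    gram = np.einsum('aij,bij->ab', signs, signs) / (d * (d - 1))
    return gram.mean()

d, m = 8, 64
print("Sobol sample: ", mean_pairwise_kendall(sobol_permutations(d, m)))
rng = np.random.default_rng(0)
print("i.i.d. sample:", mean_pairwise_kendall(np.array([rng.permutation(d) for _ in range(m)])))
```

Under this score, lower values indicate a sample set whose empirical average is closer, in the RKHS sense, to the uniform average over all permutations.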
Future Directions
This work opens several avenues for future research:
- Parameter Tuning for Kernels: While the Mallows kernel is shown to be effective due to its universality, further research could explore automatic selection of its parameter λ, making kernel herding and SBQ more adaptive and reducing manual hyperparameter tuning.
- Expanding Hypersphere Utilization: The innovative connection between permutations and hyperspheres could be explored further to develop even more efficient sampling algorithms.
- Integration with Other Interpretability Frameworks: The methods developed here could be integrated with other interpretability approaches, potentially enhancing their accuracy and computational efficiency.
In conclusion, this paper significantly advances the state of Shapley value estimation by developing sophisticated sampling methods grounded in kernel theory and geometric insight. These innovations not only improve Shapley value computation for model interpretation but also lay the groundwork for future research in algorithmic interpretability.