Shapley-Value Explanations in ML

Updated 31 March 2026

Shapley-value explanations are a game-theoretic method that assigns fair attribution scores to features based on their marginal contributions and axiomatic guarantees.
They use estimation algorithms like KernelSHAP, TreeSHAP, and amortized learning to efficiently approximate attributions in complex machine learning models.
Their application spans tabular, deep, tree-based, graph, and quantum models, highlighting both the versatility of the method and challenges with feature dependencies and causality.

Shapley-value-based explanations constitute a central paradigm for feature attribution, interpretability, and transparency in machine learning and artificial intelligence. Rooted in cooperative game theory and axiomatized for fairness, their adoption spans black-box models, tree ensembles, deep neural networks, Gaussian processes, and reinforcement learning. Shapley-value explanations allocate to each input variable an importance score quantifying its average marginal contribution to a model’s prediction, subject to properties including efficiency, symmetry, dummy (nullity), and additivity. Their conceptual and algorithmic scope encompasses rigorous theoretical guarantees, practical estimation schemes, model-specific optimizations, algorithmic speedups, application-aware decompositions, and quantum-accelerated computation.

1. Mathematical Foundations and Axiomatic Characterization

The classical Shapley value arises in cooperative game theory as the unique solution, under the axioms listed below, for dividing a total payoff among a finite set $M$ of players, based on the characteristic function $v:2^M \rightarrow \mathbb{R}$ that assigns a value to each coalition (subset of players). For each $i\in M$,

$\phi_i(v) = \sum_{S\subseteq M\setminus\{i\}} \frac{|S|!\,(|M|-|S|-1)!}{|M|!}\left[v(S\cup\{i\})-v(S)\right]$

Key axioms proven to uniquely determine this formula include:

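Efficiency: $\sum_{i\in M}\phi_i = v(M) - v(\emptyset)$, which reduces to $v(M)$ under the usual normalization $v(\emptyset)=0$.
Symmetry: If $v(S\cup\{i\}) = v(S\cup\{j\})$ for all $S\subseteq M\setminus\{i,j\}$, then $\phi_i = \phi_j$.
Dummy: If $v(S\cup\{i\})=v(S)$ for all $S\subseteq M\setminus\{i\}$, then $\phi_i=0$.
Additivity: For two games $v$ and $w$ on the same player set, $\phi_i(v+w)=\phi_i(v)+\phi_i(w)$.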
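
The formula and axioms above can be checked numerically on a small game. The following is a minimal sketch that enumerates every coalition directly; the helper name `shapley_values` and the three-player `toy_game` are hypothetical and chosen only for illustration, with players "a" and "b" interchangeable and "c" contributing nothing.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, v):
    """Exact Shapley values by enumerating all coalitions (exponential cost)."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(n):
            # Weight |S|! (n - |S| - 1)! / n! shared by all coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                S = frozenset(coalition)
                total += weight * (v(S | {i}) - v(S))
        phi[i] = total
    return phi


# Hypothetical game: payoff depends only on how many of "a", "b" are present.
def toy_game(S):
    return 10.0 * len(S & {"a", "b"}) ** 2


phi = shapley_values(["a", "b", "c"], toy_game)
print(phi)
```

In this toy game the grand coalition is worth 40 and the empty coalition 0, so efficiency requires the three values to sum to 40, symmetry requires "a" and "b" to receive equal shares, and the dummy axiom requires "c" to receive zero.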

In the context of local explanatory modeling, this framework is rigorously extended: van Batenburg formally proves that local attributions $\phi(f,x)$ satisfying local accuracy, missingness, symmetry, and consistency coincide with the cooperative-game Shapley value formula, and that the symmetry axiom is necessary (contrary to earlier claims of redundancy) (Batenburg, 28 Sep 2025). Furthermore, the Shapley value is characterized as the unique solution to a weighted least-squares regression problem, a formulation underpinning KernelSHAP and its variants (Batenburg, 28 Sep 2025).
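
The weighted least-squares characterization can be illustrated on a small game. The sketch below is not the KernelSHAP implementation: it enumerates all proper, non-empty coalitions instead of sampling them, uses the standard Shapley kernel weight $(m-1)/\left[\binom{m}{|S|}\,|S|\,(m-|S|)\right]$, and enforces the efficiency constraint by eliminating the last coefficient. The function name `shapley_wls` and the game `toy_game` are hypothetical.

```python
from itertools import product
from math import comb

import numpy as np


def shapley_wls(v, m):
    """Shapley values as a constrained weighted least-squares fit (needs m >= 2).

    `v` maps a length-m 0/1 mask (1 = player present) to a coalition value.
    """
    # All coalitions except the empty and the full one, as binary masks.
    Z = np.array([z for z in product([0, 1], repeat=m) if 0 < sum(z) < m],
                 dtype=float)
    sizes = Z.sum(axis=1).astype(int)
    # Shapley kernel weights.
    w = np.array([(m - 1) / (comb(m, int(s)) * int(s) * (m - int(s)))
                  for s in sizes])
    v_empty = v(np.zeros(m))
    delta = v(np.ones(m)) - v_empty  # total to distribute (efficiency)
    # Substitute phi_last = delta - sum(phi_others) into the linear model.
    y = np.array([v(z) for z in Z]) - v_empty - Z[:, -1] * delta
    X = Z[:, :-1] - Z[:, -1:]
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return np.append(beta, delta - beta.sum())


# Hypothetical game: player 0 acts alone, players 1 and 2 only pay off together.
def toy_game(z):
    return 4.0 * z[0] + 2.0 * z[1] * z[2]


phi_wls = shapley_wls(toy_game, 3)
print(phi_wls)
```

For this game the exact Shapley values are 4 for player 0 and 1 each for players 1 and 2, since the interaction payoff is split equally; the regression recovers them because every coalition is included. Practical estimators fit the same regression on a sample of coalitions.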

2. Shapley-form Explanation Workflows and Game Design

Applying Shapley values to model explanations entails defining an appropriate “explanation game,” where features correspond to players, and the characteristic function $v(S)$ reflects the predictive contribution of feature subset $S$ via marginalization, conditional expectation, or intervention. Canonical choices include:

Conditional (observational): $v(S)=\mathbb{E}[f(X)\mid X_S=x_S]$ (Kumar et al., 2020, Batenburg, 28 Sep 2025, Heskes et al., 2020, Michiels et al., 2023).
Interventional (marginal): $v(S) = \mathbb{E}_{X_{\bar{S}}}[f(x_S, X_{\bar{S}})]$ (Kumar et al., 2020, Michiels et al., 2023).
Do-interventional (causal): $v^{\text{do}}(S) = \mathbb{E}[f(X)\mid \text{do}(X_S=x_S)]$, where do-calculus enforces structural constraints from a causal graph (Heskes et al., 2020).
Retraining: For each subset $S$, the model is refit using only the features in $S$, and $v(S)$ is taken from the refit model (Campbell et al., 2021).

Each variant preserves the classical Shapley axioms, but actual attributions can diverge dramatically depending on the imposed data or causal structure, as observed in the “hireMales” and interventional/conditional decomposition scenarios (Merrick et al., 2019, Michiels et al., 2023).
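
The divergence between conditional and interventional games can be seen in a case where both value functions have closed forms. The sketch below assumes a linear model $f(x)=w^\top x$ and jointly Gaussian features, so that conditional expectations follow from the standard Gaussian conditioning formula; the helper names and all numbers are hypothetical. The model uses only feature 0, while feature 1 is correlated with it.

```python
from itertools import combinations
from math import factorial

import numpy as np


def shapley_exact(value_fn, m):
    """Exact Shapley values; value_fn takes a tuple of feature indices."""
    phi = np.zeros(m)
    for i in range(m):
        others = [j for j in range(m) if j != i]
        for size in range(m):
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for S in combinations(others, size):
                phi[i] += weight * (value_fn(S + (i,)) - value_fn(S))
    return phi


def make_value_fns(w, mu, cov, x):
    """Value functions for f(x) = w @ x with X ~ N(mu, cov) (assumed setup)."""
    m = len(w)

    def split(S):
        inside = sorted(S)
        outside = [j for j in range(m) if j not in inside]
        return np.array(inside, dtype=int), np.array(outside, dtype=int)

    def interventional(S):
        S, R = split(S)
        # Features outside S are averaged over their marginal distribution.
        return float(w[S] @ x[S] + w[R] @ mu[R])

    def conditional(S):
        S, R = split(S)
        if S.size == 0:
            return float(w @ mu)
        if R.size == 0:
            return float(w @ x)
        # E[X_R | X_S = x_S] for a multivariate Gaussian.
        cond_mean = mu[R] + cov[np.ix_(R, S)] @ np.linalg.solve(
            cov[np.ix_(S, S)], x[S] - mu[S])
        return float(w[S] @ x[S] + w[R] @ cond_mean)

    return interventional, conditional


w = np.array([1.0, 0.0])            # the model ignores feature 1
mu = np.zeros(2)
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
x = np.array([1.0, 2.0])

v_int, v_cond = make_value_fns(w, mu, cov, x)
phi_int = shapley_exact(v_int, 2)
phi_cond = shapley_exact(v_cond, 2)
print(phi_int, phi_cond)
```

Under these assumptions the interventional game assigns all credit to feature 0, whereas the conditional game transfers most of it to the unused but correlated feature 1. Both attribution vectors sum to the same total, $f(x)-\mathbb{E}[f(X)]$.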

3. Estimation Algorithms, Approximations, and Model-Specific Schemes

The combinatorial blowup of $2^{|M|}$ coalitions severely limits direct Shapley value computation. This has motivated a range of estimation and approximation methods:

Weighted least-squares surrogates: KernelSHAP recasts Shapley value computation as solving a weighted regression problem over sampled coalitions, with the Shapley kernel supplying the coalition weights (Batenburg, 28 Sep 2025, Jethani et al., 2021).
Amortized learning: FastSHAP replaces on-the-fly regression with a neural explainer trained to approximate Shapley values with one forward pass, matching KernelSHAP accuracy with 200–1000× speedup (Jethani et al., 2021).
Tree-based optimizations: TreeSHAP leverages tree structure for polynomial-time exact solutions in tree ensembles, under independence assumptions (Amoukou et al., 2021). The Eject method provides “model-true” Shapley attributions, ensuring that features not used on the instance’s decision path receive zero credit and reducing computational cost to $O(2^k)$ per instance (where $k$ is the depth of the instance’s decision path) (Campbell et al., 2021).
Feature dependency graphs: ShapG constructs a sparse feature-correlation graph to restrict sampling to local “neighborhoods,” cutting runtime by orders of magnitude relative to global Shapley enumeration and yielding more accurate explanations in high-dimensional settings (Zhao et al., 2024).
Graph neural networks: GraphSVX extends Shapley games to coalitions over node features and neighbor subgraphs, and recovers attributions via a surrogate regression model, maintaining locality and efficiency through smart sampling (Duval et al., 2021).
Gaussian process models: For GP predictors, the entire Shapley value is Gaussian-distributed, yielding quantifiable uncertainty in explanations and tractable covariance quantification over attributions (Chau et al., 2023).
Quantum algorithms: Quantum mean estimation provides a near-quadratic speedup for Shapley value estimation by encoding the required sum in quantum amplitudes and leveraging quantum amplitude estimation (Burge et al., 2024).
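
As a point of reference for the methods above, the simplest way to avoid full enumeration is plain Monte Carlo sampling over player orderings. The sketch below shows this generic permutation-sampling baseline, which is not one of the cited algorithms; the function name, the sample count, and the four-player `toy_game` are hypothetical.

```python
import numpy as np


def shapley_permutation_sampling(v, m, n_permutations=2000, seed=0):
    """Monte Carlo Shapley estimate from random player orderings.

    `v` maps a length-m 0/1 mask (1 = player present) to a coalition value.
    """
    rng = np.random.default_rng(seed)
    phi = np.zeros(m)
    for _ in range(n_permutations):
        mask = np.zeros(m)
        prev = v(mask)
        # Add players one at a time in random order and credit each with
        # the change in value that its arrival produces.
        for i in rng.permutation(m):
            mask[i] = 1.0
            cur = v(mask)
            phi[i] += cur - prev
            prev = cur
    return phi / n_permutations


# Hypothetical game: player 0 acts alone, 1 and 2 interact, 3 is a dummy.
def toy_game(z):
    return 4.0 * z[0] + 2.0 * z[1] * z[2]


phi_hat = shapley_permutation_sampling(toy_game, 4)
print(phi_hat)
```

Each ordering uses $m+1$ evaluations of the value function, so the cost grows linearly with the number of sampled orderings instead of exponentially with $m$. Because the marginal contributions along any single ordering telescope to $v(M)-v(\emptyset)$, the estimate satisfies efficiency up to floating-point error regardless of sample size; the individual values carry sampling noise.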

For deep architectures and pipeline compositions, DeepSHAP enables efficient backpropagation of Shapley attributions through layered transformations, yielding fast, group-compliant explanations for composite models (Chen et al., 2021). Shapley Explanation Networks move Shapley transforms inside the model, enabling intrinsic explanations, explanation regularization, and rapid evaluation (Wang et al., 2021). In attention-based transformers, attention flows can be formally shown to satisfy the Shapley axioms (at the layerwise level), providing a class of theoretically justified explanations (Ethayarajh et al., 2021).

4. Extensions: Data Structure, Causality, and Decomposition

Key limitations of classical Shapley explanations are their dependence on data manifold coverage and sensitivity to structural and observation bias:

Causal Shapley: By replacing conditional expectations with do-interventions, causal Shapley values realign attributions with direct and indirect effects propagated via known causal graphs, correcting for misattribution in the presence of mediation or confounding (Heskes et al., 2020). This approach allows for decomposing total feature contributions into direct and indirect causal components.
Model–data dependence decomposition: Conditional (observational) attributions conflate model logic and data dependencies; the decomposition framework isolates “interventional” and “dependent” components, revealing the spectrum from direct model effects to attributions induced purely by statistical dependency among features (Michiels et al., 2023).
Subgroup and coalition handling: Coalition-Shapley values correctly aggregate attributions for multi-level categorical variables, avoiding the widespread but invalid practice of summing across dummy encodings (Amoukou et al., 2021).
Error and informativeness analysis: Explanation error is dissected into observation bias (finite-sample, surrogate overfitting) and structural bias (distributional/modeling assumptions), formalizing the over-informative and under-informative regimes, and measuring distributional drift through OOD-detection and total-variation metrics (Zhao et al., 2024).
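
The point about categorical variables can be made concrete with a toy game. The sketch below is not the estimator of the cited work; it only contrasts two ways of attributing credit to a one-hot-encoded variable: summing per-column Shapley values, versus treating the whole group of columns as a single player. The column names, the helper `group_game`, and the payoff are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley(players, v):
    """Exact Shapley values; v takes a frozenset of players."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        phi[i] = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for S in combinations(others, size):
                S = frozenset(S)
                phi[i] += weight * (v(S | {i}) - v(S))
    return phi


def group_game(groups, v_columns):
    """Lift a game over columns to a game whose players are column groups."""
    def v(group_coalition):
        cols = frozenset().union(*(groups[g] for g in group_coalition))
        return v_columns(cols)
    return v


# Hypothetical encoding: "color" is spread over two dummy columns.
groups = {
    "color": frozenset({"color_red", "color_blue"}),
    "age": frozenset({"age"}),
}
columns = ["color_red", "color_blue", "age"]


# Hypothetical payoff: only realized when every column is known.
def v_columns(cols):
    return 6.0 if cols >= set(columns) else 0.0


per_column = shapley(columns, v_columns)
summed_color = per_column["color_red"] + per_column["color_blue"]
grouped = shapley(list(groups), group_game(groups, v_columns))
print(summed_color, grouped)
```

In this game, summing over the dummy columns credits "color" with 4 of the 6 units simply because it occupies two columns, while the group-level game splits the payoff 3 and 3 between "color" and "age".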

The precision of individual attributions degrades rapidly in sparsely populated regions of the feature space, a phenomenon systematically analyzed and quantified for conditional Shapley methods (Olsen, 2023).

5. Applications: Scope, Limitations, and Model Classes

Shapley-value-based explanations are broadly applicable to:

Tabular regression/classification: KernelSHAP, sampling-based methods, and graph-local approximations (ShapG) enable scalable and accurate global and local explanations (Zhao et al., 2024, Jethani et al., 2021).
Tree-based models: Ensemble-specific methods (TreeSHAP, Eject, leaf/discrete estimators) offer bias reductions and computational gains, with precise behaviors for categorical and dependent features (Amoukou et al., 2021, Campbell et al., 2021).
Deep learning and vision: Heatmap attribution employs game-theoretic and Taylor-approximate Shapley values (ShapleyCAM), revealing the connection between heuristic CAMs and Shapley theory (Cai, 9 Jan 2025). DeepSHAP and Shapley Explanation Networks operationalize rapid, layerwise-exact explanations (Chen et al., 2021, Wang et al., 2021).
GNNs and structured data: GraphSVX demonstrates the extension of Shapley axiomatics to joint node-feature attributions on graphs (Duval et al., 2021).
Reinforcement learning: Three classes of Shapley games distinguish contributions to agent behavior, expected return, or value estimate, with explicit constructs and guarantees (Beechey et al., 12 May 2025).
Quantum machine learning: Provably quantum-accelerated estimation enables tractable Shapley computations in computationally difficult domains (Burge et al., 2024).

Nevertheless, these explanations are not without fundamental limitations:

Global averaging: Classical Shapley explanations can assign non-zero importance to locally-unused or globally spurious features due to their averaging over all permutation orderings (Amoukou et al., 2021, Campbell et al., 2021).
Counterintuitive attributions: In the presence of collinearity, proxy features, or causally related variables, Shapley attributions can violate intuitive credit assignment unless causal structure is enforced (Kumar et al., 2020, Heskes et al., 2020).
Interface with human explanations: Shapley methods are often not contrastive, actionable, or robust to user interpretation requirements; their use is therefore best motivated for well-specified, fairness-axiomatized tasks rather than as universal explanation tools (Kumar et al., 2020, Merrick et al., 2019).

6. Best Practices, Evaluation, and Future Directions

Practical deployment of Shapley-value-based explanations demands careful articulation of the explainer’s goal, selection and justification of conditional/interventional/causal games, and sensitivity analysis with respect to both bias sources and data region (Zhao et al., 2024, Olsen, 2023). For model auditing, recourse, and policy evaluation, interventional and causal Shapley values provide transparent decompositions. If only a ranking of features is sought, a range of estimators suffices, but raw attribution magnitudes should not be trusted in low-density or OOD domains (Olsen, 2023).

Emerging directions include quantum speedups (Burge et al., 2024), amortized and intrinsic explanations (Jethani et al., 2021, Wang et al., 2021), robust uncertainty quantification (Chau et al., 2023), graph/sequence extensions (Zhao et al., 2024, Duval et al., 2021), and causal-structure-informed attributions (Heskes et al., 2020, Michiels et al., 2023).

Shapley-value-based explanations have developed into a rigorous, multi-faceted interpretability toolkit. Their theoretical optimality under axiomatic properties is balanced by significant modeling and computational challenges and by inherent limitations in human-aligned explanation. Contemporary research continues to expand their capacity for principled, context-aware, and scalable model interrogation across the machine learning spectrum.