Shapley Value in Machine Learning

Updated 3 May 2026

The Shapley value is an attribution method in ML that assigns contributions to features, data points, or models and is characterized by four fairness axioms: efficiency, symmetry, null-player, and additivity.
It underpins key applications such as feature selection, model explainability (e.g., via SHAP), data valuation, and ensemble evaluation in various ML settings.
Approximation algorithms like Monte Carlo sampling and KernelSHAP enable scalable estimation of Shapley values even in high-dimensional or computationally intensive scenarios.

The Shapley value, originating in cooperative game theory, has become a foundational tool for attributing value to elements—such as data points, features, or models—in a variety of ML settings. Building upon rigorous axiomatic foundations, the Shapley value supplies a unique allocation scheme satisfying fairness properties (efficiency, symmetry, null-player, additivity) and forms the backbone of numerous algorithms for data valuation, model explainability, and resource sharing in ML workflows. This article delineates the mathematical formalism, computational strategies, application domains, and recent innovations—including order-sensitive extensions and scalable estimation—of the Shapley value in machine learning.

1. Shapley Value: Formal Definition and Axiomatic Foundations

In the canonical setup, let $N = \{1, \ldots, n\}$ denote the collection of "players" (which may be data points, features, models, or agents), and $v: 2^N \rightarrow \mathbb{R}$ a characteristic or utility function satisfying $v(\emptyset) = 0$. The Shapley value attributed to player $i$ is defined as

$\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n-|S|-1)!}{n!} (v(S \cup \{i\}) - v(S)).$

An equivalent permutation-based form is

$\phi_i(v) = \frac{1}{n!} \sum_{\pi \in S_n} (v(P_i^\pi \cup \{i\}) - v(P_i^\pi)),$

where $S_n$ is the set of all permutations of $N$ and $P_i^\pi$ is the set of players preceding $i$ in permutation $\pi$. The Shapley value is the unique attribution that satisfies the following four axioms:

Efficiency: $\sum_{i} \phi_i(v) = v(N)$ (total value is distributed)
Symmetry: If $v(S \cup \{i\}) = v(S \cup \{j\})$ for all $S \subseteq N \setminus \{i, j\}$, then $\phi_i(v) = \phi_j(v)$
Null-player: If $v(S \cup \{i\}) = v(S)$ for all $S \subseteq N \setminus \{i\}$, then $\phi_i(v) = 0$
Additivity: For any $v, w: 2^N \rightarrow \mathbb{R}$, $\phi_i(v + w) = \phi_i(v) + \phi_i(w)$

Together, these axioms single out the Shapley value as the unique attribution scheme for cooperative games that is efficient, symmetric, null-player-respecting, and linear in the value function. ML applications include model attribution, data valuation, and ensemble evaluation (Rozemberczki et al., 2022).
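As a concrete illustration of the subset formula and the axioms, the following is a minimal sketch that enumerates every coalition for a three-player game. The function name `exact_shapley` and the toy game are hypothetical and chosen only for this example; the enumeration is exponential in $n$ and is meant for very small games.

```python
from itertools import combinations
from math import factorial


def exact_shapley(players, v):
    """Exact Shapley values via the subset formula (exponential in n)."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(n):
            # Weight |S|! (n - |S| - 1)! / n! for coalitions S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (v(s | {i}) - v(s))
        phi[i] = total
    return phi


def toy_game(coalition):
    """Hypothetical game: 'a' and 'b' are interchangeable, 'c' adds nothing."""
    return float(len(coalition & {"a", "b"}) ** 2)


phi = exact_shapley(["a", "b", "c"], toy_game)
print(phi)
```

Up to floating-point rounding, the result assigns 2 to each of `a` and `b` (symmetry), 0 to `c` (null-player), and the three values sum to `toy_game({"a", "b", "c"}) = 4` (efficiency).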

2. Core Machine Learning Applications

The Shapley value's axiomatic attributes directly enable several central ML applications:

Feature Selection and Attribution: Features are the players; $v(S)$ is model performance using only the feature subset $S$.
Model Explainability (SHAP): Each feature or feature-value is separately attributed for a given prediction. The SHAP family of methods operationalizes this using weighted regression schemes (KernelSHAP) to efficiently estimate attribution at a local or global level (Musco et al., 2024, Mayer et al., 2025).
Data Valuation: Each training data point is a player; $v(S)$ is performance (accuracy, loss, AUC) on a validation set when the model is trained on $S$. Data Shapley ranks, prices, or selects examples for denoising, acquisition, or market design (Rozemberczki et al., 2022, Kwon et al., 2021, Li et al., 2023).
Ensemble Member Valuation: Individual models in ensembles are players; $v(S)$ is ensemble accuracy for subset $S$, supporting methods for ensemble pruning or adversarial detection (Rozemberczki et al., 2021).
Multi-agent Reinforcement Learning: Agents are players; $v(S)$ is the global reward for subset $S$ acting. Shapley credit assignment mediates decentralized policy learning.
Model-Independent Variable Importance: Using dependence measures (e.g., distance correlation, HSIC) in the value function, "Sunnies" attribute non-model-specific variable importance for exploratory analysis (Fryer et al., 2020).
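To make the role of the value function concrete, here is a small sketch of ensemble member valuation using the permutation form of the Shapley value. Everything in it is an assumption made for illustration: the labels, the three models' predictions, the averaged-vote rule with ties going to class 1, and the helper names. It is not the procedure of any cited paper.

```python
from itertools import permutations

# Hypothetical binary labels and predictions on six validation examples.
LABELS = [1, 0, 1, 1, 0, 1]
PREDICTIONS = {
    "m1": [1, 0, 1, 1, 0, 0],
    "m2": [1, 0, 0, 1, 1, 1],
    "m3": [0, 1, 0, 0, 1, 0],
}


def ensemble_accuracy(coalition):
    """v(S): accuracy of the averaged vote of the models in S; v(empty) = 0."""
    if not coalition:
        return 0.0
    correct = 0
    for idx, label in enumerate(LABELS):
        mean_vote = sum(PREDICTIONS[m][idx] for m in coalition) / len(coalition)
        predicted = 1 if mean_vote >= 0.5 else 0  # ties go to class 1
        correct += predicted == label
    return correct / len(LABELS)


def shapley_by_permutations(players, v):
    """Average each player's marginal contribution over all n! orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        preceding = frozenset()
        for p in order:
            totals[p] += v(preceding | {p}) - v(preceding)
            preceding = preceding | {p}
    return {p: t / len(orderings) for p, t in totals.items()}


model_values = shapley_by_permutations(list(PREDICTIONS), ensemble_accuracy)
print(model_values)
```

In this made-up example the model that is wrong on every example (`m3`) receives a negative value, which is the kind of signal an ensemble-pruning rule could act on. Swapping `ensemble_accuracy` for a function that retrains a model on a data subset would turn the same loop into a (very expensive) data valuation.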

3. Computational Techniques and Scalability

Exact computation of Shapley values requires evaluating $2^n$ subsets or $n!$ permutations, which is impractical for all but small $n$. The field has developed a range of approximation algorithms:

Monte Carlo Permutation Sampling: Draw random permutations, compute marginal contributions in insertion order, average (Rozemberczki et al., 2022, Watson et al., 2023). Hoeffding and Chebyshev bounds give error guarantees.
KernelSHAP Weighted Regression: Solve a constrained least-squares regression to fit Shapley values to linearized coalition outputs, using kernel-based importance weighting (Musco et al., 2024, Mayer et al., 2025).
Leverage Score Sampling: A provably accurate and efficient alternative that requires $O(n \log n)$ model evaluations and guarantees, with high probability, that the sampled normal equations yield $\epsilon$-approximate Shapley values (Musco et al., 2024).
Delta-Shapley and Diminishing Marginals: Exploiting the decay of marginal contributions as the coalition size $|S|$ grows, algorithms restrict sampling to medium-sized coalitions, reducing cost by up to one order of magnitude with negligible rank-order loss (Watson et al., 2023, Watson et al., 2022).
Coalitional and Partial Ordinal Values: Modify the sampling or weighting to account for coalitions of logically-dependent features (Amoukou et al., 2021), or to respect the order in which players "join" (curriculum-learning, recommendation order) via Partial Ordinal Shapley Value (POSV) and order-aware permutation strategies (Liu et al., 2023).
Efficient Closed-Form Solutions: For certain models (e.g., Naive Bayes with additive log-odds, or regression with Gaussian data), analytic integration yields exact or tight-bounded forms for Shapley values, further reducing computational load (Lemaire et al., 2023, Kwon et al., 2020).
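The Monte Carlo permutation estimator from the list above can be sketched in a few lines. The utility function and weights are hypothetical, and the Hoeffding helper covers a single player's estimate under the assumption that every marginal contribution lies in an interval of width `value_range`; a guarantee for all players at once would need a union bound on top.

```python
import math
import random


def monte_carlo_shapley(players, v, num_permutations, seed=0):
    """Estimate Shapley values by averaging marginals over random orderings."""
    rng = random.Random(seed)
    players = list(players)
    totals = {p: 0.0 for p in players}
    for _ in range(num_permutations):
        order = players[:]
        rng.shuffle(order)
        coalition = set()
        previous = v(frozenset(coalition))
        for p in order:
            coalition.add(p)
            current = v(frozenset(coalition))
            totals[p] += current - previous  # marginal contribution of p
            previous = current
    return {p: t / num_permutations for p, t in totals.items()}


def hoeffding_num_permutations(value_range, eps, delta):
    """Permutations needed so one player's estimate is within eps of its
    Shapley value with probability at least 1 - delta (Hoeffding bound)."""
    return math.ceil(value_range ** 2 * math.log(2.0 / delta) / (2.0 * eps ** 2))


# Hypothetical concave utility; "d3" carries no weight and is a null player.
WEIGHTS = {"d1": 4.0, "d2": 1.0, "d3": 0.0, "d4": 4.0}


def utility(coalition):
    return math.sqrt(sum(WEIGHTS[p] for p in coalition))


estimate = monte_carlo_shapley(WEIGHTS, utility, num_permutations=200, seed=1)
print(estimate)
```

Because the marginal contributions within each sampled permutation telescope to $v(N) - v(\emptyset)$, this estimator satisfies efficiency exactly for any number of permutations, whereas symmetry holds only in expectation.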

Method	Use Case	Error Bounds
Permutation MC	General, small/medium $n$	Hoeffding
KernelSHAP	Local model explainability	CLT, empirical
Leverage SHAP	Provable global accuracy	Matrix Chernoff
Delta-Shapley	Data valuation	Stability/SGD-based
Analytic	Special model classes	Exact or bounded
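The weighted-regression view behind KernelSHAP can be illustrated on a tiny game. The sketch below enumerates all proper, non-empty coalitions instead of sampling them, uses the Shapley kernel weight $\frac{n-1}{\binom{n}{|S|}\,|S|\,(n-|S|)}$, and enforces the efficiency constraint by eliminating the last coefficient. It assumes $n \ge 2$; the function name is hypothetical, and this is not the API of the `shap` library, which additionally samples coalitions and handles masked features through background data.

```python
from itertools import combinations
from math import comb

import numpy as np


def kernel_regression_shapley(n, v):
    """Shapley values as the solution of a constrained weighted least squares
    problem over all proper, non-empty coalitions (requires n >= 2)."""
    base = v(frozenset())
    rows, targets, weights = [], [], []
    for size in range(1, n):
        w = (n - 1) / (comb(n, size) * size * (n - size))  # Shapley kernel
        for subset in combinations(range(n), size):
            z = np.zeros(n)
            z[list(subset)] = 1.0  # membership indicator of the coalition
            rows.append(z)
            targets.append(v(frozenset(subset)) - base)
            weights.append(w)
    Z = np.array(rows)
    y = np.array(targets)
    sqrt_w = np.sqrt(np.array(weights))
    total = v(frozenset(range(n))) - base
    # Substitute phi_last = total - sum(other phis) to enforce efficiency.
    A = Z[:, :-1] - Z[:, [-1]]
    b = y - Z[:, -1] * total
    head, *_ = np.linalg.lstsq(A * sqrt_w[:, None], b * sqrt_w, rcond=None)
    return np.append(head, total - head.sum())


def toy_game(coalition):
    """Hypothetical game: players 0 and 1 are interchangeable, 2 adds nothing."""
    return float(len(coalition & {0, 1}) ** 2)


phi = kernel_regression_shapley(3, toy_game)
print(phi)
```

With full enumeration the regression recovers the exact Shapley values of the toy game (2, 2, and 0 up to rounding). Practical estimators differ mainly in how they choose which rows of this system to sample.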

4. Extensions, Generalizations, and Variants

Recent work has expanded the Shapley value concept to address modern ML use cases:

Order-Sensitive and Partial Ordinal Schemes: To account for the sequential utility of data in order-sensitive ML pipelines (curriculum learning, federated learning with ordered clients), the Partial Ordinal Shapley Value (POSV) replaces classic set-based symmetry with group-theoretic axioms over ordered tuples, with order-based marginal contributions (Liu et al., 2023). Sampling strategies (TMC, CMC, CTMC) deliver scalable, provably unbiased POSV estimates, with error bounds and efficient approximations for class-structured problems.
Probabilistic Classifier Adaptation: For probability-calibrated models, the classical binary-accuracy value function is replaced with a calibration-sensitive function that applies an activation function to correctness-confidence pairs; the resulting "P-Shapley" gives more discriminative data valuations (Li et al., 2023).
Beta Shapley and Semivalues: By relaxing the efficiency axiom and explicitly re-weighting coalition sizes, Beta Shapley and generalized semivalue frameworks reduce estimation variance and adapt the attribution to signal-rich coalition regimes (Kwon et al., 2021).
Distributional and Probabilistic Shapley: Extending to settings in which data is drawn from a distribution, the distributional Shapley value and its efficient analytic solutions allow out-of-sample valuation, stability under dataset perturbations, and principled decomposition into expected value and variance under stochastic sampling (Ghorbani et al., 2020, Kwon et al., 2020, Jia et al., 2026).
Class-Wise Valuation: CS-Shapley introduces a value function separable with respect to in-class and out-of-class dev-set accuracy, which improves the detection of mislabeled or harmful examples in supervised classification (Schoch et al., 2022).
Differentially Private Valuation: Layered Shapley algorithms stratify sampling by coalition size and, using bounded marginal gains, support differentially private value queries with additive Laplace noise (Watson et al., 2022).
Fairness under Approximation: Probably Approximate Shapley Fairness formalizes fidelity of approximate solutions, quantifying the probability and magnitude of fairness violations under stochastic or budgeted computation, and supplies a greedy active estimation (GAE) algorithm optimizing fidelity under a fixed budget (Zhou et al., 2022).
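The idea of re-weighting coalition sizes can be made concrete with a generic semivalue. In the sketch below, each coalition size $k$ gets a probability $p_k$, and a player's value is the $p_k$-weighted average of its mean marginal contribution at each size. The size weights follow a Beta-binomial distribution; the parameter names `a` and `b` are this sketch's own convention and may not match the parameterization used in the Beta Shapley paper.

```python
from itertools import combinations
from math import comb, exp, lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_size_weights(n, a, b):
    """Beta-binomial probability p_k of coalition size k, for k = 0..n-1."""
    return [
        comb(n - 1, k) * exp(log_beta(k + a, n - 1 - k + b) - log_beta(a, b))
        for k in range(n)
    ]


def semivalue(players, v, size_weights):
    """Size-weighted average of mean marginal contributions (exact, small n)."""
    n = len(players)
    values = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):
            marginals = [
                v(frozenset(s) | {i}) - v(frozenset(s))
                for s in combinations(others, k)
            ]
            total += size_weights[k] * sum(marginals) / len(marginals)
        values[i] = total
    return values


def toy_game(coalition):
    """Hypothetical game: 'a' and 'b' are interchangeable, 'c' adds nothing."""
    return float(len(coalition & {"a", "b"}) ** 2)


players = ["a", "b", "c"]
uniform = semivalue(players, toy_game, beta_size_weights(3, 1.0, 1.0))
small_first = semivalue(players, toy_game, beta_size_weights(3, 1.0, 4.0))
print(uniform, small_first)
```

With `a = b = 1` the size weights are uniform and the result coincides with the Shapley value. With `a = 1, b = 4` small coalitions dominate, and on this toy game the values no longer sum to $v(N)$, which shows the efficiency axiom being relaxed.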

5. Empirical Performance and Diagnostics

Multiple studies report that Shapley value methods are competitive in practice across regression, classification, ensemble construction, and exploratory analysis:

Data Valuation: Removal/retention experiments show that removing high-Shapley-valued examples first produces the steepest drop in accuracy, while adding low-valued examples slows recovery (Watson et al., 2023, Schoch et al., 2022).
Noisy Label Detection: Shapley-based rankings outperform leave-one-out and uncertainty-baseline methods in identifying label noise, particularly in Beta Shapley, CS-Shapley, and P-Shapley settings (Kwon et al., 2021, Schoch et al., 2022, Li et al., 2023).
Ensemble Pruning and Adversarial Detection: Attribution via Troupe or model-valuation Shapley discriminates weak, redundant, or adversarial models, rationalizes pruning, or guides reward splitting in collaborative ML (Rozemberczki et al., 2021).
Interpretability and Faithfulness: Comparative studies versus heuristic alternatives (e.g., Weight of Evidence, KernelSHAP, TreeSHAP) demonstrate that analytic or coalitional Shapley approaches maintain consistency, avoid double-counting, and align closely with human-inferable attributions (Lemaire et al., 2023, Amoukou et al., 2021).
Model Diagnostics and Sensitivity Analysis: Model-independent schemes ("Sunnies") reveal non-linear dependence structure among features, identifying discrepancies between data-driven and model-driven attributions (Fryer et al., 2020).
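The removal experiments mentioned above follow a simple protocol that can be sketched as follows. The additive utility and its weights are hypothetical; for an additive game each player's Shapley value equals its own contribution, so the values are written down directly rather than estimated.

```python
def removal_curve(values, v, highest_first=True):
    """Utility after removing 0, 1, 2, ... players in order of their value."""
    ranked = sorted(values, key=values.get, reverse=highest_first)
    remaining = set(ranked)
    curve = [v(frozenset(remaining))]
    for player in ranked:
        remaining.remove(player)
        curve.append(v(frozenset(remaining)))
    return curve


# Hypothetical additive utility over four training points.
WEIGHTS = {"x1": 5, "x2": 3, "x3": 2, "x4": 0}


def utility(coalition):
    return sum(WEIGHTS[p] for p in coalition) / 10


# For an additive game, the Shapley value of a player is its own contribution.
values = {p: w / 10 for p, w in WEIGHTS.items()}

high_first = removal_curve(values, utility, highest_first=True)
low_first = removal_curve(values, utility, highest_first=False)
print(high_first, low_first)
```

A valuation is usually judged by how much faster the highest-first curve falls than the lowest-first (or a random-order) curve; in a real experiment `utility` would retrain and evaluate a model at each step.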

6. Limitations, Open Problems, and Future Directions

Despite the Shapley value's rigorous foundations and the growing toolbox of computational methods, several challenges and open questions remain:

Computational Tractability: Full coalition enumeration is infeasible for high-dimensional feature spaces or large datasets, and even with variance reduction and importance sampling the number of utility evaluations can remain a bottleneck. Ongoing research addresses further sample complexity reductions, surrogate/influence-based proxies, or architectures with analytic collapse (Rozemberczki et al., 2022, Lemaire et al., 2023).
Robustness of Axioms Under Approximation: Approximate solutions may violate symmetry or efficiency, especially under tight computational budgets or for high-variance data; formalizing and controlling the probability of fairness violations is an active area (Zhou et al., 2022).
Choice of Value Function: Shapley attributions are only as interpretable as the coalition utility function. Its design must reflect the intended notion of value (such as order, class structure, or dependence) and avoid artifacts introduced by the choice of metric.
Extensions to Structured/Dependent Inputs: Correlated or highly collinear features, categorical variables, or grouped data challenge naive applications and benefit from coalitional or hierarchical extensions (Amoukou et al., 2021).
Stochastic/Probabilistic Settings: When contributors supply samples from underlying distributions (rather than fixed deterministic objects), the value and its variance must be estimated jointly; pooling and stratified resampling are under exploration (Jia et al., 2026).
Privacy and Federated Valuation: Shapley estimation protocols must increasingly provide guarantees under privacy constraints, limited access, or distributed settings (Watson et al., 2022).

Potential directions include adaptive allocation schemes, distributed or federated value estimation, theoretically optimal sampling, and more expressive value functions (e.g., for calibrated probabilities or non-i.i.d. data).

References:

"The Shapley Value in Machine Learning" (Rozemberczki et al., 2022);
"Data valuation: The partial ordinal Shapley value for machine learning" (Liu et al., 2023);
"Accelerated Shapley Value Approximation for Data Evaluation" (Watson et al., 2023);
"Shapley Value on Probabilistic Classifiers" (Li et al., 2023);
"An Efficient Shapley Value Computation for the Naive Bayes Classifier" (Lemaire et al., 2023);
"DU-Shapley: A Shapley Value Proxy for Efficient Dataset Valuation" (Garrido-Lucero et al., 2023);
"Differentially Private Shapley Values for Data Evaluation" (Watson et al., 2022);
"Shapley Value on Uncertain Data" (Jia et al., 2026);
"CS-Shapley: Class-wise Shapley Values for Data Valuation in Classification" (Schoch et al., 2022);
"Beta Shapley: a Unified and Noise-reduced Data Valuation Framework for Machine Learning" (Kwon et al., 2021);
"A Distributional Framework for Data Valuation" (Ghorbani et al., 2020);
"Efficient computation and analysis of distributional Shapley values" (Kwon et al., 2020);
"Probably Approximate Shapley Fairness with Applications in Machine Learning" (Zhou et al., 2022);
"Provably Accurate Shapley Value Estimation via Leverage Score Sampling" (Musco et al., 2024);
"Shapley Values: Paired-Sampling Approximations" (Mayer et al., 2025);
"The Shapley Value of coalition of variables provides better explanations" (Amoukou et al., 2021);
"Explaining the data or explaining a model? Shapley values that uncover non-linear dependencies" (Fryer et al., 2020).