Shapley Value Regression Explained

Updated 23 March 2026

Shapley value regression is a methodology that applies cooperative game theory to assign fair contributions of individual predictors to overall model performance.
It leverages techniques like Monte Carlo sampling, variance reduction, and linear-algebraic optimizations to efficiently estimate feature contributions in high-dimensional and nonlinear settings.
Extensions include local, group-level, Bayesian, and robust adaptations that enable transparent model explanations and improved interpretability in diverse applications.

Shapley value regression refers to a family of methods that apply Shapley values from cooperative game theory, originally formulated to attribute payoffs to individual players, to regression modeling and model explanation. In this context, Shapley values attribute model fit (e.g., $R^2$ or KL-$R^2$) or individual predictions to features, feature groups, training data points, or structural parameters. The attribution accounts for dependencies among features, so it remains usable in regression settings with correlated predictors and nonlinear effects.

1. Foundations: Shapley Values for Regression Attribution

The canonical Shapley value for a feature $j$ is the weighted average marginal gain in a model's value function (such as out-of-sample $R^2$, log-likelihood, deviance, classification accuracy, or the prediction itself) obtained by adding feature $j$ to each subset $S$ of features that does not already contain $j$. For a regression task over $d$ features, this is given by

$\varphi_j = \sum_{S \subset \{1, \ldots, d\} \setminus \{j\}} \frac{|S|!\,(d-|S|-1)!}{d!}\,[v(S \cup \{j\}) - v(S)],$

where $v(S)$ is the model value associated with the coalition $S$ (such as $R^2(S)$ in the least-squares setting) (Bell et al., 2023, Acemoglu et al., 2 Jan 2026, Alkhatib et al., 7 May 2025, Tang et al., 2024). The weight $|S|!\,(d-|S|-1)!/d!$ is the probability that exactly the features in $S$ precede $j$ in a uniformly random ordering of the $d$ features. This combinatorial definition satisfies the efficiency, symmetry, dummy, and additivity axioms; in particular, efficiency means the values sum to $v(\{1,\ldots,d\}) - v(\emptyset)$. For regression, $v(S)$ may be the out-of-sample $R^2$ (Bell et al., 2023), the KL-divergence-based pseudo-$R^2$ for GLMs (Acemoglu et al., 2 Jan 2026), or the prediction itself for instance-level attribution (Joseph, 2019).
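The definition above can be evaluated directly when $d$ is small. The following is a minimal sketch, not any cited paper's implementation: it uses in-sample $R^2$ (rather than the out-of-sample $R^2$ mentioned above) to keep the code short, and the synthetic data, the helper names `r2` and `exact_shapley`, and the coefficients are all illustrative assumptions.

```python
import itertools
import math
import numpy as np

def r2(X, y, subset):
    """In-sample R^2 of an OLS fit (with intercept) on the columns in `subset`."""
    Z = np.column_stack([np.ones(len(y)), X[:, list(subset)]])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

def exact_shapley(value, d):
    """Exact Shapley values by enumerating all subsets; feasible only for small d."""
    phi = np.zeros(d)
    for j in range(d):
        others = [k for k in range(d) if k != j]
        for size in range(d):
            # Weight |S|! (d - |S| - 1)! / d! from the combinatorial definition.
            w = math.factorial(size) * math.factorial(d - size - 1) / math.factorial(d)
            for S in itertools.combinations(others, size):
                phi[j] += w * (value(S + (j,)) - value(S))
    return phi

# Synthetic data (assumed for illustration): features 0 and 1 are correlated,
# feature 3 is pure noise.
rng = np.random.default_rng(0)
n, d = 500, 4
X = rng.normal(size=(n, d))
X[:, 1] += 0.8 * X[:, 0]
y = 2.0 * X[:, 0] + 1.0 * X[:, 2] + rng.normal(size=n)

phi = exact_shapley(lambda S: r2(X, y, S), d)
full_r2 = r2(X, y, range(d))
print("Shapley shares of R^2:", np.round(phi, 4), "total:", round(full_r2, 4))
```

By the efficiency axiom the shares sum to the full-model $R^2$, since the intercept-only model has $R^2 = 0$. The loop visits $d \cdot 2^{d-1}$ subsets, which is why the approximation schemes discussed later matter for larger $d$.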

2. Linear, Generalized Linear, and Structural Models

In least-squares regression and GLMs, Shapley value regression decomposes variance explained or deviance into additive feature components. In the classical OLS case, $R^2(S)$ denotes the out-of-sample $R^2$ of the submodel using only the features in $S$, and the marginal increase in $R^2$ when adding feature $j$ is attributed according to the Shapley formula above (Bell et al., 2023). For GLMs, the recommended value function is the normalized KL-divergence $R^2_{KL}$, interpreted as the fraction of the total possible KL-explanatory power explained by the model, which ensures nonnegativity, monotonicity, and normalization (Acemoglu et al., 2 Jan 2026).
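As an illustration of a KL-based value function, the sketch below computes a deviance-ratio $R^2$ for a Poisson GLM, $1 - D(S)/D(\emptyset)$, where $D$ is the Poisson deviance (twice the KL divergence between the saturated and fitted means). This specific form is an assumption made for the example; the exact definition of $R^2_{KL}$ in the cited work may differ. The IRLS fitting routine and all names are hypothetical helpers written for this sketch.

```python
import numpy as np

def fit_poisson_mu(Z, y, n_iter=50):
    """Fitted means of a Poisson GLM (log link) via iteratively reweighted least squares."""
    beta = np.zeros(Z.shape[1])
    beta[0] = np.log(y.mean())          # start at the intercept-only fit
    for _ in range(n_iter):
        eta = Z @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu         # working response
        ZtW = Z.T * mu                  # IRLS weights equal mu for the log link
        beta = np.linalg.solve(ZtW @ Z, ZtW @ z)
    return np.exp(Z @ beta)

def poisson_deviance(y, mu):
    """Poisson deviance: 2 * sum of KL(Poisson(y_i) || Poisson(mu_i))."""
    safe_y = np.where(y > 0, y, 1.0)    # y * log(y / mu) is 0 when y == 0
    return 2.0 * np.sum(y * np.log(safe_y / mu) - (y - mu))

def r2_kl(X, y, subset):
    """Deviance-ratio R^2 of the Poisson submodel on the columns in `subset`."""
    Z = np.column_stack([np.ones(len(y)), X[:, list(subset)]])
    d_model = poisson_deviance(y, fit_poisson_mu(Z, y))
    d_null = poisson_deviance(y, np.full(len(y), y.mean()))
    return 1.0 - d_model / d_null

# Synthetic count data (assumed for illustration); feature 2 is irrelevant.
rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=(n, 3))
y = rng.poisson(np.exp(0.5 + 0.4 * X[:, 0] - 0.3 * X[:, 1])).astype(float)

r2_empty = r2_kl(X, y, ())
r2_one = r2_kl(X, y, (0,))
r2_two = r2_kl(X, y, (0, 1))
r2_full = r2_kl(X, y, (0, 1, 2))
print(r2_empty, r2_one, r2_two, r2_full)
```

Because the submodels are nested and fitted by maximum likelihood, this value function is non-decreasing in $S$ on the training data, which is one reason such a measure can be plugged into the Shapley formula as $v(S)$.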

Structural models and group-level counterfactual decompositions employ group Shapley values, extending classical feature-level attribution to importance decomposition over parameter blocks. These group Shapley values are formulated either by subset-averaging or, equivalently, as the solution to a constrained weighted least squares (WLS) regression, providing closed-form, fair decompositions for arbitrary groupings (Kwon et al., 2024).
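The subset-averaging form of group Shapley values can be sketched by treating each group as a single player. The example below is an illustration under assumptions: it uses feature groups in an OLS regression with in-sample $R^2$ as the value function, rather than parameter blocks of a structural model, and the grouping, data, and function names are hypothetical.

```python
import itertools
import math
import numpy as np

def r2(X, y, cols):
    """In-sample R^2 of an OLS fit (with intercept) on the given columns."""
    Z = np.column_stack([np.ones(len(y)), X[:, list(cols)]])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

def group_shapley(value, groups):
    """Shapley values where each player is a whole group of columns."""
    g = len(groups)
    phi = np.zeros(g)
    for i in range(g):
        others = [k for k in range(g) if k != i]
        for size in range(g):
            w = math.factorial(size) * math.factorial(g - size - 1) / math.factorial(g)
            for S in itertools.combinations(others, size):
                cols_without = [c for k in S for c in groups[k]]
                cols_with = cols_without + list(groups[i])
                phi[i] += w * (value(cols_with) - value(cols_without))
    return phi

# Synthetic data (assumed): group 0 = columns (0, 1), group 1 = (2,), group 2 = (3, 4).
rng = np.random.default_rng(2)
n = 400
X = rng.normal(size=(n, 5))
y = X[:, 0] + X[:, 1] + 0.5 * X[:, 3] + rng.normal(size=n)
groups = [(0, 1), (2,), (3, 4)]

phi = group_shapley(lambda cols: r2(X, y, cols), groups)
total = r2(X, y, range(5))
print("group shares:", np.round(phi, 4), "total R^2:", round(total, 4))
```

Enumeration now runs over $2^g$ group coalitions instead of $2^d$ feature coalitions, so coarse groupings are much cheaper to decompose exactly.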

3. Efficient Estimation: Algorithms and Computational Innovations

The exponential subset complexity of exact Shapley computation motivates numerous algorithmic strategies:

Permutation-based Monte Carlo: Randomly samples feature orders and averages marginal lifts (Bell et al., 2023).
Variance-reduction techniques: Antithetic sampling (including permutations and their reverses) and Quasi-Monte Carlo approaches (e.g., argsort of Sobol’ points for permutation sampling) reduce estimator variance (Bell et al., 2023, Covert et al., 2020).
Linear-algebraic optimizations for OLS (LS-SPA): A single QR or Cholesky decomposition of the training data is computed once and reused for every permutation. Each per-permutation chain of nested submodels is then solved in $O(d^3)$ time, and the test data are held in a compressed representation. This makes full-resolution computation practical for thousands of features and millions of samples, with orders-of-magnitude speedups over the naive approach (Bell et al., 2023).
Weighted least squares regression (KernelSHAP, Leverage SHAP, unified sketch-based methods): The Shapley value vector is the solution of a weighted least squares problem subject to a sum (efficiency) constraint, with kernel weights chosen so that the solution coincides with the Shapley values (Covert et al., 2020, Musco et al., 2024, Chen et al., 5 Jun 2025). Advanced row-sampling schemes (e.g., leverage-score sampling, bucketized subsets, or modified weights) offer improved convergence and provable sample complexity, notably $O(d\log d)$ for leverage-based approaches.
Regression-adjusted Monte Carlo (RegMSR): Combines Monte Carlo estimates for the residual (after subtracting a fitted surrogate model) with closed-form calculation for the surrogate, achieving unbiased, low-variance estimation and supporting any function family with efficiently computable probabilistic values (e.g., trees via path decompositions) (Witter et al., 13 Jun 2025).
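The first two strategies in the list above, permutation sampling and antithetic pairing, can be sketched together. This is a generic illustration, not the LS-SPA code: the function name `permutation_shapley` and the toy game $v(S) = (\sum_{j \in S} w_j)^2$ are assumptions chosen because the exact Shapley values of that game are known to be $w_j \sum_k w_k$.

```python
import numpy as np

def permutation_shapley(value, d, n_perms, rng, antithetic=True):
    """Monte Carlo Shapley estimate from random feature orderings.

    With antithetic=True, every sampled permutation is evaluated together
    with its reverse.
    """
    phi = np.zeros(d)
    count = 0
    for _ in range(n_perms):
        perm = rng.permutation(d)
        orders = [perm, perm[::-1]] if antithetic else [perm]
        for order in orders:
            prev = value(())
            S = []
            for j in order:
                S.append(int(j))
                cur = value(tuple(S))
                phi[j] += cur - prev     # marginal lift of j given its predecessors
                prev = cur
            count += 1
    return phi / count

# Toy game with pairwise interactions only (assumed for illustration).
w = np.array([1.0, 2.0, 3.0, 4.0])
value = lambda S: float(np.sum(w[list(S)]) ** 2)
exact = w * w.sum()

rng = np.random.default_rng(0)
plain = permutation_shapley(value, 4, n_perms=200, rng=rng, antithetic=False)
paired = permutation_shapley(value, 4, n_perms=1, rng=rng, antithetic=True)
print("exact :", exact)
print("plain :", np.round(plain, 3))
print("paired:", np.round(paired, 3))
```

Every ordering telescopes to $v(\{1,\ldots,d\}) - v(\emptyset)$, so both estimators satisfy efficiency exactly. For this particular game, which has no interactions beyond pairs, a permutation and its reverse average to the exact Shapley values, so a single antithetic pair already recovers them; for general value functions pairing reduces variance but does not eliminate it.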

4. Conditional and Local Shapley Value Regression

Conditional Shapley values explain predictions at the instance level by solving for the “local” attributions such that

$f(x^*) = \phi_0 + \sum_{j=1}^d \phi_j(x^*).$

Estimation requires the conditional expectation $v(S;x^*) = \mathbb{E}[f(X) \mid X_S = x_S^*]$ for every coalition $S$, which is computationally expensive. Regression-based approaches replace conditional sampling with a supervised regression of $f(X)$ on $X_S$, whose fitted value at $x_S^*$ estimates $v(S;x^*)$, proceeding via

Separate-model regression: Fitting one regression per coalition (Olsen et al., 2023).
Surrogate-model regression: A single model with masked inputs encoding coalition membership (Olsen et al., 2023).

Recent work provides efficient linear-algebraic approaches that simultaneously approximate all $2^d$ conditional submodels for linear (or polynomial/spline) regressors using a block-diagonal “soft constraint” precision matrix and sparse Cholesky factorization. This is reported to reduce computation from hours to minutes with negligible accuracy loss compared to full submodel enumeration (Aanes, 25 Apr 2025).
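Separate-model regression, the first approach listed above, can be sketched as follows. This is an illustration under assumptions rather than the cited authors' procedure: the model `f` is a hypothetical linear predictor, the features are correlated Gaussians (so the conditional expectation is linear and OLS is an adequate regressor for each coalition), and all names are made up for the example.

```python
import itertools
import math
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 5000
cov = np.array([[1.0, 0.6, 0.0],
                [0.6, 1.0, 0.3],
                [0.0, 0.3, 1.0]])
X = rng.multivariate_normal(np.zeros(d), cov, size=n)
beta = np.array([1.0, -2.0, 0.5])
f = lambda A: A @ beta + 1.0            # hypothetical model to explain
fX = f(X)
x_star = np.array([1.0, 0.5, -1.0])     # instance to explain

def v(S):
    """Estimate E[f(X) | X_S = x_S*] by regressing f(X) on X_S (one model per coalition)."""
    S = list(S)
    Z = np.column_stack([np.ones(n), X[:, S]])
    coef, *_ = np.linalg.lstsq(Z, fX, rcond=None)
    return coef[0] + x_star[S] @ coef[1:]

def exact_shapley(value, d):
    """Exact Shapley values by subset enumeration (small d only)."""
    phi = np.zeros(d)
    for j in range(d):
        others = [k for k in range(d) if k != j]
        for size in range(d):
            w = math.factorial(size) * math.factorial(d - size - 1) / math.factorial(d)
            for S in itertools.combinations(others, size):
                phi[j] += w * (value(S + (j,)) - value(S))
    return phi

phi0 = v(())                             # baseline: average prediction
phi = exact_shapley(v, d)
print("phi_0:", round(phi0, 4), "phi:", np.round(phi, 4))
print("reconstructed:", round(phi0 + phi.sum(), 4), "f(x*):", round(f(x_star), 4))
```

The local accuracy identity $f(x^*) = \phi_0 + \sum_j \phi_j(x^*)$ holds here because the regression on all features reproduces the linear model exactly. For a nonlinear $f$ or non-Gaussian features, a more flexible regressor would be needed for each coalition.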

5. Learning Shapley Prediction Rules

Prediction via Shapley Value Regression (ViaSHAP) is a paradigm in which a model is trained to output its own feature-wise Shapley values. The prediction is the sum of these per-feature contributions, which guarantees the local accuracy property by construction. The approach uses universal approximators (MLPs or Kolmogorov-Arnold Networks) and is trained with a two-term loss that combines standard prediction error with a Shapley value consistency term (an efficiency constraint on masked coalitions). ViaSHAP is reported to achieve state-of-the-art predictive performance and explanation fidelity while remaining efficient at inference time: a single forward evaluation produces both the prediction and the full Shapley attribution (Alkhatib et al., 7 May 2025).
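The structure of such a two-term loss can be sketched as below. This is only loosely modeled on the description above and is not the authors' architecture or exact objective: the function `two_term_loss`, the uniform sampling of coalitions, the zero-masking of absent features, and the weight `beta` are all assumptions made for illustration.

```python
import numpy as np

def two_term_loss(contrib, x, y, rng, n_coalitions=8, beta=1.0):
    """Prediction error plus a coalition consistency penalty for one example.

    `contrib(x, mask)` is any function returning d per-feature contributions
    for the input with absent features (mask == 0) zeroed out.
    """
    d = x.shape[0]
    phi = contrib(x, np.ones(d))                  # contributions on the full input
    pred_loss = (phi.sum() - y) ** 2              # the prediction is the sum of contributions
    base = contrib(x, np.zeros(d)).sum()          # output with every feature masked
    cons = 0.0
    for _ in range(n_coalitions):
        mask = rng.integers(0, 2, size=d).astype(float)
        masked_pred = contrib(x, mask).sum()
        # The contributions of the present features should explain the masked prediction.
        cons += (phi @ mask - (masked_pred - base)) ** 2
    return pred_loss + beta * cons / n_coalitions

rng = np.random.default_rng(0)
x = np.array([1.0, 2.0, -1.0])

# An additive model is perfectly consistent: its contributions are w_j * x_j.
w = np.array([0.5, -1.0, 2.0])
additive = lambda x, mask: w * x * mask
y = float(w @ x)
loss_additive = two_term_loss(additive, x, y, rng)

# A small random MLP head (hypothetical) generally is not, before training.
W1 = rng.normal(size=(3, 8))
W2 = rng.normal(size=(8, 3))
mlp = lambda x, mask: np.tanh((x * mask) @ W1 + 0.1) @ W2
loss_mlp = two_term_loss(mlp, x, y, rng)
print(loss_additive, loss_mlp)
```

Minimizing the second term pushes the per-feature outputs toward attributions that remain valid under masking, while the first term keeps their sum predictive. In practice the loss would be averaged over a training set and minimized with a gradient-based optimizer.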

6. Extensions: Bayesian, Nonparametric, and Robust Treatment

Bayesian models such as Bayesian Additive Regression Trees (BART) allow closed-form calculation of Shapley effects for high-dimensional, potentially nonlinear regression functions. Posterior consistency can be established under standard sparsity and design assumptions, and scalable algorithms exploit the piecewise-constant conditional means on tree leaves to compute Shapley estimates in problems with $d \sim 500$ features (Horiguchi et al., 2023).

Robustness in the presence of concept drift and multicollinearity is addressed by:

Local Shapley-based feature selection: Instances are partitioned by error type (over-predicted, under-predicted, or correctly predicted), and features whose local contributions systematically degrade under concept shift are identified and eliminated (Sebastián et al., 2023).
Covariance-adjusted (whitened) Shapley values: Multicollinearity correction is implemented by linearly decorrelating absent features with respect to those in the coalition under consideration, replacing marginal draws with adjusted ones to ensure unbiasedness with respect to feature dependencies (Basu et al., 2020).
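The covariance adjustment in the second item can be illustrated with a linear (Gaussian-style) correction of marginal draws. This is a sketch under assumptions and not necessarily the exact procedure of the cited paper: the function `adjusted_draws`, the use of the empirical covariance, and the hypothetical model `f` are introduced only for the example.

```python
import numpy as np

def adjusted_draws(Z, x_star, S):
    """Replace marginal draws of absent features with draws linearly adjusted for X_S = x_S*."""
    d = Z.shape[1]
    S = list(S)
    A = [j for j in range(d) if j not in S]       # absent features
    out = np.tile(x_star, (Z.shape[0], 1))        # present features are fixed at x_S*
    if not S:
        out[:, A] = Z[:, A]                       # nothing to condition on
        return out
    cov = np.cov(Z, rowvar=False)
    # B = Cov(A, S) Cov(S, S)^{-1}: regression of absent features on present ones.
    B = np.linalg.solve(cov[np.ix_(S, S)], cov[np.ix_(S, A)]).T
    out[:, A] = Z[:, A] + (x_star[S] - Z[:, S]) @ B.T
    return out

# Background sample (assumed): features 0 and 1 have correlation 0.8, feature 2 is independent.
rng = np.random.default_rng(0)
cov_true = np.array([[1.0, 0.8, 0.0],
                     [0.8, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])
Z = rng.multivariate_normal(np.zeros(3), cov_true, size=20000)
x_star = np.array([2.0, 0.0, 0.0])

adj = adjusted_draws(Z, x_star, S=[0])
f = lambda A: A @ np.ones(3)                      # hypothetical model
v_marginal = f(np.column_stack([np.full(len(Z), 2.0), Z[:, 1:]])).mean()
v_adjusted = f(adj).mean()
print("marginal v(S):", round(v_marginal, 3), "adjusted v(S):", round(v_adjusted, 3))
```

With purely marginal draws, feature 1 is averaged around 0 even though observing $x_0 = 2$ implies a conditional mean near 1.6, so the marginal value function ignores the dependence. The adjusted draws shift feature 1 accordingly, which is the effect the correction is meant to capture.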

7. Practical Applications and Interpretive Tables

In applied contexts such as marketing mix modeling, Shapley value regression decomposes the aggregate $R^2$ among correlated predictors (e.g., channel partners) into interpretable, non-negative shares, which mitigates the attribution problems caused by multicollinearity and yields business-relevant attribution tables. Regression coefficients adjusted to be proportional to the Shapley shares reproduce the total explained signal while remaining interpretable, and are reported to give clearer attributions than alternative post-hoc coefficient adjustment schemes (Tang et al., 2024).

Application	Value Function	Computational Approach
OLS / Ridge Regression	$R^2(S)$	LS-SPA, QR linear algebra
GLM (Poisson, Binomial, etc.)	KL-based $R^2_{KL}$	Submodel enumeration
Feature Group Attribution	$v(S)$	WLS, group Shapley table
Instance Prediction	$f(x)$	Surrogate regression, ViaSHAP
Large $d$, nonlinear	Shapley effects	BART, post-hoc tree-exact

8. Theoretical Properties, Consistency, and Convergence

Recent developments provide a unified theoretical framework for the entire regression-based Shapley value estimation landscape (Chen et al., 5 Jun 2025, Musco et al., 2024). Key results include:

Equivalence between the combinatorial Shapley definition and a sum-constrained weighted least squares regression.
Non-asymptotic sample complexity and error bounds for leverage-score–based designs and row sampling.
Optimal variance reduction via paired (antithetic) sampling, as explained by an odd/even decomposition of the set function: the Shapley value depends only on the odd part, and paired sampling orthogonalizes away the even part (Fumagalli et al., 1 Feb 2026).
Modularity to generalize from classical Shapley values to arbitrary probabilistic values (e.g., beta-weighted, Banzhaf) by altering regression weights and constraints (Witter et al., 13 Jun 2025).
Diagnostic tools for uncertainty quantification and convergence-based stopping, derived from an explicit central limit theorem (CLT) for the regression estimator (Covert et al., 2020).
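The first result in the list above, the equivalence between the combinatorial definition and constrained weighted least squares, can be checked numerically for small $d$. The sketch below assumes the standard Shapley kernel weight $(d-1)/\bigl(\binom{d}{|S|}\,|S|\,(d-|S|)\bigr)$ for coalitions with $0 < |S| < d$ and solves the constrained problem through its KKT system; the random set function and the helper names are illustrative assumptions.

```python
import itertools
import math
import numpy as np

def exact_shapley(value, d):
    """Exact Shapley values from the combinatorial definition."""
    phi = np.zeros(d)
    for j in range(d):
        others = [k for k in range(d) if k != j]
        for size in range(d):
            w = math.factorial(size) * math.factorial(d - size - 1) / math.factorial(d)
            for S in itertools.combinations(others, size):
                phi[j] += w * (value(S + (j,)) - value(S))
    return phi

def wls_shapley(value, d):
    """Shapley values as the solution of a sum-constrained weighted least squares problem."""
    v0, v_full = value(()), value(tuple(range(d)))
    rows, targets, weights = [], [], []
    for size in range(1, d):
        w = (d - 1) / (math.comb(d, size) * size * (d - size))   # Shapley kernel weight
        for S in itertools.combinations(range(d), size):
            row = np.zeros(d)
            row[list(S)] = 1.0
            rows.append(row)
            targets.append(value(S) - v0)
            weights.append(w)
    A, b, W = np.array(rows), np.array(targets), np.array(weights)
    # KKT system for: min (A phi - b)^T W (A phi - b)  s.t.  sum(phi) = v_full - v0
    AtW = A.T * W
    K = np.zeros((d + 1, d + 1))
    K[:d, :d] = 2.0 * AtW @ A
    K[:d, d] = 1.0
    K[d, :d] = 1.0
    rhs = np.append(2.0 * AtW @ b, v_full - v0)
    return np.linalg.solve(K, rhs)[:d]

# Arbitrary random set function on d = 4 players (assumed for illustration).
d = 4
rng = np.random.default_rng(0)
table = {frozenset(S): rng.normal()
         for size in range(d + 1)
         for S in itertools.combinations(range(d), size)}
value = lambda S: table[frozenset(S)]

phi_exact = exact_shapley(value, d)
phi_wls = wls_shapley(value, d)
print(np.round(phi_exact, 6))
print(np.round(phi_wls, 6))
```

Sampling-based estimators in this family replace the full enumeration of rows with a random subset of coalitions, which is where the sample complexity and variance results listed above come into play.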

References

"Efficient Shapley Performance Attribution for Least-Squares Regression" (Bell et al., 2023)
"Prediction via Shapley Value Regression" (Alkhatib et al., 7 May 2025)
"Variable Importance in Generalized Linear Models -- A Unifying View Using Shapley Values" (Acemoglu et al., 2 Jan 2026)
"Group Shapley Value and Counterfactual Simulations in a Structural Model" (Kwon et al., 2024)
"Fast approximative estimation of conditional Shapley values when using a linear regression model or a polynomial regression model" (Aanes, 25 Apr 2025)
"A Comparative Study of Methods for Estimating Conditional Shapley Values and When to Use Them" (Olsen et al., 2023)
"Regression-adjusted Monte Carlo Estimators for Shapley Values and Probabilistic Values" (Witter et al., 13 Jun 2025)
"A Unified Framework for Provably Efficient Algorithms to Estimate Shapley Values" (Chen et al., 5 Jun 2025)
"An Odd Estimator for Shapley Values" (Fumagalli et al., 1 Feb 2026)
"Provably Accurate Shapley Value Estimation via Leverage Score Sampling" (Musco et al., 2024)
"Efficient computation and analysis of distributional Shapley values" (Kwon et al., 2020)
"Establishing Shapley Effects in Big-Data Emulation and Regression Settings using Bayesian Additive Regression Trees" (Horiguchi et al., 2023)
"Multicollinearity Correction and Combined Feature Effect in Shapley Values" (Basu et al., 2020)
"Quantifying Marketing Performance at Channel-Partner Level by Using Marketing Mix Modeling (MMM) and Shapley Value Regression" (Tang et al., 2024)
"From interpretability to inference: an estimation framework for universal approximators" (Joseph, 2019)