Shapley-based Approximate Attributions

Updated 20 May 2026

Shapley-based approximate attributions are methods that efficiently estimate individual feature contributions by approximating the Shapley value, thus overcoming exponential computation challenges.
They leverage techniques like permutation sampling, regression surrogates (e.g., Kernel SHAP), and structure exploitation to reduce computational costs while maintaining theoretical fairness guarantees.
These methods are practically applied in high-dimensional models and instance-level attributions, providing actionable insights for model explanation and fairness evaluation.

Shapley-based approximate attributions are a class of methods that estimate individual feature, instance, or source contributions to the output of a model according to the unique allocation defined by the Shapley value, but at a computational cost significantly reduced from exact Shapley value computation. These approximations are crucial in high-dimensional or expensive-to-evaluate settings, where the combinatorial explosion of the Shapley value formula (requiring $2^n$ model evaluations for $n$ features) renders direct computation infeasible. Approximate methods leverage sampling, regression, structure exploitation, and algorithmic innovations to achieve accurate, principled, and statistically controlled attributions in practice.

1. Formal Basis of Shapley Attributions and the Computational Challenge

For a set of $n$ features $F = \{1,\dots,n\}$, the Shapley value for feature $i$ given a value function $v:2^F\to\mathbb{R}$ is

$\phi_i = \sum_{S\subseteq F\setminus\{i\}} \frac{|S|! \,(n-|S|-1)!}{n!} [v(S\cup\{i\})-v(S)]$

This formulation uniquely satisfies the axioms of efficiency, symmetry, dummy (null feature), and linearity. The exact computation of $\phi_i$ is exponential in $n$, both in the number of function evaluations and in arithmetic operations, making it intractable for all but very small $n$ (Moehle et al., 2021, Musco et al., 2024).
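As a minimal sketch of why exact computation scales poorly, the formula above can be evaluated by brute-force enumeration. The function name `exact_shapley` and the toy game `toy_v` are hypothetical illustrations, and features are indexed from 0 rather than 1.

```python
from itertools import combinations
from math import factorial

def exact_shapley(v, n):
    """Exact Shapley values by enumerating every subset S that excludes i."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for subsets of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for S in combinations(others, size):
                S = frozenset(S)
                phi[i] += weight * (v(S | {i}) - v(S))
    return phi

# Hypothetical toy game: additive terms plus one pairwise interaction (0, 1).
def toy_v(S):
    base = {0: 1.0, 1: 2.0, 2: 3.0}
    return sum(base[j] for j in S) + (4.0 if {0, 1} <= S else 0.0)

phi = exact_shapley(toy_v, 3)
# Efficiency axiom: the attributions sum to v(F) - v(empty set).
total = sum(phi)
```

The inner loops visit $2^{n-1}$ subsets per feature, which is what makes this approach usable only as a reference for small $n$.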

Approximation is therefore necessary. Approximations can be grouped into (i) randomized sampling, either over permutations or over subsets, (ii) regression-based surrogates (notably Kernel SHAP and related weighted least-squares approaches), and (iii) structure-exploiting algorithms for special classes of $v$ (e.g., linear models, tree ensembles).

2. Permutation and Coalition Sampling Schemes

One widespread family of Shapley approximators uses permutation or coalition sampling. Permutation-based sampling ("Monte Carlo over permutations") draws $m$ random orderings of $F$ and, for each, computes the marginal contribution of each feature at its point of entry. Averaging these yields an unbiased estimator of the true $\phi_i$. The total complexity is $O(mn)$ model evaluations (Moehle et al., 2021, Mayer et al., 18 Aug 2025).
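The permutation scheme can be sketched as follows. This is an illustrative implementation under simple assumptions, not the code of the cited works; `permutation_shapley` and `toy_v` are hypothetical names, and the value function is assumed to accept a `frozenset` of 0-based feature indices.

```python
import random

def permutation_shapley(v, n, m, seed=0):
    """Monte Carlo Shapley estimate from m random feature orderings."""
    rng = random.Random(seed)
    phi = [0.0] * n
    order = list(range(n))
    for _ in range(m):
        rng.shuffle(order)
        S = frozenset()
        prev = v(S)
        for i in order:
            S = S | {i}
            cur = v(S)
            phi[i] += cur - prev  # marginal contribution of i at its entry point
            prev = cur
    return [p / m for p in phi]

# Hypothetical toy game: additive terms plus one pairwise interaction (0, 1).
def toy_v(S):
    base = {0: 1.0, 1: 2.0, 2: 3.0}
    return sum(base[j] for j in S) + (4.0 if {0, 1} <= S else 0.0)

estimate = permutation_shapley(toy_v, 3, m=2000)
```

Each ordering costs $n + 1$ evaluations of the value function, and within every ordering the marginal contributions telescope to $v(F) - v(\emptyset)$, so the estimate satisfies efficiency for any number of samples.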

Alternatively, coalition (lift) sampling draws random subsets $S$ with feature $i$ absent, according to the Shapley weighting, and estimates $\phi_i$ by Monte Carlo averaging the marginal contributions $v(S\cup\{i\}) - v(S)$; again, the estimator is unbiased and satisfies the full attribution property per sample (Moehle et al., 2021).
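One way to draw subsets according to the Shapley weighting is to pick the subset size uniformly from $0,\dots,n-1$ and then pick a uniformly random subset of that size among the remaining features, since $\frac{|S|!\,(n-|S|-1)!}{n!} = \frac{1}{n}\binom{n-1}{|S|}^{-1}$. The sketch below estimates a single $\phi_i$ this way; `coalition_shapley` and `toy_v` are hypothetical names, and this simple per-feature sampler is an assumption for illustration rather than the exact procedure of the cited work.

```python
import random

def coalition_shapley(v, n, i, m, seed=0):
    """Estimate phi_i by sampling subsets that exclude i with Shapley weights."""
    rng = random.Random(seed)
    others = [j for j in range(n) if j != i]
    total = 0.0
    for _ in range(m):
        size = rng.randrange(n)                  # uniform over sizes 0..n-1
        S = frozenset(rng.sample(others, size))  # uniform subset of that size
        total += v(S | {i}) - v(S)               # marginal contribution of i
    return total / m

# Hypothetical toy game: additive terms plus one pairwise interaction (0, 1).
def toy_v(S):
    base = {0: 1.0, 1: 2.0, 2: 3.0}
    return sum(base[j] for j in S) + (4.0 if {0, 1} <= S else 0.0)

phi_0 = coalition_shapley(toy_v, 3, i=0, m=4000)
phi_2 = coalition_shapley(toy_v, 3, i=2, m=4000)
```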

Paired sampling, which draws subsets in complementary $(S, F\setminus S)$ pairs, can halve asymptotic variance and, in purely additive or second-order interaction settings, recover the exact Shapley values in a single sample (Mayer et al., 18 Aug 2025, Fumagalli et al., 1 Feb 2026). Stratified sampling across subset sizes and importance weighting further improves estimation accuracy by balancing sampling noise across all subset cardinalities.
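The effect of pairing can be illustrated with a closely related permutation analogue: each random ordering is evaluated together with its reverse. This is a swapped-in technique chosen for brevity, not the subset-pair scheme of the cited works. For a game with at most pairwise interactions, the two marginal contributions of a feature average to its exact Shapley value, so a single pair suffices. `antithetic_permutation_shapley` and `toy_v` are hypothetical names.

```python
import random

def antithetic_permutation_shapley(v, n, pairs, seed=0):
    """Permutation sampling where each ordering is paired with its reverse."""
    rng = random.Random(seed)
    phi = [0.0] * n
    order = list(range(n))
    for _ in range(pairs):
        rng.shuffle(order)
        for perm in (order, order[::-1]):  # the ordering and its mirror image
            S = frozenset()
            prev = v(S)
            for i in perm:
                S = S | {i}
                cur = v(S)
                phi[i] += cur - prev
                prev = cur
    return [p / (2 * pairs) for p in phi]

# Hypothetical toy game with only additive and pairwise terms.
def toy_v(S):
    base = {0: 1.0, 1: 2.0, 2: 3.0}
    return sum(base[j] for j in S) + (4.0 if {0, 1} <= S else 0.0)

one_pair = antithetic_permutation_shapley(toy_v, 3, pairs=1)
```

With higher-order interactions the pair no longer cancels the sampling error exactly, and the benefit is a variance reduction rather than exactness.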

Statistical efficiency is governed by Hoeffding, Bernstein, or Chebyshev bounds: for marginal contribution range $R$, to achieve $|\hat\phi_i - \phi_i| \le \epsilon$ with probability $1-\delta$, the required number of samples $m$ is $O(R^2 \log(1/\delta)/\epsilon^2)$ (Moehle et al., 2021, Zhou et al., 2022).
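For the Hoeffding case, the bound can be turned into a concrete sample count. Assuming i.i.d. marginal contributions confined to an interval of width $R$, the two-sided Hoeffding inequality gives $m \ge R^2 \ln(2/\delta) / (2\epsilon^2)$ samples for error at most $\epsilon$ with probability at least $1-\delta$. The helper name `hoeffding_samples` is hypothetical.

```python
from math import ceil, log

def hoeffding_samples(value_range, eps, delta):
    """Samples needed so that |estimate - truth| <= eps with prob. >= 1 - delta.

    Assumes i.i.d. marginal contributions in an interval of width value_range.
    """
    return ceil(value_range ** 2 * log(2.0 / delta) / (2.0 * eps ** 2))

m_single = hoeffding_samples(value_range=1.0, eps=0.05, delta=0.05)
# Union bound over n = 20 features: split the failure probability evenly.
m_all = hoeffding_samples(value_range=1.0, eps=0.05, delta=0.05 / 20)
```

The count grows quadratically as $\epsilon$ shrinks but only logarithmically in $1/\delta$, which is why tightening the confidence level is much cheaper than tightening the error tolerance.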

3. Regression-based and Leverage Score Methods: Kernel SHAP and Leverage SHAP

Regression-based methods such as Kernel SHAP recast the attribution task as a weighted least-squares problem, fitting a linear model $g(z) = \phi_0 + \sum_{i=1}^{n} \phi_i z_i$ over random subsets $S \subseteq F$ encoded as binary vectors $z \in \{0,1\}^n$ (Musco et al., 2024, Merrick et al., 2019, Mayer et al., 18 Aug 2025). The Shapley kernel assigns weights $\pi(S)$ proportional to $\frac{n-1}{\binom{n}{|S|}\,|S|\,(n-|S|)}$. This approach supports an efficient least-squares solution at $O(mn^2 + n^3)$ arithmetic cost for $m$ subsampled coalitions. Kernel SHAP is highly effective, but until recently lacked finite-sample guarantees.
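The regression view can be checked on a small game. The sketch below enumerates every proper non-empty coalition instead of subsampling, weights each by the Shapley kernel $\frac{n-1}{\binom{n}{|S|}|S|(n-|S|)}$, and solves the least-squares problem under the efficiency constraint $\sum_i \phi_i = v(F) - v(\emptyset)$ through its KKT system. In this fully enumerated setting the solution coincides with the exact Shapley values. All function names are hypothetical, and this is not the API of any SHAP library.

```python
from itertools import combinations
from math import comb

def solve(M, rhs):
    """Gauss-Jordan elimination with partial pivoting for a small dense system."""
    k = len(rhs)
    aug = [row[:] + [r] for row, r in zip(M, rhs)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(k):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    return [aug[r][k] / aug[r][r] for r in range(k)]

def kernel_shap_full(v, n):
    """Constrained weighted least squares over all proper non-empty coalitions."""
    empty, full = v(frozenset()), v(frozenset(range(n)))
    A = [[0.0] * n for _ in range(n)]  # Z^T W Z for indicator rows z
    b = [0.0] * n                      # Z^T W y with y = v(S) - v(empty)
    for size in range(1, n):
        w = (n - 1) / (comb(n, size) * size * (n - size))  # Shapley kernel
        for S in combinations(range(n), size):
            y = v(frozenset(S)) - empty
            for i in S:
                b[i] += w * y
                for j in S:
                    A[i][j] += w
    # KKT system: A phi + lam * 1 = b and 1^T phi = v(F) - v(empty)
    kkt = [A[i] + [1.0] for i in range(n)] + [[1.0] * n + [0.0]]
    return solve(kkt, b + [full - empty])[:n]

# Hypothetical toy game: additive terms plus one pairwise interaction (0, 1).
def toy_v(S):
    base = {0: 1.0, 1: 2.0, 2: 3.0}
    return sum(base[j] for j in S) + (4.0 if {0, 1} <= S else 0.0)

phi = kernel_shap_full(toy_v, 3)
```

In practice the coalitions are subsampled, and the quality of the estimate then depends on how the samples are drawn.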

Leverage SHAP improves Kernel SHAP by showing that the optimal sampling distribution is proportional to the leverage scores of the regression design matrix rows, resulting in $O(n\log n + n/\epsilon)$ sample complexity to achieve $\ell_2$-norm error at most $\epsilon$ with high probability (assuming a fit parameter $\gamma$ close to zero) (Musco et al., 2024). In this approach, all subset sizes are sampled proportionally, and paired sampling is again used for additional variance reduction.

Table: Sampling complexity of major regression-based SHAP approximators

Algorithm	Model Calls Needed	Error Guarantee (for target accuracy $\epsilon$)
Kernel SHAP	$m$ (user-chosen budget)	No non-asymptotic bound
Leverage SHAP	$O(n\log n + n/\epsilon)$	$\ell_2$ loss $\le \epsilon$

Empirically, Leverage SHAP consistently achieves lower error per sample compared to Kernel SHAP on standard datasets (Musco et al., 2024).

4. Fairness, Confidence Intervals, and Robustness

Approximate Shapley attributions can fail to inherit the full fairness guarantees of the exact value. Probably-approximate fairness formalizes this: for a random estimate $\hat\phi$, null-feature, symmetry, and desirability axioms may only hold up to additive and multiplicative error with high probability $1-\delta$ (Zhou et al., 2022). A per-feature fidelity score controls the risk that any fairness axiom is violated.

Confidence intervals can be constructed by bootstrapping or by the central limit theorem, treating estimates across independent sampling runs as i.i.d. (Merrick et al., 2019). Greedy active estimation (GAE) maximizes the minimum fidelity score across features, allocating the query budget to the hardest-to-estimate Shapley values (Zhou et al., 2022).
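A central-limit-theorem interval across independent runs might be computed as in the sketch below. It assumes a simple permutation estimator as the underlying sampler and a normal approximation with $z = 1.96$ for a nominal 95% interval; `permutation_shapley`, `clt_interval`, and `toy_v` are hypothetical names.

```python
import random
from statistics import mean, stdev

def permutation_shapley(v, n, m, seed):
    """Monte Carlo Shapley estimate from m random feature orderings."""
    rng = random.Random(seed)
    phi = [0.0] * n
    order = list(range(n))
    for _ in range(m):
        rng.shuffle(order)
        S = frozenset()
        prev = v(S)
        for i in order:
            S = S | {i}
            cur = v(S)
            phi[i] += cur - prev
            prev = cur
    return [p / m for p in phi]

def clt_interval(estimates, z=1.96):
    """Normal-approximation interval for the mean of i.i.d. run estimates."""
    center = mean(estimates)
    half_width = z * stdev(estimates) / len(estimates) ** 0.5
    return center - half_width, center + half_width

# Hypothetical toy game: additive terms plus one pairwise interaction (0, 1).
def toy_v(S):
    base = {0: 1.0, 1: 2.0, 2: 3.0}
    return sum(base[j] for j in S) + (4.0 if {0, 1} <= S else 0.0)

# 20 independent runs, each with its own seed.
runs = [permutation_shapley(toy_v, 3, m=200, seed=s) for s in range(20)]
intervals = [clt_interval([run[i] for run in runs]) for i in range(3)]
```

Feature 2 has the same marginal contribution in every ordering of this toy game, so its interval collapses to a point, while the interacting features 0 and 1 get intervals of positive width.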

In the context of instance-level attribution, the Shapley value is more sign-robust than leave-one-out methods, i.e., the sign of a contribution is more likely to stay consistent under dataset resampling (Wang et al., 2024).

5. Advanced and Domain-Specific Approximations

Several domain-adapted approximators have been devised:

DeepSHAP employs model-specific local propagation rules (e.g., DeepLIFT for neural nets, TreeSHAP for tree ensembles) composed layerwise to yield a fast, full attribution with per-baseline efficiency (Chen et al., 2021, Chen et al., 2019).
ViaSHAP amortizes Shapley-value computation by regressing, during model training, from inputs to a prediction and associated Shapley attributions, so that inference time is trivial (Alkhatib et al., 7 May 2025).
EmSHAP uses energy-based generative models and a GRU proposal network to estimate all relevant conditional distributions required for observational Shapley values, with a theoretical convergence-rate guarantee per sample and empirical performance exceeding VAE and Kernel SHAP surrogates (Lu et al., 2024).
OddSHAP isolates the Fourier-odd component of the set function, showing that only the odd subspace affects Shapley values and leveraging proxy-models and odd-Fourier regression for consistent, variance-reduced estimation (Fumagalli et al., 1 Feb 2026).

In model classes with special structure—linear models, tree ensembles—exact Shapley values can be computed in polynomial or linear time via closed-form or dynamic programming (Bell et al., 2023, Muschalik et al., 2024).
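The linear case gives the simplest closed form. Assuming a linear model $f(x) = b + \sum_i w_i x_i$ and a value function that replaces absent features by a baseline $\mu_i$ (an interventional or feature-independence assumption not spelled out above), the Shapley value is $\phi_i = w_i (x_i - \mu_i)$. The sketch below compares this with brute-force enumeration; all names and numbers are hypothetical.

```python
from itertools import combinations
from math import factorial

def linear_shapley(w, x, mu):
    """Closed form for a linear model with baseline-replacement value function."""
    return [wi * (xi - mi) for wi, xi, mi in zip(w, x, mu)]

def exact_shapley(v, n):
    """Reference implementation by subset enumeration (exponential in n)."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for S in combinations(others, size):
                S = frozenset(S)
                phi[i] += weight * (v(S | {i}) - v(S))
    return phi

# Hypothetical weights, input point, and baseline (e.g., feature means).
w, x, mu = [2.0, -1.0, 0.5], [1.0, 3.0, 4.0], [0.0, 1.0, 2.0]

def v(S):
    # Present features take their observed value, absent ones the baseline.
    return sum(w[j] * (x[j] if j in S else mu[j]) for j in range(len(w)))

closed = linear_shapley(w, x, mu)
brute = exact_shapley(v, len(w))
```

The closed form costs $O(n)$ operations, whereas the enumeration used for the comparison is exponential in $n$.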

6. Extensions: Interactions, Group Attribution, and New Objectives

Shapley-based attributions extend naturally to higher-order group (interaction) indices. Faith-Shap, for example, formulates the unique polynomial approximation satisfying multilinear efficiency and interaction-symmetry axioms (Tsai et al., 2022), and the shapiq library provides efficient algorithms for pairwise and $k$-wise interactions, combining stratified Monte Carlo and regression tricks (Muschalik et al., 2024).
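To make the notion of a pairwise interaction index concrete, the sketch below computes the classical Shapley interaction index $I_{ij} = \sum_{S \subseteq F\setminus\{i,j\}} \frac{|S|!\,(n-|S|-2)!}{(n-1)!}\,[v(S\cup\{i,j\}) - v(S\cup\{i\}) - v(S\cup\{j\}) + v(S)]$ by enumeration. This is a different index from Faith-Shap and does not use the shapiq API; it is chosen only because it is the simplest exact definition. `shapley_interaction` and `toy_v` are hypothetical names.

```python
from itertools import combinations
from math import factorial

def shapley_interaction(v, n, i, j):
    """Classical pairwise Shapley interaction index by subset enumeration."""
    others = [k for k in range(n) if k not in (i, j)]
    total = 0.0
    for size in range(n - 1):  # subset sizes 0..n-2
        weight = factorial(size) * factorial(n - size - 2) / factorial(n - 1)
        for S in combinations(others, size):
            S = frozenset(S)
            # Discrete second difference: joint effect minus individual effects.
            delta = v(S | {i, j}) - v(S | {i}) - v(S | {j}) + v(S)
            total += weight * delta
    return total

# Hypothetical toy game: additive terms plus one pairwise interaction (0, 1).
def toy_v(S):
    base = {0: 1.0, 1: 2.0, 2: 3.0}
    return sum(base[j] for j in S) + (4.0 if {0, 1} <= S else 0.0)

i_01 = shapley_interaction(toy_v, 3, 0, 1)
i_02 = shapley_interaction(toy_v, 3, 0, 2)
```

The index recovers the interaction strength built into the toy game for the pair (0, 1) and is zero for the non-interacting pair (0, 2).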

WeightedSHAP generalizes Shapley by replacing the uniform averaging over subset sizes with a learnable weighting, optimizing for downstream criteria such as rapid model recapitulation, and can give more faithful or less noisy attributions when contributions shift significantly with subset size (Kwon et al., 2022).
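The weighted form can be sketched by first averaging the marginal contribution of a feature within each subset size and then combining the per-size averages with a weight vector; uniform weights recover the Shapley value. The weights below are fixed by hand, whereas the cited method selects them by optimizing a downstream criterion, which this sketch does not implement. `marginals_by_size`, `weighted_value`, and `toy_v` are hypothetical names.

```python
from itertools import combinations

def marginals_by_size(v, n, i):
    """Average marginal contribution of i over subsets of each size 0..n-1."""
    others = [j for j in range(n) if j != i]
    averages = []
    for size in range(n):
        subsets = [frozenset(S) for S in combinations(others, size)]
        averages.append(sum(v(S | {i}) - v(S) for S in subsets) / len(subsets))
    return averages

def weighted_value(v, n, i, weights):
    """Weighted combination of per-size marginals; weights should sum to 1."""
    return sum(w * d for w, d in zip(weights, marginals_by_size(v, n, i)))

# Hypothetical toy game: additive terms plus one pairwise interaction (0, 1).
def toy_v(S):
    base = {0: 1.0, 1: 2.0, 2: 3.0}
    return sum(base[j] for j in S) + (4.0 if {0, 1} <= S else 0.0)

per_size = marginals_by_size(toy_v, 3, 0)
shapley_0 = weighted_value(toy_v, 3, 0, [1 / 3, 1 / 3, 1 / 3])  # uniform weights
small_sets_0 = weighted_value(toy_v, 3, 0, [1.0, 0.0, 0.0])     # size-0 subsets only
```

In this toy game the marginal contribution of feature 0 grows with subset size, so shifting weight toward small subsets lowers its attribution relative to the Shapley value.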

On-manifold Shapley, using Aumann–Shapley line integrals along Wasserstein-2 geodesics, addresses off-manifold artifacts of heuristic baselines and yields attributions strictly supported on the data manifold, with closed-form stability bounds (Zhang et al., 5 Mar 2026).

7. Empirical Guidance and Best Practices

Best practices depend on model and computational constraints:

Permutation or lift sampling is robust and preferred when $n$ is moderate and function evaluation cost is manageable; employ caching and stratification for efficiency (Moehle et al., 2021).
Kernel SHAP or Leverage SHAP is recommended for high-$n$ black-box problems, especially as Leverage SHAP offers explicit non-asymptotic error bounds and improved sample efficiency (Musco et al., 2024).
Paired sampling is advised whenever interaction order is low or structure is approximately additive or quadratic (Mayer et al., 18 Aug 2025, Fumagalli et al., 1 Feb 2026).
Domain-specific algorithms (TreeSHAP, DeepSHAP, LS-SPA) are preferred in their native domains because of orders-of-magnitude improvements in runtime and accuracy.
Confidence and fairness: always report the empirical variance or fidelity of $\hat\phi_i$ and validate against small-$n$ exact runs when scaling (Zhou et al., 2022).

For source attribution in RAG or explainability in LLMs, kernel-based regression surrogates and windowed local Shapley estimators offer practical trade-offs given the high cost of model calls (Nematov et al., 6 Jul 2025, Naudot et al., 3 Nov 2025).

Overall, the field continues to advance toward sample-optimal, structure-aware, and robust approximations, with rigorous empirical benchmarks and formal error guarantees now guiding method selection and deployment.