Shapley Value-Based Decomposition
- Shapley value-based decomposition methods are approaches that leverage cooperative game theory to fairly attribute system-level quantities among components.
- They reduce computational complexity by decomposing high-dimensional problems into tractable subgames, as demonstrated in multiterminal data compression.
- These methods are widely applied in machine learning, econometrics, and portfolio attribution, where they provide fair, interpretable, and scalable attributions and model explanations.
A Shapley value-based decomposition method is any approach that leverages the axiomatic, coalition-based Shapley value from cooperative game theory to obtain additive, fair, and interpretable attributions of a system-level quantity (e.g., cost, utility, explained variance) among components (players, features, data points, subproblems). Such methods are pervasive across information theory, machine learning interpretability, econometrics, portfolio attribution, and combinatorial optimization. This article provides a rigorous survey of general theory, representative methodologies, key structural and computational insights, and domain-specific applications, with an emphasis on recent advances in decomposition techniques as exemplified by multiterminal data compression and tractable variants in high-dimensional settings.
1. Theoretical Foundation: Shapley Value and Decomposition Principles
The Shapley value is defined for a cooperative game $(N, v)$ specified by a finite ground set of players $N$ and a characteristic function $v : 2^N \to \mathbb{R}$ with $v(\emptyset) = 0$, assigning a value to each coalition. The canonical Shapley value for player $i \in N$ is given by:

$$\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,\big(|N| - |S| - 1\big)!}{|N|!}\,\big(v(S \cup \{i\}) - v(S)\big).$$

This allocation uniquely satisfies the efficiency ($\sum_{i \in N} \phi_i(v) = v(N)$), symmetry, dummy, and additivity axioms.
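For concreteness, a minimal brute-force evaluation of this formula, exponential in $|N|$ and intended only for small games; the three-player gloves game at the end is a hypothetical example:

```python
# Exact Shapley values by enumerating all coalitions (O(2^n) per player).
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    """players: list of hashable ids; v: characteristic function taking a
    frozenset of players, with v(frozenset()) == 0."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for S in combinations(others, k):
                S = frozenset(S)
                total += weight * (v(S | {i}) - v(S))
        phi[i] = total
    return phi

# Hypothetical gloves game: value 1 iff the coalition pairs the left glove
# with at least one right glove. Efficiency: the values sum to v(N) = 1.
v = lambda S: 1.0 if ('L' in S and ({'R1', 'R2'} & S)) else 0.0
print(shapley_values(['L', 'R1', 'R2'], v))  # ≈ {'L': 0.67, 'R1': 0.17, 'R2': 0.17}
```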
Shapley value-based decomposition methods leverage this allocation at the level of the grand coalition (full system) to enforce additive fairness. For sub-additive or decomposable problems, the Shapley formula often admits further structure, allowing the overall decomposition to inherit favorable properties of the underlying system, such as modular independence.
2. Multiterminal Data Compression: Entropy-Game Decomposition
In multiterminal source coding, the Slepian–Wolf region for $n$ sources $Z_1, \dots, Z_n$ is characterized by the polymatroid base polyhedron of the entropy function $h(S) = H(Z_S)$, representing all achievable rate vectors for lossless recovery. This setup yields a convex cooperative game with characteristic function $v(S) = H(Z_S \mid Z_{N \setminus S})$ for coalitions $S \subseteq N$, whose core corresponds to all feasible rate allocations, and whose Shapley value uniquely determines a symmetrically fair achievable rate vector lying in the core (Ding et al., 2018).
A key structural result is the decomposition theorem for entropy games:
- If the sources factorize into mutually independent clusters $N_1, \dots, N_k$, i.e., $H(Z_N) = \sum_{l=1}^{k} H(Z_{N_l})$, then both the core and the Shapley value decompose:

$$\mathrm{core}(v) = \bigoplus_{l=1}^{k} \mathrm{core}(v_l), \qquad \phi_i(v) = \phi_i(v_l) \;\; \text{for } i \in N_l,$$

with each $\phi_i(v_l)$ computed only for the subgame $v_l$ obtained by restricting the game to the cluster $N_l$.
This reduces the exponential $O(2^n)$ cost of a direct Shapley computation to $O\big(\sum_{l} 2^{|N_l|}\big)$, where $n = |N|$, yielding orders-of-magnitude complexity reduction when the largest cluster satisfies $\max_l |N_l| \ll n$. Determining the finest decomposable partition requires only polynomially many entropy queries. Empirical studies on synthetic wireless sensor networks confirm the drastic efficiency gains, with the decomposition method remaining accurate and scalable as problem size grows (Ding et al., 2018).
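A sketch of how the theorem is exploited computationally, reusing `shapley_values` from the snippet in Section 1. The four-source entropies and the two-cluster partition are hypothetical, and for brevity the subgames are valued by the entropy function itself rather than the conditional-entropy game of (Ding et al., 2018):

```python
# Decomposed Shapley computation: one subgame per independent cluster,
# replacing a single 2^n enumeration with several 2^{|N_l|} enumerations.
def decomposed_shapley(clusters, v):
    """clusters: list of player lists forming the finest decomposable
    partition; assumes v(S) = sum_l v(S ∩ N_l) over the clusters."""
    phi = {}
    for cluster in clusters:
        phi.update(shapley_values(cluster, v))  # restrict the game to N_l
    return phi

# Hypothetical entropies with {1,2} independent of {3,4}:
# H(Z_S) = H(Z_{S∩{1,2}}) + H(Z_{S∩{3,4}}).
H12 = {frozenset(): 0.0, frozenset({1}): 1.0, frozenset({2}): 1.0, frozenset({1, 2}): 1.5}
H34 = {frozenset(): 0.0, frozenset({3}): 2.0, frozenset({4}): 2.0, frozenset({3, 4}): 3.0}
v = lambda S: H12[frozenset(S) & frozenset({1, 2})] + H34[frozenset(S) & frozenset({3, 4})]
print(decomposed_shapley([[1, 2], [3, 4]], v))  # rates sum to H(Z_N) = 4.5
```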
3. Shapley-based Decompositions in Machine Learning and Data Systems
Variants and Algorithmic Strategies
Numerous machine learning valuation, attribution, and interpretability tasks employ Shapley decompositions with domain-specific structure, leading to specialized methods:
- Absolute Shapley Value: For data valuation when utility functions allow negative marginal contributions, the traditional Shapley value (ORI) can be contrasted with Zero Shapley (clipping negative marginal contributions to zero) and Absolute Shapley (using $|v(S \cup \{i\}) - v(S)|$). While only ORI satisfies all classical axioms, the Absolute variant, though violating efficiency and additivity, provides a magnitude-centric ranking better suited to identifying highly influential or deleterious points; in experiments on the Iris dataset, Absolute Shapley yielded clearer separation between influential and low-value samples (Liu, 2020). A sketch of these variants appears after this list.
- Independent Utility Decomposition: In large data assemblage settings with additive non-interacting utility, the Shapley value for each owner reduces to a sum over their recordwise Shapley contributions. This structure enables decomposition into independent small subproblems, permitting scalable, exact computation in essentially linear time when records are synthesizable from few owners (Luo et al., 2022).
- DU-Shapley Proxy: When the utility depends only on the dataset size, the $2^{n-1}$-term Shapley sum for each player simplifies to a closed-form discrete-uniform sum over the $n$ possible coalition sizes, dramatically accelerating computation without loss of accuracy (for large $n$) compared to MC-based estimators (Garrido-Lucero et al., 2023).
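The sketch below illustrates two of the ideas above under stated assumptions: a permutation-sampling estimator whose `transform` argument switches among the ORI, Zero, and Absolute marginal-contribution variants, and the discrete-uniform closed form for the simplified case where utility depends only on the number of participants (the DU-Shapley proxy of (Garrido-Lucero et al., 2023) works with aggregate dataset sizes; this is a stripped-down stand-in):

```python
import random

def mc_shapley(players, v, transform=lambda d: d, n_perms=2000, seed=0):
    """Permutation-sampling Shapley estimator. transform maps each marginal
    contribution: identity -> ORI, (lambda d: max(d, 0.0)) -> Zero Shapley,
    abs -> Absolute Shapley (the latter two trade axioms for magnitude-
    centric rankings of influential or deleterious points)."""
    rng = random.Random(seed)
    phi = {p: 0.0 for p in players}
    for _ in range(n_perms):
        perm = players[:]
        rng.shuffle(perm)
        S, prev = frozenset(), 0.0
        for p in perm:
            S = S | {p}
            cur = v(S)
            phi[p] += transform(cur - prev)
            prev = cur
    return {p: total / n_perms for p, total in phi.items()}

def du_shapley_size_only(n, f):
    """If u(S) = f(|S|), each player's Shapley value is the discrete-uniform
    average of marginal gains over the n equally likely coalition sizes."""
    return sum(f(k + 1) - f(k) for k in range(n)) / n
```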
Feature Importance, Attribution, and Variance Decomposition
- Shapley Value for Dependent Inputs: The classical Sobol ANOVA decomposition fails under input dependence; the Shapley-variance effect, defined as
$$\phi_j = \sum_{S \subseteq \{1,\dots,d\} \setminus \{j\}} \frac{|S|!\,(d-|S|-1)!}{d!} \Big( \operatorname{Var}\big(\mathbb{E}[f(x) \mid x_{S \cup \{j\}}]\big) - \operatorname{Var}\big(\mathbb{E}[f(x) \mid x_S]\big) \Big),$$
provides a rigorously justified, non-negative, and total-variance-summing variable-importance metric, extending gracefully to general distributions and complex dependencies (Owen et al., 2016).
- R-Squared (Explained Variance) Decomposition: Shapley-variance decompositions for $R^2$ (in both regression and machine learning settings) allocate the explained variance of a model among predictors/features. For any variance-explained function, the Shapley decomposition is uniquely fair, robust to collinearity, and model-agnostic. Recent advances allow efficient calculation using SHAP frameworks and support confidence intervals via central limit theory under pseudo-elliptical data (Redell, 2019, Fryer et al., 2020).
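A self-contained sketch of a Shapley $R^2$ decomposition under simple assumptions: the value of a feature subset is the $R^2$ of an OLS fit on that subset, the data are synthetic with two correlated predictors, and `shapley_values` from the Section 1 snippet does the brute-force allocation:

```python
# Shapley R^2 decomposition: allocate a model's explained variance among
# correlated predictors; the per-feature values sum to the full-model R^2.
import numpy as np

def subset_r2(X, y, cols):
    """R^2 of an intercept-plus-OLS fit on the feature subset `cols`."""
    if not cols:
        return 0.0
    Xs = np.column_stack([np.ones(len(y)), X[:, sorted(cols)]])
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    resid = y - Xs @ beta
    return 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 3))
X = Z @ np.array([[1.0, 0.8, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])  # x0, x1 correlated
y = X @ np.array([1.0, 1.0, 0.5]) + rng.normal(size=500)

phi = shapley_values([0, 1, 2], lambda S: subset_r2(X, y, S))
print(phi, sum(phi.values()), subset_r2(X, y, {0, 1, 2}))  # sums match (efficiency)
```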
4. Algorithmic Complexity and Approximation
The primary obstruction to the practical use of Shapley decompositions is computational complexity that is exponential in the number $n$ of features/components ($2^n$ subsets or $n!$ permutations). Domain-specific independence or modularity conditions (as in entropy games or independent-utility dataset valuation) enable exact decompositions into many small subgames, rendering the computation feasible even for thousands of components (Ding et al., 2018, Luo et al., 2022).
For intractable general cases, polynomial-time approximation techniques are critical:
- Randomized permutation sampling, with variance bounds or FPRAS (fully polynomial-time randomized approximation scheme) guarantees, is standard in both portfolio attribution and allocation problems (Moehle et al., 2021, Lupia et al., 2017); a confidence-interval sketch follows this list.
- ANOVA-based and functional surrogate models (PDD-SHAP) allow rapid amortized inference for Shapley values by learning low-order function expansions with theoretically controlled error; this is practical for black-box model explanation when high-order interactions are limited (Gevaert et al., 2022).
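As a sketch of the variance-guarantee side (a plain CLT interval, not the FPRAS of (Lupia et al., 2017)): each random permutation yields one independent, unbiased draw of every player's marginal contribution, so normal-approximation error bars apply per player:

```python
import math
import random

def mc_shapley_ci(players, v, n_perms=4000, z=1.96, seed=0):
    """Permutation-sampling Shapley estimates with CLT half-widths."""
    rng = random.Random(seed)
    draws = {p: [] for p in players}
    for _ in range(n_perms):
        perm = players[:]
        rng.shuffle(perm)
        S, prev = frozenset(), 0.0
        for p in perm:
            S = S | {p}
            cur = v(S)
            draws[p].append(cur - prev)  # one unbiased draw of phi_p
            prev = cur
    out = {}
    for p, xs in draws.items():
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)
        out[p] = (mean, z * math.sqrt(var / len(xs)))  # (estimate, half-width)
    return out
```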
5. Generalizations: Hodge, Group, and Fairness Decompositions
Recent research extends traditional scalar Shapley decompositions to richer structures:
- Hodge Decomposition: Via a combinatorial Hodge-theoretic projection on the coalition hypercube, every game splits into per-player component games, and each player's Shapley value is the value of its component game at the grand coalition; this perspective allows axiomatization of the component functions (not just point attributions), solutions for restricted cooperation and weighted variants, and a least-squares/Markov-chain path-integral interpretation (Stern et al., 2017, Lim, 2021).
- Group Shapley Values: In settings with natural groupings of parameters or features (e.g., structural models in economics), the group Shapley framework decomposes a total change among pre-specified groups, yielding an efficiency-respecting, unique, and interpretable allocation. Group-level values solve a constrained weighted least-squares problem and connect to regression "importance tables" (Kwon et al., 2024); see the sketch after this list.
- Spectral Approaches and Polynomial Chaos: For variance-based Shapley-Owen effects, spectral representations using polynomial chaos expansions allow separation of model-specific and model-independent computation, with sparse approximation and error control, further elevating scalability for high-dimensional sensitivity analysis (Ruess, 2024).
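One common construction for the group-level values above (see the group Shapley bullet) lifts groups to players: a coalition of groups is valued by the underlying game on the union of its members. A minimal sketch with hypothetical group names and a toy size-based utility, reusing `shapley_values` from the Section 1 snippet; (Kwon et al., 2024) develop the full framework, including its weighted least-squares characterization:

```python
# Group Shapley by lifting: play the cooperative game over groups, where a
# coalition of groups is scored by the base game on the union of members.
def group_game(groups, v):
    """groups: dict name -> frozenset of base players; returns a game on names."""
    return lambda G: v(frozenset().union(*(groups[g] for g in G)) if G else frozenset())

base_v = lambda S: {0: 0.0, 1: 1.0, 2: 1.5, 3: 3.0}[len(S)]  # hypothetical utility
groups = {'demographics': frozenset({0, 1}), 'behavior': frozenset({2})}
phi_groups = shapley_values(list(groups), group_game(groups, base_v))
print(phi_groups, sum(phi_groups.values()))  # group values sum to base_v(N) = 3.0
```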
6. Domain-Specific Applications and Illustrative Examples
- Phylogenetic Trees: In biodiversity, the Shapley value of a phylogenetic diversity game on an unrooted tree is a linear function of the edge lengths (via a split-count matrix); although theoretically interpretable as a marginal contribution, it may not always yield optimal or robust prioritizations for maximizing total diversity, highlighting limitations when the decomposition structure misaligns with the practical selection task (Wicke et al., 2017).
- Information Decomposition: Modern approaches to splitting mutual information into uniquely-attributed, nonnegative components utilize Shapley-style values on posets/Boolean algebras of predictors. Random-order values, sharing values, and hierarchical (Faigle–Kern) values offer flexible, axiomatic decompositions, though computational complexity can be prohibitive beyond moderate size (Kroupa et al., 2022).
7. Conclusion and Outlook
Shapley value-based decomposition methods provide a principled, axiomatically unique framework for allocating credit, cost, or utility in cooperative systems across information theory, statistics, economics, machine learning, and combinatorial optimization. The recent literature has furnished sophisticated, structure-exploiting decompositions that vastly improve tractability, fairness, and interpretability for high-dimensional and practical systems. Ongoing challenges concern further scaling via finer-grained surrogates, quantifying uncertainty (confidence intervals), extensions to nonadditive or hierarchical domains, and domain-informed regularization in group or fairness-constrained decompositions.
Key references: Ding et al., 2018; Owen et al., 2016; Liu, 2020; Luo et al., 2022; Garrido-Lucero et al., 2023; Redell, 2019; Fryer et al., 2020; Moehle et al., 2021; Lupia et al., 2017; Gevaert et al., 2022; Stern et al., 2017; Lim, 2021; Kwon et al., 2024; Ruess, 2024; Wicke et al., 2017; Kroupa et al., 2022; Michiels et al., 2023.