
Shapley-Owen Attribution

Updated 26 February 2026
  • Shapley-Owen Attribution is a generalization of the Shapley value that allocates contributions to individual features and groups with axiomatic fairness and combinatorial weights.
  • It employs advanced techniques such as Möbius inversion, spectral decomposition, and hierarchical strategies to efficiently compute attributions in structured domains.
  • The framework is applied in sensitivity analysis, federated learning, and reinforcement learning, offering scalable, robust attributions with rigorous convergence guarantees.

Shapley-Owen Attribution is a generalization of cooperative game-theoretic attribution methods, uniquely extending Shapley value–based decomposition to systematically allocate contributions not only to individuals (features, variables, model components), but to arbitrary coalitions, groups, and structured subsets. Underpinned by axiomatic fairness, spectral decomposition, and scalable computation via hierarchical or group-wise strategies, Shapley-Owen attribution has become pivotal in model interpretability, global sensitivity analysis, federated optimization, hierarchical explainability, and reinforcement learning with structured action spaces. It is characterized formally by unique combinatorial weights, coalition-structure awareness, and rigorous convergence guarantees.

1. Formal Definition and Axiomatic Characterization

Let $D = \{1, \ldots, d\}$ denote a finite set of "players" (features, model inputs, or other entities), and let $\mathrm{val} : 2^D \to \mathbb{R}$ be any set function with $\mathrm{val}(\varnothing) = 0$. For any nonempty coalition $u \subseteq D$, the Shapley–Owen interaction index (or "Shapley–Owen effect" for $u$) is defined as

$$Sh_u(\mathrm{val}) = \frac{1}{d - |u| + 1} \sum_{v \subseteq D \setminus u} \binom{d - |u|}{|v|}^{-1} \sum_{w \subseteq u} (-1)^{|u| - |w|} \, \mathrm{val}(v \cup w).$$

For a singleton coalition $u = \{i\}$, this reduces to the classic Shapley value:

$$Sh_{\{i\}}(\mathrm{val}) = \frac{1}{d} \sum_{v \subseteq D \setminus \{i\}} \binom{d-1}{|v|}^{-1} \left[\mathrm{val}(v \cup \{i\}) - \mathrm{val}(v)\right].$$

These indices satisfy the following axioms:

  • Additivity: $f_u(\mathrm{val}_1 + \mathrm{val}_2) = f_u(\mathrm{val}_1) + f_u(\mathrm{val}_2)$.
  • Dummy (null player): If $\mathrm{val}(S \cup \{i\}) = \mathrm{val}(S)$ for all $S$, then $f_{\{i\}} = 0$ and $f_{S \cup \{i\}} = 0$ for all higher-order indices.
  • Symmetry: Permuting the player indices permutes the attributions identically.
  • Recursivity (balanced contributions): Interactions partition coherently when any player is held fixed.

In sensitivity analysis and variance decomposition, $\mathrm{val}(u) = \mathrm{Var}[\mathbb{E}[M(X) \mid X_u]]$ quantifies the output variance explained by the subset $X_u$. These axioms uniquely determine all Shapley–Owen effects (Ruess, 2024).
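For small $d$, the defining double sum can be evaluated by direct enumeration. The following sketch (the function name and example game are illustrative, not drawn from the cited works) makes the combinatorial weights concrete; for the symmetric game $\mathrm{val}(S) = |S|^2$ on three players, symmetry plus efficiency force each singleton effect to $9/3 = 3$:

```python
from itertools import combinations
from math import comb

def shapley_owen(val, d, u):
    """Shapley-Owen index Sh_u by direct enumeration of the defining
    double sum: outer sum over v in D\\u, inner signed sum over w in u."""
    u = sorted(u)
    rest = [i for i in range(d) if i not in u]
    total = 0.0
    for k in range(len(rest) + 1):
        for v in combinations(rest, k):
            inner = sum(
                (-1) ** (len(u) - m) * val(frozenset(v) | frozenset(w))
                for m in range(len(u) + 1)
                for w in combinations(u, m)
            )
            total += inner / comb(d - len(u), k)  # binomial weight for |v| = k
    return total / (d - len(u) + 1)

# Example game: val(S) = |S|^2 on d = 3 players.
quad = lambda S: len(S) ** 2
```

For an additive game the dummy and additivity axioms make every interaction index vanish, which is a quick sanity check on the enumeration.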

2. Algorithmic Computation: Permutation, Möbius, and Spectral Strategies

Classical computation of Shapley (and Shapley–Owen) effects involves enumeration over $d!$ permutations, which is infeasible in moderate to large dimensions. Combinatorial reformulations and spectral decompositions have enabled scalable alternatives.

  • Möbius (Inclusion–Exclusion) Formulation: For an input group $S \subseteq D$, define the Möbius inverse $m(\alpha)$ of $\mathrm{val}$ by

$$m(\alpha) = \sum_{\beta \subseteq \alpha} (-1)^{|\alpha| - |\beta|} \, \mathrm{val}(\beta).$$

The Shapley–Owen index of $S$ is then

$$\Phi_S = \sum_{\beta \,:\, S \subseteq \beta \subseteq D} \frac{m(\beta)}{|\beta| - |S| + 1}.$$

This approach reduces the computational complexity from $O(d! \cdot d)$ to $O(2^d)$, with simultaneous estimation of all marginal and interaction effects in a single pass using "pick-n-freeze" sampling and quasi-Monte Carlo (Plischke et al., 2020).
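The Möbius route can be sketched in a few lines (illustrative names; still exponential in $d$, but a single pass over subsets recovers every index at once). It agrees with direct enumeration on, e.g., the game $\mathrm{val}(S) = |S|^2$, whose Möbius inverse is $1$ on singletons, $2$ on pairs, and $0$ on the grand coalition:

```python
from itertools import combinations

def subsets(xs):
    """All subsets of the iterable xs, as frozensets."""
    xs = sorted(xs)
    return (frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r))

def moebius(val, d):
    """Moebius inverse m(alpha) of val over all subsets of {0,...,d-1}."""
    return {
        a: sum((-1) ** (len(a) - len(b)) * val(b) for b in subsets(a))
        for a in subsets(range(d))
    }

def shapley_owen_moebius(val, d, S):
    """Phi_S = sum over beta containing S of m(beta) / (|beta| - |S| + 1)."""
    S = frozenset(S)
    m = moebius(val, d)
    return sum(mb / (len(b) - len(S) + 1) for b, mb in m.items() if S <= b)
```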

  • Spectral (Polynomial Chaos) Formulation: Writing the model as a polynomial chaos expansion (PCE),

$$M(X) = \sum_{\alpha \in \mathbb{N}^d} y_\alpha \Psi_\alpha(X),$$

the Shapley–Owen indices decompose as

$$Sh_u(\mathrm{val}) = \sum_{\alpha \ne 0} y_\alpha^2 \; Sh_u\!\left(\mathrm{val}^\alpha\right),$$

where $Sh_u(\mathrm{val}^\alpha)$ is elementary for the monomial indexed by $\alpha$ and can be precomputed. The model-specific part consists of the $y_\alpha^2$ coefficients, which are sparse in practice (Ruess, 2024).

A two-stage algorithm—(I) sparse PCE construction by greedy regression or quadrature until the residual variance falls below a set tolerance, and (II) application of the precomputed Shapley-Owen tables—yields practical computation with quantifiable error bounds.
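Stage (II) then reduces to a weighted table lookup. A schematic of that combination step (the coefficient and table encodings here are placeholder assumptions, not the cited implementation):

```python
def shapley_owen_from_pce(coeffs, sh_table, u):
    """Combine squared PCE coefficients y_alpha^2 with precomputed
    per-monomial Shapley-Owen effects Sh_u(val^alpha).

    coeffs:   dict mapping multi-index alpha (tuple of ints) -> y_alpha
    sh_table: dict mapping (alpha, u) -> precomputed Sh_u(val^alpha)
    """
    return sum(
        y ** 2 * sh_table[(alpha, u)]
        for alpha, y in coeffs.items()
        if any(alpha)  # skip the constant term alpha = 0
    )
```

Only the truncated, sparse set of retained $\alpha$ ever touches the table, which is what keeps stage (II) cheap.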

3. Hierarchical and Structured Attribution: The Owen Value and O-Shap

While the standard Shapley value assumes player independence and unstructured feature sets, many real-world domains present natural groupings or hierarchies (e.g., superpixels in images, syntactic segments in language). The Owen value generalizes Shapley to partitioned player sets $N = \bigcup_k G_k$, supporting symmetry, dummy, and efficiency axioms within groups.

Formally, for a player $i \in G_k$ in a group partition $\mathcal{G} = \{G_1, \ldots, G_K\}$, the Owen value is:

$$\varphi_i^O(v) = \sum_{T \subseteq \mathcal{G} \setminus \{G_k\}} \frac{|T|!\,(K-|T|-1)!}{K!} \sum_{S \subseteq G_k \setminus \{i\}} \frac{|S|!\,(|G_k| - |S| - 1)!}{|G_k|!} \left[ v\Big(\textstyle\bigcup T \cup S \cup \{i\}\Big) - v\Big(\textstyle\bigcup T \cup S\Big) \right].$$
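For small coalition structures the double sum can be evaluated directly; a minimal sketch (illustrative names, not the O-Shap implementation). With all-singleton groups it reduces to the ordinary Shapley value, and efficiency holds for any partition:

```python
from itertools import combinations
from math import factorial

def owen_value(v, partition, i):
    """Owen value of player i under a coalition structure (list of lists):
    whole groups are weighted by their order among groups, and members of
    i's own group by their order within that group."""
    K = len(partition)
    k = next(g for g, grp in enumerate(partition) if i in grp)
    mates = [p for p in partition[k] if p != i]
    others = [grp for g, grp in enumerate(partition) if g != k]
    gk = len(partition[k])
    total = 0.0
    for t in range(len(others) + 1):
        for T in combinations(range(len(others)), t):
            wT = factorial(t) * factorial(K - t - 1) / factorial(K)
            base = set()
            for g in T:
                base |= set(others[g])
            for s in range(len(mates) + 1):
                for S in combinations(mates, s):
                    wS = factorial(s) * factorial(gk - s - 1) / factorial(gk)
                    coal = base | set(S)
                    total += wT * wS * (v(coal | {i}) - v(coal))
    return total

# Example: the symmetric game v(S) = |S|^2; with singleton groups the
# Owen value coincides with the Shapley value (9/3 = 3 per player).
quad = lambda S: len(S) ** 2
```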

In XAI, O-Shap leverages this formalism with a segmentation hierarchy that satisfies the $\mathcal{T}$-property: semantic consistency of segment importance across hierarchy levels. The O-Shap pipeline applies edge-based segmentation (Canny) and semantics-aware graph merging at each level, ensuring top-down invariance and preventing "semantic flipping." This structure reduces the required model evaluations from exponential in the feature count to polynomial in the number of groups per hierarchy level, dramatically improving scalability and attribution coherence (Zhou et al., 19 Feb 2026).

Empirical results on imaging (e.g., ImageNet-S50, PASCAL-VOC-2012) and tabular (Adult Census Income) datasets show O-Shap outperforms conventional SHAP and hierarchical baselines (AA-SHAP, SLIC-SHAP) in attribution precision, semantic coherence, energy-based pointing game (EBPG), mIoU, and computational cost.

4. Applications: Sensitivity Analysis, Federated Learning, and RL

Global Sensitivity and Variance Decomposition

Shapley–Owen attribution extends variance-based sensitivity analysis from main effects to all possible higher-order (group, interaction) effects. For black-box models with random inputs, these indices attribute the total output variance $\mathrm{Var}(Y)$ to any subset $S \subseteq D$ according to how much variance is explained by knowing $X_S$:

$$\Phi_S = \sum_{T \subseteq D \setminus S} \frac{|T|!\,(d - |T| - |S|)!}{d!} \left[ V(T \cup S) - V(T) \right],$$

with $V(S) := \mathrm{Var}[\mathbb{E}(Y \mid X_S)]$ or its superset analogue. All indices are computable in one pass with an efficient sample design (Plischke et al., 2020).
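As a simple stand-in for such designs, $V(S)$ can be estimated with a double-loop Monte Carlo scheme, sketched here for independent $\mathrm{Uniform}(0,1)$ inputs (an assumption for illustration; this is not the pick-n-freeze estimator of the cited work):

```python
import random

def cond_var(model, d, S, n_outer=1000, n_inner=32, rng=None):
    """Double-loop Monte Carlo estimate of V(S) = Var[E(Y | X_S)] for
    independent Uniform(0,1) inputs: freeze X_S, average over the
    remaining inputs (inner loop), then take the variance of the means."""
    rng = rng or random.Random(0)
    means = []
    for _ in range(n_outer):
        frozen = {j: rng.random() for j in S}   # one draw of X_S, held fixed
        acc = 0.0
        for _ in range(n_inner):
            x = [frozen[j] if j in S else rng.random() for j in range(d)]
            acc += model(x)
        means.append(acc / n_inner)
    mu = sum(means) / n_outer
    return sum((m - mu) ** 2 for m in means) / (n_outer - 1)
```

For the additive model $Y = X_1 + X_2$ this recovers $V(\{1\}) = \mathrm{Var}(X_1) = 1/12$ up to Monte Carlo noise and the small inner-loop bias.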

Federated Learning (FedOwen)

In federated learning, estimating individual client contributions to global model performance is critical for reward distribution and accelerated convergence. The Shapley value is well grounded for this task but intractable at federation scale. FedOwen introduces an unbiased Owen-sampling estimator, pairing antithetic twins to reduce variance, discretizing the inclusion probability into $Q$ levels, and employing adaptive client selection via multi-armed bandits. With $Q = 2$–$5$ and $M = 4$–$16$ samples per round, FedOwen achieves up to 23% higher final accuracy under fixed budget constraints compared to Monte Carlo Shapley or alternative methods (KhademSohi et al., 28 Aug 2025).
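FedOwen's estimator itself involves federation-specific machinery, but the core idea it builds on — Owen sampling of the multilinear extension with antithetic pairing — can be sketched as follows (the grid and names are illustrative assumptions, not the cited algorithm):

```python
import random

def owen_sampled_shapley(v, d, i, Q=4, M=8, rng=None):
    """Estimate the Shapley value of player i by sampling coalitions with
    inclusion probability q on a (Q+1)-level grid, pairing each draw with
    its complement (the antithetic twin) to reduce variance."""
    rng = rng or random.Random(0)
    others = [j for j in range(d) if j != i]
    est = 0.0
    for q_idx in range(Q + 1):
        q = q_idx / Q
        acc = 0.0
        for _ in range(M):
            S = {j for j in others if rng.random() < q}
            twin = set(others) - S   # distributed as a draw at level 1 - q
            acc += 0.5 * ((v(S | {i}) - v(S)) + (v(twin | {i}) - v(twin)))
        est += acc / M
    return est / (Q + 1)
```

For an additive game every marginal contribution equals the player's own weight, so the estimate is exact regardless of the sampled coalitions — a convenient correctness check.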

Hierarchical RL Credit Assignment (OSPO)

Sequence-level generative RL (notably for LLMs) suffers from credit-assignment gaps. Owen–Shapley Policy Optimization (OSPO) extends Shapley attribution to contiguous blocks (phrases, sentences) in generated sequences, requiring only $O(n\, w_{\mathrm{max}})$ reward queries. This enables reward redistribution that preserves optimal-policy invariance (via potential-based shaping), robustly accelerates optimization, and improves generalization, especially in out-of-distribution retriever settings (Nath et al., 13 Jan 2026).

5. Error Analysis, Convergence Guarantees, and Practical Limits

The error in sparse, truncated PCE-based Shapley–Owen attribution is controlled by the residual variance $\epsilon_\ell$:

$$\|M - R\|_{L^2}^2 = \epsilon_\ell, \qquad |\sigma_u^2 - \widehat{\sigma}_u^2| \leq \epsilon_\ell,$$

and

$$|Sh_u - \widehat{Sh}_u| \leq \kappa_u\, \epsilon_\ell, \qquad \lim_{\ell \to \infty} \widehat{Sh}_u = Sh_u,$$

with $\kappa_u$ determined by $(u, d)$ (Ruess, 2024). In sample-based estimators, block sharing induces positive estimator correlation and variance reduction. Memory scales exponentially only with the largest group order treated (or the hierarchy breadth), making high-dimensional analysis tractable for $k \lesssim 20$ by exploiting problem sparsity (Plischke et al., 2020).
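The linear factor $\kappa_u$ can be read off from linearity of $Sh_u$ in $\mathrm{val}$: bounding every signed term in the defining sum gives a one-step sketch of the bound, with $\kappa_u$ taken here as the $\ell^1$ mass of the combinatorial weights (an assumption consistent with the definition above, not necessarily the sharp constant of the cited work):

```latex
\bigl|Sh_u(\mathrm{val}) - Sh_u(\widehat{\mathrm{val}})\bigr|
  = \bigl|Sh_u(\mathrm{val} - \widehat{\mathrm{val}})\bigr|
  \le \underbrace{\frac{2^{|u|}}{\,d-|u|+1\,}
        \sum_{v \subseteq D \setminus u} \binom{d-|u|}{|v|}^{-1}}_{=:\,\kappa_u}
     \; \max_{S \subseteq D} \bigl|\mathrm{val}(S) - \widehat{\mathrm{val}}(S)\bigr|,
```

and the subset-wise bound $|\sigma_u^2 - \widehat{\sigma}_u^2| \leq \epsilon_\ell$ then yields the stated linear dependence on $\epsilon_\ell$.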

6. Illustrative Examples and Empirical Insights

Explicit enumeration (e.g., for $M(X_1,X_2,X_3) = (X_1 \wedge X_2) \vee (\neg X_1 \wedge X_3)$) reveals how individual and pairwise Shapley–Owen attributions partition variance fairly: $Sh_1 = 1/32$, $Sh_2 = Sh_3 = 3/32$, with the two-factor interactions $Sh_{1,2} = Sh_{1,3} = 1/16$ and $Sh_{2,3} = 0$ (Ruess, 2024).

Application studies report:

  • O-Shap on ImageNet-S50: EBPG = 0.5692 for O-Shap vs 0.5659 for SHAP; mIoU improvement from 0.2780 (SHAP) to 0.3375 (O-Shap); and reduction in execution time (e.g., 2.52s for O-Shap vs 3.65s for SHAP on ResNet50/224×224).
  • FedOwen in federated learning settings: Up to +23% improved accuracy under fixed utility call budgets, stable under parameter choices, and resilient to client selection bias.
  • OSPO in generative RL: On Amazon ESCI (NDCG=0.522 for OSPO vs 0.418 for GRPO), robust to retriever shifts, and with more stable, informative responses.

7. Limitations, Best Practices, and Domain Recommendations

  • Hierarchy/grouping selection is critical: Poor or appearance-based segmentation in O-Shap degrades the $\mathcal{T}$-property, masking interactions or inflating artifacts (Zhou et al., 19 Feb 2026).
  • Computational overhead is dominated by the breadth of the hierarchy/group structure rather than the raw feature count: for $\sim 10$ groups per level across 4–5 levels, O-Shap and related methods run in $O(N^2)$–$O(N^3)$ time.
  • Sample-based methods scale to $k \sim 20$: further scaling requires focusing on low-order interactions or exploiting problem sparsity (Plischke et al., 2020).
  • Parameter tuning (e.g., $Q$, $M$, truncation tolerance $\eta$, confidence weights) in federated/online settings is necessary for maintaining performance and fairness (KhademSohi et al., 28 Aug 2025).

In summary, Shapley–Owen attribution offers a unique, axiomatic, and computationally viable framework for group and structured feature importance, undergirded by spectral decompositions and scalable sampling. It provides theoretical and empirical guarantees for meaningful decomposition in sensitivity analysis, interpretable machine learning, federated settings, and credit assignment in hierarchical or sequential domains. Its success depends critically on principled group/hierarchy construction, judicious use of computational approximations, and application-aligned parameterization.
