Data Shapley Interaction Calculation
- Data Shapley Interaction Calculation is a method that formally allocates credit to both single data elements and their interactions using Shapley-theoretic foundations.
- It utilizes advanced algorithms such as Möbius inversion, sampling, and spectral methods to scale interaction computations with theoretical rigor.
- The approach enhances model interpretation, feature selection, and fairness attribution in machine learning, with practical applications in areas like NLP and vision.
Data Shapley Interaction Calculation refers to the formal allocation of credit not just to single data elements (such as features or instances), but to their interactions, when explaining the output or performance of predictive models using Shapley-theoretic methods. Extending the classical Shapley value—rooted in cooperative game theory—to the field of interactions addresses not only the individual “main effects” but also higher-order synergies and redundancies between groups, with growing prominence in sensitivity analysis, explainable machine learning, and fairness attribution. The exponential complexity of evaluating all possible coalitions traditionally hindered practical interaction calculations. Recent methods, however, present both theoretical and algorithmic advances for scaling, axiomatic rigor, and interpretability.
1. Formalism and Theoretical Foundations
The classical Shapley value attributes the total output or performance difference between an empty and a full coalition to each element (feature, data point, or DB tuple), averaging over all possible insertion orders. The Shapley Interaction Index (Dhamdhere et al., 2019) and its refinements generalize this attribution to feature groups of fixed order (e.g., pairs, triplets), quantifying their non-additive “joint effect.”
The Shapley Taylor Interaction Index (Dhamdhere et al., 2019) defines higher-order interaction attributions via discrete derivatives on the function's power set, mirroring the truncation of a Taylor expansion. For order-$k$ subsets $S$ (i.e., $|S| = k$),
$$\mathcal{I}^{ST}_k(S) = \frac{k}{n} \sum_{T \subseteq N \setminus S} \frac{\delta_S F(T)}{\binom{n-1}{|T|}},$$
where $\delta_S F(T) = \sum_{W \subseteq S} (-1)^{|S|-|W|} F(T \cup W)$ is a discrete mixed partial derivative analogous to a finite-difference operator, and the summation is over context sets $T \subseteq N \setminus S$. The full attribution decomposes as $\sum_{S : |S| \le k} \mathcal{I}^{ST}_k(S) = F(N) - F(\emptyset)$. Faith-Shap (Tsai et al., 2022) approaches the same objective via an $\ell$-order polynomial regression fit, seeking coefficients that minimize the mean squared error over all coalitions, subject to efficiency; its interaction allocation matches the unique coefficients of a best-fit, subset-wise polynomial.
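To make the definition concrete, the following is a minimal brute-force sketch in Python of the order-$k$ Shapley-Taylor indices above; it enumerates all coalitions (exponential cost, so small $n$ only), and the three-player toy game with a pairwise synergy is hypothetical.

```python
from itertools import combinations
from math import comb

def subsets(iterable):
    """All subsets of the given elements, as frozensets."""
    s = list(iterable)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

def discrete_derivative(f, S, T):
    """Discrete mixed partial derivative of f in directions S, evaluated at context T."""
    return sum((-1) ** (len(S) - len(W)) * f(T | W) for W in subsets(S))

def shapley_taylor(f, players, k):
    """Brute-force order-k Shapley-Taylor interaction indices (exponential cost)."""
    n = len(players)
    players = frozenset(players)
    attributions = {}
    for S in subsets(players):
        if 0 < len(S) < k:
            # Lower-order terms: plain discrete derivative at the empty context.
            attributions[S] = discrete_derivative(f, S, frozenset())
        elif len(S) == k:
            # Order-k terms: weighted average of discrete derivatives over contexts T disjoint from S.
            total = sum(discrete_derivative(f, S, T) / comb(n - 1, len(T))
                        for T in subsets(players - S))
            attributions[S] = k / n * total
    return attributions

# Toy game: unit main effects for players 0 and 1 plus a pure pairwise synergy of 2.
f = lambda S: float(0 in S) + float(1 in S) + 2.0 * float(0 in S and 1 in S)
atts = shapley_taylor(f, {0, 1, 2}, k=2)
print(sum(atts.values()))        # 4.0 = f(N) - f(empty set), i.e. efficiency holds
print(atts[frozenset({0, 1})])   # 2.0, the pairwise synergy
```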
Spectral approaches, as introduced for Shapley-Owen effects (Ruess, 28 Sep 2024), consider the model's polynomial chaos expansion (PCE). Each interaction effect is expressed as a weighted sum over the expansion's terms,
$$\Phi_S = \sum_{\alpha} c_\alpha^2\, \omega_{S,\alpha},$$
where the $c_\alpha$ are model-specific PCE coefficients and $\omega_{S,\alpha}$ is a model-independent weight indexed by basis polynomials supported on $S$.
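Under input independence, single-input Shapley effects follow from an orthonormal PCE via the standard identity $Sh_i = \sum_{T \ni i} S_T / |T|$, where $S_T$ is the variance share carried by basis terms supported exactly on $T$. The sketch below (Python, hypothetical coefficients and multi-indices) illustrates only this standard building block, not the Shapley-Owen interaction weights of Ruess (28 Sep 2024).

```python
import numpy as np

def shapley_effects_from_pce(coeffs, multi_indices, n_inputs):
    """Normalized Shapley effects of individual inputs from an orthonormal PCE,
    assuming independent inputs: Sh_i = sum_{T containing i} S_T / |T|."""
    variance_by_support = {}
    total_variance = 0.0
    for c, alpha in zip(coeffs, multi_indices):
        support = frozenset(i for i, a in enumerate(alpha) if a > 0)
        if not support:                      # the constant term carries no variance
            continue
        variance_by_support[support] = variance_by_support.get(support, 0.0) + c ** 2
        total_variance += c ** 2
    shapley = np.zeros(n_inputs)
    for support, var in variance_by_support.items():
        for i in support:
            shapley[i] += var / len(support)  # split each exact-subset share equally
    return shapley / total_variance           # normalized effects sum to 1

# Hypothetical 3-input expansion: y = 2*P1(x0) + 1*P1(x1) + 0.5*P1(x0)*P1(x2)
coeffs = [2.0, 1.0, 0.5]
multi_indices = [(1, 0, 0), (0, 1, 0), (1, 0, 1)]
print(shapley_effects_from_pce(coeffs, multi_indices, n_inputs=3))
```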
2. Algorithms and Computational Scalability
Direct interaction calculation by permutation or subset enumeration is exponential: for $n$ features, $2^n$ subsets exist. Several advances reduce cost:
- Möbius Inverse Algorithms: For variance-based sensitivity analysis, (Plischke et al., 2020) computes all Shapley-Owen (interaction) effects for $n$ inputs in $O(2^n)$ value function evaluations, replacing permutation enumeration by fast subset-based Möbius inversion. The interaction effect for a group $S$ is
$$\Phi_S = \sum_{T \supseteq S} \frac{m(T)}{|T| - |S| + 1},$$
where the $m(T)$ are the Möbius inverses (Harsanyi dividends) of the value function; a brute-force sketch of this dividend representation appears after this list.
- Sampling and Monte Carlo: For independent variables, fast pick-freeze (Goda, 2020) or stratified permutation sampling (Gutiérrez et al., 7 Feb 2024) estimate main and interaction Shapley indices with polynomial runtime, using unbiased estimators whose variance decays inversely with the sample count (a basic permutation-sampling variant is sketched after this list).
- Surrogate and Projected Models: SHAFF (Bénard et al., 2021) uses random forests, projecting tree predictions onto subspaces for each variable subset, yielding efficient interaction effect estimators even under dependence. Surrogate model-based trees (Zhou et al., 2022) fit local GAMs in tree leaves for accurate conditional expectation estimation, handling interactions more faithfully than marginal approximation.
- Restricted Coalition Selection: L-Shapley and C-Shapley (Chen et al., 2018) exploit known feature graph structure, restricting attribution calculation to $k$-local (and/or connected) coalitions. This achieves linear or quadratic time (in $n$) as opposed to exponential in $n$, with controlled error when conditional independence assumptions are satisfied.
- Chain Rule and Contributive Sampling: The SHEAR method (Wang et al., 2022) leverages the so-called Shapley chain rule, which bounds the error incurred by ignoring a feature $j$ when computing the value for feature $i$ in terms of the cross second derivative $\partial^2 f / \partial x_i \partial x_j$. SHEAR selects only $O(\log n)$ “contributive” cooperators per feature, further reducing cost while controlling error.
- Quadratic-Time FANOVA Decomposition: For FANOVA Gaussian Processes, exact stochastic Shapley interaction values are computable in $O(n^2)$ time via recursive algorithms built on Newton’s identities and elementary symmetric polynomials (Mohammadi et al., 20 Aug 2025). Both main effects and all interaction effects (uncertainty included) are tractable, avoiding combinatorial enumeration.
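The Möbius (dividend) representation above can be checked directly on a small game. The sketch below computes Möbius inverses by naive summation (far slower than the fast transform of Plischke et al., 2020, but enough to illustrate the formula) and reads off main and pairwise interaction effects; the toy value function is hypothetical.

```python
from itertools import combinations

def mobius_transform(value, players):
    """Naive Mobius inverses m(T) = sum_{W subset of T} (-1)^(|T|-|W|) v(W) for every subset T."""
    all_subsets = [frozenset(c) for r in range(len(players) + 1)
                   for c in combinations(players, r)]
    return {T: sum((-1) ** (len(T) - len(W)) * value(W)
                   for W in all_subsets if W <= T)
            for T in all_subsets}

def interaction_from_dividends(m, S):
    """Shapley interaction index of group S via I(S) = sum_{T containing S} m(T) / (|T| - |S| + 1)."""
    return sum(mT / (len(T) - len(S) + 1) for T, mT in m.items() if S <= T)

# Toy value function with a synergy between elements 0 and 1.
players = (0, 1, 2)
v = lambda W: float(0 in W) + float(1 in W) + 2.0 * float(0 in W and 1 in W)
m = mobius_transform(v, players)
print(interaction_from_dividends(m, frozenset({0})))     # 2.0, main (Shapley) effect of element 0
print(interaction_from_dividends(m, frozenset({0, 1})))  # 2.0, pairwise interaction effect
```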
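For the sampling route, the basic permutation-sampling estimator (not the pick-freeze or stratified schemes of the works cited above) already conveys the core idea: average marginal contributions over uniformly random orderings, giving an unbiased estimate whose variance shrinks with the number of sampled permutations.

```python
import random

def shapley_permutation_sampling(value, players, n_permutations=2000, seed=0):
    """Monte Carlo Shapley estimates: average each player's marginal contribution
    over uniformly sampled orderings of the player set."""
    rng = random.Random(seed)
    players = list(players)
    estimates = {p: 0.0 for p in players}
    for _ in range(n_permutations):
        order = players[:]
        rng.shuffle(order)
        coalition = frozenset()
        prev_value = value(coalition)
        for p in order:
            coalition = coalition | {p}
            new_value = value(coalition)
            estimates[p] += new_value - prev_value
            prev_value = new_value
    return {p: s / n_permutations for p, s in estimates.items()}

# Same hypothetical toy game as above: players 0 and 1 share a synergy, player 2 is a dummy.
v = lambda W: float(0 in W) + float(1 in W) + 2.0 * float(0 in W and 1 in W)
print(shapley_permutation_sampling(v, (0, 1, 2)))  # approximately {0: 2.0, 1: 2.0, 2: 0.0}
```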
3. Axiomatic Approaches and Uniqueness
A suite of axioms guides valid interaction attributions:
- Linearity: Attributions respect linear combinations of value functions.
- Symmetry: Identically-behaving features/groups yield identical shares.
- Dummy: Features not affecting the output get zero attribution (including all their containing interactions).
- Efficiency: Total attributions across all considered sets sum to the function’s range (e.g., $F(N) - F(\emptyset)$).
- Interaction Distribution (for interaction indices): Ensures “pure” interactions are solely attributed to their proper group and not spread across lower orders.
Shapley-Taylor (Dhamdhere et al., 2019) and Faith-Shap (Tsai et al., 2022) provide different mechanisms to enforce these axioms. Unlike the classical Shapley interaction index from cooperative game theory, which is defined recursively and generally fails efficiency, Shapley-Taylor satisfies both efficiency and a specific interaction distribution axiom. Faith-Shap eliminates the need for non-intuitive ad hoc rules by uniquely solving a regression fit to the function’s coalitional values, strictly under the extended axioms (a simplified sketch of this regression view appears below).
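The regression view behind Faith-Shap can be illustrated with a deliberately simplified sketch: an unweighted least-squares fit of an indicator polynomial of bounded order over all coalitions, omitting Faith-Shap's coalition weighting kernel and efficiency constraint. On a toy game that is exactly representable at the chosen order, the fit recovers the synergy coefficient directly; the game below is hypothetical.

```python
import numpy as np
from itertools import combinations

def regression_interactions(value, players, order):
    """Ordinary least-squares fit of v(S) by an order-limited indicator polynomial:
    v(S) ~ sum over groups G, |G| <= order, of beta_G * 1[G subset of S].
    A simplified stand-in for the weighted, constrained fit used by Faith-Shap."""
    players = list(players)
    groups = [frozenset(c) for r in range(order + 1) for c in combinations(players, r)]
    coalitions = [frozenset(c) for r in range(len(players) + 1)
                  for c in combinations(players, r)]
    X = np.array([[float(g <= S) for g in groups] for S in coalitions])
    y = np.array([value(S) for S in coalitions])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(groups, beta))

v = lambda W: float(0 in W) + float(1 in W) + 2.0 * float(0 in W and 1 in W)
coefs = regression_interactions(v, (0, 1, 2), order=2)
print(round(coefs[frozenset({0, 1})], 3))  # 2.0: the pairwise synergy is recovered
```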
4. Sensitivity Analysis and Data Valuation
In global sensitivity analysis, Shapley effects (including interactions) serve to decompose model output variance among sets of inputs—even with input dependencies (Plischke et al., 2020). Shapley-Owen effects extend the decomposition to any input subset, with analytical and sampling-based algorithms available for estimation. Exponential speedups are realized by the Möbius-based methods, and spectral PCE decompositions (Ruess, 28 Sep 2024) further collapse the combinatorial sum into an efficiently computed (and error-bounded) expansion tied to the model’s orthonormal basis representation. These “interaction shares” robustly detect synergies, redundancies, and input dummies.
In data-centric applications, e.g., data marketplaces (Luo et al., 2022), attributing value to individual records (or groups thereof) is essential for fair revenue allocation. If the utility function is “independent” (i.e., additive over tuples), exact decomposition into individual and interaction shares is possible via closed forms or polynomial algorithms that exploit minimal synthesis properties. For KNN-model data interactions, specialized recursions (Belaid et al., 2023) reduce pairwise data Shapley interaction determination from exponential to quadratic complexity by leveraging sorted distance-based structure.
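The quantity computed by such KNN-specialized recursions can be written down directly, if inefficiently: the pairwise Shapley interaction index between two training points, with the utility taken as validation accuracy of a 1-NN model fit on the coalition. The brute-force sketch below is not the STI-KNN algorithm of Belaid et al. (2023), only an exponential-cost illustration of its target quantity; the four-point dataset and the convention v(∅) = 0 are hypothetical. Redundant points close to each other yield a negative interaction.

```python
import numpy as np
from itertools import combinations
from math import factorial

def knn_utility(train_X, train_y, val_X, val_y, subset):
    """Utility v(S): validation accuracy of a 1-NN classifier using only training points in S."""
    if not subset:
        return 0.0                                   # convention for the empty coalition
    idx = list(subset)
    preds = [train_y[idx][np.argmin(np.linalg.norm(train_X[idx] - x, axis=1))] for x in val_X]
    return float(np.mean(np.array(preds) == val_y))

def pairwise_data_interaction(v, n, i, j):
    """Brute-force Shapley interaction index of training points i and j: a weighted
    average of the second-order discrete derivative over all contexts T."""
    others = [k for k in range(n) if k not in (i, j)]
    total = 0.0
    for r in range(len(others) + 1):
        for T in combinations(others, r):
            T = frozenset(T)
            weight = factorial(len(T)) * factorial(n - len(T) - 2) / factorial(n - 1)
            total += weight * (v(T | {i, j}) - v(T | {i}) - v(T | {j}) + v(T))
    return total

# Tiny hypothetical dataset: points 0 and 1 are redundant members of class 0.
train_X = np.array([[0.0], [0.1], [1.0], [1.1]])
train_y = np.array([0, 0, 1, 1])
val_X = np.array([[0.05], [1.05]])
val_y = np.array([0, 1])
v = lambda S: knn_utility(train_X, train_y, val_X, val_y, S)
print(pairwise_data_interaction(v, n=4, i=0, j=1))   # -0.5: redundancy between points 0 and 1
```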
5. Practical Applications and Use Cases
- Model Interpretation: In NLP, Shapley Taylor interaction indices quantitatively detect syntactic structure and idiom compositionality by measuring not just single-token importance but pairwise synergies or antagonisms (Singhvi et al., 19 Mar 2024). In vision, contiguity and patch-based grouping via C-Shapley reveals coherent, interpretable regions corresponding to semantic objects (Chen et al., 2018).
- Feature and Data Selection: Data Shapley interaction calculations efficiently reveal redundant or antagonistic data clusters, informing sample summarization, outlier detection, and acquisition strategies (Belaid et al., 2023).
- Fairness and Sensitivity Attribution: Relative importance and “equitable attribution” in terms of Shapley-Owen effects are positioned as the appropriate (and, under standard axioms, the only) way to measure fairness, with a full separation between model structure (PCE coefficients) and universal, model-independent Shapley factors (Ruess, 28 Sep 2024).
| Method/Class | Algorithmic Complexity | Handles Interactions? | Suitability Conditions |
|---|---|---|---|
| Möbius Inverse (Plischke et al., 2020) | $O(2^n)$ value function evaluations | Yes (all subsets) | Any model, manageable number of inputs |
| L-Shapley/C-Shapley (Chen et al., 2018) | Linear/quadratic in $n$ | Local/connected coalitions only | Structured/graph-based data |
| Faith-Shap (Tsai et al., 2022) | Polynomial (sampling) | Arbitrary order | Any value function, low-order settings |
| Spectral PCE (Ruess, 28 Sep 2024) | Truncation-dependent | Yes (all orders) | Model admits polynomial expansion |
| STI-KNN (Belaid et al., 2023) | Quadratic in dataset size | Pairwise data interactions | KNN-type data/valuation |
6. Interpretation, Limitations, and Outlook
Shapley interaction indices illuminate how complex outputs arise from synergistic or antagonistic combinations, not just individual data elements. The main limitations are:
- For high-order (beyond pairwise) interaction effects, even polynomial algorithms can become intractable unless strong model structure or sparsity exists.
- For deep networks, architectural modifications (e.g., HarsanyiNet (Chen et al., 2023)) or tailored sampling are required for exactness, with cost/accuracy tradeoffs.
- Interaction attribution always depends on the value function chosen (e.g., conditional variance vs. expected output), potentially leading to differing importances.
Nevertheless, the field now possesses a mature suite of theoretical foundations and scalable algorithms, with demonstrable practical applications in interpretability, feature selection, data pricing, and fairness attribution. The ongoing integration of spectral and recursive analytic methods (Ruess, 28 Sep 2024, Mohammadi et al., 20 Aug 2025) continues to push boundaries on exactness and scalability in both deterministic and stochastic modeling settings.