Shapley Explanation Networks

Updated 23 March 2026

Shapley Explanation Networks (SENs) are neural architectures that embed Shapley value theory to produce built-in explanations during prediction.
They use designs such as intrinsic Shapley modules and auxiliary explanation heads to compute efficient, faithful attributions in one forward pass.
SENs have shown strong performance and scalability across images, graphs, time series, and complex-valued signals.

Shapley Explanation Networks (SENs) represent a class of neural architectures and inference pipelines that integrate Shapley value theory—originally from cooperative game theory—into neural network training, prediction, and explanation. SENs aim to make feature attributions a native and principled part of a model’s reasoning process, in contrast to the standard paradigm of expensive post hoc explanation. This approach yields models capable of producing faithful, fair, and often fast explanations of their own predictions across modalities including images, graphs, time series, tabular data, and even complex-valued signals, with theoretical guarantees inherited from the Shapley framework.

1. Mathematical Foundations of Shapley Explanations in Neural Networks

At the core of SENs is the Shapley value for an input-feature attribution problem. Given a model $f:\mathbb{R}^n\to \mathbb{R}$ and an input $x\in \mathbb{R}^n$, the Shapley value $\phi_i(f, x)$ of the $i$-th feature is:

$\phi_i(f, x) = \sum_{S\subseteq N\setminus \{i\}} \frac{|S|! (n - |S| - 1)!}{n!} \left[ f(x_{S\cup\{i\}}) - f(x_S) \right]$

where $N = \{1, ..., n\}$ and $x_S$ denotes the vector where features in $S$ are observed and others are replaced by a reference, such as a mean or a learned baseline (Wang et al., 2021).
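The definition can be evaluated directly when $n$ is small. The following is a minimal sketch, assuming a fixed reference vector as the baseline; the function `exact_shapley` and the model `toy_model` are hypothetical names introduced here for illustration and do not come from any of the cited works.

```python
from itertools import combinations
from math import factorial

import numpy as np


def exact_shapley(f, x, baseline):
    """Exact Shapley values by enumerating every coalition S of N minus {i}."""
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    n = x.size

    def value(coalition):
        # x_S: features in S keep their observed value, the rest use the baseline.
        masked = baseline.copy()
        idx = np.array(list(coalition), dtype=int)
        masked[idx] = x[idx]
        return f(masked)

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Combinatorial weight |S|! (n - |S| - 1)! / n! from the definition.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for S in combinations(others, size):
                phi[i] += weight * (value(S + (i,)) - value(S))
    return phi


def toy_model(z):
    # Hypothetical model: one pairwise interaction plus one additive term.
    return z[0] * z[1] + 2.0 * z[2]


x = np.array([1.0, 1.0, 1.0])
baseline = np.zeros(3)
phi = exact_shapley(toy_model, x, baseline)
print(phi, phi.sum(), toy_model(x) - toy_model(baseline))
```

The interaction term is split evenly between the two interacting features, and the attributions sum to the difference between the model output at `x` and at the baseline. The loop evaluates the model on the order of $n\,2^{n-1}$ times, which is why the approximations discussed later are needed for realistic inputs.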

SENs generalize this principle to various settings:

Neurons and filters: Quantifying the influence of an internal neuron on the model’s output or some performance metric (Ghorbani et al., 2020).
Spatial units or patches: Attributing output to image patches, transformer tokens, or graph edges (Fan et al., 16 Dec 2025, Akkas et al., 2024, Akkas et al., 27 Jun 2025).
Complex-valued networks: Extending the cooperative-game definition to $\mathbb{C}^n$ using Wirtinger calculus (Eilers et al., 2024).
Time series: Defining each (feature, time) cell as a player and estimating its Shapley value (Cheng et al., 25 Jan 2025).

Crucially, SENs strive not only for fidelity to the Shapley axioms (efficiency, symmetry, dummy, additivity), but also for practical tractability and compatibility with deep learning frameworks.
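For reference, the four axioms can be written in the notation of the definition above. The symbol $x_\emptyset$ for the all-reference input is a shorthand introduced here, and $g$ denotes a second model on the same inputs.

Efficiency: $\sum_{i\in N} \phi_i(f, x) = f(x) - f(x_\emptyset)$

Symmetry: if $f(x_{S\cup\{i\}}) = f(x_{S\cup\{j\}})$ for all $S\subseteq N\setminus\{i, j\}$, then $\phi_i(f, x) = \phi_j(f, x)$

Dummy: if $f(x_{S\cup\{i\}}) = f(x_S)$ for all $S\subseteq N\setminus\{i\}$, then $\phi_i(f, x) = 0$

Additivity: $\phi_i(f + g, x) = \phi_i(f, x) + \phi_i(g, x)$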

2. Architectural Patterns and Algorithmic Implementations

Multiple SEN architectures have been developed, varying by target domain and computational strategy.

Intrinsic Shapley modules: SENs such as ShapNet (Wang et al., 2021) realize Shapley transforms as neural network modules. For a function with known sparse interactions, the module computes exact or approximate per-feature Shapley values, cascading them layer-wise in deep architectures, enabling both prediction and explanation in a single forward pass.
Auxiliary Shapley heads: Vision SENs append an explanation head to the last feature layer (e.g., patch tokens of a ViT or spatial features of ResNet), tasked with predicting per-patch Shapley contributions to the score/logit of each class (Fan et al., 16 Dec 2025). Training jointly enforces consistency between prediction and explanation.
Shapley-based pretraining: For time series, SENs incorporate pretext tasks or explainer losses that amortize Shapley estimation over all features and time steps, ensuring that the model produces explanations and predictions in one operation (Cheng et al., 25 Jan 2025).
Analytical and chain rule propagation: SENs in classical feedforward or complex-valued networks derive closed-form approximations to the Shapley value for individual neurons, generalizing LRP and DeepSHAP to complex settings and enabling differentiable relevance backpropagation (Li et al., 2019, Eilers et al., 2024).
Surrogate networks for conditional expectations: SENs can be realized as amortized conditional expectation networks which, once trained, provide efficient evaluations of $E[f(X)|X_S=x_S]$ for any subset $S$, thereby accelerating conditional Shapley methods and supporting dependency-structured attributions (Richman et al., 2023).
Distributed and scalable implementations: Graph SENs apply distributed sampling and inference (e.g., on supercomputing clusters) to scale KernelSHAP-based edge attributions to millions of edges (Akkas et al., 27 Jun 2025).
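To make the auxiliary-head pattern concrete, the sketch below shows one way a head's raw per-patch outputs could be kept consistent with the prediction: shifting them so that they sum to the logit minus a baseline logit. This is the additive efficient normalization used by FastSHAP-style amortized explainers and is an assumption here, not a confirmed detail of the cited vision SENs. All weights, shapes, and names (`w_cls`, `w_head`, `efficient_head_output`) are made up for illustration.

```python
import numpy as np


def efficient_head_output(raw_phi, logit, baseline_logit):
    """Shift raw head outputs equally so they sum to logit - baseline_logit."""
    raw_phi = np.asarray(raw_phi, dtype=float)
    gap = (logit - baseline_logit) - raw_phi.sum()
    return raw_phi + gap / raw_phi.size


rng = np.random.default_rng(0)
patch_features = rng.normal(size=(16, 8))  # 16 patch tokens, 8-dim features (made up)
w_cls = rng.normal(size=8)                 # hypothetical linear classifier weights
w_head = rng.normal(size=8)                # hypothetical linear explanation head

logit = patch_features.mean(axis=0) @ w_cls  # prediction from mean-pooled features
baseline_logit = 0.0                         # logit of all-zero features under this toy classifier
raw_phi = patch_features @ w_head            # one raw attribution per patch
phi = efficient_head_output(raw_phi, logit, baseline_logit)
print(phi.sum(), logit - baseline_logit)
```

In a trained system the head would be fit jointly with the classifier; the normalization only guarantees the efficiency property, not that the individual values are accurate.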

3. Approximations, Accelerations, and Scalability

Direct computation of Shapley values is combinatorially hard. SENs employ a range of approximation strategies:

Analytical approximations for specific activations: For ReLU neurons, a reduced-form Gaussian approximation gives a closed-form solution for the Shapley value, enabling fast layer-wise propagation (Li et al., 2019).
Sampling and surrogate regression: KernelSHAP-style weighted least squares, permutation sampling, and regression-based surrogates (e.g., FastSHAP, hybrid least-squares explainer heads) approximate the exponential sum efficiently (Akkas et al., 2024, Cheng et al., 2023).
Chain rule and cooperator selection: Algorithms like SHEAR use Taylor expansions and Hessian-based "contributive cooperator selection" to identify small feature subsets whose interactions dominate the Shapley value, maintaining accuracy while achieving sub-exponential cost (Wang et al., 2022).
Parallel and distributed inference: Batched subgraph masking, distributed least-squares solvers (e.g., CGLS), and block-diagonal batching enable SENs to realize GNN explanations at the scale of modern networks and graphs (Akkas et al., 27 Jun 2025, Akkas et al., 2024).
Latent manifold modeling: For high-dimensional data, SENs may project to a learned low-dimensional manifold (via GANs or autoencoders), compute Shapley values in the manifold space, and back-project to the observation space, reconciling the independence assumptions of classical SHAP with real data manifold structure (Hu et al., 2024).
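The weighted least-squares idea behind KernelSHAP-style estimators can be sketched on a small problem. The example below enumerates every proper, non-empty coalition rather than sampling them, so it recovers the exact values; practical systems subsample coalitions and solve the same regression approximately. The function name `kernel_shap_exact` and the toy model are assumptions made for illustration, and the code is not the implementation of any cited system.

```python
from itertools import combinations
from math import comb

import numpy as np


def kernel_shap_exact(f, x, baseline):
    """Shapley-kernel weighted least squares over all proper, non-empty coalitions."""
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    n = x.size  # requires n >= 2
    v_empty, v_full = f(baseline), f(x)
    delta = v_full - v_empty

    rows, targets, weights = [], [], []
    for size in range(1, n):
        # Shapley kernel weight shared by all coalitions of this size.
        w = (n - 1) / (comb(n, size) * size * (n - size))
        for S in combinations(range(n), size):
            z = np.zeros(n)
            z[list(S)] = 1.0
            masked = np.where(z == 1.0, x, baseline)
            # Eliminate the last attribution with the efficiency constraint
            # phi_last = delta - sum(other phis).
            rows.append(z[:-1] - z[-1])
            targets.append(f(masked) - v_empty - z[-1] * delta)
            weights.append(w)

    X, y, W = np.array(rows), np.array(targets), np.array(weights)
    beta = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * y))
    return np.append(beta, delta - beta.sum())


def toy_model(z):
    # Hypothetical model: one pairwise interaction plus one additive term.
    return z[0] * z[1] + 2.0 * z[2]


phi = kernel_shap_exact(toy_model, [1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
print(phi)
```

Substituting the efficiency constraint into the regression, instead of giving the empty and full coalitions very large weights, keeps the linear system well conditioned.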

4. Empirical Performance and Theoretical Guarantees

SENs are validated against several criteria:

Fidelity to Shapley axioms: SENs (e.g., DeepCSHAP, ShapNet, vision SENs) satisfy missingness, efficiency, and local accuracy to high precision on synthetic and real tasks (Eilers et al., 2024, Wang et al., 2021, Fan et al., 16 Dec 2025). Manifold-based and conditional expectation SENs improve fidelity by avoiding off-manifold artifacts (Hu et al., 2024, Richman et al., 2023).
Interpretability and run-time efficiency: Intrinsic SENs provide explanations in a single forward pass, in contrast to post hoc methods requiring $O(2^n)$ or $O(n^2)$ model evaluations. For example, vision SENs match or exceed state-of-the-art post hoc explainers in AOPC, localization, and insertion/deletion metrics, with virtually no inference overhead (Fan et al., 16 Dec 2025). SHAPNN and similar frameworks show 7–8× speedup in tabular settings versus KernelSHAP, with no loss in predictive accuracy (Cheng et al., 2023).
Faithfulness and functional utility: In graph domains, GNNShap and DistShap outperform prior methods by 2–5× in fidelity metrics while remaining scalable to millions of players, supporting use in scientific domains where both throughput and trustworthiness are critical (Akkas et al., 2024, Akkas et al., 27 Jun 2025). Uncertainty-weighted sampling corrects for known baseline-induced pathologies, resulting in more faithful explanations in NLP (Lu, 17 Feb 2025).
Regularization and optimization: Adding explanation-based regularization (e.g., via Shapley $L_1$/$L_\infty$ penalties) induces desired interpretability properties such as sparsity or uniformity in attribution maps, often with negligible or even positive effect on prediction error (Wang et al., 2021, Cheng et al., 2023).
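Deletion-style metrics such as those mentioned above can be illustrated with a short sketch: features are replaced by the baseline in decreasing order of attribution, and a faster drop in the model output indicates a better ranking. The exact protocols in the cited evaluations may differ; the function names, the linear model, and the normalization of the area are assumptions chosen for illustration.

```python
import numpy as np


def deletion_curve(f, x, baseline, attributions):
    """Model output as features are replaced by the baseline, most important first."""
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    order = np.argsort(-np.asarray(attributions, dtype=float))
    current = x.copy()
    outputs = [f(current)]
    for i in order:
        current[i] = baseline[i]
        outputs.append(f(current))
    return np.array(outputs)


def curve_area(outputs):
    # Trapezoidal area on a unit-spaced grid, divided by the number of steps.
    outputs = np.asarray(outputs, dtype=float)
    return float((outputs[:-1] + outputs[1:]).sum() / (2 * (len(outputs) - 1)))


w = np.array([3.0, 1.0, 2.0])  # hypothetical linear model weights


def linear_model(z):
    return float(w @ z)


x = np.ones(3)
baseline = np.zeros(3)
good = curve_area(deletion_curve(linear_model, x, baseline, w * x))     # correct ranking
bad = curve_area(deletion_curve(linear_model, x, baseline, -(w * x)))   # reversed ranking
print(good, bad)
```

For this linear model the correct ranking removes the largest contribution first, so its deletion curve lies below the curve of the reversed ranking.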

5. Domain Extensions and Methodological Variants

SENs have been realized in a wide array of neural architectures and application areas:

Vision: Patch-based attribution SENs for ViTs and CNNs, ShapleyCAM/CRG frameworks for class activation mapping, and SHAPNN for tabular visual classification (Fan et al., 16 Dec 2025, Cai, 9 Jan 2025, Zheng et al., 2022, Cheng et al., 2023).
Graph Neural Networks: Edge and node attribution via KernelSHAP least-squares surrogates, MANOVA/ANOVA-style importance decompositions, and scalable distributed systems for extremely large graphs (Akkas et al., 27 Jun 2025, Akkas et al., 2024, Duval et al., 2021).
Time Series: ShapTST amortizes Shapley computation for time-feature cells, allowing single-pass explanations with robust training improvements and explicit efficiency axiom correction (Cheng et al., 25 Jan 2025).
Complex-valued models: DeepCSHAP implements analytic, Wirtinger-calculus-based Shapley explanations in $\mathbb{C}$-valued networks, generalizing DeepSHAP and matching SHAP axioms (Eilers et al., 2024).
Tabular and probabilistic models: SHAPNN and Variational Shapley Networks model amortized attributions with uncertainty quantification, facilitating both explanation and robust statistical inference (Cheng et al., 2023, Ketenci et al., 2024).
Layer-wise and neuron attribution: Neuron Shapley computes per-neuron/filter attributions for any task metric, showing unique efficacy in model debiasing and adversarial repair (Ghorbani et al., 2020).
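The time-series formulation, in which each (time, feature) cell is a player, can be sketched with a plain permutation-sampling estimator. Amortized approaches such as ShapTST aim to produce such values in a single pass instead of sampling; the code below only illustrates the underlying game and is not that method. The function `permutation_shapley_cells` and the toy model are hypothetical.

```python
import numpy as np


def permutation_shapley_cells(f, x, baseline, n_permutations=200, seed=0):
    """Monte Carlo Shapley estimate where every (time, feature) cell is a player."""
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    flat_x = x.ravel()
    n_players = flat_x.size
    rng = np.random.default_rng(seed)
    phi = np.zeros(n_players)
    v_empty = f(baseline)

    for _ in range(n_permutations):
        current = baseline.flatten()  # fresh copy of the all-baseline series
        prev = v_empty
        for p in rng.permutation(n_players):
            # Reveal one cell and credit it with the resulting change in output.
            current[p] = flat_x[p]
            value = f(current.reshape(x.shape))
            phi[p] += value - prev
            prev = value
    return (phi / n_permutations).reshape(x.shape)


def toy_series_model(series):
    # Hypothetical model: sum of feature 0 over time plus an interaction
    # between the last two time steps of feature 1.
    return series[:, 0].sum() + series[-1, 1] * series[-2, 1]


x = np.ones((4, 2))        # 4 time steps, 2 features
baseline = np.zeros((4, 2))
phi = permutation_shapley_cells(toy_series_model, x, baseline)
print(phi)
```

Because the marginal contributions along each permutation telescope, the estimates satisfy the efficiency axiom for any number of permutations; only the split between interacting cells is subject to sampling noise.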

6. Limitations, Challenges, and Open Directions

Despite substantial progress, SENs face continuing challenges:

Sample complexity and global fidelity: For extremely large feature sets, even optimized approximations incur trade-offs. Distributed inference mitigates but does not eliminate these fundamental limits (Akkas et al., 27 Jun 2025).
Baseline and marginal model specification: Off-manifold baselines or improper marginal estimation can lead to biased attributions; manifold-based or uncertainty-weighted approaches offer partial solutions but may require further domain-specific development (Hu et al., 2024, Lu, 17 Feb 2025).
Architectural and computational constraints: Some architectures (e.g., deep ShapNet) require sparsity or limited interaction order for exactness or efficient approximation; generalizing to arbitrary deep nets and attention mechanisms with theoretical guarantees is ongoing work (Wang et al., 2021).
Interpretation and human alignment: Even "faithful" Shapley attributions may misalign with human reasoning, especially in the presence of feature redundancy or conceptual ambiguity; integrating SENs with human-centric concept discovery remains an open direction (Lu, 17 Feb 2025).
Integration into broader XAI pipelines: SENs offer a robust mechanism for neural self-explanation, but practical deployment in critical domains (e.g., healthcare, scientific discovery) requires further validation and regulatory alignment.

7. Synthesis and Impact

Shapley Explanation Networks instantiate a principled unification of prediction and attribution, rooted in cooperative game theory and realized through analytical, algorithmic, and architectural innovation. By making Shapley values intrinsic to model computation—rather than an expensive afterthought—SENs provide efficient, fair, and faithful explanations across diverse neural network domains. SENs have catalyzed advances in architectural transparency, robust model development, and the scientific credibility of modern AI, with ongoing extensions in uncertainty quantification, distributed explanations, and manifold-aware attributions promising further gains in trustworthiness and applicability (Li et al., 2019, Wang et al., 2021, Fan et al., 16 Dec 2025, Eilers et al., 2024, Cheng et al., 25 Jan 2025, Akkas et al., 27 Jun 2025, Hu et al., 2024).