Neuron Shapley Approach

Updated 16 May 2026

The Neuron Shapley approach is a game-theoretic framework that scores each neuron by its average marginal performance gain over all coalitional contexts.
It employs efficient Monte Carlo sampling and adaptive algorithms to approximate contributions despite the combinatorial complexity of evaluating all neuron subsets.
Empirical studies demonstrate that ranking neurons by Shapley values aids in pruning, fairness correction, and robust defense against adversarial attacks.

The Neuron Shapley approach is a game-theoretic framework that quantifies the individual contribution of neurons or filters in a neural network with respect to a user-defined performance metric. By leveraging the Shapley value from cooperative game theory, this methodology rigorously captures the effect of removing or perturbing subsets of neurons, enabling principled neuron ranking for interpretability, pruning, fairness correction, and robustness improvement. Neuron Shapley formalizes neuron importance as the average marginal performance gain conferred by a neuron across all possible insertion orders and coalitional contexts, providing a unique "fair" attribution that accounts for all higher-order interactions within the network (Ghorbani et al., 2020).

1. Formal Framework and Mathematical Definition

Let $N = \{1, 2, \ldots, n\}$ denote the set of neurons or filters in a trained neural network. The Neuron Shapley value $\phi_i(V, N)$ for neuron $i$ is defined with respect to a performance function $V(S)$ that measures, for a subset $S \subseteq N$, any network-level metric of interest (accuracy, loss, bias, vulnerability, etc.) after retaining only the neurons in $S$ and zeroing out the others:

$\phi_i(V, N) = \frac{1}{|N|} \sum_{S \subseteq N \setminus \{i\}} \frac{V(S \cup \{i\}) - V(S)}{\binom{|N| - 1}{|S|}}$

Equivalently, for a uniformly drawn permutation $\pi$ of $N$, letting $S^i_\pi$ be the set of neurons preceding $i$ in $\pi$,

$\phi_i(V, N) = \mathbb{E}_{\pi}\left[V(S^i_\pi \cup \{i\}) - V(S^i_\pi)\right]$
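
The subset form and the permutation form agree by a short counting argument, sketched here with $n = |N|$. For a fixed $S \subseteq N \setminus \{i\}$, the neurons of $S$ can be ordered in $|S|!$ ways before $i$ and the remaining $n - 1 - |S|$ neurons in $(n - 1 - |S|)!$ ways after it, so

$\Pr_\pi\left[S^i_\pi = S\right] = \frac{|S|!\,(n - 1 - |S|)!}{n!} = \frac{1}{n \binom{n - 1}{|S|}}$

Taking the expectation of the marginal contribution over $\pi$ then recovers the subset formula:

$\mathbb{E}_\pi\left[V(S^i_\pi \cup \{i\}) - V(S^i_\pi)\right] = \sum_{S \subseteq N \setminus \{i\}} \frac{V(S \cup \{i\}) - V(S)}{n \binom{n - 1}{|S|}}$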

This captures the expected marginal contribution of each neuron over all contexts, with the properties of symmetry, efficiency (in the standard setting), and null-player guaranteed by Shapley’s axioms (Ghorbani et al., 2020, Stier et al., 2019, Adamczewski et al., 2019).
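
For very small $n$ both definitions can be evaluated exactly, which is a useful sanity check before moving to sampling. The following is a minimal sketch, not code from any of the cited papers: `toy_V` is a hypothetical stand-in for a real masked-network metric, and the function names are illustrative.

```python
from itertools import combinations, permutations
from math import comb, factorial

def shapley_subsets(n, V):
    """Exact Shapley values via the subset formula (exponential in n)."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for S in combinations(others, size):
                S = frozenset(S)
                phi[i] += (V(S | {i}) - V(S)) / comb(n - 1, size)
        phi[i] /= n
    return phi

def shapley_permutations(n, V):
    """Exact Shapley values by averaging marginals over all n! orderings."""
    phi = [0.0] * n
    for order in permutations(range(n)):
        S = frozenset()
        for i in order:
            phi[i] += V(S | {i}) - V(S)
            S = S | {i}
    return [p / factorial(n) for p in phi]

# Hypothetical toy metric: neurons 0 and 1 only help together,
# neuron 2 helps on its own, neuron 3 never changes the metric.
def toy_V(S):
    return 0.5 * (0 in S and 1 in S) + 0.3 * (2 in S)

phi_sub = shapley_subsets(4, toy_V)
phi_perm = shapley_permutations(4, toy_V)
```

For this toy metric both functions return approximately (0.25, 0.25, 0.3, 0): the synergistic pair splits its joint gain, the null player receives zero, and the values sum to $V(N) - V(\emptyset) = 0.8$.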

2. Interactions, Synergy, and Theoretical Underpinnings

Neuron Shapley values uniquely capture arbitrary and higher-order interactions among neurons. Marginal contributions $V(S \cup \{i\}) - V(S)$ are accumulated over all possible coalitions $S$ of the remaining neurons, so the measure reflects not only individual neuron impact but also synergies or redundancies. If certain neurons only exert influence in specific combinations (e.g., neither is impactful alone but together they are essential for a feature), their collective effect emerges naturally across the relevant terms in the sum.
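
As a worked illustration of the synergy case, consider a hypothetical two-neuron game with $V(\emptyset) = V(\{1\}) = V(\{2\}) = 0$ and $V(\{1, 2\}) = 1$. Applying the subset formula with $|N| = 2$ gives

$\phi_1 = \frac{1}{2}\left[\frac{V(\{1\}) - V(\emptyset)}{\binom{1}{0}} + \frac{V(\{1, 2\}) - V(\{2\})}{\binom{1}{1}}\right] = \frac{1}{2}(0 + 1) = \frac{1}{2}$

and $\phi_2 = \frac{1}{2}$ by the same calculation. Adding either neuron to the empty network suggests it is worthless, while removing either one from the full network suggests it is worth the entire gain. The Shapley value averages the two contexts and splits the joint gain evenly.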

This framework is supported by the cooperative game theory axioms: null-player (zero contribution if $i$ never changes $V$), symmetry (neurons that are interchangeable in every coalition have identical Shapley values), and additivity (the Shapley decomposition respects summing metrics). As a result, the Neuron Shapley value is the unique fair division of performance gain attributable to the neurons (Ghorbani et al., 2020, Stier et al., 2019, Adamczewski et al., 2019).

3. Efficient Approximation Algorithms

Exact computation is combinatorial (exponential in $n$), which makes naive evaluation of all subset values impractical for modern networks. Scalable approximations include:

Monte Carlo over permutations: Randomly sample orderings of the neurons, compute the marginal contribution of each neuron $i$ at its position in every sampled ordering, and average. The number of sampled orderings controls the accuracy of the estimate.
Early truncation: For subnetworks where $V(S)$ falls below an application-specific threshold ("dead subnetwork"), computation is aborted to avoid unnecessary evaluations.
Adaptive multi-armed bandit (MAB): Focuses computation on neurons whose Shapley value ranks matter most (e.g., top-$k$), adaptively ceasing sampling for neurons whose estimated bounds are unlikely to affect the ordering.
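
The first two ideas can be sketched together as follows. This is a minimal illustration under stated assumptions, not the reference implementation: `V` is any callable from a frozenset of kept neuron indices to a scalar, each ordering is walked backwards from the full network, and once the metric is at or below `threshold` the remaining marginals are simply counted as zero. All names and defaults are hypothetical.

```python
import numpy as np

def truncated_mc_shapley(n, V, num_perms=500, threshold=None, seed=0):
    """Estimate Shapley values from sampled orderings, optionally truncated.

    Returns (estimates, number of V evaluations).
    """
    rng = np.random.default_rng(seed)
    totals = np.zeros(n)
    evals = 0
    for _ in range(num_perms):
        order = rng.permutation(n)
        kept = set(range(n))
        v_curr = V(frozenset(kept))
        evals += 1
        # Walk the ordering backwards: removing neuron i from the set made of
        # i and everything before it gives V(S | {i}) - V(S) for that ordering.
        for i in order[::-1]:
            if threshold is not None and v_curr <= threshold:
                break  # "dead" subnetwork: remaining marginals treated as zero
            kept.discard(int(i))
            v_without = V(frozenset(kept))
            evals += 1
            totals[i] += v_curr - v_without
            v_curr = v_without
    return totals / num_perms, evals

# Hypothetical toy metric standing in for a masked-network evaluation.
def toy_V(S):
    return 0.5 * (0 in S and 1 in S) + 0.3 * (2 in S)

plain, plain_evals = truncated_mc_shapley(4, toy_V, num_perms=2000)
trunc, trunc_evals = truncated_mc_shapley(4, toy_V, num_perms=2000, threshold=0.0)
```

In this toy game every subnetwork of a zero-valued subnetwork is also zero-valued, so truncation saves evaluations without changing the estimate. For a real metric the skipped marginals are only approximately zero, and the threshold trades a small bias for speed.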

The TMAB-Shapley algorithm yields an order-of-magnitude speedup, supporting high-throughput evaluation (e.g., all 17,216 filters in Inception-v3) while agreeing closely in rank correlation with full Monte Carlo (Ghorbani et al., 2020). Basic permutation-sampling or subset-sampling variants suffice for moderate $n$ (Stier et al., 2019, Adamczewski et al., 2019).
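
The adaptive idea can be illustrated with a simplified confidence-bound loop. This sketch is not TMAB-Shapley itself. It assumes marginal contributions lie in an interval of width `value_range`, uses a per-neuron Hoeffding bound without a union-bound correction, and keeps sampling only neurons whose interval still straddles the midpoint between the $k$-th and $(k+1)$-th largest estimates. All names and defaults are hypothetical.

```python
import numpy as np

def adaptive_topk_shapley(n, V, k, value_range=1.0, delta=0.05,
                          max_rounds=2000, min_samples=10, seed=0):
    """Identify the top-k neurons, spending samples only on undecided ones.

    Requires 1 <= k < n. Returns (top-k index set, estimates, sample counts).
    """
    rng = np.random.default_rng(seed)
    counts = np.zeros(n, dtype=int)
    means = np.zeros(n)
    active = np.ones(n, dtype=bool)
    for _ in range(max_rounds):
        if not active.any():
            break
        for i in np.flatnonzero(active):
            # One Monte Carlo sample: neuron i's marginal in a random ordering.
            order = rng.permutation(n)
            pos = int(np.flatnonzero(order == i)[0])
            S = frozenset(int(j) for j in order[:pos])
            marginal = V(S | {int(i)}) - V(S)
            counts[i] += 1
            means[i] += (marginal - means[i]) / counts[i]  # running mean
        # Hoeffding half-width for each neuron's current estimate.
        half = value_range * np.sqrt(
            np.log(2.0 / delta) / (2.0 * np.maximum(counts, 1)))
        ranked = np.sort(means)[::-1]
        boundary = 0.5 * (ranked[k - 1] + ranked[k])
        undecided = (means - half < boundary) & (means + half > boundary)
        active = undecided | (counts < min_samples)
    top_k = {int(i) for i in np.argsort(-means)[:k]}
    return top_k, means, counts

# Hypothetical toy metric: additive weights plus a synergy between 0 and 1.
w = [0.3, 0.2, 0.05, 0.05, 0.0, 0.0]

def toy_V(S):
    return sum(w[j] for j in S) + 0.2 * (0 in S and 1 in S)

top2, est, counts = adaptive_topk_shapley(6, toy_V, k=2)
```

Neurons whose estimates sit far from the top-$k$ boundary stop being sampled early, so the per-neuron sample counts are uneven and most of the budget goes to the borderline cases.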

Table: Key Algorithmic Variants

Algorithmic Idea	Advantage	Reference
Permutation Monte Carlo	Unbiased, scalable	(Ghorbani et al., 2020)
Early Truncation	Skips uninformative trajectories	(Ghorbani et al., 2020)
Multi-armed Bandit	Efficient focus on top-$k$	(Ghorbani et al., 2020)
Subset Sampling	Simpler for small $n$	(Stier et al., 2019)

4. Empirical Results and Applications

Destructive and Interpretive Power

On Inception-v3 (17,216 filters) for ImageNet classification:

Removing the 10, 20, and 30 filters with the highest Shapley values reduced top-1 accuracy from 74% to 38%, 8%, and random-label performance, respectively, whereas removing 20 random filters had negligible effect (Ghorbani et al., 2020).
Visualization (DeepDream, maximally activating images) shows early "critical" filters detect simple features (edges, textures), while later ones detect higher-level concepts ("crowdedness", "colorfulness") (Ghorbani et al., 2020).

Model Repair

Fairness Correction: For a SqueezeNet face gender classifier (trained on CelebA, tested on PPB), filters with negative Shapley values with respect to balanced PPB accuracy were identified as sources of bias. Removing the most negative 50--100 filters improved accuracy on Black Female from 54.7% to 81.9% and overall PPB accuracy from 84.9% to 91.7%, with minimal loss to CelebA accuracy (Ghorbani et al., 2020).
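
Once Shapley values have been estimated against the metric of interest, selecting the units to remove is a simple ranking step. The sketch below assumes a precomputed array `phi` of per-filter values. The function names are hypothetical, and how the resulting mask is applied depends on the framework in use.

```python
import numpy as np

def most_negative_units(phi, max_remove):
    """Indices of up to max_remove units with the most negative values."""
    phi = np.asarray(phi, dtype=float)
    order = np.argsort(phi)           # ascending: most negative first
    negative = order[phi[order] < 0]  # only strictly negative contributors
    return negative[:max_remove]

def keep_mask(n, removed):
    """Boolean mask that is False for removed units and True otherwise."""
    mask = np.ones(n, dtype=bool)
    mask[removed] = False
    return mask

# Illustrative values only, not taken from any experiment.
phi = [0.2, -0.05, 0.0, -0.3, 0.1]
removed = most_negative_units(phi, max_remove=3)
mask = keep_mask(len(phi), removed)
```

Capping the removal at `max_remove` while also requiring a strictly negative value means units that help the metric are never removed, even when fewer harmful units exist than the cap allows.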

Adversarial Robustness: For Inception-v3 under a PGD attack, removing the 16 filters with the highest adversarial Shapley scores dropped white-box attack success from nearly 100% to 0.1% and reduced black-box transfer attack success by 37%, while clean accuracy dropped only from 74% to 67% (Ghorbani et al., 2020).

Fault Localization: Hierarchical Deep SHAP-based protocols in SHARPEN identify both layers and neurons most responsible for performance defects by measuring the divergence in SHAP attributions between erroneous and benign conditions. This approach efficiently localizes faulty filters, enabling focused, derivative-free repair strategies (Sun et al., 1 Apr 2026).

Compression and Pruning

Neuron Shapley rankings have been used to prune networks aggressively with minimal loss. For LeNet-5 on MNIST, pruning the filters or neurons with the lowest Shapley values shrinks the network to 20--30% of its original size with negligible accuracy loss; similar gains are observed on VGG-16 for CIFAR-10 (Adamczewski et al., 2019, Stier et al., 2019).
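
A minimal sketch of rank-based pruning for a single dense hidden layer follows. It assumes precomputed per-unit values `phi` and a layer stored as plain NumPy arrays. The function names and the `keep_fraction` parameter are hypothetical, and the cited works prune convolutional filters in full training pipelines rather than a standalone layer like this.

```python
import numpy as np

def prune_mask(phi, keep_fraction):
    """Keep the highest-valued units; returns a boolean mask over units."""
    phi = np.asarray(phi, dtype=float)
    n_keep = max(1, int(round(keep_fraction * phi.size)))
    keep = np.argsort(-phi, kind="stable")[:n_keep]
    mask = np.zeros(phi.size, dtype=bool)
    mask[keep] = True
    return mask

def prune_dense_layer(W_in, b, W_out, mask):
    """Physically drop hidden units.

    W_in has shape (d_in, h), b has shape (h,), W_out has shape (h, d_out).
    """
    return W_in[:, mask], b[mask], W_out[mask, :]

# Illustrative values only.
phi = [0.5, 0.01, 0.3, 0.0, 0.2, 0.02, 0.4, 0.03, 0.1, 0.04]
mask = prune_mask(phi, keep_fraction=0.3)
```

Because a pruned hidden unit contributes nothing downstream, slicing the weight matrices gives the same outputs as zeroing the unit's activation, but with fewer parameters to store and multiply.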

Table: Representative Results on Network Compression via Shapley Pruning

Model	Baseline # Params	Shapley-Pruned # Params	Accuracy Loss	Reference
LeNet-5	-	-	-	(Adamczewski et al., 2019)
VGG-16	-	Fewer params	-	(Adamczewski et al., 2019)

5. Shapley-based Interpretability and Training Dynamics

The Shapley value’s context-sensitivity yields robust interpretability: only a sparse subset of neurons typically dominate model performance, providing a minimal shortlist for visualization or intervention. For ReLU-activated networks, analytical Shapley approximations permit fast, closed-form estimation and linear relevance propagation (LRP), maintaining a global conservation property and enabling high-fidelity attribution maps. Empirically, pixel-wise LRP with Shapley-based weights yields sharper, more concentrated saliency maps than conventional gradient-based methods (Li et al., 2019).

Moreover, Shapley-inspired gradients (non-vanishing, continuous, context-adaptive) can serve as drop-in replacements for ReLU gradients during training, improving convergence and stability and mitigating "dead neuron" issues. This is operationalized via the "Shapley Activation" (SA) function, which matches Shapley gradient behavior and demonstrates improved performance and stability across optimizers and datasets (Li et al., 2019).

6. Extensions: Fault Localization and Derivative-Free Repair

Recent methodologies (e.g., SHARPEN) extend the Neuron Shapley framework to support automated neural repair. Hierarchical, Deep SHAP-based localization identifies suspicious layers and neurons via activation-divergence between faulty and clean models, enabling focused derivative-free optimization (e.g., CMA-ES) for property repair. This approach avoids the limitations of gradient-based repair, supporting arbitrary architectures and activation functions, and has demonstrated superiority in backdoor and unfairness mitigation across several architectures and defect types (Sun et al., 1 Apr 2026).
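
The repair step can be pictured with a deliberately simplified stand-in. The sketch below does not implement CMA-ES or SHARPEN. It substitutes a basic (1+1) evolution strategy with a multiplicative step-size rule, and it assumes a localization step has already produced the `suspicious` unit indices. The toy network, the injected defect, and all names are hypothetical.

```python
import numpy as np

def es_repair(loss_fn, theta0, sigma=0.1, iters=500, seed=0):
    """(1+1) evolution strategy: perturb, keep the candidate if it improves."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    best = loss_fn(theta)
    for _ in range(iters):
        cand = theta + sigma * rng.standard_normal(theta.shape)
        cand_loss = loss_fn(cand)
        if cand_loss < best:
            theta, best = cand, cand_loss
            sigma *= 1.1   # widen the search after a success
        else:
            sigma *= 0.98  # narrow the search after a failure
    return theta, best

# Toy one-hidden-layer ReLU regressor with a defect in two hidden units.
data_rng = np.random.default_rng(1)
X = data_rng.standard_normal((64, 5))
W_in = data_rng.standard_normal((5, 4))
W_out_true = data_rng.standard_normal((4, 1))
y = np.maximum(X @ W_in, 0) @ W_out_true

suspicious = [1, 3]                 # hypothetical localization result
W_out_faulty = W_out_true.copy()
W_out_faulty[suspicious] += 2.0     # injected defect

def repair_loss(theta):
    """Mean squared error when only the suspicious rows are replaced."""
    W = W_out_faulty.copy()
    W[suspicious] = theta.reshape(len(suspicious), -1)
    pred = np.maximum(X @ W_in, 0) @ W
    return float(np.mean((pred - y) ** 2))

theta0 = W_out_faulty[suspicious].ravel()
initial_loss = repair_loss(theta0)
theta, final_loss = es_repair(repair_loss, theta0)
```

Only the weights attached to the suspicious units are searched, which keeps the search space small, and the loop never needs a gradient of `repair_loss`, so the same pattern applies to non-differentiable repair objectives.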

7. Limitations and Theoretical Considerations

The primary computational challenge remains scaling exact Shapley value calculation, though efficient stochastic algorithms ameliorate this for contemporary networks. Practical implementations assume that performance metrics can be quickly evaluated on subnetworks with some neurons masked, and that Shapley’s assumptions (e.g., independence of contributions, efficiency) are meaningful for the selected metric. Gaussian approximations for Shapley relevance in high-dimensional, highly correlated settings may introduce error (Li et al., 2019), and neglecting cross-terms in layer-wise propagation may limit interpretive resolution.
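
The assumption that masked subnetworks are cheap to evaluate can be made concrete for a one-hidden-layer classifier. In the sketch below the hidden activations are computed once and each $V(S)$ is obtained by masking them. The function name, the argument layout, and the choice of accuracy as the metric are assumptions for illustration. Deeper networks need a fresh forward pass from the masked layer onward.

```python
import numpy as np

def make_value_fn(W1, b1, W2, b2, X, y):
    """Return V(S): accuracy when only the hidden units in S are kept."""
    H = np.maximum(X @ W1 + b1, 0.0)  # hidden activations, reused for every S
    cache = {}

    def V(S):
        key = frozenset(int(j) for j in S)
        if key not in cache:
            mask = np.zeros(W1.shape[1])
            mask[np.array(sorted(key), dtype=int)] = 1.0
            logits = (H * mask) @ W2 + b2  # masked units contribute zero
            cache[key] = float(np.mean(np.argmax(logits, axis=1) == y))
        return cache[key]

    return V
```

The cache matters in practice because sampling-based estimators revisit the full and empty networks in almost every ordering.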

Nonetheless, the Neuron Shapley approach offers a theoretically principled foundation for quantifying the contribution of internal network units, identifying synergistic and antagonistic neuronal roles, and supporting a suite of model analysis, compression, interpretability, and repair tasks across both feedforward and convolutional architectures (Ghorbani et al., 2020, Sun et al., 1 Apr 2026, Stier et al., 2019, Adamczewski et al., 2019, Li et al., 2019).


References (5)

Neuron Shapley: Discovering the Responsible Neurons (2020)

Analysing Neural Network Topologies: a Game Theoretic Approach (2019)

Neuron ranking -- an informed way to condense convolutional neural networks architecture (2019)

Shapley-Guided Neural Repair Approach via Derivative-Free Optimization (2026)

Shapley Interpretation and Activation in Neural Networks (2019)
