
Neuron Shapley Approach

Updated 16 May 2026
  • Neuron Shapley approach is a game-theoretic framework that assigns each neuron a marginal contribution based on average performance gain over all coalitional contexts.
  • It employs efficient Monte Carlo sampling and adaptive algorithms to approximate contributions despite the combinatorial complexity of evaluating all neuron subsets.
  • Empirical studies demonstrate that ranking neurons by Shapley values aids in pruning, fairness correction, and robust defense against adversarial attacks.

The Neuron Shapley approach is a game-theoretic framework that quantifies the individual contribution of neurons or filters in a neural network with respect to a user-defined performance metric. By leveraging the Shapley value from cooperative game theory, this methodology rigorously captures the effect of removing or perturbing subsets of neurons, enabling principled neuron ranking for interpretability, pruning, fairness correction, and robustness improvement. Neuron Shapley formalizes neuron importance as the average marginal performance gain conferred by a neuron across all possible insertion orders and coalitional contexts, providing a unique "fair" attribution that accounts for all higher-order interactions within the network (Ghorbani et al., 2020).

1. Formal Framework and Mathematical Definition

Let $N = \{1, 2, \ldots, n\}$ denote the set of neurons or filters in a trained neural network. The Neuron Shapley value $\phi_i(V, N)$ for neuron $i$ is defined with respect to a performance function $V(S)$ that measures, for any subset $S \subseteq N$, a network-level metric of interest (accuracy, loss, bias, vulnerability, etc.) after retaining only the neurons in $S$ and zeroing out the others:

$$\phi_i(V, N) = \frac{1}{|N|} \sum_{S \subseteq N \setminus \{i\}} \frac{V(S \cup \{i\}) - V(S)}{\binom{|N| - 1}{|S|}}$$

Equivalently, for a uniformly drawn permutation $\pi$ of $N$, letting $S^i_\pi$ be the set of neurons preceding $i$ in $\pi$,

$$\phi_i(V, N) = \mathbb{E}_{\pi}\left[ V(S^i_\pi \cup \{i\}) - V(S^i_\pi) \right]$$

This captures the expected marginal contribution of each neuron over all contexts, with the properties of symmetry, efficiency (in the standard setting), and null-player guaranteed by Shapley’s axioms (Ghorbani et al., 2020, Stier et al., 2019, Adamczewski et al., 2019).
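As a minimal sketch of the two equivalent definitions above, the following Python code computes exact Shapley values for a tiny hypothetical game, once with the subset-sum formula and once by averaging marginal gains over all orderings. The function names and the three-neuron `toy_value` function are illustrative assumptions, not taken from the cited papers; exhaustive enumeration is only feasible for very small neuron sets.

```python
from itertools import combinations, permutations
from math import comb


def shapley_by_subsets(value, players):
    """Exact Shapley values from the weighted sum over subsets."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(n):
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal gain of adding i, weighted by 1 / C(n - 1, |S|).
                total += (value(s | {i}) - value(s)) / comb(n - 1, size)
        phi[i] = total / n
    return phi


def shapley_by_permutations(value, players):
    """Exact Shapley values as the mean marginal gain over all orderings."""
    totals = {i: 0.0 for i in players}
    num_orders = 0
    for order in permutations(players):
        seen = frozenset()
        for i in order:
            totals[i] += value(seen | {i}) - value(seen)
            seen = seen | {i}
        num_orders += 1
    return {i: t / num_orders for i, t in totals.items()}


def toy_value(s):
    """Hypothetical performance metric for a 3-neuron 'network' (an assumption).

    Neuron 0 helps on its own; neurons 1 and 2 only help as a pair.
    """
    score = 0.0
    if 0 in s:
        score += 0.2
    if 1 in s and 2 in s:
        score += 0.6
    return score


players = [0, 1, 2]
phi_subsets = shapley_by_subsets(toy_value, players)
phi_perms = shapley_by_permutations(toy_value, players)
print(phi_subsets)
print(phi_perms)
```

In this toy game both routines agree, neuron 0 receives 0.2, the pair splits its joint gain as 0.3 each, and the values sum to the full-network score minus the empty-network score (the efficiency property).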

2. Interactions, Synergy, and Theoretical Underpinnings

Neuron Shapley values uniquely capture arbitrary and higher-order interactions among neurons. Marginal contributions $V(S \cup \{i\}) - V(S)$ are accumulated over all possible coalitions $S \subseteq N \setminus \{i\}$ of the remaining neurons, so the measure reflects not only individual neuron impact but also synergies or redundancies. If certain neurons only exert influence in specific combinations (e.g., neither is impactful alone but together are essential for a feature), their collective effect emerges naturally across the relevant terms in the sum.
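A small worked example may make the synergy case concrete. Assume a hypothetical two-neuron network in which neither neuron contributes alone but the pair achieves the full score (the numeric values are illustrative assumptions). Applying the subset-sum definition from Section 1 with $|N| = 2$:

```latex
\begin{align*}
V(\emptyset) &= V(\{1\}) = V(\{2\}) = 0, \qquad V(\{1,2\}) = 1, \\
\phi_1 &= \frac{1}{2}\left[ \frac{V(\{1\}) - V(\emptyset)}{\binom{1}{0}}
          + \frac{V(\{1,2\}) - V(\{2\})}{\binom{1}{1}} \right]
        = \frac{1}{2}\,(0 + 1) = \frac{1}{2}, \\
\phi_2 &= \frac{1}{2} \ \text{(by symmetry)}, \qquad
\phi_1 + \phi_2 = 1 = V(\{1,2\}) - V(\emptyset).
\end{align*}
```

A single-neuron ablation from the empty network would score both neurons as zero, whereas the Shapley value credits each with half of the joint gain.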

This framework is supported by the cooperative game theory axioms: null-player (zero contribution if adding $i$ never changes $V$), symmetry (neurons whose marginal contributions are identical in every coalition have identical Shapley values), and additivity (the Shapley decomposition respects summing metrics). Together with efficiency, these axioms make the Neuron Shapley value the unique fair division of performance gain attributable to the neurons (Ghorbani et al., 2020, Stier et al., 2019, Adamczewski et al., 2019).

3. Efficient Approximation Algorithms

Exact computation is combinatorial (exponential in $n$), so naively evaluating every subset is impractical for modern networks. Scalable approximations include:

  • Monte Carlo over permutations: Randomly sample orderings of the neurons; for each ordering, compute the marginal contribution of neuron $i$ at its position, then average over the sampled orderings. Typically ii0--ii1 gives reasonable accuracy.
  • Early truncation: For subnetworks where $V(S)$ falls below an application-specific threshold ("dead subnetwork"), computation is aborted to avoid unnecessary evaluations.
  • Adaptive multi-armed bandit (MAB): Focuses computation on neurons whose Shapley value ranks matter most (e.g., top-$k$), adaptively ceasing sampling for neurons whose estimated bounds are unlikely to affect the ordering.
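The first two ideas in the list can be sketched together as follows. This is a simplified illustration under stated assumptions, not the cited implementation: it starts from the full network and removes neurons in a random order (which samples the same marginal contributions as inserting them in the reverse order), and it stops an ordering once the metric drops below `dead_threshold`, crediting the remaining neurons with zero for that ordering. The names `truncated_mc_shapley`, `dead_threshold`, and `toy_value` are hypothetical.

```python
import random


def truncated_mc_shapley(value, players, num_samples, dead_threshold, seed=0):
    """Permutation-sampling Shapley estimate with early truncation.

    Returns (estimates, number_of_value_evaluations).
    """
    rng = random.Random(seed)
    totals = {i: 0.0 for i in players}
    evaluations = 0
    for _ in range(num_samples):
        order = list(players)
        rng.shuffle(order)
        remaining = set(players)
        current = value(frozenset(remaining))
        evaluations += 1
        for i in order:
            if current < dead_threshold:
                # Dead subnetwork: skip the rest of this ordering; the
                # remaining neurons implicitly get a zero marginal here.
                break
            remaining.discard(i)
            after = value(frozenset(remaining))
            evaluations += 1
            totals[i] += current - after  # gain lost by removing neuron i
            current = after
    estimates = {i: t / num_samples for i, t in totals.items()}
    return estimates, evaluations


def toy_value(s):
    """Hypothetical metric: neuron 0 helps alone, neurons 1 and 2 as a pair."""
    score = 0.0
    if 0 in s:
        score += 0.2
    if 1 in s and 2 in s:
        score += 0.6
    return score


players = [0, 1, 2]
# No truncation: an unbiased estimate of the exact values (0.2, 0.3, 0.3).
est_full, evals_full = truncated_mc_shapley(
    toy_value, players, num_samples=4000, dead_threshold=float("-inf"))
# With truncation: fewer evaluations, at the cost of some bias.
est_trunc, evals_trunc = truncated_mc_shapley(
    toy_value, players, num_samples=4000, dead_threshold=0.5)
print(est_full, evals_full)
print(est_trunc, evals_trunc)
```

Truncation trades accuracy for speed: in this toy game an aggressive threshold visibly biases the estimates, so the threshold would need to be chosen with the application's tolerance in mind.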

The TMAB-Shapley algorithm yields an order-of-magnitude speedup, supporting high-throughput evaluation (e.g., all 17,216 filters in Inception-v3) with strong empirical agreement to full Monte Carlo (rank correlation ii5) (Ghorbani et al., 2020). Basic permutation-sampling or subset-sampling variants suffice for moderate ii6 (ii7) (Stier et al., 2019, Adamczewski et al., 2019).
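The adaptive idea can be illustrated with a much-simplified sketch. It is not the TMAB-Shapley algorithm from the cited paper: it omits truncation and uses a plain Hoeffding confidence radius, and the names `adaptive_topk_shapley`, `marginal_range` (an assumed upper bound on the spread of marginal contributions), and `toy_value` are hypothetical. Sampling stops for a neuron once its confidence interval no longer overlaps the boundary between the $k$-th and $(k+1)$-th largest estimates.

```python
import math
import random


def adaptive_topk_shapley(value, players, k, marginal_range,
                          delta=0.05, max_rounds=2000, seed=0):
    """Sample marginals only for neurons whose top-k status is unresolved."""
    assert 1 <= k < len(players)
    rng = random.Random(seed)
    mean = {i: 0.0 for i in players}
    count = {i: 0 for i in players}
    active = set(players)
    for _ in range(max_rounds):
        if not active:
            break
        for i in sorted(active):
            # Random coalition: the neurons preceding i in a random ordering.
            order = list(players)
            rng.shuffle(order)
            coalition = frozenset(order[:order.index(i)])
            marginal = value(coalition | {i}) - value(coalition)
            count[i] += 1
            mean[i] += (marginal - mean[i]) / count[i]  # running average
        # Boundary between the k-th and (k+1)-th largest current estimates.
        ranked = sorted(mean.values(), reverse=True)
        boundary = 0.5 * (ranked[k - 1] + ranked[k])
        for i in list(active):
            # Hoeffding radius for a mean of count[i] bounded samples.
            radius = marginal_range * math.sqrt(
                math.log(2.0 / delta) / (2.0 * count[i]))
            if abs(mean[i] - boundary) > radius:
                active.discard(i)  # rank relative to the boundary is settled
    top_k = sorted(players, key=lambda i: mean[i], reverse=True)[:k]
    return top_k, mean, count


WEIGHTS = {0: 0.5, 1: 0.3, 2: 0.05, 3: 0.02}  # hypothetical toy game


def toy_value(s):
    """Additive weights plus a small synergy bonus for neurons 0 and 1."""
    bonus = 0.1 if (0 in s and 1 in s) else 0.0
    return sum(WEIGHTS[j] for j in s) + bonus


top_k, mean, count = adaptive_topk_shapley(
    toy_value, [0, 1, 2, 3], k=2, marginal_range=0.6)
print(top_k, count)
```

In this toy run the clearly dominant neuron stops being sampled early, while neurons closer to the top-2 boundary receive more samples, which is the source of the savings the adaptive approach aims for.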

Table: Key Algorithmic Variants

| Algorithmic Idea | Advantage | Reference |
|---|---|---|
| Permutation Monte Carlo | Unbiased, scalable to ii8 | (Ghorbani et al., 2020) |
| Early Truncation | Skips uninformative trajectories | (Ghorbani et al., 2020) |
| Multi-armed Bandit | Efficient focus on top-k | (Ghorbani et al., 2020) |
| Subset Sampling | Simpler for small ii9 | (Stier et al., 2019) |

4. Empirical Results and Applications

Destructive and Interpretive Power

On Inception-v3 (17,216 filters) for ImageNet classification:

  • Removing the 10, 20, and 30 filters with the highest Shapley values $\phi_i$ reduced top-1 accuracy from 74% to 38%, 8%, and chance-level performance, respectively, whereas removing 20 random filters had negligible effect (Ghorbani et al., 2020).
  • Visualization (DeepDream, maximally activating images) shows early "critical" filters detect simple features (edges, textures), while later ones detect higher-level concepts ("crowdedness", "colorfulness") (Ghorbani et al., 2020).

Model Repair

Fairness Correction: For a SqueezeNet face gender classifier (trained on CelebA, tested on PPB), filters with negative Shapley value $\phi_i$ with respect to balanced PPB accuracy were identified as sources of bias. Removing the 50 to 100 most negative filters improved accuracy on the Black Female subgroup from 54.7% to 81.9% and overall PPB accuracy from 84.9% to 91.7%, with minimal loss in CelebA accuracy (Ghorbani et al., 2020).
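The selection step described here can be sketched in a few lines. This is an illustrative assumption about how such a repair might be wired up, not the cited procedure: `filters_to_remove` and `apply_mask` are hypothetical helpers, and the Shapley values in the example are made-up numbers rather than results from any model.

```python
def filters_to_remove(shapley_values, max_remove):
    """Indices of up to max_remove filters with the most negative values."""
    ranked = sorted(range(len(shapley_values)), key=lambda j: shapley_values[j])
    # Only filters that actively hurt the target metric are candidates.
    return [j for j in ranked[:max_remove] if shapley_values[j] < 0]


def apply_mask(activations, removed):
    """Zero out the activations of the removed filters."""
    removed = set(removed)
    return [0.0 if j in removed else a for j, a in enumerate(activations)]


# Made-up per-filter Shapley values with respect to some fairness metric.
values = [0.2, -0.5, 0.0, -0.1, 0.3]
removed_one = filters_to_remove(values, max_remove=1)
removed_all = filters_to_remove(values, max_remove=5)
masked = apply_mask([1.0, 2.0, 3.0, 4.0, 5.0], removed_all)
print(removed_one, removed_all, masked)
```

Capping the number of removed filters and re-checking the original task metric afterwards would be a natural safeguard, since a filter that hurts one metric may still help another.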

Adversarial Robustness: For Inception-v3 under PGD V(S)V(S)2 attack, removing 16 filters with highest adversarial Shapley scores dropped white-box attack success from nearly 100% to 0.1% and reduced black-box transfer attack success by V(S)V(S)337%, while clean accuracy only dropped from 74% to 67% (Ghorbani et al., 2020).

Fault Localization: Hierarchical Deep SHAP-based protocols in SHARPEN identify both layers and neurons most responsible for performance defects by measuring the divergence in SHAP attributions between erroneous and benign conditions. This approach efficiently localizes faulty filters, enabling focused, derivative-free repair strategies (Sun et al., 1 Apr 2026).

Compression and Pruning

Neuron Shapley rankings have been leveraged to aggressively prune networks with minimal loss. For LeNet-5 on MNIST, filters or neurons with lowest Shapley value can be pruned down to V(S)V(S)420--30\% of original size with negligible accuracy loss; similar gains are observed on VGG-16 for CIFAR-10 (Adamczewski et al., 2019, Stier et al., 2019).
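Pruning by Shapley rank amounts to keeping the highest-valued units up to a target size. A minimal sketch, assuming a hypothetical helper `prune_to_fraction` and made-up Shapley values (a real pipeline would typically also fine-tune and re-evaluate after pruning):

```python
import math


def prune_to_fraction(shapley_values, keep_fraction):
    """Return (kept, pruned) index lists, keeping the highest-valued units."""
    n = len(shapley_values)
    n_keep = max(1, math.ceil(keep_fraction * n))
    ranked = sorted(range(n), key=lambda j: shapley_values[j], reverse=True)
    kept = sorted(ranked[:n_keep])
    pruned = sorted(ranked[n_keep:])
    return kept, pruned


# Made-up Shapley values for eight filters.
values = [0.40, 0.01, 0.25, 0.00, 0.10, 0.02, 0.15, 0.03]
kept_quarter, pruned_quarter = prune_to_fraction(values, keep_fraction=0.25)
kept_half, pruned_half = prune_to_fraction(values, keep_fraction=0.5)
print(kept_quarter, pruned_quarter)
print(kept_half, pruned_half)
```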

Table: Representative Results on Network Compression via Shapley Pruning

| Model | Baseline # Params | Shapley-Pruned | Accuracy Loss | Reference |
|---|---|---|---|---|
| LeNet-5 | V(S)V(S)5K | V(S)V(S)6K | V(S)V(S)7 | (Adamczewski et al., 2019) |
| VGG-16 | - | Fewer params | V(S)V(S)8 | (Adamczewski et al., 2019) |

5. Shapley-based Interpretability and Training Dynamics

The Shapley value’s context-sensitivity yields robust interpretability: typically only a sparse subset of neurons dominates model performance, providing a minimal shortlist for visualization or intervention. For ReLU-activated networks, analytical Shapley approximations permit fast, closed-form estimation and linear relevance propagation (LRP), maintaining a global conservation property and enabling high-fidelity attribution maps. Empirically, pixel-wise LRP with Shapley-based weights yields sharper, more concentrated saliency maps than conventional gradient-based methods (Li et al., 2019).

Moreover, Shapley-inspired gradients (non-vanishing, continuous, context-adaptive) can serve as drop-in replacements for ReLU gradients during training, improving convergence and stability and mitigating "dead neuron" issues. This is operationalized via the “Shapley Activation” (SA) function, which matches Shapley gradient behavior and demonstrates improved performance and stability across optimizers and datasets (Li et al., 2019).

6. Extensions: Fault Localization and Derivative-Free Repair

Recent methodologies (e.g., SHARPEN) extend the Neuron Shapley framework to support automated neural repair. Hierarchical, Deep SHAP-based localization identifies suspicious layers and neurons via activation-divergence between faulty and clean models, enabling focused derivative-free optimization (e.g., CMA-ES) for property repair. This approach avoids the limitations of gradient-based repair, supporting arbitrary architectures and activation functions, and has demonstrated superiority in backdoor and unfairness mitigation across several architectures and defect types (Sun et al., 1 Apr 2026).

7. Limitations and Theoretical Considerations

The primary computational challenge remains the cost of exact Shapley value calculation, which grows exponentially with the number of neurons, though efficient stochastic algorithms ameliorate this for contemporary networks. Practical implementations assume that performance metrics can be quickly evaluated on subnetworks with some neurons masked, and that Shapley’s axioms (e.g., efficiency, additivity) are meaningful for the selected metric. Gaussian approximations for Shapley relevance in high-dimensional, highly correlated settings may introduce error (Li et al., 2019), and neglecting cross-terms in layer-wise propagation may limit interpretive resolution.

Nonetheless, the Neuron Shapley approach remains the only neurally-internal method with a theoretically principled foundation for quantifying contribution, identifying synergistic and antagonistic neuronal roles, and supporting a suite of model analysis, compression, interpretability, and repair tasks across both feedforward and convolutional architectures (Ghorbani et al., 2020, Sun et al., 1 Apr 2026, Stier et al., 2019, Adamczewski et al., 2019, Li et al., 2019).
