Neuron Shapley Approach
- The Neuron Shapley approach is a game-theoretic framework that assigns each neuron its average marginal contribution to performance over all coalitional contexts.
- It employs efficient Monte Carlo sampling and adaptive algorithms to approximate contributions despite the combinatorial complexity of evaluating all neuron subsets.
- Empirical studies demonstrate that ranking neurons by Shapley values aids in pruning, fairness correction, and robust defense against adversarial attacks.
The Neuron Shapley approach is a game-theoretic framework that quantifies the individual contribution of neurons or filters in a neural network with respect to a user-defined performance metric. By leveraging the Shapley value from cooperative game theory, this methodology rigorously captures the effect of removing or perturbing subsets of neurons, enabling principled neuron ranking for interpretability, pruning, fairness correction, and robustness improvement. Neuron Shapley formalizes neuron importance as the average marginal performance gain conferred by a neuron across all possible insertion orders and coalitional contexts, providing a unique "fair" attribution that accounts for all higher-order interactions within the network (Ghorbani et al., 2020).
1. Formal Framework and Mathematical Definition
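Let $N = \{1, \dots, n\}$ denote the set of neurons or filters in a trained neural network. The Neuron Shapley value $\phi_i$ for neuron $i \in N$ is defined with respect to a performance function $V$ measuring, for a subset $S \subseteq N$, any network-level metric of interest (accuracy, loss, bias, vulnerability, etc.) after retaining only the neurons in $S$ and zeroing out the others:
$$\phi_i = \frac{1}{n} \sum_{S \subseteq N \setminus \{i\}} \binom{n-1}{|S|}^{-1} \left[ V(S \cup \{i\}) - V(S) \right]$$
Equivalently, for a uniformly drawn permutation $\pi$ of $N$, letting $S_\pi^i$ be the set of neurons preceding $i$ in $\pi$,
$$\phi_i = \mathbb{E}_{\pi} \left[ V(S_\pi^i \cup \{i\}) - V(S_\pi^i) \right]$$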
This captures the expected marginal contribution of each neuron over all contexts, with the properties of symmetry, efficiency (in the standard setting), and null-player guaranteed by Shapley’s axioms (Ghorbani et al., 2020, Stier et al., 2019, Adamczewski et al., 2019).
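To make the subset-sum definition concrete, here is a minimal sketch that enumerates every coalition for a tiny game. The function `exact_shapley` and the metric `toy_value` are hypothetical names introduced for illustration, not part of any cited implementation, and the toy metric stands in for evaluating a masked network.

```python
from itertools import combinations
from math import comb

def exact_shapley(n, value):
    """Exact Shapley values for players 0..n-1 by enumerating all coalitions.

    `value` maps a frozenset of player indices to a float (the metric on that
    coalition). Cost is O(2^n) evaluations, so this only works for tiny n.
    """
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of every coalition of this size that excludes player i.
            weight = 1.0 / (n * comb(n - 1, size))
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def toy_value(s):
    # Hypothetical metric: neuron 0 helps on its own, neurons 1 and 2 help
    # only when both are present, and neuron 3 never changes the metric.
    return 0.1 + 0.5 * (0 in s) + 0.4 * (1 in s and 2 in s)

phi = exact_shapley(4, toy_value)
full = toy_value(frozenset(range(4)))
empty = toy_value(frozenset())
```

In this toy game the inert neuron receives zero (null-player), the two jointly useful neurons split their shared gain equally (symmetry), and the values sum to the gap between the full and empty networks (efficiency).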
2. Interactions, Synergy, and Theoretical Underpinnings
Neuron Shapley values uniquely capture arbitrary and higher-order interactions among neurons. Marginal contributions $V(S \cup \{i\}) - V(S)$ are accumulated over all possible coalitions $S$ of the remaining neurons, so the measure reflects not only individual neuron impact but also synergies or redundancies. If certain neurons only exert influence in specific combinations (e.g., neither is impactful alone but together are essential for a feature), their collective effect emerges naturally across the relevant terms in the sum.
This framework rests on the axioms of cooperative game theory: null-player (zero contribution if adding neuron $i$ never changes $V$), symmetry (neurons whose marginal contributions are identical in every coalition have identical Shapley values), and additivity (the Shapley decomposition of a sum of metrics is the sum of the decompositions). Together with efficiency, these axioms make the Neuron Shapley value the unique fair division of performance gain attributable to the neurons (Ghorbani et al., 2020, Stier et al., 2019, Adamczewski et al., 2019).
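A small worked example may help illustrate the synergy case. It assumes a hypothetical two-neuron game, writing $V(S)$ for the metric on coalition $S$ and $\phi_i$ for the Shapley value of neuron $i$; the numbers are illustrative and not taken from the cited papers.

```latex
% Hypothetical synergy game: neither neuron helps alone, both together are essential.
% V(\emptyset) = V(\{1\}) = V(\{2\}) = 0, \qquad V(\{1,2\}) = 1
\begin{aligned}
\phi_1 &= \tfrac{1}{2}\bigl[V(\{1\}) - V(\emptyset)\bigr]
        + \tfrac{1}{2}\bigl[V(\{1,2\}) - V(\{2\})\bigr]
        = \tfrac{1}{2}(0) + \tfrac{1}{2}(1) = \tfrac{1}{2}, \\
\phi_2 &= \tfrac{1}{2}\bigl[V(\{2\}) - V(\emptyset)\bigr]
        + \tfrac{1}{2}\bigl[V(\{1,2\}) - V(\{1\})\bigr]
        = \tfrac{1}{2}, \\
\phi_1 + \phi_2 &= 1 = V(\{1,2\}) - V(\emptyset).
\end{aligned}
```

Ablating either neuron alone from the empty network shows no effect, yet each receives half of the joint gain because the coalition containing its partner enters the average.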
3. Efficient Approximation Algorithms
Exact computation is combinatorial (exponential in the number of neurons), so naively evaluating every subset is impractical for modern networks. Scalable approximations include:
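- Monte Carlo over permutations: Randomly sample orderings of the neurons, compute for each ordering the marginal contribution of every neuron at its position, and average. The number of sampled orderings needed for reasonable accuracy is typically far smaller than the number of possible orderings.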
- Early truncation: For subnetworks whose performance falls below an application-specific threshold ("dead subnetwork"), computation is aborted to avoid unnecessary evaluations.
- Adaptive multi-armed bandit (MAB): Focuses computation on neurons whose Shapley value ranks matter most (e.g., top-$k$), adaptively ceasing sampling for neurons whose estimated bounds are unlikely to affect the ordering.
The TMAB-Shapley (truncated multi-armed bandit) algorithm yields an order-of-magnitude speedup, supporting high-throughput evaluation (e.g., all 17,216 filters in Inception-v3) with strong rank-correlation agreement with full Monte Carlo (Ghorbani et al., 2020). Basic permutation-sampling or subset-sampling variants suffice for moderate numbers of neurons (Stier et al., 2019, Adamczewski et al., 2019).
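The permutation sampling and early truncation ideas above can be sketched as follows. This is an illustrative sketch, not the reference implementation: `truncated_mc_shapley`, `toy_value`, and the `floor` parameter are hypothetical names, and the toy metric stands in for evaluating a masked network. Each sampled ordering is walked backwards from the full network, so the set left after removing a neuron is exactly the set of neurons preceding it in the ordering.

```python
import random

def toy_value(s):
    # Hypothetical metric: neuron 0 helps alone, neurons 1 and 2 only jointly,
    # neuron 3 is inert.
    return 0.1 + 0.5 * (0 in s) + 0.4 * (1 in s and 2 in s)

def truncated_mc_shapley(n, value, num_perms=500, floor=None, seed=0):
    """Estimate Shapley values by averaging marginal gains over random orderings.

    If `floor` is given, an ordering is abandoned once the metric of the
    remaining subnetwork drops below it; the marginals not yet computed for
    that ordering are counted as zero (an approximation).
    """
    rng = random.Random(seed)
    totals = [0.0] * n
    for _ in range(num_perms):
        order = list(range(n))
        rng.shuffle(order)
        kept = set(order)
        v_with = value(frozenset(kept))
        for i in reversed(order):
            if floor is not None and v_with < floor:
                break  # dead subnetwork: skip the remaining evaluations
            kept.remove(i)
            v_without = value(frozenset(kept))
            totals[i] += v_with - v_without
            v_with = v_without
    return [t / num_perms for t in totals]

est = truncated_mc_shapley(4, toy_value)
est_truncated = truncated_mc_shapley(4, toy_value, floor=0.3)
```

Without truncation the marginals of each ordering telescope, so the estimates always sum to the gap between the full and empty networks. In this particular toy game every subnetwork below the floor has no remaining signal, so truncation saves evaluations without changing the estimates; in general it trades a small bias for speed.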
Table: Key Algorithmic Variants
| Algorithmic Idea | Advantage | Reference |
|---|---|---|
| Permutation Monte Carlo | Unbiased, scalable to large networks | (Ghorbani et al., 2020) |
| Early Truncation | Skips uninformative trajectories | (Ghorbani et al., 2020) |
| Multi-armed Bandit | Efficient focus on top-k | (Ghorbani et al., 2020) |
| Subset Sampling | Simpler for small networks | (Stier et al., 2019) |
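The bandit idea of stopping early for neurons whose rank is already settled can be illustrated with a simplified sketch. This is not the TMAB-Shapley algorithm from the cited paper: it assumes Hoeffding confidence intervals, a known bound `width` on the spread of marginal contributions, and a boundary placed midway between the $k$-th and $(k+1)$-th estimates. All names (`topk_bandit`, `sample_marginal`, `toy_value`) are hypothetical.

```python
import math
import random

def toy_value(s):
    # Hypothetical metric: neuron 0 helps alone, neurons 1 and 2 only jointly,
    # neuron 3 is inert.
    return 0.1 + 0.5 * (0 in s) + 0.4 * (1 in s and 2 in s)

def sample_marginal(i, n, value, rng):
    """One unbiased sample of neuron i's marginal gain under a random ordering."""
    others = [j for j in range(n) if j != i]
    rng.shuffle(others)
    cut = rng.randrange(n)  # number of neurons preceding i, uniform on 0..n-1
    s = frozenset(others[:cut])
    return value(s | {i}) - value(s)

def topk_bandit(n, value, k, width, max_rounds=2000, min_samples=30,
                delta=0.05, seed=0):
    """Sample only neurons whose membership in the top-k is still uncertain.

    Requires 1 <= k < n. `width` is an assumed upper bound on the range of
    marginal contributions, used in the Hoeffding radius.
    """
    rng = random.Random(seed)
    sums, counts = [0.0] * n, [0] * n
    active = set(range(n))
    for _ in range(max_rounds):
        if not active:
            break
        for i in active:
            sums[i] += sample_marginal(i, n, value, rng)
            counts[i] += 1
        if min(counts) < min_samples:
            continue
        means = [sums[i] / counts[i] for i in range(n)]
        ranked = sorted(means, reverse=True)
        boundary = 0.5 * (ranked[k - 1] + ranked[k])
        for i in list(active):
            radius = width * math.sqrt(math.log(2.0 / delta) / (2.0 * counts[i]))
            # Freeze a neuron once its interval lies entirely on one side.
            if means[i] - radius > boundary or means[i] + radius < boundary:
                active.discard(i)
    means = [sums[i] / counts[i] for i in range(n)]
    top = set(sorted(range(n), key=lambda i: means[i], reverse=True)[:k])
    return means, top, counts

means, top, counts = topk_bandit(4, toy_value, k=1, width=0.5)
```

The design choice here is to spend samples only where the ordering is ambiguous: neurons far from the top-$k$ boundary are frozen after a few dozen samples, while those near it keep being sampled.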
4. Empirical Results and Applications
Destructive and Interpretive Power
On Inception-v3 (17,216 filters) for ImageNet classification:
- Removing the 10, 20, and 30 filters with the highest Shapley values reduced top-1 accuracy from 74% to 38%, 8%, and chance level, respectively, whereas removing 20 random filters had negligible effect (Ghorbani et al., 2020).
- Visualization (DeepDream, maximally activating images) shows early "critical" filters detect simple features (edges, textures), while later ones detect higher-level concepts ("crowdedness", "colorfulness") (Ghorbani et al., 2020).
Model Repair
Fairness Correction: For a SqueezeNet face gender classifier (trained on CelebA, tested on PPB), filters with negative Shapley value with respect to balanced PPB accuracy were identified as sources of bias. Removing the 50--100 most negative filters improved accuracy on Black Female from 54.7% to 81.9% and overall PPB accuracy from 84.9% to 91.7%, with minimal loss of CelebA accuracy (Ghorbani et al., 2020).
Adversarial Robustness: For Inception-v3 under a PGD attack, removing the 16 filters with the highest adversarial Shapley scores dropped white-box attack success from nearly 100% to 0.1% and reduced black-box transfer attack success by 37%, while clean accuracy dropped only from 74% to 67% (Ghorbani et al., 2020).
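Both repairs above reduce to choosing which filters to zero out once Shapley values for the relevant metric are available. The sketch below assumes those values are already computed and stored in an array; `ablation_mask` and its `mode` argument are hypothetical names, and the numbers are made up for illustration.

```python
import numpy as np

def ablation_mask(shapley, k, mode="most_negative"):
    """Return a boolean keep-mask that drops up to k units chosen by Shapley value.

    mode="most_negative": drop the k lowest values, but only those below zero
        (units that hurt the chosen metric, as in the fairness setting).
    mode="highest": drop the k largest values (as in the adversarial setting,
        where the metric is attack success).
    """
    shapley = np.asarray(shapley, dtype=float)
    keep = np.ones(shapley.shape[0], dtype=bool)
    if mode == "most_negative":
        order = np.argsort(shapley)  # ascending: most negative first
        drop = [int(i) for i in order[:k] if shapley[i] < 0]
    elif mode == "highest":
        order = np.argsort(-shapley)  # descending: largest first
        drop = [int(i) for i in order[:k]]
    else:
        raise ValueError("mode must be 'most_negative' or 'highest'")
    keep[np.array(drop, dtype=int)] = False
    return keep

values = [0.3, -0.2, 0.05, -0.01, 0.0]  # illustrative values only
keep_fair = ablation_mask(values, k=3, mode="most_negative")
keep_adv = ablation_mask(values, k=1, mode="highest")
```

Multiplying a layer's activations by the mask (for example `activations * keep_fair`) then emulates removing the selected filters without retraining.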
Fault Localization: Hierarchical Deep SHAP-based protocols in SHARPEN identify both layers and neurons most responsible for performance defects by measuring the divergence in SHAP attributions between erroneous and benign conditions. This approach efficiently localizes faulty filters, enabling focused, derivative-free repair strategies (Sun et al., 1 Apr 2026).
Compression and Pruning
Neuron Shapley rankings have been used to prune networks aggressively with minimal loss. For LeNet-5 on MNIST, pruning the filters or neurons with the lowest Shapley values shrinks the network to 20--30% of its original size with negligible accuracy loss; similar gains are observed on VGG-16 for CIFAR-10 (Adamczewski et al., 2019, Stier et al., 2019).
Table: Representative Results on Network Compression via Shapley Pruning
| Model | Baseline # Params | Shapley-Pruned | Accuracy Loss | Reference |
|---|---|---|---|---|
| LeNet-5 | - | - | - | (Adamczewski et al., 2019) |
| VGG-16 | - | Fewer params | - | (Adamczewski et al., 2019) |
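Structurally, Shapley-based pruning amounts to keeping the highest-ranked units and slicing the adjacent weight matrices. The following sketch assumes a single hidden ReLU layer with NumPy weights and precomputed Shapley values; `prune_hidden_units`, `forward`, and the random data are hypothetical and only demonstrate the mechanics.

```python
import numpy as np

def prune_hidden_units(w1, b1, w2, shapley, keep_fraction):
    """Keep the hidden units with the largest Shapley values and slice the weights.

    Shapes: w1 (hidden, inputs), b1 (hidden,), w2 (outputs, hidden).
    Returns the sliced parameters and the sorted indices of retained units.
    """
    hidden = w1.shape[0]
    n_keep = max(1, int(round(keep_fraction * hidden)))
    keep = np.sort(np.argsort(np.asarray(shapley))[-n_keep:])
    return w1[keep], b1[keep], w2[:, keep], keep

def forward(w1, b1, w2, x):
    # One hidden ReLU layer followed by a linear readout.
    return w2 @ np.maximum(w1 @ x + b1, 0.0)

rng = np.random.default_rng(0)
w1, b1 = rng.standard_normal((10, 4)), rng.standard_normal(10)
w2, x = rng.standard_normal((3, 10)), rng.standard_normal(4)
scores = rng.standard_normal(10)  # placeholder for real Shapley values

w1_p, b1_p, w2_p, keep = prune_hidden_units(w1, b1, w2, scores, keep_fraction=0.3)

# Reference: zero out the dropped hidden units in the original network.
hidden_act = np.maximum(w1 @ x + b1, 0.0)
mask = np.zeros(10, dtype=bool)
mask[keep] = True
masked_output = w2 @ (hidden_act * mask)
pruned_output = forward(w1_p, b1_p, w2_p, x)
```

The pruned network reproduces the output of the masked original, so the metric measured under masking carries over to the smaller model.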
5. Shapley-based Interpretability and Training Dynamics
The Shapley value’s context-sensitivity yields robust interpretability: typically only a sparse subset of neurons dominates model performance, providing a minimal shortlist for visualization or intervention. For ReLU-activated networks, analytical Shapley approximations permit fast, closed-form estimation and linear relevance propagation (LRP), maintaining a global conservation property and enabling high-fidelity attribution maps. Empirically, pixel-wise LRP with Shapley-based weights yields sharper, more concentrated saliency maps than conventional gradient-based methods (Li et al., 2019).
Moreover, Shapley-inspired gradients (non-vanishing, continuous, context-adaptive) can serve as drop-in replacements for ReLU gradients during training, improving convergence and stability and mitigating "dead neuron" issues. This is operationalized via the “Shapley Activation” (SA) function, which matches Shapley gradient behavior and demonstrates improved performance and stability across optimizers and datasets (Li et al., 2019).
6. Extensions: Fault Localization and Derivative-Free Repair
Recent methodologies (e.g., SHARPEN) extend the Neuron Shapley framework to support automated neural repair. Hierarchical, Deep SHAP-based localization identifies suspicious layers and neurons via activation-divergence between faulty and clean models, enabling focused derivative-free optimization (e.g., CMA-ES) for property repair. This approach avoids the limitations of gradient-based repair, supporting arbitrary architectures and activation functions, and has demonstrated superiority in backdoor and unfairness mitigation across several architectures and defect types (Sun et al., 1 Apr 2026).
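The repair step can be pictured as a derivative-free search restricted to the parameters flagged by localization. The sketch below is not SHARPEN's procedure: it substitutes a simple (1+1) evolution strategy for CMA-ES, assumes the suspect indices are already known, and uses a hypothetical quadratic loss as a stand-in for a property-violation metric. All names are illustrative.

```python
import numpy as np

def es_repair(weights, suspect_idx, loss, steps=300, sigma=0.1, seed=0):
    """Derivative-free repair of selected weights with a (1+1) evolution strategy.

    Only entries listed in `suspect_idx` are perturbed; a candidate is accepted
    when it does not increase the loss. This is a stand-in for CMA-ES.
    """
    rng = np.random.default_rng(seed)
    idx = np.asarray(suspect_idx, dtype=int)
    best = np.array(weights, dtype=float)
    best_loss = loss(best)
    for _ in range(steps):
        cand = best.copy()
        cand[idx] += sigma * rng.standard_normal(idx.size)
        cand_loss = loss(cand)
        if cand_loss <= best_loss:
            best, best_loss = cand, cand_loss
    return best, best_loss

# Hypothetical setup: entries 1 and 2 are "faulty" and should be near zero.
target = np.array([1.0, 0.0, 0.0, 2.0])
initial = np.array([1.0, 5.0, -3.0, 2.0])

def toy_loss(w):
    return float(np.sum((w - target) ** 2))

repaired, final_loss = es_repair(initial, suspect_idx=[1, 2], loss=toy_loss)
```

Because no gradients are used, the same loop applies to non-differentiable metrics or activation functions, and restricting the search to the localized entries keeps the search space small.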
7. Limitations and Theoretical Considerations
The primary computational challenge remains scaling exact Shapley value calculation, though efficient stochastic algorithms ameliorate this for contemporary networks. Practical implementations assume that performance metrics can be quickly evaluated on subnetworks with some neurons masked, and that Shapley’s assumptions (e.g., independence of contributions, efficiency) are meaningful for the selected metric. Gaussian approximations for Shapley relevance in high-dimensional, highly correlated settings may introduce error (Li et al., 2019), and neglecting cross-terms in layer-wise propagation may limit interpretive resolution.
Nonetheless, the Neuron Shapley approach offers an axiomatically grounded way to quantify the contribution of internal units, identify synergistic and antagonistic neuronal roles, and support a suite of model analysis, compression, interpretability, and repair tasks across both feedforward and convolutional architectures (Ghorbani et al., 2020, Sun et al., 1 Apr 2026, Stier et al., 2019, Adamczewski et al., 2019, Li et al., 2019).