SV-NUP: Shapley-Based Non-Uniform Pruning
- The paper presents SV-NUP, a framework that assigns principled Shapley value scores to components for targeted non-uniform pruning, enhancing both efficiency and accuracy.
- It employs scalable approximation techniques—such as Monte Carlo sampling, kernel SHAP, and surrogate networks—to estimate contributions in large and complex models.
- Empirical benchmarks demonstrate that SV-NUP outperforms heuristic methods by reducing computational cost and preserving model performance across CNNs, GNNs, LLMs, and other architectures.
Shapley Value-based Non-Uniform Pruning (SV-NUP) is a theoretically grounded framework for neural network and data sparsification that leverages cooperative game theory, specifically the Shapley value, to assign principled importance scores to model elements such as neurons, filters, edges, layers, or data points. These scores enable non-uniform pruning schedules that maximize efficiency or accuracy retention under constraints, outperforming heuristic or magnitude-based strategies, especially in challenging scenarios such as low-data regimes, adversarial noise, or large-scale models. SV-NUP applies across diverse architectures, including convolutional neural networks (CNNs), graph neural networks (GNNs), large language models (LLMs), Kolmogorov–Arnold networks (KANs), and recommender systems.
1. Game-Theoretic Foundations of SV-NUP
SV-NUP is rooted in the cooperative game-theoretic concept of the Shapley value, which provides a unique, fair attribution of a system's outcome to its individual components by averaging their marginal contributions over all possible coalitions. In the network context, the "players" can be neurons, filters, layers, network edges, or even training samples, and the worth of a coalition is quantified as a model performance metric (e.g., accuracy, perplexity, loss reduction) after masking or removing all non-coalition elements.
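For a set of players $N = \{1, \dots, n\}$ and a value function $v: 2^N \to \mathbb{R}$, the Shapley value of player $i \in N$ is defined by
$$\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n - |S| - 1)!}{n!}\,\bigl[v(S \cup \{i\}) - v(S)\bigr].$$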
This formulation ensures the axiomatic properties of efficiency, symmetry, null-player, and additivity, yielding a fair assessment of importance and allowing both positive and negative attributions (Akkas et al., 28 Jul 2025, Ancona et al., 2020, Adamczewski et al., 2024, Fan et al., 2 Oct 2025, Jr et al., 2023, Sun et al., 3 May 2025, Zhang et al., 28 May 2025, Ding et al., 8 Feb 2026).
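As a minimal sketch of the definition, the following enumerates every coalition for a four-player toy game. The function name `exact_shapley`, the players, and the utility `toy_value` are hypothetical and chosen only for illustration; in a pruning setting the value function would instead evaluate the masked model. Enumeration is feasible only for a handful of players.

```python
from itertools import combinations
from math import factorial

def exact_shapley(players, value_fn):
    """Exact Shapley values by enumerating every coalition (small n only)."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(n):
            # Weight |S|! (n - |S| - 1)! / n! is shared by all coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                # Marginal contribution of player i when joining coalition S
                total += weight * (value_fn(s | {i}) - value_fn(s))
        phi[i] = total
    return phi

# Hypothetical toy utility: "a" and "b" are redundant, "c" is independent, "d" is harmful
def toy_value(coalition):
    score = 0.0
    if "a" in coalition or "b" in coalition:
        score += 0.6
    if "c" in coalition:
        score += 0.3
    if "d" in coalition:
        score -= 0.1
    return score

phi = exact_shapley(["a", "b", "c", "d"], toy_value)
for player, value in phi.items():
    print(f"{player}: {value:+.3f}")
```

In this toy game the redundant pair splits its shared credit equally, the harmful player receives a negative score, and the scores sum to the value of the full coalition.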
2. SV-NUP Methodology and Pruning Schedules
Computation of Importance Scores
The core SV-NUP pipeline consists of three stages:
- Importance Attribution: Estimate the Shapley value for each candidate element (e.g., neuron, filter, edge, data point) by evaluating its marginal effect on the coalition's value function.
- Global or Layerwise Aggregation: Aggregate local Shapley values to form global rankings, e.g., averaging edge scores over all node subgraphs in GNNs, or aggregating neuron scores over validation samples in CNNs.
- Non-Uniform Pruning Schedule: Select which elements to prune based on their Shapley score. Elements with strongly negative or low contributions are pruned first, and pruning ratios can vary across layers or network components.
This framework subsumes both absolute-threshold (all elements whose Shapley value $\phi_i$ falls below a fixed threshold $\tau$ are dropped) and relative-threshold (the bottom-$k$ elements ranked by $\phi_i$ are pruned) heuristics, as well as global cost-aware budgets (parameter, FLOP, or MAC count) (Ancona et al., 2020, Adamczewski et al., 2024, Akkas et al., 28 Jul 2025, Fan et al., 2 Oct 2025, Jr et al., 2023, Sun et al., 3 May 2025, Ding et al., 8 Feb 2026). Layerwise quotas or adaptive scheduling can further exploit layers' differing redundancy profiles.
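One way the scheduling stage might look, as a sketch only: rank all units globally by score and prune the lowest ones under a fixed budget, so that per-layer pruning ratios fall out of the ranking instead of being fixed in advance. The function `non_uniform_prune`, the `min_keep` guard, the layer names, and the scores are all assumptions made for this example and are not taken from the cited papers.

```python
def non_uniform_prune(scores_by_layer, num_to_prune, min_keep=1):
    """Select the globally lowest-scoring units; layers end up with different ratios."""
    # Flatten to (score, layer, index) and sort ascending by score
    ranked = sorted(
        (score, layer, idx)
        for layer, scores in scores_by_layer.items()
        for idx, score in enumerate(scores)
    )
    pruned = {layer: [] for layer in scores_by_layer}
    budget = num_to_prune
    for score, layer, idx in ranked:
        if budget == 0:
            break
        # Guard (an assumption of this sketch): never remove a layer entirely
        remaining = len(scores_by_layer[layer]) - len(pruned[layer])
        if remaining <= min_keep:
            continue
        pruned[layer].append(idx)
        budget -= 1
    return pruned

# Hypothetical per-unit Shapley scores for three layers
scores = {
    "conv1": [0.30, 0.25, 0.20, 0.28],
    "conv2": [0.02, -0.05, 0.01, 0.15],
    "fc": [0.10, -0.01],
}
pruned = non_uniform_prune(scores, num_to_prune=4)
ratios = {layer: len(idx) / len(scores[layer]) for layer, idx in pruned.items()}
print(pruned)
print(ratios)
```

With these scores the budget of four units is spent entirely on `conv2` and `fc`, while `conv1` is left intact. A cost-aware variant could weight each unit by its parameter or FLOP count before ranking.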
3. Scalable Shapley Value Estimation and Approximations
Exact Shapley calculation is infeasible for large sets ($2^n$ subsets for $n$ players), so SV-NUP relies on several approximate strategies:
- Monte Carlo Permutation Sampling: Estimate the Shapley value $\phi_i$ as the expected marginal contribution of element $i$ over random permutations of the players. Each permutation requires two forward passes per element, and a sample count of up to about $20$ permutations typically suffices for stability in practical scenarios.
- Partial-k Shapley: Restrict calculation to coalitions up to size $k$, which limits the work to at most $O(n^k)$ coalitions per layer of $n$ units (Adamczewski et al., 2024).
- Regression-based (Kernel SHAP) Approximation: Fit a linear or weighted least-squares model to sampled coalition values, using Shapley kernel weights for optimal attribution (Adamczewski et al., 2024).
- Surrogate Neural Networks: For extremely high-dimensional settings (e.g., LLM layer selection), train a compact surrogate to predict model utility from binary layer masks and use this for stratified Monte Carlo Shapley computation (Ding et al., 8 Feb 2026).
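- Sliding-window Shapley: For LLMs, restrict marginal calculations to subsets in a local window of adjacent layers, so that the cost depends on the window size $w$ and the layer count $L$ (at most $O(L \cdot 2^{w})$ coalition evaluations if each window is enumerated exactly) in an $L$-layer model (Sun et al., 3 May 2025).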
These approximations preserve ranking fidelity and scale SV-NUP to large models with thousands of units or layers (Ancona et al., 2020, Adamczewski et al., 2024, Ding et al., 8 Feb 2026, Sun et al., 3 May 2025, Fan et al., 2 Oct 2025).
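The permutation-sampling estimator from the list above can be sketched as follows. The names `mc_shapley` and `toy_value` are hypothetical, and the toy utility stands in for a masked-model evaluation. Note one design choice that differs from the naive count of two evaluations per element: this sketch walks each permutation incrementally and reuses the previous coalition's value as the next baseline, so each element costs one new evaluation per permutation.

```python
import random

def mc_shapley(players, value_fn, num_permutations=20, seed=0):
    """Monte Carlo Shapley estimate via random player orderings."""
    rng = random.Random(seed)
    phi = {p: 0.0 for p in players}
    for _ in range(num_permutations):
        order = list(players)
        rng.shuffle(order)
        coalition = set()
        prev = value_fn(frozenset(coalition))
        for p in order:
            coalition.add(p)
            cur = value_fn(frozenset(coalition))
            phi[p] += cur - prev  # marginal contribution of p in this ordering
            prev = cur            # reuse this evaluation as the next baseline
    return {p: total / num_permutations for p, total in phi.items()}

# Hypothetical toy utility: "a" and "b" are redundant, "c" contributes independently
def toy_value(coalition):
    score = 0.0
    if "a" in coalition or "b" in coalition:
        score += 0.6
    if "c" in coalition:
        score += 0.3
    return score

estimate = mc_shapley(["a", "b", "c"], toy_value, num_permutations=20)
print(estimate)
```

Because the marginal contributions telescope within each ordering, the estimates always sum to the full-coalition value minus the empty-coalition value, whatever the sample count. Only the split between interacting players (here `a` and `b`) is subject to sampling noise.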
4. Architectures and Domains
SV-NUP has been instantiated for a broad variety of settings:
| Domain/Model | Player Type | Value Function / Utility | Aggregation / Notes |
|---|---|---|---|
| CNNs/MLPs | Filter/Neuron | Accuracy, loss, RMSE | Per-layer or global (Adamczewski et al., 2024, Ancona et al., 2020, Jr et al., 2023) |
| GNNs | Edges | Node prediction score | Mean over computational subgraphs (Akkas et al., 28 Jul 2025) |
| LLMs/Transformers | Layer | Inverse perplexity | Sliding window, surrogate net (Sun et al., 3 May 2025, Ding et al., 8 Feb 2026) |
| KANs | Neuron | Expected output/accuracy | Shift-invariant attribute scores (Fan et al., 2 Oct 2025) |
| Recommenders | Data point (interaction) | Loss reduction | FastSHAP amortized network (Zhang et al., 28 May 2025) |
This generality supports a unified theoretical and algorithmic framework for pruning and network compression across architectures.
5. Theoretical Guarantees and Interpretability
Shapley-based attribution lends SV-NUP several theoretical properties:
- Fairness: By the Shapley axioms, importance is allocated fairly among model elements even in the presence of synergy or redundancy.
- Robustness to Adversarial and Noisy Elements: Strongly negative Shapley values flag elements that harm model performance, enabling their removal in a principled way and sometimes improving accuracy post-pruning (Akkas et al., 28 Jul 2025, Ancona et al., 2020, Zhang et al., 28 May 2025).
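- Label-Free and Model-Agnostic Operation: In instantiations whose value function is computed from model outputs alone, such as the GNN edge-pruning setting, SV-NUP requires neither labels nor retraining after pruning; attributions are post hoc and rely only on forward inference passes (Akkas et al., 28 Jul 2025).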
A further advantage is interpretability: negative or low Shapley scores can be used to analyze and explain which components degrade model behavior, support symbolic regression (in KANs), and benchmark attribution consistency under covariate shift (Fan et al., 2 Oct 2025).
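For reference, the Shapley axioms that underlie these properties can be written out in their standard textbook form. The notation here is an assumption of this sketch: $N$ is the player set, $v$ and $w$ are value functions, and $\phi_i(v)$ is the Shapley value of player $i$.

$$
\begin{aligned}
\text{Efficiency:}\quad & \sum_{i \in N} \phi_i(v) = v(N) - v(\emptyset), \\
\text{Symmetry:}\quad & v(S \cup \{i\}) = v(S \cup \{j\}) \ \ \forall S \subseteq N \setminus \{i, j\} \;\Rightarrow\; \phi_i(v) = \phi_j(v), \\
\text{Null player:}\quad & v(S \cup \{i\}) = v(S) \ \ \forall S \subseteq N \setminus \{i\} \;\Rightarrow\; \phi_i(v) = 0, \\
\text{Additivity:}\quad & \phi_i(v + w) = \phi_i(v) + \phi_i(w).
\end{aligned}
$$

Efficiency explains why a negative score is informative: the scores must sum to the total utility gain, so an element with $\phi_i(v) < 0$ lowers the utility on average over the coalitions it joins.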
6. Empirical Performance and Practical Impact
SV-NUP consistently exhibits superior or state-of-the-art performance relative to magnitude-, gradient-, and heuristic-based pruning baselines:
- CNNs/Vision: In VGG16 on CIFAR-10, 75% parameter reduction at <0.5% loss in accuracy; outperforms Dirichlet, HRank, and Bayesian compression at equivalent budgets (Adamczewski et al., 2024). In low-data or no-fine-tuning regimes, SV-NUP preserves up to 5–10% greater accuracy than heuristics (Ancona et al., 2020).
- GNNs: 80% edge pruning yields only 1.8% accuracy drop and over 60% savings in MACs. Negative contributions are essential; restricting to non-negative scores markedly worsens performance (Akkas et al., 28 Jul 2025).
- LLMs: Non-uniform per-layer budgets via SV-NUP (with surrogate or sliding window) reduce perplexity by 18–20% and boost zero-shot accuracy by up to 1.5 points over uniform/depth-pruning baselines in LLaMA and OPT models (Sun et al., 3 May 2025, Ding et al., 8 Feb 2026). Surrogate-based MC is essential for scalability.
- KANs: Consistently 10–20% lower RMSE after 60–80% neuron removal, with shift-invariant attributions (Fan et al., 2 Oct 2025).
- Recommender Systems: Non-uniform pruning of bottom 20% of interactions recovers 3–7% of lost Recall/NDCG after heavy noise, outperforming intent- or similarity-based denoising schemes (Zhang et al., 28 May 2025).
Ablation studies routinely show that leveraging both positive and negative attributions, and assigning non-uniform quotas tailored to local redundancy, is critical for high-sparsity regimes.
7. Limitations and Extensions
SV-NUP's major limitation is computational cost, especially for naive Shapley calculation; even with sampling, stability may require thousands of forward passes for deep or wide layers. The quality of importance estimates depends on sample size and the faithfulness of surrogates or linear approximations (Ding et al., 8 Feb 2026). Pruning at extreme sparsity or in highly entangled architectures (e.g., AlexNet on Tiny-ImageNet) can be challenging due to variance in scores (Jr et al., 2023, Adamczewski et al., 2024).
Ongoing and future work explores:
- More efficient Shapley estimators (FastSHAP, kernel regression).
- Extensions to multi-criterion or multi-modal pruning (neurons and data jointly).
- Deeper integration with dynamic or structured retraining.
- Adoption of related solution concepts (Banzhaf, core, nucleolus) or hybrid attribution, and further integration with quantized architectures (Ding et al., 8 Feb 2026).
SV-NUP remains a unifying and extensible approach for principled model pruning, offering significant efficiency gains without undue sacrifice of accuracy or interpretability.