
SHAP-Guided Feature Pruning

Updated 29 December 2025
  • SHAP-guided feature pruning is defined as using Shapley value attributions to assess each model component's true marginal contribution, replacing unreliable magnitude heuristics.
  • It leverages algorithms like TreeSHAP and permutation sampling for various architectures (trees, CNNs, KANs), achieving robust compression and improved model performance.
  • Empirical results demonstrate that SHAP-guided pruning maintains predictive accuracy while significantly reducing parameters in applications from medical imaging to IoT intrusion detection.

SHAP-guided feature pruning is a set of principled methods that leverage Shapley value attributions—originating in cooperative game theory—to rank and remove features, filters, or neurons from a machine learning model, enabling compression, interpretability, and enhanced generalization. By quantifying each unit's true marginal contribution to predictive performance, SHAP-guided pruning avoids the pitfalls of magnitude-based heuristics, is robust to covariate and input shifts, and offers rigorous guarantees in applications ranging from tabular classifiers to Kolmogorov–Arnold networks and deep neural architectures.

1. Mathematical Foundations of SHAP-Guided Pruning

The theoretical basis of SHAP-guided pruning is the Shapley value, the unique attribution satisfying the efficiency, symmetry, additivity, and null-player axioms for distributing total model output among individual “players” (features, neurons, or channels). For a set of features $F = \{1, \dots, n\}$ and a prediction function $f$, the Shapley value of feature $i$ at instance $x$ is

$$\phi_i(x) = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,(n - |S| - 1)!}{n!} \left[ f_{S \cup \{i\}}(x_{S \cup \{i\}}) - f_S(x_S) \right],$$

where $f_S(x_S)$ evaluates the model with only the features in $S$ active. For multilayer networks or convolutional filters, the “player set” generalizes to neurons or channels, and the value function is the expected output or a utility such as top-1 accuracy or negative loss, evaluated over coalitions $S$ (Fan et al., 2 Oct 2025, Adamczewski et al., 2024).

Exact computation requires $2^n$ model evaluations and becomes intractable for $n > 20$. Tree-based models admit the TreeSHAP algorithm for polynomial-time computation (Oliveira et al., 22 Oct 2025), while neural models rely on Monte Carlo permutation sampling or kernel-regression approximations of the sum (Adamczewski et al., 2024, Fan et al., 2 Oct 2025). In the context of Kolmogorov–Arnold networks, the Shapley value for each neuron is defined over the subset of active neurons and is estimated via permutation sampling with antithetic variance reduction (Fan et al., 2 Oct 2025).
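
To make the permutation estimator concrete, the following is a minimal sketch under the assumptions above; `value_fn` is a hypothetical callable that returns the chosen utility (expected output, top-1 accuracy, negative loss) when only the units in a coalition are active, and it is not taken from any of the cited implementations.

```python
# Hypothetical Monte Carlo permutation estimator for Shapley values.
# `value_fn(coalition)` must return the utility of the model when only the
# units in `coalition` (features, neurons, or channels) are active.
import random

def shapley_permutation(players, value_fn, n_perms=128, seed=0):
    rng = random.Random(seed)
    phi = {p: 0.0 for p in players}
    for _ in range(n_perms):
        order = list(players)
        rng.shuffle(order)
        coalition, prev = set(), value_fn(frozenset())
        for p in order:
            coalition.add(p)
            cur = value_fn(frozenset(coalition))
            phi[p] += cur - prev          # marginal contribution of p in this order
            prev = cur
    return {p: total / n_perms for p, total in phi.items()}

# Toy check: in an additive game the Shapley values equal the per-player weights.
weights = {0: 1.0, 1: 2.0, 2: 3.0}
print(shapley_permutation([0, 1, 2], lambda S: sum(weights[p] for p in S)))
```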

2. Algorithms and Practical Implementations

SHAP-guided pruning admits several algorithmic realizations, summarized in the table below:

| Model Class | SHAP Computation | Pruning Rule | Reference |
| --- | --- | --- | --- |
| Tabular (trees) | TreeSHAP | Rank by mean $\lvert\phi_i\rvert$ | (Oliveira et al., 22 Oct 2025; Kraev et al., 2024) |
| CNN/MLP | Permutation MC / kernel LS | Rank by approximate $\phi_i$ | (Adamczewski et al., 2024) |
| KANs | Permutation MC | Prune lowest $\phi_i$, bottom-up | (Fan et al., 2 Oct 2025) |
| LSTM (CatNet) | SHAP derivative | Mirror-statistic FDR control on $M_j$ | (Han et al., 2024) |

In tabular models, global feature importances are obtained by averaging $|\phi_i(x)|$ across the dataset. Recursive feature elimination can then be guided by these importances: sequentially drop the least important feature, retrain at each step, and monitor validation RMSE or balanced accuracy to select the optimal pruned set (Uhl et al., 11 Sep 2025, Kraev et al., 2024).
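
A condensed sketch of this loop is given below, assuming pandas DataFrames, LightGBM, and the `shap` package; the regressor, the validation-RMSE criterion, and the retraining schedule are illustrative choices rather than the exact protocols of the cited studies.

```python
# SHAP-guided recursive feature elimination (sketch): retrain, score on a
# validation split, rank features by mean |phi_i| via TreeSHAP, drop the
# least important one, and keep the subset with the best validation RMSE.
import numpy as np
import shap
import lightgbm as lgb
from sklearn.metrics import mean_squared_error

def shap_rfe(X_train, y_train, X_val, y_val, feature_names):
    features, history = list(feature_names), []
    while features:
        model = lgb.LGBMRegressor(n_estimators=200).fit(X_train[features], y_train)
        rmse = mean_squared_error(y_val, model.predict(X_val[features])) ** 0.5
        history.append((list(features), rmse))
        if len(features) == 1:
            break
        # Global importance: mean |phi_i(x)| over the training set (TreeSHAP).
        phi = shap.TreeExplainer(model).shap_values(X_train[features])
        features.pop(int(np.argmin(np.abs(phi).mean(axis=0))))
    return min(history, key=lambda t: t[1])   # (best feature subset, its RMSE)
```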

For neural networks, SHAP-guided pruning can be implemented layer-wise: after an initial estimate of importance for each filter or neuron via permutation sampling, prune those with the lowest scores and optionally fine-tune (Adamczewski et al., 2024, Fan et al., 2 Oct 2025). Exact Shapley values are computed only in small layers ($n < 8$); otherwise, approximations become necessary.
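
As an illustration of the pruning step itself, the following minimal PyTorch sketch (not the cited authors' code) assumes per-channel Shapley scores have already been estimated; it keeps the top-scoring output channels of one convolution and slices the next convolution's input channels to match, assuming `groups=1` and no intervening batch normalization.

```python
# Hypothetical structured pruning step: keep the `keep` output channels of
# `conv` with the highest Shapley scores and adjust the next layer to match.
import torch
import torch.nn as nn

def prune_conv_pair(conv: nn.Conv2d, next_conv: nn.Conv2d,
                    shapley_scores: torch.Tensor, keep: int):
    keep_idx = torch.topk(shapley_scores, keep).indices.sort().values

    pruned = nn.Conv2d(conv.in_channels, keep, conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    pruned.weight.data = conv.weight.data[keep_idx].clone()
    if conv.bias is not None:
        pruned.bias.data = conv.bias.data[keep_idx].clone()

    pruned_next = nn.Conv2d(keep, next_conv.out_channels, next_conv.kernel_size,
                            stride=next_conv.stride, padding=next_conv.padding,
                            bias=next_conv.bias is not None)
    pruned_next.weight.data = next_conv.weight.data[:, keep_idx].clone()
    if next_conv.bias is not None:
        pruned_next.bias.data = next_conv.bias.data.clone()
    return pruned, pruned_next   # swap these into the model, then fine-tune
```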

For LSTMs and other sequence models, the CatNet framework computes the derivative of the SHAP value with respect to the input at each time point and controls the false discovery rate via a Gaussian mirror statistic. Features with mirror scores exceeding a data-adaptive threshold are retained (Han et al., 2024).
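
The construction of the mirror statistics $M_j$ from SHAP derivatives is specific to CatNet and is not reproduced here; the selection step itself follows the generic mirror-statistic rule sketched below, where the threshold is the smallest value whose estimated false discovery proportion stays below the target level $q$ (an assumption about the exact variant used).

```python
# Generic mirror-statistic selection (sketch): M_j is roughly symmetric about
# zero for null features, so the left tail estimates the false discoveries in
# the right tail. Keep features above the smallest threshold with FDP <= q.
import numpy as np

def mirror_select(M: np.ndarray, q: float = 0.1):
    for t in np.sort(np.abs(M[M != 0])):
        fdp_hat = np.sum(M <= -t) / max(np.sum(M >= t), 1)
        if fdp_hat <= q:
            return np.flatnonzero(M >= t)     # indices of retained features
    return np.array([], dtype=int)            # nothing passes the threshold
```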

3. Theoretical Properties and Invariance

A central theoretical advantage of SHAP-guided pruning is robustness to input shifts and to the choice of parameterization. For KANs, the shift-invariance property is formalized:

Proposition (Shift Invariance):

If $\widetilde{\mathrm{KAN}}(x) = \mathrm{KAN}(x + c)$ for a constant $c$, then the Shapley value of any neuron $i$ is unchanged whether it is computed under $\widetilde{\mathrm{KAN}}$ or under the original network. This follows because all coalition outputs shift by the same constant, leaving marginal contributions intact (Fan et al., 2 Oct 2025).
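
In the notation of Section 1, the stated argument is a single line: if every coalition value shifts by the same constant $\kappa$, i.e. $\tilde{v}(S) = v(S) + \kappa$, the constant cancels in each marginal difference,

$$\tilde{\phi}_i = \sum_{S \subseteq F \setminus \{i\}} w_S \left[ \tilde{v}(S \cup \{i\}) - \tilde{v}(S) \right] = \sum_{S \subseteq F \setminus \{i\}} w_S \left[ v(S \cup \{i\}) - v(S) \right] = \phi_i ,$$

where $w_S = \frac{|S|!\,(n-|S|-1)!}{n!}$ are the Shapley weights.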

Magnitude-based criteria—such as pruning by the $L_1$ norm of outgoing weights—are sensitive to the coordinate frame and can dramatically reorder importance when inputs are shifted. SHAP-guided scores, which depend only on marginal predictive impact, remain consistent under such transformations.

This invariance extends to correlated inputs—if properly handled via group-wise SHAP subtraction as in REFRESH—yielding substantially more reliable estimates of the effect of pruning groups of features (Sharma et al., 2024).

4. Empirical Performance and Compression Results

SHAP-guided pruning approaches routinely achieve state-of-the-art accuracy/compression trade-offs across a variety of domains:

  • Neural Network Compression: On VGG-16/CIFAR-10, SHAP-guided channel pruning attains, e.g., 7.91% top-1 error at 43M FLOPs (versus 5.36% at the full 313.7M FLOPs) using permutation approximations (Adamczewski et al., 2024). Jaccard-index analysis shows that the permutation and regression approximations recover 85–95% of the oracle channel ranking.
  • KANs: ShapKAN’s neuron rankings are empirically stable under strong input distribution shifts, whereas magnitude-based methods can mis-rank top neurons under domain change. In ablations, ShapKAN-pruned models achieve 10–50% lower RMSE than vanilla KAN and exhibit higher symbolic-recovery fidelity (Fan et al., 2 Oct 2025).
  • Tabular Models and Medical Diagnostics: SHAP pruning on urinary tract disease classification improves balanced accuracy (BACC) while reducing the feature count—e.g., LightGBM, pruned to 18 of 57 features, achieves a BACC of 97.03% vs. 93.59% on the full set (Oliveira et al., 22 Oct 2025). SHAP-Select identifies compact feature sets with optimal accuracy and F1, as shown on credit fraud detection (Kraev et al., 2024).
  • IoT Intrusion Detection: SHAP-pruned and Kronecker-compressed models maintain macro-F1 $\geq 0.986$ with a $250\times$ reduction in parameters, achieving millisecond-level inference latency for edge deployment (Benaddi et al., 22 Dec 2025).
  • MRI and Imaging Protocols: In diffusion MRI, TreeSHAP-based RFE identifies an 8-feature protocol with only a 10–15% nRMSE increase relative to the 15-feature original, maintaining anatomical fidelity and reproducibility, and outperforming Fisher information and heuristic subsetting (Uhl et al., 11 Sep 2025).

5. Limitations, Computational Costs, and Practical Recommendations

The main computational bottleneck in SHAP-guided pruning is the estimation of Shapley values, especially in wide layers or high-dimensional feature sets. For KANs, permutation sampling with antithetic coupling ($m = 64$–$128$ permutations) balances stability and runtime, with 1–2 seconds per layer typical on commodity GPUs (Fan et al., 2 Oct 2025). In CNNs, permutation- or regression-based approximations enable scalability to $n \sim 100$ filters (Adamczewski et al., 2024).
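
One common antithetic scheme, sketched below, pairs each sampled permutation with its reversal and averages the two marginal-contribution estimates; treating this as the exact coupling used by Fan et al. (2 Oct 2025) is an assumption, and `value_fn` is the same hypothetical utility as in Section 1.

```python
# Antithetic permutation sampling (sketch): each random permutation is paired
# with its reverse, whose marginal contributions are negatively correlated,
# reducing the variance of the Shapley estimates for a fixed budget.
import random

def _marginals(order, value_fn):
    phi, coalition, prev = {}, set(), value_fn(frozenset())
    for p in order:
        coalition.add(p)
        cur = value_fn(frozenset(coalition))
        phi[p], prev = cur - prev, cur
    return phi

def shapley_antithetic(players, value_fn, n_pairs=64, seed=0):
    rng = random.Random(seed)
    totals = {p: 0.0 for p in players}
    for _ in range(n_pairs):
        order = list(players)
        rng.shuffle(order)
        fwd = _marginals(order, value_fn)
        bwd = _marginals(order[::-1], value_fn)      # antithetic partner
        for p in players:
            totals[p] += 0.5 * (fwd[p] + bwd[p])
    return {p: t / n_pairs for p, t in totals.items()}
```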

For tabular models, TreeSHAP complexity is $O(Td^2)$, and the repeated regressions in SHAP-Select can grow costly for $d \gg 100$ (Kraev et al., 2024). In groupwise reselection (REFRESH), grouping by correlation is crucial; weak or overlapping correlations degrade the efficacy of SHAP group-subtraction approximations (Sharma et al., 2024).

Recommended practices include:

  • Use ratio-based pruning (e.g., removing neurons whose layer share falls below 3–5%) for simplicity and ease of tuning; a minimal sketch follows this list (Fan et al., 2 Oct 2025).
  • Select permutation/regression sample sizes to match rank-stability curves and monitor for convergence in importance sorting (Adamczewski et al., 2024).
  • For tabular models with moderate $d$, SHAP-Select offers substantial gains in interpretability with minimal retraining overhead (Kraev et al., 2024).
  • For fairness- or robustness-sensitive applications, use REFRESH to efficiently search Pareto-optimal feature subsets with respect to primary and secondary metrics, retraining only for plausibly optimal candidates (Sharma et al., 2024).
  • In time-series and LSTM models, CatNet achieves FDR control for pruned features via mirror statistics and SHAP derivatives, generalizable to RNN, CNN, and Transformer architectures (Han et al., 2024).
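
For the ratio-based rule in the first item above, the only number taken from the text is the 3–5% layer-share threshold; the function below is a minimal sketch with illustrative names and shapes.

```python
# Ratio-based pruning rule (sketch): keep a neuron only if its Shapley score
# accounts for at least `min_share` of the layer's total attributed value.
import numpy as np

def ratio_prune_mask(shapley_scores: np.ndarray, min_share: float = 0.04):
    shares = np.abs(shapley_scores) / max(np.abs(shapley_scores).sum(), 1e-12)
    return shares >= min_share               # boolean keep-mask per neuron

# Example: a 6-neuron layer where the last two neurons fall below a 4% share.
print(ratio_prune_mask(np.array([0.40, 0.25, 0.20, 0.10, 0.03, 0.02])))
# -> [ True  True  True  True False False]
```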

6. Extensions, Applications, and Research Directions

SHAP-guided pruning frameworks have been adapted to a variety of domains and objectives:

  • Structured Compression: In IoT IDS, SHAP-pruned feature subsets coupled with Kronecker-factorized layers and knowledge distillation yield tiny, performant models (Benaddi et al., 22 Dec 2025).
  • Secondary Objectives: REFRESH enables efficient feature “reselection,” optimizing secondary characteristics (fairness, robustness) post hoc without exhaustive retraining (Sharma et al., 2024).
  • Imaging Protocol Design: MRI sequence optimization uses SHAP–RFE to drastically reduce scan length while preserving all downstream parameter fidelity, outperforming information-theoretic and heuristic baselines (Uhl et al., 11 Sep 2025).
  • Symbolic Regression: In KANs, Shapley-based pruning preserves the network’s capability for symbolic function recovery under distribution shift—a property unattainable with magnitude-based pruning (Fan et al., 2 Oct 2025).

Challenges for future work include scaling SHAP approximation schemes to ultra-high-dimensional settings ($d > 10^4$), integrating conditional SHAP and groupwise dependencies beyond correlation to capture more complex feature interactions, and unifying SHAP-based attributions across architectural paradigms (transformers, GNNs).

7. Summary Table: Empirical Highlights and Use Cases

| Domain | Method | Best Computational Regime | Empirical Finding | Reference |
| --- | --- | --- | --- | --- |
| Neural Compression | Perm/Reg SHAP | $S \sim 10N$ | 7.9% error at $<15\%$ FLOPs | (Adamczewski et al., 2024) |
| KANs | Perm SHAP | $m = 64$ / $n$ | Stable ranks, 10–50% lower RMSE | (Fan et al., 2 Oct 2025) |
| Tabular | TreeSHAP+RFE | $d < 100$ | Maintains/improves BACC | (Oliveira et al., 22 Oct 2025) |
| IoT IDS | TreeSHAP+Kronecker | $K = 32$ features | $250\times$ smaller, macro-F1 $> 0.986$ | (Benaddi et al., 22 Dec 2025) |
| LSTM/Seq | CatNet | $p \sim 100$, MC SHAP | FDR control, robust selection | (Han et al., 2024) |
| Imaging/MRI | TreeSHAP+RFE | $N \sim 15$, TreeSHAP | 10–15% error increase for $2\times$ speedup | (Uhl et al., 11 Sep 2025) |

The unifying theme in SHAP-guided feature pruning is the replacement of ad hoc or coordinate-dependent criteria with theoretically grounded, robust, and empirically validated attribution, enabling deployable, efficient, and interpretable models in a wide range of scientific and engineering contexts.
