FeatureSHAP: Shapley-based Feature Attribution
- FeatureSHAP is a method that extends classical Shapley attribution to grouped and modular inputs, enabling model-agnostic and interaction-aware explanations.
- It integrates advanced conditional and interventional value functions, block-wise interaction tests, and fast algorithms for tree ensembles and large language models.
- FeatureSHAP supports practical feature selection and unbiased feature removal by leveraging rigorous nonparametric tests and explicit L2 bounds on loss approximation.
FeatureSHAP generalizes Shapley-based feature attribution by enabling principled explanations at the level of features, feature blocks, or modular input segments. It yields consistent, interpretable, and interaction-aware contributions that serve local and global interpretability, model selection, and actionable feature selection. FeatureSHAP integrates advances in conditional and interventional value functions, block-wise interaction tests, fast computation for complex loss functions, and model-agnostic perturbation protocols for LLMs and tree ensembles (Lundberg et al., 2017, Xu et al., 2024, Vitale et al., 23 Dec 2025, Richman et al., 2023, Bhattacharjee et al., 29 Mar 2025, Madakkatel et al., 2024, Jiang et al., 2024, 2207.14490).
1. Foundations: Shapley Formalism and Model-Agnostic Attribution
FeatureSHAP builds on the classical Shapley value paradigm of cooperative game theory, extending its domain from singleton features to arbitrarily defined feature groupings and input partitions. For a model $f$ with feature set $N$, the Shapley value of feature $i$ is defined as

$$\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N| - |S| - 1)!}{|N|!}\,\bigl[v(S \cup \{i\}) - v(S)\bigr],$$

where $v(S)$ is a context-dependent value function measuring the expected output or similarity when only the features in $S$ are present (Lundberg et al., 2017, Vitale et al., 23 Dec 2025).
The uniqueness theorem establishes that, given the axioms of local accuracy (efficiency), missingness, and consistency, the Shapley-based attribution framework is the unique additive solution for feature contributions under both conditional and interventional value functions (Lundberg et al., 2017).
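The exact computation enumerates all coalitions, which is tractable only for small feature sets but makes the definition concrete. The following minimal sketch uses an assumed toy value function `v`; real applications plug in a model-derived value function in its place.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating all coalitions.

    value_fn: maps a frozenset of feature indices S to v(S), e.g. an
              expected model output when only the features in S are present.
    Exponential in n_features, so only usable for small feature sets.
    """
    N = range(n_features)
    phi = [0.0] * n_features
    for i in N:
        others = [j for j in N if j != i]
        for size in range(len(others) + 1):
            for S in combinations(others, size):
                weight = (factorial(len(S)) * factorial(n_features - len(S) - 1)
                          / factorial(n_features))
                phi[i] += weight * (value_fn(frozenset(S) | {i}) - value_fn(frozenset(S)))
    return phi

# Toy value function (illustrative): an additive game plus an interaction
# between features 0 and 1.
def v(S):
    base = sum(w for j, w in enumerate([1.0, 2.0, 0.5]) if j in S)
    return base + (1.5 if {0, 1} <= S else 0.0)

print(shapley_values(v, 3))  # efficiency: the values sum to v(N) - v(empty) = 5.0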
2. Block and Interaction-Aware Feature Attribution
Standard SHAP treats each feature independently, ignoring interaction effects, which can misrepresent model behavior in the presence of feature dependency or nonadditivity. FeatureSHAP advances block-level attribution by automatically partitioning the feature space into blocks of jointly interacting variables. Given a partition $\mathcal{B} = \{B_1, \dots, B_K\}$, the surrogate additive model takes the form

$$g_{\mathcal{B}}(x) = \phi_0 + \sum_{k=1}^{K} \phi_{B_k}(x_{B_k}),$$

with the optimal partition chosen to maximize fidelity to $f$ subject to a penalization of explanation complexity,

$$\mathcal{B}^{*} = \arg\max_{\mathcal{B}}\; \mathrm{Fid}\bigl(g_{\mathcal{B}}, f\bigr) - \lambda\,\Omega(\mathcal{B}),$$

where $\lambda$ controls the balance between representativeness and interpretability (Xu et al., 2024). Pairwise interaction between features $i$ and $j$ is detected via Welch $t$-tests on the second-difference statistic

$$\delta_{ij}(S) = v(S \cup \{i, j\}) - v(S \cup \{i\}) - v(S \cup \{j\}) + v(S),$$

enabling efficient construction of an undirected “interaction graph” and pruning the search space for viable partitions.
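A minimal sketch of this interaction-screening step is given below. It assumes a coalition value function `value_fn` and compares the marginal contribution of one feature with and without the other using Welch's test (`scipy.stats.ttest_ind` with `equal_var=False`); the exact statistic, sampling scheme, and thresholds in Xu et al. (2024) may differ.

```python
import numpy as np
from itertools import combinations
from scipy.stats import ttest_ind

def interaction_graph(value_fn, n_features, n_samples=200, alpha=0.05, rng=None):
    """Sketch: detect pairwise interactions with Welch t-tests.

    For each pair (i, j), compare the marginal contribution of j over random
    coalitions that include i against coalitions that exclude i.  In practice
    v(S) is estimated from data, so both samples carry sampling noise.  A
    significant difference in means suggests i and j interact; the resulting
    edges form an undirected interaction graph whose connected components can
    seed candidate block partitions.
    """
    rng = rng or np.random.default_rng(0)
    edges = []
    for i, j in combinations(range(n_features), 2):
        with_i, without_i = [], []
        for _ in range(n_samples):
            rest = [k for k in range(n_features) if k not in (i, j)]
            S = frozenset(k for k in rest if rng.random() < 0.5)
            without_i.append(value_fn(S | {j}) - value_fn(S))
            with_i.append(value_fn(S | {i, j}) - value_fn(S | {i}))
        _, p = ttest_ind(with_i, without_i, equal_var=False)  # Welch's t-test
        if p < alpha:
            edges.append((i, j))
    return edges
```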
3. Fast Algorithms for Tree Ensembles and Quadratic Loss
TreeSHAP efficiently computes per-feature attributions in polynomial time for tree ensembles by exploiting the recursive structure and partitioning of decision trees (Lundberg et al., 2017). For global scores such as featurewise $R^2$, FeatureSHAP applies FFT-based summation and combinatorial dimension reduction. For the quadratic loss, which expands as

$$\bigl(y - \hat f_S(x)\bigr)^2 = y^2 - 2\,y\,\hat f_S(x) + \hat f_S(x)^2,$$

the Shapley value for feature $j$ with respect to the loss decomposes, by linearity of the Shapley value over games, as

$$\phi_j(\ell) = -2\,y\,\phi_j\bigl(\hat f\bigr) + \phi_j\bigl(\hat f^{\,2}\bigr),$$

with each term handled through polynomial and FFT-based expansion, yielding efficient per-tree and per-sample computation (Jiang et al., 2024).
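The decomposition itself can be checked by brute force on a toy game. The sketch below uses exact enumeration over feature orderings and an assumed additive model standing in for $\hat f_S$; it is not Q-SHAP's polynomial/FFT machinery, only a verification that the loss attribution equals the linear-plus-quadratic recombination.

```python
from itertools import permutations

def exact_shapley(value_fn, n):
    """Exact Shapley values by averaging marginal contributions over all
    feature orderings (fine for tiny n; Q-SHAP uses tree-specific
    polynomial/FFT tricks instead of enumeration)."""
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        S = set()
        for i in order:
            phi[i] += (value_fn(S | {i}) - value_fn(S)) / len(orders)
            S.add(i)
    return phi

# Toy restricted prediction: an additive model evaluated only on features in S
# (assumed stand-in for the tree ensemble's \hat f_S(x)).
weights = [1.0, 2.0, 0.5]
hat_f = lambda S: sum(weights[j] for j in S)

y = 2.0
phi_loss = exact_shapley(lambda S: (y - hat_f(S)) ** 2, 3)
phi_f    = exact_shapley(hat_f, 3)
phi_sq   = exact_shapley(lambda S: hat_f(S) ** 2, 3)

# Linearity of the Shapley value gives phi(loss) = -2*y*phi(f) + phi(f^2);
# the constant y^2 term contributes nothing.
recon = [-2 * y * a + b for a, b in zip(phi_f, phi_sq)]
assert all(abs(p - r) < 1e-9 for p, r in zip(phi_loss, recon))
```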
4. Conditional and Interventional Approaches
In the conditional SHAP protocol, the value function is the conditional expectation

$$v(S) = \mathbb{E}\bigl[f(X) \mid X_S = x_S\bigr],$$

preserving dependence between feature components and allowing for faithful importance estimation under real-world covariate structures (Richman et al., 2023). In contrast, interventional SHAP uses product marginals and averages over all possible feature combinations, which may mask true dependencies unless care is taken with the data support (Bhattacharjee et al., 29 Mar 2025). FeatureSHAP mandates explicit column permutation to estimate interventional scores over the extended support, thereby enabling theoretically sound feature selection and safe discard guarantees.
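The contrast between the two value functions can be made concrete with simple Monte Carlo estimators. The sketch below is illustrative only: the interventional estimator uses column permutation over a background sample, while the conditional estimator is an assumed kernel-weighted approximation of $\mathbb{E}[f(X)\mid X_S = x_S]$ (the cited works may use different conditional models).

```python
import numpy as np

def interventional_value(f, x, S, X_background, rng=None):
    """Interventional v(S): features outside S are drawn from the background
    data's marginals via row permutation, breaking dependence with x_S.
    f takes a 2-D array of rows; S is a sequence of feature indices."""
    rng = rng or np.random.default_rng(0)
    X = X_background[rng.permutation(len(X_background))].copy()
    X[:, list(S)] = x[list(S)]  # pin the explained point's values on S
    return f(X).mean()

def conditional_value(f, x, S, X_background, bandwidth=0.5):
    """Conditional v(S) = E[f(X) | X_S = x_S], approximated here by
    kernel-weighting background rows whose S-coordinates lie near x_S
    (an assumed estimator, not the cited papers' exact construction)."""
    d = X_background[:, list(S)] - x[list(S)]
    w = np.exp(-0.5 * (d ** 2).sum(axis=1) / bandwidth ** 2)
    return np.average(f(X_background), weights=w)
```

With strongly correlated features the two estimators can diverge noticeably, which is precisely the dependency-masking issue the conditional protocol is meant to address.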
5. Practical Feature Selection and Loss-Based Wrappers
FeatureSHAP underpins robust and unbiased feature selection algorithms, notably LLpowershap, which leverages Shapley values of the logistic loss to differentiate signal from noise and operationalizes feature selection via nonparametric statistical testing against injected random noise columns. LLpowershap’s protocol involves repeated model training, Interventional-TreeSHAP computation, and p-value calculation via the Mann–Whitney test, optionally controlled by power analysis for stability (Madakkatel et al., 2024).
Feature removal is grounded in the result that near-zero aggregate SHAP value (averaged over the product-marginal support) implies provably negligible impact on model accuracy, with explicit bounds on approximation error (Bhattacharjee et al., 29 Mar 2025).
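A schematic of the noise-benchmarked selection step is sketched below, assuming per-run importances have already been computed for the real features and for injected noise columns; the function name, aggregation, and threshold are illustrative, not LLpowershap's exact interface.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def noise_benchmark_selection(shap_runs, noise_runs, alpha=0.01):
    """Select features whose SHAP-based importance exceeds injected noise.

    shap_runs : (n_runs, n_features) per-run importances for real features,
                e.g. mean |SHAP| of the logistic loss per training run.
    noise_runs: (n_runs, n_noise) the same quantity for random noise columns
                injected before each run.
    A one-sided Mann-Whitney U test asks whether a feature's importances are
    stochastically larger than the pooled noise importances.
    """
    noise_pool = noise_runs.ravel()
    selected = []
    for j in range(shap_runs.shape[1]):
        _, p = mannwhitneyu(shap_runs[:, j], noise_pool, alternative="greater")
        if p < alpha:
            selected.append(j)
    return selected
```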
6. Model-Agnostic Application: LLMs and Software Engineering Tasks
FeatureSHAP has been extended to explain outputs of LLMs by attributing the model's prediction to semantically meaningful input segments (prompt blocks, AST nodes) (Vitale et al., 23 Dec 2025). The algorithm modularly decomposes the input, systematically perturbs feature subsets, and evaluates attribution via task-specific similarity metrics (e.g., CodeBLEU or BERTScore). Monte Carlo sampling over feature coalitions and bias-corrected Shapley weights facilitate tractable computation in high-dimensional, tokenized inputs.
Empirical evaluation demonstrates FeatureSHAP’s effectiveness in assigning negligible importance to irrelevant features and providing higher fidelity than random or LLM-as-attributor baselines. Practitioner surveys confirm improved interpretability and decision-making in software engineering settings.
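A compact sketch of block-level attribution for an LLM is given below. It uses the simpler permutation-sampling Shapley estimator rather than the bias-corrected coalition weighting described above, and it assumes `generate` and `similarity` callables plus whitespace joining of blocks; all of these are illustrative choices, not the published protocol.

```python
import random

def block_shap_llm(blocks, generate, similarity, n_samples=32, seed=0):
    """Monte Carlo Shapley attribution over input blocks (a sketch).

    blocks     : list of semantically meaningful input segments
                 (prompt blocks, AST-node spans, ...).
    generate   : callable mapping a prompt string to a model output.
    similarity : task-specific metric (e.g. CodeBLEU, BERTScore) comparing an
                 output against the reference output on the full prompt.
    Each block is credited with the change in similarity when it is added to
    the preceding blocks in a random permutation; note every coalition
    triggers a model call, so n_samples governs cost as well as variance.
    """
    rng = random.Random(seed)
    reference = generate(" ".join(blocks))

    def value(keep):
        prompt = " ".join(b for i, b in enumerate(blocks) if i in keep)
        return similarity(generate(prompt), reference)

    phi = [0.0] * len(blocks)
    for _ in range(n_samples):
        order = list(range(len(blocks)))
        rng.shuffle(order)
        kept, prev = set(), value(set())
        for i in order:
            kept.add(i)
            cur = value(kept)
            phi[i] += (cur - prev) / n_samples
            prev = cur
    return phi
```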
| Application Domain | FeatureSHAP Protocol | Key Guarantees / Properties |
|---|---|---|
| Tree ensembles | TreeSHAP, Q-SHAP, LLpowershap | Consistency, local accuracy, polynomial runtime, safe discard |
| LLMs & SE tasks | Block-level, perturbation, similarity comparison | Model-agnostic, high-fidelity explanations, human-aligned |
| General black-box | Block-wise, partition search, interaction graph | Succinct interaction-aware additive decompositions |
7. Limitations, Robustness, and Future Directions
FeatureSHAP advances both granularity and faithfulness in model interpretation but is subject to practical challenges: sensitivity to feature definition and partitioning, computational scaling with block size, and the assumptions underlying value functions and similarity metrics. The dependency structure of the data, the choice of conditional vs. interventional SHAP, and the nature of interactions directly impact robustness.
Ongoing research is focused on mechanistic interpretability (causal activation patching), dependency-aware attributions, personalized feature splitters for human-aligned explanations, and adaptation to code-to-code tasks and non-trivial output spaces (Vitale et al., 23 Dec 2025, Xu et al., 2024). A plausible implication is that further integration with mechanistic probing and hybrid interpretability protocols will expand FeatureSHAP’s utility across increasingly complex models and data modalities.