FeatureSHAP: Shapley-based Feature Attribution
- FeatureSHAP is a method that extends classical Shapley attribution to grouped and modular inputs, enabling model-agnostic and interaction-aware explanations.
- It integrates advanced conditional and interventional value functions, block-wise interaction tests, and fast algorithms for tree ensembles and large language models.
- FeatureSHAP supports practical feature selection and unbiased feature removal by leveraging rigorous nonparametric tests and explicit L2 bounds on loss approximation.
FeatureSHAP generalizes Shapley-based feature attribution by enabling principled explanations at the level of features, feature blocks, or modular input segments. It yields consistent, interpretable, and interaction-aware contributions that serve local and global interpretability, model selection, and actionable feature selection. FeatureSHAP integrates advances in conditional and interventional value functions, block-wise interaction tests, fast computation for complex loss functions, and model-agnostic perturbation protocols for LLMs and tree ensembles (Lundberg et al., 2017, Xu et al., 2024, Vitale et al., 23 Dec 2025, Richman et al., 2023, Bhattacharjee et al., 29 Mar 2025, Madakkatel et al., 2024, Jiang et al., 2024, 2207.14490).
1. Foundations: Shapley Formalism and Model-Agnostic Attribution
FeatureSHAP builds on the classical Shapley value paradigm of cooperative game theory, extending its domain from singleton features to arbitrarily defined feature groupings and input partitions. For a model $f$ with feature set $N$, the Shapley value of feature $i$ is defined as

$$\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N| - |S| - 1)!}{|N|!}\,\bigl[v(S \cup \{i\}) - v(S)\bigr],$$

where $v(S)$ is a context-dependent value function measuring the expected output or similarity when only the features in $S$ are present (Lundberg et al., 2017, Vitale et al., 23 Dec 2025).
The uniqueness theorem establishes that, given the axioms of local accuracy (efficiency), missingness, and consistency, the Shapley-based attribution framework is the unique additive solution for feature contributions under both conditional and interventional value functions (Lundberg et al., 2017).
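The exact computation enumerates all coalitions, which is tractable only for small feature sets but makes the definition concrete. The following minimal sketch uses an assumed toy value function `v`; real applications plug in a model-derived value function in its place.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating all coalitions.

    value_fn: maps a frozenset of feature indices S to v(S), e.g. an
              expected model output when only the features in S are present.
    Exponential in n_features, so only usable for small feature sets.
    """
    N = range(n_features)
    phi = [0.0] * n_features
    for i in N:
        others = [j for j in N if j != i]
        for size in range(len(others) + 1):
            for S in combinations(others, size):
                weight = (factorial(len(S)) * factorial(n_features - len(S) - 1)
                          / factorial(n_features))
                phi[i] += weight * (value_fn(frozenset(S) | {i}) - value_fn(frozenset(S)))
    return phi

# Toy value function (illustrative): an additive game plus an interaction
# between features 0 and 1.
def v(S):
    base = sum(w for j, w in enumerate([1.0, 2.0, 0.5]) if j in S)
    return base + (1.5 if {0, 1} <= S else 0.0)

print(shapley_values(v, 3))  # efficiency: the values sum to v(N) - v(empty) = 5.0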
2. Block and Interaction-Aware Feature Attribution
Standard SHAP treats each feature independently, ignoring interaction effects, which can misrepresent model behavior in the presence of feature dependency or nonadditivity. FeatureSHAP advances block-level attribution by automatically partitioning the feature space into blocks of jointly interacting variables. Given a partition $\mathcal{B} = \{B_1, \dots, B_K\}$, the surrogate additive model takes the form

$$g_{\mathcal{B}}(x) = \phi_0 + \sum_{k=1}^{K} \phi_{B_k}(x_{B_k}),$$

with the optimal partition chosen to maximize fidelity to $f$ subject to a penalization of explanation complexity,

$$\mathcal{B}^{*} = \arg\max_{\mathcal{B}}\; \mathrm{Fid}\bigl(g_{\mathcal{B}}, f\bigr) - \lambda\,\Omega(\mathcal{B}),$$

where $\lambda$ controls the balance between representativeness and interpretability (Xu et al., 2024). Pairwise interaction between features $i$ and $j$ is detected via Welch $t$-tests on the second-difference statistic

$$\delta_{ij}(S) = v(S \cup \{i, j\}) - v(S \cup \{i\}) - v(S \cup \{j\}) + v(S),$$

enabling efficient construction of an undirected “interaction graph” and pruning the search space for viable partitions.
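A minimal sketch of this interaction-screening step is given below. It assumes a coalition value function `value_fn` and compares the marginal contribution of one feature with and without the other using Welch's test (`scipy.stats.ttest_ind` with `equal_var=False`); the exact statistic, sampling scheme, and thresholds in Xu et al. (2024) may differ.

```python
import numpy as np
from itertools import combinations
from scipy.stats import ttest_ind

def interaction_graph(value_fn, n_features, n_samples=200, alpha=0.05, rng=None):
    """Sketch: detect pairwise interactions with Welch t-tests.

    For each pair (i, j), compare the marginal contribution of j over random
    coalitions that include i against coalitions that exclude i.  In practice
    v(S) is estimated from data, so both samples carry sampling noise.  A
    significant difference in means suggests i and j interact; the resulting
    edges form an undirected interaction graph whose connected components can
    seed candidate block partitions.
    """
    rng = rng or np.random.default_rng(0)
    edges = []
    for i, j in combinations(range(n_features), 2):
        with_i, without_i = [], []
        for _ in range(n_samples):
            rest = [k for k in range(n_features) if k not in (i, j)]
            S = frozenset(k for k in rest if rng.random() < 0.5)
            without_i.append(value_fn(S | {j}) - value_fn(S))
            with_i.append(value_fn(S | {i, j}) - value_fn(S | {i}))
        _, p = ttest_ind(with_i, without_i, equal_var=False)  # Welch's t-test
        if p < alpha:
            edges.append((i, j))
    return edges
```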
3. Fast Algorithms for Tree Ensembles and Quadratic Loss
TreeSHAP efficiently computes per-feature attributions in polynomial time for tree ensembles by exploiting the recursive structure and partitioning of decision trees (Lundberg et al., 2017). For global scores such as featurewise $R^2$, FeatureSHAP applies FFT-based summation and combinatorial dimension reduction. For the quadratic loss, which expands as

$$\bigl(y - \hat f_S(x)\bigr)^2 = y^2 - 2\,y\,\hat f_S(x) + \hat f_S(x)^2,$$

the Shapley value for feature $j$ with respect to the loss decomposes, by linearity of the Shapley value over games, as

$$\phi_j(\ell) = -2\,y\,\phi_j\bigl(\hat f\bigr) + \phi_j\bigl(\hat f^{\,2}\bigr),$$

with each term handled through polynomial and FFT-based expansion, yielding efficient per-tree and per-sample computation (Jiang et al., 2024).
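The decomposition itself can be checked by brute force on a toy game. The sketch below uses exact enumeration over feature orderings and an assumed additive model standing in for $\hat f_S$; it is not Q-SHAP's polynomial/FFT machinery, only a verification that the loss attribution equals the linear-plus-quadratic recombination.

```python
from itertools import permutations

def exact_shapley(value_fn, n):
    """Exact Shapley values by averaging marginal contributions over all
    feature orderings (fine for tiny n; Q-SHAP uses tree-specific
    polynomial/FFT tricks instead of enumeration)."""
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        S = set()
        for i in order:
            phi[i] += (value_fn(S | {i}) - value_fn(S)) / len(orders)
            S.add(i)
    return phi

# Toy restricted prediction: an additive model evaluated only on features in S
# (assumed stand-in for the tree ensemble's \hat f_S(x)).
weights = [1.0, 2.0, 0.5]
hat_f = lambda S: sum(weights[j] for j in S)

y = 2.0
phi_loss = exact_shapley(lambda S: (y - hat_f(S)) ** 2, 3)
phi_f    = exact_shapley(hat_f, 3)
phi_sq   = exact_shapley(lambda S: hat_f(S) ** 2, 3)

# Linearity of the Shapley value gives phi(loss) = -2*y*phi(f) + phi(f^2);
# the constant y^2 term contributes nothing.
recon = [-2 * y * a + b for a, b in zip(phi_f, phi_sq)]
assert all(abs(p - r) < 1e-9 for p, r in zip(phi_loss, recon))
```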
4. Conditional and Interventional Approaches
In the conditional SHAP protocol, the value function is the conditional expectation

$$v(S) = \mathbb{E}\bigl[f(X) \mid X_S = x_S\bigr],$$

preserving dependence between feature components and allowing for faithful importance estimation under real-world covariate structures (Richman et al., 2023). In contrast, interventional SHAP uses product marginals and averages over all possible feature combinations, which may mask true dependencies unless care is taken with the data support (Bhattacharjee et al., 29 Mar 2025). FeatureSHAP mandates explicit column permutation to estimate interventional scores over the extended support, thereby enabling theoretically sound feature selection and safe discard guarantees.
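The contrast between the two value functions can be made concrete with simple Monte Carlo estimators. The sketch below is illustrative only: the interventional estimator uses column permutation over a background sample, while the conditional estimator is an assumed kernel-weighted approximation of $\mathbb{E}[f(X)\mid X_S = x_S]$ (the cited works may use different conditional models).

```python
import numpy as np

def interventional_value(f, x, S, X_background, rng=None):
    """Interventional v(S): features outside S are drawn from the background
    data's marginals via row permutation, breaking dependence with x_S.
    f takes a 2-D array of rows; S is a sequence of feature indices."""
    rng = rng or np.random.default_rng(0)
    X = X_background[rng.permutation(len(X_background))].copy()
    X[:, list(S)] = x[list(S)]  # pin the explained point's values on S
    return f(X).mean()

def conditional_value(f, x, S, X_background, bandwidth=0.5):
    """Conditional v(S) = E[f(X) | X_S = x_S], approximated here by
    kernel-weighting background rows whose S-coordinates lie near x_S
    (an assumed estimator, not the cited papers' exact construction)."""
    d = X_background[:, list(S)] - x[list(S)]
    w = np.exp(-0.5 * (d ** 2).sum(axis=1) / bandwidth ** 2)
    return np.average(f(X_background), weights=w)
```

With strongly correlated features the two estimators can diverge noticeably, which is precisely the dependency-masking issue the conditional protocol is meant to address.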
5. Practical Feature Selection and Loss-Based Wrappers
FeatureSHAP underpins robust and unbiased feature selection algorithms, notably LLpowershap, which leverages Shapley values of the logistic loss to differentiate signal from noise and operationalizes feature selection via nonparametric statistical testing against injected random noise columns. LLpowershap’s protocol involves repeated model training, Interventional-TreeSHAP computation, and p-value calculation via the Mann–Whitney test, optionally controlled by power analysis for stability (Madakkatel et al., 2024).
Feature removal is grounded in the result that near-zero aggregate SHAP value (averaged over the product-marginal support) implies provably negligible impact on model accuracy, with explicit bounds on approximation error (Bhattacharjee et al., 29 Mar 2025).
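A schematic of the noise-benchmarked selection step is sketched below, assuming per-run importances have already been computed for the real features and for injected noise columns; the function name, aggregation, and threshold are illustrative, not LLpowershap's exact interface.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def noise_benchmark_selection(shap_runs, noise_runs, alpha=0.01):
    """Select features whose SHAP-based importance exceeds injected noise.

    shap_runs : (n_runs, n_features) per-run importances for real features,
                e.g. mean |SHAP| of the logistic loss per training run.
    noise_runs: (n_runs, n_noise) the same quantity for random noise columns
                injected before each run.
    A one-sided Mann-Whitney U test asks whether a feature's importances are
    stochastically larger than the pooled noise importances.
    """
    noise_pool = noise_runs.ravel()
    selected = []
    for j in range(shap_runs.shape[1]):
        _, p = mannwhitneyu(shap_runs[:, j], noise_pool, alternative="greater")
        if p < alpha:
            selected.append(j)
    return selected
```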
6. Model-Agnostic Application: LLMs and Software Engineering Tasks
FeatureSHAP has been extended to explain outputs of LLMs by attributing the model's prediction to semantically meaningful input segments (prompt blocks, AST nodes) (Vitale et al., 23 Dec 2025). The algorithm modularly decomposes the input, systematically perturbs feature subsets, and evaluates attribution via task-specific similarity metrics (e.g., CodeBLEU or BERTScore). Monte Carlo sampling over feature coalitions and bias-corrected Shapley weights facilitate tractable computation in high-dimensional, tokenized inputs.
Empirical evaluation demonstrates FeatureSHAP’s effectiveness in assigning negligible importance to irrelevant features and providing higher fidelity than random or LLM-as-attributor baselines. Practitioner surveys confirm improved interpretability and decision-making in software engineering settings.
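A compact sketch of block-level attribution for an LLM is given below. It uses the simpler permutation-sampling Shapley estimator rather than the bias-corrected coalition weighting described above, and it assumes `generate` and `similarity` callables plus whitespace joining of blocks; all of these are illustrative choices, not the published protocol.

```python
import random

def block_shap_llm(blocks, generate, similarity, n_samples=32, seed=0):
    """Monte Carlo Shapley attribution over input blocks (a sketch).

    blocks     : list of semantically meaningful input segments
                 (prompt blocks, AST-node spans, ...).
    generate   : callable mapping a prompt string to a model output.
    similarity : task-specific metric (e.g. CodeBLEU, BERTScore) comparing an
                 output against the reference output on the full prompt.
    Each block is credited with the change in similarity when it is added to
    the preceding blocks in a random permutation; note every coalition
    triggers a model call, so n_samples governs cost as well as variance.
    """
    rng = random.Random(seed)
    reference = generate(" ".join(blocks))

    def value(keep):
        prompt = " ".join(b for i, b in enumerate(blocks) if i in keep)
        return similarity(generate(prompt), reference)

    phi = [0.0] * len(blocks)
    for _ in range(n_samples):
        order = list(range(len(blocks)))
        rng.shuffle(order)
        kept, prev = set(), value(set())
        for i in order:
            kept.add(i)
            cur = value(kept)
            phi[i] += (cur - prev) / n_samples
            prev = cur
    return phi
```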
| Application Domain | FeatureSHAP Protocol | Key Guarantees / Properties |
|---|---|---|
| Tree ensembles | TreeSHAP, Q-SHAP, LLpowershap | Consistency, local accuracy, polynomial runtime, safe discard |
| LLMs & SE tasks | Block-level, perturbation, similarity comparison | Model-agnostic, high-fidelity explanations, human-aligned |
| General black-box | Block-wise, partition search, interaction graph | Succinct interaction-aware additive decompositions |
7. Limitations, Robustness, and Future Directions
FeatureSHAP advances both granularity and faithfulness in model interpretation but is subject to practical challenges: sensitivity to feature definition and partitioning, computational scaling with block size, and the assumptions underlying value functions and similarity metrics. The dependency structure of the data, the choice of conditional vs. interventional SHAP, and the nature of interactions directly impact robustness.
Ongoing research is focused on mechanistic interpretability (causal activation patching), dependency-aware attributions, personalized feature splitters for human-aligned explanations, and adaptation to code-to-code tasks and non-trivial output spaces (Vitale et al., 23 Dec 2025, Xu et al., 2024). A plausible implication is that further integration with mechanistic probing and hybrid interpretability protocols will expand FeatureSHAP’s utility across increasingly complex models and data modalities.