
KernelSHAP-IQ: Advanced Shapley Explanations

Updated 23 March 2026
  • KernelSHAP-IQ is a refined algorithm suite that integrates game theory, weighted least-squares regression, and paired sampling to efficiently estimate Shapley values and interactions.
  • It provides unbiased, consistent estimators with variance reduction via deterministic weighting and antithetic sampling, reducing model evaluation costs significantly.
  • The approach scales to high-dimensional settings and extends to higher-order interactions, achieving robust empirical validation across diverse machine learning tasks.

KernelSHAP-IQ is a suite of advancements and theoretical refinements of the KernelSHAP algorithm for estimating Shapley values and higher-order Shapley interactions when explaining black-box machine learning models. Drawing on game theory, weighted least-squares regression, and sampling-based approximation, KernelSHAP-IQ delivers unbiased estimation, efficient variance reduction, higher-order interaction indices, deterministic weighting, and substantial algorithmic improvements for scalability, all with rigorous statistical guarantees and empirical validation.

1. Theoretical Foundations and KernelSHAP Formalism

KernelSHAP-IQ builds on the canonical formulation of KernelSHAP, which computes feature attributions by solving a weighted least-squares (WLS) regression problem over feature coalitions. For a model-behavior function $\nu: 2^D \to \mathbb{R}$ on the feature set $D = \{1, \dots, d\}$, the Shapley value vector $\phi = (\phi_1, \dots, \phi_d)$ is obtained as the unique solution to

$$\phi = \arg\min_{\varphi \in \mathbb{R}^d} \mathbb{E}_{S \sim p}\Big[\Big(\nu(S) - \sum_{i \in S} \varphi_i\Big)^2\Big], \qquad \sum_{i=1}^d \varphi_i = \nu(D).$$

The sampling distribution $p(S) \propto \mu(|S|)$ with the canonical Shapley kernel

$$\mu(t) = \frac{1}{d-1} \binom{d-2}{t-1}^{-1}, \quad t = 1, \ldots, d-1,$$

ensures that the WLS solution is precisely the Shapley value estimator in the population limit. KernelSHAP approximates this objective using Monte Carlo sampling, computing estimates efficiently in high-dimensional settings (Fumagalli et al., 2023, Fumagalli et al., 2024, Covert et al., 2020).
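As a concrete illustration, the kernel and its induced distribution over coalition sizes can be computed directly. This is a minimal sketch (function names are illustrative): the probability of drawing a coalition of size $t$ is $\mu(t)$ multiplied by the number $\binom{d}{t}$ of size-$t$ coalitions, normalized.

```python
import math

def shapley_kernel(d, t):
    """Canonical Shapley kernel weight mu(t) for coalition size t, 1 <= t <= d-1."""
    return 1.0 / ((d - 1) * math.comb(d - 2, t - 1))

def size_distribution(d):
    """Probability that a coalition drawn from p(S) ~ mu(|S|) has size t:
    proportional to mu(t) times the number of size-t coalitions C(d, t)."""
    raw = [shapley_kernel(d, t) * math.comb(d, t) for t in range(1, d)]
    z = sum(raw)
    return [r / z for r in raw]

probs = size_distribution(10)
```

Note that $\mu(t)\binom{d}{t} = d/(t(d-t))$, so the kernel concentrates probability mass on very small and very large coalitions, which carry the most information about individual contributions.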

2. Unbiasedness, Consistency, and Variance Reduction

KernelSHAP-IQ provides a closed-form, unbiased, and consistent estimator for the Shapley value via a decomposition into deterministic and stochastic components. Using a single sum over sampled subsets and the symmetry of the Shapley kernel, the estimator for feature $i$ is

$$\widehat{I}(i) = c_1(i) + \frac{R}{K} \sum_{k=1}^K \nu_0(T_k) \Big[\mathbf{1}(i \in T_k) - \frac{|T_k|}{d}\Big],$$

where $c_1(i) = \nu_0(D)/d$, $R = 2H_{d-1}$, and the $T_k$ are sampled according to $p(T) \propto \mu(|T|)$. This construction yields provable unbiasedness ($\mathbb{E}[\widehat{I}(i)] = I^{SV}(i)$), almost-sure consistency ($\widehat{I}(i) \to I^{SV}(i)$ as $K \to \infty$), and a Chebyshev tail bound on the deviation probability. The equivalence of KernelSHAP-IQ to the unbiased KernelSHAP estimator is formalized by explicit inversion of the population covariance in the WLS system, but KernelSHAP-IQ achieves this in a single weighted sum per sample, avoiding the need to solve a linear system (Fumagalli et al., 2023, Covert et al., 2020).
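The estimator can be sketched end-to-end as follows. This is a minimal illustrative implementation, assuming $\nu_0$ is the centered game with $\nu_0(\emptyset) = 0$ and reading $H_{d-1}$ as the $(d-1)$-th harmonic number; the toy additive game and function names are illustrative.

```python
import math
import random

def unbiased_shapley_estimate(nu0, d, K, seed=0):
    """Single-sum estimator: I_hat(i) = c1(i) + (R/K) * sum_k nu0(T_k) [1(i in T_k) - |T_k|/d].

    nu0 : callable on frozensets with nu0(empty set) = 0 (centered game, assumed).
    """
    rng = random.Random(seed)
    # p(t) proportional to mu(t) * C(d, t) = d / (t (d - t))
    weights = [d / (t * (d - t)) for t in range(1, d)]
    R = 2.0 * sum(1.0 / j for j in range(1, d))   # normalizer R = 2 H_{d-1}
    c1 = nu0(frozenset(range(d))) / d             # deterministic component c_1(i)
    acc = [0.0] * d
    for _ in range(K):
        t = rng.choices(range(1, d), weights=weights)[0]  # draw a coalition size
        T = frozenset(rng.sample(range(d), t))            # uniform coalition of that size
        v = nu0(T)
        for i in range(d):
            acc[i] += v * ((1.0 if i in T else 0.0) - t / d)
    return [c1 + (R / K) * a for a in acc]

# Toy additive game: the exact Shapley value of feature i is w[i]
w = [1.0, 2.0, 3.0, 4.0]
est = unbiased_shapley_estimate(lambda S: sum(w[i] for i in S), d=4, K=20000)
```

For this additive game the estimates concentrate around $(1, 2, 3, 4)$, and the efficiency constraint $\sum_i \widehat{I}(i) = \nu_0(D)$ holds exactly by construction, since the bracketed terms sum to zero over $i$.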

To further accelerate convergence, paired (antithetic) sampling—drawing each coalition along with its complement—yields a strict reduction in estimator variance. This technique produces a lower-variance confidence ellipsoid and speeds up convergence in empirical settings by factors of three to nine across different tasks (Covert et al., 2020, Fumagalli et al., 26 Jan 2026).
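Paired sampling can be sketched as follows (a simplified illustration with uniformly drawn coalition sizes; in the actual algorithm the size is drawn kernel-proportionally, and the symmetry $\mu(t) = \mu(d-t)$ is what makes the complement an equally weighted, negatively correlated sample):

```python
import random

def paired_coalitions(d, n_pairs, seed=0):
    """Draw coalitions in antithetic pairs: each coalition T is emitted
    together with its complement D \\ T, which reuses each random draw
    and reduces estimator variance."""
    rng = random.Random(seed)
    full = frozenset(range(d))
    out = []
    for _ in range(n_pairs):
        t = rng.randint(1, d - 1)                 # simplified: uniform size
        T = frozenset(rng.sample(range(d), t))
        out.append(T)
        out.append(full - T)                      # antithetic complement
    return out
```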

3. Higher-Order Shapley Interactions and Polynomial Surrogates

KernelSHAP-IQ extends beyond singleton Shapley values to compute Shapley interaction indices (SII) for interaction sets of arbitrary cardinality. For interaction order $k$, the $k$-additive surrogate

$$\widehat{\nu}_k(T) = \sum_{\substack{S \subseteq T \\ 1 \leq |S| \leq k}} \Phi_k(S),$$

aggregates SII terms up to order $k$, with a recursive formulation via Bernoulli-weighted sums. Feature interactions are thus encoded as solutions to iterated WLS problems over design matrices reflecting coalition memberships, with kernel weights specified to satisfy the linearity, symmetry, and dummy axioms.
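Evaluating the $k$-additive surrogate amounts to summing interaction coefficients over all non-empty subsets of $T$ up to size $k$; a minimal sketch (the data layout and names are illustrative):

```python
from itertools import combinations

def k_additive_surrogate(Phi, T, k):
    """nu_hat_k(T): sum of Phi(S) over all S subset of T with 1 <= |S| <= k.

    Phi : dict mapping frozenset interaction sets to SII coefficients;
          missing sets contribute zero.
    """
    total = 0.0
    for size in range(1, min(k, len(T)) + 1):
        for S in combinations(sorted(T), size):
            total += Phi.get(frozenset(S), 0.0)
    return total

# Illustrative order-2 coefficients on three features
Phi = {frozenset({0}): 1.0, frozenset({1}): 2.0, frozenset({0, 1}): 0.5}
```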

PolySHAP interprets KernelSHAP-IQ as polynomial regression:

  • For order $k = 2$ (pairwise interactions), the surrogate takes the form $h(z) = \sum_i \beta_i z_i + \sum_{i<j} \beta_{ij} z_i z_j$, and the Shapley value for feature $i$ is given by $\phi_i = \beta_i + \frac{1}{2}\sum_{j \neq i} \beta_{ij}$.
  • Empirically and algebraically, standard KernelSHAP with paired sampling yields exactly the same solution as degree-2 PolySHAP, so KernelSHAP-IQ can leverage paired sampling to consistently estimate both main effects and pairwise interactions without direct second-order regression (Fumagalli et al., 26 Jan 2026, Fumagalli et al., 2024).
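The aggregation $\phi_i = \beta_i + \frac{1}{2}\sum_{j \neq i} \beta_{ij}$ is a one-line reduction once the degree-2 coefficients are available; a small sketch with illustrative coefficients:

```python
def shapley_from_degree2(beta, beta_pair):
    """Recover Shapley values from a degree-2 surrogate:
    phi_i = beta_i + 0.5 * sum_{j != i} beta_ij.

    beta_pair : dict keyed by ordered pairs (i, j) with i < j.
    """
    phi = list(beta)
    for (i, j), b in beta_pair.items():
        phi[i] += 0.5 * b    # each pairwise effect splits evenly
        phi[j] += 0.5 * b
    return phi

phi = shapley_from_degree2([1.0, 2.0, 0.0], {(0, 1): 0.4, (1, 2): -0.2})
```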

A closed-form, efficient WLS solution for SII of order $k = 2$ has been established; for $k > 2$, empirical results confirm the extension's accuracy, though a formal proof remains open for arbitrary $k$ (Fumagalli et al., 2024).

4. Deterministic and Improved Weighting Schemes

A further innovation in KernelSHAP-IQ replaces the stochastic sampling weights of traditional KernelSHAP with deterministic corrections (coupon-collector quantile or CEL-kernel weights), analytically reducing estimator variance:

$$w_S = \frac{p_S}{1 - (1 - 2p_S)^{\mathbb{E}[L]/2}},$$

where $\mathbb{E}[L]$ is the expected number of draws needed to sample enough unique coalitions. Deterministic weights preserve unbiasedness while guaranteeing strictly lower unconditional variance than random weighting, by the law of total variance. The weight correction is compatible with paired sampling and in practice lowers the required evaluation budget by 20–50% at fixed MAE (Olsen et al., 2024).
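The correction itself can be sketched directly (here `p_S` is read as the per-draw inclusion probability of coalition $S$ under paired sampling and `expected_draws` stands for $\mathbb{E}[L]$; both names are illustrative assumptions):

```python
def corrected_weight(p_S, expected_draws):
    """Deterministic weight w_S = p_S / (1 - (1 - 2 p_S)^(E[L] / 2)).

    Replaces the random empirical frequency of coalition S with its
    analytic expectation, removing weight noise from the estimator."""
    return p_S / (1.0 - (1.0 - 2.0 * p_S) ** (expected_draws / 2.0))
```

As the budget grows, $(1 - 2p_S)^{\mathbb{E}[L]/2} \to 0$ for $0 < p_S < 1/2$, so $w_S \to p_S$: the correction matters most in the small-budget regime, which is exactly where variance reduction pays off.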

5. Algorithmic Implementations and Scalability

KernelSHAP-IQ maintains computational feasibility in moderate to high dimensions by:

  • Linear runtime scaling in $K$ (the number of samples) and minimal overhead in weight computation; batched and vectorized evaluation routines further exploit hardware parallelism.
  • Employing sampling-by-subset-size stratification and deterministic inclusion of “border” coalitions (all-zeros/all-ones) to stabilize the estimator.
  • Offering plug-in integration with existing KernelSHAP implementations, e.g., via subclassing KernelExplainer in Python with deterministic or improved weights (Olsen et al., 2024).
  • For $d \leq 30$, higher-order interaction estimation remains feasible; for $d \sim 3000$ (e.g., CIFAR-10), optimizations such as bucket-Poisson sampling and fast weighting allow KernelSHAP-IQ to produce competitive attributions at scale (Chen et al., 5 Jun 2025).

Algorithmic steps for order-$k$ KernelSHAP-IQ:

  1. Sample $b$ distinct, possibly paired, coalitions using kernel-proportional probabilities and record associated weights.
  2. Build the WLS design matrix for the desired interaction order and solve for the interaction coefficients iteratively.
  3. Aggregate estimates as prescribed by the Bernoulli-weighted transformation for $k$-Shapley values.
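For small $d$ the pipeline can be run exactly, enumerating every interior coalition instead of sampling in step 1. The following self-contained sketch solves the order-1 WLS problem with the efficiency constraint substituted in; the names and toy game are illustrative, and a full implementation would use sampled coalitions and higher-order design matrices.

```python
import math
from itertools import combinations

def gauss_solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def exact_wls_shapley(nu, d):
    """Order-1 KernelSHAP via exhaustive WLS with Shapley-kernel weights.
    Efficiency is enforced by substituting phi_d = (nu(D) - nu(0)) - sum(others)."""
    nuD, nuE = nu(set(range(d))), nu(set())
    n = d - 1
    A = [[0.0] * n for _ in range(n)]
    rhs = [0.0] * n
    for t in range(1, d):
        w = 1.0 / ((d - 1) * math.comb(d - 2, t - 1))   # Shapley kernel mu(t)
        for S in combinations(range(d), t):
            z = [1.0 if i in S else 0.0 for i in range(d)]
            x = [z[i] - z[d - 1] for i in range(n)]      # constraint-substituted design row
            y = nu(set(S)) - nuE - z[d - 1] * (nuD - nuE)
            for i in range(n):
                rhs[i] += w * x[i] * y
                for j in range(n):
                    A[i][j] += w * x[i] * x[j]
    phi = gauss_solve(A, rhs)
    phi.append((nuD - nuE) - sum(phi))
    return phi
```

On a toy game with an interaction, e.g. $\nu(S) = \sum_{i \in S} w_i + \mathbf{1}(\{0,1\} \subseteq S)$, the solution matches the exact Shapley values, with the interaction split evenly between features 0 and 1.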

6. Applications and Empirical Validation

KernelSHAP-IQ has demonstrated:

  • Superior mean squared error (MSE) and higher top-$K$ precision than permutation-sampling and non-WLS interaction baselines across natural language (DistilBERT), vision (ResNet18, ViT), tabular tasks, and synthetic games.
  • In practice, paired deterministic weighting schemes halve the number of model evaluations while maintaining a given error rate (Olsen et al., 2024, Fumagalli et al., 2023).
  • In time-series settings, KernelSHAP-IQ naturally extends to feature–time pairs, employing a vector autoregressive surrogate (“VARSHAP”) with lag-based attributions, closed-form solutions for linear models (AR, MA, ARMA, VARMAX), and event-detection by aggregating overlapping explanations (Villani et al., 2022).
  • For global and local optimization of anomaly detection pipelines, KernelSHAP-IQ-driven feature selection leads to marked gains in accuracy and data efficiency. In one documented case, $\alpha = 0.90$, $F_1 = 0.76$, and a 28-percentage-point reduction in false positives were achieved by retraining a model on SHAP-IQ-selected features (Roshan et al., 2023).

7. Practical Guidance and Limitations

Best practices for KernelSHAP-IQ:

  • Set the sample budget to $b = 500$–$2000$, or $100 \times d$, for stability; increase it for higher-order interactions or weaker signal strengths.
  • Prefer $k = 2$ (pairwise SII) for most applications; higher $k$ is meaningful only if $2k \leq d$, to maintain estimator stability (Fumagalli et al., 2024).
  • Always report empirical standard errors or confidence estimates, aggregating over multiple runs to quantify uncertainty.
  • For time-series, employ lagged surrogates or time-consistent subgame decomposition to enforce consistency across look-back windows.
  • In very high-dimensional regimes with limited budget, paired sampling with deterministic weighting offers the best balance of computational cost, variance, and scalability.

Known limitations:

  • At higher-order interactions or for small sample sizes, estimator noise can be considerable; variance estimates should always be consulted.
  • The extension to arbitrary interaction order $k > 4$ and $d > 30$ is limited by cubic scaling in design-matrix operations.
  • Theoretical guarantees for exact SII via WLS have so far been proven for $k \leq 2$; $k > 2$ remains a subject of ongoing research.

Key References: (Fumagalli et al., 2023, Fumagalli et al., 2024, Chen et al., 5 Jun 2025, Covert et al., 2020, Olsen et al., 2024, Villani et al., 2022, Roshan et al., 2023, Fumagalli et al., 26 Jan 2026)
