
KernelSHAP-IQ: Advanced Shapley Explanations

Updated 23 March 2026
  • KernelSHAP-IQ is a refined algorithm suite that integrates game theory, weighted least-squares regression, and paired sampling to efficiently estimate Shapley values and interactions.
  • It provides unbiased, consistent estimators with variance reduction via deterministic weighting and antithetic sampling, reducing model evaluation costs significantly.
  • The approach scales to high-dimensional settings and extends to higher-order interactions, achieving robust empirical validation across diverse machine learning tasks.

KernelSHAP-IQ is a suite of advancements and theoretical refinements of the KernelSHAP algorithm for estimating Shapley values and higher-order Shapley interactions when explaining black-box machine learning models. Drawing on game theory, weighted least-squares regression, and sampling-based approximation, KernelSHAP-IQ delivers unbiased estimation, efficient variance reduction, higher-order interaction indices, deterministic weighting, and substantial algorithmic improvements for scalability, all with rigorous statistical guarantees and empirical validation.

1. Theoretical Foundations and KernelSHAP Formalism

KernelSHAP-IQ builds on the canonical formulation of KernelSHAP, which computes feature attributions by solving a weighted least-squares (WLS) regression problem over feature coalitions. For a model-behavior function $\nu: 2^D \to \mathbb{R}$ on the feature set $D = \{1, \dots, d\}$, the Shapley value vector $\phi = (\phi_1, \dots, \phi_d)$ is obtained as the unique solution to

$$\phi = \arg\min_{\varphi \in \mathbb{R}^d} \mathbb{E}_{S \sim p}\Big[\Big(\nu(S) - \sum_{i \in S} \varphi_i\Big)^2\Big], \qquad \sum_{i=1}^d \varphi_i = \nu(D).$$

The sampling distribution $p(S) \propto \mu(|S|)$ with the canonical Shapley kernel

$$\mu(t) = \frac{1}{d-1} \binom{d-2}{t-1}^{-1}, \quad t = 1, \ldots, d-1,$$

ensures that the WLS solution is precisely the Shapley value estimator in the population limit. KernelSHAP approximates this objective using Monte Carlo sampling, computing estimates efficiently in high-dimensional settings (Fumagalli et al., 2023, Fumagalli et al., 2024, Covert et al., 2020).
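As a concrete illustration, the kernel and its induced distribution over coalition sizes can be computed directly. This is a minimal sketch (function names are illustrative): the probability of drawing a coalition of size $t$ is $\mu(t)$ multiplied by the number $\binom{d}{t}$ of size-$t$ coalitions, normalized.

```python
import math

def shapley_kernel(d, t):
    """Canonical Shapley kernel weight mu(t) for coalition size t, 1 <= t <= d-1."""
    return 1.0 / ((d - 1) * math.comb(d - 2, t - 1))

def size_distribution(d):
    """Probability that a coalition drawn from p(S) ~ mu(|S|) has size t:
    proportional to mu(t) times the number of size-t coalitions C(d, t)."""
    raw = [shapley_kernel(d, t) * math.comb(d, t) for t in range(1, d)]
    z = sum(raw)
    return [r / z for r in raw]

probs = size_distribution(10)
```

Note that $\mu(t)\binom{d}{t} = d/(t(d-t))$, so the kernel concentrates probability mass on very small and very large coalitions, which carry the most information about individual contributions.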

2. Unbiasedness, Consistency, and Variance Reduction

KernelSHAP-IQ provides a closed-form, unbiased, and consistent estimator for the Shapley value via a decomposition into deterministic and stochastic components. Using a single sum over sampled subsets and the symmetry of the Shapley kernel, the estimator for feature $i$ is

$$\widehat{I}(i) = c_1(i) + \frac{R}{K} \sum_{k=1}^K \nu_0(T_k) \Big[\mathbf{1}(i \in T_k) - \frac{|T_k|}{d}\Big],$$

where $c_1(i) = \nu_0(D)/d$, $R = 2H_{d-1}$, and the $T_k$ are sampled according to $p(T) \propto \mu(|T|)$. This construction yields provable unbiasedness ($\mathbb{E}[\widehat{I}(i)] = I^{SV}(i)$), almost-sure consistency ($\widehat{I}(i) \to I^{SV}(i)$ as $K \to \infty$), and a Chebyshev tail bound on the deviation probability. The equivalence of KernelSHAP-IQ to the unbiased KernelSHAP estimator is formalized by explicit inversion of the population covariance in the WLS system, but KernelSHAP-IQ achieves this in a single weighted sum per sample, avoiding the need to solve a linear system (Fumagalli et al., 2023, Covert et al., 2020).
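The estimator can be sketched end-to-end as follows. This is a minimal illustrative implementation, assuming $\nu_0$ is the centered game with $\nu_0(\emptyset) = 0$ and reading $H_{d-1}$ as the $(d-1)$-th harmonic number; the toy additive game and function names are illustrative.

```python
import math
import random

def unbiased_shapley_estimate(nu0, d, K, seed=0):
    """Single-sum estimator: I_hat(i) = c1(i) + (R/K) * sum_k nu0(T_k) [1(i in T_k) - |T_k|/d].

    nu0 : callable on frozensets with nu0(empty set) = 0 (centered game, assumed).
    """
    rng = random.Random(seed)
    # p(t) proportional to mu(t) * C(d, t) = d / (t (d - t))
    weights = [d / (t * (d - t)) for t in range(1, d)]
    R = 2.0 * sum(1.0 / j for j in range(1, d))   # normalizer R = 2 H_{d-1}
    c1 = nu0(frozenset(range(d))) / d             # deterministic component c_1(i)
    acc = [0.0] * d
    for _ in range(K):
        t = rng.choices(range(1, d), weights=weights)[0]  # draw a coalition size
        T = frozenset(rng.sample(range(d), t))            # uniform coalition of that size
        v = nu0(T)
        for i in range(d):
            acc[i] += v * ((1.0 if i in T else 0.0) - t / d)
    return [c1 + (R / K) * a for a in acc]

# Toy additive game: the exact Shapley value of feature i is w[i]
w = [1.0, 2.0, 3.0, 4.0]
est = unbiased_shapley_estimate(lambda S: sum(w[i] for i in S), d=4, K=20000)
```

For this additive game the estimates concentrate around $(1, 2, 3, 4)$, and the efficiency constraint $\sum_i \widehat{I}(i) = \nu_0(D)$ holds exactly by construction, since the bracketed terms sum to zero over $i$.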

To further accelerate convergence, paired (antithetic) sampling—drawing each coalition along with its complement—yields a strict reduction in estimator variance. This technique produces a lower-variance confidence ellipsoid and speeds up convergence in empirical settings by factors of three to nine across different tasks (Covert et al., 2020, Fumagalli et al., 26 Jan 2026).
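Paired sampling can be sketched as follows (a simplified illustration with uniformly drawn coalition sizes; in the actual algorithm the size is drawn kernel-proportionally, and the symmetry $\mu(t) = \mu(d-t)$ is what makes the complement an equally weighted, negatively correlated sample):

```python
import random

def paired_coalitions(d, n_pairs, seed=0):
    """Draw coalitions in antithetic pairs: each coalition T is emitted
    together with its complement D \\ T, which reuses each random draw
    and reduces estimator variance."""
    rng = random.Random(seed)
    full = frozenset(range(d))
    out = []
    for _ in range(n_pairs):
        t = rng.randint(1, d - 1)                 # simplified: uniform size
        T = frozenset(rng.sample(range(d), t))
        out.append(T)
        out.append(full - T)                      # antithetic complement
    return out
```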

3. Higher-Order Shapley Interactions and Polynomial Surrogates

KernelSHAP-IQ extends beyond singleton Shapley values to compute Shapley interaction indices (SII) for interaction sets of arbitrary cardinality. For interaction order $k$, the $k$-additive surrogate

$$\widehat{\nu}_k(T) = \sum_{\substack{S \subseteq T \\ 1 \leq |S| \leq k}} \Phi_k(S),$$

aggregates SII terms up to order $k$, with a recursive formulation via Bernoulli-weighted sums. Feature interactions are thus encoded as solutions to iterated WLS problems over design matrices reflecting coalition memberships, with kernel weights specified to satisfy the linearity, symmetry, and dummy axioms.
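Evaluating the $k$-additive surrogate amounts to summing interaction coefficients over all non-empty subsets of $T$ up to size $k$; a minimal sketch (the data layout and names are illustrative):

```python
from itertools import combinations

def k_additive_surrogate(Phi, T, k):
    """nu_hat_k(T): sum of Phi(S) over all S subset of T with 1 <= |S| <= k.

    Phi : dict mapping frozenset interaction sets to SII coefficients;
          missing sets contribute zero.
    """
    total = 0.0
    for size in range(1, min(k, len(T)) + 1):
        for S in combinations(sorted(T), size):
            total += Phi.get(frozenset(S), 0.0)
    return total

# Illustrative order-2 coefficients on three features
Phi = {frozenset({0}): 1.0, frozenset({1}): 2.0, frozenset({0, 1}): 0.5}
```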

PolySHAP interprets KernelSHAP-IQ as polynomial regression:

  • For order $k = 2$ (pairwise interactions), the surrogate takes the form $h(z) = \sum_i \beta_i z_i + \sum_{i<j} \beta_{ij} z_i z_j$, and the Shapley value for feature $i$ is given by $\phi_i = \beta_i + \frac{1}{2}\sum_{j \neq i} \beta_{ij}$.
  • Empirically and algebraically, standard KernelSHAP with paired sampling yields exactly the same solution as degree-2 PolySHAP, so KernelSHAP-IQ can leverage paired sampling to consistently estimate both main effects and pairwise interactions without direct second-order regression (Fumagalli et al., 26 Jan 2026, Fumagalli et al., 2024).
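The aggregation $\phi_i = \beta_i + \frac{1}{2}\sum_{j \neq i} \beta_{ij}$ is a one-line reduction once the degree-2 coefficients are available; a small sketch with illustrative coefficients:

```python
def shapley_from_degree2(beta, beta_pair):
    """Recover Shapley values from a degree-2 surrogate:
    phi_i = beta_i + 0.5 * sum_{j != i} beta_ij.

    beta_pair : dict keyed by ordered pairs (i, j) with i < j.
    """
    phi = list(beta)
    for (i, j), b in beta_pair.items():
        phi[i] += 0.5 * b    # each pairwise effect splits evenly
        phi[j] += 0.5 * b
    return phi

phi = shapley_from_degree2([1.0, 2.0, 0.0], {(0, 1): 0.4, (1, 2): -0.2})
```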

A closed-form, efficient WLS solution for SII of order $k = 2$ has been established; for $k > 2$, empirical results confirm the extension's accuracy, though a formal proof remains open for arbitrary $k$ (Fumagalli et al., 2024).

4. Deterministic and Improved Weighting Schemes

A further innovation in KernelSHAP-IQ replaces the stochastic sampling weights of traditional KernelSHAP with deterministic corrections (coupon-collector quantile or CEL-kernel weights), analytically reducing estimator variance:

$$w_S = \frac{p_S}{1 - (1 - 2p_S)^{\mathbb{E}[L]/2}},$$

where $\mathbb{E}[L]$ is the expected number of draws needed to sample enough unique coalitions. Deterministic weights preserve unbiasedness while guaranteeing strictly lower unconditional variance than random weighting, by the law of total variance. The weight correction is compatible with paired sampling and in practice lowers the required evaluation budget by 20–50% at fixed MAE (Olsen et al., 2024).
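The correction itself can be sketched directly (here `p_S` is read as the per-draw inclusion probability of coalition $S$ under paired sampling and `expected_draws` stands for $\mathbb{E}[L]$; both names are illustrative assumptions):

```python
def corrected_weight(p_S, expected_draws):
    """Deterministic weight w_S = p_S / (1 - (1 - 2 p_S)^(E[L] / 2)).

    Replaces the random empirical frequency of coalition S with its
    analytic expectation, removing weight noise from the estimator."""
    return p_S / (1.0 - (1.0 - 2.0 * p_S) ** (expected_draws / 2.0))
```

As the budget grows, $(1 - 2p_S)^{\mathbb{E}[L]/2} \to 0$ for $0 < p_S < 1/2$, so $w_S \to p_S$: the correction matters most in the small-budget regime, which is exactly where variance reduction pays off.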

5. Algorithmic Implementations and Scalability

KernelSHAP-IQ maintains computational feasibility in moderate to high dimensions by:

  • Linear runtime scaling in $K$ (the number of samples) and minimal overhead in weight computation; batched and vectorized evaluation routines further exploit hardware parallelism.
  • Employing sampling-by-subset-size stratification and deterministic inclusion of “border” coalitions (all-zeros/all-ones) to stabilize the estimator.
  • Offering plug-in integration with existing KernelSHAP implementations, e.g., via subclassing KernelExplainer in Python with deterministic or improved weights (Olsen et al., 2024).
  • For $d \leq 30$, higher-order interaction estimation remains feasible; for $d \sim 3000$ (e.g., CIFAR-10), optimizations such as bucket-Poisson sampling and fast weighting allow KernelSHAP-IQ to produce competitive attributions at scale (Chen et al., 5 Jun 2025).

Algorithmic steps for order-$k$ KernelSHAP-IQ:

  1. Sample $b$ distinct, possibly paired, coalitions using kernel-proportional probabilities and record associated weights.
  2. Build the WLS design matrix for the desired interaction order and solve for the interaction coefficients iteratively.
  3. Aggregate estimates as prescribed by the Bernoulli-weighted transformation for $k$-Shapley values.
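For small $d$ the pipeline can be run exactly, enumerating every interior coalition instead of sampling in step 1. The following self-contained sketch solves the order-1 WLS problem with the efficiency constraint substituted in; the names and toy game are illustrative, and a full implementation would use sampled coalitions and higher-order design matrices.

```python
import math
from itertools import combinations

def gauss_solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def exact_wls_shapley(nu, d):
    """Order-1 KernelSHAP via exhaustive WLS with Shapley-kernel weights.
    Efficiency is enforced by substituting phi_d = (nu(D) - nu(0)) - sum(others)."""
    nuD, nuE = nu(set(range(d))), nu(set())
    n = d - 1
    A = [[0.0] * n for _ in range(n)]
    rhs = [0.0] * n
    for t in range(1, d):
        w = 1.0 / ((d - 1) * math.comb(d - 2, t - 1))   # Shapley kernel mu(t)
        for S in combinations(range(d), t):
            z = [1.0 if i in S else 0.0 for i in range(d)]
            x = [z[i] - z[d - 1] for i in range(n)]      # constraint-substituted design row
            y = nu(set(S)) - nuE - z[d - 1] * (nuD - nuE)
            for i in range(n):
                rhs[i] += w * x[i] * y
                for j in range(n):
                    A[i][j] += w * x[i] * x[j]
    phi = gauss_solve(A, rhs)
    phi.append((nuD - nuE) - sum(phi))
    return phi
```

On a toy game with an interaction, e.g. $\nu(S) = \sum_{i \in S} w_i + \mathbf{1}(\{0,1\} \subseteq S)$, the solution matches the exact Shapley values, with the interaction split evenly between features 0 and 1.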

6. Applications and Empirical Validation

KernelSHAP-IQ has demonstrated:

  • Superior mean squared error (MSE) and higher top-$K$ precision than permutation-sampling and non-WLS interaction baselines across natural language (DistilBERT), vision (ResNet18, ViT), tabular tasks, and synthetic games.
  • In practice, paired deterministic weighting schemes halve the number of model evaluations while maintaining a given error rate (Olsen et al., 2024, Fumagalli et al., 2023).
  • In time-series settings, KernelSHAP-IQ naturally extends to feature–time pairs, employing a vector autoregressive surrogate (“VARSHAP”) with lag-based attributions, closed-form solutions for linear models (AR, MA, ARMA, VARMAX), and event-detection by aggregating overlapping explanations (Villani et al., 2022).
  • For global and local optimization of anomaly detection pipelines, KernelSHAP-IQ-driven feature selection leads to marked gains in accuracy and data efficiency. In one documented case, $\alpha = 0.90$, $F_1 = 0.76$, and a 28-percentage-point reduction in false positives were achieved by retraining a model on SHAP-IQ-selected features (Roshan et al., 2023).

7. Practical Guidance and Limitations

Best practices for KernelSHAP-IQ:

  • Set the sample budget to $b = 500$–$2000$, or $100 \times d$, for stability; increase it for higher-order interactions or weaker signal strengths.
  • Prefer $k = 2$ (pairwise SII) for most applications; higher $k$ is meaningful only if $2k \leq d$, to maintain estimator stability (Fumagalli et al., 2024).
  • Always report empirical standard errors or confidence estimates, aggregating over multiple runs to quantify uncertainty.
  • For time-series, employ lagged surrogates or time-consistent subgame decomposition to enforce consistency across look-back windows.
  • In very high-dimensional regimes with limited budget, paired sampling with deterministic weighting offers the best balance of computational cost, variance, and scalability.

Known limitations:

  • At higher-order interactions or for small sample sizes, estimator noise can be considerable; variance estimates should always be consulted.
  • The extension to arbitrary interaction order $k > 4$ and $d > 30$ is limited by cubic scaling in design-matrix operations.
  • Theoretical guarantees for exact SII via WLS have so far been proven for $k \leq 2$; $k > 2$ remains a subject of ongoing research.

Key References: (Fumagalli et al., 2023, Fumagalli et al., 2024, Chen et al., 5 Jun 2025, Covert et al., 2020, Olsen et al., 2024, Villani et al., 2022, Roshan et al., 2023, Fumagalli et al., 26 Jan 2026)
