Conditional SHAP Explained

Updated 17 March 2026

Conditional SHAP is a feature attribution method that calculates local Shapley values using conditional expectations to respect feature dependencies.
It uses surrogate models such as neural networks and trees, and dynamic programming in Markovian settings, to approximate or exactly compute contributions across high-dimensional inputs.
The approach delivers robust, statistically grounded feature importance insights and supports extensions for causal inference and drop-one testing.

Conditional SHAP quantifies the contribution of each feature to a model’s prediction by employing conditional expectations that respect the statistical dependence structure among the features. This formulation addresses key deficiencies of marginal (interventional) SHAP, particularly in settings with correlated or structured inputs. Advances in neural- and tree-based surrogates, Markovian tractability results, and statistical interpretability have significantly developed the computational and inferential machinery for conditional SHAP explanations.

1. Mathematical Formulation and Semantics

Conditional SHAP is formally defined via local Shapley values, where the value function for a subset of features $S$ is given by the conditional expectation $v(S) = \mathbb{E}[f(X)\mid X_S=x_S]$ for a predictor $f:\mathbb{R}^d\to\mathbb{R}$, random vector $X \sim \pi$, and instance $x$. Writing $F=\{1,\dots,d\}$ for the full feature set, the conditional Shapley value for feature $i$ is

$\phi_i = \sum_{S\subseteq F\setminus\{i\}} \frac{|S|!\,(d-|S|-1)!}{d!} \left[\, v(S\cup\{i\}) - v(S)\, \right]$

where $v(S)$ captures the prediction when only the features in $S$ are fixed, integrating out the remainder under the observed distribution of $X_{-S}$ conditional on $X_S=x_S$ (Richman et al., 2023, Jullum et al., 2 Apr 2025, Bénard et al., 2021).
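
As a quick check of the weights, consider the special case $d=2$ (an illustrative case, not taken from the cited papers). Each feature then has two coalitions to average over, both with weight $1/2$:

$\phi_1 = \tfrac{1}{2}\left[\, v(\{1\}) - v(\emptyset)\, \right] + \tfrac{1}{2}\left[\, v(\{1,2\}) - v(\{2\})\, \right], \qquad \phi_2 = \tfrac{1}{2}\left[\, v(\{2\}) - v(\emptyset)\, \right] + \tfrac{1}{2}\left[\, v(\{1,2\}) - v(\{1\})\, \right]$

Adding the two gives $\phi_1+\phi_2 = v(\{1,2\}) - v(\emptyset) = f(x) - \mathbb{E}[f(X)]$, which is the usual efficiency property of Shapley values.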

This construction ensures that marginal contributions are computed in the context of the observed feature dependencies, yielding explanations that are robust to collinearity and higher-order dependencies. In contrast, the common "interventional" (marginal) SHAP replaces $v(S)$ with an expectation over the marginal distribution of $X_{-S}$, ignoring its dependence on $x_S$. With correlated features this evaluates the model on unrealistic input combinations and can produce severely biased attributions (Richman et al., 2023).
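
The difference between the two value functions can be illustrated with a small sketch. It assumes a toy linear model $f(x)=x_1+x_2$ and a standard bivariate normal with correlation `RHO`, for which $\mathbb{E}[X_j \mid X_i = x_i] = \rho\, x_i$ in closed form. The model, the instance, and all names below are illustrative assumptions, not taken from the cited papers.

```python
from itertools import combinations
from math import factorial

def shapley(v, d):
    """Exact Shapley values for a value function v(frozenset) over d features."""
    phi = []
    for i in range(d):
        others = [j for j in range(d) if j != i]
        total = 0.0
        for k in range(d):
            w = factorial(k) * factorial(d - k - 1) / factorial(d)
            for S in combinations(others, k):
                S = frozenset(S)
                total += w * (v(S | {i}) - v(S))
        phi.append(total)
    return phi

RHO = 0.8          # assumed correlation between the two features
x = (1.0, -1.0)    # instance to explain (illustrative)

def f(x1, x2):
    return x1 + x2

def v_conditional(S):
    # Missing features are replaced by their conditional mean given the
    # observed ones; this equals E[f(X) | X_S = x_S] because f is linear.
    x1 = x[0] if 0 in S else (RHO * x[1] if 1 in S else 0.0)
    x2 = x[1] if 1 in S else (RHO * x[0] if 0 in S else 0.0)
    return f(x1, x2)

def v_marginal(S):
    # Missing features are replaced by their unconditional mean (0).
    x1 = x[0] if 0 in S else 0.0
    x2 = x[1] if 1 in S else 0.0
    return f(x1, x2)

phi_cond = shapley(v_conditional, 2)
phi_marg = shapley(v_marginal, 2)
print(phi_cond, phi_marg)
```

Under these assumptions the conditional attributions come out at roughly (1.8, -1.8) and the marginal ones at (1.0, -1.0). Both sum to $f(x) - \mathbb{E}[f(X)] = 0$, but the conditional version shifts credit according to what each feature reveals about the other.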

2. Computational Methodologies for Estimating Conditional SHAP

The key computational challenge lies in efficiently estimating $v(S)$ for exponentially many $S$ while preserving feature dependencies. Several recent methodologies have been developed:

Surrogate Neural Network (Conditional Expectation Network): A single neural network is trained to predict $v(S)$ for arbitrary $S,x$ by masking input features according to $S$ and using a specially constructed mask vector. Training optimizes over all subset masks with loss terms ensuring correct calibration for both fully observed and fully masked inputs. Once trained, the network allows rapid batched computation of $v(S)$ across many coalitions for use in the canonical KernelSHAP linear system (Richman et al., 2023).
Surrogate Tree Models: A supervised tree is built on model predictions; conditional expectations are approximated within each leaf by fitting local generalized additive models (GAMs), and path probabilities under conditioning are computed using shallow random forests for efficient estimation. Subset selection with "thresholding" accelerates computation (Zhou et al., 2022).
Random Forest Projection (SHAFF): After fitting a random forest, each coalition's $v(S)$ is estimated via projective traversals, ignoring splits on features outside $S$ and averaging over in-leaf training data. This approach yields consistent variance-based Shapley effects (global SHAP) and is computationally efficient, scaling quasi-linearly in sample size (Bénard et al., 2021).
KernelSHAP with Conditional Models (shapr): The shapr framework offers an extensive collection of generators and regressors for approximating conditional distributions, including empirical (Mahalanobis-weighted), Gaussian, Gaussian copula, conditional inference trees, VAE-based generative models, and regression surrogates. The same coalition sampling and weighted linear system as in standard KernelSHAP are used, so the conditional estimator can be swapped without changing the rest of the pipeline (Jullum et al., 2 Apr 2025).
Markovian Factorizations: For feature distributions with a first-order Markov property (chains, tree-structured DAGs), dynamic programming enables exact computation of conditional expectations in polynomial time. This yields tractable conditional SHAP values for weighted automata, disjoint DNFs, and decision trees under the Markovian assumption (Marzouk et al., 2024).
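
To make the idea of an empirical conditional estimator concrete, here is a minimal sketch of a kernel-weighted estimate of $v(S)$ from training data. It is a simplification: it uses a plain Euclidean distance and a hand-picked bandwidth `sigma` rather than the Mahalanobis-type weighting mentioned above, and the function name `empirical_v` and the toy data are hypothetical.

```python
import math
import random

def empirical_v(f, x, S, X_train, sigma=0.3):
    """Kernel-weighted estimate of E[f(X) | X_S = x_S] from training rows."""
    d = len(x)
    num = den = 0.0
    for row in X_train:
        # Rows whose observed coordinates are close to x_S get more weight.
        dist2 = sum((row[j] - x[j]) ** 2 for j in S)
        w = math.exp(-dist2 / (2 * sigma ** 2))
        # Keep x on the coalition, borrow the remaining coordinates from the row.
        z = [x[j] if j in S else row[j] for j in range(d)]
        num += w * f(z)
        den += w
    return num / den

# Toy data (assumption): the second feature closely tracks the first.
rng = random.Random(0)
X_train = []
for _ in range(2000):
    x1 = rng.gauss(0.0, 1.0)
    X_train.append((x1, x1 + 0.1 * rng.gauss(0.0, 1.0)))

f = lambda z: z[1]     # toy model that reads only the second feature
x = (1.0, 1.0)

v_empty = empirical_v(f, x, set(), X_train)    # estimates E[f(X)], near 0
v_first = empirical_v(f, x, {0}, X_train)      # estimates E[X2 | X1 = 1]
v_full = empirical_v(f, x, {0, 1}, X_train)    # equals f(x)
```

With an empty coalition all weights equal one and the estimate is the plain training average; with the full coalition every borrowed row collapses to `x`. The bandwidth controls a bias-variance trade-off, so the intermediate estimate is smoothed somewhat toward the bulk of the data.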

3. Statistical and Algorithmic Properties

Conditional SHAP admits a principled statistical interpretation: it formalizes feature importance as the average effect of each variable when integrating over all possible knowledge configurations, always conditioning on the observed values of the coalition subset. This property makes the conditional SHAP value uniquely aligned with the interpretation of true conditional variable importance in the presence of collinear and/or dependent features (Richman et al., 2023, Teneggi et al., 2022).

The algorithmic steps require (1) the ability to compute or approximate conditional expectations $v(S)$ for sampled coalitions, (2) solving the canonical weighted least squares system (KernelSHAP), and (3) mechanisms for convergence detection or early stopping in iterative schemes (as in shapr). For nontrivial distributions, all existing scalable approaches rely on model-based surrogates, generative models, or specialized message-passing algorithms (Richman et al., 2023, Jullum et al., 2 Apr 2025, Zhou et al., 2022, Marzouk et al., 2024).
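
Step (2) can be sketched as follows for a small number of features, enumerating every coalition rather than sampling. The efficiency constraint is handled with a Lagrange multiplier, which is one common way to solve the constrained problem and not necessarily what any particular package does. The helper names and the toy value function, which stands in for a conditional-expectation estimator, are assumptions.

```python
from itertools import combinations
from math import comb

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            m = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= m * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def kernel_shap(v, d):
    """Weighted least squares over all proper, non-empty coalitions."""
    v0, v1 = v(frozenset()), v(frozenset(range(d)))
    A = [[0.0] * d for _ in range(d)]
    b = [0.0] * d
    for k in range(1, d):
        w = (d - 1) / (comb(d, k) * k * (d - k))   # Shapley kernel weight
        for S in combinations(range(d), k):
            y = v(frozenset(S)) - v0
            for i in S:
                b[i] += w * y
                for j in S:
                    A[i][j] += w
    # Enforce sum(phi) = v(F) - v(empty set) with a Lagrange multiplier.
    Ainv_b = solve(A, b)
    Ainv_1 = solve(A, [1.0] * d)
    lam = (sum(Ainv_b) - (v1 - v0)) / sum(Ainv_1)
    return [Ainv_b[i] - lam * Ainv_1[i] for i in range(d)]

a = (1.0, 2.0, 3.0)

def v(S):
    # Toy stand-in for an estimate of E[f(X) | X_S = x_S].
    return sum(a[i] for i in S) ** 2

phi = kernel_shap(v, 3)
```

For this toy game the exact Shapley values are $a_i \sum_j a_j$, i.e. (6, 12, 18), and full enumeration reproduces them up to floating-point error. In practice the coalitions are sampled and `v` is replaced by one of the estimators above, which is where approximation error enters.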

Notably, under square-integrability and assuming universal approximation of the conditional expectation operator, surrogate neural networks (and other sufficiently rich regressors) provide asymptotically consistent estimators for $v(S)$ as dataset and model size increase (Richman et al., 2023).

4. Software and Practical Implementations

The table below summarizes the principal features of the major approaches, extracted from the literature described above:

Software/Method	Conditional Expectation Estimation	Supported Models
Conditional Expectation Network	Masked-input neural network	Black-box, neural nets
Surrogate Model-Based Tree [MBT]	Single SLIM tree and local GAMs	Any fitted $f$
SHAFF	Projected random forest	Any forest-compatible
shapr/shaprpy	Empirical, Gaussian, ctree, VAEAC, regression surrogates	Any black-box

shapr/shaprpy provide an extensive, extensible ecosystem for conditional SHAP estimation in both R and Python. Key features include parallel batch evaluation, kernel-based subset sampling, iterative estimation with convergence detection, group-wise and causal/asymmetric SHAP, and specialized modules for time series forecasting (Jullum et al., 2 Apr 2025).

5. Statistical Guarantees and Interpretability

Recent results supply rigorous statistical connections between SHAP values and conditional independence testing:

SHAP-XRT demonstrates that marginal contributions in conditional SHAP bound the expected $p$-values for local conditional randomization tests (CRTs), and the full Shapley value yields a valid upper bound for the expected global-conditional null $p$-value (Teneggi et al., 2022). In practice, this means that a large conditional SHAP value implies that, for at least one conditioning set, the corresponding local test is expected to yield a small $p$-value, i.e., evidence against conditional independence of the prediction from that feature. This statistical interpretation grounds the use of SHAP values for feature selection and scientific inference, as opposed to purely descriptive variable importance.
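
The following is a rough sketch of a one-sided randomization test in the spirit of the local tests described above. It is not the SHAP-XRT authors' implementation: the function names, the toy model, and the independent-Gaussian sampler are all assumptions made for illustration.

```python
import math
import random

def xrt_p_value(f, x, S, j, sample_missing, K=200, rng=None):
    """One-sided randomization p-value for feature j given coalition S.

    sample_missing(x, fixed, rng) must return a full input whose coordinates
    in `fixed` equal x and whose other coordinates are drawn from their
    conditional distribution given those fixed values (assumed available).
    """
    rng = rng or random.Random(0)
    # Statistic: model output with feature j revealed alongside S.
    t = f(sample_missing(x, S | {j}, rng))
    # Reference draws: model output with feature j resampled.
    null = [f(sample_missing(x, S, rng)) for _ in range(K)]
    return (1 + sum(tk >= t for tk in null)) / (K + 1)

def sample_missing(x, fixed, rng):
    # Toy assumption: independent standard-normal features, so the
    # conditional distribution of the missing ones is simply N(0, 1).
    return [x[i] if i in fixed else rng.gauss(0.0, 1.0) for i in range(len(x))]

def f(z):
    # Toy model with output in [0, 1] that depends only on feature 0.
    return 1.0 / (1.0 + math.exp(-3.0 * z[0]))

x = (2.0, 0.3)
p_relevant = xrt_p_value(f, x, frozenset(), 0, sample_missing)
p_irrelevant = xrt_p_value(f, x, frozenset(), 1, sample_missing)
```

Here revealing feature 0 pushes the output well above almost all reference draws, so `p_relevant` is small, whereas for feature 1 the statistic and the reference draws share the same distribution and the p-value carries no evidence against the null.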

The main assumptions for these guarantees include the availability of correct conditional samplers $\mathcal{L}(X_{-S} \mid X_S = x_S)$ and the determinism of the predictor $f$. In practice, generative models (VAEAC, ctree surrogates) or kernel-weighted empirical sampling are used to approximate these distributions.
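
When the features are assumed jointly Gaussian, the conditional sampler has a closed form. The sketch below covers only the two-feature case, conditioning the second feature on the first, and uses it for a Monte Carlo estimate of $v(\{1\})$. The function names and the toy model $f(x_1,x_2)=x_1x_2$ are hypothetical.

```python
import random

def sample_x2_given_x1(x1, mu, cov, rng):
    """Draw X2 | X1 = x1 for a bivariate Gaussian with mean mu, covariance cov."""
    (s11, s12), (_, s22) = cov
    cond_mean = mu[1] + s12 / s11 * (x1 - mu[0])
    cond_var = s22 - s12 ** 2 / s11
    return rng.gauss(cond_mean, cond_var ** 0.5)

def v_first_feature(f, x1, mu, cov, n=20000, seed=0):
    """Monte Carlo estimate of E[f(X) | X1 = x1]."""
    rng = random.Random(seed)
    draws = (f(x1, sample_x2_given_x1(x1, mu, cov, rng)) for _ in range(n))
    return sum(draws) / n

mu = (0.0, 0.0)
cov = [[1.0, 0.8], [0.8, 1.0]]
f = lambda x1, x2: x1 * x2        # toy nonlinear model (assumption)

estimate = v_first_feature(f, 1.5, mu, cov)
exact = 1.5 * (0.8 * 1.5)         # x1 * E[X2 | X1 = x1] for this toy model
```

If the Gaussian assumption is wrong, the sampler is wrong as well, and the guarantees discussed above no longer apply as stated.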

6. Tractability, Complexity, and Limitations

The computation of conditional SHAP values is generally NP-hard due to the exponential number of subsets and the complexity of evaluating conditional expectations under unrestricted dependencies. However, under the Markovian assumption (first-order chains, tree-structured graphical models), SHAP values can be computed exactly in polynomial time for certain model classes (decision trees, weighted automata, disjoint DNFs), by leveraging dynamic programming and algebraic automaton representations (Marzouk et al., 2024). For general feature distributions or arbitrary Bayesian networks with high treewidth, the problem remains computationally infeasible without additional structure or approximations.
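
The flavor of the dynamic-programming argument can be seen in a deliberately simplified setting: a first-order Markov chain over discrete features and a model of product form $f(x)=\prod_t g_t(x_t)$. This is an illustrative special case under those assumptions, not the algorithm of the cited paper, and all names are hypothetical. A forward pass computes the conditional expectation exactly in time linear in the number of features.

```python
def chain_forward(init, trans, h):
    """Sum over all sequences of init[x_0] * h[0][x_0] * prod_t trans[x_{t-1}][x_t] * h[t][x_t]."""
    k = len(init)
    alpha = [init[a] * h[0][a] for a in range(k)]
    for t in range(1, len(h)):
        alpha = [
            sum(alpha[a] * trans[a][b] for a in range(k)) * h[t][b]
            for b in range(k)
        ]
    return sum(alpha)

def conditional_value(g, x, S, init, trans):
    """Exact E[prod_t g[t][X_t] | X_S = x_S] under a first-order Markov chain."""
    k, d = len(init), len(x)
    # match[t][a] is 1 unless position t is observed and a disagrees with x[t].
    match = [
        [1.0 if (t not in S or a == x[t]) else 0.0 for a in range(k)]
        for t in range(d)
    ]
    weighted = [[g[t][a] * match[t][a] for a in range(k)] for t in range(d)]
    numerator = chain_forward(init, trans, weighted)   # E[f(X); X_S = x_S]
    denominator = chain_forward(init, trans, match)    # P(X_S = x_S)
    return numerator / denominator

# Toy binary chain and factors (assumptions).
init = [0.6, 0.4]
trans = [[0.9, 0.1], [0.2, 0.8]]
g = [[1.0, 2.0], [0.5, 1.5], [1.0, 3.0], [2.0, 1.0]]
x = (1, 0, 1, 1)

v_partial = conditional_value(g, x, {0, 2}, init, trans)
v_full = conditional_value(g, x, {0, 1, 2, 3}, init, trans)   # equals f(x)
```

Each call costs on the order of $d\,k^2$ operations for $k$ states, compared with $k^d$ for brute-force enumeration. The exponential number of coalitions in the Shapley sum still has to be handled separately.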

All scalable algorithms in high dimensions rely on subset sampling (kernel-based, importance, or antithetic) and surrogate modeling, with accuracy-runtime trade-offs governed by user-chosen hyperparameters (subset count, tree/NN depth, sample sizes) and the representational power of the conditional estimator (Richman et al., 2023, Zhou et al., 2022, Jullum et al., 2 Apr 2025, Bénard et al., 2021).
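
A minimal sketch of kernel-based subset sampling with antithetic (paired) coalitions is given below. It assumes the standard Shapley kernel, under which the total weight of all coalitions of size $k$ is proportional to $(d-1)/(k(d-k))$; the function name and parameters are hypothetical.

```python
import random

def sample_coalitions(d, n_pairs, seed=0):
    """Draw coalitions in proportion to the Shapley kernel, each with its complement."""
    rng = random.Random(seed)
    everything = frozenset(range(d))
    sizes = list(range(1, d))
    # Total kernel weight carried by all coalitions of each size.
    size_weights = [(d - 1) / (k * (d - k)) for k in sizes]
    out = []
    for _ in range(n_pairs):
        k = rng.choices(sizes, weights=size_weights)[0]
        S = frozenset(rng.sample(range(d), k))   # uniform among subsets of size k
        out.append(S)
        out.append(everything - S)               # antithetic partner
    return out

coalitions = sample_coalitions(6, 500)
```

Because the kernel is symmetric in $k$ and $d-k$, adding each complement leaves the sampling distribution unchanged. Very small and very large coalitions are drawn most often, which is where the kernel concentrates its weight.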

Conditional SHAP machinery supports several extensions:

drop1 and ANOVA-style importances: Conditional SHAP frameworks naturally yield drop-single-feature and sequential (ordered) decompositions that generalize their GLM counterparts. These preserve all dependencies, enabling accurate quantification of feature and feature-group contributions even in complex regression settings (Richman et al., 2023).
Conditional-dependence PDPs: Rather than plotting partial dependence under broken independence assumptions, conditional SHAP-based machinery produces PDPs that respect the true feature dependency structure, thereby avoiding averaging predictions over unrealistic input combinations (Richman et al., 2023).
Causal and asymmetric SHAP: When causal graphs and confounding information are available, conditional SHAP can be generalized to compute causal and asymmetric Shapley values, restricting allowable coalitions and conditioning on interventional distributions ("do-calculus"), as in shapr (Jullum et al., 2 Apr 2025).
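
The asymmetric variant can be sketched by averaging marginal contributions only over feature orderings that respect a given partial order. The sketch below enumerates permutations, so it is only practical for a handful of features. It covers the ordering restriction only, not interventional conditioning, and the function name and toy values are assumptions. The value table corresponds to $f(x)=x_1+x_2$ with standard bivariate normal features of correlation 0.8 and $x=(1, 0.5)$.

```python
from itertools import permutations

def asymmetric_shapley(v, d, precedes):
    """Average marginal contributions over orderings that respect `precedes`.

    precedes: set of (a, b) pairs meaning feature a must enter before feature b.
    With an empty set this reduces to the ordinary (symmetric) Shapley value.
    """
    phi = [0.0] * d
    n_valid = 0
    for order in permutations(range(d)):
        pos = {feat: p for p, feat in enumerate(order)}
        if any(pos[a] > pos[b] for a, b in precedes):
            continue
        n_valid += 1
        S = frozenset()
        for i in order:
            phi[i] += v(S | {i}) - v(S)
            S = S | {i}
    return [p / n_valid for p in phi]

# Conditional expectations for the toy setting described in the lead-in.
values = {
    frozenset(): 0.0,
    frozenset({0}): 1.8,
    frozenset({1}): 0.9,
    frozenset({0, 1}): 1.5,
}
v = values.__getitem__

symmetric = asymmetric_shapley(v, 2, set())        # approx. [1.2, 0.3]
asymmetric = asymmetric_shapley(v, 2, {(0, 1)})    # approx. [1.8, -0.3]
```

Requiring feature 0 to enter first assigns it everything it explains through its dependence with feature 1, while feature 1 receives only what it adds on top. Both attributions still sum to $v(F) - v(\emptyset) = 1.5$.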

A plausible implication is that, with these extensions, conditional SHAP now unifies local and global variable importance, conditional independence testing, and causal interpretability within a single framework, provided the necessary conditional estimation and sampling routines are available.

References:

"Conditional expectation network for SHAP" (Richman et al., 2023)
"Shapley Computations Using Surrogate Model-Based Trees" (Zhou et al., 2022)
"On the Tractability of SHAP Explanations under Markovian Distributions" (Marzouk et al., 2024)
"SHAFF: Fast and consistent SHApley eFfect estimates via random Forests" (Bénard et al., 2021)
"shapr: Explaining Machine Learning Models with Conditional Shapley Values in R and Python" (Jullum et al., 2 Apr 2025)
"SHAP-XRT: The Shapley Value Meets Conditional Independence Testing" (Teneggi et al., 2022)