Counterfactual Policy Mean Embeddings (CPME)
- Counterfactual Policy Mean Embeddings (CPME) are nonparametric kernel-based estimators that encode entire counterfactual outcome distributions in an RKHS.
- They unify plug-in, doubly robust, and Bayesian methodologies to enable rigorous off-policy evaluation and distributional causal inference.
- Recent advances extend CPME to conditional and heterogeneous effect estimation, offering scalable and nearly minimax optimal convergence rates.
Counterfactual Policy Mean Embeddings (CPME) represent a class of nonparametric, kernel-based estimators for entire counterfactual outcome distributions arising in off-policy evaluation (OPE), treatment effect estimation, and distributional causal inference. CPME encodes distributions induced by hypothetical interventions or alternative decision policies into reproducing kernel Hilbert spaces (RKHS), enabling rigorous nonparametric analysis of distributional properties, hypothesis testing, and uncertainty quantification. The CPME paradigm unifies plug-in, Bayesian, and doubly robust methodologies and supports efficient estimation, testing, and inference even in high-dimensional and complex outcome spaces.
1. Formal Definition and Mathematical Framework
Let $X \in \mathcal{X}$ denote covariates, $A \in \mathcal{A}$ an action or treatment (possibly randomized by a policy $\pi(\cdot \mid X)$), and $Y \in \mathcal{Y}$ the observed outcome. Fix a characteristic kernel $k$ on $\mathcal{Y}$ with associated RKHS $\mathcal{H}$ and feature map $\varphi(y) = k(y, \cdot)$. The counterfactual policy mean embedding for a policy $\pi$ is defined as:

$$\mu_\pi = \mathbb{E}_{X}\,\mathbb{E}_{A \sim \pi(\cdot \mid X)}\big[\mathbb{E}[\varphi(Y) \mid X, A]\big] \in \mathcal{H}.$$

$\mu_\pi$ encodes the entire counterfactual outcome law $P_\pi$ under policy $\pi$ as a mean embedding in $\mathcal{H}$:

$$\mu_\pi = \mathbb{E}_{Y \sim P_\pi}[\varphi(Y)].$$

When $k$ is characteristic (e.g., Gaussian), this embedding uniquely determines the law $P_\pi$. Furthermore, in the conditional setting, the conditional counterfactual mean embedding is given by

$$\mu_\pi(x) = \mathbb{E}_{A \sim \pi(\cdot \mid x)}\big[\mathbb{E}[\varphi(Y(A)) \mid X = x, A]\big],$$

where $Y(a)$ is the potential outcome under treatment $a$ (Anancharoenkij et al., 4 Feb 2026).
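As a schematic illustration of the definition (with toy data and helper names that are illustrative, not drawn from the cited papers), the embedding can be estimated from logged data via importance weighting against a known logging policy: the estimate is represented implicitly by weights on the logged outcomes and can be evaluated pointwise.

```python
import numpy as np

def gaussian_kernel(y, yp, sigma=1.0):
    """Characteristic Gaussian (RBF) kernel on scalar outcomes."""
    return np.exp(-np.subtract.outer(y, yp) ** 2 / (2 * sigma ** 2))

def ipw_embedding_weights(x, a, pi_target, pi_log):
    """Importance weights pi(a|x) / pi0(a|x) for the logged actions."""
    return pi_target(a, x) / pi_log(a, x)

def embedding_eval(y_query, y_logged, w):
    """Evaluate mu_hat_pi(y) = (1/n) * sum_i w_i k(y_i, y) at query outcomes."""
    return (w[:, None] * gaussian_kernel(y_logged, y_query)).mean(axis=0)

# Toy logged data: binary action, outcome shifted by the action.
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(size=n)
a = rng.binomial(1, 0.5, size=n)                     # logging policy: uniform
y = a + 0.1 * rng.normal(size=n)

pi_log = lambda a, x: np.full_like(a, 0.5, dtype=float)
pi_target = lambda a, x: np.where(a == 1, 0.8, 0.2)  # target favors a = 1

w = ipw_embedding_weights(x, a, pi_target, pi_log)
vals = embedding_eval(np.array([1.0, -2.0]), y, w)
```

Since the target policy up-weights $a = 1$ (outcomes near 1), the evaluated embedding is large near 1 and near zero far from the support.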
2. Estimation Methodologies
CPME estimation leverages observed data $\{(x_i, a_i, y_i)\}_{i=1}^{n}$, the logging policy $\pi_0$, and a target policy $\pi$. Key estimation strategies include:
2.1. Plug-in Estimation
- Estimation of the mean embedding decouples into two components: (i) estimation of the conditional mean embedding operator $C_{Y \mid X, A}$, and (ii) estimation of the kernel policy embedding $\mu_{\pi \otimes P_X} = \mathbb{E}_{X}\,\mathbb{E}_{A \sim \pi(\cdot \mid X)}[\varphi(X, A)]$.
- $C_{Y \mid X, A}$ is estimated via regularized kernel ridge regression on $\{((x_i, a_i), y_i)\}_{i=1}^{n}$.
- $\mu_{\pi \otimes P_X}$ is estimated either by direct averaging or importance weighting.
- The plug-in CPME estimate is $\hat\mu_\pi = \hat C_{Y \mid X, A}\,\hat\mu_{\pi \otimes P_X}$ (Zenati et al., 3 Jun 2025).
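A minimal sketch of the two-stage plug-in construction, assuming scalar covariates and binary actions; the ridge solve below is a simplified finite-sample stand-in for the operator estimate, and all variable names are illustrative.

```python
import numpy as np

def rbf(U, V, sigma=1.0):
    """RBF kernel matrix between row-stacked inputs U (n,d) and V (m,d)."""
    d2 = ((U[:, None, :] - V[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def plugin_cpme(XA, XA_pi, lam=1e-2):
    """
    Stage (i): kernel ridge weights alpha(x,a) = (K + n*lam*I)^{-1} k(XA, (x,a)),
    approximating the conditional mean embedding E[phi(Y) | x, a].
    Stage (ii): average alpha over target-policy draws (x_i, a_i ~ pi).
    Returns beta with mu_hat_pi(.) = sum_j beta_j k(y_j, .).
    """
    n = XA.shape[0]
    K = rbf(XA, XA)
    return np.linalg.solve(K + n * lam * np.eye(n), rbf(XA, XA_pi)).mean(axis=1)

# Toy data: y depends on the action; target policy always plays a = 1.
rng = np.random.default_rng(1)
n = 300
x = rng.uniform(size=n)
a = rng.binomial(1, 0.5, size=n).astype(float)
y = a + 0.1 * rng.normal(size=n)
XA = np.column_stack([x, a])
XA_pi = np.column_stack([x, np.ones(n)])   # deterministic target policy

beta = plugin_cpme(XA, XA_pi)

def mu_hat(yq):
    """Evaluate the estimated embedding at query outcomes."""
    return beta @ np.exp(-np.subtract.outer(y, yq) ** 2 / 2)
```

The estimated embedding concentrates mass where the target policy's outcomes live (near $y = 1$ in this toy example).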
2.2. Doubly Robust Estimation
- The doubly robust (DR) estimator corrects the plug-in CPME for biases in both the conditional outcome model and the propensity model, employing the efficient influence function (EIF):

$$\hat\mu_\pi^{\mathrm{DR}} = \frac{1}{n}\sum_{i=1}^{n}\Big[\hat\mu(x_i; \pi) + \frac{\pi(a_i \mid x_i)}{\pi_0(a_i \mid x_i)}\big(\varphi(y_i) - \hat\mu(x_i, a_i)\big)\Big],$$

- where $\hat\mu(x, a)$ is the learned conditional mean embedding and $\hat\mu(x; \pi) = \mathbb{E}_{A \sim \pi(\cdot \mid x)}[\hat\mu(x, A)]$.
- $\hat\mu_\pi^{\mathrm{DR}}$ remains consistent if either the propensity or the outcome embedding is correctly specified (Zenati et al., 3 Jun 2025, Anancharoenkij et al., 4 Feb 2026).
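Because both the plug-in term and the correction term are linear in the outcome features, the one-step DR estimate can also be represented as weights on $\varphi(y_j)$. A hedged sketch (toy data, illustrative names; the ridge solve stands in for the fitted conditional mean embedding):

```python
import numpy as np

def rbf(U, V, sigma=1.0):
    d2 = ((U[:, None, :] - V[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def dr_cpme(XA, XA_pi, w, lam=1e-2):
    """
    One-step (EIF-based) doubly robust CPME as weights beta on phi(y_j):
      mu_DR = (1/n) sum_i [ mu_hat(x_i; pi) + w_i (phi(y_i) - mu_hat(x_i, a_i)) ]
    with importance weights w_i = pi(a_i|x_i) / pi0(a_i|x_i).
    """
    n = XA.shape[0]
    Kinv = np.linalg.inv(rbf(XA, XA) + n * lam * np.eye(n))
    A_pi = Kinv @ rbf(XA, XA_pi)     # ridge weights at target (x_i, a ~ pi)
    A_obs = Kinv @ rbf(XA, XA)       # ridge weights at observed (x_i, a_i)
    beta = A_pi.mean(axis=1)                  # plug-in term
    beta += (np.eye(n) - A_obs) @ (w / n)     # one-step EIF correction
    return beta

# Toy data; target policy always plays a = 1, logging policy is uniform.
rng = np.random.default_rng(2)
n = 300
x = rng.uniform(size=n)
a = rng.binomial(1, 0.5, size=n).astype(float)
y = a + 0.1 * rng.normal(size=n)
XA = np.column_stack([x, a])
XA_pi = np.column_stack([x, np.ones(n)])
w = a / 0.5                                   # pi(a|x) / pi0(a|x)

beta = dr_cpme(XA, XA_pi, w)
mu_vals = beta @ np.exp(-np.subtract.outer(y, np.array([1.0, -2.0])) ** 2 / 2)
```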
2.3. Bayesian CPME
- CPME admits a Bayesian formulation placing a Gaussian process prior on the conditional mean embedding over $(x, a)$.
- Posterior inference yields a mean and covariance in $\mathcal{H}$, propagating epistemic uncertainty from the outcome regression to downstream functionals (Martinez-Taboada et al., 2022).
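A simplified, one-dimensional slice of this construction (an assumption-laden sketch, not the full vector-valued treatment of the cited paper): for a fixed RKHS witness function $f$, placing a scalar GP on $g(x, a) = \mathbb{E}[f(Y) \mid x, a]$ and averaging over target-policy draws gives a Gaussian posterior for the functional $\langle \mu_\pi, f \rangle$.

```python
import numpy as np

def rbf(U, V, sigma=1.0):
    d2 = ((U[:, None, :] - V[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def posterior_policy_functional(XA, fvals, XA_pi, noise=0.1):
    """
    GP regression of f(y_i) on (x_i, a_i) gives a posterior over
    g(x,a) = E[f(Y)|x,a]; averaging g over target draws (x_i, a_i ~ pi)
    yields a Gaussian posterior on the scalar functional <mu_pi, f>.
    """
    n, m = XA.shape[0], XA_pi.shape[0]
    K = rbf(XA, XA) + noise ** 2 * np.eye(n)
    Ks = rbf(XA_pi, XA)                        # cross-covariance (m, n)
    mean_g = Ks @ np.linalg.solve(K, fvals)    # posterior mean at targets
    cov_g = rbf(XA_pi, XA_pi) - Ks @ np.linalg.solve(K, Ks.T)
    return mean_g.mean(), cov_g.sum() / m ** 2  # mean/variance of the average

# Toy data; witness f(y) = k(1, y) probes mass near y = 1.
rng = np.random.default_rng(3)
n = 200
x = rng.uniform(size=n)
a = rng.binomial(1, 0.5, size=n).astype(float)
y = a + 0.1 * rng.normal(size=n)
fvals = np.exp(-(y - 1.0) ** 2 / 2)
XA = np.column_stack([x, a])
XA_pi = np.column_stack([x, np.ones(n)])       # target: always a = 1

post_mean, post_var = posterior_policy_functional(XA, fvals, XA_pi)
```

Under the target policy the outcomes concentrate near 1, so the posterior mean of $\langle \mu_\pi, f \rangle$ is close to 1, with a small posterior variance reflecting epistemic uncertainty.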
3. Practical Algorithms and Implementation
CPME estimators can be systematically constructed with the following components:
| Estimator | First Stage | Second Stage |
|---|---|---|
| Plug-in | Conditional embedding | Policy embedding |
| Doubly robust | Conditional + propensity | EIF/one-step correction |
| Ridge Regression | Nuisance (propensity, outcome) | Kernel ridge in RKHS |
| Deep Feature | Nuisance (propensity, outcome) | Neural net + linear map |
| Neural-Kernel | Nuisance (propensity, outcome) | Neural net over grid |
- Regularization hyperparameters are selected by cross-validation or split-sample procedures.
- For large-scale data, Nyström and inducing-point kernel approximations or neural network parameterizations are preferred (Zenati et al., 3 Jun 2025, Anancharoenkij et al., 4 Feb 2026, Martinez-Taboada et al., 2022).
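For scale, the Nyström approximation mentioned above replaces the full $n \times n$ kernel matrix with low-rank features built from a small set of landmark points, so the first-stage ridge solve runs in the landmark dimension. A minimal sketch (illustrative helper names):

```python
import numpy as np

def rbf(U, V, sigma=1.0):
    d2 = ((U[:, None, :] - V[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def nystrom_features(X, landmarks, jitter=1e-8):
    """
    Nystrom feature map Phi with K ~= Phi @ Phi.T, so downstream kernel
    ridge regression costs O(n * m^2) for m landmarks instead of O(n^3).
    """
    Kll = rbf(landmarks, landmarks) + jitter * np.eye(len(landmarks))
    vals, vecs = np.linalg.eigh(Kll)
    # symmetric inverse square root of the landmark kernel matrix
    inv_sqrt = vecs @ np.diag(1.0 / np.sqrt(np.clip(vals, 1e-12, None))) @ vecs.T
    return rbf(X, landmarks) @ inv_sqrt

rng = np.random.default_rng(4)
X = rng.normal(size=(500, 2))
L = X[rng.choice(500, size=50, replace=False)]   # 50 landmark points
Phi = nystrom_features(X, L)                     # (500, 50) features
K_approx = Phi @ Phi.T                           # low-rank kernel approximation
```

When the landmarks are the data points themselves the approximation is exact (up to jitter), and approximation quality degrades gracefully as the landmark set shrinks.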
4. Theoretical Guarantees and Statistical Properties
CPME frameworks provide provable rates and robustness properties:
- The plug-in estimator achieves an RKHS-norm error rate governed by a source condition on the conditional mean embedding and the eigenvalue decay of the kernel covariance operator; smoother targets and faster decay yield faster rates (Zenati et al., 3 Jun 2025).
- The doubly robust estimator attains semiparametric efficiency provided the nuisance models (propensity and outcome embedding) converge sufficiently fast, since its leading error depends only on the product of the two nuisance errors (Zenati et al., 3 Jun 2025).
- Ridge Regression CPME achieves a nearly minimax optimal rate for smooth densities, with the exponent depending on the smoothness and the dimension (Anancharoenkij et al., 4 Feb 2026).
- All doubly robust approaches ensure consistent estimation if either the outcome or the propensity model is correctly specified, reflected in the meta-estimator rate being the product of the two nuisance rates (Anancharoenkij et al., 4 Feb 2026).
5. Hypothesis Testing, Sampling, and Inference
CPME uniquely enables nonparametric distributional testing and sampling:
- The “Doubly-Robust Kernel Policy Test” (DR-KPT) provides a cross U-statistic based on the EIF, facilitating two-sample tests of $H_0\colon P_\pi = P_{\pi'}$. Under $H_0$ and proper nuisance convergence, the statistic is asymptotically standard normal (Zenati et al., 3 Jun 2025).
- Sampling from CPME employs kernel herding on $\hat\mu_\pi$; samples from the estimated embedding converge to the target law in MMD as the number of herded samples grows (Zenati et al., 3 Jun 2025).
- Recovery of functionals (mean, quantiles, density) is achieved by RKHS inner products $\langle \hat\mu_\pi, f \rangle$ or by explicit density formulas when the embedding estimate integrates to one (Anancharoenkij et al., 4 Feb 2026, Martinez-Taboada et al., 2022).
- Bayesian CPME propagates epistemic uncertainty to functionals via the joint GP posterior (Martinez-Taboada et al., 2022).
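The kernel herding step above can be sketched as a greedy loop: each new sample maximizes agreement with the estimated embedding while penalizing similarity to samples already drawn. Below, an embedding of a two-point mixture stands in for $\hat\mu_\pi$ (purely illustrative):

```python
import numpy as np

def gaussian_kernel(y, yp, sigma=1.0):
    return np.exp(-np.subtract.outer(y, yp) ** 2 / (2 * sigma ** 2))

def kernel_herding(mu_vals, grid, T):
    """
    Greedy herding against an embedding evaluated on a candidate grid:
      y_{t+1} = argmax_y  mu_hat(y) - (1/(t+1)) * sum_{s<=t} k(y_s, y).
    """
    samples = []
    for t in range(T):
        penalty = (gaussian_kernel(np.array(samples), grid).sum(axis=0) / (t + 1)
                   if samples else 0.0)
        samples.append(grid[np.argmax(mu_vals - penalty)])
    return np.array(samples)

# Stand-in embedding: equal mixture of point masses at -2 and +2.
grid = np.linspace(-3.0, 3.0, 121)
mu_vals = gaussian_kernel(np.array([-2.0, 2.0]), grid).mean(axis=0)
samples = kernel_herding(mu_vals, grid, T=20)
```

The herded samples alternate between the two modes, so their empirical distribution tracks the mixture encoded by the embedding.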
6. Empirical Performance and Guidelines
Extensive simulations and real-data experiments support CPME efficacy:
- In policy evaluation and distributional effect estimation (recommender simulators, synthetic selection bias, MSLR-WEB30K), CPME and especially DR-CPME outperform plug-in and classical estimators (DM, IPS, DR-NN) in mean-squared error and test power, exhibiting robustness to high-dimensional covariate shift (Zenati et al., 3 Jun 2025, Muandet et al., 2018).
- Deep Feature and Neural-Kernel implementations yield fast, scalable estimators for large sample sizes (Anancharoenkij et al., 4 Feb 2026).
- Bayesian CPME yields calibrated confidence intervals under both outcome and regression uncertainty (Martinez-Taboada et al., 2022).
- Recommended practical pipeline:
- Small samples (on the order of 5000 or fewer): ridge regression/plug-in estimators.
- Medium samples: Deep Feature with moderate neural width.
- Large samples, high-dimensional covariates: Neural-Kernel estimator.
- Always enforce sample splitting for nuisance fitting to guarantee double robustness (Anancharoenkij et al., 4 Feb 2026, Martinez-Taboada et al., 2022).
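The sample-splitting recommendation in the pipeline above is cross-fitting: nuisances are trained out-of-fold and the DR score is evaluated in-fold. A hedged sketch for the scalar policy value (the same splitting scheme applies to embedding-valued estimands; all names and the linear outcome model are illustrative):

```python
import numpy as np

def crossfit_dr_value(x, a, y, pi_target, pi_log, fit_outcome, n_folds=2, seed=0):
    """
    K-fold cross-fitting: train nuisances on the out-of-fold data, evaluate
    the DR score in-fold, then average, which preserves double robustness.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    scores = np.empty(len(y))
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        mu = fit_outcome(x[train], a[train], y[train])   # callable mu(x, a)
        w = pi_target(a[test], x[test]) / pi_log(a[test], x[test])
        # plug-in term: outcome model averaged over the target policy
        m_pi = sum(pi_target(np.full(len(test), av), x[test]) *
                   mu(x[test], np.full(len(test), av)) for av in (0, 1))
        scores[test] = m_pi + w * (y[test] - mu(x[test], a[test]))
    return scores.mean()

def fit_linear_outcome(xtr, atr, ytr):
    """Least-squares outcome model on features (1, x, a); illustrative only."""
    F = np.column_stack([np.ones_like(xtr), xtr, atr])
    coef, *_ = np.linalg.lstsq(F, ytr, rcond=None)
    return lambda xs, as_: np.column_stack([np.ones_like(xs), xs, as_]) @ coef

# Toy data: true value of the always-treat policy is E[Y(1)] = 1.
rng = np.random.default_rng(5)
n = 1000
x = rng.uniform(size=n)
a = rng.binomial(1, 0.5, size=n).astype(float)
y = a + 0.3 * rng.normal(size=n)
value = crossfit_dr_value(
    x, a, y,
    pi_target=lambda a, x: (a == 1).astype(float),   # always play a = 1
    pi_log=lambda a, x: np.full(len(a), 0.5),
    fit_outcome=fit_linear_outcome,
)
```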
7. Extensions and Recent Developments
Recent advances have extended CPME to heterogeneous effect estimation via “Conditional Counterfactual Mean Embeddings” (CCME), which estimate the conditional counterfactual law given covariates, covering continuous policies and treatments (Anancharoenkij et al., 4 Feb 2026). The Bayesian CPME framework accommodates multiple treatment effects and data-fusion scenarios, supporting uncertainty quantification for sequential or hierarchically dependent outcomes (Martinez-Taboada et al., 2022). CPME supports complex outcomes (images, graphs, sequences) via appropriate kernels (Muandet et al., 2018) and provides doubly robust kernel tests for distributional causal hypotheses (Zenati et al., 3 Jun 2025). These methods are scalable and admit nearly minimax optimal convergence rates under regularity conditions.
Key literature includes "Bayesian Counterfactual Mean Embeddings and Off-Policy Evaluation" (Martinez-Taboada et al., 2022), "Doubly-Robust Estimation of Counterfactual Policy Mean Embeddings" (Zenati et al., 3 Jun 2025), "Conditional Counterfactual Mean Embeddings: Doubly Robust Estimation and Learning Rates" (Anancharoenkij et al., 4 Feb 2026), and "Counterfactual Mean Embeddings" (Muandet et al., 2018).