
Counterfactual Policy Mean Embeddings (CPME)

Updated 21 February 2026
  • Counterfactual Policy Mean Embeddings (CPME) are nonparametric kernel-based estimators that encode entire counterfactual outcome distributions in RKHS.
  • They unify plug-in, doubly robust, and Bayesian methodologies to enable rigorous off-policy evaluation and distributional causal inference.
  • Recent advances extend CPME to conditional and heterogeneous effect estimation, offering scalable and nearly minimax optimal convergence rates.

Counterfactual Policy Mean Embeddings (CPME) represent a class of nonparametric, kernel-based estimators for entire counterfactual outcome distributions arising in off-policy evaluation (OPE), treatment effect estimation, and distributional causal inference. CPME encodes distributions induced by hypothetical interventions or alternative decision policies into reproducing kernel Hilbert spaces (RKHS), enabling rigorous nonparametric analysis of distributional properties, hypothesis testing, and uncertainty quantification. The CPME paradigm unifies plug-in, Bayesian, and doubly robust methodologies and supports efficient estimation, testing, and inference even in high-dimensional and complex outcome spaces.

1. Formal Definition and Mathematical Framework

Let $X$ denote covariates, $A$ an action or treatment (possibly randomized by a policy $\pi$), and $Y$ the observed outcome. Fix a characteristic kernel $k_\mathcal{Y}$ with associated RKHS $\mathcal{H}_\mathcal{Y}$ and feature map $\varphi_\mathcal{Y}: y \mapsto k_\mathcal{Y}(\cdot, y)$. The counterfactual policy mean embedding $\chi(\pi)$ for a policy $\pi$ is defined as:

$$\chi(\pi) := \mathbb{E}_{X \sim P_X,\, A \sim \pi(\cdot \mid X),\, Y \sim P(\cdot \mid X, A)}\left[\varphi_\mathcal{Y}(Y)\right]$$

$\chi(\pi)$ encodes the entire counterfactual outcome law $\nu(\pi)$ under policy $\pi$ as a mean embedding in $\mathcal{H}_\mathcal{Y}$:

$$\chi(\pi) = \int k_\mathcal{Y}(\cdot, y)\, d\nu(\pi)(y)$$

When $k_\mathcal{Y}$ is characteristic (e.g., Gaussian), this embedding uniquely determines the law $\nu(\pi)$. Furthermore, in the conditional setting, the conditional counterfactual mean embedding is given by

$$\mu_{Y^a \mid X = x} := \mathbb{E}\left[\varphi_\mathcal{Y}(Y^a) \mid X = x\right] \in \mathcal{H}_\mathcal{Y}$$

where $Y^a$ is the potential outcome under treatment $a$ (Anancharoenkij et al., 4 Feb 2026).
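As a concrete illustration of the mean-embedding idea, the sketch below (assuming a Gaussian kernel and scalar outcomes; all function names are illustrative, not from the cited papers) evaluates an empirical embedding on a grid and uses the induced MMD to compare two outcome samples. Because the kernel is characteristic, distinct laws map to distinct embeddings, so their MMD separates them:

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (characteristic) kernel matrix k(a_i, b_j) = exp(-(a_i - b_j)^2 / (2 h^2))."""
    d = a[:, None] - b[None, :]
    return np.exp(-d**2 / (2 * bandwidth**2))

def empirical_mean_embedding(samples, grid, bandwidth=1.0):
    """Evaluate the empirical embedding (1/n) sum_i k(., y_i) at each grid point."""
    return gaussian_kernel(grid, samples, bandwidth).mean(axis=1)

def mmd2(x, y, bandwidth=1.0):
    """Squared MMD between two samples: ||mu_hat_x - mu_hat_y||^2 in the RKHS (V-statistic)."""
    kxx = gaussian_kernel(x, x, bandwidth).mean()
    kyy = gaussian_kernel(y, y, bandwidth).mean()
    kxy = gaussian_kernel(x, y, bandwidth).mean()
    return kxx + kyy - 2 * kxy

rng = np.random.default_rng(0)
same = mmd2(rng.normal(0, 1, 500), rng.normal(0, 1, 500))  # same law
diff = mmd2(rng.normal(0, 1, 500), rng.normal(2, 1, 500))  # shifted law
```

The same machinery underlies the tests and sampling procedures discussed later: any two counterfactual embeddings can be compared through their RKHS distance.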

2. Estimation Methodologies

CPME estimation leverages observed data $(X_i, A_i, Y_i)$, the logging policy $\pi_0$, and a target policy $\pi$. Key estimation strategies include:

2.1. Plug-in Estimation

  • Estimation of the mean embedding decouples into two components: (i) estimation of the conditional embedding operator $C_{Y \mid A,X}$, and (ii) estimation of the kernel policy embedding $\mu_\pi$:
    • $C_{Y \mid A,X}$ is estimated via regularized kernel ridge regression from $\mathcal{H}_{A,X}$ to $\mathcal{H}_\mathcal{Y}$.
    • $\mu_\pi$ is estimated either by direct averaging or importance weighting.
    • The plug-in CPME estimate is $\hat{\chi}_\pi^{\mathrm{pi}} = \hat{C}_{Y \mid A,X}\, \hat{\mu}_\pi$ (Zenati et al., 3 Jun 2025).
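The two-stage recipe above can be sketched in a simplified scalar setting. This is a hedged illustration, not the estimator of the cited work: `plugin_cpme_weights` and its arguments are hypothetical names, stage one is a plain kernel ridge solve on the joint $(x, a)$ space, and the policy embedding is approximated by averaging kernel evaluations at covariates paired with target-policy actions:

```python
import numpy as np

def rbf(A, B, h=1.0):
    """RBF kernel matrix between the rows of feature arrays A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * h**2))

def plugin_cpme_weights(X, A, Xt, At, lam=1e-2, h=1.0):
    """Two-stage plug-in sketch: chi_hat = sum_i beta_i k_Y(., y_i).

    (X, A): logged covariates/actions; (Xt, At): covariates paired with
    actions redrawn from the target policy. Stage 1 fits the conditional
    embedding by kernel ridge; stage 2 averages cross-kernel evaluations
    against the target-policy pairs.
    """
    Z = np.column_stack([X, A])            # logged joint (x, a) features
    Zt = np.column_stack([Xt, At])         # target-policy joint features
    n = len(X)
    K = rbf(Z, Z, h)                       # stage 1 Gram matrix
    # beta = (K + n*lam*I)^{-1} * mean_j k((x_i, a_i), (x_j, a'_j))
    return np.linalg.solve(K + n * lam * np.eye(n), rbf(Z, Zt, h).mean(axis=1))

rng = np.random.default_rng(1)
X = rng.normal(size=200)
A_logged = rng.normal(size=200)            # logging-policy actions
A_target = rng.normal(loc=0.5, size=200)   # hypothetical target-policy actions
beta = plugin_cpme_weights(X, A_logged, X, A_target)
# chi_hat evaluated at y is then beta @ k_Y(Y_logged, y) for logged outcomes Y_logged
```

The single $n \times n$ linear solve is what gives the plug-in estimator its $O(n^3)$ cost.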

2.2. Doubly Robust Estimation

  • The doubly robust (DR) estimator corrects CPME for biases in both the conditional outcome model and the propensity model, employing the efficient influence function (EIF) $\psi_\pi(z)$:

$$\hat{\chi}_\pi^{\mathrm{dr}} = \hat{\chi}_\pi^{\mathrm{pi}} + \frac{1}{n}\sum_{i=1}^n \left\{ \frac{\pi(a_i \mid x_i)}{\pi_0(a_i \mid x_i)} \left[\varphi_\mathcal{Y}(y_i) - \hat{H}_{Y \mid A,X}(a_i, x_i)\right] + \int \hat{H}_{Y \mid A,X}(a, x_i)\, \pi(da \mid x_i) - \hat{\chi}_\pi^{\mathrm{pi}} \right\}$$

  • $\hat{H}_{Y \mid A,X}$ is the learned conditional mean embedding.
  • $\hat{\chi}_\pi^{\mathrm{dr}}$ remains consistent if either the propensity or the outcome embedding is correctly specified (Zenati et al., 3 Jun 2025, Anancharoenkij et al., 4 Feb 2026).
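A toy version of the one-step correction can be written down for a binary treatment and the deterministic target policy "always treat". This is a hedged sketch with illustrative names (`H_model`, `dr_cpme`): the outcome embedding uses a closed-form toy model rather than a fitted one, and because the plug-in term is exactly the average of $\hat{H}(1, x_i)$ here, the integral and subtraction terms of the EIF cancel against it, leaving only the importance-weighted residuals:

```python
import numpy as np

def k_gauss(y, yi, h=1.0):
    """Gaussian kernel k(y, y')."""
    return np.exp(-(y - yi) ** 2 / (2 * h**2))

def H_model(y, a, x, h=1.0, s=1.0):
    """Closed-form E[k(y, Y) | a, x] when Y | a, x ~ N(a + x, s^2) (toy outcome model)."""
    v = h**2 + s**2
    return (h / np.sqrt(v)) * np.exp(-(y - (a + x)) ** 2 / (2 * v))

def dr_cpme(y_grid, X, A, Y, pi0, h=1.0):
    """One-step DR correction for the target policy 'always treat' (a = 1).

    Evaluates chi_hat^dr pointwise on y_grid: the plug-in term (policy
    average of the outcome embedding) plus importance-weighted residuals,
    with known logging propensity pi0(x) = P(A = 1 | x).
    """
    w = np.where(A == 1, 1.0 / pi0(X), 0.0)             # pi(a|x) / pi0(a|x)
    plug = H_model(y_grid[:, None], 1.0, X[None, :], h).mean(axis=1)
    resid = k_gauss(y_grid[:, None], Y[None, :], h) \
        - H_model(y_grid[:, None], A[None, :], X[None, :], h)
    return plug + (w[None, :] * resid).mean(axis=1)

rng = np.random.default_rng(2)
n = 2000
X = rng.normal(size=n)
pi0 = lambda x: 1.0 / (1.0 + np.exp(-x))                # known logging propensity
A = (rng.uniform(size=n) < pi0(X)).astype(float)
Y = A + X + rng.normal(size=n)                           # toy outcome model
chi_dr = dr_cpme(np.linspace(-4.0, 6.0, 50), X, A, Y, pi0)
```

With a misspecified outcome model the weighted-residual term would absorb the bias, which is the mechanism behind the double-robustness property stated above.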

2.3. Bayesian CPME

  • CPME admits a Bayesian formulation placing a Gaussian process prior on the conditional mean embedding $F(x, y) := \mu_{Y \mid X = x}(y)$ over $\mathcal{X} \times \mathcal{Y}$.
  • Posterior inference yields a mean and covariance in $\mathcal{H}_{k_y}$, propagating epistemic uncertainty from the outcome embedding to downstream functionals $f$ (Martinez-Taboada et al., 2022).
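A minimal, heavily simplified stand-in for this construction is a conjugate Gaussian update on gridded embedding values, rather than a full GP prior on $F$; all names and hyperparameters below are illustrative assumptions, but it shows how posterior uncertainty on the embedding propagates linearly to functionals:

```python
import numpy as np

def bayes_embedding_posterior(Y, y_grid, h=1.0, prior_var=1.0, noise_var=0.5):
    """Conjugate-Gaussian sketch of Bayesian embedding inference.

    Treats each embedding value mu(y_j) = E[k(y_j, Y)] as an unknown with
    an independent N(0, prior_var) prior, observed through the per-sample
    features k(y_j, Y_i) with noise variance noise_var. A stand-in for
    the full GP-on-F construction, not a faithful implementation of it.
    """
    feats = np.exp(-(y_grid[:, None] - Y[None, :]) ** 2 / (2 * h**2))
    n = len(Y)
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)   # same at every grid point
    post_mean = post_var * feats.sum(axis=1) / noise_var
    return post_mean, post_var

def functional_posterior(f_vals, post_mean, post_var, dy):
    """Posterior mean/variance of the linear functional sum_j f(y_j) mu(y_j) dy."""
    return dy * f_vals @ post_mean, dy**2 * (f_vals**2).sum() * post_var

rng = np.random.default_rng(3)
grid = np.linspace(-4.0, 4.0, 81)
mu_post, var_post = bayes_embedding_posterior(rng.normal(size=400), grid)
f_mean, f_var = functional_posterior(np.exp(-grid**2), mu_post, var_post, grid[1] - grid[0])
```

The posterior variance contracts as $n$ grows, mirroring (in caricature) how the GP posterior of the cited framework concentrates with more data.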

3. Practical Algorithms and Implementation

CPME estimators can be systematically constructed with the following components:

| Estimator | First Stage | Second Stage | Complexity |
|---|---|---|---|
| Plug-in | Conditional embedding | Policy embedding | $O(n^3)$ |
| Doubly robust | Conditional + propensity | EIF / one-step correction | $O(n^3)$ |
| Ridge regression | Nuisance (propensity, outcome) | Kernel ridge in RKHS | $O(n^3)$ |
| Deep feature | Nuisance (propensity, outcome) | Neural net + linear map | $O(M^3 + nM)$ |
| Neural-kernel | Nuisance (propensity, outcome) | Neural net over grid | $O(nM^2)$ |

4. Theoretical Guarantees and Statistical Properties

CPME frameworks provide provable rates and robustness properties:

  • The plug-in estimator achieves RKHS-norm error $O_p\big(n^{-(c-1)/(2(c+1/b))}\big)$ under source condition $c$ and eigenvalue decay $b$; $c = 3$ yields the rate $n^{-1/4}$ (Zenati et al., 3 Jun 2025).
  • The doubly robust estimator achieves $O_p(n^{-1/2})$ provided the nuisance models converge at rate $n^{-1/4}$ (semiparametric efficiency) (Zenati et al., 3 Jun 2025).
  • Ridge-regression CPME achieves the minimax rate $n^{-2r/(2r + d_v)}$ for $r$-smooth densities in $d_v$ dimensions (Anancharoenkij et al., 4 Feb 2026).
  • All doubly robust approaches ensure consistent estimation if either the outcome or the propensity model is correctly specified, reflected in the meta-estimator rate

$$\mathbb{E}\left\| \widehat{\mu}_{Y^1 \mid V} - \mu_{Y^1 \mid V} \right\|^2 \lesssim \text{(estimation error)} + \min\left\{ \mathbb{E}(\hat\pi - \pi)^2,\ \mathbb{E}\|\hat\mu_0 - \mu_0\|^2 \right\}$$

(Anancharoenkij et al., 4 Feb 2026).

5. Hypothesis Testing, Sampling, and Inference

CPME uniquely enables nonparametric distributional testing and sampling:

  • The “Doubly-Robust Kernel Policy Test” (DR-KPT) provides a cross-U statistic based on the EIF, facilitating two-sample tests of $H_0: \chi(\pi) = \chi(\pi')$. Under $H_0$ and proper nuisance convergence, the statistic is asymptotically standard normal (Zenati et al., 3 Jun 2025).
  • Sampling from CPME employs kernel herding on $\hat{\chi}_\pi$; samples from the estimated embedding converge to the target law in MMD at rate $O(\|\hat{\chi} - \chi\| + m^{-1/2})$ (Zenati et al., 3 Jun 2025).
  • Recovery of functionals (mean, quantile, density) is achieved via the RKHS inner product $\langle f, \chi(\pi) \rangle$ or by explicit density formulas when $k_\mathcal{Y}$ integrates to one (Anancharoenkij et al., 4 Feb 2026, Martinez-Taboada et al., 2022).
  • Bayesian CPME propagates epistemic uncertainty to functionals via the joint GP posterior (Martinez-Taboada et al., 2022).
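Kernel herding on an estimated embedding can be sketched as follows. The names are illustrative and, as a simplification, the candidate set doubles as the support of $\hat{\chi}_\pi$, so uniform weights reduce to herding from the empirical embedding:

```python
import numpy as np

def kernel_herding(beta, candidates, m, h=1.0):
    """Greedy kernel herding from an embedding chi_hat = sum_i beta_i k(., c_i).

    At step t, picks the candidate maximizing
    <chi_hat, k(., y)> - (1/(t+1)) * sum_{j<t} k(y_j, y),
    so the herded samples match chi_hat in MMD. The candidate set here
    doubles as the support of chi_hat (an illustrative simplification).
    """
    K = np.exp(-(candidates[:, None] - candidates[None, :]) ** 2 / (2 * h**2))
    chi = K @ beta                        # <chi_hat, k(., y)> per candidate
    acc = np.zeros(len(candidates))       # running sum_{j<t} k(y_j, .)
    picked = []
    for t in range(m):
        idx = np.argmax(chi - acc / (t + 1))
        picked.append(candidates[idx])
        acc += K[idx]
    return np.array(picked)

rng = np.random.default_rng(4)
Y = rng.normal(size=300)
beta = np.full(300, 1.0 / 300)            # uniform weights = empirical embedding
samples = kernel_herding(beta, Y, 50)     # 50 herded pseudo-samples
```

Substituting the weights of a plug-in or doubly robust $\hat{\chi}_\pi$ for `beta` yields pseudo-samples from the estimated counterfactual law, consistent with the $O(\|\hat{\chi} - \chi\| + m^{-1/2})$ MMD guarantee above.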

6. Empirical Performance and Guidelines

Extensive simulations and real-data experiments reported in the cited works support the efficacy of CPME across off-policy evaluation and distributional treatment effect estimation tasks.

7. Extensions and Recent Developments

Recent advances have extended CPME to heterogeneous effect estimation via “Conditional Counterfactual Mean Embeddings” (CCME), which estimate the conditional distributional law $Y^a \mid X = x$ or $Y^\pi \mid X = x$ for continuous policies and treatments (Anancharoenkij et al., 4 Feb 2026). The Bayesian CPME framework accommodates multiple treatment effects and data fusion scenarios, supporting uncertainty quantification for sequential or hierarchically dependent outcomes (Martinez-Taboada et al., 2022). CPME supports complex outcomes (images, graphs, sequences) via appropriate kernels (Muandet et al., 2018) and provides doubly robust kernel tests for distributional causal hypotheses (Zenati et al., 3 Jun 2025). These methods are scalable and admit nearly minimax optimal convergence rates under regularity conditions.

Key literature includes "Bayesian Counterfactual Mean Embeddings and Off-Policy Evaluation" (Martinez-Taboada et al., 2022), "Doubly-Robust Estimation of Counterfactual Policy Mean Embeddings" (Zenati et al., 3 Jun 2025), "Conditional Counterfactual Mean Embeddings: Doubly Robust Estimation and Learning Rates" (Anancharoenkij et al., 4 Feb 2026), and "Counterfactual Mean Embeddings" (Muandet et al., 2018).
