Counterfactual Policy Mean Embedding
- Counterfactual Policy Mean Embedding (CPME) is a nonparametric framework that represents entire outcome distributions in a reproducing kernel Hilbert space for comprehensive off-policy evaluation.
- It admits both plug-in and doubly robust estimators of policy effects, with the doubly robust version attaining faster convergence rates and supporting reliable, analytic hypothesis testing.
- CPME supports practical applications across domains like healthcare, recommendation systems, and advertising by enabling efficient sample generation via kernel herding.
Counterfactual Policy Mean Embedding (CPME) is a nonparametric framework that provides a unified Hilbert space representation of counterfactual outcome distributions under arbitrary target policies, enabling comprehensive distributional off-policy evaluation. Rather than focusing exclusively on expectations (such as the average treatment effect), CPME captures the full outcome distribution induced by a policy, representing it in a reproducing kernel Hilbert space (RKHS) via a feature map associated with a characteristic kernel on the outcome space. This approach allows for rigorous analysis, estimation, and hypothesis testing concerning the impact of interventions or new policies across a spectrum of practical domains including recommendation, advertising, healthcare, and reinforcement learning.
1. Mathematical Formulation and Identification
CPME formalizes the embedding of the outcome distribution $P_\pi$, induced by a policy $\pi$, as a kernel mean embedding in an RKHS. For observed context $X \in \mathcal{X}$, action $A \in \mathcal{A}$, and outcome $Y \in \mathcal{Y}$, and a target policy $\pi(a \mid x)$, the outcome distribution under $\pi$ is

$$P_\pi(dy) = \int_{\mathcal{X}} \int_{\mathcal{A}} P(dy \mid a, x)\, \pi(da \mid x)\, P_X(dx).$$

The CPME is defined as

$$\chi_\pi := \mathbb{E}_{Y \sim P_\pi}\big[\phi(Y)\big] = \int_{\mathcal{Y}} \phi(y)\, P_\pi(dy) \in \mathcal{H}_{\mathcal{Y}},$$

where $\phi(y) = k(y, \cdot)$ is the feature map induced by a characteristic kernel $k$ on the outcome space. Under standard identification conditions (consistency, conditional exchangeability, positivity), the embedding can be expanded as

$$\chi_\pi = \mathbb{E}_{X \sim P_X}\, \mathbb{E}_{A \sim \pi(\cdot \mid X)}\big[\mu_{Y \mid A, X}(A, X)\big],$$

with $\mu_{Y \mid A, X}(a, x) = \mathbb{E}\big[\phi(Y) \mid A = a, X = x\big]$ the conditional mean embedding.
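To see where each assumption enters, note that the expansion is a tower-property argument (a reconstruction consistent with the definitions above, not a verbatim formula from the source):

$$\chi_\pi = \mathbb{E}_{Y \sim P_\pi}\big[\phi(Y)\big] = \mathbb{E}_{X \sim P_X}\, \mathbb{E}_{A \sim \pi(\cdot \mid X)}\, \mathbb{E}\big[\phi(Y) \mid A, X\big],$$

where consistency and conditional exchangeability justify replacing the counterfactual conditional law of $Y$ given $(A, X)$ with its observational counterpart, and positivity ensures the conditional mean embedding is identified wherever the target policy places mass.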
This operator-valued representation accommodates settings with discrete or continuous actions and permits analysis over complex, possibly structured outcome spaces (e.g., sequences, images, graphs).
2. Estimation Strategies: Plug-in and Doubly Robust CPME Estimators
Plug-in Estimation
The CPME can be written as the action of a conditional mean operator on the policy mean embedding:

$$\chi_\pi = \mathcal{C}_{Y \mid A, X}\, \mu_\pi,$$

where $\mathcal{C}_{Y \mid A, X} : \mathcal{H}_{\mathcal{A} \times \mathcal{X}} \to \mathcal{H}_{\mathcal{Y}}$ is the conditional mean operator and $\mu_\pi = \mathbb{E}_{X \sim P_X,\, A \sim \pi(\cdot \mid X)}\big[\varphi(A, X)\big]$ is the mean embedding of the policy-context joint distribution, with $\varphi$ the feature map on $\mathcal{A} \times \mathcal{X}$.
The plug-in estimator proceeds by:
- Estimating $\mathcal{C}_{Y \mid A, X}$ via kernel ridge regression on the observed triples $(X_i, A_i, Y_i)_{i=1}^n$.
- Estimating $\mu_\pi$ as an empirical or importance-weighted mean over samples, depending on whether target-policy actions can be drawn directly or the logging policy is known.
- Computing $\hat\chi_\pi^{\text{PI}} = \hat{\mathcal{C}}_{Y \mid A, X}\, \hat\mu_\pi$.
The plug-in estimator attains nonparametric convergence rates under standard RKHS regularity (source and eigendecay) conditions; these are in general slower than the parametric rate attained by the doubly robust estimator below.
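To make the construction concrete, here is a minimal NumPy sketch of the plug-in estimator under simplifying assumptions: Gaussian kernels, a product kernel $k\big((a,x),(a',x')\big) = k_{\mathcal{A}}(a,a')\,k_{\mathcal{X}}(x,x')$, and a target policy we can sample from. All function and variable names (e.g., `plugin_cpme_weights`, `policy_sampler`) are illustrative, not from the source:

```python
import numpy as np

def rbf_gram(U, V, bw=1.0):
    """Gaussian-kernel Gram matrix between row-stacked samples U (n, d) and V (m, d)."""
    sq = ((U[:, None, :] - V[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * bw ** 2))

def plugin_cpme_weights(X, A, policy_sampler, lam=1e-3, bw=1.0):
    """Weights beta such that hat{chi}_pi = sum_i beta_i * phi(Y_i).

    X: (n, d_x) logged contexts; A: (n, d_a) logged actions;
    policy_sampler(X) draws one target-policy action per context.
    """
    n = X.shape[0]
    # Gram matrix over logged (action, context) pairs: kernel ridge regression input.
    K = rbf_gram(A, A, bw) * rbf_gram(X, X, bw)
    # Embed the policy-context distribution: one pi-sample per observed context.
    A_pi = policy_sampler(X)
    # k_tilde_i = (1/n) sum_j k_A(a_i, a~_j) * k_X(x_i, x_j)
    k_tilde = (rbf_gram(A, A_pi, bw) * rbf_gram(X, X, bw)).mean(axis=1)
    # Plug-in CPME coefficients: beta = (K + n * lam * I)^{-1} k_tilde
    return np.linalg.solve(K + n * lam * np.eye(n), k_tilde)
```

The embedding itself is never materialized: inner products such as $\langle \hat\chi_\pi, \phi(y) \rangle = \sum_i \beta_i\, k(Y_i, y)$, and hence MMDs between policies, reduce to Gram-matrix computations over the logged outcomes.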
Doubly Robust Estimation
To improve accuracy and robustness, a doubly robust (DR) estimator is derived via the efficient influence function (EIF) for CPME. For nuisance estimators $\hat\mu$ of the conditional mean embedding and $\hat\pi_0$ of the logging policy, the DR estimator is

$$\hat\chi_\pi^{\mathrm{DR}} = \frac{1}{n} \sum_{i=1}^{n} \left[ \frac{\pi(A_i \mid X_i)}{\hat\pi_0(A_i \mid X_i)} \Big( \phi(Y_i) - \hat\mu(A_i, X_i) \Big) + \mathbb{E}_{A \sim \pi(\cdot \mid X_i)}\big[ \hat\mu(A, X_i) \big] \right].$$

This estimator is consistent provided either the propensity model or the outcome embedding is correctly specified. Under suitable conditions (e.g., cross-fitted nuisance estimates whose errors decay fast enough that their product is $o_p(n^{-1/2})$), it achieves the parametric $O_p(n^{-1/2})$ rate.
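A corresponding sketch of the DR estimator, reusing `rbf_gram` from the plug-in sketch above; it returns weights $\gamma$ with $\hat\chi_\pi^{\mathrm{DR}} = \sum_i \gamma_i\, \phi(Y_i)$. The density and sampler arguments are illustrative placeholders, and in practice the nuisances would be cross-fitted on a held-out fold:

```python
import numpy as np

def dr_cpme_weights(X, A, pi_density, pi0_density, policy_sampler,
                    lam=1e-3, bw=1.0):
    """Weights gamma such that hat{chi}^DR_pi = sum_i gamma_i * phi(Y_i)."""
    n = X.shape[0]
    K = rbf_gram(A, A, bw) * rbf_gram(X, X, bw)
    K_inv = np.linalg.inv(K + n * lam * np.eye(n))
    # Importance weights w_i = pi(a_i | x_i) / hat{pi}_0(a_i | x_i).
    w = pi_density(A, X) / pi0_density(A, X)                   # shape (n,)
    # Column i of alpha_logged holds the KRR coefficients of hat{mu}(a_i, x_i).
    alpha_logged = K_inv @ K                                   # (n, n)
    # Column i of alpha_pi represents hat{mu}(a~_i, x_i) with a~_i ~ pi(.|x_i),
    # a one-draw approximation of E_{A ~ pi(.|x_i)}[hat{mu}(A, x_i)].
    A_pi = policy_sampler(X)
    alpha_pi = K_inv @ (rbf_gram(A, A_pi, bw) * rbf_gram(X, X, bw))
    # IPW term on observed outcomes, minus its model prediction,
    # plus the direct-method embedding averaged over contexts.
    return w / n - (alpha_logged * w[None, :]).mean(axis=1) + alpha_pi.mean(axis=1)
```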
3. Hypothesis Testing: Doubly Robust Kernel Test Statistic
CPME enables principled two-sample testing for distributional equivalence or difference between the outcome distributions of two policies $\pi$ and $\pi'$. Leveraging the difference of their efficient influence functions, let $\hat\xi(Z_i) = \hat\psi_\pi(Z_i) - \hat\psi_{\pi'}(Z_i)$ denote the estimated influence-function contribution of observation $Z_i = (X_i, A_i, Y_i)$. With the evaluation data split into halves $\mathcal{D}_1$ and $\mathcal{D}_2$ (of sizes $n_1$ and $n_2$), the doubly robust kernel policy test statistic is defined as

$$\hat T = \frac{\sqrt{n_1}\, \bar U}{\hat\sigma}, \qquad \bar U = \frac{1}{n_1 n_2} \sum_{i \in \mathcal{D}_1} \sum_{j \in \mathcal{D}_2} \big\langle \hat\xi(Z_i), \hat\xi(Z_j) \big\rangle_{\mathcal{H}_{\mathcal{Y}}},$$

where $\hat\sigma^2$ is the empirical variance over $i \in \mathcal{D}_1$ of $h_i = \frac{1}{n_2} \sum_{j \in \mathcal{D}_2} \langle \hat\xi(Z_i), \hat\xi(Z_j) \rangle_{\mathcal{H}_{\mathcal{Y}}}$, with sample splitting used between the estimation and evaluation portions of the data.
Under the null hypothesis $H_0 : P_\pi = P_{\pi'}$, $\hat T$ converges in distribution to $\mathcal{N}(0, 1)$. This provides analytic $p$-values and confidence intervals. The test is computationally efficient (no resampling required) and valid under nonparametric conditions.
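A minimal sketch of the test's evaluation step, assuming the matrix `H` of cross inner products $H_{ij} = \langle \hat\xi(Z_i), \hat\xi(Z_j) \rangle$ has already been formed (it reduces to outcome-kernel Gram computations once each $\hat\xi(Z_i)$ is expressed as a weighted sum of feature maps); names are illustrative:

```python
import numpy as np
from scipy.stats import norm

def dr_kernel_policy_test(H):
    """Cross U-statistic test. H[i, j] = <xi(Z_i), xi(Z_j)>, with rows from
    one half of the evaluation split and columns from the other half
    (nuisances fit on a separate portion of the data).
    Returns the studentized statistic and its analytic one-sided p-value."""
    n1 = H.shape[0]
    h = H.mean(axis=1)                        # h_i = (1/n2) * sum_j H[i, j]
    t_stat = np.sqrt(n1) * h.mean() / h.std(ddof=1)
    # Under H0 the statistic is asymptotically N(0, 1); reject for large values,
    # since under the alternative the cross inner products have positive mean.
    return t_stat, norm.sf(t_stat)
```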
4. Sampling from the Estimated CPME
CPME supports sample generation from the estimated counterfactual distribution using deterministic kernel herding:
- Start with $y_1 = \arg\max_{y \in \mathcal{Y}} \langle \hat\chi_\pi, \phi(y) \rangle_{\mathcal{H}_{\mathcal{Y}}}$.
- For $t = 2, \dots, T$, set $y_t = \arg\max_{y \in \mathcal{Y}} \big[ \langle \hat\chi_\pi, \phi(y) \rangle_{\mathcal{H}_{\mathcal{Y}}} - \tfrac{1}{t} \sum_{j=1}^{t-1} k(y_j, y) \big]$.
Under the stated regularity conditions, the empirical distribution over the herded samples converges weakly to the counterfactual outcome distribution $P_\pi$. The herding procedure guarantees that the maximum mean discrepancy (MMD) between the herded samples and $\hat\chi_\pi$ decays at rate $O(T^{-1/2})$.
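As an illustration, here is a minimal NumPy sketch of this herding loop, with the argmax restricted to a finite candidate grid (a common practical shortcut; the theory above takes the argmax over all of $\mathcal{Y}$). It reuses `rbf_gram` from the plug-in sketch, and the names are ours, not from the source:

```python
import numpy as np

def herd_samples(beta, Y, candidates, T, bw=1.0):
    """Greedy kernel herding from hat{chi}_pi = sum_i beta_i * phi(Y_i).

    Y: (n, d) logged outcomes carrying the embedding weights beta;
    candidates: (m, d) grid over which each argmax is taken.
    """
    # scores_c = <hat{chi}_pi, phi(c)> = sum_i beta_i * k(Y_i, c)
    scores = beta @ rbf_gram(Y, candidates, bw)
    penalty = np.zeros(candidates.shape[0])     # running sum_{j<t} k(y_j, c)
    herded = []
    for t in range(1, T + 1):
        j = int(np.argmax(scores - penalty / t))
        herded.append(candidates[j])
        penalty += rbf_gram(candidates[j:j + 1], candidates, bw)[0]
    return np.array(herded)
```

Each herded point greedily reduces the MMD between the running sample and the estimated embedding; restricting the argmax to a grid trades some accuracy for a simple implementation.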
5. Empirical Properties and Practical Benefits
Comprehensive simulation studies in the CPME framework show:
- Testing: The DR kernel policy test retains nominal Type I error rates and attains substantially higher power than plug-in or linear (mean-based) tests, especially under complex distribution shifts (e.g., mixture components or higher-moment changes).
- Sampling: Herded samples from DR-CPME estimates closely match the ground truth outcome distribution with respect to both MMD and Wasserstein distance, outperforming plug-in and standard baselines.
- Policy evaluation: CPME estimators achieve lower mean squared errors than existing direct and inverse-propensity estimators. The doubly robust version further reduces bias and variance, especially as the sample size increases or under nontrivial covariate shift.
- Computational efficiency: The DR kernel test is several orders of magnitude faster than permutation-based alternatives.
These properties allow CPME to be integrated into practical off-policy evaluation and model validation pipelines, especially for large-scale or high-dimensional outcome spaces.
6. Summary Table: Plug-in vs. Doubly Robust CPME
Aspect | Plug-in Estimator | Doubly Robust (DR) Estimator
---|---|---
Assumptions | Outcome/CME model correct | Either outcome/CME or propensity model correct
Convergence | Nonparametric (optimal under RKHS regularity conditions) | Up to $O_p(n^{-1/2})$ (parametric)
Test statistic | Permutation MMD test | Cross U-statistic with analytic normal $p$-values
Sampling | Kernel herding from estimated embedding | Same, with improved convergence
Application | OPE, hypothesis testing, sampling | All of the above, plus bias and variance reduction
7. Extensions and Applications
CPME generalizes previous approaches to off-policy evaluation, distributional treatment effect analysis, and counterfactual inference:
- Enables nonparametric, distributional OPE for both discrete and continuous action spaces.
- Supports structured and high-dimensional outcomes.
- Provides tools for nonparametric hypothesis testing, p-value computation, and confidence interval construction without resampling.
- Facilitates sample generation from counterfactual distributions for downstream model development, simulation, or uncertainty quantification.
Applications span online platforms, recommender systems, healthcare evaluation, and algorithmic policy selection, with empirical evidence for superior accuracy, computational efficiency, and robustness to nuisance model misspecification.
References to Key Formulas
- Plug-in estimator: $\hat\chi_\pi^{\text{PI}} = \hat{\mathcal{C}}_{Y \mid A, X}\, \hat\mu_\pi$ (Section 2)
- Efficient influence function: $\psi_\pi(Z) = \frac{\pi(A \mid X)}{\pi_0(A \mid X)}\big(\phi(Y) - \mu(A, X)\big) + \mathbb{E}_{A' \sim \pi(\cdot \mid X)}\big[\mu(A', X)\big] - \chi_\pi$
- DR estimator: $\hat\chi_\pi^{\mathrm{DR}}$, the sample average of the (uncentered) estimated EIF (Section 2)
- Kernel test statistic: $\hat T$, the studentized cross U-statistic (Section 3)
- Herded samples: the greedy updates $y_1, \dots, y_T$ (Section 4)
CPME, through its RKHS-based nonparametric representation, doubly robust estimation, and computationally tractable testing and sampling routines, provides a powerful and practical tool for modern distributional causal inference and off-policy evaluation tasks.