Counterfactual Policy Mean Embedding
- Counterfactual Policy Mean Embedding (CPME) is a nonparametric framework that represents entire outcome distributions in a reproducing kernel Hilbert space for comprehensive off-policy evaluation.
- It admits both plug-in and doubly robust estimators of policy effects, with the doubly robust version attaining faster convergence rates and supporting reliable, analytic hypothesis testing.
- CPME supports practical applications across domains like healthcare, recommendation systems, and advertising by enabling efficient sample generation via kernel herding.
Counterfactual Policy Mean Embedding (CPME) is a nonparametric framework that provides a unified Hilbert space representation of counterfactual outcome distributions under arbitrary target policies, enabling comprehensive distributional off-policy evaluation. Rather than focusing exclusively on expectations (such as the average treatment effect), CPME captures the full outcome distribution induced by a policy, representing it in a reproducing kernel Hilbert space (RKHS) via a feature map associated with a characteristic kernel on the outcome space. This approach allows for rigorous analysis, estimation, and hypothesis testing concerning the impact of interventions or new policies across a spectrum of practical domains including recommendation, advertising, healthcare, and reinforcement learning.
1. Mathematical Formulation and Identification
CPME formalizes the embedding of the outcome distribution $P_\pi$, induced by a policy $\pi$, as a kernel mean embedding in an RKHS. For observed context $X \in \mathcal{X}$, action $A \in \mathcal{A}$, and outcome $Y \in \mathcal{Y}$, and a target policy $\pi(a \mid x)$, the outcome distribution under $\pi$ is

$$P_\pi(dy) = \int_{\mathcal{X}} \int_{\mathcal{A}} P(dy \mid a, x)\, \pi(da \mid x)\, P_X(dx).$$

The CPME is defined as

$$\chi_\pi := \mathbb{E}_{Y \sim P_\pi}\big[\phi(Y)\big] = \int_{\mathcal{Y}} \phi(y)\, P_\pi(dy) \in \mathcal{H}_{\mathcal{Y}},$$

where $\phi(y) = k(y, \cdot)$ is the feature map induced by a characteristic kernel $k$ on the outcome space. Under standard identification conditions (consistency, conditional exchangeability, positivity), the embedding can be expanded as

$$\chi_\pi = \mathbb{E}_{X \sim P_X}\, \mathbb{E}_{A \sim \pi(\cdot \mid X)}\big[\mu_{Y \mid A, X}(A, X)\big],$$

with $\mu_{Y \mid A, X}(a, x) = \mathbb{E}\big[\phi(Y) \mid A = a, X = x\big]$ the conditional mean embedding.
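To see where each assumption enters, note that the expansion is a tower-property argument (a reconstruction consistent with the definitions above, not a verbatim formula from the source):

$$\chi_\pi = \mathbb{E}_{Y \sim P_\pi}\big[\phi(Y)\big] = \mathbb{E}_{X \sim P_X}\, \mathbb{E}_{A \sim \pi(\cdot \mid X)}\, \mathbb{E}\big[\phi(Y) \mid A, X\big],$$

where consistency and conditional exchangeability justify replacing the counterfactual conditional law of $Y$ given $(A, X)$ with its observational counterpart, and positivity ensures the conditional mean embedding is identified wherever the target policy places mass.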
This operator-valued representation accommodates settings with discrete or continuous actions and permits analysis over complex, possibly structured outcome spaces (e.g., sequences, images, graphs).
2. Estimation Strategies: Plug-in and Doubly Robust CPME Estimators
Plug-in Estimation
The CPME can be written as the action of a conditional mean operator on the policy mean embedding:

$$\chi_\pi = \mathcal{C}_{Y \mid A, X}\, \mu_\pi,$$

where $\mathcal{C}_{Y \mid A, X} : \mathcal{H}_{\mathcal{A} \times \mathcal{X}} \to \mathcal{H}_{\mathcal{Y}}$ is the conditional mean operator and $\mu_\pi = \mathbb{E}_{X \sim P_X,\, A \sim \pi(\cdot \mid X)}\big[\varphi(A, X)\big]$ is the mean embedding of the policy-context joint distribution, with $\varphi$ the feature map on $\mathcal{A} \times \mathcal{X}$.
The plug-in estimator proceeds by:
- Estimating $\mathcal{C}_{Y \mid A, X}$ via kernel ridge regression on the observed triples $(X_i, A_i, Y_i)_{i=1}^n$.
- Estimating $\mu_\pi$ as an empirical or importance-weighted mean over samples, depending on whether target-policy actions can be drawn directly or the logging policy is known.
- Computing $\hat\chi_\pi^{\text{PI}} = \hat{\mathcal{C}}_{Y \mid A, X}\, \hat\mu_\pi$.
The plug-in estimator attains nonparametric convergence rates under standard RKHS regularity (source and eigendecay) conditions; these are in general slower than the parametric rate attained by the doubly robust estimator below.
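To make the construction concrete, here is a minimal NumPy sketch of the plug-in estimator under simplifying assumptions: Gaussian kernels, a product kernel $k\big((a,x),(a',x')\big) = k_{\mathcal{A}}(a,a')\,k_{\mathcal{X}}(x,x')$, and a target policy we can sample from. All function and variable names (e.g., `plugin_cpme_weights`, `policy_sampler`) are illustrative, not from the source:

```python
import numpy as np

def rbf_gram(U, V, bw=1.0):
    """Gaussian-kernel Gram matrix between row-stacked samples U (n, d) and V (m, d)."""
    sq = ((U[:, None, :] - V[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * bw ** 2))

def plugin_cpme_weights(X, A, policy_sampler, lam=1e-3, bw=1.0):
    """Weights beta such that hat{chi}_pi = sum_i beta_i * phi(Y_i).

    X: (n, d_x) logged contexts; A: (n, d_a) logged actions;
    policy_sampler(X) draws one target-policy action per context.
    """
    n = X.shape[0]
    # Gram matrix over logged (action, context) pairs: kernel ridge regression input.
    K = rbf_gram(A, A, bw) * rbf_gram(X, X, bw)
    # Embed the policy-context distribution: one pi-sample per observed context.
    A_pi = policy_sampler(X)
    # k_tilde_i = (1/n) sum_j k_A(a_i, a~_j) * k_X(x_i, x_j)
    k_tilde = (rbf_gram(A, A_pi, bw) * rbf_gram(X, X, bw)).mean(axis=1)
    # Plug-in CPME coefficients: beta = (K + n * lam * I)^{-1} k_tilde
    return np.linalg.solve(K + n * lam * np.eye(n), k_tilde)
```

The embedding itself is never materialized: inner products such as $\langle \hat\chi_\pi, \phi(y) \rangle = \sum_i \beta_i\, k(Y_i, y)$, and hence MMDs between policies, reduce to Gram-matrix computations over the logged outcomes.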
Doubly Robust Estimation
To improve accuracy and robustness, a doubly robust (DR) estimator is derived via the efficient influence function (EIF) for CPME. For nuisance estimators $\hat\mu$ of the conditional mean embedding and $\hat\pi_0$ of the logging policy, the DR estimator is

$$\hat\chi_\pi^{\mathrm{DR}} = \frac{1}{n} \sum_{i=1}^{n} \left[ \frac{\pi(A_i \mid X_i)}{\hat\pi_0(A_i \mid X_i)} \Big( \phi(Y_i) - \hat\mu(A_i, X_i) \Big) + \mathbb{E}_{A \sim \pi(\cdot \mid X_i)}\big[ \hat\mu(A, X_i) \big] \right].$$

This estimator is consistent provided either the propensity model or the outcome embedding is correctly specified. Under suitable conditions (e.g., cross-fitted nuisance estimates whose errors decay fast enough that their product is $o_p(n^{-1/2})$), it achieves the parametric $O_p(n^{-1/2})$ rate.
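A corresponding sketch of the DR estimator, reusing `rbf_gram` from the plug-in sketch above; it returns weights $\gamma$ with $\hat\chi_\pi^{\mathrm{DR}} = \sum_i \gamma_i\, \phi(Y_i)$. The density and sampler arguments are illustrative placeholders, and in practice the nuisances would be cross-fitted on a held-out fold:

```python
import numpy as np

def dr_cpme_weights(X, A, pi_density, pi0_density, policy_sampler,
                    lam=1e-3, bw=1.0):
    """Weights gamma such that hat{chi}^DR_pi = sum_i gamma_i * phi(Y_i)."""
    n = X.shape[0]
    K = rbf_gram(A, A, bw) * rbf_gram(X, X, bw)
    K_inv = np.linalg.inv(K + n * lam * np.eye(n))
    # Importance weights w_i = pi(a_i | x_i) / hat{pi}_0(a_i | x_i).
    w = pi_density(A, X) / pi0_density(A, X)                   # shape (n,)
    # Column i of alpha_logged holds the KRR coefficients of hat{mu}(a_i, x_i).
    alpha_logged = K_inv @ K                                   # (n, n)
    # Column i of alpha_pi represents hat{mu}(a~_i, x_i) with a~_i ~ pi(.|x_i),
    # a one-draw approximation of E_{A ~ pi(.|x_i)}[hat{mu}(A, x_i)].
    A_pi = policy_sampler(X)
    alpha_pi = K_inv @ (rbf_gram(A, A_pi, bw) * rbf_gram(X, X, bw))
    # IPW term on observed outcomes, minus its model prediction,
    # plus the direct-method embedding averaged over contexts.
    return w / n - (alpha_logged * w[None, :]).mean(axis=1) + alpha_pi.mean(axis=1)
```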
3. Hypothesis Testing: Doubly Robust Kernel Test Statistic
CPME enables principled two-sample testing for distributional equivalence or difference between the outcome distributions of two policies $\pi$ and $\pi'$. Leveraging the difference of their efficient influence functions, let $\hat\xi(Z_i) = \hat\psi_\pi(Z_i) - \hat\psi_{\pi'}(Z_i)$ denote the estimated influence-function contribution of observation $Z_i = (X_i, A_i, Y_i)$. With the evaluation data split into halves $\mathcal{D}_1$ and $\mathcal{D}_2$ (of sizes $n_1$ and $n_2$), the doubly robust kernel policy test statistic is defined as

$$\hat T = \frac{\sqrt{n_1}\, \bar U}{\hat\sigma}, \qquad \bar U = \frac{1}{n_1 n_2} \sum_{i \in \mathcal{D}_1} \sum_{j \in \mathcal{D}_2} \big\langle \hat\xi(Z_i), \hat\xi(Z_j) \big\rangle_{\mathcal{H}_{\mathcal{Y}}},$$

where $\hat\sigma^2$ is the empirical variance over $i \in \mathcal{D}_1$ of $h_i = \frac{1}{n_2} \sum_{j \in \mathcal{D}_2} \langle \hat\xi(Z_i), \hat\xi(Z_j) \rangle_{\mathcal{H}_{\mathcal{Y}}}$, with sample splitting used between the estimation and evaluation portions of the data.
Under the null hypothesis $H_0 : P_\pi = P_{\pi'}$, $\hat T$ converges in distribution to $\mathcal{N}(0, 1)$. This provides analytic $p$-values and confidence intervals. The test is computationally efficient (no resampling required) and valid under nonparametric conditions.
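A minimal sketch of the test's evaluation step, assuming the matrix `H` of cross inner products $H_{ij} = \langle \hat\xi(Z_i), \hat\xi(Z_j) \rangle$ has already been formed (it reduces to outcome-kernel Gram computations once each $\hat\xi(Z_i)$ is expressed as a weighted sum of feature maps); names are illustrative:

```python
import numpy as np
from scipy.stats import norm

def dr_kernel_policy_test(H):
    """Cross U-statistic test. H[i, j] = <xi(Z_i), xi(Z_j)>, with rows from
    one half of the evaluation split and columns from the other half
    (nuisances fit on a separate portion of the data).
    Returns the studentized statistic and its analytic one-sided p-value."""
    n1 = H.shape[0]
    h = H.mean(axis=1)                        # h_i = (1/n2) * sum_j H[i, j]
    t_stat = np.sqrt(n1) * h.mean() / h.std(ddof=1)
    # Under H0 the statistic is asymptotically N(0, 1); reject for large values,
    # since under the alternative the cross inner products have positive mean.
    return t_stat, norm.sf(t_stat)
```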
4. Sampling from the Estimated CPME
CPME supports sample generation from the estimated counterfactual distribution using deterministic kernel herding:
- Start with $y_1 = \arg\max_{y \in \mathcal{Y}} \langle \hat\chi_\pi, \phi(y) \rangle_{\mathcal{H}_{\mathcal{Y}}}$.
- For $t = 2, \dots, T$, set $y_t = \arg\max_{y \in \mathcal{Y}} \big[ \langle \hat\chi_\pi, \phi(y) \rangle_{\mathcal{H}_{\mathcal{Y}}} - \tfrac{1}{t} \sum_{j=1}^{t-1} k(y_j, y) \big]$.
Under the stated regularity conditions, the empirical distribution over the herded samples converges weakly to the counterfactual outcome distribution $P_\pi$. The herding procedure guarantees that the maximum mean discrepancy (MMD) between the herded samples and $\hat\chi_\pi$ decays at rate $O(T^{-1/2})$.
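As an illustration, here is a minimal NumPy sketch of this herding loop, with the argmax restricted to a finite candidate grid (a common practical shortcut; the theory above takes the argmax over all of $\mathcal{Y}$). It reuses `rbf_gram` from the plug-in sketch, and the names are ours, not from the source:

```python
import numpy as np

def herd_samples(beta, Y, candidates, T, bw=1.0):
    """Greedy kernel herding from hat{chi}_pi = sum_i beta_i * phi(Y_i).

    Y: (n, d) logged outcomes carrying the embedding weights beta;
    candidates: (m, d) grid over which each argmax is taken.
    """
    # scores_c = <hat{chi}_pi, phi(c)> = sum_i beta_i * k(Y_i, c)
    scores = beta @ rbf_gram(Y, candidates, bw)
    penalty = np.zeros(candidates.shape[0])     # running sum_{j<t} k(y_j, c)
    herded = []
    for t in range(1, T + 1):
        j = int(np.argmax(scores - penalty / t))
        herded.append(candidates[j])
        penalty += rbf_gram(candidates[j:j + 1], candidates, bw)[0]
    return np.array(herded)
```

Each herded point greedily reduces the MMD between the running sample and the estimated embedding; restricting the argmax to a grid trades some accuracy for a simple implementation.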
5. Empirical Properties and Practical Benefits
Comprehensive simulation studies in the CPME framework show:
- Testing: The DR kernel policy test retains nominal Type I error rates and attains substantially higher power than plug-in or linear (mean-based) tests, especially under complex distribution shifts (e.g., mixture components or higher-moment changes).
- Sampling: Herded samples from DR-CPME estimates closely match the ground truth outcome distribution with respect to both MMD and Wasserstein distance, outperforming plug-in and standard baselines.
- Policy evaluation: CPME estimators achieve lower mean squared errors than existing direct and inverse-propensity estimators. The doubly robust version further reduces bias and variance, especially as the sample size increases or under nontrivial covariate shift.
- Computational efficiency: The DR kernel test is several orders of magnitude faster than permutation-based alternatives.
These properties allow CPME to be integrated into practical off-policy evaluation and model validation pipelines, especially for large-scale or high-dimensional outcome spaces.
6. Summary Table: Plug-in vs. Doubly Robust CPME
Aspect | Plug-in Estimator | Doubly Robust (DR) Estimator
---|---|---
Assumptions | Outcome/CME model correct | Either outcome/CME or propensity model correct
Convergence | Nonparametric (optimal under RKHS regularity conditions) | Up to $O_p(n^{-1/2})$ (parametric)
Test statistic | Permutation MMD test | Cross U-statistic with analytic normal $p$-values
Sampling | Kernel herding from estimated embedding | Same, with improved convergence
Application | OPE, hypothesis testing, sampling | All of the above, plus bias and variance reduction
7. Extensions and Applications
CPME generalizes previous approaches to off-policy evaluation, distributional treatment effect analysis, and counterfactual inference:
- Enables nonparametric, distributional OPE for both discrete and continuous action spaces.
- Supports structured and high-dimensional outcomes.
- Provides tools for nonparametric hypothesis testing, p-value computation, and confidence interval construction without resampling.
- Facilitates sample generation from counterfactual distributions for downstream model development, simulation, or uncertainty quantification.
Applications span online platforms, recommender systems, healthcare evaluation, and algorithmic policy selection, with empirical evidence for superior accuracy, computational efficiency, and robustness to nuisance model misspecification.
References to Key Formulas
- Plug-in estimator: $\hat\chi_\pi^{\text{PI}} = \hat{\mathcal{C}}_{Y \mid A, X}\, \hat\mu_\pi$ (Section 2)
- Efficient influence function: $\psi_\pi(Z) = \frac{\pi(A \mid X)}{\pi_0(A \mid X)}\big(\phi(Y) - \mu(A, X)\big) + \mathbb{E}_{A' \sim \pi(\cdot \mid X)}\big[\mu(A', X)\big] - \chi_\pi$
- DR estimator: $\hat\chi_\pi^{\mathrm{DR}}$, the sample average of the (uncentered) estimated EIF (Section 2)
- Kernel test statistic: $\hat T$, the studentized cross U-statistic (Section 3)
- Herded samples: the greedy updates $y_1, \dots, y_T$ (Section 4)
CPME, through its RKHS-based nonparametric representation, doubly robust estimation, and computationally tractable testing and sampling routines, provides a powerful and practical tool for modern distributional causal inference and off-policy evaluation tasks.