Pseudo-outcome Imputation in Counterfactual Regression
- The paper introduces a framework (PIPCFR) that integrates post-treatment variables into pseudo-outcome imputation for more accurate individual treatment effect estimation.
- It establishes a novel theoretical risk bound that balances variance reduction and bias control, outperforming prior pseudo-outcome and matching-based methods.
- Empirical evaluations across simulations and real-world datasets demonstrate significant performance gains and robustness under complex post-treatment conditions.
Pseudo-outcome Imputation with Post-treatment Variables for Counterfactual Regression (PIPCFR) is an algorithmic framework for individual treatment effect (ITE) estimation in observational studies that formally integrates post-treatment variables into pseudo-outcome construction. The core innovation is to improve counterfactual outcome prediction by representing and leveraging information in post-treatment variables while actively controlling the bias introduced by their post-treatment status. The approach is rigorously characterized by a novel theoretical risk bound and demonstrates empirical superiority over prior pseudo-outcome and matching-based methods in a variety of simulation and real-world settings (Lin et al., 21 Dec 2025).
1. Problem Setting and Motivation
In the standard Neyman–Rubin potential outcomes framework, each observational unit is described by pre-treatment covariates $X$, a binary treatment indicator $T \in \{0, 1\}$, post-treatment variables $S$ (observed only under the assigned treatment), and an outcome $Y$. The objective is to estimate the ITE $\tau(x) = \mathbb{E}[Y(1) - Y(0) \mid X = x]$ for each unit, but the fundamental challenge is that only one of the two potential outcomes $Y(1), Y(0)$ is observed per unit.
Traditional approaches address this via imputation—using models trained on observed data to predict missing counterfactuals (“pseudo-outcomes”). However, most prior work overlooks the role of post-treatment variables, which may contain supplemental information about outcome-relevant stochasticity. Omitting such variables can increase the variance of counterfactual predictions. PIPCFR introduces a pseudo-outcome construction mechanism that conditions on both $X$ and learned representations of $S$, with techniques to mitigate the spurious biases that arise from using post-treatment data (Lin et al., 21 Dec 2025).
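As a minimal illustration of pseudo-outcome imputation (not the paper's code; the linear data-generating process and all names are hypothetical), the sketch below fits a separate outcome model per treatment arm and uses the opposite arm's model to impute each unit's missing counterfactual:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy observational data: confounded assignment, constant true effect of 2.0.
n, d = 500, 3
X = rng.normal(size=(n, d))
T = (rng.random(n) < 1 / (1 + np.exp(-X[:, 0]))).astype(int)
Y = X @ np.array([1.0, 0.5, -0.5]) + 2.0 * T + rng.normal(scale=0.1, size=n)

def fit_linear(A, b):
    """Least-squares outcome model with an intercept."""
    A1 = np.hstack([A, np.ones((len(A), 1))])
    w, *_ = np.linalg.lstsq(A1, b, rcond=None)
    return lambda Anew: np.hstack([Anew, np.ones((len(Anew), 1))]) @ w

mu1 = fit_linear(X[T == 1], Y[T == 1])   # outcome model under treatment
mu0 = fit_linear(X[T == 0], Y[T == 0])   # outcome model under control

# Impute the missing counterfactual for every unit ("pseudo-outcome").
y_cf = np.where(T == 1, mu0(X), mu1(X))
ite_hat = np.where(T == 1, Y - y_cf, y_cf - Y)
print(round(float(ite_hat.mean()), 2))   # true effect in this toy DGP is 2.0
```

Because assignment depends only on the observed $X$, a correctly specified per-arm outcome model removes the confounding in this toy example; PIPCFR additionally feeds post-treatment information into the imputation step.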
2. Theoretical Formulation and Risk Bound
PIPCFR defines a pseudo-outcome constructor $\tilde{y} = h(X, \Phi(S))$, where $\Phi(S)$ is a learned representation of the post-treatment features. For each instance, $\tilde{y}$ serves as a pseudo-label for the unobserved counterfactual. The ITE error is the PEHE risk
$$\epsilon_{\mathrm{ITE}} = \mathbb{E}_X\big[(\hat{\tau}(X) - \tau(X))^2\big], \qquad \tau(X) = \mathbb{E}[Y(1) - Y(0) \mid X],$$
which cannot be directly minimized due to missing counterfactuals.
PIPCFR establishes a risk upper bound of the following schematic form (up to multiplicative constants) under standard assumptions (overlap, SUTVA, unconfoundedness):
$$\epsilon_{\mathrm{ITE}} \;\lesssim\; \mathcal{L}_{\mathrm{PIP}} \;+\; \alpha \cdot \mathrm{IPM}\big(p(\Phi(S) \mid X, T{=}1),\, p(\Phi(S) \mid X, T{=}0)\big) \;+\; \epsilon_{\mathrm{CF}}^{\mathrm{teacher}} \;+\; \epsilon_{\Delta} \;+\; \sum_{t} \epsilon_{\mathrm{F}}^{t},$$
where:
- $\mathcal{L}_{\mathrm{PIP}}$ is a PIP (pseudo-outcome imputation proxy) loss capturing the student's fit to the pseudo-outcomes and the teacher's fit to the factual outcomes,
- the IPM term quantifies the distance (e.g., via MMD or KL) between the representations $\Phi(S)$ under each treatment arm conditional on $X$ (penalizing post-treatment bias),
- $\epsilon_{\mathrm{CF}}^{\mathrm{teacher}}$ is the teacher's counterfactual error,
- $\epsilon_{\Delta}$ is the squared error between the constructed pseudo-outcome $\tilde{y}$ and the unobserved counterfactual outcome,
- $\epsilon_{\mathrm{F}}^{t}$ is the factual regression error for treatment arm $t$.
Incorporating $\Phi(S)$ enables variance reduction in counterfactual estimates, but also risks bias due to treatment-induced shifts in the distribution of $S$. The IPM/KL term controls this bias by enforcing $p(\Phi(S) \mid X, T{=}1) \approx p(\Phi(S) \mid X, T{=}0)$, trading variance reduction against bias mitigation (Lin et al., 21 Dec 2025).
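The IPM penalty can be instantiated, for instance, as a kernel MMD between representation samples from the two arms. A minimal sketch with a hand-rolled RBF-kernel MMD and illustrative data (not the paper's implementation; the unit mean shift mimics treatment-induced distribution shift in $\Phi(S)$):

```python
import numpy as np

rng = np.random.default_rng(2)

def rbf_mmd2(A, B, gamma=1.0):
    """Biased squared-MMD estimate with an RBF kernel between samples A and B."""
    def k(U, V):
        d2 = ((U[:, None, :] - V[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(A, A).mean() + k(B, B).mean() - 2 * k(A, B).mean()

# Representations under each arm: a mean shift mimics post-treatment bias.
Z1 = rng.normal(size=(200, 2)) + 1.0   # treated
Z0 = rng.normal(size=(200, 2))         # control

biased = rbf_mmd2(Z1, Z0)
balanced = rbf_mmd2(Z1 - 1.0, Z0)      # after removing the treatment-induced shift
print(biased > balanced)
```

Minimizing such a term over the parameters of $\Phi$ pushes the two conditional representation distributions together, which is exactly what the bias-control term in the bound rewards.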
3. Model Architecture and Algorithmic Design
PIPCFR is implemented as an end-to-end trainable architecture comprising three main modules:
- Post-treatment Representation Learner (PRL) ($\Phi$): an MLP mapping the post-treatment variables $S$ to a representation $Z = \Phi(S)$.
- Pseudo Counterfactual Constructor (PCC) ($h$): a TARNet-style network producing the pseudo-counterfactual $\tilde{y}$, serving as a “teacher.”
- Counterfactual Regressor (CFR) ($f$): another TARNet-style network acting as the “student” for ITE estimation.
The training objective decomposes into:
- Propensity/independence losses: $\mathcal{L}_{\mathrm{prop}}$ (propensity model log-likelihood) and $\mathcal{L}_{\mathrm{ind}}$ (a KL-divergence penalty on the dependence between $\Phi(S)$ and $T$).
- Teacher loss: $\mathcal{L}_{\mathrm{teacher}}$, balancing fit to factual outcomes with IPM/MMD regularization to match representation distributions across treatments.
- PIP student loss: $\mathcal{L}_{\mathrm{PIP}}$, incorporating fit to the factual outcome and the imputed counterfactual, with a cross-term to reduce estimation variance.
The model is trained using alternating minimization:
- update the propensity model with $\mathcal{L}_{\mathrm{prop}}$;
- update the representation learner $\Phi$ with $\mathcal{L}_{\mathrm{ind}}$;
- update the teacher (PCC) with $\mathcal{L}_{\mathrm{teacher}}$;
- update the student (CFR) with $\mathcal{L}_{\mathrm{PIP}}$.
Post-treatment features $S$ are needed only during training; at test time, only $X$ is used for prediction (Lin et al., 21 Dec 2025).
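The alternating schedule can be sketched end-to-end with linear stand-ins for the four modules and finite-difference gradients. Every loss below is a simplified, hypothetical surrogate for the corresponding PIPCFR objective (e.g., the independence penalty is a crude mean-shift proxy rather than a KL divergence, and the PIP loss omits the variance-reducing cross-term):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy batch: X pre-treatment, T treatment, S post-treatment, Y outcome.
n, dx, ds = 128, 4, 3
X = rng.normal(size=(n, dx))
T = rng.integers(0, 2, size=n).astype(float)
S = rng.normal(size=(n, ds)) + T[:, None]          # treatment shifts S
Y = X @ rng.normal(size=dx) + T + S[:, 0] + 0.1 * rng.normal(size=n)

# Linear stand-ins for the four modules.
w_prop = np.zeros(dx)            # propensity model: P(T=1 | X)
W_phi  = np.eye(ds)[:, :2]       # PRL: Z = S @ W_phi
w_t    = np.zeros(dx + 2 + 1)    # PCC teacher head on [X, Z, T]
w_s    = np.zeros(dx + 1)        # CFR student head on [X, T]

def num_grad(f, w, eps=1e-4):
    """Finite-difference gradient, so the sketch needs no autodiff library."""
    g = np.zeros_like(w)
    it = np.nditer(w, flags=["multi_index"])
    for _ in it:
        i = it.multi_index
        w[i] += eps; hi = f(w)
        w[i] -= 2 * eps; lo = f(w)
        w[i] += eps
        g[i] = (hi - lo) / (2 * eps)
    return g

def L_prop(w):   # propensity log-loss
    e = 1 / (1 + np.exp(-(X @ w)))
    return -np.mean(T * np.log(e + 1e-8) + (1 - T) * np.log(1 - e + 1e-8))

def L_ind(W):    # crude dependence proxy: mean shift of Z across arms
    Z = S @ W
    return float(np.sum((Z[T == 1].mean(0) - Z[T == 0].mean(0)) ** 2))

def L_teacher(w):  # teacher factual fit on [X, Z, T]
    Z = S @ W_phi
    return float(np.mean((np.hstack([X, Z, T[:, None]]) @ w - Y) ** 2))

def L_pip(w):      # student: factual fit plus fit to teacher's imputed counterfactuals
    Z = S @ W_phi
    y_cf = np.hstack([X, Z, 1 - T[:, None]]) @ w_t
    f_fact = np.hstack([X, T[:, None]]) @ w
    f_cf   = np.hstack([X, 1 - T[:, None]]) @ w
    return float(np.mean((f_fact - Y) ** 2) + np.mean((f_cf - y_cf) ** 2))

t0, s0 = L_teacher(w_t), L_pip(w_s)   # initial losses, for comparison
lr = 0.05
for _ in range(200):                  # alternating minimization
    w_prop -= lr * num_grad(L_prop, w_prop)
    W_phi  -= lr * num_grad(L_ind, W_phi)
    w_t    -= lr * num_grad(L_teacher, w_t)
    w_s    -= lr * num_grad(L_pip, w_s)

print(round(L_teacher(w_t), 3), round(L_pip(w_s), 3))
```

The key structural points survive even in this toy version: the teacher is fit only on factual data, the student is supervised on both factual outcomes and the teacher's imputed counterfactuals, and $S$ enters only through $\Phi$ during training, never at prediction time.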
4. Empirical Results and Benchmarking
PIPCFR was evaluated on a suite of standard datasets:
- IHDP (RCT-based simulation, 747 units),
- News (5000 items, high-dimensional word counts with topic model post-treatment embeddings),
- Large synthetic temporal causal system (10,000 units, mediator/adjustment/noise components),
- Real-world gaming data (3M users, high-dimensional input).
Comparisons were made against Causal Forest, meta-learners (XLearner, RLearner, DR-Learner), representation methods (TARNet, CFRNet-MMD/Wass, DRCFR, ESCFR), DragonNet, and matching-based techniques.
Performance is reported using root PEHE on held-out data. PIPCFR$_{\mathrm{Wass}}$ achieves
- root PEHE of $2.35$ on IHDP (a 20% reduction over the best baseline),
- $0.44$ on News (a 42% reduction),
- $3.06$ on the synthetic data (a 25% reduction),
- and a further root-PEHE reduction over the best baseline on the real-world gaming data.
PIPCFR demonstrates improved robustness as post-treatment noise grows and as post-treatment sequences lengthen, with its advantage widening as the post-treatment structure becomes harder to model. Substituting alternative teacher networks (DRCFR, DragonNet, ESCFR) preserves the relative performance gains (Lin et al., 21 Dec 2025).
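Root PEHE (Precision in Estimation of Heterogeneous Effects), the metric used throughout, is the root-mean-squared error between predicted and true individual effects; a minimal helper:

```python
import numpy as np

def root_pehe(tau_true, tau_hat):
    """Root PEHE: RMSE between true and estimated individual treatment effects."""
    return float(np.sqrt(np.mean((np.asarray(tau_hat) - np.asarray(tau_true)) ** 2)))

# A constant prediction error of 0.1 yields a root PEHE of 0.1.
tau_true = np.array([1.0, 2.0, 0.5, -1.0])
print(root_pehe(tau_true, tau_true + 0.1))
```

Note that computing it requires the true effects, so it is only available on simulated or semi-synthetic benchmarks such as IHDP and News.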
5. Connections to Pseudo-outcome Counterfactual Regression
The idea of learning from pseudo-outcomes for counterfactual inference predates PIPCFR and has been formalized in the Forster–Warmuth framework (Yang et al., 2023). The generic strategy involves constructing a pseudo-outcome via efficient influence functions so that its conditional mean yields the target counterfactual regression, and then fitting a flexible regressor (e.g., series estimator). A central insight from this literature is that, with appropriate construction, pseudo-outcome plug-in estimators can achieve second-order bias in the error of nuisance estimation and minimax rate optimality under weak regularity conditions. When post-treatment variables are available, they can be incorporated in the pseudo-outcome and in the nuisance estimation stage, provided suitable identifiability (e.g., sequential ignorability) holds (Yang et al., 2023). PIPCFR elaborates this principle by learning data-driven representations of post-treatment variables, providing explicit variance/bias trade-off, and connecting the integration of post-treatment information to ITE estimation risk via a formal bound.
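The influence-function construction referenced above can be illustrated with the standard doubly robust (AIPW) pseudo-outcome, whose conditional mean recovers the CATE when the nuisances are correct. The sketch below uses oracle nuisances on toy data (illustrative, not the Forster–Warmuth or PIPCFR estimator):

```python
import numpy as np

def dr_pseudo_outcome(Y, T, e, mu0, mu1):
    """Doubly robust (AIPW) pseudo-outcome; its conditional mean is the CATE."""
    return (mu1 - mu0
            + T * (Y - mu1) / e
            - (1 - T) * (Y - mu0) / (1 - e))

# Toy data with known nuisances and a constant true effect of 2.0.
rng = np.random.default_rng(3)
n = 2000
e = np.full(n, 0.5)                         # true propensity
T = (rng.random(n) < e).astype(float)
mu0, mu1 = np.zeros(n), np.full(n, 2.0)     # true outcome regressions
Y = np.where(T == 1, mu1, mu0) + rng.normal(scale=0.5, size=n)

phi = dr_pseudo_outcome(Y, T, e, mu0, mu1)
print(round(float(phi.mean()), 2))
```

Regressing such pseudo-outcomes on $X$ is the generic second step of the framework; PIPCFR replaces the hand-crafted construction with a learned, post-treatment-aware constructor.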
6. Limitations and Future Directions
PIPCFR is subject to several constraints. It currently assumes binary treatment assignment, complete post-treatment data available during training, and relatively simple MLP-based sequence models for post-treatment representation. The framework relies fundamentally on no hidden confounding and overlap. Directions for future research include:
- Generalization to multi-armed or continuous treatments,
- Identification in the presence of hidden confounders (e.g., via proxy-variable or instrumental-variable methods),
- Deployment of advanced sequence models (RNNs, Transformers) for richer post-treatment data,
- Theoretical refinement of bias–variance trade-offs in the learned post-treatment representation (Lin et al., 21 Dec 2025).
These avenues are critical to extending the practical scope and theoretical guarantees of pseudo-outcome imputation with post-treatment variables.
7. Summary Table: Core Components of PIPCFR
| Component | Function | Technical Detail |
|---|---|---|
| Post-treatment Representation Learner (PRL) | Learns $Z = \Phi(S)$ from $S$ | MLP; independence from $T$ enforced with KL/IPM regularizer |
| Pseudo Counterfactual Constructor (PCC) | Imputes pseudo-counterfactual | TARNet-style “teacher” network |
| Counterfactual Regressor (CFR) | Final ITE predictor | TARNet-style “student” network |
The design enables PIPCFR to systematically exploit outcome-relevant post-treatment information while rigorously controlling induced biases to yield improved counterfactual estimation fidelity (Lin et al., 21 Dec 2025).