
Doubly Robust Learning: Methods & Applications

Updated 22 December 2025
  • Doubly Robust Learning is a framework that combines outcome regression and propensity models to ensure consistent estimation if either model is correctly specified.
  • It leverages techniques such as AIPW, TMLE, and cross-fitting to achieve statistical efficiency and robustness against model misspecification.
  • Applications range from causal inference and off-policy evaluation to recommender systems and semi-supervised learning, enhancing bias correction and inference reliability.

Doubly robust learning is a statistical framework for estimating treatment effects or correcting bias in observational and semi-supervised data, characterized by estimators that combine two models—typically an outcome regression and a propensity (or missingness) model—such that the target estimand remains consistent if either model is correctly specified. This property enhances both robustness to model misspecification and statistical efficiency, and has motivated extensive methodological and empirical research across causal inference, recommender systems, off-policy evaluation, and modern semi-supervised learning.

1. Fundamental Principles and Estimator Structure

Doubly robust (DR) learning originated in causal inference for observational studies, where the target, often the average treatment effect (ATE) $\tau = \mathbb{E}[Y(1) - Y(0)]$, is not directly identifiable due to confounding between treatment $D \in \{0,1\}$, covariates $X$, and outcome $Y$. Identification relies on three assumptions: consistency (SUTVA), conditional ignorability ($\{Y(0), Y(1)\} \perp D \mid X$), and overlap ($0 < P(D=1 \mid X) < 1$) (Tan et al., 2022, Hlynsson, 2 Jun 2024).

Classical “singly robust” approaches are:

  • Outcome regression (imputation): Models $\mathbb{E}(Y \mid D=d, X=x)$ for $d=0,1$, then averages their difference over $X$.
  • Inverse probability weighting (IPW): Models the propensity $e(x) = P(D=1 \mid X=x)$, weighting observed outcomes to simulate a randomized experiment.

Both approaches are consistent only if their respective nuisance models are correctly specified.

A canonical doubly robust estimator, such as the augmented IPW (AIPW), is

\hat\tau_{AIPW} = \frac{1}{n}\sum_{i=1}^n \left[\frac{D_i(Y_i - \hat m_1(X_i))}{\hat e(X_i)} - \frac{(1-D_i)(Y_i - \hat m_0(X_i))}{1-\hat e(X_i)} + \hat m_1(X_i) - \hat m_0(X_i)\right]

where $\hat m_d(X)$ and $\hat e(X)$ are the estimated outcome and propensity models. Consistency requires only one of the two models to be correctly specified (Tan et al., 2022, Hlynsson, 2 Jun 2024).
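As a concrete illustration, the AIPW formula above can be computed directly. The following minimal NumPy sketch (simulated confounded data, linear outcome regressions, and a hand-rolled logistic regression for the propensity; all variable names are illustrative) targets a simulated true ATE of 2.0:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
X = rng.normal(size=(n, 2))

# Confounded treatment: the true propensity depends on X
e_true = 1 / (1 + np.exp(-(0.5 * X[:, 0] - 0.5 * X[:, 1])))
D = rng.binomial(1, e_true)

# Outcome with true ATE = 2.0 and confounding through X
Y = 2.0 * D + X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)

# Outcome models m1, m0: least squares within each treatment arm
Xd = np.column_stack([np.ones(n), X])
beta1, *_ = np.linalg.lstsq(Xd[D == 1], Y[D == 1], rcond=None)
beta0, *_ = np.linalg.lstsq(Xd[D == 0], Y[D == 0], rcond=None)
m1, m0 = Xd @ beta1, Xd @ beta0

# Propensity model: logistic regression by gradient ascent
w = np.zeros(Xd.shape[1])
for _ in range(200):
    p = 1 / (1 + np.exp(-Xd @ w))
    w += 0.1 * Xd.T @ (D - p) / n
e_hat = np.clip(1 / (1 + np.exp(-Xd @ w)), 1e-3, 1 - 1e-3)

# AIPW estimate: average of the augmented influence-function terms
tau_hat = np.mean(
    D * (Y - m1) / e_hat - (1 - D) * (Y - m0) / (1 - e_hat) + m1 - m0
)
```

Deliberately mis-specifying one of the two nuisance fits (e.g., dropping a covariate from the propensity model only) leaves `tau_hat` close to 2.0, which is the double-robustness property in action.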

2. Theoretical Guarantees and Rate Conditions

Doubly robust estimators possess several notable theoretical properties (Tan et al., 2022, Dukes et al., 2021, Kennedy, 2022):

  • Double robustness: Consistency of the target estimates if either the propensity or the outcome model is correct.
  • Product rate efficiency: With cross-fitting (sample splitting), $\sqrt{n}$-consistency and asymptotic normality require only that the product of $L_2$-errors, $\|\hat m - m\|_2 \cdot \|\hat e - e\|_2$, converge faster than $n^{-1/2}$, so each nuisance may converge at a rate as slow as $o_p(n^{-1/4})$.
  • Semiparametric efficiency: If both models are estimated at the required rate, DR estimators achieve the semiparametric efficiency bound given by the variance of the efficient influence function (Kennedy, 2022).
  • Bias structure: When both models are misspecified, the bias is of order $O(\|\hat{m}-m\|_\infty \cdot \|1 - p/\hat{p}\|_\infty)$; thus, mild misspecification in both can still lead to substantial bias (Li et al., 2022).
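The product-form behavior can be made explicit. Holding the fitted nuisances fixed, a first-order bias expansion of the AIPW estimator (a sketch in the notation above, under consistency and ignorability) gives

\mathrm{Bias}(\hat\tau_{AIPW}) = \mathbb{E}\left[\big(e(X) - \hat e(X)\big)\left(\frac{m_1(X) - \hat m_1(X)}{\hat e(X)} + \frac{m_0(X) - \hat m_0(X)}{1 - \hat e(X)}\right)\right]

which vanishes whenever either $\hat e = e$ or $\hat m_d = m_d$, and is bounded by a product of nuisance $L_2$-errors via Cauchy–Schwarz, yielding the product-rate condition.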

Recent work has addressed regularity and finite-sample issues. For example, the calibrated DML (C-DML) procedure applies isotonic regression to the initial nuisance fits, yielding doubly robust asymptotic normality even when only one nuisance is estimated sufficiently well, while the other may converge arbitrarily slowly or inconsistently (Laan et al., 5 Nov 2024). This goes beyond traditional DR estimators, for which valid confidence intervals typically require both nuisances to be estimated at a fast enough rate.

3. Key Methodologies: TMLE, Cross-Fitting, and Extensions

Beyond vanilla AIPW, DR frameworks include a variety of methodologies:

  • Targeted Maximum Likelihood Estimation (TMLE): TMLE performs an initial outcome regression, then applies a low-dimensional “fluctuation” or targeting step linked to the efficient influence function, aligning the plug-in estimator with semiparametric efficiency while maintaining the natural range of predicted outcomes. TMLE exhibits improved finite-sample stability in low-overlap scenarios compared to AIPW (Tan et al., 2022).
  • Cross-fitting (Double Machine Learning): Cross-fitting fits nuisance models on training folds separate from those used in the final estimation step, removing the need for Donsker (complexity) conditions on the nuisance classes and further guarding against overfitting. Rate-doubly-robust asymptotics are thus attainable with highly adaptive machine learning estimators (e.g., random forests, boosting, ensemble methods) for nuisance functions (Tan et al., 2022, Kennedy, 2022).
  • Double Score Matching and Multiple Robust (MR) Estimation: In contexts such as recommendation systems, MR generalizes DR by optimally combining several candidate nuisance models. MR estimators are unbiased if any candidate is correct, expanding the “chance” of unbiasedness beyond DR’s two-model structure (Li et al., 2022).
  • Conservative DR (CDR): For high-variance and "poisonous imputation" regimes, as in recommendation, CDR performs uncertainty screening on imputations by mean and variance checks, reverting to IPW wherever imputation is insufficiently reliable, thus strictly tightening variance/tail bounds relative to unconstrained DR (Song et al., 2023).
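The cross-fitting recipe above can be sketched in a few lines. This hypothetical helper (`crossfit_aipw` is an illustrative name, not a library function) fits linear outcome models and a logistic propensity on out-of-fold data only, then pools the per-observation AIPW scores:

```python
import numpy as np

def crossfit_aipw(X, D, Y, K=5, seed=0):
    """AIPW ATE with K-fold cross-fitting: nuisances fit out-of-fold."""
    n = len(Y)
    rng = np.random.default_rng(seed)
    folds = rng.permutation(n) % K
    Xd = np.column_stack([np.ones(n), X])
    scores = np.empty(n)
    for k in range(K):
        tr, te = folds != k, folds == k
        # Outcome regressions per arm (least squares on training folds)
        b1, *_ = np.linalg.lstsq(Xd[tr & (D == 1)], Y[tr & (D == 1)], rcond=None)
        b0, *_ = np.linalg.lstsq(Xd[tr & (D == 0)], Y[tr & (D == 0)], rcond=None)
        # Propensity by logistic regression (gradient ascent on training folds)
        w = np.zeros(Xd.shape[1])
        for _ in range(300):
            p = 1 / (1 + np.exp(-Xd[tr] @ w))
            w += 0.2 * Xd[tr].T @ (D[tr] - p) / tr.sum()
        # Evaluate nuisances on the held-out fold only
        m1, m0 = Xd[te] @ b1, Xd[te] @ b0
        e = np.clip(1 / (1 + np.exp(-Xd[te] @ w)), 1e-3, 1 - 1e-3)
        scores[te] = (D[te] * (Y[te] - m1) / e
                      - (1 - D[te]) * (Y[te] - m0) / (1 - e) + m1 - m0)
    tau = scores.mean()
    se = scores.std(ddof=1) / np.sqrt(n)  # plug-in standard error
    return tau, se

# Demo on simulated confounded data with true ATE = 2.0
rng = np.random.default_rng(1)
n = 10_000
X = rng.normal(size=(n, 2))
e_true = 1 / (1 + np.exp(-(0.5 * X[:, 0] - 0.5 * X[:, 1])))
D = rng.binomial(1, e_true)
Y = 2.0 * D + X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)
tau, se = crossfit_aipw(X, D, Y)
```

Because each observation's score uses nuisance fits trained without it, the pooled mean and its plug-in standard error support the usual Wald-type confidence interval without Donsker-style restrictions on the nuisance learners.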

4. Applications Across Fields

Doubly robust methodology has been adapted across research domains:

  • Causal inference in observational studies: The primary domain, where DR estimators serve as a powerful tool for ATE and CATE estimation under confounding, often embedded in double machine learning workflows using modern machine learning for nuisance estimation (Tan et al., 2022, Hlynsson, 2 Jun 2024).
  • Survival analysis with informative censoring: Augmented IPCW (AIPCW) estimators handle time-to-event data subject to covariate-dependent censoring, providing model and rate double robustness for log-hazard ratio estimation under the Cox model (Luo et al., 2022).
  • Recommender systems: DR methods debias missing-not-at-random (MNAR) and bandit feedback via the combination of error imputation and propensity weighting; advanced variants address high-variance imputations (CDR) or enhance selection of nuisance learners (selective ML) (Li et al., 2022, Song et al., 2023, Cui et al., 2019).
  • Semi-supervised learning: DR is used to debias pseudo-labeling and SSL risk estimation, yielding procedures that interpolate between pure-supervised (when pseudo-labels are unreliable) and full pseudo-labeling (when accurate), with recent advances introducing context-adaptivity for site-specific or covariate-dependent reliability (Zhu et al., 2023, Pham et al., 1 Feb 2025, Ruah et al., 21 Feb 2025).
  • Off-policy evaluation and learning: DR estimators are foundational in off-policy evaluation and learning for applications ranging from recommender logs to reinforcement learning, including doubly robust actor-critic algorithms and distributionally robust OPE (Islam et al., 2019, Kallus et al., 2022, Chen et al., 2020).
  • Gradient-based optimization: Stochastic DR gradients for SGD address missing outcome or covariate settings, improving convergence and variance properties (Lee et al., 2018).

5. Practical Implementation and Empirical Insights

A typical algorithmic workflow for DR estimation in the ATE context involves:

  1. Variable selection: Methods such as Lasso, BART variable-importance, or other screening algorithms reduce extraneous covariates to stabilize estimation (Tan et al., 2022).
  2. Flexible nuisance fitting: Multiple machine learning algorithms or SuperLearner ensembles for both outcome regression and propensity estimation.
  3. Cross-fitting: K-fold sample splitting to mitigate overfitting and empirical-process complications, attaining product-rate robustness.
  4. Estimation and inference: Compute AIPW or TMLE estimates with empirical variance or bootstrap for interval estimation; diagnostic checks (propensity distribution, positivity, residuals) (Tan et al., 2022, Hlynsson, 2 Jun 2024).
  5. Pitfalls: Non-overlap (propensity near 0 or 1), improper trimming, failure to cross-fit, or over-broad model libraries can degrade performance or inflate variance.
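The overlap-related pitfalls in the last step can be checked mechanically. A minimal sketch (the threshold `eps` and the helper name are illustrative choices, not a prescribed rule):

```python
import numpy as np

def overlap_diagnostics(e_hat, eps=0.05):
    """Flag positivity problems in estimated propensities and apply a
    simple symmetric trimming rule.

    e_hat : array-like of estimated propensity scores in (0, 1).
    Returns a boolean keep-mask and a summary dict.
    """
    e_hat = np.asarray(e_hat, dtype=float)
    keep = (e_hat > eps) & (e_hat < 1 - eps)
    summary = {
        "min": float(e_hat.min()),          # values near 0 signal non-overlap
        "max": float(e_hat.max()),          # values near 1 signal non-overlap
        "frac_trimmed": float(1 - keep.mean()),
    }
    return keep, summary

keep, summary = overlap_diagnostics([0.01, 0.5, 0.99, 0.3], eps=0.05)
```

Note that trimming changes the estimand to the trimmed subpopulation, so the fraction of removed units should be reported alongside the estimate.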

Simulation studies consistently show that singly robust estimators (pure regression or IPW) are highly vulnerable to model misspecification, yielding biased and inefficient inference. Doubly robust estimators (AIPW, TMLE) outperform across a range of confounding, nonlinearity, and overlap conditions, with TMLE frequently offering improved stability and coverage in finite samples. In real-world applications (e.g., fibromyalgia outcomes, large-scale ads personalization, semi-supervised vision tasks), DR learning yields marked gains over standard or singly robust methods (Tan et al., 2022, Shi et al., 29 Sep 2024, Zhu et al., 2023, Pham et al., 1 Feb 2025).

6. Recent Advances and Future Directions

The past several years have yielded refinements and extensions to classical DR frameworks:

  • Calibrated DR and Double Score Matching: Calibration via isotonic regression on nuisance fits enables consistent inference even for very slowly convergent or inconsistent nuisance estimators, mitigating the need for both models to be estimated at the $o_p(n^{-1/4})$ rate (Laan et al., 5 Nov 2024).
  • Selective ML for Nuisance Selection: Cross-validated or perturbation-minimizing selection of nuisance function learners yields estimators robust to poor machine learning algorithm choices, addressing high-dimensionality and model uncertainty (Cui et al., 2019).
  • Variance-Minimizing DR (MRDR): Optimization of the imputation function for minimal DR estimator variance, preserving double robustness and enhancing practical stability in recommender and conversion-rate settings (Guo et al., 2021).
  • Applications to Distributionally Robust OPE and Heterogeneous Environments: DR estimators adapted to adversarial or heterogeneously reliable data sources; e.g., in context-aware SSL, context-dependent weighting further reduces bias-variance when pseudo-label accuracy varies across data regimes (Ruah et al., 21 Feb 2025, Kallus et al., 2022).
  • Extensions to Multi-arm Treatments, Continuous Actions, and Beyond: DR learning has been generalized to angle-based or vectorized frameworks for multi-arm treatments, and, using ReLU deep nets, to handle continuous or high-cardinality action spaces with rates controlled by intrinsic low-dimensional structure (Meng et al., 2020, Chen et al., 2020).

7. Limitations and Open Problems

While doubly robust learning has become foundational for semiparametric estimation and bias-corrected machine learning, important challenges remain:

  • Bias under joint model misspecification: When both nuisance estimators are even slightly misspecified, product-form bias can degrade estimator performance substantially (Li et al., 2022).
  • Non-regularity with single-nuisance consistency: In “one-sided” scenarios (only one consistent nuisance estimator), the resulting asymptotic distributions can be non-regular, with erratic finite-sample behavior (Dukes et al., 2021).
  • Poisonous Imputation: In recommendation and contextual ML applications, poor imputations can actually worsen bias/variance over singly robust estimators, motivating filter-based remedies (CDR) (Song et al., 2023).
  • Tuning cross-fitting and model libraries: Overly aggressive cross-fitting or excessively wide/complex model libraries can induce instability and overfitting (Tan et al., 2022).
  • Extending beyond linear functionals: Generalizing DR calibration and efficiency guarantees to arbitrary (nonlinear, adaptive) functionals remains an active area (Laan et al., 5 Nov 2024).

Further open directions include plug-in-and-calibrated estimators that combine TMLE and calibration, multiply robust extensions for longitudinal/proximal causal models, and empirical-process theory for non-Donsker, machine-learned nuisance fits in ultra-high dimensions.

