
Doubly Robust Estimation in Causal Inference

Updated 25 July 2025
  • Doubly robust estimation is a method in causal inference and missing data analysis that leverages two models for bias protection under partial model misspecification.
  • It typically combines an outcome regression and an auxiliary model (e.g., propensity score) using techniques like AIPW and TMLE for efficient estimation.
  • Empirical studies show that doubly robust methods improve bias reduction and confidence interval accuracy in complex observational and randomized trial settings.

Doubly robust estimation is a foundational strategy in modern causal inference and missing data methodologies, designed to ensure estimator consistency even under partial model misspecification. It underpins a class of estimators for treatment effects and population means that “double-protect” against error by jointly leveraging two independently specified nuisance parameter models—usually an outcome regression model and an auxiliary model, such as the propensity score or a missingness mechanism. This approach yields substantive advantages in terms of bias reduction, efficiency, and valid inference, particularly in settings with missing data, non-probability samples, or complex observational designs.

1. Doubly Robust Estimators: Definition and Key Principles

Doubly robust (DR) estimators are constructed by specifying two models: one for the outcome (outcome regression) and another for a nuisance parameter that arises due to treatment assignment, missing data, selection, or truncation mechanisms. In classical treatment effect estimation, these are:

  • The outcome regression: $m(W)$, typically $\mathbb{E}(Y \mid M{=}1, A{=}1, W)$,
  • The missingness or treatment mechanism: $g(W)$, often a product of treatment and missingness probabilities $g_A(W) \cdot g_M(W)$.

A DR estimator remains consistent if at least one of these two models is correctly specified, regardless of the correctness of the other. This double safeguard is crucial when parametric models might be misspecified, especially in moderate or high-dimensional settings where flexible, data-adaptive methods are often selected (Díaz et al., 2017).

This generalizes across various application domains, including missing data (Díaz et al., 2017), survival analysis (Díaz, 2017), non-probability surveys (Chen et al., 2018), and observational studies with interference (Liu et al., 2018).

2. Construction and Methodology of DR Estimators

DR estimators most often take the structure of augmented inverse probability weighted (AIPW) estimators or targeted minimum loss-based estimators (TMLE). Central to their construction is the efficient influence function (EIF), which characterizes the semiparametric efficiency bound for the target parameter.

For example, in the context of treatment effect estimation with missing outcome data:

$$D_{(\eta, \theta)}(O) = \frac{A \cdot M}{g(W)}\,[Y - m(W)] + m(W) - \theta,$$

where both the AIPW and TMLE estimators solve an estimating equation involving this EIF. The AIPW estimator is typically defined as the solution in $\theta$ of:

$$0 = \sum_{i=1}^n \left\{ \frac{A_i M_i}{\hat{g}(W_i)} [Y_i - \hat{m}(W_i)] + \hat{m}(W_i) - \theta \right\}.$$

By contrast, TMLE iteratively “targets” the initial estimator by solving a submodel (e.g., a logistic tilting regression) to remove any drift and to ensure that the EIF has mean zero under the estimated distribution (Díaz et al., 2017).
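
A minimal Python sketch can make the double-robustness of the AIPW estimating equation above concrete. This is our own illustration, not the authors' code (which is supplied in R): the data-generating process, the `aipw` helper, and the deliberately misspecified nuisance inputs are all assumptions chosen for the demonstration.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def aipw(y, obs, g_hat, m_hat):
    """Sample-mean solution of the AIPW estimating equation:
    theta-hat = mean( obs / g_hat * (y - m_hat) + m_hat )."""
    y0 = np.where(obs, y, 0.0)               # unobserved outcomes never enter
    return np.mean(obs / g_hat * (y0 - m_hat) + m_hat)

rng = np.random.default_rng(0)
n = 200_000
w = rng.normal(size=n)
g = expit(0.5 + 0.8 * w)                     # true observation mechanism g(W)
obs = rng.random(n) < g
y = 1.0 + 2.0 * w + rng.normal(size=n)       # true regression m(W) = 1 + 2W
theta_true = 1.0                             # E[Y] = 1

# Correct m-hat, badly misspecified (constant) g-hat: still consistent.
theta_1 = aipw(y, obs, g_hat=np.full(n, 0.5), m_hat=1.0 + 2.0 * w)
# Correct g-hat, badly misspecified (zero) m-hat: still consistent.
theta_2 = aipw(y, obs, g_hat=g, m_hat=np.zeros(n))
print(round(theta_1, 3), round(theta_2, 3))
```

Because the estimating equation is linear in $\theta$, the AIPW solution is simply the sample mean of the uncentered EIF; both runs land near the truth even though each feeds the estimator one wrong nuisance model.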

Nuisance parameter estimation in these frameworks now routinely employs data-adaptive or ensemble learning strategies (e.g., Super Learner), thus capturing complex, high-dimensional relationships while maintaining asymptotic guarantees (Díaz, 2017).

3. Consistency, Asymptotic Properties, and Regularity Conditions

DR estimators achieve consistency and asymptotic normality if either the outcome regression or the auxiliary model converges at the required rate. Critical conditions include:

  • Existence of a “working pair” $\eta_1 = (g_1, m_1)$ such that either $g_1$ or $m_1$ is correctly specified.
  • Rates of convergence: one typically requires $\|\hat{m} - m_1\| = o_P(1)$ and $\|\hat{g} - g_1\| = o_P(1)$, with their product $o_P(n^{-1/2})$ for $\sqrt{n}$-consistency.
  • Empirical process conditions: Frequently a Donsker assumption, but cross-fitting has been widely adopted to avoid these restrictions, particularly when using machine learning estimators.
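
Cross-fitting, mentioned in the last condition, is mechanically simple: split the data into K folds and predict each unit's nuisance values from models fit on the other folds. The Python sketch below is our own construction; the plain least-squares and Newton-logistic fits are stand-ins for the flexible learners discussed in the cited work.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def fit_logistic(X, t, steps=25):
    """Newton-Raphson logistic regression (intercept included); returns a predict function."""
    X1 = np.column_stack([np.ones(len(t)), X])
    beta = np.zeros(X1.shape[1])
    for _ in range(steps):
        p = expit(X1 @ beta)
        W = p * (1.0 - p)
        beta += np.linalg.solve(X1.T @ (W[:, None] * X1), X1.T @ (t - p))
    return lambda Xn: expit(np.column_stack([np.ones(len(Xn)), Xn]) @ beta)

def fit_linear(X, y):
    """Ordinary least squares with intercept; returns a predict function."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ beta

def cross_fit_aipw(y, obs, X, K=5, seed=0):
    folds = np.random.default_rng(seed).integers(0, K, size=len(y))
    g_hat, m_hat = np.empty(len(y)), np.empty(len(y))
    for k in range(K):
        train, test = folds != k, folds == k
        # each unit's nuisance predictions come from the *other* folds
        g_hat[test] = fit_logistic(X[train], obs[train].astype(float))(X[test])
        fit = train & obs                    # outcome model uses observed units only
        m_hat[test] = fit_linear(X[fit], y[fit])(X[test])
    y0 = np.where(obs, y, 0.0)
    return np.mean(obs / g_hat * (y0 - m_hat) + m_hat)

rng = np.random.default_rng(1)
n = 20_000
X = rng.normal(size=(n, 2))
obs = rng.random(n) < expit(0.3 + 0.6 * X[:, 0])
y = 2.0 + X[:, 0] - X[:, 1] + rng.normal(size=n)     # E[Y] = 2
theta = cross_fit_aipw(y, obs, X)
```

The key line is the fold loop: no unit's own data enters its $\hat{g}$ or $\hat{m}$, which is what removes the Donsker restriction on the learners.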

For TMLE and other targeted estimators, explicit drift correction is often implemented to ensure the EIF’s empirical mean is sufficiently small, thereby achieving asymptotic normality even when a nuisance parameter is estimated only at rate $n^{-1/4}$ (Díaz et al., 2017, Díaz, 2017). The resulting estimator is asymptotically linear:

$$\sqrt{n}(\hat{\theta} - \theta_0) = (P_n - P_0)\, D_{(\eta_1, \theta_0)} + o_P(1).$$

4. Empirical Evaluation: Simulation and Application Evidence

Simulation studies have consistently demonstrated that standard DR estimators (AIPW, TMLE) as well as their enhanced forms (e.g., drift-corrected or targeted DR estimators) exhibit superior performance in terms of bias, efficiency, and confidence interval coverage, especially under model misspecification (Díaz et al., 2017). Key findings include:

  • When both models are correct, all estimators are efficient; however, IPW estimators may exhibit high variance and instability in small samples.
  • When only one model is correct, enhanced DR estimators (e.g., drift-corrected TMLE) show reduced bias and improved variance, whereas standard estimators may be biased or unreliable.
  • When both models are misspecified, drift-corrected DR estimators often maintain lower bias and RMSE than standard comparators.
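
A toy Monte Carlo (our own construction, not the simulation design of Díaz et al., 2017) illustrates the first finding: even with both models correctly specified, plain IPW is markedly noisier than AIPW in small samples because it weights the full outcome rather than a residual.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(3)
n, reps = 100, 2000
ipw_est, aipw_est = [], []
for _ in range(reps):
    w = rng.normal(size=n)
    g = expit(-1.0 + 1.5 * w)                 # some observation probabilities near 0
    obs = rng.random(n) < g
    y = 1.0 + 2.0 * w + rng.normal(size=n)    # E[Y] = 1
    m = 1.0 + 2.0 * w                         # correct outcome regression
    y0 = np.where(obs, y, 0.0)
    ipw_est.append(np.mean(obs / g * y0))               # plain IPW
    aipw_est.append(np.mean(obs / g * (y0 - m) + m))    # AIPW
sd_ipw, sd_aipw = np.std(ipw_est), np.std(aipw_est)
```

Across replications both estimators center on the truth, but the IPW sampling standard deviation is a multiple of the AIPW one: the inverse weights multiply only the residual $Y - m(W)$ in AIPW, not the full outcome.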

Applied illustrations reinforce these findings. In a trial of antiretroviral therapy for HIV, nearly 40% dropout resulted in informative missingness. DR estimators incorporating flexible (ensemble) nuisance estimation produced more reliable effect estimates—with improved finite-sample bias and interval coverage—than conventional methods (Díaz et al., 2017).

5. Implementation Considerations and R Code

The implementation of DR estimators in modern statistical software leverages modularity and ensemble learning for nuisance parameter estimation. For instance, R code supplied in (Díaz et al., 2017) utilizes the SuperLearner package to combine GLM, LASSO, GAM, MARS, and Random Forest algorithms for working models.

Key steps include:

  1. Estimate nuisance parameters using data-adaptive methods, often via cross-validated ensemble algorithms.
  2. Compute auxiliary covariates required for TMLE targeting or AIPW estimation.
  3. Target or "tilt" the initial estimate through iterative updating until drift is minimized (e.g., $\varepsilon$ coefficients below a threshold).
  4. Calculate the final treatment effect estimate and its estimated variance via the (estimated) influence function.
  5. Construct confidence intervals using Wald-type formulas based on the influence function's estimated empirical variance.
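
Steps 2–5 of the recipe above can be sketched for the mean-outcome-with-missingness example. This is a hedged Python illustration (the paper's supplied code is R with SuperLearner): the clever covariate $H = \Delta/\hat{g}$, the one-dimensional logistic tilt, the simulated data, and all names are our assumptions, and the outcome is taken to lie in $[0, 1]$.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def tmle_mean(y, obs, g_hat, m_hat, max_iter=100, tol=1e-9):
    """Steps 2-5 above for E[Y] with Y in [0, 1]: form the clever covariate
    H = obs / g_hat, logistically tilt m_hat until the empirical mean of the
    EIF (the 'drift') is ~0, then plug in and build a Wald interval."""
    y0 = np.where(obs, y, 0.0)
    H = obs / g_hat                                 # step 2: auxiliary covariate
    logit_m = np.log(m_hat) - np.log1p(-m_hat)
    eps = 0.0
    for _ in range(max_iter):                       # step 3: Newton steps for epsilon
        m_eps = expit(logit_m + eps)
        score = np.mean(H * (y0 - m_eps))           # drift term of the EIF
        if abs(score) < tol:
            break
        eps += np.sum(H * (y0 - m_eps)) / np.sum(H * m_eps * (1.0 - m_eps))
    m_eps = expit(logit_m + eps)
    theta = np.mean(m_eps)                          # step 4: plug-in estimate
    eif = H * (y0 - m_eps) + m_eps - theta          # estimated influence function
    se = np.std(eif) / np.sqrt(len(y))              # step 5: Wald-type interval
    return theta, (theta - 1.96 * se, theta + 1.96 * se)

rng = np.random.default_rng(2)
n = 100_000
w = rng.normal(size=n)
g = expit(0.4 + 0.7 * w)                            # correct observation mechanism
obs = rng.random(n) < g
y = (rng.random(n) < expit(-0.5 + w)).astype(float)  # binary outcome, E[Y] ~ 0.40
# Deliberately misspecified initial m-hat; targeting still recovers E[Y].
theta, ci = tmle_mean(y, obs, g_hat=g, m_hat=np.full(n, 0.5))
```

Note the design choice: the tilt is fit by a weighted score equation, so once the Newton loop converges the EIF has (near-)zero empirical mean by construction, which is exactly the drift-minimization criterion in step 3.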

This structure allows for robust and reproducible implementation in practical analyses, including use of cross-validation, bandwidth selection, and machine learning pipelines for nuisance modeling.

6. Extensions and Generalizations

The doubly robust framework has been broadly adapted across statistical domains:

  • In non-probability survey inference, DR estimators exploit auxiliary probability sample information to reliably estimate population means, even when either the response model or sampling mechanism is misspecified (Chen et al., 2018).
  • In survival analysis, DR techniques—often augmented with cross-fitting and drift correction—yield valid estimators for treatment effects under right censoring or complex missingness structures, provided flexible nuisance estimation is possible (Díaz, 2017).
  • DR methods have further been deployed in contexts including partial interference settings (for direct and indirect effects), high-dimensional data fusion, semiparametric modeling, and with machine learning-driven nuisance regression.

Recent research emphasizes the integration of DR approaches with highly adaptive regression and cross-validated ensemble methods, the consideration of finite sample properties, and the formalization of computationally tractable estimation procedures for complex causal estimands.

7. Impact and Practical Guidance

The adoption of doubly robust estimation represents a methodological advancement in bias correction, particularly in the analysis of randomized trials with informative missingness, observational studies subject to confounding, and survey designs with non-random selection.

Key practical implications include:

  • Use of ensemble machine learning for nuisance estimation enables application in moderate to high dimensional contexts where parametric modeling is prone to misspecification.
  • Drift correction and targeted updating (e.g., in TMLE) provide finite-sample bias reduction beyond what standard DR estimators achieve.
  • Confidence interval construction via estimated influence functions allows for robust inference under minimal regularity assumptions, provided at least one nuisance model is correct.

The increasing inclusion of modular, open-source code (e.g., SuperLearner in R) lowers barriers to adoption and ensures accessibility to advanced DR methods for varied analytic contexts, supporting both rigor and flexibility in applied analysis.

In sum, doubly robust estimation furnishes an essential strategy for valid, efficient, and practical inference in the presence of model uncertainty, missing data, or complex confounding structures. Its ongoing methodological evolution ensures its relevance for both theoretical and applied statistical research (Díaz et al., 2017, Díaz, 2017, Chen et al., 2018, Liu et al., 2018).