Double Robustness in Semiparametric Methods

Updated 30 March 2026

Double robustness is a property ensuring that an estimator remains consistent if at least one of the two nuisance functions, such as an outcome regression or a propensity score, is correctly specified.
It underpins efficiency in semiparametric models by leveraging influence-function orthogonality and convexity conditions to mitigate bias from model misspecification.
Applications include causal inference, missing data analysis, survival analysis, and high-dimensional estimation, demonstrating its practical utility across econometrics and related fields.

Double robustness is a key structural property in modern semiparametric estimation, with applications in causal inference, missing data, high-dimensional settings, and econometrics. It underpins the robustness and efficiency of a broad class of estimators by ensuring consistency if at least one of two candidate nuisance functions is correctly specified or estimated. In many models it is intimately tied to influence-function orthogonality and to information-geometric properties of the statistical model.

1. Definition and Formalism

Let $(Y, X)$ be observed data, with $Y$ the outcome and $X$ a vector of covariates. The target parameter, denoted $\theta$, is typically a functional of the observed law, such as an average treatment effect or a mean under missing data. In the prototypical setup, there exist two nuisance functions: $\gamma_1$ (often an outcome regression) and $\gamma_2$ (often a propensity score or exposure mechanism).

Definition. An estimator (or, more precisely, an estimating function $\phi$) is doubly robust if, for any law $p$ and any (possibly incorrect) values $\gamma_1$, $\gamma_2$ of the nuisance functions,

$E_p[\phi(X; \theta(p), \gamma_1(p), \gamma_2)] = 0 \quad \text{and} \quad E_p[\phi(X; \theta(p), \gamma_1, \gamma_2(p))] = 0$

so that an estimator based on $\phi$ is consistent for $\theta(p)$ if either $\gamma_1$ or $\gamma_2$ is correctly specified, regardless of the other (Ying, 2024).

Canonical instances include the Augmented Inverse Probability Weighted (AIPW) estimator for treatment effects, calibration estimators for compliers in IV models, and local projection estimators in dynamic econometrics.
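The AIPW case can be illustrated with a small simulation. The following is a minimal sketch, not a reference implementation: the data-generating process, the function names (`simulate`, `aipw_mean`), and the deliberately wrong working models are all assumptions made for illustration, and the nuisance functions are supplied as known functions rather than fitted. The target is the treated-arm mean $E[Y(1)]$, which equals 1 in this simulated design.

```python
import math
import random

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def simulate(n, seed):
    """Draw (x, a, y): X ~ N(0, 1), P(A=1 | X) = expit(X), Y(1) = 1 + 2X + noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        a = 1 if rng.random() < expit(x) else 0
        # y stores Y(1); the estimator only uses it when a == 1.
        y = 1.0 + 2.0 * x + rng.gauss(0.0, 1.0)
        rows.append((x, a, y))
    return rows

def aipw_mean(rows, mu, pi):
    """AIPW estimate of E[Y(1)]: average of mu(x) + a * (y - mu(x)) / pi(x)."""
    return sum(mu(x) + a * (y - mu(x)) / pi(x) for x, a, y in rows) / len(rows)

# Correct and deliberately misspecified working models (illustrative only).
def mu_true(x):
    return 1.0 + 2.0 * x

def mu_wrong(x):
    return 0.0

def pi_true(x):
    return expit(x)

def pi_wrong(x):
    return 0.5

rows = simulate(200_000, seed=1)
results = {
    "both correct": aipw_mean(rows, mu_true, pi_true),
    "outcome model only": aipw_mean(rows, mu_true, pi_wrong),
    "propensity model only": aipw_mean(rows, mu_wrong, pi_true),
    "both wrong": aipw_mean(rows, mu_wrong, pi_wrong),
}
for label, value in results.items():
    print(f"{label:>22}: {value:.3f}  (truth: 1.000)")
```

In this design the first three estimates should land close to the truth, while the last one is visibly biased, which is the behaviour the definition describes.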

2. Semiparametric Theory and Influence-Function Perspective

In semiparametric models, double robustness often arises from influence-function orthogonality. A canonical result is that the influence function for a pathwise differentiable functional $\theta$ is orthogonal to the nuisance tangent space, ensuring insensitivity to infinitesimal perturbations of these nuisance components.

Under convexity of the relevant “contour sets” (sets of models with fixed target/nuisance values), the influence function is itself doubly robust “for free” (Ying, 2024). This means estimators constructed using the canonical influence function enjoy double robustness without further adjustment in a large class of models, such as partially linear regression, missing data under MAR (missing at random), and standard causal inference scenarios.

3. Asymptotic Theory and Rate Double Robustness

The classical result for estimators $\widehat\theta$ built with plug-in nuisance fits $\widehat\gamma_1$, $\widehat\gamma_2$ is that $\widehat\theta$ is asymptotically normal and root-$n$ consistent so long as the product of the $L_2$ estimation errors is $o_P(n^{-1/2})$ (Shu et al., 2018, Sandqvist, 2024). This "rate double robustness" means that the condition

$\|\widehat\gamma_1 - \gamma_1\| \cdot \|\widehat\gamma_2 - \gamma_2\| = o_P(n^{-1/2})$

makes the second-order remainder asymptotically negligible, so that the empirical process term dominates. For Z-estimation with orthogonal moment equations (Neyman orthogonality), the required rate on the nuisance functions can be relaxed: as shown by Lok (2024), if either nuisance estimator is consistent (error $o_P(1)$) and the other is estimated at rate $n^{-1/4}$, the plug-in sandwich variance estimator for $\widehat\theta$ is consistent and the limiting law of $\sqrt{n}(\widehat\theta - \theta)$ is unaffected by the nuisance uncertainty.
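For the AIPW estimator of a treated-arm mean, the role of the product of errors can be traced to a one-line bound. The following is a sketch under stated assumptions, not a general proof: write $\mu$ for the outcome regression and $\pi$ for the propensity score (so $\gamma_1 = \mu$, $\gamma_2 = \pi$), assume $\widehat\pi(X) \ge \epsilon > 0$, and treat the fits as fixed, for instance because they were obtained on an independent sample. The bias of the plug-in is then

$E\left[ (\widehat\mu(X) - \mu(X)) \left(1 - \frac{\pi(X)}{\widehat\pi(X)} \right) \right] = E\left[ (\widehat\mu(X) - \mu(X)) \, \frac{\widehat\pi(X) - \pi(X)}{\widehat\pi(X)} \right]$

and, by the Cauchy-Schwarz inequality,

$\left| E\left[ (\widehat\mu(X) - \mu(X)) \, \frac{\widehat\pi(X) - \pi(X)}{\widehat\pi(X)} \right] \right| \le \frac{1}{\epsilon} \, \|\widehat\mu - \mu\| \cdot \|\widehat\pi - \pi\|$

with $\|\cdot\|$ the $L_2$ norm. For example, if both fits converge at rate $n^{-1/3}$, the bound is of order $n^{-2/3} = o(n^{-1/2})$. If both converge at exactly $n^{-1/4}$, the bound is only $O(n^{-1/2})$, which is the borderline case.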

4. Classes of Double Robust Estimators

Point Estimation Examples

Model/Target	DR Estimator	Correctly specified model required (either one)
ATT/ATE (causal inference)	AIPW/TMLE	Propensity score or outcome model
Complier average characteristics	DR moment with $\kappa$	Weight or regression function
Location/scale under MAR (missing data)	AIPW + robustification	Propensity score or outcome regression
Survival with censoring	DRCUT pseudo-outcomes	Censoring or regression hazard
Local projections (IRF, time series)	Direct LP estimator	PL regression or "shock" model

Structural Property: For these, the estimator is consistent (and typically regular asymptotically linear) if either of the two models/nuisance estimators is correctly specified (or estimated sufficiently well), not necessarily both (Shu et al., 2018, Singh et al., 2019, Cantoni et al., 2018, Sandqvist, 2024, Olea et al., 2024).

Sequential Double Robustness (SDR)

In longitudinal settings (e.g., longitudinal G-computation), sequential double robustness (SDR) holds when consistency is guaranteed provided that, at each time point $t$ of a multistage process, either the outcome regression or the treatment model at time $t$ is correctly specified. This generalizes standard DR by allowing the correctly specified model to differ across time points (Luedtke et al., 2017).

5. Robustness, Limitations, and Fragility

Double Robustness vs. Double Fragility

While DR estimators offer protection under partial misspecification, they offer none when both nuisance models are incorrect: the estimator's error can then be magnified, with bias of the order of the product of the two nuisance errors. This phenomenon is termed “double fragility” (Testa et al., 26 Sep 2025). In the AIPW setting, writing $\mu$ for the outcome regression and $\pi$ for the propensity score, with hats denoting the working models, the bias term is

$\eta = E \left[ (\widehat\mu(X) - \mu(X)) \left(1 - \frac{\pi(X)}{\widehat\pi(X)} \right) \right]$

which can dominate the estimator's error if both working models are poorly fitted.
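The behaviour of $\eta$ can be checked numerically. The sketch below is illustrative only: the covariate law, the true functions `mu` and `pi`, the misspecified working models, and the helper name `product_bias` are all assumptions. It approximates $\eta$ by Monte Carlo and shows that the term vanishes when either working model equals the truth and is nonzero when both are wrong.

```python
import math
import random

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

def product_bias(mu_hat, pi_hat, mu, pi, n=100_000, seed=0):
    """Monte Carlo value of E[(mu_hat(X) - mu(X)) * (1 - pi(X) / pi_hat(X))], X ~ N(0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        total += (mu_hat(x) - mu(x)) * (1.0 - pi(x) / pi_hat(x))
    return total / n

# Assumed truth for this illustration.
def mu(x):
    return 1.0 + 2.0 * x

def pi(x):
    return expit(x)

# Deliberately misspecified working models.
def mu_wrong(x):
    return 0.0

def pi_wrong(x):
    return 0.5

eta_outcome_correct = product_bias(mu, pi_wrong, mu, pi)
eta_propensity_correct = product_bias(mu_wrong, pi, mu, pi)
eta_both_wrong = product_bias(mu_wrong, pi_wrong, mu, pi)

print("outcome model correct:   ", eta_outcome_correct)
print("propensity model correct:", eta_propensity_correct)
print("both wrong:              ", round(eta_both_wrong, 3))
```

Each summand is a product of two error factors, so a single correct model zeroes every term. This is why the bias is second order under partial misspecification, and potentially large under complete misspecification.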

Adaptive Correction Clipping (ACC) methods have been proposed to address this, ensuring point estimates cannot be worse than the worse of the individual outcome regression or IPW estimators, thereby achieving what the authors term "double safety" (Testa et al., 26 Sep 2025).

Variance Estimation

Classical variance estimators such as the influence-function (IF) plug-in estimator are not doubly robust: IF-based variance estimation is valid only if both working models are correct. Empirical sandwich estimators and nonparametric bootstrap methods, by instead leveraging the unbiasedness of stacked estimating equations, retain double robustness for variance estimation (Shook-Sa et al., 2024).
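A nonparametric bootstrap for a DR point estimate can be sketched as follows. This is a minimal illustration under assumed conditions, not the procedure of the cited paper: the simulated design with a binary covariate, the working models (a saturated propensity model, which is correct here, and a constant outcome model, which is deliberately wrong), and the names `fit_and_estimate` and `bootstrap_se` are all hypothetical. The key point is that both working models are refitted inside every resample, so their estimation uncertainty is carried into the standard error.

```python
import random
import statistics

def simulate(n, rng):
    """Binary covariate X, P(A=1 | X) in {0.3, 0.7}, Y(1) = 1 + 2X + noise; E[Y(1)] = 2."""
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        a = 1 if rng.random() < (0.7 if x else 0.3) else 0
        y = 1.0 + 2.0 * x + rng.gauss(0.0, 1.0)  # Y(1); only used when a == 1
        rows.append((x, a, y))
    return rows

def fit_and_estimate(rows):
    """Refit both working models on `rows` and return the AIPW estimate of E[Y(1)]."""
    # Propensity working model: treated fraction within each level of x.
    pi = {}
    for level in (0, 1):
        stratum = [a for x, a, _ in rows if x == level]
        pi[level] = sum(stratum) / len(stratum)
    # Outcome working model: one constant for everyone (misspecified on purpose).
    treated_y = [y for _, a, y in rows if a == 1]
    mu = sum(treated_y) / len(treated_y)
    return sum(mu + a * (y - mu) / pi[x] for x, a, y in rows) / len(rows)

def bootstrap_se(rows, n_boot, rng):
    """Standard deviation of the estimator over resamples drawn with replacement."""
    n = len(rows)
    estimates = []
    for _ in range(n_boot):
        resample = [rows[rng.randrange(n)] for _ in range(n)]
        estimates.append(fit_and_estimate(resample))
    return statistics.stdev(estimates)

rng = random.Random(0)
rows = simulate(2000, rng)
estimate = fit_and_estimate(rows)
se = bootstrap_se(rows, n_boot=200, rng=rng)
print(f"AIPW estimate: {estimate:.3f}, bootstrap SE: {se:.3f}")
```

The number of resamples (200) and the sample size are arbitrary choices for a quick run. In practice more resamples are typically used.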

6. Extensions and Practical Examples

High-dimensional and Machine Learning Nuisance Estimation

Sample splitting and cross-fitting enable the use of complex machine learning models for nuisance components while preserving DR properties, provided the product-rate conditions are met (Sandqvist, 2024, Shu et al., 2018, Liu et al., 2023). Even in high-dimensional settings, plug-in double machine learning (DML) estimators retain root-$n$ inference under appropriate sparsity and rate assumptions.
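The mechanics of cross-fitting can be sketched in a few lines. The example below is a simplified illustration, not a production DML implementation: a histogram (binned-mean) regression stands in for a flexible machine learning fit, and the simulated design, the fold assignment by index, the clipping level, and the names `fit_binned_mean` and `cross_fit_aipw` are all assumptions. Each observation's nuisance predictions come from models fitted on the other folds only.

```python
import math
import random

def simulate(n, seed):
    """X ~ U[0, 1), P(A=1 | X) = 0.2 + 0.6 X, Y(1) = sin(3X) + noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random()
        a = 1 if rng.random() < 0.2 + 0.6 * x else 0
        y = math.sin(3.0 * x) + rng.gauss(0.0, 0.5)  # Y(1); only used when a == 1
        rows.append((x, a, y))
    return rows

def fit_binned_mean(xs, ys, n_bins=10):
    """Histogram regression on [0, 1): predict the mean of ys within each bin of x."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, y in zip(xs, ys):
        b = min(int(x * n_bins), n_bins - 1)
        sums[b] += y
        counts[b] += 1
    overall = sum(ys) / len(ys)  # fallback for empty bins
    means = [s / c if c > 0 else overall for s, c in zip(sums, counts)]
    return lambda x: means[min(int(x * n_bins), n_bins - 1)]

def cross_fit_aipw(rows, n_folds=5, clip=0.05):
    """Cross-fitted AIPW estimate of E[Y(1)]."""
    total = 0.0
    for k in range(n_folds):
        train = [r for i, r in enumerate(rows) if i % n_folds != k]
        held_out = [r for i, r in enumerate(rows) if i % n_folds == k]
        # Nuisance fits use the training folds only.
        mu_hat = fit_binned_mean([x for x, a, _ in train if a == 1],
                                 [y for _, a, y in train if a == 1])
        pi_hat = fit_binned_mean([x for x, _, _ in train],
                                 [float(a) for _, a, _ in train])
        # Evaluate the AIPW score on the held-out fold.
        for x, a, y in held_out:
            p = min(max(pi_hat(x), clip), 1.0 - clip)  # keep weights bounded
            total += mu_hat(x) + a * (y - mu_hat(x)) / p
    return total / len(rows)

rows = simulate(20_000, seed=0)
estimate = cross_fit_aipw(rows)
truth = (1.0 - math.cos(3.0)) / 3.0  # E[sin(3X)] for X ~ U[0, 1)
print(f"cross-fitted AIPW: {estimate:.3f}, truth: {truth:.3f}")
```

Separating the data used for fitting from the data used for evaluation removes the own-observation overfitting bias, which is what allows flexible learners to be used without empirical-process (Donsker-type) restrictions.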

External Controls and Attaching Data Sources

Naive incorporation of external control samples into doubly robust ATT estimators can paradoxically degrade efficiency under single-model misspecification. A “double-safe” estimator optimally combines the standard (trial-only) and external-control estimators to ensure no efficiency loss relative to the best available approach in each scenario, while preserving DR (Dai et al., 24 Sep 2025).

Model Geometry and Parameterization

Information geometry and semiparametric theory provide necessary and sufficient conditions for the existence of DR model structures (Ying, 2024). Variation-independence and convexity (or m-flatness) of the “contour sets” are central: in models where contours are not convex (e.g., certain odds ratio models under canonical parameterization), true DR estimators do not exist.

7. Applications and Illustrations

Representative applications illustrating DR include:

Causal inference for ATE/ATT in the presence of confounding (Shu et al., 2018, Dai et al., 24 Sep 2025).
Inference for complier populations in instrumental-variable analysis (Singh et al., 2019).
Average outcome/location/scale estimation with missing data under MAR (Cantoni et al., 2018).
Rate double robustness of DML and flexible ML-driven approaches (Sandqvist, 2024, Liu et al., 2023).
DR inference for conditional means under coarsening at random censoring (Sandqvist, 2024).
Double-robust hypothesis testing in over-identified GMM (Kleibergen et al., 2021).
LP-based inference in time-series settings with unmodeled serial correlation (Olea et al., 2024).

References

"A Geometric Perspective on Double Robustness by Semiparametric Theory and Information Geometry" (Ying, 2024)
"Improved Estimation of Average Treatment Effects on the Treated: Local Efficiency, Double Robustness, and Beyond" (Shu et al., 2018)
"Double Robustness of Local Projections and Some Unpleasant VARithmetic" (Olea et al., 2024)
"Sequential Double Robustness in Right-Censored Longitudinal Models" (Luedtke et al., 2017)
"Doubly robust inference with censoring unbiased transformations" (Sandqvist, 2024)
"Double Robust Variance Estimation with Parametric Working Models" (Shook-Sa et al., 2024)
"Double Robustness for Complier Parameters and a Semiparametric Test for Complier Characteristics" (Singh et al., 2019)
"Robust semiparametric inference with missing data" (Cantoni et al., 2018)
"Rescuing double robustness: safe estimation under complete misspecification" (Testa et al., 26 Sep 2025)
"Incorporating External Controls for Estimating the Average Treatment Effect on the Treated with High-Dimensional Data: Retaining Double Robustness and Ensuring Double Safety" (Dai et al., 24 Sep 2025)
"Demystified: double robustness with nuisance parameters estimated at rate n-to-the-1/4" (Lok, 2024)
"Double robust inference for continuous updating GMM" (Kleibergen et al., 2021)