DML-CMR Estimator for Moment Restrictions

Updated 30 June 2025

The DML-CMR estimator is a bias-correcting double machine learning method for conditional moment restriction problems.
It applies Neyman orthogonality and cross-fitting to control first-stage estimation errors in high-dimensional settings.
It attains the minimax-optimal $O(N^{-1/2})$ rate for parametric models, supports robust inference, and outperforms existing IV and proximal causal learning baselines in the reported experiments.

The DML-CMR estimator is a methodology within the double/debiased machine learning (DML) framework designed to estimate solutions to conditional moment restriction (CMR) problems. These problems are central in statistics, econometrics, and causal inference, as they include instrumental variable (IV) regression, proximal causal learning, and various high-dimensional inference settings. DML-CMR seeks to provide an unbiased, minimax-optimal, and computationally efficient estimator in scenarios where both the outcome and the function of interest (often high-dimensional or nonparametric) require regularized or machine learning-based estimators.

1. Problem Setting and Definition

CMRs specify that, given random variables $X$, $C$, and $Y$, there exists a function $f_0$ such that:

$\mathbb{E}[Y - f_0(X) \mid C] = 0$

This encompasses a wide class of problems, including IV regression (where $C$ is the instrument), proximal causal learning, and general causal function estimation where nuisance dependence complicates direct estimation of $f_0$. Standard two-stage plug-in approaches, especially with machine learning estimators (e.g., deep neural networks), can lead to considerable bias due to regularization and overfitting.
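As a minimal illustration of the restriction in the IV case, the following sketch simulates a confounded data-generating process and checks two things: the residual $Y - f_0(X)$ averages to roughly zero within bins of $C$, and a direct regression of $Y$ on $X$ does not recover $f_0$. The data-generating process, the variable names, and the choice $f_0(x) = 2x$ are assumptions made for this example only. The bias shown here comes from confounding, not from regularization.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
c = rng.normal(size=n)                      # instrument C
u = rng.normal(size=n)                      # unobserved confounder
x = c + u                                   # treatment depends on C and on U
y = 2.0 * x + 2.0 * u + rng.normal(size=n)  # outcome, with f0(x) = 2x

# Residual Y - f0(X), averaged within decile bins of C.
resid = y - 2.0 * x
edges = np.quantile(c, np.linspace(0.0, 1.0, 11))
bin_id = np.clip(np.searchsorted(edges, c, side="right") - 1, 0, 9)
bin_means = np.array([resid[bin_id == b].mean() for b in range(10)])

# Slope of a direct least-squares regression of Y on X.
ols_slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

print("max |E[Y - f0(X) | C-bin]|:", np.abs(bin_means).max())
print("OLS slope:", ols_slope)
```

In this simulated setting the population OLS slope is $2 + 2\,\mathrm{Cov}(X, U)/\mathrm{Var}(X) = 3$ rather than the true value 2, while the binned residual means stay close to zero because $U$ and the outcome noise are independent of $C$.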

The DML-CMR estimator is explicitly constructed to correct such bias, leveraging the principle of Neyman orthogonality and cross-fitting for honest sample splitting.

2. Methodology and Learning Objective

Neyman Orthogonal Score Construction

A principal challenge in two-stage approaches to CMR is that naive plug-in estimators are not Neyman orthogonal: errors in the first-stage nuisance estimates enter the bias of the $f_0$ estimate at first order. To mitigate this, DML-CMR introduces a Neyman orthogonal score for CMRs:

$\psi(D; f, (s, g)) = (s(c) - g(f, c))^2$

where $D$ denotes an observation of $(X, C, Y)$ and:

$s(c) = \mathbb{E}[Y \mid c]$ (nuisance regression)
$g(f, c) = \mathbb{E}[f(X) \mid c]$ (lifted model prediction)

Neyman orthogonality ensures that small errors in the nuisance estimators $s, g$ impact $f_0$ estimation only at second order.
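A short sketch of why this score is insensitive to nuisance errors at the true function follows. The perturbation directions $\delta_s$, $\delta_g$ and the scalar $r$ are notation introduced here for illustration, and $s_0$, $g_0$ denote the true nuisances. The restriction $\mathbb{E}[Y - f_0(X) \mid C] = 0$ implies $s_0(c) = g_0(f_0, c)$, so perturbing the nuisances around their true values gives:

$\psi(D; f_0, (s_0 + r\delta_s,\, g_0 + r\delta_g)) = \big(s_0(c) - g_0(f_0, c) + r[\delta_s(c) - \delta_g(f_0, c)]\big)^2 = r^2 \big(\delta_s(c) - \delta_g(f_0, c)\big)^2$

$\frac{\partial}{\partial r}\, \mathbb{E}\big[\psi(D; f_0, (s_0 + r\delta_s,\, g_0 + r\delta_g))\big]\Big|_{r=0} = 0$

The term linear in $r$ vanishes, leaving only the squared perturbation, which is the second-order behavior described above.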

Cross-Fitting Algorithm

DML-CMR employs a cross-fitting routine for bias control:

Partition the data into $K$ folds $I_1, \dots, I_K$.
For each fold $k$:
- Train nuisance estimators $\widehat{s}_k$ and $\widehat{g}_k$ on all data except the $k$-th fold.
Minimize the empirical Neyman orthogonal loss over $\theta$, evaluating each fold $I_k$ with the estimators $\widehat{s}_k$ and $\widehat{g}_k$ that were trained without it:

$\frac{1}{K} \sum_{k=1}^{K} \mathbb{E}_{c \in I_k} \left[ (\widehat{s}_k(c) - \widehat{g}_k(f_\theta, c))^2 \right]$

where $f_\theta$ is a parameterized model for $f$ and $\mathbb{E}_{c \in I_k}$ denotes the empirical average over the samples in fold $I_k$.

Gradient-based methods are typically used for updating $\theta$, and minibatch optimization can be applied for efficiency with large datasets.
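The cross-fitting routine can be sketched in a deliberately simplified setting. The example below is not the released implementation: it assumes a linear model $f_\theta(x) = \theta_0 + \theta_1 x$, a scalar $C$, and polynomial ridge regression for both nuisances, and all function names (`poly_features`, `ridge_fit`, `dml_cmr_linear`) are hypothetical. Because $f_\theta$ is linear in $\theta$, $\widehat{g}_k(f_\theta, c)$ reduces to $\theta$ times an out-of-fold estimate of $\mathbb{E}[(1, X) \mid c]$, so the final-stage loss is solved in closed form by least squares instead of the gradient-based updates mentioned above.

```python
import numpy as np

def poly_features(c, degree=3):
    """Polynomial basis [1, c, c^2, ...] for a 1-D conditioning variable."""
    return np.vander(c, degree + 1, increasing=True)

def ridge_fit(A, B, lam=1e-3):
    """Weights W minimizing ||A W - B||^2 + lam ||W||^2."""
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ B)

def dml_cmr_linear(x, c, y, n_folds=5, seed=0):
    """Cross-fitted estimate of theta in f_theta(x) = theta[0] + theta[1] * x."""
    n = len(y)
    phi = np.column_stack([np.ones(n), x])   # features of f_theta
    folds = np.array_split(np.random.default_rng(seed).permutation(n), n_folds)
    s_hat = np.empty(n)                      # out-of-fold estimate of E[Y | c]
    m_hat = np.empty_like(phi)               # out-of-fold estimate of E[phi(X) | c]
    for idx in folds:
        train = np.setdiff1d(np.arange(n), idx)
        A_train, A_eval = poly_features(c[train]), poly_features(c[idx])
        # Nuisances are fitted without fold idx, then evaluated on fold idx.
        s_hat[idx] = A_eval @ ridge_fit(A_train, y[train])
        m_hat[idx] = A_eval @ ridge_fit(A_train, phi[train])
    # g_hat(f_theta, c) = m_hat(c) @ theta, so minimizing the mean of
    # (s_hat - g_hat)^2 over theta is an ordinary least-squares problem.
    theta, *_ = np.linalg.lstsq(m_hat, s_hat, rcond=None)
    return theta

# Simulated IV data (an assumption for this example): true f0(x) = 1 + 2x.
rng = np.random.default_rng(1)
n = 4000
c = rng.normal(size=n)
u = rng.normal(size=n)                       # unobserved confounder
x = c + u + 0.5 * rng.normal(size=n)
y = 1.0 + 2.0 * x + 2.0 * u + 0.5 * rng.normal(size=n)

theta_hat = dml_cmr_linear(x, c, y)
print("estimated (intercept, slope):", theta_hat)
```

With a neural network for $f_\theta$, the closed-form solve would be replaced by minibatch gradient steps on the same out-of-fold loss, and $\widehat{g}_k$ would need to be a learned estimator that can be evaluated for any $f_\theta$.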

3. Theoretical Properties and Convergence Rates

DML-CMR achieves statistical inference and estimation properties under mild and transparent assumptions:

Minimax-optimal convergence: $O(N^{-1/2})$ for parametric $f$.
Asymptotic normality: For parameterized $f_\theta$ ,

$\sqrt{N}(\widehat{\theta} - \theta_0) \xrightarrow{d} \mathcal{N}(0, \sigma^2)$

Robustness to bias: By construction, only second-order nuisance estimation errors appear in bias, conferring resilience to regularization and overfitting effects.
Identifiability and regularity: Assumptions include parameterizability of $f_0$, boundedness and realizability of the true functions and nuisances, Neyman orthogonality, and a non-singular Jacobian of the score.

Notably, the DML-CMR framework accommodates high-dimensional or deep nuisance learners, provided these estimators achieve convergence rates of $o(N^{-1/4})$ for the nuisance parameters.
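The $o(N^{-1/4})$ requirement can be read through the usual double machine learning heuristic, sketched here rather than taken from the paper's proof. The shorthand $\widehat{\eta} = (\widehat{s}, \widehat{g})$ and $\eta_0 = (s_0, g_0)$ is introduced for this note. Because the score is orthogonal, nuisance error reaches the estimate of $\theta$ only through second-order terms, and

$\|\widehat{\eta} - \eta_0\| = o(N^{-1/4}) \;\Rightarrow\; \|\widehat{\eta} - \eta_0\|^2 = o(N^{-1/2})$

so $\sqrt{N}\,\|\widehat{\eta} - \eta_0\|^2 \to 0$. The nuisance contribution is then negligible relative to the $N^{-1/2}$ sampling fluctuation that drives the normal limit.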

4. Bias Reduction Techniques and the Role of Deep Learning

Double machine learning estimators can be particularly susceptible to bias when deep neural networks (DNNs) or other flexible learners are used in both stages. DML-CMR addresses this with two main design elements:

Score function orthogonality: Ensures robustness to small errors in the estimation of nuisance functions.
Sample splitting (cross-fitting): Guarantees out-of-fold independence between estimates of the nuisance functions and the final-stage fit, preventing overfitting bias.

When applied with DNNs for the nuisance estimators and final estimator, DML-CMR achieves fast convergence and low bias provided DNN approximation errors shrink appropriately with data.
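The effect of sample splitting can be seen in a small simulation. The sketch below is an illustration under assumed settings, not part of the DML-CMR code: the outcome is pure noise, so the true regression function is zero, and an over-parameterized least-squares fit stands in for a flexible learner. It compares how strongly in-sample and out-of-fold predictions correlate with the noise they were meant to smooth away.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, n_folds = 2000, 400, 5
C = rng.normal(size=(n, p))     # high-dimensional conditioning variable
y = rng.normal(size=n)          # pure noise, so the true E[Y | C] is 0

def ols_predict(C_train, y_train, C_eval):
    """Fit ordinary least squares on the training part and predict on C_eval."""
    beta, *_ = np.linalg.lstsq(C_train, y_train, rcond=None)
    return C_eval @ beta

# In-sample: the learner is evaluated on the points it was fitted to.
pred_in = ols_predict(C, y, C)

# Out-of-fold: each point is predicted by a model that never saw it.
pred_oof = np.empty(n)
for idx in np.array_split(rng.permutation(n), n_folds):
    train = np.setdiff1d(np.arange(n), idx)
    pred_oof[idx] = ols_predict(C[train], y[train], C[idx])

# Average product of prediction and outcome noise; zero means no leakage.
leak_in = np.mean(pred_in * y)
leak_oof = np.mean(pred_oof * y)
print(f"in-sample: {leak_in:.3f}, out-of-fold: {leak_oof:.3f}")
```

For least squares with unit-variance noise, the in-sample quantity has expectation $p/n = 0.2$ here, while the out-of-fold quantity has expectation zero because each prediction is independent of the noise in the point it predicts. This dependence between fitted nuisances and the data they are evaluated on is what cross-fitting removes.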

5. Empirical Performance and Application Domains

DML-CMR has been empirically evaluated across a range of problem domains:

Instrumental Variable (IV) Regression: Real and synthetic datasets (Ticket Demand, MNIST variants, IHDP, PM-CMR).
Proximal Causal Learning (PCL): High-dimensional vision settings (dSprites).
High-dimensional settings: DML-CMR demonstrates substantial improvement in resilience to bias and variance, outperforming methods such as DeepIV, DeepGMM, KIV, DFIV, CEVAE, NMMR, and PKDR on mean squared error and predictive accuracy.

A summary of the reported results:

Setting	DML-CMR Outcome	Outperforms
IV regression	Lower MSE, robust inference	DeepIV, KIV, DFIV, DeepGMM
PCL	Equal or better performance	NMMR, CEVAE, PKDR, DFPV
High-dim	Lower bias, variance	All examined baselines

These results suggest practical state-of-the-art performance in scenarios with weak instruments, complex nonlinearities, or high feature dimension.

6. Implementation Considerations and Resources

DML-CMR is designed to be compatible with a wide range of machine learning frameworks and can use deep learning libraries for both nuisance estimation and the final parameterization. The code is available at https://github.com/shaodaqian/DML-CMR. An implementation needs to fit regression models (potentially DNN-based), optimize a squared loss on out-of-fold predictions, and manage cross-fitting.

Computational requirements depend on model complexity and dataset size, but the method relies only on standard routines for model fitting, cross-fitting, and minibatching. In lower-dimensional settings, a non-cross-fitted variant (CE-DML-CMR) may achieve similar performance with faster computation, though potential bias is not controlled as rigorously.

7. Impact and Significance

The DML-CMR estimator advances statistical estimation under conditional moment restrictions, especially where modern ML techniques are required. Its bias-reducing properties, theoretically optimal rates, and empirical performance across challenging datasets make it a valuable tool for causal inference problems, extending the robust estimation guarantees of double machine learning beyond traditional moment conditions into nonlinear, high-dimensional, and deep learning domains.
