
Riesz Regression Methods

Updated 28 July 2025
  • Riesz regression is a family of methods, grounded in the Riesz representation theorem, that enable efficient estimation of linear functionals in causal models.
  • It employs data-driven loss minimization with machine learning algorithms like neural networks, random forests, and boosted trees to construct optimal weighting functions.
  • The approach achieves double robustness and Neyman orthogonality, ensuring asymptotically valid inference even in high-dimensional or nonparametric settings.

Riesz regression is a family of statistical and machine learning methods grounded in the Riesz representation theorem, providing a unified and theoretically rigorous framework for the estimation of linear functionals of regression functions—especially in causal inference and semiparametric efficiency theory (Chernozhukov et al., 2018, Chernozhukov et al., 2021, Chernozhukov et al., 2021, Lee et al., 8 Jan 2025, Williams et al., 25 Jul 2025). In modern applications, Riesz regression enables the construction of estimators for causal and policy effects by replacing explicit analytical forms for weighting functions (Riesz representers) with data-adaptive, machine-learned alternatives. This strategy delivers valid inference and efficiency even in high-dimensional or nonparametric settings.

1. Mathematical Principles and the Riesz Representation Theorem

The cornerstone of Riesz regression is the Riesz representation theorem, which states that for any bounded linear functional m(\cdot) on a Hilbert space (e.g., L^2), there exists a unique element, the Riesz representer \alpha_0, such that

\mathbb{E}[m(X; Q)] = \mathbb{E}[\alpha_0(X)\, Q(X)]

for all Q in the space (Williams et al., 25 Jul 2025, Chernozhukov et al., 2021). This dual representation allows the reformulation of statistical estimands, such as the average treatment effect (ATE), into reweighting forms familiar from inverse probability weighting (IPW) but justified as unique solutions in the dual space.
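As a concrete instance (a standard calculation in this literature; here X denotes covariates and A a binary treatment), take the ATE functional m(W; Q) = Q(1, X) - Q(0, X) and the candidate representer

\alpha_0(A, X) = \frac{1(A=1)}{f(A=1 \mid X)} - \frac{1(A=0)}{f(A=0 \mid X)}.

Iterated expectations confirm the Riesz identity:

\mathbb{E}\left[\alpha_0(A, X)\, Q(A, X)\right] = \mathbb{E}\!\left[\mathbb{E}\left[\alpha_0(A, X)\, Q(A, X) \mid X\right]\right] = \mathbb{E}\left[Q(1, X) - Q(0, X)\right] = \mathbb{E}\left[m(W; Q)\right].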

Key consequences:

  • The Riesz representer encodes the optimal weighting for the functional of interest, grounding estimators such as the Horvitz–Thompson estimator and efficient influence functions (EIFs) in a unified theory (Williams et al., 25 Jul 2025).
  • For complex or nested estimands (e.g., mediation effects), iterative or recursive application yields a hierarchy of Riesz representers, allowing for systematic EIF construction (Williams et al., 25 Jul 2025).

2. Core Methodologies for Riesz Regression

The statistical implementation of Riesz regression consists of data-driven estimation of the Riesz representer using loss-minimization formulations directly linked to the Riesz theorem. The generic loss function is

\alpha_0 = \arg\min_\alpha \mathbb{E}\left[\alpha(X)^2 - 2\, m(W; \alpha)\right]

This objective is justified by the Riesz identity itself: since \mathbb{E}[m(W; \alpha)] = \mathbb{E}[\alpha_0(X)\, \alpha(X)], the loss equals \mathbb{E}[(\alpha(X) - \alpha_0(X))^2] up to an additive constant, so its minimizer is the representer. Alternatively, in more elaborate settings, nested or weighted losses encode dependencies on previously estimated nuisance parameters (Chernozhukov et al., 2021, Lee et al., 8 Jan 2025, Williams et al., 25 Jul 2025).
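A minimal numerical sketch of this direct loss minimization, assuming the ATE functional and a hand-rolled linear basis (all function names here are illustrative, not from any specific package):

```python
# Riesz regression over a linear function class for the ATE functional
# m(W; alpha) = alpha(1, X) - alpha(0, X); the empirical Riesz loss
# E[alpha^2] - 2 E[m(W; alpha)] then has a closed-form minimizer.
import numpy as np

def basis(a, x):
    """Illustrative basis phi(a, x): intercept, treatment, covariates,
    and treatment-covariate interactions."""
    a = a.reshape(-1, 1)
    return np.hstack([np.ones_like(a), a, x, a * x])

def fit_linear_riesz(a, x):
    """Minimize E[alpha^2] - 2 E[alpha(1,X) - alpha(0,X)] over
    alpha(a, x) = phi(a, x) @ rho; the minimizer is
    rho = E[phi phi']^{-1} E[phi(1,X) - phi(0,X)]."""
    phi = basis(a, x)
    G = phi.T @ phi / len(a)  # empirical Gram matrix E[phi phi']
    M = (basis(np.ones_like(a), x) - basis(np.zeros_like(a), x)).mean(axis=0)
    rho = np.linalg.solve(G + 1e-8 * np.eye(G.shape[0]), M)  # small ridge for stability
    return lambda a_new, x_new: basis(a_new, x_new) @ rho

# Usage on synthetic data: the learned alpha approximates the
# inverse-propensity representer within the linear class, without
# ever estimating a propensity score.
rng = np.random.default_rng(0)
x = rng.normal(size=(5000, 2))
a = rng.binomial(1, 1 / (1 + np.exp(-x[:, 0])), size=5000).astype(float)
alpha_hat = fit_linear_riesz(a, x)
```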

Modern Riesz regression workflows deploy:

  • Direct minimization algorithms for the loss, which may use neural networks, random forests, or gradient boosting as flexible function classes (Chernozhukov et al., 2021, Chernozhukov et al., 2021, Lee et al., 8 Jan 2025).
  • Automatic debiasing: In debiased machine learning, the Riesz representer is used within "double machine learning" or "targeted minimum loss-based estimation" to correct plug-in bias via augmented estimating equations:

\psi(W, \gamma, \alpha, \theta) = m(W, \gamma) - \theta + \alpha(X)\,[Y - \gamma(X)]

This yields estimators that are robust to regularization and model selection bias (Chernozhukov et al., 2021, Chernozhukov et al., 2021).
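As a concrete sketch, the augmented score above can be combined with cross-fitting in the usual debiased-ML way (fit_regression and fit_riesz are placeholder learner factories assumed for this sketch, e.g. the linear Riesz fit above; they are not any library's API):

```python
# Cross-fitted debiased ATE: average of the orthogonal score
# m(W, gamma_hat) + alpha_hat(A, X) * (Y - gamma_hat(A, X)),
# with nuisances never evaluated on their own training fold.
import numpy as np
from sklearn.model_selection import KFold

def debiased_ate(y, a, x, fit_regression, fit_riesz, n_folds=5):
    scores = np.zeros_like(y, dtype=float)
    for train, test in KFold(n_folds, shuffle=True, random_state=0).split(x):
        gamma = fit_regression(y[train], a[train], x[train])  # gamma_hat(a, x) ~ E[Y | A, X]
        alpha = fit_riesz(a[train], x[train])                 # alpha_hat(a, x), e.g. via Riesz loss
        plug_in = gamma(np.ones_like(a[test]), x[test]) - gamma(np.zeros_like(a[test]), x[test])
        resid = y[test] - gamma(a[test], x[test])
        scores[test] = plug_in + alpha(a[test], x[test]) * resid
    theta_hat = scores.mean()
    se = scores.std(ddof=1) / np.sqrt(len(y))  # standard error from the orthogonal score
    return theta_hat, se
```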

3. Machine Learning Implementations

Recent work has equipped Riesz regression with scalable and adaptive machine learning estimators:

| Method | Function class | Objective/Loss |
|---|---|---|
| RieszNet | Neural nets | Multitask learning of regression + Riesz head, joint loss (Chernozhukov et al., 2021) |
| ForestRiesz | Random forest | Locally linear Riesz representer, solved via GRF (Chernozhukov et al., 2021) |
| RieszBoost | Boosted trees | Gradient boosting on Riesz loss, data augmentation for pseudo-outcomes (Lee et al., 8 Jan 2025) |

These methods automatically construct Riesz representers even for high-dimensional or complex functional forms, bypassing analytical derivations. Empirically, RieszNet and ForestRiesz have achieved competitive or superior mean absolute errors and valid coverage compared to existing approaches for problems such as ATE and average derivative estimation (Chernozhukov et al., 2021, Lee et al., 8 Jan 2025).
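For illustration, a stripped-down multitask sketch in the spirit of RieszNet; the architecture, layer sizes, and loss weighting here are this sketch's assumptions, not the paper's exact specification:

```python
# Shared torso with two heads: a regression head gamma(a, x) and a
# Riesz head alpha(a, x), trained on MSE plus the Riesz loss
# E[alpha^2 - 2 m(W; alpha)] for the ATE functional.
import torch
import torch.nn as nn

class RieszNetSketch(nn.Module):
    def __init__(self, d_x, d_hidden=64):
        super().__init__()
        self.torso = nn.Sequential(nn.Linear(d_x + 1, d_hidden), nn.ReLU())
        self.reg_head = nn.Linear(d_hidden, 1)    # gamma head
        self.riesz_head = nn.Linear(d_hidden, 1)  # alpha head

    def forward(self, a, x):
        # a: (n, 1) treatment, x: (n, d_x) covariates
        h = self.torso(torch.cat([a, x], dim=1))
        return self.reg_head(h).squeeze(-1), self.riesz_head(h).squeeze(-1)

def joint_loss(model, y, a, x, riesz_weight=1.0):
    gamma, alpha = model(a, x)
    _, alpha1 = model(torch.ones_like(a), x)
    _, alpha0 = model(torch.zeros_like(a), x)
    mse = ((y - gamma) ** 2).mean()
    # Empirical Riesz loss for m(W; alpha) = alpha(1, X) - alpha(0, X).
    riesz = (alpha ** 2).mean() - 2 * (alpha1 - alpha0).mean()
    return mse + riesz_weight * riesz
```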

4. Applications in Causal Inference and Semiparametrics

Riesz regression underpins modern efficient estimators across a variety of causal and policy estimation problems:

  • Average Treatment Effect (ATE):

\theta = \mathbb{E}\left[ \frac{1(A=1)}{f(A=1 \mid W)}\, Y - \frac{1(A=0)}{f(A=0 \mid W)}\, Y \right]

Here, \alpha(A, W) = \frac{1(A=1)}{f(A=1 \mid W)} - \frac{1(A=0)}{f(A=0 \mid W)} is the Riesz representer (Williams et al., 25 Jul 2025, Chernozhukov et al., 2021); a propensity-based sketch of this representer appears after this list.

  • Policy and Distributional Effects: Transporting or shifting covariate distributions leads to more complex Riesz representers, where Riesz regression avoids analytical intractability (Chernozhukov et al., 2018).
  • Mediation Analysis: Estimation of natural direct effects (NDE) and related mediation functionals is achieved via sequential Riesz regressions through recursively defined losses involving previous representers (Williams et al., 25 Jul 2025).
  • Generalized Linear and Nonlinear Settings: Riesz regression extends to generalized regression settings with weighted losses incorporating derivatives or Gateaux differentials (Chernozhukov et al., 2021).
  • Density Estimation: Bona fide Riesz projections enforce convex constraints for nonnegativity and normalization in density estimation, generalizing the Riesz regression philosophy to nonparametric pdf estimation (Pla et al., 2022).
  • Hilbertian Linear Models and Infinite Dimensions: In functional linear models, principal component-based estimators in Hilbertian frameworks share the projection operator spirit of Riesz regression, extending applicability to functional and infinite-dimensional data (Hörmann et al., 2012).
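The sketch below shows the analytical, propensity-based form of the ATE representer referenced above; Riesz regression replaces exactly this plug-in construction with direct loss minimization when f(A|W) is hard to model (propensity_model here is an assumed fitted classifier, not part of any cited method):

```python
# Analytical (propensity-based) ATE representer, for contrast with a
# learned Riesz representer.
import numpy as np

def ipw_representer(a, x, propensity_model, clip=1e-3):
    """alpha(A, W) = 1(A=1)/f(1|W) - 1(A=0)/f(0|W), with clipping to
    guard against extreme propensities (overlap violations)."""
    p = np.clip(propensity_model.predict_proba(x)[:, 1], clip, 1 - clip)
    return a / p - (1 - a) / (1 - p)
```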

5. Theoretical Properties and Inference Guarantees

Riesz regression confers several theoretically desirable properties:

  • Double (Sparsity) Robustness: Estimators remain asymptotically valid if either the regression or the representer is estimated at sufficient rate, accommodating high-dimensional or dense nuisance parameters (Chernozhukov et al., 2018, Chernozhukov et al., 2021).
  • Neyman Orthogonality: The estimating equations used are invariant to first-order errors in nuisance estimates, conferring robustness to regularization bias (Chernozhukov et al., 2018, Chernozhukov et al., 2021).
  • Uniform Finite-Sample Error Bounds: High-probability mean-square error bounds are derived for Riesz regression under complexity constraints of the chosen function class (e.g., covering/critical radii for neural nets or random forests) (Chernozhukov et al., 2021, Chernozhukov et al., 2020).
  • Asymptotic Normality: When the product of the convergence rates of the regression and Riesz representer estimators is o(n^{-1/2}), the resulting debiased estimator is asymptotically efficient (Chernozhukov et al., 2021, Chernozhukov et al., 2021); see the schematic expansion after this list.
  • End-to-End Automatability: Riesz regression algorithms enable users to specify parameters and learners while fully automating the debiasing, supporting a modern "Auto-DML" paradigm (Chernozhukov et al., 2021, Lee et al., 8 Jan 2025).
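The rate condition in the asymptotic normality bullet comes from the standard orthogonal-score expansion, stated here schematically as in the cited debiased-ML literature:

\sqrt{n}\,(\hat{\theta} - \theta_0) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \psi(W_i, \gamma_0, \alpha_0, \theta_0) + \sqrt{n}\; O_P\!\left( \|\hat{\gamma} - \gamma_0\|_{2}\, \|\hat{\alpha} - \alpha_0\|_{2} \right) + o_P(1),

so the remainder vanishes, and the estimator is asymptotically linear in the efficient score, exactly when the rate product is o(n^{-1/2}).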

6. Practical Advantages and Limitations

The Riesz regression approach is notable for:

  • Bypassing explicit analytical derivation of Riesz representers in favor of data-driven loss minimization.
  • Double robustness and Neyman orthogonality, which together support valid inference after flexible machine learning.
  • Compatibility with off-the-shelf learner classes (neural networks, random forests, boosted trees) and end-to-end automation of debiasing.

Potential limitations include challenges in extrapolating the method beyond L² settings, tailoring the loss for nonlinear or composite functionals where the representer is not unique or linear, and computational cost when using highly flexible learner classes. Further research explores extensions of Riesz regression to more general structures (e.g., non-doubling geometry, dynamic environments) and comparative evaluation under practical constraints (He, 19 Mar 2025).

7. Broader Theoretical and Structural Connections

Riesz regression methods closely relate to several mathematical themes:

  • Ordered Vector Spaces and Pre-Riesz Spaces: The order-approximation properties and pervasiveness criteria for pre-Riesz spaces determine whether Riesz regression schemes can be defined intrinsically, without full vector lattice closures (Kalauch et al., 2018).
  • Harmonic and Singular Integral Analysis: Dimension-free Riesz transform estimates and their dyadic analogues underpin regression methods where directional derivatives or edge detection are critical; Riesz regression frameworks benefit from these stability results, especially in high-dimensional or irregular geometries (Kucharski et al., 2021, Kucharski et al., 2023, Domelevo et al., 2023, Domelevo et al., 2023, He, 19 Mar 2025).
  • Series Representations and Integral Transforms: Techniques inspired by the Riesz function and its acceleration methods (Mellin transform, Kummer acceleration, etc.) have potential implications for advanced regression frameworks seeking analytic regularity or error control (Smith, 2012).

In summary, Riesz regression is a principled, machine learning–adaptable methodology for efficient and robust estimation of causal and semiparametric functionals. It translates the abstract mathematical assurance of the Riesz representation theorem into practical, scalable, and theoretically justified algorithms, profoundly impacting modern statistical inference, especially in high-dimensional and complex settings (Williams et al., 25 Jul 2025, Chernozhukov et al., 2021, Chernozhukov et al., 2021, Lee et al., 8 Jan 2025).