Bregman-Riesz Regression
- Bregman-Riesz Regression is a unified framework that combines Bregman divergences with Riesz representation to directly estimate linear functionals in high-dimensional settings.
- It reframes regression as an empirical risk minimization problem, integrating flexible machine learning models with robust loss functions.
- The approach offers practical benefits such as robust debiasing, efficient algorithmic implementations like RieszBoost, and strong theoretical guarantees.
Bregman-Riesz Regression is a unified methodology for regression and semiparametric inference that leverages Bregman divergences and the Riesz representation theorem to estimate linear functionals (such as causal effects and policy impacts) and their associated weighting functions (such as density ratios) in high-dimensional settings. This paradigm accommodates flexible machine learning architectures, robust loss functions, and structured data augmentation, offering both theoretical rigor and computational tractability. It subsumes numerous classical and modern approaches, including direct density ratio estimation, debiased/doubly robust machine learning, and sparse inverse problem solvers, by framing them as empirical risk minimization problems governed by Bregman geometry and Riesz representers.
1. Foundations: Bregman Divergences, Riesz Representation, and the Unified Risk
Bregman divergences are defined via a strictly convex, differentiable generator $g$: for functions $f_0$ and $f$ of a random variable $Z$,
$$D_g(f_0 \,\|\, f) = \mathbb{E}\big[g(f_0(Z)) - g(f(Z)) - g'(f(Z))\{f_0(Z) - f(Z)\}\big],$$
where $f_0$ is the target function and $f$ is the candidate estimate.
The Riesz representation theorem ensures that any continuous linear functional $\theta$ on an $L_2(P)$ space can be expressed as
$$\theta(f) = \mathbb{E}\big[\alpha_0(Z)\, f(Z)\big],$$
where $\alpha_0 \in L_2(P)$ is the unique Riesz representer, often interpreted as a weighting or density ratio function.
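For concreteness, in the average treatment effect (ATE) example with data $Z = (Y, A, X)$, outcome regression $\mu$, and propensity score $\pi(x) = \Pr(A = 1 \mid X = x)$ (assuming overlap, so that $\pi$ is bounded away from 0 and 1), the representer is the familiar inverse-propensity weight:
$$\theta(\mu) = \mathbb{E}\big[\mu(1, X) - \mu(0, X)\big] = \mathbb{E}\big[\alpha_0(A, X)\, \mu(A, X)\big], \qquad \alpha_0(a, x) = \frac{a}{\pi(x)} - \frac{1 - a}{1 - \pi(x)}.$$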
Bregman-Riesz regression unites these frameworks by formulating the regression or learning task as empirical risk minimization over the Bregman divergence associated with the Riesz representer. The canonical risk takes the form:
$$R(f) = \mathbb{E}_P\big[g'(f(Z))\, f(Z) - g(f(Z))\big] - \mathbb{E}_Q\big[g'(f(Z))\big],$$
where $P$ denotes the observed (denominator) distribution, $Q$ the numerator distribution defining the functional, and $g'$ is the derivative of the generator. Up to a constant not depending on $f$, $R(f)$ equals the Bregman divergence $D_g(f_0 \,\|\, f)$, so the minimizer coincides with the true density ratio or Riesz representer $f_0 = dQ/dP$ associated with the target functional (Hines et al., 17 Oct 2025).
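To fix ideas, here is a minimal numerical sketch of this risk (the helper names and Gaussian toy data are illustrative, not taken from the cited software):

```python
import numpy as np

def bregman_risk(f_P, f_Q, g, g_prime):
    """Empirical Bregman-Riesz risk R(f).

    f_P: model evaluations f(Z_i) on samples from the observed
         (denominator) distribution P.
    f_Q: model evaluations f(Z_j) on samples from the numerator
         distribution Q (possibly synthetic/augmented).
    """
    return np.mean(g_prime(f_P) * f_P - g(f_P)) - np.mean(g_prime(f_Q))

# Quadratic generator g(t) = t^2/2 gives the least-squares (uLSIF-style)
# risk E_P[f^2]/2 - E_Q[f].
g_quad = lambda t: 0.5 * t**2
g_quad_prime = lambda t: t

# Toy check: P = N(0,1), Q = N(1,1), so dQ/dP(z) = exp(z - 1/2).
rng = np.random.default_rng(0)
zp, zq = rng.normal(0.0, 1.0, 5000), rng.normal(1.0, 1.0, 5000)
true_ratio = lambda z: np.exp(z - 0.5)
naive = lambda z: np.ones_like(z)  # ignores the distribution shift

# The true ratio should attain a lower risk than the naive candidate.
print(bregman_risk(true_ratio(zp), true_ratio(zq), g_quad, g_quad_prime))
print(bregman_risk(naive(zp), naive(zq), g_quad, g_quad_prime))
```

Swapping $g(t) = t \log t - t$ into the same template yields a KLIEP-style risk, making explicit how the choice of generator controls the loss.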
2. Direct Density Ratio Learning and Its Connection to Riesz Regression
Density ratios are central in causal inference, importance sampling, and other domains. Classical plug-in estimation (fitting the numerator and denominator densities separately and dividing) is unstable due to the curse of dimensionality and division by near-zero denominators. Bregman-Riesz regression replaces this with direct estimation via empirical risk minimization:
- Least Squares Importance Fitting (uLSIF), Kullback-Leibler Importance Estimation (KLIEP), and score matching are all instantiations with specific choices of the generator $g$ (quadratic, logarithmic, etc.).
- Probabilistic classification methods estimate the density ratio as the odds of class membership: labeling augmented samples drawn from the numerator distribution $A = 1$ and those from the denominator distribution $A = 0$,
$$f(z) = \frac{\Pr(A = 1 \mid Z = z)}{\Pr(A = 0 \mid Z = z)} \cdot \frac{\Pr(A = 0)}{\Pr(A = 1)}$$
(Hines et al., 17 Oct 2025); a classification-based sketch follows this list.
- The unified risk minimization framework allows switching between Bregman divergences to control robustness and tail behavior, e.g., using Negative Binomial or Itakura-Saito divergences in regions of poor overlap.
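As a concrete instance of the classification route, here is a minimal scikit-learn sketch (the Gaussian setup and variable names are illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
z_den = rng.normal(0.0, 1.0, (4000, 1))  # samples from the denominator P
z_num = rng.normal(1.0, 1.0, (4000, 1))  # samples from the numerator Q

# Stack the samples and label numerator draws A = 1, denominator A = 0.
Z = np.vstack([z_den, z_num])
A = np.r_[np.zeros(len(z_den)), np.ones(len(z_num))]

clf = LogisticRegression().fit(Z, A)
p1 = clf.predict_proba(Z)[:, 1]

# Density ratio estimate: conditional odds times the prior correction.
prior_odds = len(z_num) / len(z_den)
ratio_hat = (p1 / (1.0 - p1)) / prior_odds

# Compare against the true ratio dQ/dP(z) = exp(z - 1/2) for this toy pair.
print(np.c_[ratio_hat[:3], np.exp(Z[:3, 0] - 0.5)])
```

For this Gaussian pair the true log-odds are linear in $z$, so plain logistic regression is well specified; in general any probabilistic classifier can be substituted.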
Data augmentation techniques enable application in causal inference, where the numerator distribution (e.g., under a policy intervention) is unobserved. Generating synthetic numerator samples $\tilde{Z}_1, \dots, \tilde{Z}_m$ via interventions or matched permutations, the empirical risk can be estimated as
$$\hat{R}(f) = \frac{1}{n}\sum_{i=1}^{n}\big[g'(f(Z_i))\, f(Z_i) - g(f(Z_i))\big] - \frac{1}{m}\sum_{j=1}^{m} g'(f(\tilde{Z}_j)),$$
allowing for flexible reweighting and inference (Hines et al., 17 Oct 2025).
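The sketch below illustrates this augmentation strategy for stabilized weights $p(a)/p(a \mid x)$: permuting the treatment column against the covariates produces synthetic draws from the product distribution $p(a)\,p(x)$, which serves as the numerator (the model, feature map, and data are illustrative):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n = 4000
X = rng.normal(0.0, 1.0, n)
A = rng.binomial(1, 1.0 / (1.0 + np.exp(-X)), n)  # confounded treatment

# Denominator samples: observed pairs (A_i, X_i) ~ p(a, x).
# Numerator samples: permute A against X, giving draws from p(a) p(x).
A_perm = rng.permutation(A)

def features(a, x):
    return np.c_[np.ones_like(x), a, x, a * x]

def f(theta, a, x):  # linear-in-features weight model
    return features(a, x) @ theta

def risk(theta):
    # Quadratic-generator empirical risk: mean_P[f^2]/2 - mean_Q[f].
    return 0.5 * np.mean(f(theta, A, X) ** 2) - np.mean(f(theta, A_perm, X))

theta_hat = minimize(risk, np.zeros(4)).x
w_hat = f(theta_hat, A, X)  # estimated stabilized weights

# Sanity check against the true stabilized weight p(a) / p(a | x).
pi = 1.0 / (1.0 + np.exp(-X))
w_true = np.where(A == 1, A.mean() / pi, (1 - A.mean()) / (1 - pi))
print(np.corrcoef(w_hat, w_true)[0, 1])
```

A generic optimizer is used so that the quadratic generator can be swapped for any other differentiable choice without changing the code structure.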
3. Debiased and Doubly Robust Estimation Using Riesz Representers
Semiparametric efficient estimation of causal quantities often requires debiasing plug-in estimators. The efficient influence function (EIF) for an estimand of the form $\theta_0 = \mathbb{E}[m(Z; \mu_0)]$ can always be expressed in terms of the regression and the Riesz representer:
$$\psi(Z) = m(Z; \mu_0) - \theta_0 + \alpha_0(Z)\{Y - \mu_0(Z)\},$$
where $m$ encodes the parameter of interest (e.g., the mean difference $m(Z; \mu) = \mu(1, X) - \mu(0, X)$), $\mu_0$ is the regression, $\alpha_0$ the Riesz representer, and $\psi$ the orthogonal score.
Automatic estimation of the Riesz representer is performed via minimization of the Riesz loss:
$$\hat{\alpha} = \arg\min_{\alpha \in \mathcal{A}} \frac{1}{n}\sum_{i=1}^{n}\big[\alpha(Z_i)^2 - 2\, m(Z_i; \alpha)\big].$$
This approach removes the need for closed-form derivation (which can be brittle or intractable), permits direct machine learning-based estimation using neural nets, random forests, or boosting, and yields doubly robust estimators when used with sample splitting or cross-fitting (Chernozhukov et al., 2021, Chernozhukov et al., 2018, Williams et al., 25 Jul 2025).
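For the ATE functional $m(Z; \alpha) = \alpha(1, X) - \alpha(0, X)$, the Riesz loss is quadratic in the coefficients of a linear-in-features representer and can be minimized in closed form. A minimal end-to-end sketch (simulation and names are illustrative; cross-fitting is omitted for brevity):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(3)
n = 4000
X = rng.normal(0.0, 1.0, (n, 2))
pi = 1.0 / (1.0 + np.exp(-X[:, 0]))
A = rng.binomial(1, pi)
Y = A + X.sum(axis=1) + rng.normal(0.0, 1.0, n)  # true ATE = 1

def phi(a, x):  # representer features (1, a, x, a*x)
    return np.c_[np.ones(len(x)), a, x, a[:, None] * x]

# The Riesz loss (1/n) sum[alpha^2 - 2 m(Z; alpha)] is quadratic in w for
# alpha = phi @ w: the minimizer solves E[phi phi'] w = E[phi(1,X) - phi(0,X)].
P = phi(A, X)
G = P.T @ P / n
b = (phi(np.ones(n), X) - phi(np.zeros(n), X)).mean(axis=0)
alpha_hat = P @ np.linalg.solve(G, b)

# Outcome regression via any ML learner.
mu = GradientBoostingRegressor().fit(np.c_[A, X], Y)
mu1 = mu.predict(np.c_[np.ones(n), X])
mu0 = mu.predict(np.c_[np.zeros(n), X])
mu_a = mu.predict(np.c_[A, X])

# Debiased (orthogonal-score) ATE estimate; should be close to 1.
theta_hat = np.mean(mu1 - mu0 + alpha_hat * (Y - mu_a))
print(theta_hat)
```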
4. Algorithmic and Optimization Frameworks
A variety of algorithmic frameworks implement Bregman-Riesz regression:
- Gradient boosting (RieszBoost) minimizes the Riesz loss over an augmented dataset, offering computational efficiency and robust handling of positivity violations (Lee et al., 8 Jan 2025).
- Adversarial min-max frameworks construct minimax objectives whose population limit is directly proportional to the quadratic Bregman divergence between candidate Riesz representers and the truth; for example, when the test class $\mathcal{H}$ is rich enough,
$$\max_{h \in \mathcal{H}} \mathbb{E}\big[m(Z; h) - \alpha(Z)\, h(Z) - h(Z)^2\big] = \tfrac{1}{4}\,\mathbb{E}\big[\{\alpha_0(Z) - \alpha(Z)\}^2\big],$$
yielding mean-square error bounds in terms of the critical radius of the function space (Chernozhukov et al., 2020).
- Linearized Bregman methods and split feasibility algorithms treat the regularization term as the Bregman generator, using soft-thresholding (proximal mapping) to promote sparsity or other structure (Lorenz et al., 2013, Dai et al., 15 Apr 2024); a minimal iteration sketch follows this list.
- Exact continuous relaxations of $\ell_0$ penalties via Bregman divergences preserve sparsity and allow efficient non-convex optimization while maintaining fidelity to the original objective (Essafri et al., 19 Mar 2025).
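The following sketch shows the linearized Bregman iteration for sparse recovery under the generator $J(x) = \lambda\|x\|_1 + \tfrac{1}{2\delta}\|x\|_2^2$ (parameters, step size, and stopping rule are illustrative):

```python
import numpy as np

def soft_threshold(v, lam):
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def linearized_bregman(A, b, lam=1.0, delta=5.0, iters=3000):
    """Approximately solve min lam*||x||_1 + ||x||^2/(2*delta) s.t. Ax = b:
    each step is a gradient update followed by soft-thresholding."""
    v = np.zeros(A.shape[1])
    x = np.zeros(A.shape[1])
    tau = 1.0 / (delta * np.linalg.norm(A, 2) ** 2)  # conservative step size
    for _ in range(iters):
        v += tau * A.T @ (b - A @ x)
        x = delta * soft_threshold(v, lam)
    return x

# Toy compressed-sensing check: recover a 5-sparse vector from 60 measurements.
rng = np.random.default_rng(4)
A = rng.normal(size=(60, 200)) / np.sqrt(60)
x_true = np.zeros(200)
x_true[rng.choice(200, 5, replace=False)] = rng.normal(0.0, 3.0, 5)
x_hat = linearized_bregman(A, A @ x_true)

# Relative recovery error (shrinks as delta and iters grow).
print(np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```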
5. Simulation, Empirical, and Software Results
Empirical studies across causal parameters (ATE, ATT, ASE, LASE), policy effects, and stabilized weights demonstrate:
- Choice of Bregman divergence alters estimator stability: Negative Binomial or Itakura-Saito losses outperform least squares in low-overlap settings or when density ratios are extreme.
- Data augmentation (permutations, train-time sampling) improves performance in stabilized weight estimation for dose-response curves.
- RieszBoost exhibits lower RMSE and narrower confidence intervals than indirect inverse-propensity approaches, especially on high-dimensional or tabular data.
- Open-source software (https://github.com/CI-NYC/densityratios) provides ready-to-use implementations of gradient boosting, neural network, and kernel methods with built-in losses and augmentation schemes (Hines et al., 17 Oct 2025, Lee et al., 8 Jan 2025).
6. Theoretical Guarantees and Extensions
The Bregman-Riesz regression framework provides:
- Uniform finite-sample mean square error bounds and Berry–Esseen type normality guarantees, even in nonregular or nonparametric settings (Chernozhukov et al., 2018, Chernozhukov et al., 2021).
- Double sparsity and rate robustness: as long as either the regression or the representer is estimated sufficiently accurately (specifically, the product of the two estimation errors vanishes at the $o_P(n^{-1/2})$ rate), valid inference is obtained (Chernozhukov et al., 2018, Chernozhukov et al., 2020); see the error decomposition after this list.
- Asymptotically efficient estimation via Neyman-orthogonal scores, supporting inference with any sufficiently rich machine learner for the nuisance parameters.
- Robustness to practical positivity violations and adaptability to complex intervention models and loss functions through flexible choice of Bregman generator and data augmentation.
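The rate-robustness bullet follows from the standard error decomposition of the debiased estimator under cross-fitting (a sketch; $\|\cdot\|$ denotes the $L_2(P)$ norm):
$$\hat{\theta} - \theta_0 = \frac{1}{n}\sum_{i=1}^{n} \psi(Z_i) + \mathbb{E}\big[\{\hat{\alpha}(Z) - \alpha_0(Z)\}\{\mu_0(Z) - \hat{\mu}(Z)\}\big] + o_P(n^{-1/2}).$$
The first term drives asymptotic normality, while by Cauchy-Schwarz the middle (bias) term is bounded by $\|\hat{\alpha} - \alpha_0\| \cdot \|\hat{\mu} - \mu_0\|$, which is why only the product of nuisance error rates must vanish faster than $n^{-1/2}$.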
7. Relation to Broader Methodologies and Applications
Bregman-Riesz regression subsumes and generalizes:
- Classical Bregman divergence minimization (mirror descent, KLIEP, uLSIF, score matching).
- Efficient semiparametric estimation (TMLE, AIPW, double machine learning).
- Sparse inverse problem solvers (linearized Bregman, split feasibility, L0/Bregman relaxations).
- Flexible density ratio estimation for reinforcement learning, diffusion modeling, and covariate shift adaptation.
The approach naturally accommodates advanced neural architectures, boosting algorithms, and kernel regression, while permitting robust statistical guarantees across diverse structural settings in causal inference, high-dimensional regression, and nonlinear functionals.
In summary, Bregman-Riesz regression provides a theoretically grounded and practically flexible framework for direct estimation of weighting functions and causal quantities, robustly integrating Bregman geometry, Riesz theory, dynamic data augmentation, and empirically validated machine learning methods. Its unified risk minimization perspective facilitates the design of efficient, adaptable, and scalable estimators for modern statistical and machine learning problems encompassing causal inference, density ratio estimation, and structured regression (Hines et al., 17 Oct 2025, Lee et al., 8 Jan 2025, Chernozhukov et al., 2021, Chernozhukov et al., 2018, Lorenz et al., 2013, Dai et al., 15 Apr 2024, Williams et al., 25 Jul 2025, Essafri et al., 19 Mar 2025, Chernozhukov et al., 2020, Nock et al., 2016, Benning et al., 2017, Laude et al., 2019, Rauh et al., 2020).