Bregman Regularized Proximal Point Algorithm

Updated 27 January 2026
  • Bregman Regularized Proximal Point Algorithm is a generalization of the classical proximal point method that uses Bregman divergences to capture problem geometry and constraints.
  • The method supports inexact updates and accelerates convergence from O(1/N) to O(1/N²) through controlled error tolerances and Nesterov-type mixing.
  • It is widely applied in convex optimization, equilibrium problems, unbalanced optimal transport, and stochastic settings, offering both theoretical guarantees and practical computational benefits.

The Bregman Regularized Proximal Point Algorithm is a generalization of the classical proximal point approach for finding zeros of monotone operators or minimizers of convex (and more generally, nonconvex or composite) functions. It leverages Bregman divergences—parameterized by strictly convex, smooth “distance-generating” functions—to regularize the update steps, enabling iterations to better reflect problem geometry and constraints. The Bregman framework underpins methodological advances across convex optimization, equilibrium problems, optimal transport, and large-scale machine learning, offering both theoretical guarantees and practical computational benefits.

1. Foundations: Bregman Divergence and Proximal Updates

Let h: X \to \mathbb{R}\cup\{+\infty\} be a Legendre function on a convex domain X (i.e., strictly convex, differentiable on its interior, with \|\nabla h(x)\| \to \infty at the boundary). The associated Bregman divergence is

D_h(x, y) = h(x) - h(y) - \langle\nabla h(y), x - y\rangle, \quad x, y \in \text{int dom } h.

For h(x) = \frac{1}{2}\|x\|^2, D_h(x, y) reduces to half the squared Euclidean distance, \frac{1}{2}\|x - y\|^2; for h(x) = \sum_i x_i\log x_i, it yields the Kullback–Leibler (KL) divergence.
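Both special cases can be checked numerically. A minimal NumPy sketch (the helper `bregman_divergence` is a name introduced here, not from the cited papers):

```python
import numpy as np

# D_h(x, y) = h(x) - h(y) - <grad h(y), x - y>, evaluated for the two
# kernels discussed above.
def bregman_divergence(h, grad_h, x, y):
    return h(x) - h(y) - np.dot(grad_h(y), x - y)

h_euc = lambda x: 0.5 * np.dot(x, x)          # h(x) = 0.5 ||x||^2
g_euc = lambda x: x
h_ent = lambda x: np.sum(x * np.log(x))       # h(x) = sum_i x_i log x_i
g_ent = lambda x: np.log(x) + 1.0

x = np.array([0.2, 0.3, 0.5])                 # both points on the simplex,
y = np.array([0.1, 0.6, 0.3])                 # so the entropy case is exact KL

d_euc = bregman_divergence(h_euc, g_euc, x, y)   # = 0.5 * ||x - y||^2
d_kl = bregman_divergence(h_ent, g_ent, x, y)    # = sum_i x_i log(x_i / y_i)
```

Note the entropy kernel yields KL exactly when the two points have equal total mass; otherwise an affine mass-correction term appears.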

The (exact) Bregman proximal point update for minimizing a convex f: X \to \mathbb{R}\cup\{+\infty\} is given by:

x^{k+1} = \arg\min_{x \in X} \{f(x) + (1/\gamma_k) D_h(x, x^k)\},

where \gamma_k > 0 is the stepsize parameter. The first-order optimality condition reads:

0 \in \partial f(x^{k+1}) + (1/\gamma_k)[\nabla h(x^{k+1}) - \nabla h(x^k)].

This construction encompasses Euclidean PPA, mirror descent, and entropy-regularized iterations as special cases (Jiang et al., 2022, Zhou et al., 2015).
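As a concrete instance, the Euclidean kernel h(x) = ½‖x‖² recovers the classical PPA, and for a strongly convex quadratic each subproblem has a closed form. A sketch on a synthetic test problem (illustrative, not from the cited papers):

```python
import numpy as np

# Euclidean kernel: the Bregman prox step is the classical PPA. For
# f(x) = 0.5 x^T A x - b^T x, the subproblem
#   min_x f(x) + (1/(2*gamma)) * ||x - x_k||^2
# has closed form x_{k+1} = (A + I/gamma)^{-1} (b + x_k/gamma).
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + np.eye(5)          # symmetric positive definite
b = rng.standard_normal(5)
x_star = np.linalg.solve(A, b)   # exact minimizer of f

gamma = 1.0
x = np.zeros(5)
for _ in range(200):
    x = np.linalg.solve(A + np.eye(5) / gamma, b + x / gamma)
```

Since x^{k+1} - x^* = (I + \gamma A)^{-1}(x^k - x^*), the iterates contract toward the minimizer at a rate governed by the smallest eigenvalue of A.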

2. Inexact and Accelerated Bregman Proximal Point Methods

Solving each subproblem exactly is often prohibitively costly or impractical. Inexact variants relax the requirement by allowing a controlled error, typically subject to summability:

0 \in \partial_{\delta_k} f(x^{k+1}) + (1/\gamma_k)[\nabla h(x^{k+1}) - \nabla h(x^k)], \quad \sum_k \delta_k < \infty,

where \partial_{\delta} denotes the \delta-subdifferential (Chen et al., 2024, Yang et al., 2021).
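One common way to realize the summable-error condition is to solve each subproblem with an inner iterative method whose iteration count grows with k. A sketch under that assumption (Euclidean kernel, inner gradient descent; the test problem is synthetic):

```python
import numpy as np

# Inexact PPA: each prox subproblem
#   phi(z) = f(z) + (1/(2*gamma)) * ||z - x_k||^2
# is solved by inner gradient descent; running k + 5 inner steps at outer
# iteration k makes the subproblem errors delta_k geometrically summable.
rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
A = M @ M.T + np.eye(4)          # f(x) = 0.5 x^T A x - b^T x
b = rng.standard_normal(4)
x_star = np.linalg.solve(A, b)

gamma = 1.0
L = np.linalg.norm(A, 2) + 1.0 / gamma   # Lipschitz constant of grad phi
x = np.zeros(4)
for k in range(60):
    z = x.copy()
    for _ in range(k + 5):               # more inner steps as k grows
        grad_phi = A @ z - b + (z - x) / gamma
        z -= grad_phi / L
    x = z
```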

Acceleration builds on estimate-sequence or Nesterov-type constructions. Auxiliary sequences \{z^k\} and mixing weights \theta_k are introduced, leading to iterations such as:

\begin{align*}
y^k &= \theta_k z^k + (1-\theta_k) x^k, \\
x^{k+1} &\approx \arg\min_{x \in X} \big\{ f(x) + (1/\gamma_k) D_h(x, y^k) \big\}, \\
z^{k+1} &= \arg\min_{x \in X} H_{k+1}(x),
\end{align*}

with H_{k+1} an appropriately defined estimate function. Rates improve from O(1/N) to O(1/N^\lambda), where \lambda = 2 under strong convexity and Lipschitz assumptions, so O(1/N^2) convergence is attained (Yang et al., 2021, Chen et al., 2024, Yan et al., 2020).

Summary table (rate and conditions):

Method      | Required Conditions                        | Rate
BPPA        | Convex f, strongly convex h                | O(1/N)
Accelerated | Quadratic scaling / Nesterov acceleration  | O(1/N^2)
Entropic    | Joint convexity of D_h (e.g., KL)          | O(1/N)

3. Bregman Proximal Point in Structured and Stochastic Settings

Extensions encompass nonconvex, composite, and stochastic objectives. In composite minimization, the Bregman–proximal–gradient method updates via:

x^{k+1} = \arg\min_{x \in X}\{g(x) + \langle\nabla f(x^k), x \rangle + (1/\gamma_k) D_h(x, x^k)\},

where f is smooth and g is proximable (Zhou et al., 2015, Guilmeau et al., 2022).
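With the entropy kernel and g the indicator of the probability simplex, this step has a closed-form multiplicative update (entropic mirror descent). A sketch for a linear objective, where the iterates concentrate on the smallest cost entry (cost vector chosen for illustration):

```python
import numpy as np

# Entropy kernel h(x) = sum_i x_i log x_i, g = indicator of the simplex:
# the Bregman proximal-gradient step reduces to
#   x_{k+1}  proportional to  x_k * exp(-gamma * grad f(x_k)).
# For f(x) = <c, x> (so grad f = c), mass concentrates on argmin_i c_i.
c = np.array([0.7, 0.2, 0.9, 0.4])
gamma = 0.5
x = np.full(4, 0.25)                # start at the simplex barycenter
for _ in range(100):
    x = x * np.exp(-gamma * c)      # multiplicative (mirror) step
    x /= x.sum()                    # renormalize onto the simplex
```

The multiplicative form keeps iterates strictly positive and feasible without any explicit projection, which is the practical appeal of simplex-adapted kernels.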

Variance-reduced stochastic algorithms (e.g., SAGA/SVRG-like schemes) apply the Bregman regularization to each stochastic subproblem:

x^{k+1} = \arg\min_{x}\{f_{i_k}(x) - \langle e_k, x \rangle + (1/\alpha_k) D_h(x, x^k)\},

where e_k is a control-variate correction ensuring (in expectation) unbiasedness for the global proximal mapping. Such schemes admit sublinear or linear rates depending on convexity and relative-smoothness properties (Traoré et al., 18 Oct 2025, Wang et al., 2024).

4. Applications to Unbalanced Optimal Transport

The inexact Bregman proximal point method has demonstrated effectiveness for unbalanced optimal transport (UOT) problems, where the objective is:

\min_{P \ge 0} \langle C, P \rangle + \tau_1\, \mathrm{KL}(P \mathbf{1}_m \| a) + \tau_2\, \mathrm{KL}(P^T \mathbf{1}_n \| b).

Choosing h(P) = \sum_{ij} P_{ij}(\log P_{ij} - 1) produces a matrix KL regularization, and the subproblem becomes a generalized Sinkhorn scaling (Chen et al., 2024). The IBPUOT algorithm runs a fixed number (often just one) of internal scaling updates per outer loop and terminates when the inexactness criterion is satisfied:

0 \in \partial_{\delta_k} f(P^{k+1}) + \epsilon_k [\nabla h(P^{k+1}) - \nabla h(P^k)].

IBPUOT provably converges to the UOT solution under summable errors, with O(1/N) convergence and complexity essentially matching the true-solution complexity of classical scaling, but with far improved numerical stability for small regularization. The accelerated version AIBPUOT further reduces iteration count through estimate-sequence mixing, yielding O(1/N^{1+\epsilon}) rates (Chen et al., 2024).
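The generalized Sinkhorn scaling can be sketched in its standard entropic-UOT form (Chizat-style exponent updates); this shows only the inner scaling subproblem, with the outer Bregman proximal loop and the inexactness test omitted, and all problem data synthetic:

```python
import numpy as np

# Generalized Sinkhorn scaling for entropically regularized UOT:
# alternately rescale rows and columns of the Gibbs kernel K, with the
# KL marginal penalties entering through the exponents tau/(tau + eps).
rng = np.random.default_rng(3)
n, m = 6, 5
C = rng.random((n, m))               # cost matrix
a = rng.random(n); a /= a.sum()      # source marginal
b = rng.random(m); b /= b.sum()      # target marginal
tau1, tau2, eps = 1.0, 1.0, 0.05     # marginal penalties, entropic weight

K = np.exp(-C / eps)                 # Gibbs kernel
u, v = np.ones(n), np.ones(m)
for _ in range(500):
    u = (a / (K @ v)) ** (tau1 / (tau1 + eps))
    v = (b / (K.T @ u)) ** (tau2 / (tau2 + eps))
P = u[:, None] * K * v[None, :]      # transport plan
```

As \tau_1, \tau_2 \to \infty the exponents tend to 1 and the updates recover the classical balanced Sinkhorn iteration.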

5. Theoretical Guarantees and Convergence Rates

The canonical one-step decrease identity underpinning Bregman proximal-point convergence reads:

\gamma_k [f(x^{k+1}) - f(x)] \le D_h(x, x^k) - D_h(x, x^{k+1}) - D_h(x^{k+1}, x^k),

for arbitrary feasible x (Jiang et al., 2022, Zhou et al., 2015, Yan et al., 2020). Upon summing over iterations, this yields telescoping bounds, with immediate consequences:

  • Monotonic descent: f(x^k) is nonincreasing.
  • Ergodic/sublinear rate: for constant \gamma_k, the suboptimality decays as O(1/k).
  • Quadratic scaling/acceleration: for kernels with triangle/“quadrangle” scaling properties (i.e., D_h((1-t)x + ty, (1-t)x + tz) \le t^\lambda D_h(y, z)), the rate improves to O(1/k^\lambda); e.g., \lambda = 2 for strongly convex and smooth h (Yang et al., 2021, Chen et al., 2024, Yan et al., 2020).
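The one-step decrease inequality can be verified numerically for the Euclidean kernel, where the exact prox of a strongly convex quadratic is available in closed form (test problem is illustrative):

```python
import numpy as np

# Check gamma * (f(x_{k+1}) - f(x*)) <= D(x*, x_k) - D(x*, x_{k+1}) - D(x_{k+1}, x_k)
# along an exact PPA trajectory, with D(u, v) = 0.5 * ||u - v||^2.
rng = np.random.default_rng(2)
M = rng.standard_normal((3, 3))
A = M @ M.T + np.eye(3)
b = rng.standard_normal(3)
f = lambda w: 0.5 * w @ A @ w - b @ w
D = lambda u, v: 0.5 * np.sum((u - v) ** 2)
x_star = np.linalg.solve(A, b)

gamma = 0.7
x = rng.standard_normal(3)
for _ in range(10):
    x_new = np.linalg.solve(A + np.eye(3) / gamma, b + x / gamma)
    lhs = gamma * (f(x_new) - f(x_star))
    rhs = D(x_star, x) - D(x_star, x_new) - D(x_new, x)
    assert lhs <= rhs + 1e-10   # one-step decrease holds at every step
    x = x_new
```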

When the inexactness sequence \delta_k (from approximate subproblem solutions) is absolutely summable, convergence is preserved, and accelerated methods retain their improved rates under mild scaling hypotheses (Chen et al., 2024, Yang et al., 2021).

6. Impact of the Divergence Generator and Problem Geometry

The choice of the Bregman kernel h (“distance-generating function”) critically affects both convergence and the implicit bias of the method. For linear classification with separable data, BPPA with a fixed h yields:

\liminf_{t\to\infty}\, \min_{i}\, y_i \langle \theta_t / \|\theta_t\|, x_i \rangle \ge \sqrt{\mu/L}\, \gamma_*,

where \gamma_* is the maximal margin under the chosen norm, and \mu, L are the strong convexity and smoothness parameters of h (Li et al., 2021). Thus the “condition number” of h directly controls the guaranteed margin; ill-conditioning may degrade generalization guarantees.

Further, when h reflects the manifold or simplex constraints (e.g., entropic regularization, Kullback–Leibler divergence), updates become multiplicative and naturally enforce sparse or simplex-structured solutions, which is advantageous for tasks such as optimal transport or variational inference (Chen et al., 2024, Guilmeau et al., 2022).

7. Extensions: Manifolds, Nonconvexity, and Equilibrium Problems

The Bregman regularized proximal point paradigm extends to Hadamard manifolds (complete simply connected spaces of nonpositive curvature). Here, the Bregman distance is defined in terms of geodesics, and convexity is replaced by geodesic convexity. Under additional boundedness and coercivity conditions on the kernel, convergence to equilibrium solutions can be established despite the local nonconvexity of the Bregman term (Sharma et al., 20 Jan 2026).

Nonconvex and composite problems are handled by replacing f with locally accurate convex models; line-search and descent conditions ensure convergence to Clarke stationary points under minimal regularity and growth assumptions (Ochs et al., 2017, Wang et al., 2024).

