Bregman Regularized Proximal Point Algorithm

Updated 27 January 2026
  • Bregman Regularized Proximal Point Algorithm is a generalization of the classical proximal point method that uses Bregman divergences to capture problem geometry and constraints.
  • The method supports inexact updates and accelerates convergence from O(1/N) to O(1/N²) through controlled error tolerances and Nesterov-type mixing.
  • It is widely applied in convex optimization, equilibrium problems, unbalanced optimal transport, and stochastic settings, offering both theoretical guarantees and practical computational benefits.

The Bregman Regularized Proximal Point Algorithm is a generalization of the classical proximal point approach for finding zeros of monotone operators or minimizers of convex (and more generally, nonconvex or composite) functions. It leverages Bregman divergences—parameterized by strictly convex, smooth “distance-generating” functions—to regularize the update steps, enabling iterations to better reflect problem geometry and constraints. The Bregman framework underpins methodological advances across convex optimization, equilibrium problems, optimal transport, and large-scale machine learning, offering both theoretical guarantees and practical computational benefits.

1. Foundations: Bregman Divergence and Proximal Updates

Let h: X \to \mathbb{R}\cup\{+\infty\} be a Legendre function on a convex domain X (i.e., strictly convex, differentiable on its interior, with \|\nabla h(x)\| \to \infty at the boundary). The associated Bregman divergence is

D_h(x, y) = h(x) - h(y) - \langle\nabla h(y), x - y\rangle, \quad x, y \in \text{int dom } h.

For h(x) = \frac{1}{2}\|x\|^2, D_h(x, y) reduces to half the squared Euclidean distance, \frac{1}{2}\|x - y\|^2; for h(x) = \sum_i x_i\log x_i, it yields the Kullback–Leibler (KL) divergence.
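Both special cases can be checked numerically. A minimal NumPy sketch (the helper `bregman_divergence` is a name introduced here, not from the cited papers):

```python
import numpy as np

# D_h(x, y) = h(x) - h(y) - <grad h(y), x - y>, evaluated for the two
# kernels discussed above.
def bregman_divergence(h, grad_h, x, y):
    return h(x) - h(y) - np.dot(grad_h(y), x - y)

h_euc = lambda x: 0.5 * np.dot(x, x)          # h(x) = 0.5 ||x||^2
g_euc = lambda x: x
h_ent = lambda x: np.sum(x * np.log(x))       # h(x) = sum_i x_i log x_i
g_ent = lambda x: np.log(x) + 1.0

x = np.array([0.2, 0.3, 0.5])                 # both points on the simplex,
y = np.array([0.1, 0.6, 0.3])                 # so the entropy case is exact KL

d_euc = bregman_divergence(h_euc, g_euc, x, y)   # = 0.5 * ||x - y||^2
d_kl = bregman_divergence(h_ent, g_ent, x, y)    # = sum_i x_i log(x_i / y_i)
```

Note the entropy kernel yields KL exactly when the two points have equal total mass; otherwise an affine mass-correction term appears.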

The (exact) Bregman proximal point update for minimizing a convex f: X \to \mathbb{R}\cup\{+\infty\} is given by:

x^{k+1} = \arg\min_{x \in X} \{f(x) + (1/\gamma_k) D_h(x, x^k)\},

where \gamma_k > 0 is the stepsize parameter. The first-order optimality condition reads:

0 \in \partial f(x^{k+1}) + (1/\gamma_k)[\nabla h(x^{k+1}) - \nabla h(x^k)].

This construction encompasses Euclidean PPA, mirror descent, and entropy-regularized iterations as special cases (Jiang et al., 2022, Zhou et al., 2015).
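As a concrete instance, the Euclidean kernel h(x) = ½‖x‖² recovers the classical PPA, and for a strongly convex quadratic each subproblem has a closed form. A sketch on a synthetic test problem (illustrative, not from the cited papers):

```python
import numpy as np

# Euclidean kernel: the Bregman prox step is the classical PPA. For
# f(x) = 0.5 x^T A x - b^T x, the subproblem
#   min_x f(x) + (1/(2*gamma)) * ||x - x_k||^2
# has closed form x_{k+1} = (A + I/gamma)^{-1} (b + x_k/gamma).
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + np.eye(5)          # symmetric positive definite
b = rng.standard_normal(5)
x_star = np.linalg.solve(A, b)   # exact minimizer of f

gamma = 1.0
x = np.zeros(5)
for _ in range(200):
    x = np.linalg.solve(A + np.eye(5) / gamma, b + x / gamma)
```

Since x^{k+1} - x^* = (I + \gamma A)^{-1}(x^k - x^*), the iterates contract toward the minimizer at a rate governed by the smallest eigenvalue of A.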

2. Inexact and Accelerated Bregman Proximal Point Methods

Solving each subproblem exactly is often prohibitively costly or impractical. Inexact variants relax the requirement by allowing a controlled error, typically subject to summability:

0 \in \partial_{\delta_k} f(x^{k+1}) + (1/\gamma_k)[\nabla h(x^{k+1}) - \nabla h(x^k)], \quad \sum_k \delta_k < \infty,

where \partial_{\delta} denotes the \delta-subdifferential (Chen et al., 2024, Yang et al., 2021).
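One common way to realize the summable-error condition is to solve each subproblem with an inner iterative method whose iteration count grows with k. A sketch under that assumption (Euclidean kernel, inner gradient descent; the test problem is synthetic):

```python
import numpy as np

# Inexact PPA: each prox subproblem
#   phi(z) = f(z) + (1/(2*gamma)) * ||z - x_k||^2
# is solved by inner gradient descent; running k + 5 inner steps at outer
# iteration k makes the subproblem errors delta_k geometrically summable.
rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
A = M @ M.T + np.eye(4)          # f(x) = 0.5 x^T A x - b^T x
b = rng.standard_normal(4)
x_star = np.linalg.solve(A, b)

gamma = 1.0
L = np.linalg.norm(A, 2) + 1.0 / gamma   # Lipschitz constant of grad phi
x = np.zeros(4)
for k in range(60):
    z = x.copy()
    for _ in range(k + 5):               # more inner steps as k grows
        grad_phi = A @ z - b + (z - x) / gamma
        z -= grad_phi / L
    x = z
```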

Acceleration builds on estimate-sequence or Nesterov-type constructions. Auxiliary sequences \{z^k\} and mixing weights \theta_k are introduced, leading to iterations such as:

\begin{align*}
y^k &= \theta_k z^k + (1-\theta_k) x^k, \\
x^{k+1} &\approx \arg\min_{x \in X} \big\{ f(x) + (1/\gamma_k) D_h(x, y^k) \big\}, \\
z^{k+1} &= \arg\min_{x \in X} H_{k+1}(x),
\end{align*}

with H_{k+1} an appropriately defined estimate function. Rates improve from O(1/N) to O(1/N^\lambda), where \lambda = 2 under strong convexity and Lipschitz assumptions, so O(1/N^2) convergence is attained (Yang et al., 2021, Chen et al., 2024, Yan et al., 2020).

Summary table (rate and conditions):

Method      | Required Conditions                        | Rate
BPPA        | Convex f, strongly convex h                | O(1/N)
Accelerated | Quadratic scaling / Nesterov acceleration  | O(1/N^2)
Entropic    | Joint convexity of D_h (e.g., KL)          | O(1/N)

3. Bregman Proximal Point in Structured and Stochastic Settings

Extensions encompass nonconvex, composite, and stochastic objectives. In composite minimization, the Bregman–proximal–gradient method updates via:

x^{k+1} = \arg\min_{x \in X}\{g(x) + \langle\nabla f(x^k), x \rangle + (1/\gamma_k) D_h(x, x^k)\},

where f is smooth and g is proximable (Zhou et al., 2015, Guilmeau et al., 2022).
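With the entropy kernel and g the indicator of the probability simplex, this step has a closed-form multiplicative update (entropic mirror descent). A sketch for a linear objective, where the iterates concentrate on the smallest cost entry (cost vector chosen for illustration):

```python
import numpy as np

# Entropy kernel h(x) = sum_i x_i log x_i, g = indicator of the simplex:
# the Bregman proximal-gradient step reduces to
#   x_{k+1}  proportional to  x_k * exp(-gamma * grad f(x_k)).
# For f(x) = <c, x> (so grad f = c), mass concentrates on argmin_i c_i.
c = np.array([0.7, 0.2, 0.9, 0.4])
gamma = 0.5
x = np.full(4, 0.25)                # start at the simplex barycenter
for _ in range(100):
    x = x * np.exp(-gamma * c)      # multiplicative (mirror) step
    x /= x.sum()                    # renormalize onto the simplex
```

The multiplicative form keeps iterates strictly positive and feasible without any explicit projection, which is the practical appeal of simplex-adapted kernels.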

Variance-reduced stochastic algorithms (e.g., SAGA/SVRG-like schemes) apply the Bregman regularization to each stochastic subproblem:

x^{k+1} = \arg\min_{x}\{f_{i_k}(x) - \langle e_k, x \rangle + (1/\alpha_k) D_h(x, x^k)\},

where e_k is a control-variate correction ensuring (in expectation) unbiasedness for the global proximal mapping. Such schemes admit sublinear or linear rates depending on convexity and relative-smoothness properties (Traoré et al., 18 Oct 2025, Wang et al., 2024).

4. Applications to Unbalanced Optimal Transport

The inexact Bregman proximal point method has demonstrated effectiveness for unbalanced optimal transport (UOT) problems, where the objective is:

\min_{P \ge 0} \langle C, P \rangle + \tau_1\, \mathrm{KL}(P \mathbf{1}_m \| a) + \tau_2\, \mathrm{KL}(P^T \mathbf{1}_n \| b).

Choosing h(P) = \sum_{ij} P_{ij}(\log P_{ij} - 1) produces a matrix KL regularization, and the subproblem becomes a generalized Sinkhorn scaling (Chen et al., 2024). The IBPUOT algorithm runs a fixed number (often just one) of internal scaling updates per outer loop and terminates when the inexactness criterion is satisfied:

0 \in \partial_{\delta_k} f(P^{k+1}) + \epsilon_k [\nabla h(P^{k+1}) - \nabla h(P^k)].

IBPUOT provably converges to the UOT solution under summable errors, with O(1/N) convergence and complexity essentially matching the true-solution complexity of classical scaling, but with far improved numerical stability for small regularization. The accelerated version AIBPUOT further reduces iteration count through estimate-sequence mixing, yielding O(1/N^{1+\epsilon}) rates (Chen et al., 2024).
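The generalized Sinkhorn scaling can be sketched in its standard entropic-UOT form (Chizat-style exponent updates); this shows only the inner scaling subproblem, with the outer Bregman proximal loop and the inexactness test omitted, and all problem data synthetic:

```python
import numpy as np

# Generalized Sinkhorn scaling for entropically regularized UOT:
# alternately rescale rows and columns of the Gibbs kernel K, with the
# KL marginal penalties entering through the exponents tau/(tau + eps).
rng = np.random.default_rng(3)
n, m = 6, 5
C = rng.random((n, m))               # cost matrix
a = rng.random(n); a /= a.sum()      # source marginal
b = rng.random(m); b /= b.sum()      # target marginal
tau1, tau2, eps = 1.0, 1.0, 0.05     # marginal penalties, entropic weight

K = np.exp(-C / eps)                 # Gibbs kernel
u, v = np.ones(n), np.ones(m)
for _ in range(500):
    u = (a / (K @ v)) ** (tau1 / (tau1 + eps))
    v = (b / (K.T @ u)) ** (tau2 / (tau2 + eps))
P = u[:, None] * K * v[None, :]      # transport plan
```

As \tau_1, \tau_2 \to \infty the exponents tend to 1 and the updates recover the classical balanced Sinkhorn iteration.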

5. Theoretical Guarantees and Convergence Rates

The canonical one-step decrease identity underpinning Bregman proximal-point convergence reads:

\gamma_k [f(x^{k+1}) - f(x)] \le D_h(x, x^k) - D_h(x, x^{k+1}) - D_h(x^{k+1}, x^k),

for arbitrary feasible x (Jiang et al., 2022, Zhou et al., 2015, Yan et al., 2020). Upon summing over iterations, this yields telescoping bounds, with immediate consequences:

  • Monotonic descent: f(x^k) is nonincreasing.
  • Ergodic/sublinear rate: for constant \gamma_k, the suboptimality decays as O(1/k).
  • Quadratic scaling/acceleration: for kernels with triangle/“quadrangle” scaling properties (i.e., D_h((1-t)x + ty, (1-t)x + tz) \le t^\lambda D_h(y, z)), the rate improves to O(1/k^\lambda); e.g., \lambda = 2 for strongly convex and smooth h (Yang et al., 2021, Chen et al., 2024, Yan et al., 2020).
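The one-step decrease inequality can be verified numerically for the Euclidean kernel, where the exact prox of a strongly convex quadratic is available in closed form (test problem is illustrative):

```python
import numpy as np

# Check gamma * (f(x_{k+1}) - f(x*)) <= D(x*, x_k) - D(x*, x_{k+1}) - D(x_{k+1}, x_k)
# along an exact PPA trajectory, with D(u, v) = 0.5 * ||u - v||^2.
rng = np.random.default_rng(2)
M = rng.standard_normal((3, 3))
A = M @ M.T + np.eye(3)
b = rng.standard_normal(3)
f = lambda w: 0.5 * w @ A @ w - b @ w
D = lambda u, v: 0.5 * np.sum((u - v) ** 2)
x_star = np.linalg.solve(A, b)

gamma = 0.7
x = rng.standard_normal(3)
for _ in range(10):
    x_new = np.linalg.solve(A + np.eye(3) / gamma, b + x / gamma)
    lhs = gamma * (f(x_new) - f(x_star))
    rhs = D(x_star, x) - D(x_star, x_new) - D(x_new, x)
    assert lhs <= rhs + 1e-10   # one-step decrease holds at every step
    x = x_new
```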

When the inexactness sequence \delta_k (from approximate subproblem solutions) is absolutely summable, convergence is preserved, and accelerated methods retain their improved rates under mild scaling hypotheses (Chen et al., 2024, Yang et al., 2021).

6. Impact of the Divergence Generator and Problem Geometry

The choice of the Bregman kernel h (“distance-generating function”) critically affects both convergence and the implicit bias of the method. For linear classification with separable data, BPPA with a fixed h yields:

\liminf_{t\to\infty}\, \min_{i}\, y_i \langle \theta_t / \|\theta_t\|, x_i \rangle \ge \sqrt{\mu/L}\, \gamma_*,

where \gamma_* is the maximal margin under the chosen norm, and \mu, L are the strong convexity and smoothness parameters of h (Li et al., 2021). Thus the “condition number” of h directly controls the guaranteed margin; ill-conditioning may degrade generalization guarantees.

Further, when h reflects the manifold or simplex constraints (e.g., entropic regularization, Kullback–Leibler divergence), updates become multiplicative and naturally enforce sparse or simplex-structured solutions, which is advantageous for tasks such as optimal transport or variational inference (Chen et al., 2024, Guilmeau et al., 2022).

7. Extensions: Manifolds, Nonconvexity, and Equilibrium Problems

The Bregman regularized proximal point paradigm extends to Hadamard manifolds (complete simply connected spaces of nonpositive curvature). Here, the Bregman distance is defined in terms of geodesics, and convexity is replaced by geodesic convexity. Under additional boundedness and coercivity conditions on the kernel, convergence to equilibrium solutions can be established despite the local nonconvexity of the Bregman term (Sharma et al., 20 Jan 2026).

Nonconvex and composite problems are handled by replacing f with locally accurate convex models; line-search and descent conditions ensure convergence to Clarke stationary points under minimal regularity and growth assumptions (Ochs et al., 2017, Wang et al., 2024).

