Bregman ADMM: Advanced Optimization
- Bregman ADMM is a generalized version of ADMM that replaces quadratic penalties with Bregman divergences to better match the problem's geometry.
- It handles convex, nonconvex, and multi-block composite problems, and can offer computational and convergence advantages when the penalty is matched to the problem geometry.
- Practical applications include optimal transport, image processing, and network analysis, where geometry-adapted penalties yield closed-form, highly parallel updates.
The Bregman Alternating Direction Method of Multipliers (Bregman ADMM or BADMM) is a generalization of the classical ADMM in which the usual quadratic (Euclidean) penalties and norms in the augmented Lagrangian are replaced by Bregman divergences generated by strictly convex functions. This extension lets the algorithm adapt to the geometry of the problem at hand, covers convex, nonconvex, and multi-block composite formulations, and can provide computational and convergence advantages in structured or large-scale settings.
1. Foundations and Key Concepts
BADMM emerges from the observation that classical ADMM enforces linear constraints via quadratic penalties, which are intimately tied to the choice of the squared Euclidean distance. By replacing the quadratic penalty with a Bregman divergence $B_\phi$ for some strictly convex, differentiable $\phi$, BADMM recovers a broader class of splitting algorithms, comparable to how mirror descent generalizes classical gradient descent (Wang et al., 2013).
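For reference, the divergence generated by a strictly convex, differentiable $\phi$, and the two instances that appear most often below, are
$$B_{\phi}(u, v) = \phi(u) - \phi(v) - \langle \nabla\phi(v),\, u - v\rangle .$$
Taking $\phi(u) = \tfrac{1}{2}\|u\|_2^{2}$ gives $B_{\phi}(u,v) = \tfrac{1}{2}\|u - v\|_2^{2}$, recovering the usual quadratic penalty, while the negative entropy $\phi(u) = \sum_j u_j \log u_j$ gives the (generalized) Kullback–Leibler divergence $B_{\phi}(u,v) = \sum_j u_j \log (u_j / v_j) - u_j + v_j$, which is well suited to simplex-constrained variables.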
The algorithm handles problems of the form
$$\min_{x,\,z}\; f(x) + g(z) \quad \text{s.t.}\quad Ax + Bz = c,$$
where $f$ and $g$ are convex (possibly nonsmooth) functions and the linear constraint may couple high-dimensional or structured variables (including multiple blocks). By appropriate choice of $\phi$, BADMM instances can be adapted to the feasibility set or the desired structure of the variables.
2. Mathematical Formulation and Algorithmic Steps
BADMM Updates
The main iteration of BADMM for the above problem can be written as follows (Wang et al., 2013):
- $x$-update: $x^{k+1} = \arg\min_{x}\; f(x) + \langle y^{k},\, Ax + Bz^{k} - c\rangle + \rho\, B_{\phi}(c - Ax,\, Bz^{k})$
- $z$-update: $z^{k+1} = \arg\min_{z}\; g(z) + \langle y^{k},\, Ax^{k+1} + Bz - c\rangle + \rho\, B_{\phi}(Bz,\, c - Ax^{k+1})$
- Dual update: $y^{k+1} = y^{k} + \rho\,(Ax^{k+1} + Bz^{k+1} - c)$
Here $B_{\phi}(u, v) = \phi(u) - \phi(v) - \langle \nabla\phi(v), u - v\rangle$ is the Bregman divergence, $\rho > 0$ is a penalty parameter, and the placement/order of the Bregman divergence arguments is chosen to ensure convexity in the variable being updated.
A generalized formulation introduces additional Bregman-proximal terms in each subproblem, e.g., for the $x$-step,
$$x^{k+1} = \arg\min_{x}\; f(x) + \langle y^{k},\, Ax + Bz^{k} - c\rangle + \rho\, B_{\phi}(c - Ax,\, Bz^{k}) + \rho_{x}\, B_{\varphi_x}(x, x^{k}),$$
and analogously for the $z$-step with $\rho_{z}\, B_{\varphi_z}(z, z^{k})$, where $B_{\varphi_x}$ and $B_{\varphi_z}$ are additional Bregman divergences that further regularize the updates (Wang et al., 2013).
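To make the two-block iteration above concrete, the following Python sketch runs it on a toy instance; the problem data, the Euclidean generator (which reduces the scheme to classical ADMM), the penalty value, and the generic numerical subproblem solves are illustrative assumptions, since structured choices of $\phi$ would normally give closed-form updates.

```python
import numpy as np
from scipy.optimize import minimize

def bregman(phi, grad_phi, u, v):
    """Bregman divergence B_phi(u, v) = phi(u) - phi(v) - <grad phi(v), u - v>."""
    return phi(u) - phi(v) - grad_phi(v) @ (u - v)

def badmm(f, g, A, B, c, phi, grad_phi, rho=1.0, iters=100):
    """Two-block BADMM sketch for min f(x) + g(z) s.t. A x + B z = c."""
    x = np.zeros(A.shape[1])
    z = np.zeros(B.shape[1])
    y = np.zeros_like(c)
    for _ in range(iters):
        # x-update: Bregman penalty on the residual, convex in x
        x = minimize(lambda u: f(u) + y @ (A @ u + B @ z - c)
                     + rho * bregman(phi, grad_phi, c - A @ u, B @ z),
                     x, method="BFGS").x
        # z-update: divergence arguments swapped so the term is convex in z
        z = minimize(lambda v: g(v) + y @ (A @ x + B @ v - c)
                     + rho * bregman(phi, grad_phi, B @ v, c - A @ x),
                     z, method="BFGS").x
        # dual ascent on the linear constraint
        y = y + rho * (A @ x + B @ z - c)
    return x, z, y

# Toy instance: f(x) = ||x - a||^2, g(z) = ||z||^2, constraint x = z,
# with the Euclidean generator phi(u) = 0.5 ||u||^2 (classical ADMM as a special case).
a = np.array([1.0, -2.0, 3.0])
A, B, c = np.eye(3), -np.eye(3), np.zeros(3)
x, z, _ = badmm(lambda x: np.sum((x - a) ** 2),
                lambda z: np.sum(z ** 2),
                A, B, c,
                phi=lambda u: 0.5 * np.sum(u ** 2),
                grad_phi=lambda u: u)
print(x, z)  # both approach a / 2, the minimizer of the coupled problem
```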
3. Convergence Theory and Rate Guarantees
For convex problems, BADMM exhibits global convergence to a Karush–Kuhn–Tucker (KKT) point under mild conditions. If the Bregman generator $\phi$ is $\alpha$-strongly convex, i.e.,
$$B_{\phi}(u, v) \;\ge\; \tfrac{\alpha}{2}\, \|u - v\|^{2},$$
then the residual (in a composite measure involving subproblem optimality, the Bregman penalty, and constraint violation) is shown to decrease. BADMM achieves an $O(1/T)$ ergodic convergence rate for both the objective gap and the feasibility residuals (Wang et al., 2013).
In nonconvex settings, BADMM is shown to converge to stationary points under structural assumptions such as subanalyticity or the Kurdyka–Łojasiewicz (K–L) property of the objective, strong convexity of one of the Bregman generators, and appropriate regularity of the constraint matrices (Wang et al., 2014, Wang et al., 2015). The convergence proofs construct an auxiliary Lyapunov function combining the augmented Lagrangian with squared differences of successive iterates, and establish that those differences vanish while the Lyapunov function decreases. For multi-block cases (more than two separable functions/blocks), these techniques yield convergence guarantees even absent strong convexity, provided a local error bound holds (Hong et al., 2012) or a K–L inequality is satisfied (Wang et al., 2015).
4. Relations to Classical and Other ADMM Variants
BADMM generalizes and unifies several ADMM-like methods:
- Mirror Descent Connection: With KL divergence or entropic penalties, the update steps coincide with mirror-descent or exponentiated-gradient moves, providing geometry-adapted updates (Wang et al., 2013); a worked KL-penalized step is shown after this list.
- Special Cases: When the Bregman divergence is the squared Euclidean distance, classical ADMM is recovered.
- Generalized/Proximal/Inexact ADMM: These can be formulated as special BADMM instances by appropriate choice of Bregman generator and extra proximal terms.
- Bethe ADMM, Exponential Multiplier Methods: BADMM provides a covering framework for ADMM variants used in graphical models and exponential family settings (Ma et al., 10 Sep 2025).
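To illustrate the mirror-descent connection above, consider a KL-penalized update over the probability simplex $\Delta$ in which the coupling terms of the current iterate are collected into a single linear term $\langle s, x\rangle$ (a simplifying assumption made here to isolate the geometry of the step):
$$x^{k+1} = \arg\min_{x \in \Delta}\; \langle s, x\rangle + \rho\, \mathrm{KL}(x, x^{k}) \quad\Longrightarrow\quad x_j^{k+1} = \frac{x_j^{k}\, e^{-s_j/\rho}}{\sum_{l} x_l^{k}\, e^{-s_l/\rho}},$$
which is exactly an exponentiated-gradient (mirror-descent) step with step size $1/\rho$.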
A rigorous equivalence between Bregman ADMM and Bregman Douglas–Rachford splitting (BDRS) when applied to the dual problem is established, extending the known equivalence between classical ADMM and Douglas–Rachford splitting (Ma et al., 10 Sep 2025).
5. Practical Implementation and Applications
Parallelism and Structure Exploitation
BADMM is particularly effective in large-scale problems or when the geometry of the feasible set is non-Euclidean:
- Mass Transportation/Optimal Transport: By using a KL divergence as the penalty, closed-form multiplicative updates are derived for each entry of the transport plan, leading to massive parallelism and efficient GPU implementations; a minimal code sketch follows this list. BADMM matches or outperforms highly optimized LP solvers (e.g., Gurobi) on large instances, especially in memory usage and scalability (Wang et al., 2013).
- Image and Signal Processing: BADMM and its nonconvex multi-block extensions efficiently handle problems with sparse, low-rank, or structured penalties, including robust principal component analysis, video background subtraction, and compressed sensing (Wang et al., 2014, Wang et al., 2015).
- Combinatorial and Networked Problems: Extensions to nonconvex graph partitioning, community detection, and other nonconvex discrete problems have been developed, leveraging BADMM's flexible update structure (Sun et al., 2018).
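To illustrate the multiplicative updates mentioned in the optimal-transport item, here is a minimal Python sketch; the split of the plan into a row-constrained copy $X$ and a column-constrained copy $Z$, as well as the penalty value, iteration budget, and random cost matrix, are illustrative assumptions rather than the tuned setup of the cited experiments.

```python
import numpy as np

def badmm_ot(C, a, b, rho=0.5, iters=1000):
    """KL-penalized BADMM sketch for min <C, X> s.t. X 1 = a, X^T 1 = b, X >= 0.

    The plan is split as X = Z: the row constraint is handled in the X-update,
    the column constraint in the Z-update, and both updates are closed-form
    multiplicative steps followed by a rescaling.
    """
    X = np.outer(a, b)          # satisfies the row constraint (b sums to 1)
    Z = X.copy()
    Y = np.zeros_like(C)        # dual variable for the coupling X = Z
    for _ in range(iters):
        # X-update: multiplicative step, rows rescaled to match a
        X = Z * np.exp(-(C + Y) / rho)
        X *= (a / X.sum(axis=1))[:, None]
        # Z-update: multiplicative step, columns rescaled to match b
        Z = X * np.exp(Y / rho)
        Z *= (b / Z.sum(axis=0))[None, :]
        # dual ascent on X = Z
        Y = Y + rho * (X - Z)
    return 0.5 * (X + Z)

rng = np.random.default_rng(0)
C = rng.random((5, 5))          # illustrative cost matrix
a = np.full(5, 0.2)             # uniform source marginal
b = np.full(5, 0.2)             # uniform target marginal
plan = badmm_ot(C, a, b)
print("approximate transport cost:", float(np.sum(plan * C)))
```

Every entry of the transport plan is updated independently before the rescaling, which is why these iterations map naturally onto GPUs and distributed hardware.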
Adaptive and Stochastic Variants
BADMM can be adapted for stochastic settings (Zhao et al., 2013), where the Bregman generator is updated adaptively (e.g., using second-order or AdaGrad-type statistics), yielding regret guarantees comparable to the best proximal function in hindsight. Adaptive penalty schemes (e.g., Barzilai–Borwein or residual balancing) can further speed convergence in nonconvex or ill-conditioned scenarios (Xu et al., 2016).
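As a concrete, deliberately simple example of an adaptive penalty scheme, a residual-balancing rule in the style of classical ADMM practice can be grafted onto the dual step; the sketch below shows that heuristic with conventional thresholds, not the Barzilai–Borwein or AdaGrad-type rules of the cited works.

```python
def update_rho(rho, primal_res, dual_res, mu=10.0, tau=2.0):
    """Residual-balancing heuristic for the penalty parameter.

    Increase rho when the primal residual dominates (constraint violation
    is large), decrease it when the dual residual dominates; mu and tau
    are the factors conventionally used with classical ADMM.
    """
    if primal_res > mu * dual_res:
        return rho * tau
    if dual_res > mu * primal_res:
        return rho / tau
    return rho
```

Here primal_res would be the norm of $Ax^{k+1} + Bz^{k+1} - c$ and dual_res a norm of the change induced by the $z$-update (in classical ADMM, $\rho\,\|A^{\top}B(z^{k+1}-z^{k})\|$), called once per iteration.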
6. Nonconvex Multi-Block Bregman ADMM
In multi-block problems of the form
$$\min_{x_1,\dots,x_N}\; \sum_{i=1}^{N} f_i(x_i) \quad \text{s.t.}\quad \sum_{i=1}^{N} A_i x_i = c,$$
BADMM introduces Bregman-divergence regularization in each block update,
$$x_i^{k+1} = \arg\min_{x_i}\; L_\rho\big(x_1^{k+1},\dots,x_{i-1}^{k+1},\, x_i,\, x_{i+1}^{k},\dots,x_N^{k};\, y^{k}\big) + B_{\phi_i}(x_i, x_i^{k}),$$
with a suitably modified augmented Lagrangian $L_\rho$ and per-block Bregman terms (Wang et al., 2015). Under K–L and strong-convexity assumptions, the iterates converge globally to stationary points, enabling BADMM to handle high-dimensional, nonconvex constrained problems with convergence guarantees.
7. Outlook and Implications
BADMM and its modern developments provide a systematic and theoretically grounded approach for large-scale, structured, and nonconvex optimization problems. Key advantages include:
- Flexibility in matching the penalty geometry to the problem's feasible region (e.g., simplex, cone, statistical manifold).
- The ability to obtain closed-form or highly parallelizable updates (essential for GPU and distributed computing).
- Theoretical convergence for classes of nonconvex or multi-block problems under KL or error-bound assumptions.
- Direct unification with other operator splitting and monotone inclusion methods, such as Bregman Douglas–Rachford splitting (Ma et al., 10 Sep 2025).
Future work is focused on extending BADMM to broader classes of nonconvex objectives, adaptive penalty schemes, and further integration with scalable and parallel algorithmic infrastructures for scientific and machine learning problems.