Iterative Bregman Projection in Optimal Transport

Updated 28 April 2026

Iterative Bregman Projection is a framework that solves entropy-regularized optimal transport by alternating KL-divergence projections onto constraint sets.
It underpins algorithms like Sinkhorn–Knopp, providing linear convergence and adaptability for unbalanced, multimarginal, and constrained scenarios.
The method offers practical advantages such as scalable parallel computation, numerical stability, and robust approximation in data science and imaging.

Iterative Bregman projection is a general framework for solving entropy-regularized optimal transport (OT) and OT-like problems by alternately projecting onto constraint sets in the Kullback–Leibler (KL) Bregman divergence. In the context of optimal transport, this yields scalable, parallelizable algorithms—most notably, the Sinkhorn–Knopp algorithm and its extensions—which admit strong convergence guarantees and generalize to diverse constraint structures, regularizations, and practical settings.

1. Entropic Regularization and the Bregman Projection Paradigm

Entropy-regularized OT minimizes over couplings $\gamma$ the objective $\min_{\gamma \geq 0} \langle C, \gamma \rangle - \varepsilon H(\gamma)$ subject to constraints such as the marginals $\gamma 1 = p, \; \gamma^\top 1 = q$. Here $H(\gamma) = -\sum_{i,j} \gamma_{ij} (\log \gamma_{ij} - 1)$ is the Shannon entropy of the coupling and $\varepsilon > 0$ is a regularization parameter. The entropy term makes the objective strictly convex, so the solution is unique even in degenerate or high-dimensional settings.

Rewriting the problem as a KL projection,

$\min_{\gamma \in \mathcal{C}} \mathrm{KL}(\gamma \| \xi), \qquad \xi = \exp(-C/\varepsilon)$

where $\mathcal{C}$ is the intersection of constraint sets (e.g., prescribed marginals), and $\mathrm{KL}$ denotes the separable KL divergence, situates the entire approach within iterative Bregman projections as formalized by Bregman (1967) (Benamou et al., 2014, Kostic et al., 2021, Benamou et al., 2015). Each constraint set typically admits an explicit closed-form projection with respect to the KL divergence.
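
The equivalence of the two formulations can be checked directly. The short derivation below assumes the normalization $\mathrm{KL}(\gamma \| \xi) = \sum_{i,j} \gamma_{ij} (\log(\gamma_{ij} / \xi_{ij}) - 1)$, which is an assumption made here for illustration; other conventions add terms that do not change the minimizer in the balanced case. Since $\log \xi_{ij} = -C_{ij} / \varepsilon$,

$\varepsilon \, \mathrm{KL}(\gamma \| \xi) = \varepsilon \sum_{i,j} \gamma_{ij} (\log \gamma_{ij} - 1) + \sum_{i,j} C_{ij} \gamma_{ij} = \langle C, \gamma \rangle - \varepsilon H(\gamma)$

so minimizing the regularized cost over $\mathcal{C}$ and projecting $\xi$ onto $\mathcal{C}$ in KL select the same coupling.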

2. Iterative Bregman Projection Algorithms: Sinkhorn–Knopp and Beyond

For balanced OT with two marginals, the original Sinkhorn–Knopp algorithm alternates KL projections onto the affine sets characterized by each marginal: $\mathrm{Proj}_{C_1}^{\mathrm{KL}}(\bar \gamma) = \mathrm{Diag}(p / (\bar \gamma 1))\, \bar\gamma$

$\mathrm{Proj}_{C_2}^{\mathrm{KL}}(\bar \gamma) = \bar\gamma \,\mathrm{Diag}(q / (\bar\gamma^\top 1))$

This motivates the iterative updates: writing $\gamma^{(n)} = \mathrm{Diag}(u^{(n)}) \, \xi \, \mathrm{Diag}(v^{(n)})$, one sets $u^{(n+1)} = p / (\xi v^{(n)})$ with $v^{(n+1)} = q / (\xi^\top u^{(n+1)})$, converging linearly to the unique minimizer (Benamou et al., 2014, Kostic et al., 2021, Peyré, 1 Feb 2026).
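
A minimal NumPy sketch of these alternating projections, written with scaling vectors `u` and `v` so that the coupling is `Diag(u) xi Diag(v)`. The function name `sinkhorn`, the stopping rule, and the toy data are illustrative assumptions, not taken from the cited works.

```python
import numpy as np

def sinkhorn(C, p, q, eps, n_iter=2000, tol=1e-10):
    """Alternate KL projections onto the row and column marginal constraints."""
    xi = np.exp(-C / eps)                  # Gibbs kernel xi = exp(-C / eps)
    u = np.ones_like(p, dtype=float)
    v = np.ones_like(q, dtype=float)
    for _ in range(n_iter):
        u = p / (xi @ v)                   # projection onto {gamma 1 = p}
        v = q / (xi.T @ u)                 # projection onto {gamma^T 1 = q}
        # After the v-update the column marginal is exact; monitor the row marginal.
        if np.abs(u * (xi @ v) - p).sum() < tol:
            break
    return u[:, None] * xi * v[None, :]

# Toy problem (assumed data): squared distance between two 1D grids.
x = np.linspace(0.0, 1.0, 6)
y = np.linspace(0.0, 1.0, 4)
C = (x[:, None] - y[None, :]) ** 2
p = np.full(6, 1.0 / 6.0)
q = np.array([0.1, 0.2, 0.3, 0.4])
gamma = sinkhorn(C, p, q, eps=0.1)
```

Only matrix-vector products with the fixed kernel are needed inside the loop; the full coupling is formed once at the end.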

Extensions to multimarginal settings (e.g., for barycenters, strictly correlated electrons, or regularized Brenier flows) or additional affine/convex constraints (partial/unbalanced OT, capacity limits, zero-patterns) proceed by cyclic or greedy projection onto each constraint set in turn (Benamou et al., 2014, Benamou et al., 2015, Kostic et al., 2021, Corless et al., 2024). When constraints are not all affine, Bregman–Dykstra iterations with auxiliary correction variables guarantee convergence.
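
As an illustration of cycling over more than one kind of constraint, the following sketch computes an entropic barycenter of several histograms by alternating two KL projections: one fixing the column marginal of each coupling to its input histogram, and one forcing all row marginals to agree (their common value is the weighted geometric mean of the current row marginals). It follows the general recipe described for barycenters in this setting, but the function name `ibp_barycenter`, the array layout, and the test data are assumptions made for the example.

```python
import numpy as np

def ibp_barycenter(P, xi, weights, n_iter=1000):
    """P: (K, n) input histograms, xi: (n, n) Gibbs kernel, weights: (K,) summing to 1."""
    K, n = P.shape
    U = np.ones((K, n))                    # row k holds the scaling vector u_k
    b = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        # Projection onto {gamma_k^T 1 = p_k}: v_k = p_k / (xi^T u_k).
        V = P / (U @ xi)
        # Current row marginals of gamma_k = Diag(u_k) xi Diag(v_k) are u_k * (xi v_k).
        XV = V @ xi.T
        # Projection onto {gamma_k 1 = b for all k}: b is their weighted geometric mean.
        b = np.exp((weights[:, None] * np.log(U * XV)).sum(axis=0))
        U = b[None, :] / XV
    return b

# Two mirrored bumps on a 1D grid (assumed data).
n = 40
x = np.linspace(0.0, 1.0, n)
xi = np.exp(-((x[:, None] - x[None, :]) ** 2) / 0.05)
p1 = np.exp(-((x - 0.25) ** 2) / (2 * 0.08 ** 2))
p1 /= p1.sum()
p2 = p1[::-1].copy()                       # same bump, centered at 0.75
b = ibp_barycenter(np.stack([p1, p2]), xi, np.array([0.5, 0.5]))
```

With equal weights and mirrored inputs, the result should be symmetric about the midpoint of the grid, and noticeably blurred because of the entropic term.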

3. Theoretical Guarantees and Convergence Rates

The efficacy of iterative Bregman projection algorithms is underpinned by strong theoretical guarantees:

Global existence, uniqueness, and linear convergence are guaranteed when all constraint sets are affine and the KL potential is strictly convex (Benamou et al., 2014, Kostic et al., 2021, Kostic et al., 2021, Peyré, 1 Feb 2026). Q-linear convergence rates (i.e., exponential decay of error) are provided for Sinkhorn and Greenkhorn (multimarginal) algorithms, under mild regularity assumptions (Kostic et al., 2021).
Sublinear (robust) decay of the dual objective is established when only boundedness and block-structure assumptions are made, and these rates scale only linearly in the inverse regularization $1/\varepsilon$, a crucial advantage for approximating the unregularized OT problem (Peyré, 1 Feb 2026).
Locally optimal overrelaxation (as analyzed in overrelaxed Sinkhorn–Knopp (1711.01851)) further accelerates convergence by judiciously blending current and projected iterates, provably preserving monotonicity in a Lyapunov function.
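
The blending of current and projected iterates can be sketched as a geometric interpolation of the scaling vectors, which is a linear extrapolation in the log domain. The version below uses a fixed relaxation parameter `omega`; this is a simplification and an assumption of the example, since the cited overrelaxed scheme selects the parameter adaptively to keep a Lyapunov function decreasing. With `omega = 1` it reduces to plain Sinkhorn.

```python
import numpy as np

def overrelaxed_sinkhorn(C, p, q, eps, omega=1.2, n_iter=5000, tol=1e-10):
    """Sinkhorn with a fixed overrelaxation parameter omega in [1, 2)."""
    xi = np.exp(-C / eps)
    u = np.ones_like(p, dtype=float)
    v = np.ones_like(q, dtype=float)
    for _ in range(n_iter):
        # Blend the old scaling with the projected one: u^(1-omega) * (p / (xi v))^omega.
        u = u ** (1.0 - omega) * (p / (xi @ v)) ** omega
        v = v ** (1.0 - omega) * (q / (xi.T @ u)) ** omega
        gamma = u[:, None] * xi * v[None, :]
        # Neither marginal is exact after an overrelaxed step, so monitor both.
        err = np.abs(gamma.sum(axis=1) - p).sum() + np.abs(gamma.sum(axis=0) - q).sum()
        if err < tol:
            break
    return gamma

# Toy problem (assumed data).
x = np.linspace(0.0, 1.0, 6)
y = np.linspace(0.0, 1.0, 4)
C = (x[:, None] - y[None, :]) ** 2
p = np.full(6, 1.0 / 6.0)
q = np.array([0.1, 0.2, 0.3, 0.4])
gamma_or = overrelaxed_sinkhorn(C, p, q, eps=0.1)
```

A fixed `omega` above 1 is not guaranteed to converge from every starting point, which is the motivation for the adaptive choice analyzed in the cited work.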

Convergence persists under inexact iterative projections (as in inexact Bregman PPA (Yang et al., 2021, Chen et al., 2024)), with $O(1/k)$ or even $O(1/k^2)$ rates for accelerated variants under appropriate "quadrangle-scaling" conditions.

4. Advanced Variants: Unbalanced, Constrained, and Accelerated Schemes

Iterative Bregman projection underpins a wide array of generalizations:

Unbalanced OT: Instead of hard marginal constraints, KL-penalties on marginal deviations are added, yielding Sinkhorn-like updates with exponents $\lambda / (\lambda + \varepsilon)$, where $\lambda > 0$ denotes the weight of the marginal penalty, ensuring flexibility for empirical histograms or unnormalized data (Chen et al., 2024). Inexact Bregman proximal point methods employ inner scaling iterations for the prox-subproblem, further enhancing computational stability.
Hard constraints: Zero-pattern constraints or capacity limits are incorporated through elementwise zeroing or thresholding in the projections, without breaking convexity or convergence (Corless et al., 2024, Benamou et al., 2014).
Acceleration: Inertial or overrelaxed projections, such as those following Nesterov estimate sequences or overrelaxed projections in dual/log-domain, significantly reduce the iteration count needed to reach high accuracy, while preserving convergence guarantees (1711.01851, Yang et al., 2021, Chen et al., 2024).
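
For the unbalanced case, a minimal sketch of the damped scaling updates, assuming KL penalties of equal weight `lam` on both marginals so that each Sinkhorn ratio is raised to the power `lam / (lam + eps)`. The names `unbalanced_sinkhorn` and `lam` and the toy data are assumptions of the example.

```python
import numpy as np

def unbalanced_sinkhorn(C, p, q, eps, lam, n_iter=2000):
    """Scaling iterations with soft (KL-penalized) marginal constraints."""
    xi = np.exp(-C / eps)
    rho = lam / (lam + eps)                # exponent < 1 damps each marginal correction
    u = np.ones_like(p, dtype=float)
    v = np.ones_like(q, dtype=float)
    for _ in range(n_iter):
        u = (p / (xi @ v)) ** rho
        v = (q / (xi.T @ u)) ** rho
    return u[:, None] * xi * v[None, :]

# Toy problem (assumed data): the two histograms carry different total mass.
x = np.linspace(0.0, 1.0, 6)
y = np.linspace(0.0, 1.0, 4)
C = (x[:, None] - y[None, :]) ** 2
p = np.full(6, 1.0 / 6.0)                  # total mass 1
q = 2.0 * np.array([0.1, 0.2, 0.3, 0.4])   # total mass 2
gamma_unb = unbalanced_sinkhorn(C, p, q, eps=0.1, lam=1.0)
```

As `lam` grows the exponent tends to 1 and the iteration approaches balanced Sinkhorn, which is only meaningful when the two input masses agree.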

A broad family of OT-related convex programs, including those with general strictly convex separable regularizers, multimarginal potentials (e.g., Coulomb cost), or compositional energies, is thus efficiently solved within iterative Bregman projection frameworks (Takatsu, 2021, Benamou et al., 2015, Benamou et al., 2014).
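
For the zero-pattern case among the hard constraints listed above, one simple realization is to zero the forbidden entries of the kernel before running the usual scaling iterations, since scaling rows and columns never turns a zero entry into a nonzero one. The sketch below assumes that a feasible coupling supported on the allowed entries exists; the function name `masked_sinkhorn` and the "no self-transport" pattern are illustrative assumptions.

```python
import numpy as np

def masked_sinkhorn(C, p, q, eps, allowed, n_iter=2000):
    """Sinkhorn on a kernel whose forbidden entries are set to zero."""
    xi = np.where(allowed, np.exp(-C / eps), 0.0)
    u = np.ones_like(p, dtype=float)
    v = np.ones_like(q, dtype=float)
    for _ in range(n_iter):
        u = p / (xi @ v)
        v = q / (xi.T @ u)
    return u[:, None] * xi * v[None, :]

# Toy problem (assumed data): mass may not stay in place (zero diagonal).
n = 5
idx = np.arange(n)
C = np.abs(idx[:, None] - idx[None, :]).astype(float)
allowed = ~np.eye(n, dtype=bool)
p = np.full(n, 1.0 / n)
q = np.full(n, 1.0 / n)
gamma_masked = masked_sinkhorn(C, p, q, eps=1.0, allowed=allowed)
```

If the pattern admits no feasible coupling for the given marginals, the iteration cannot satisfy both constraints and the scalings may diverge, so feasibility should be checked separately.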

5. Practical Implementation Details and Complexity

Implementation of iterative Bregman projection exhibits several distinctive features:

Per-iteration complexity for balanced OT scales as $O(n^2)$ for marginals with $n$ bins (two dense matrix-vector products), but extensions to grid-structured costs or sparse kernels can leverage $O(n \log n)$ or near-linear computations via FFT or convolution (Benamou et al., 2014).
Numerical stability is maintained by working in log-domain (for scaling vectors $u, v$), thresholding small entries, and warm-starting successive runs (e.g., over a sequence of decreasing regularizations) (Benamou et al., 2014, 1711.01851).
Batch and greedy projections: Selective update strategies (as in Batch Greenkhorn) maintain KL-error contractivity while reducing the per-iteration computational burden, especially for multimarginal problems (Kostic et al., 2021).
Highly parallelized computation: Sinkhorn and its generalizations are naturally parallel, well-suited for GPU and distributed environments.
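
The log-domain stabilization mentioned in the list above can be sketched by iterating on dual potentials `f = eps * log(u)` and `g = eps * log(v)` and replacing each matrix-vector product with a log-sum-exp reduction. The helper `logsumexp`, the function name `sinkhorn_log`, and the toy data are assumptions of the example; strictly positive marginals are assumed so that their logarithms are finite.

```python
import numpy as np

def logsumexp(a, axis):
    """Stable log(sum(exp(a))) along one axis (subtracts the maximum first)."""
    a_max = a.max(axis=axis, keepdims=True)
    return (a_max + np.log(np.exp(a - a_max).sum(axis=axis, keepdims=True))).squeeze(axis)

def sinkhorn_log(C, p, q, eps, n_iter=2000):
    """Sinkhorn iterations on potentials f, g; the kernel exp(-C/eps) is never formed."""
    f = np.zeros_like(p, dtype=float)
    g = np.zeros_like(q, dtype=float)
    log_p, log_q = np.log(p), np.log(q)
    for _ in range(n_iter):
        # Log of u = p / (xi v):  f_i = eps * (log p_i - log sum_j exp((g_j - C_ij) / eps))
        f = eps * (log_p - logsumexp((g[None, :] - C) / eps, axis=1))
        # Log of v = q / (xi^T u)
        g = eps * (log_q - logsumexp((f[:, None] - C) / eps, axis=0))
    return np.exp((f[:, None] + g[None, :] - C) / eps)

# Toy problem (assumed data): two distant point clouds, so exp(-C/eps) underflows to 0.
x = np.linspace(0.0, 1.0, 6)
y = 30.0 + np.linspace(0.0, 1.0, 4)
C = (x[:, None] - y[None, :]) ** 2
p = np.full(6, 1.0 / 6.0)
q = np.array([0.1, 0.2, 0.3, 0.4])
gamma_log = sinkhorn_log(C, p, q, eps=1.0)
```

In this example every entry of the plain kernel underflows to zero in double precision, so the scaling-vector form would divide by zero, while the potentials absorb the large cost offset.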

The entropy regularization parameter $\varepsilon$ is the critical tuning knob: lower values yield less blurring but require more iterations and careful numerical safeguards due to numerical rank-deficiency in the kernel matrix $\xi = \exp(-C/\varepsilon)$.
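
The trade-off can be observed empirically by counting the iterations needed to reach a fixed marginal tolerance for several regularization strengths. The helper `sinkhorn_iterations`, the tolerance, and the toy data below are assumptions of the example, and the resulting counts depend on the problem instance.

```python
import numpy as np

def sinkhorn_iterations(C, p, q, eps, tol=1e-8, max_iter=100000):
    """Return the number of Sinkhorn iterations needed to reach the tolerance."""
    xi = np.exp(-C / eps)
    v = np.ones_like(q, dtype=float)
    for k in range(1, max_iter + 1):
        u = p / (xi @ v)
        v = q / (xi.T @ u)
        # Column marginal is exact after the v-update; measure the row marginal error.
        if np.abs(u * (xi @ v) - p).sum() < tol:
            return k
    return max_iter

# Toy problem (assumed data): uniform source, linearly increasing target on a 1D grid.
x = np.linspace(0.0, 1.0, 20)
C = (x[:, None] - x[None, :]) ** 2
p = np.full(20, 1.0 / 20.0)
q = np.linspace(1.0, 2.0, 20)
q /= q.sum()
counts = {eps: sinkhorn_iterations(C, p, q, eps) for eps in (1.0, 0.1, 0.02)}
```

On this instance the count is expected to grow as `eps` shrinks, which is one reason warm starts over a decreasing sequence of regularizations are commonly used.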

6. Applications and Impact in Optimal Transport

Iterative Bregman projection has become the canonical computational tool for OT and its generalizations. It enables:

Fast computation of regularized Wasserstein distances, barycenters, and flows on large-scale discrete and continuous data.
Robust and stable approximation of the OT linear program in high-dimensional or empirically degenerate settings, crucial for modern data science and machine learning pipelines.
Extensions to domains with unbalanced, partial, capacity-constrained, multimarginal, or martingale constraints in computational finance, quantum chemistry, imaging, and distributional learning.

The method has thus unified and generalized classic matrix-balancing techniques (Sinkhorn), provided blueprints for new "block-structured" problems (Peyré, 1 Feb 2026), and established a foundation for numerically robust, convergent algorithms across OT-type convex optimization (Benamou et al., 2014, Kostic et al., 2021, Kostic et al., 2021).

7. Martingale Optimal Transport and Bregman Projections

The application to Martingale Optimal Transport (MOT) requires augmenting the standard marginal constraints with linear martingale constraints such as $\sum_j \gamma_{ij} (y_j - x_i) = 0$ for every $i$, where $x_i$ and $y_j$ denote the support points of the two marginals. Foundational work (Guo et al., 2017) establishes convergence of discretized LP relaxations but only outlines the entropic-regularized Bregman projection approach. Details, including three-block projection cycles (onto row constraints, column constraints, and martingale constraints), dual formulations involving concave envelope computations, and complexity analysis, are found in the regularized OT literature (notably Benamou et al., 2014) and remain a subject of active research.

Key References:

"Iterative Bregman Projections for Regularized Transportation Problems" (Benamou et al., 2014)
"Robust Sublinear Convergence Rates for Iterative Bregman Projections" (Peyré, 1 Feb 2026)
"Bregman Proximal Point Algorithm Revisited: A New Inexact Version and its Inertial Variant" (Yang et al., 2021)
"An inexact Bregman proximal point method and its acceleration version for unbalanced optimal transport" (Chen et al., 2024)
"Overrelaxed Sinkhorn-Knopp Algorithm for Regularized Optimal Transport" (1711.01851)
"Convergence of Batch Greenkhorn for Regularized Multimarginal Optimal Transport" (Kostic et al., 2021)
"Computational Methods for Martingale Optimal Transport problems" (Guo et al., 2017)