Sinkhorn Iterations in Optimal Transport

Updated 18 May 2026

Sinkhorn iterations are alternating scaling updates that compute entropy-regularized optimal transport couplings with prescribed marginals.
They can be interpreted as mirror descent or block coordinate methods, converge geometrically, and admit numerically stable log-domain implementations.
The algorithm is widely applied in machine learning, inverse problems, and computational optimal transport, and accelerated variants improve its performance.

Sinkhorn iterations are a class of alternating matrix (or operator) scaling algorithms that compute entropy-regularized couplings in optimal transport (OT), as well as projections onto polytopes of matrices or tensors with prescribed marginals. This iterative proportional fitting scheme solves entropy-regularized OT efficiently, is fundamental in computational and statistical optimal transport, and has deep connections to mirror descent, stochastic processes, matrix analysis, PDEs, and scalable regularized linear algebra.

1. Mathematical Formulation and Canonical Algorithm

Given two discrete probability vectors $\mu \in \Delta_n$, $\nu \in \Delta_m$, and a cost matrix $C \in \mathbb{R}^{n \times m}$, the entropy-regularized OT problem is

$\min_{\pi \in \Pi(\mu, \nu)} \sum_{i,j} C_{ij}\,\pi_{ij} + \varepsilon \sum_{i,j} \pi_{ij} (\log \pi_{ij} - 1),$

where $\Pi(\mu, \nu) = \{\pi \geq 0 : \pi \mathbf{1} = \mu,\, \pi^T \mathbf{1} = \nu\}$ is the polytope of couplings (Karimi et al., 2023, Karlsson et al., 2016, Genevay et al., 2017).
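The structure of the minimizer can be read off from the first-order optimality conditions. The short derivation below is a sketch: the multipliers $f \in \mathbb{R}^n$ and $g \in \mathbb{R}^m$ are notation introduced here for the two marginal constraints, and the positivity constraint is ignored because the entropy term keeps the minimizer strictly positive.

```latex
\begin{align*}
\mathcal{L}(\pi, f, g)
  &= \sum_{i,j} C_{ij}\,\pi_{ij}
   + \varepsilon \sum_{i,j} \pi_{ij}\,(\log \pi_{ij} - 1)
   - \sum_i f_i \Big( \sum_j \pi_{ij} - \mu_i \Big)
   - \sum_j g_j \Big( \sum_i \pi_{ij} - \nu_j \Big), \\
\frac{\partial \mathcal{L}}{\partial \pi_{ij}}
  &= C_{ij} + \varepsilon \log \pi_{ij} - f_i - g_j = 0
  \quad\Longrightarrow\quad
  \pi_{ij} = e^{f_i/\varepsilon}\; e^{-C_{ij}/\varepsilon}\; e^{g_j/\varepsilon}.
\end{align*}
```

Each entry of the minimizer is therefore a row factor times a cost-dependent term times a column factor.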

The scaling fixed-point characterization follows: define the Gibbs kernel $K = \exp(-C/\varepsilon)$ elementwise. The optimal plan is $\pi^* = \operatorname{diag}(u)\,K\,\operatorname{diag}(v)$ for some vectors $u > 0$, $v > 0$ such that $u$ and $v$ solve

$u \odot (K v) = \mu, \qquad v \odot (K^T u) = \nu,$

where $\odot$ denotes the elementwise product.

The Sinkhorn iterations then alternate these updates:

$u^{(k+1)} = \mu / (K v^{(k)}), \qquad v^{(k+1)} = \nu / (K^T u^{(k+1)}),$

where division is elementwise (Karimi et al., 2023, Genevay et al., 2017).
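The alternating scaling updates can be sketched in a few lines. This is a minimal illustration rather than a reference implementation: the function name `sinkhorn`, the fixed iteration count, and the toy inputs are assumptions made here, and plain Python lists are used only to keep the sketch dependency-free (an array library would be the usual choice in practice).

```python
import math

def sinkhorn(mu, nu, C, eps, n_iter=200):
    """Alternate u <- mu / (K v) and v <- nu / (K^T u); return (u, v, plan)."""
    n, m = len(mu), len(nu)
    # Gibbs kernel K = exp(-C / eps), computed elementwise.
    K = [[math.exp(-C[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Row scaling: enforce the row marginal mu.
        u = [mu[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Column scaling: enforce the column marginal nu.
        v = [nu[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Coupling diag(u) K diag(v).
    plan = [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
    return u, v, plan

# Toy problem (values chosen only for illustration).
mu = [0.5, 0.5]
nu = [0.25, 0.75]
C = [[0.0, 1.0], [1.0, 0.0]]
u, v, plan = sinkhorn(mu, nu, C, eps=0.5)
row_sums = [sum(row) for row in plan]
col_sums = [sum(plan[i][j] for i in range(len(mu))) for j in range(len(nu))]
print(row_sums, col_sums)
```

Because the column update is applied last, the column sums match $\nu$ up to rounding after every sweep, while the row sums approach $\mu$ as the iteration converges.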

2. Mirror Descent and Optimization-Theoretic Perspective

Sinkhorn iterations arise as alternating Bregman projections or mirror descent in the Kullback–Leibler geometry (Karimi et al., 2023, Chizat et al., 2023). The underlying objective is convex over couplings, and the classical iteration may be interpreted as block coordinate mirror descent with a fixed step size, alternating projections onto the row- and column-marginal sets. In terms of the dual potentials $f = \varepsilon \log u$ and $g = \varepsilon \log v$ (scaled logarithms of $u$ and $v$), the classical discrete-time updates take the explicit form

$f^{(k+1)} = \varepsilon \log \mu - \varepsilon \log\big(K e^{g^{(k)}/\varepsilon}\big), \qquad g^{(k+1)} = \varepsilon \log \nu - \varepsilon \log\big(K^T e^{f^{(k+1)}/\varepsilon}\big),$

with logarithms and exponentials taken elementwise. This primal-dual structure makes the iteration amenable to convex analysis and stochastic approximation, and allows for flexible discretizations and generalizations (Karimi et al., 2023).
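The projection viewpoint can be made concrete by iterating on the coupling itself instead of on scaling vectors. In the sketch below, each Kullback–Leibler projection onto a marginal constraint set reduces to rescaling rows or columns; the function names and the toy data are hypothetical, chosen only for illustration.

```python
import math

def kl_project_rows(pi, mu):
    """KL projection onto {pi : pi 1 = mu}: rescale each row to sum to mu[i]."""
    out = []
    for row, target in zip(pi, mu):
        s = sum(row)
        out.append([x * target / s for x in row])
    return out

def kl_project_cols(pi, nu):
    """KL projection onto {pi : pi^T 1 = nu}: rescale each column to sum to nu[j]."""
    n, m = len(pi), len(pi[0])
    col = [sum(pi[i][j] for i in range(n)) for j in range(m)]
    return [[pi[i][j] * nu[j] / col[j] for j in range(m)] for i in range(n)]

def alternating_kl_projections(mu, nu, C, eps, n_iter=200):
    # Start from the Gibbs kernel and alternate the two projections.
    pi = [[math.exp(-c / eps) for c in row] for row in C]
    for _ in range(n_iter):
        pi = kl_project_rows(pi, mu)
        pi = kl_project_cols(pi, nu)
    return pi

mu = [0.5, 0.5]
nu = [0.25, 0.75]
C = [[0.0, 1.0], [1.0, 0.0]]
eps = 0.5
pi = alternating_kl_projections(mu, nu, C, eps)
# Row and column rescalings leave 2x2 cross-ratios unchanged, so the iterate
# keeps the cross-ratio of the Gibbs kernel it started from.
cross_ratio = pi[0][0] * pi[1][1] / (pi[0][1] * pi[1][0])
kernel_ratio = math.exp((C[0][1] + C[1][0] - C[0][0] - C[1][1]) / eps)
```

The preserved cross-ratio is one way to see that every iterate stays of the form $\operatorname{diag}(u)\,K\,\operatorname{diag}(v)$, so this coupling-space iteration and the scaling-vector iteration describe the same sequence of plans.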

3. Continuous-Time Sinkhorn Flow, PDE Limit, and Links to Wasserstein Dynamics

As the step size vanishes, time-rescaled Sinkhorn iterations converge to a continuous-time flow, the “Sinkhorn flow”, which governs the evolution of the coupling. An equivalent dual description evolves the Schrödinger potentials, from which the coupling is reconstructed at each time (Karimi et al., 2023, Deb et al., 2023).

This continuous-time limit has been rigorously analyzed as a Wasserstein mirror gradient flow, interpolating between relative entropy (KL) and quadratic cost via its mirror functional. The dynamics can be recast as a parabolic Monge–Ampère PDE in a convex potential, again highlighting the link between the discrete iteration and transport PDEs (Deb et al., 2023).

4. Convergence, Complexity, and Phase Transition Phenomena

In the classical square matrix scaling (“Sinkhorn–Knopp”) context, alternating normalization converges linearly in the Hilbert projective metric, and for sufficiently dense matrices a modest number of iterations suffices to reach a prescribed error tolerance (He, 13 Jul 2025, Genevay et al., 2017). However, for matrices whose density falls below a critical threshold, the required iteration count can degrade markedly, revealing a sharp phase transition (He, 13 Jul 2025).
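For the square matrix scaling setting, the alternating normalization and its iteration count can be observed on a small example. The sketch below is illustrative only: the function name `sinkhorn_knopp`, the stopping rule (maximum row-sum deviation after a column normalization), and the test matrix are assumptions made here, and a single dense positive matrix says nothing about the sparse or low-density regime.

```python
def sinkhorn_knopp(A, tol=1e-9, max_iter=10_000):
    """Alternately normalize rows and columns of a positive square matrix.

    Returns the scaled matrix and the number of sweeps performed.
    """
    n = len(A)
    A = [row[:] for row in A]  # work on a copy
    for sweep in range(1, max_iter + 1):
        # Row normalization: every row sums to 1.
        A = [[x / sum(row) for x in row] for row in A]
        # Column normalization: every column sums to 1.
        col = [sum(A[i][j] for i in range(n)) for j in range(n)]
        A = [[A[i][j] / col[j] for j in range(n)] for i in range(n)]
        # Columns are exact now, so measure the remaining row error.
        err = max(abs(sum(row) - 1.0) for row in A)
        if err < tol:
            return A, sweep
    return A, max_iter

A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 10.0]]
S, sweeps = sinkhorn_knopp(A)
print(sweeps)
```

Printing `sweeps` for matrices of different sizes and sparsity patterns is a simple way to explore empirically how the iteration count depends on the input.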

For entropy-regularized OT, non-asymptotic rates show geometric convergence of the potentials, and of the iterates in Wasserstein distance and relative entropy. These exponential rates extend to non-compact and weakly convex settings under log-concavity assumptions, and carry over to the log-domain implementations used for numerical stability (Conforti et al., 2023, Greco et al., 2023).

The cost per iteration is dominated by two matrix–vector multiplies, $O(nm)$ for dense kernels, but can be reduced to roughly linear in $n + m$ using structured kernels and positive feature approximations (Scetbon et al., 2020, Liao et al., 2022). Damped and stochastic variants further improve robustness and numerical behavior in practice (Chizat et al., 2023, Karimi et al., 2023).
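The saving from a factored kernel comes from never forming the $n \times m$ matrix. The sketch below assumes the kernel is given as $K = \Phi \Psi^T$ with entrywise positive feature matrices $\Phi \in \mathbb{R}^{n \times r}$ and $\Psi \in \mathbb{R}^{m \times r}$; the feature values are made up for illustration, and how such features are constructed (as in the cited works) is not shown.

```python
def matvec_factored(Phi, Psi, v):
    """Compute (Phi Psi^T) v in O(r (n + m)) without forming the kernel."""
    r = len(Phi[0])
    # t = Psi^T v is a length-r vector.
    t = [sum(Psi[j][k] * v[j] for j in range(len(Psi))) for k in range(r)]
    # Phi t is a length-n vector.
    return [sum(row[k] * t[k] for k in range(r)) for row in Phi]

def sinkhorn_factored(mu, nu, Phi, Psi, n_iter=500):
    """Sinkhorn scaling updates using only factored matrix-vector products."""
    n, m = len(mu), len(nu)
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        Kv = matvec_factored(Phi, Psi, v)    # K v
        u = [mu[i] / Kv[i] for i in range(n)]
        Ktu = matvec_factored(Psi, Phi, u)   # K^T u = Psi (Phi^T u)
        v = [nu[j] / Ktu[j] for j in range(m)]
    return u, v

# Hypothetical positive features with r = 2.
Phi = [[1.0, 0.2], [0.3, 1.0], [0.5, 0.5]]
Psi = [[1.0, 0.1], [0.2, 1.0]]
mu = [0.2, 0.3, 0.5]
nu = [0.4, 0.6]
u, v = sinkhorn_factored(mu, nu, Phi, Psi)

# Dense reconstruction, used here only to check the marginals.
K = [[sum(Phi[i][k] * Psi[j][k] for k in range(2)) for j in range(2)] for i in range(3)]
plan = [[u[i] * K[i][j] * v[j] for j in range(2)] for i in range(3)]
```

Each sweep costs on the order of $r(n + m)$ operations, so the benefit depends on the feature rank $r$ being small relative to $n$ and $m$.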

5. Generalizations and Connections to Broader Mathematical Structures

Sinkhorn-type iterations generalize to a wide class of scaling and constraint imposition problems:

Generalized Sinkhorn: Extensions compute the proximal operator of regularized transport and appear in proximal splitting for inverse problems (Karlsson et al., 2016).
Operator Scaling and Sinkhorn: Non-commutative (operator) versions, e.g., “Operator Sinkhorn Iteration,” enable scaling for completely positive maps, with applications to quantum information and combinatorial optimization (Eisenmann et al., 13 Mar 2026, Franks et al., 2022).
Continuous, Torus, and Gaussian settings: The algorithm adapts to $\mathbb{R}^d$ with unbounded cost, the torus $\mathbb{T}^d$ (with spectral and HJB techniques), and closed-form Riccati flows in Gaussian models (Akyildiz et al., 2024, Conforti et al., 2023, Greco et al., 2023).

Table: Key Sinkhorn Variants and Domains

| Domain/Problem | Features | Source |
|--------------------------|-----------------------------------|-----------------|
| Discrete OT (classical) | Alternating scaling, matrix form | (Genevay et al., 2017, Karimi et al., 2023) |
| Operator scaling | Hilbert-metric geodesics | (Eisenmann et al., 13 Mar 2026, Franks et al., 2022) |
| Continuous/Quadratic OT | PDE, parabolic Monge–Ampère | (Conforti et al., 2023, Deb et al., 2023) |
| Gaussian models | Riccati/Kalman recursions | (Akyildiz et al., 2024) |

6. Algorithms, Stabilization, and Accelerated Schemes

The canonical (classical) implementation is numerically stable for moderate regularization but may require stabilization in the small-$\varepsilon$ regime due to floating-point underflow and overflow. Log-domain Sinkhorn and periodic centering of dual potentials are standard remedies (Karlsson et al., 2016, Genevay et al., 2017, Wu et al., 6 Feb 2025). Accelerated and hybrid algorithms blend Sinkhorn with sparse Newton steps, yielding super-exponential convergence when approaching the OT vertex (Tang et al., 2024, Wu et al., 6 Feb 2025). Further variants, such as damped Sinkhorn, inexact Sinkhorn, and projected-gradient generalizations, enhance stability and extend applicability, e.g., to vector quantile regression (Karimi et al., 2023, Kato et al., 23 Mar 2026).
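The log-domain remedy can be sketched by iterating on dual potentials $f$ and $g$, related to the scalings by $u = e^{f/\varepsilon}$ and $v = e^{g/\varepsilon}$, and replacing each kernel product with a log-sum-exp. This is a minimal sketch under stated assumptions: the helper names, the iteration count, and the toy problem are hypothetical, and periodic centering of the potentials is not included.

```python
import math

def logsumexp(xs):
    """Stable log(sum(exp(x))) obtained by factoring out the maximum."""
    top = max(xs)
    return top + math.log(sum(math.exp(x - top) for x in xs))

def sinkhorn_log(mu, nu, C, eps, n_iter=3000):
    """Sinkhorn updates on dual potentials; the kernel is never exponentiated."""
    n, m = len(mu), len(nu)
    f, g = [0.0] * n, [0.0] * m
    for _ in range(n_iter):
        # f_i <- eps * (log mu_i - log sum_j exp((g_j - C_ij) / eps))
        f = [eps * (math.log(mu[i])
                    - logsumexp([(g[j] - C[i][j]) / eps for j in range(m)]))
             for i in range(n)]
        # g_j <- eps * (log nu_j - log sum_i exp((f_i - C_ij) / eps))
        g = [eps * (math.log(nu[j])
                    - logsumexp([(f[i] - C[i][j]) / eps for i in range(n)]))
             for j in range(m)]
    # Plan entries exp((f_i + g_j - C_ij) / eps); tiny entries underflow to 0.
    plan = [[math.exp((f[i] + g[j] - C[i][j]) / eps) for j in range(m)]
            for i in range(n)]
    return f, g, plan

mu = [0.5, 0.5]
nu = [0.25, 0.75]
C = [[0.0, 1.0], [1.0, 0.0]]
eps = 1e-3
# In double precision the off-diagonal Gibbs kernel entries underflow to zero,
# so the multiplicative form cannot represent this problem.
naive_entry = math.exp(-1.0 / eps)
f, g, plan = sinkhorn_log(mu, nu, C, eps)
row_sums = [sum(row) for row in plan]
```

For this toy problem the number of sweeps needed grows roughly like $1/\varepsilon$, which is why the default iteration count is larger than one would use at moderate regularization.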

7. Applications and Extensions

Sinkhorn iterations underpin scalable computation in numerous domains:

Machine Learning: As differentiable “Sinkhorn layers” within deep architectures for generative models, with full support for back-propagation and GPU parallelism (Genevay et al., 2017, Scetbon et al., 2020).
Inverse Problems: As a black-box proximal operator in splitting schemes for inverse tomography and image reconstruction (Karlsson et al., 2016).
Adversarial Robustness: Projection onto Wasserstein balls is efficiently solved using projected Sinkhorn variants (Wong et al., 2019).
Stochastic Control and Schrödinger Bridges: Iterated Sinkhorn yields discrete-time approximations to Schrödinger bridges, with finite-sample error bounds combining statistical and algorithmic error (Maeda et al., 26 Oct 2025).
Barycenters and Vector Quantile Regression: Integration with barycentric OT and high-dimensional regression, with rigorous convergence under non-classical constraints (Chizat et al., 2023, Kato et al., 23 Mar 2026).

A plausible implication is that the versatility and scalability of Sinkhorn-type iterations make them a central primitive for regularized transport in modern statistics, generative modeling, theoretical computer science, and applied mathematics.

References

(Karimi et al., 2023) Sinkhorn Flow: A Continuous-Time Framework for Understanding and Generalizing the Sinkhorn Algorithm
(He, 13 Jul 2025) Phase transition of the Sinkhorn-Knopp algorithm
(Genevay et al., 2017) Learning Generative Models with Sinkhorn Divergences
(Karlsson et al., 2016) Generalized Sinkhorn iterations for regularizing inverse problems using optimal mass transport
(Conforti et al., 2023) Quantitative contraction rates for Sinkhorn algorithm: beyond bounded costs and compact marginals
(Chizat et al., 2023) Computational Guarantees for Doubly Entropic Wasserstein Barycenters via Damped Sinkhorn Iterations
(Deb et al., 2023) Wasserstein Mirror Gradient Flow as the limit of the Sinkhorn Algorithm
(Akyildiz et al., 2024) Gaussian entropic optimal transport: Schrödinger bridges and the Sinkhorn algorithm
(Tang et al., 2024) Accelerating Sinkhorn Algorithm with Sparse Newton Iterations
(Scetbon et al., 2020) Linear Time Sinkhorn Divergences using Positive Features
(Wu et al., 6 Feb 2025) PINS: Proximal Iterations with Sparse Newton and Sinkhorn for Optimal Transport
(Kato et al., 23 Mar 2026) Sinkhorn algorithms for entropic vector quantile regression
(Liao et al., 2022) Fast Sinkhorn II: Collinear Triangular Matrix and Linear Time Accurate Computation of Optimal Transport
(Eisenmann et al., 13 Mar 2026) Numerically stable variants of overrelaxation for operator Sinkhorn iteration
(Franks et al., 2022) Shrunk subspaces via operator Sinkhorn iteration
(Wong et al., 2019) Wasserstein Adversarial Examples via Projected Sinkhorn Iterations
(Nathanson, 2019) Matrix scaling limits in finitely many iterations