Entropic Sinkhorn Algorithm in Optimal Transport
- The entropic Sinkhorn algorithm is a scalable iterative method that computes entropy-regularized optimal transport plans by alternating matrix scaling to enforce marginal constraints.
- It employs a dual formulation with Gibbs kernels to ensure a unique, strictly positive solution with smooth, well-conditioned derivatives, compatible with automatic differentiation.
- Innovations like warm-starts, sparse Newton steps, and decentralized schemes have accelerated convergence and extended its application to high-dimensional, unbalanced, and constrained settings.
Entropic Sinkhorn Algorithm
The entropic Sinkhorn algorithm is a scalable, iterative matrix scaling method that computes entropy-regularized optimal transport (OT) plans between discrete or continuous measures. It underpins high-dimensional and machine learning applications of OT by providing strongly convex surrogates to the classical Kantorovich problem, enabling efficient convergence, well-conditioned derivatives, and compatibility with modern autodiff. The method has been extended to unbalanced OT, constrained OT, decentralized and online computation, and has deep connections to PDEs, control, and statistical physics.
1. Entropic Optimal Transport: Formulation and Duality
Given discrete marginals $a \in \mathbb{R}^n_{>0}$, $b \in \mathbb{R}^m_{>0}$ (i.e., vectors of strictly positive entries summing to one) and a cost matrix $C \in \mathbb{R}^{n \times m}$, the entropic OT problem with regularization parameter $\varepsilon > 0$ is

$$\min_{P \in U(a,b)} \; \langle C, P \rangle + \varepsilon \sum_{i,j} P_{ij} \left( \log P_{ij} - 1 \right),$$

where $U(a,b) = \{ P \in \mathbb{R}^{n \times m}_{\geq 0} : P \mathbf{1}_m = a, \; P^\top \mathbf{1}_n = b \}$. The entropic regularization makes the objective $\varepsilon$-strongly convex, ensuring a unique, strictly positive solution.
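The objective above can be evaluated directly. The following NumPy sketch (hypothetical small inputs, illustrative naming) checks it on the independence coupling $a b^\top$, which always satisfies the marginal constraints:

```python
import numpy as np

def entropic_ot_objective(P, C, eps):
    """Entropic OT objective <C, P> + eps * sum_ij P_ij (log P_ij - 1)."""
    return float(np.sum(C * P) + eps * np.sum(P * (np.log(P) - 1.0)))

# Hypothetical small example: uniform marginals, random cost matrix.
rng = np.random.default_rng(0)
n, m = 4, 5
a = np.full(n, 1.0 / n)
b = np.full(m, 1.0 / m)
C = rng.random((n, m))

# The independence coupling a b^T is always feasible (correct marginals).
P0 = np.outer(a, b)
val = entropic_ot_objective(P0, C, eps=0.1)
```

The independence coupling is a natural reference point: it is the unique minimizer as $\varepsilon \to \infty$, since the entropy term then dominates the transport cost.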
The dual formulation introduces potentials $f \in \mathbb{R}^n$, $g \in \mathbb{R}^m$ and defines the Gibbs kernel

$$K = e^{-C/\varepsilon}, \qquad K_{ij} = e^{-C_{ij}/\varepsilon}.$$

The dual reads

$$\max_{f,\,g} \; \langle f, a \rangle + \langle g, b \rangle - \varepsilon \sum_{i,j} e^{(f_i + g_j - C_{ij})/\varepsilon}.$$

Optimality yields the scaling form for the OT plan:

$$P^\star = \operatorname{diag}(u)\, K \,\operatorname{diag}(v), \qquad u = e^{f/\varepsilon}, \quad v = e^{g/\varepsilon},$$

where $u, v$ solve the marginal constraints via matrix scaling.
2. Sinkhorn–Knopp Iterations and Convergence
The Sinkhorn algorithm alternates between enforcing the marginals by iterative scaling:

$$u^{(t+1)} = \frac{a}{K v^{(t)}}, \qquad v^{(t+1)} = \frac{b}{K^\top u^{(t+1)}},$$

with elementwise division and initialization $v^{(0)} = \mathbf{1}_m$. Each outer iteration costs $O(nm)$ for dense $K$.
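A minimal dense NumPy sketch of these updates (illustrative naming, stopping on the row-marginal residual; the column marginals are exact after each $v$-update):

```python
import numpy as np

def sinkhorn(a, b, C, eps, n_iter=20000, tol=1e-9):
    """Sinkhorn-Knopp matrix scaling; returns the plan diag(u) K diag(v)."""
    K = np.exp(-C / eps)                     # Gibbs kernel
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)                      # enforce row marginals
        v = b / (K.T @ u)                    # enforce column marginals
        P = u[:, None] * K * v[None, :]
        # Only the row residual needs checking: columns are exact after v.
        if np.abs(P.sum(axis=1) - a).sum() < tol:
            break
    return P

# Hypothetical small problem.
rng = np.random.default_rng(1)
n = 6
a = np.full(n, 1.0 / n)
b = rng.random(n); b /= b.sum()
C = rng.random((n, n))

P = sinkhorn(a, b, C, eps=0.1)
```

For very small $\varepsilon$ the kernel entries underflow; practical implementations switch to log-domain (logsumexp) updates on the potentials $f, g$ instead.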
Under positivity of $K$ and the marginals, the iterates converge linearly in the Hilbert projective metric:

$$d_H\!\left(u^{(t)}, u^\star\right) \le \lambda(K)^{2t} \, d_H\!\left(u^{(0)}, u^\star\right), \qquad \lambda(K) = \frac{\sqrt{\eta(K)} - 1}{\sqrt{\eta(K)} + 1}, \quad \eta(K) = \max_{i,j,k,l} \frac{K_{ik} K_{jl}}{K_{il} K_{jk}},$$

with the contraction factor determined by the conditioning of $K$ (Thornton et al., 2022, Moral et al., 19 Jan 2026). For square problems of size $n$ and fixed $\varepsilon$, the total arithmetic complexity to reach error $\delta$ in the marginal constraints is $O(n^2 \log(1/\delta))$.
For continuous measures, the discrete algorithm generalizes via nonlinear conditional updates (convolutions or integral operators), yielding the same alternating structure (Srinivasan et al., 14 Oct 2025, Karimi et al., 2023).
3. Differentiation, Sensitivity, and Automatic Differentiation
The unique, positive solution mapping $\theta \mapsto (u^\star(\theta), v^\star(\theta))$ is smooth, where $\theta$ parameterizes the data $(a, b, C)$. Derivatives of the mapping can be computed recursively through the chain rule applied to the Sinkhorn iterates $z^{(t+1)} = S_\theta(z^{(t)})$ with $z^{(t)} = (u^{(t)}, v^{(t)})$:

$$\partial_\theta z^{(t+1)} = \partial_z S_\theta\!\left(z^{(t)}\right) \partial_\theta z^{(t)} + \partial_\theta S_\theta\!\left(z^{(t)}\right)$$

(Pauwels et al., 2022). The main theorem shows that derivatives of the iterates converge linearly, and explicit formulas relate unrolled iterations plus autodiff to forward/reverse sensitivity of the solution.
Implicit differentiation at the fixed point recovers the same formula, requiring solution of a rank-deficient linear system. This analysis justifies autodiff/backpropagation through Sinkhorn for end-to-end learning with consistent gradients and bounded errors.
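As an illustrative sketch (not the authors' implementation), the forward-mode recursion can be unrolled by hand for a scalar parameter such as $\varepsilon$ and checked against a finite difference:

```python
import numpy as np

def sinkhorn_cost_and_grad(a, b, C, eps, n_iter=3000):
    """Unrolled forward-mode derivative of the transport cost <C, P> w.r.t. eps."""
    K = np.exp(-C / eps)
    dK = K * C / eps**2                        # d/d(eps) of exp(-C/eps)
    u, v = np.ones_like(a), np.ones_like(b)
    du, dv = np.zeros_like(a), np.zeros_like(b)
    for _ in range(n_iter):
        s = K @ v
        du = -(a / s**2) * (dK @ v + K @ dv)   # chain rule for u = a / (K v)
        u = a / s
        t = K.T @ u
        dv = -(b / t**2) * (dK.T @ u + K.T @ du)
        v = b / t
    P = u[:, None] * K * v[None, :]
    dP = (du[:, None] * K * v[None, :]
          + u[:, None] * dK * v[None, :]
          + u[:, None] * K * dv[None, :])      # product rule on diag(u) K diag(v)
    return float(np.sum(C * P)), float(np.sum(C * dP))

# Hypothetical small problem; compare unrolled gradient to central differences.
rng = np.random.default_rng(2)
n = 5
a = np.full(n, 1.0 / n); b = np.full(n, 1.0 / n)
C = rng.random((n, n))
eps, h = 0.2, 1e-5

_, grad = sinkhorn_cost_and_grad(a, b, C, eps)
fp, _ = sinkhorn_cost_and_grad(a, b, C, eps + h)
fm, _ = sinkhorn_cost_and_grad(a, b, C, eps - h)
fd = (fp - fm) / (2 * h)                       # central finite difference
```

Reverse-mode autodiff through the same loop produces the matching adjoint recursion automatically, which is the sense in which unrolling plus backpropagation recovers the solution sensitivity.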
In unbalanced OT, similar differentiability holds for generalized Sinkhorn (Séjourné et al., 2019, Pham et al., 2020).
4. Theoretical Analysis: Stability, Rate, and Extensions
Exponential Convergence: Under Kolmogorov semiconcavity or log-concavity conditions on the marginals and cost, exponential convergence is ensured, with a contraction rate that scales linearly in $\varepsilon$ (Chiarini et al., 2024). For compactly supported measures and bounded Hessians, quantitative rates follow from Talagrand and Poincaré inequalities. This is sharp for quadratic cost and log-concave marginals.
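The geometric error decay is easy to observe empirically; the following sketch (illustrative parameters) tracks the row-marginal residual across iterations:

```python
import numpy as np

# Hypothetical setup: uniform marginals, random cost.
rng = np.random.default_rng(3)
n = 8
a = np.full(n, 1.0 / n); b = np.full(n, 1.0 / n)
C = rng.random((n, n))
eps = 0.3

K = np.exp(-C / eps)
u, v = np.ones(n), np.ones(n)
errs = []
for _ in range(200):
    u = a / (K @ v)
    v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]
    errs.append(float(np.abs(P.sum(axis=1) - a).sum()))  # row-marginal residual
```

Plotting `errs` on a log scale gives a straight line until floating-point precision is reached, whose slope is the empirical contraction rate.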
Operator-Theoretic and Semigroup Analysis: Sinkhorn is equivalently the evolution of a Markov semigroup on couplings. Contraction in entropy-type divergences, weighted total variation norms, and Wasserstein distance is guaranteed by drift/minorization conditions or log-Sobolev inequalities for the kernel (Moral et al., 19 Jan 2026). Continuous-time limits (mirror-gradient flows) reveal exact entropy production identities and clarify the role of spectral gaps (Srinivasan et al., 14 Oct 2025, Karimi et al., 2023). Exponential entropy decay is equivalent to a log-Sobolev inequality (LSI) for the kernel.
Unbalanced and Constrained OT: In the unbalanced setting, Sinkhorn-type algorithms alternate the usual scaling with proximal updates or KL-penalized marginal mismatches, maintaining linear convergence to a unique solution (Séjourné et al., 2019, Pham et al., 2020). While complexity improves in this setting, the classical per-iteration cost barrier persists for balanced OT.
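One common unbalanced variant replaces the hard marginal constraints by KL penalties of strength $\rho$, which turns the scaling updates into damped updates with exponent $\rho/(\rho+\varepsilon)$ (as in generalized Sinkhorn schemes). A sketch with hypothetical mass-mismatched marginals:

```python
import numpy as np

def unbalanced_sinkhorn(a, b, C, eps, rho, n_iter=2000):
    """Generalized Sinkhorn with KL-penalized marginals (strength rho):
    hard constraints are replaced by rho*KL(P1 || a) + rho*KL(P^T 1 || b),
    and the scaling updates acquire the exponent rho / (rho + eps)."""
    K = np.exp(-C / eps)
    fe = rho / (rho + eps)                   # damping exponent
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):
        u = (a / (K @ v)) ** fe
        v = (b / (K.T @ u)) ** fe
    return u[:, None] * K * v[None, :]

# Hypothetical marginals with unequal total mass (infeasible for balanced OT).
rng = np.random.default_rng(4)
n = 6
a = np.full(n, 1.0 / n)          # total mass 1
b = np.full(n, 2.0 / n)          # total mass 2
C = rng.random((n, n))

P = unbalanced_sinkhorn(a, b, C, eps=0.1, rho=1.0)
mass = float(P.sum())            # total mass interpolates between those of a, b
```

As $\rho \to \infty$ the exponent tends to 1 and the classical balanced updates are recovered.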
For OT under additional (in)equality constraints, augmented block-coordinate Sinkhorn alternations in the dual (over row, column, and constraint multipliers) achieve sublinear or exponentially small error, with Newton steps or entropy scheduling providing acceleration for small $\varepsilon$ (Tang et al., 2024).
5. Algorithmic Innovations and Accelerations
| Innovation | Methodology/Guarantee | Reference |
|---|---|---|
| Warm-starts | Data-dependent initial duals (DualSort, Gaussian, GMM, subsample) reduce iteration count by 30–90% | (Thornton et al., 2022) |
| Neural Initialization | Neural operators (UNOT) predict near-optimal potentials, yielding substantial speedups while maintaining discretization invariance | (Geuter et al., 2022) |
| Sparse Newton | After a standard SK warmup, sparse Newton steps on the Lyapunov potential achieve superlinear convergence at low per-iteration cost, with large speedups in some regimes | (Tang et al., 2024) |
| Spectral Accel | SK–NR targets slow (low-frequency) spectral modes via partial Newton steps for ill-conditioned (small-$\varepsilon$) problems, substantially reducing iteration counts | (Chhaibi et al., 23 May 2025) |
| Online/Streaming | Stochastic or compressed online SK processes data as a stream, with convergence and wall-time advantages, via moment-matching compression | (Wang et al., 2023) |
| Decentralized | Fully distributed SK for barycenters using log-domain geometric averaging, event-triggered, and quantized gossip, with linear scaling in network size | (Baheri et al., 18 Sep 2025) |
Hybrid strategies, including multi-marginal decompositions (Benamou et al., 2017), entropy-scheduling annealing (Tang et al., 2024), or FFT-based convolution acceleration, further expand practical scalability in high-dimensional or structured domains.
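Entropy scheduling combined with warm-started duals can be sketched in the log domain (illustrative schedule and names; stable via logsumexp even for small $\varepsilon$):

```python
import numpy as np

def logsumexp(M, axis):
    """Numerically stable log(sum(exp(M))) along an axis."""
    mx = M.max(axis=axis, keepdims=True)
    return np.squeeze(mx + np.log(np.exp(M - mx).sum(axis=axis, keepdims=True)),
                      axis=axis)

def sinkhorn_annealed(a, b, C, eps_schedule, n_iter=1000):
    """Log-domain Sinkhorn over a decreasing eps schedule, warm-starting
    the dual potentials (f, g) between stages."""
    f, g = np.zeros_like(a), np.zeros_like(b)
    for eps in eps_schedule:
        for _ in range(n_iter):
            # Dual coordinate updates: exact minimization in f, then in g.
            f = eps * np.log(a) - eps * logsumexp((g[None, :] - C) / eps, axis=1)
            g = eps * np.log(b) - eps * logsumexp((f[:, None] - C) / eps, axis=0)
    return np.exp((f[:, None] + g[None, :] - C) / eps_schedule[-1])

# Hypothetical problem: each stage warm-starts the next, colder one.
rng = np.random.default_rng(5)
n = 6
a = np.full(n, 1.0 / n); b = np.full(n, 1.0 / n)
C = rng.random((n, n))

P = sinkhorn_annealed(a, b, C, eps_schedule=[0.5, 0.2, 0.05])
```

The warmer stages are cheap and land the potentials close to the cold-stage optimum, which is where plain Sinkhorn is slowest; this is the basic mechanism behind annealing and warm-start accelerations.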
6. Applications and Extensions
- Machine Learning & Statistics: Sinkhorn is central to OT-based loss functions, generative modeling, clustering, domain adaptation, and robust statistics.
- Generative Modeling: Log-Sobolev constants and spectral gap analysis inform regularizer design for fast inner Sinkhorn solves in latent spaces (Srinivasan et al., 14 Oct 2025).
- Control & MPC: Sinkhorn can be embedded online within receding-horizon MPC controllers for population steering and adaptive assignment, with stability and boundedness (Ito et al., 2023).
- Gaussian Models: For multivariate Gaussians, Sinkhorn reduces to finite-dimensional Kalman–Riccati iterations, with explicit exponential contraction in trace/KL distance (Akyildiz et al., 2024).
- Unbalanced & Generalized Flows: Regularized Sinkhorn generalizes to unbalanced, multi-marginal, and Bregman-projected settings, preserving positivity, definiteness, and geometric contraction (Séjourné et al., 2019, Benamou et al., 2017).
7. Continuous-Time Limit, Mirror Descent, and Theoretical Insights
Sinkhorn admits a rigorous continuous-time limit as a mirror descent (Wasserstein mirror flow) in coupling space (Karimi et al., 2023, Srinivasan et al., 14 Oct 2025). This connects the SK iterations to gradient flows in KL-divergence, Onsager operators, and the high-level PDE literature (Schrödinger bridges, mean-field equations). The dynamics exhibit contractive properties governed by conditional expectation operators, and the entropy decay rate is precisely tied to the operator-theoretic spectral gap or LSI constant. These results unify and extend Hilbert Projective metric, Perron–Frobenius, and coupling-by-reflection analytic frameworks for Sinkhorn convergence (Greco et al., 2023).
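Concretely, in the discrete setting this mirror/Bregman structure is visible in the classical alternating KL-projection description of Sinkhorn (Benamou et al., 2017): starting from the Gibbs kernel, one alternately projects (in KL divergence) onto the two marginal constraint sets:

```latex
\pi^{(0)} = e^{-C/\varepsilon}, \qquad
\pi^{(2t+1)} = \operatorname*{arg\,min}_{\pi \mathbf{1} = a} \mathrm{KL}\!\left(\pi \,\middle\|\, \pi^{(2t)}\right), \qquad
\pi^{(2t+2)} = \operatorname*{arg\,min}_{\pi^\top \mathbf{1} = b} \mathrm{KL}\!\left(\pi \,\middle\|\, \pi^{(2t+1)}\right).
```

Each projection is available in closed form as a row (resp. column) rescaling, which recovers exactly the Sinkhorn updates; the continuous-time limit replaces these discrete projections by a mirror flow.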
Summary: The entropic Sinkhorn algorithm is the foundational scalable method for regularized OT; its derivatives, convergence, and stability are by now well characterized. Recent innovations, spanning initialization, acceleration, parallelization, and extensions, have substantially alleviated prior limitations in runtime, conditioning, and applicability, making Sinkhorn and its variants ubiquitous in contemporary computational mathematics, statistics, and machine learning (Pauwels et al., 2022, Tang et al., 2024, Chhaibi et al., 23 May 2025, Moral et al., 19 Jan 2026).