Sinkhorn Gradient Descent

Updated 19 February 2026

SinkGD is a gradient-based optimization technique that generalizes the classical Sinkhorn algorithm for entropic regularized optimal transport problems using mirror descent.
It bridges discrete Sinkhorn iterations with continuous gradient flows, achieving sublinear to exponential convergence under specific smoothness and log-Sobolev conditions.
Variants of SinkGD provide robust methodologies for applications such as generative modeling, barycenter computation, and PDE-based dynamics in high-dimensional and probability spaces.

Sinkhorn Gradient Descent (SinkGD) refers to a broad class of first-order optimization algorithms applied to functionals involving entropic regularized optimal transport (OT) and, by extension, Sinkhorn divergences. These methods leverage the smooth structure conferred by entropy regularization, leading to efficient, scalable algorithms for problems in computational optimal transport, generative modeling, PDE-based dynamics on probability spaces, and statistical learning. SinkGD generalizes the classical Sinkhorn algorithm itself, which may be viewed as a mirror gradient descent in the geometry induced by the Kullback–Leibler (KL) divergence, to a wider variety of objectives and function spaces.

1. Foundation: Entropic Optimal Transport and the Sinkhorn Algorithm

Sinkhorn gradient-based methods are predicated on the entropic regularization of the OT problem between probability measures $(\mu, \nu)$ on spaces $X, Y$ with cost $c : X \times Y \to \mathbb R$ , and regularization parameter $\varepsilon > 0$ : $\inf_{\pi \in \Pi(\mu, \nu)} \iint_{X \times Y} c(x, y)\, \pi(dx, dy) + \varepsilon \iint_{X \times Y} \ln\left(\frac{d\pi}{d(\mu \otimes \nu)}\right) \pi(dx, dy).$ This can be equivalently phrased as entropic minimization $\min_{\pi \in \Pi(\mu, \nu)} \varepsilon H(\pi \mid R)$ , where $R(dx, dy) = e^{-c(x, y)/\varepsilon} \mu(dx)\nu(dy)$ is the Gibbs kernel (2002.03758). The classical Sinkhorn algorithm, or Iterative Proportional Fitting Procedure (IPFP), alternates KL projections onto the marginal constraints, and is formally equivalent to alternating row and column normalizations in the matrix case (Mishchenko, 2019). In continuous settings, the dual potentials $(u, v)$ evolve via coupled fixed-point updates.

These row/column normalizations or potential updates are instances of mirror descent in the appropriate information geometry. Notably, when viewed through the lens of Bregman gradients and KL-divergence mirror maps, the classic scaling process coincides precisely with mirror descent on the KL-divergence functional (2002.03758, Mishchenko, 2019).

2. Sinkhorn Gradient Descent as Mirror Descent and Gradient Flow

Mirror descent generalizes classical gradient descent by operating in non-Euclidean geometries tailored to the convexity properties of the problem. In Sinkhorn GD, the functional and its geometry are both induced by KL divergence: $\rho_{n+1} = \arg\min_{\rho} H(\rho_n) + \langle H'(\rho_n), \rho - \rho_n \rangle + G(\rho \mid \rho_n),$ where $H(\rho) = H(\rho \mid \nu)$ and $G = F^*$ is a convex mirror map derived from the dual formulation (2002.03758).

The continuous-time vanishing-step-size and regularization limits yield a Wasserstein mirror gradient flow, in which the evolution equation is: $\partial_t \rho_t = \nabla \cdot \left( \rho_t\, v_t \right), \quad v_t = -[\nabla^2 u_t]^{-1} \nabla_x (f + \log \rho_t),$ where $u_t$ is the potential evolving under a parabolic Monge–Ampère equation (Deb et al., 2023). This constructs an explicit link between discrete Sinkhorn iterations and a continuous evolution in probability space—the so-called Sinkhorn flow—which is the mirror gradient flow of the KL functional with respect to the mirror $\frac{1}{2} W_2^2(\cdot, \nu)$ (Deb et al., 2023, Srinivasan et al., 14 Oct 2025).

3. Algorithmic Variants and Computational Schemes

The SinkGD framework encompasses several concrete instantiations. Key variants include:

Functional Sinkhorn Descent: Direct gradient descent in spaces of mappings or probability measures equipped with RKHS structures or empirical particle representations, as in the computation of Sinkhorn barycenters (Shen et al., 2020).
Sinkhorn Natural Gradient: Steepest descent with respect to the Sinkhorn (entropic OT) geometry, involving the Sinkhorn information matrix (SIM) as a Riemannian metric (Shen et al., 2020).
Stochastic/Incremental/Mini-batch Variants: Stochastic mirror descent, Pinkhorn, Greenkhorn, and other stochastic partial-gradient forms; these arise from penalty-based objective formulations with KL-divergence penalties for the marginal constraints (Mishchenko, 2019).
Sharp Sinkhorn Gradient Descent: Gradient descent on the sharp approximation to the unregularized Wasserstein distance, using closed-form gradients for improved faithfulness to true OT (Luise et al., 2018).

A typical Sinkhorn-GD update in the discrete setting is:

Initialize a^0 in Delta_n
for t = 0, 1, ...:
    grad = sum_{i=1}^p w_i * grad_S(a^t, b^{(i)}; M, λ, ε)
    a_tilde = a^t - η_t * grad
    a^{t+1} = ProjectOntoSimplex(a_tilde)
    # stopping criterion

with explicit routines for efficient gradient and Hessian computation (Luise et al., 2018).

4. Convergence Properties and Rates

Formal analysis of Sinkhorn GD methods reveals several convergence regimes, often linked to the relative smoothness of KL divergence w.r.t. the mirror map (Mishchenko, 2019). Key results include:

Sublinear Convergence: Under general convexity and smoothness assumptions,

$H(\rho_n \mid \nu) \le \frac{H^*(\mu, \nu, R)}{n},$

where $H^*(\mu, \nu, R)$ is the minimal achievable KL divergence, robust to sparse or degenerate kernels (2002.03758).

Non-Asymptotic and Exponential Rates: On compact manifolds with appropriate regularity, and in settings where a log-Sobolev inequality (LSI) holds, Sinkhorn GD exhibits explicit exponential rates for both potentials and their gradients:

$\sup_x |\varphi^{\diamond n}(x) - \varphi^*(x)| \leq c \cdot \gamma^{2n-1} \|\psi^0 - \psi^*\|_{\mathrm{Lip}},$

with $\gamma = e^{-\pi^2 T}$ , leading to $O(\log(1/\epsilon))$ convergence in practice (Greco et al., 2023).

Connection to Functional Inequalities: Exponential decay in entropy is equivalent to the presence of an LSI in the evolving law, and in continuous-time these functional inequalities govern contraction and entropy production rates (Srinivasan et al., 14 Oct 2025, Deb et al., 2023).

5. Extensions, Generalizations, and Gradient Flows

The theory extends naturally to:

Multi-Marginal and Constraint Extensions: Stochastic mirror descent with arbitrary collections of KL-penalty constraints extends the Sinkhorn method to multi-marginal settings (Mishchenko, 2019).
Sinkhorn-JKO and Gradient Flows in Probability Space: The Sinkhorn-JKO (“Jordan-Kinderlehrer-Otto”) scheme replaces Wasserstein distances with Sinkhorn divergences in variational implicit Euler flows for probability measures, producing well-posed, globally minimizing evolutions (Hardion et al., 18 Nov 2025).
Onsager Gradient Flow and Markov Dynamics: The Sinkhorn flow defines a reversible Markov dynamic on the target marginal, with Onsager operator $K_\pi = \pi^Y L_\pi$ and Dirichlet form $\mathcal{E}_\pi(f, f)$ , ensuring spectral gaps and functional contraction (Srinivasan et al., 14 Oct 2025).

The following table categorizes representative Sinkhorn-GD schemes:

Variant	Objective Functional Type	Domain / Geometry
Classical Sinkhorn (IPFP)	Entropic OT (KL-divergence)	Coupling matrices / Measures
Functional SD (Barycenter)	Debiased Sinkhorn barycenter	Particle/RKHS embeddings
SiNG	Parametric Sinkhorn geometry	Latent GAN parameter spaces
Sharp Sinkhorn	Unregularized Sinkhorn approx.	Discrete measures / Simplex

6. Practical Implementation and Applications

Practical aspects of SinkGD include:

Numerical Stability: Log-domain implementation of Sinkhorn iterations, warm-starting, and parameter tuning for regularization $\lambda$ and tolerances (Luise et al., 2018).
Computational Complexity: Each Sinkhorn solve is $O(n m \epsilon^{-2} \lambda)$ , with further optimizations via low-rank or FFT methods, and gradient steps scaling as $O(n m^2)$ for typical discrete implementations (Luise et al., 2018, Hardion et al., 18 Nov 2025).
Robustness: Rates and stability constants depend only on entropy minima and not on lower-bounded positivity of the kernel, yielding robust performance even as some $R_{ij} \to 0$ (2002.03758).
Example Applications: Generative modeling (GANs, Sinkhorn-regularized flows), PDE dynamics under entropic regularization, probabilistic barycenter computation, stochastic control, and adapted sampling algorithms (Shen et al., 2020, Hardion et al., 18 Nov 2025, Shen et al., 2020).

7. Theoretical and Practical Implications

The unification of Sinkhorn procedures with gradient and mirror descent endows entropic OT and its variants with strong theoretical guarantees, links to PDEs and stochastic control, and flexible algorithms for high-dimensional and statistical learning settings. The mirror descent perspective clarifies why and how convergence, stability, and complexity depend on the regularization structure. Notably, the use of log-Sobolev and Poincaré inequalities to certify exponential convergence and stopping criteria is both theoretically deep and practically actionable (Srinivasan et al., 14 Oct 2025).

This synthesis has yielded a proliferation of algorithmic advances and a broadening of scope, connecting discrete optimization, infinite-dimensional flows, and machine learning via the shared mathematical language of entropy-regularized transport and its gradient-based solvers.