Flow-Sinkhorn Algorithm
- Flow-Sinkhorn is a continuous formulation of the classical Sinkhorn algorithm, unifying mirror descent, PDE analysis, and stochastic processes in optimal transport.
- It provides exponential convergence guarantees and efficient computation via FFT and distributed strategies, reducing complexity for large-scale problems.
- The framework underpins diverse applications, including improved generative modeling and high-dimensional flow-matching, while offering robustness through entropy decay and regularity conditions.
The Flow-Sinkhorn algorithm refers to a continuum limit and geometric re-interpretation of the classical Sinkhorn algorithm for solving entropy-regularized optimal transport (OT) problems. In this framework, the discrete Sinkhorn iterations converge (under vanishing regularization and infinitesimal step size) to a continuous-time dynamical system—termed the Sinkhorn flow or Flow-Sinkhorn—which can be rigorously realized as a mirror-descent ODE, as a parabolic Monge–Ampère PDE, or as a McKean–Vlasov stochastic process. This continuous-time paradigm provides deep insights into regularity, contraction, entropy decay, algorithmic acceleration, and stochastic robustness of large-scale OT computation and its modern machine learning applications.
1. Mathematical Formulation and Geometric Structure
The Flow-Sinkhorn framework emerges from considering the entropy-regularized OT problem between probability measures $\mu$ and $\nu$ with quadratic cost, at entropic regularization parameter $\varepsilon > 0$. The classical Sinkhorn algorithm produces a sequence of couplings $\pi^k$ (via alternating Bregman projections/scaling), but in the limit $\varepsilon \to 0$ and with suitable time-rescaling ($k\varepsilon \to t$), these iterates trace out a curve $(\rho_t)_{t \ge 0}$ in the $2$-Wasserstein space $\mathcal{P}_2(\mathbb{R}^d)$, called the Sinkhorn flow (Deb et al., 2023).
The flow admits several interconnected formulations:
- Continuity equation: There exists a unique velocity field $v_t$ such that $\partial_t \rho_t + \nabla \cdot (\rho_t v_t) = 0$.
- Wasserstein mirror gradient flow: The Sinkhorn flow is the mirror gradient flow of the Kullback–Leibler divergence $H(\cdot \,\|\, \nu)$ with mirror given by (half) the squared $W_2$-distance. In the space of convex potentials whose gradients push $\mu$ forward along the flow, this yields the parabolic Monge–Ampère evolution stated below.
- Hessian-metric mirror flow: Each tangent space $T_{\rho_t}\mathcal{P}_2$ is equipped with a Riemannian metric induced by the Hessian of the mirror functional. The velocity field $v_t$ is then the negative gradient of $H(\rho_t \,\|\, \nu)$ with respect to this metric.
The parabolic Monge–Ampère PDE for the convex transport potential $\varphi_t$ takes the form
$$\partial_t \varphi_t(x) \;=\; \log \mu(x) \;-\; \log\!\big(\nu(\nabla \varphi_t(x))\,\det D^2 \varphi_t(x)\big),$$
where $T_t = \nabla \varphi_t$ denotes the Monge map from $\mu$ under $\varphi_t$ (Deb et al., 2023, Berman, 2017).
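A quick consistency check via the Monge–Ampère change-of-variables identity (with $\mu$, $\nu$ the source and target densities and $\varphi$ a convex potential): the flow is stationary exactly at the optimal transport potential, since

```latex
(\nabla\varphi)_{\#}\,\mu = \nu
\;\Longleftrightarrow\;
\nu\big(\nabla\varphi(x)\big)\,\det D^{2}\varphi(x) = \mu(x)
\;\Longleftrightarrow\;
\log\frac{\mu(x)}{\nu(\nabla\varphi(x))\,\det D^{2}\varphi(x)} = 0 ,
```

so the log marginal-ratio driving the parabolic equation vanishes precisely when $\nabla\varphi$ pushes $\mu$ to $\nu$.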
2. Rigorous Convergence, Stability, and Complexity
The continuous-time Sinkhorn flow solves a strictly convex, time-dependent PDE or ODE under standard smoothness and regularity assumptions. Key convergence phenomena include:
- Consistency with discrete Sinkhorn: The discrete Sinkhorn process is an explicit Euler discretization of the flow; step size $\gamma = 1$ recovers the classical scaling iteration, while $\gamma \in (1,2)$ gives over-relaxed schemes, with provable linear convergence controlled by the spectral radius (second singular value) of the kernel (Modin, 2023).
- Exponential convergence: When $\nu$ satisfies a log-Sobolev inequality (LSI) and the Hessian metric is uniformly positive definite, the flow exhibits exponential decay of the relative entropy,
$$H(\rho_t \,\|\, \nu) \;\le\; e^{-2\lambda\kappa t}\, H(\rho_0 \,\|\, \nu),$$
for LSI constant $\lambda$ and lower bound $\kappa$ on the Hessian metric; the rate follows from the entropy-dissipation inequality combined with the LSI via Grönwall's lemma (Deb et al., 2023, Srinivasan et al., 14 Oct 2025).
- Complexity per iteration: FFT or convolutional structure can lower each iteration to $O(N \log N)$ in periodic settings, yielding near-linear overall complexity in $N$ for fixed accuracy (Berman, 2017). In high-dimensional mini-batch flow-matching contexts, distributed Sinkhorn with multi-GPU sharding enables scaling to very large batch sizes (Klein et al., 5 Jun 2025).
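The Euler-discretization view can be made concrete with a toy log-domain Sinkhorn whose dual update is damped or over-relaxed by a step size `gamma` (a minimal hypothetical sketch, not code from the cited papers; `gamma = 1` recovers the classical scaling iteration):

```python
import numpy as np

def _lse(A, axis):
    # numerically stable log-sum-exp along the given axis
    m = A.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(A - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def sinkhorn_log(C, mu, nu, eps=1.0, gamma=1.0, n_iter=300):
    """Log-domain Sinkhorn with (over-)relaxed dual updates.

    gamma = 1 is the classical scaling iteration; gamma in (1, 2) is the
    over-relaxed scheme, i.e. a larger explicit Euler step along the same
    dual ascent direction.
    """
    f = np.zeros_like(mu)                 # dual potential for mu
    g = np.zeros_like(nu)                 # dual potential for nu
    for _ in range(n_iter):
        # exact coordinate-update targets, then a relaxed step toward them
        f_new = eps * (np.log(mu) - _lse((g[None, :] - C) / eps, axis=1))
        f = (1 - gamma) * f + gamma * f_new
        g_new = eps * (np.log(nu) - _lse((f[:, None] - C) / eps, axis=0))
        g = (1 - gamma) * g + gamma * g_new
    pi = np.exp((f[:, None] + g[None, :] - C) / eps)   # entropic coupling
    return pi, f, g
```

With moderate regularization the marginal constraints are met to near machine precision after a few hundred iterations, and the over-relaxed variant follows the same fixed point with a larger step.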
3. The Mirror-Descent and Entropy Production Perspective
Sinkhorn flow is formulated as a mirror descent in infinite-dimensional probability spaces, with the dual ODE
$$\partial_t h_t(x, y) \;=\; -\,g_t(y), \qquad g_t(y) \;=\; \log \frac{d\pi_t^Y}{d\nu}(y),$$
i.e., the dual potential is continuously updated against the log marginal ratio, under a time-varying Riemannian geometry (the mirror Hessian). The flow contracts in norms weighted by the mirror metric and satisfies strict entropy production identities:
$$\frac{d}{dt}\, H(\pi_t^Y \,\|\, \nu) \;=\; -\,\big\langle g_t,\, (\mathrm{Id} - P_t)\, g_t \big\rangle_{L^2(\nu)},$$
where $g_t = \log \frac{d\pi_t^Y}{d\nu}$ and $P_t$ is the conditional expectation operator induced by the current coupling. The induced Dirichlet form (Onsager operator) defines a spectral gap bounded away from zero for $\varepsilon > 0$ (Srinivasan et al., 14 Oct 2025).
The log-Sobolev inequality (LSI) for $\nu$ holds if and only if exponential entropy decay occurs along Flow-Sinkhorn, providing both theoretical guidance (latent space design) and practical stopping criteria in discrete Sinkhorn implementations.
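As an illustration, a toy discrete sketch (not from the cited papers) can track the KL divergence of the running second marginal from $\nu$ along standard Sinkhorn iterations; the geometric decay it exhibits is the discrete counterpart of the exponential entropy decay above and is what makes the marginal KL a practical stopping criterion:

```python
import numpy as np

def marginal_kl_trace(C, mu, nu, eps=1.0, n_iter=30):
    """Run standard Sinkhorn and record H(pi_k^Y || nu) at every iteration.

    After each u-update the first marginal of pi = diag(u) K diag(v) equals
    mu exactly, so the second marginal v * (K^T u) is a probability vector
    and its KL divergence from nu is well defined.
    """
    K = np.exp(-C / eps)
    u = np.ones_like(mu)
    v = np.ones_like(nu)
    kls = []
    for _ in range(n_iter):
        u = mu / (K @ v)
        col = v * (K.T @ u)                 # current second marginal
        kls.append(np.sum(col * np.log(col / nu)))
        v = nu / (K.T @ u)
    return np.array(kls)
```

Plotting `kls` on a log scale shows a straight line down to machine precision, which is the empirical signature of the spectral gap.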
4. Relation to Parabolic Optimal Transport and Monge–Ampère Equations
In the large-scale limit, Sinkhorn iterates on manifolds (e.g., the torus $\mathbb{T}^d$, the sphere $S^2$) converge to the solution of a parabolic, fully non-linear Monge–Ampère PDE. For the torus, the equation takes the form
$$\partial_t \varphi_t(x) \;=\; \log \mu(x) \;-\; \log\!\big(\nu(T_t(x))\,\det\!\big(I + D^2 \varphi_t(x)\big)\big),$$
where $T_t(x) = x + \nabla \varphi_t(x)$ is the transport map (Berman, 2017). Such flows can be computed efficiently—via convolution (FFT) or band-limited harmonic analysis on the sphere—matching the error and speed of static Sinkhorn for OT. On general manifolds, explicit complexity and error bounds are obtainable under quasi-Monte Carlo or mesh-free point clouds (Berman, 2017).
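The convolution trick can be sketched on a 1-D periodic grid (a hypothetical minimal example; `fft_sinkhorn` and its parameters are illustrative, not from Berman's code). The Gibbs kernel of a translation-invariant cost is circulant, so each kernel application is an $O(N \log N)$ FFT convolution instead of an $O(N^2)$ matrix product:

```python
import numpy as np

def fft_sinkhorn(mu, nu, eps=0.1, n_iter=300):
    """Sinkhorn on a periodic 1-D grid of n points in [0, 1).

    The kernel exp(-d(x, y)^2 / eps) with periodic distance d is circulant,
    so K @ v is a circular convolution computed via real FFTs.
    """
    n = len(mu)
    x = np.arange(n) / n
    d = np.minimum(x, 1 - x)              # periodic distance to the origin
    k = np.exp(-d**2 / eps)               # first column of the circulant kernel
    k_hat = np.fft.rfft(k)
    conv = lambda v: np.fft.irfft(k_hat * np.fft.rfft(v), n)
    u = np.ones(n)
    v = np.ones(n)
    for _ in range(n_iter):               # classical scaling updates
        u = mu / conv(v)
        v = nu / conv(u)
    return u, v, conv
```

Because the kernel is symmetric, the same `conv` serves for both scaling directions; the per-iteration cost is two FFT pairs.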
5. Multi-Marginal and Dynamic Flows
Flow-Sinkhorn generalizes to multi-marginal (entropic) OT, approximating geodesics of the incompressible Euler equations in Brenier's relaxation. Discretizing time leads to a multi-marginal OT problem, efficiently solved by multi-dimensional Sinkhorn iteration. The resulting couplings capture mass splitting, crossing, and mixing phenomena unseen in classical Lagrangian maps. Empirical studies include 1D reversals, periodic wraparound, and two-dimensional Beltrami flows (Benamou et al., 2017).
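The multi-marginal iteration can be sketched for a three-marginal cost tensor (a toy hypothetical implementation, not the solver of Benamou et al.): each axis of the scaled tensor is alternately rescaled so that its marginal matches the corresponding target.

```python
import numpy as np

def multimarginal_sinkhorn(C, marginals, eps=1.0, n_iter=500):
    """Sketch of 3-marginal Sinkhorn: alternately fix each marginal of
    pi = exp(-C/eps) * (u1 x u2 x u3) by a multiplicative correction."""
    K = np.exp(-C / eps)
    us = [np.ones(len(m)) for m in marginals]
    for _ in range(n_iter):
        for a in range(3):
            P = (K * us[0][:, None, None]
                   * us[1][None, :, None]
                   * us[2][None, None, :])
            axes = tuple(b for b in range(3) if b != a)
            # the axis-a marginal of P already contains the factor us[a],
            # so a multiplicative update enforces the target exactly
            us[a] *= marginals[a] / P.sum(axis=axes)
    return (K * us[0][:, None, None]
              * us[1][None, :, None]
              * us[2][None, None, :])
```

This is a direct generalization of two-marginal scaling; in practice one avoids materializing the full tensor for each update, but the small dense version shows the structure.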
In generative modeling, Flow-Sinkhorn enables efficient supervised or unsupervised flow-matching, where large-scale OT couplings (via distributed Sinkhorn) produce “straight” low-curvature flows, thereby reducing inference steps and improving performance in both synthetic and image generation benchmarks (Klein et al., 5 Jun 2025).
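The mini-batch pairing idea can be sketched as follows (a hypothetical toy version: `ot_pair_minibatch` and its parameters are illustrative, not the distributed implementation of Klein et al.). An entropic plan between the noise batch and the data batch replaces the independent pairing, so the resulting training pairs induce straighter flows:

```python
import numpy as np

def ot_pair_minibatch(x0, x1, eps=0.2, n_iter=200, rng=None):
    """Pair a noise batch x0 with a data batch x1 via an entropic OT plan.

    Computes a Sinkhorn coupling on the squared-distance cost (normalized
    for scale invariance), then samples one partner per row of the plan.
    """
    if rng is None:
        rng = np.random.default_rng()
    n = len(x0)
    C = ((x0[:, None, :] - x1[None, :, :]) ** 2).sum(-1)
    C = C / C.max()                       # scale-invariant regularization
    K = np.exp(-C / eps)
    a = np.full(n, 1.0 / n)               # uniform batch weights
    u = np.ones(n)
    v = np.ones(n)
    for _ in range(n_iter):
        u = a / (K @ v)
        v = a / (K.T @ u)
    P = u[:, None] * K * v[None, :]       # entropic coupling
    j = np.array([rng.choice(n, p=row / row.sum()) for row in P])
    return x0, x1[j]                      # coupled pairs for flow-matching
```

In an actual flow-matching loop, the returned pairs replace the independent `(x0, x1)` samples when regressing the velocity field.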
6. Algorithmic Implementation and Practical Guidelines
A prototypical Flow-Sinkhorn step consists of:
```
Initialize π⁰(x,y) = exp(−c(x,y)/ε) / Z⁰(x), set h⁰ = 0.
for k = 0, 1, 2, … until H(πₖ^Y‖ν) < τ do
    1. Compute current marginal dπₖ^Y/dν, set gₖ(y) = log (dπₖ^Y/dν)(y).
    2. Dual update: h^{k+1}(x,y) = hᵏ(x,y) − γ gₖ(y).
    3. Primal re-projection: π^{k+1}(x,y) = π⁰(x,y) exp(h^{k+1}(x,y)) / Z^{k+1}(x).
end for
```
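A runnable discrete sketch of this loop (assuming finite supports, so densities become vectors; a toy implementation, not reference code from the cited papers):

```python
import numpy as np

def flow_sinkhorn(C, mu, nu, eps=1.0, gamma=1.0, tol=1e-10, max_iter=500):
    """Discrete Flow-Sinkhorn loop: maintain a dual perturbation h of the
    base plan pi0 = exp(-C/eps), step h against the log marginal ratio with
    rate gamma, and stop once H(pi^Y || nu) < tol."""
    pi0 = np.exp(-C / eps)
    pi0 = pi0 / pi0.sum(axis=1, keepdims=True) * mu[:, None]   # rows sum to mu
    h = np.zeros_like(nu)
    kl = np.inf
    for _ in range(max_iter):
        pi = pi0 * np.exp(h)[None, :]
        pi = pi / pi.sum(axis=1, keepdims=True) * mu[:, None]  # primal re-projection
        py = pi.sum(axis=0)                                    # second marginal
        kl = np.sum(py * np.log(py / nu))                      # H(pi^Y || nu)
        if kl < tol:
            break
        h = h - gamma * np.log(py / nu)                        # dual update
    return pi, kl
```

For `gamma = 1` this reduces exactly to classical Sinkhorn (each dual step scales the columns by `nu / py` before the row re-projection).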
- Stopping heuristic: Under an LSI with rate $\lambda$, convergence to tolerance $\tau$ occurs in $O(\lambda^{-1}\log(1/\tau))$ steps (Srinivasan et al., 14 Oct 2025).
- Large-batch Sinkhorn: For mini-batch stochastic flow-matching in high dimensions, choose the batch size as large as hardware allows, target a fixed renormalized entropy, and use a moderate Sinkhorn tolerance, log-domain updates, and distributed sharding for scalability (Klein et al., 5 Jun 2025).
- Latent space regularity: Ensure the latent target measure admits a large uniform LSI constant, which guarantees rapid Sinkhorn convergence.
7. Stochastic and Generative Extensions
The continuous Flow-Sinkhorn admits a realization as a McKean–Vlasov stochastic process. In direct analogy to Langevin diffusion, there is an SDE, schematically
$$dX_t \;=\; b_t\big(X_t, \operatorname{Law}(X_t)\big)\, dt \;+\; \sqrt{2}\, dB_t,$$
whose drift depends on the law of the process and whose time-marginals follow the Sinkhorn flow (Deb et al., 2023). This structure allows expressing entropic-OT flows as diffusions, with applications in generative models.
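To make the simulation idea concrete, here is a generic interacting-particle Euler–Maruyama sketch for a McKean–Vlasov SDE, with the law of the process approximated by the empirical particle distribution. The toy drift is an illustrative mean-field Ornstein–Uhlenbeck interaction, not the specific Sinkhorn drift of Deb et al.:

```python
import numpy as np

def mckean_vlasov_em(drift, n_particles=500, dim=1, T=1.0, n_steps=200, seed=0):
    """Euler–Maruyama for dX_t = b(X_t, Law(X_t)) dt + sqrt(2) dB_t.

    Law(X_t) is approximated by the particle cloud itself: `drift(X)` maps
    the full (n_particles, dim) array to per-particle drift vectors.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    X = rng.normal(size=(n_particles, dim))        # initial cloud ~ N(0, I)
    for _ in range(n_steps):
        X = X + drift(X) * dt + np.sqrt(2 * dt) * rng.normal(size=X.shape)
    return X

# toy mean-field drift: attraction to the empirical mean plus confinement
toy_drift = lambda X: -(X - X.mean(axis=0)) - X
```

For this toy drift the cloud equilibrates near a centered Gaussian (an OU process with rate 2 for deviations from the mean, hence stationary variance 1/2), which is easy to verify empirically.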
The Flow-Sinkhorn continuous-time viewpoint unifies and extends classical Sinkhorn, mirror descent, and mean-field Schrödinger bridges, and connects directly to dynamic OT, diffusion modeling, and neural flow-based generative approaches. Noise- and bias-robust variants (via step-size scheduling) yield robust convergence guarantees in the presence of stochastic gradient estimates (Karimi et al., 2023).
Flow-Sinkhorn thus provides a principled, dynamic, and scalable foundation for the analysis, computation, and application of entropy-regularized optimal transport. Its impact spans theoretical optimal transport, PDE analysis, fluid dynamics, machine learning, and modern generative modeling (Srinivasan et al., 14 Oct 2025, Deb et al., 2023, Karimi et al., 2023, Berman, 2017, Klein et al., 5 Jun 2025, Benamou et al., 2017, Modin, 2023).