
Sinkhorn Proxy Drift

Updated 9 May 2026
  • Sinkhorn Proxy Drift uses the Sinkhorn divergence to approximate Wasserstein gradient flows, yielding proxy drift fields that are cheap to estimate.
  • It offers a practical method for trading off between numerical tractability and theoretical fidelity via entropic regularization.
  • Applications include stable neural generative modeling and scalable particle flow approximations in high-dimensional optimal transport problems.

Sinkhorn Proxy Drift is an umbrella term for procedures that use the Sinkhorn divergence, a symmetric, entropically regularized optimal transport (OT) functional, as a tractable surrogate ("proxy") in variational inference, gradient flows, and generative model training. The concept originates in the mathematical and algorithmic analysis of Wasserstein gradient flows (WGF) and their entropic approximations, which leads to a spectrum of "proxy" drift fields interpolating between exact transport-driven dynamics and cheaper, more scalable alternatives. The paradigm covers rigorous results for continuous PDEs, practical drift-field approximations for particle flows, and neural generative procedures. Its defining feature is the use of the Sinkhorn divergence, rather than the unregularized Wasserstein distance, to define and/or estimate the transporting vector field, often giving up some theoretical fidelity to the gradient flow in exchange for statistical or computational tractability.

1. Mathematical Foundations: Sinkhorn Divergence and Entropic Optimal Transport

For probability measures $\rho$ and $\nu$ on $\mathbb{R}^d$, with quadratic cost $c(x, y) = \|x - y\|^2$, the entropic-regularized optimal transport cost is defined as

$$\mathrm{OT}_\varepsilon(\rho, \nu) = \min_{\pi \in \Pi(\rho, \nu)} \int c(x, y)\,d\pi(x, y) + \varepsilon\,\mathrm{KL}(\pi \,\|\, \rho \otimes \nu).$$

The Sinkhorn divergence is the symmetric, bias-corrected functional

$$D_\varepsilon(\rho \,\|\, \nu) = \mathrm{OT}_\varepsilon(\rho, \nu) - \tfrac{1}{2}\,\mathrm{OT}_\varepsilon(\rho, \rho) - \tfrac{1}{2}\,\mathrm{OT}_\varepsilon(\nu, \nu).$$

The entropic parameter $\varepsilon > 0$ controls the trade-off between transport accuracy and entropy regularization. As $\varepsilon \to 0$, $D_\varepsilon$ recovers the squared Wasserstein-2 distance.

The Sinkhorn divergence is computationally accessible via the Sinkhorn algorithm and admits dual characterizations via potentials $\phi^*, \psi^*$, which yield efficient minibatch and GPU-based estimators, enabling its integration as a "proxy" in high-dimensional inference and learning systems (Zhu et al., 2024).
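The definitions above can be illustrated with a minimal NumPy sketch for two point clouds. It assumes uniform weights on the samples, the quadratic cost used in this section, and a fixed number of log-domain Sinkhorn iterations; the function names `ot_eps` and `sinkhorn_divergence` are illustrative and are not taken from the cited papers.

```python
import numpy as np

def _logsumexp(z, axis):
    # Numerically stable log(sum(exp(z))) along one axis, keeping dimensions.
    m = np.max(z, axis=axis, keepdims=True)
    return m + np.log(np.sum(np.exp(z - m), axis=axis, keepdims=True))

def ot_eps(x, y, eps, n_iter=200):
    """Dual estimate of OT_eps between uniform empirical measures on x and y."""
    C = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)  # c(x, y) = ||x - y||^2
    n, m = C.shape
    log_a, log_b = -np.log(n), -np.log(m)  # uniform sample weights (assumption)
    f, g = np.zeros((n, 1)), np.zeros((1, m))
    for _ in range(n_iter):
        # Alternating log-domain Sinkhorn updates of the dual potentials.
        f = -eps * _logsumexp(log_b + (g - C) / eps, axis=1)
        g = -eps * _logsumexp(log_a + (f - C) / eps, axis=0)
    # After the g-update the plan has total mass 1, so the dual value is <f, a> + <g, b>.
    return float(f.mean() + g.mean())

def sinkhorn_divergence(x, y, eps, n_iter=200):
    """D_eps = OT_eps(x, y) - 0.5 * OT_eps(x, x) - 0.5 * OT_eps(y, y)."""
    return (ot_eps(x, y, eps, n_iter)
            - 0.5 * ot_eps(x, x, eps, n_iter)
            - 0.5 * ot_eps(y, y, eps, n_iter))

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 2))
y = rng.normal(size=(32, 2)) + 3.0
print(sinkhorn_divergence(x, y, eps=1.0))
```

The log-domain updates avoid underflow in the Gibbs kernel when $\varepsilon$ is small relative to the costs. With a finite iteration budget the returned value is only an approximation of $D_\varepsilon$.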

2. Gradient Flows and the Notion of Drift

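Taking $\mathcal{F}(\rho) = D_\varepsilon(\rho \,\|\, \nu)$ as a variational objective, the Wasserstein gradient flow (WGF) equation for the evolving law $\rho_t$ is

$$\partial_t \rho_t = \nabla \cdot \Big( \rho_t \, \nabla \frac{\delta \mathcal{F}}{\delta \rho}(\rho_t) \Big).$$

It is a steepest-descent evolution in the Wasserstein geometry. The first variation of the Sinkhorn divergence is given by

$$\frac{\delta \mathcal{F}}{\delta \rho}(\rho) = \phi_{\rho,\nu} - \phi_{\rho,\rho},$$

where $\phi_{\rho,\nu}$ is the solution to the entropic OT dual problem from $\rho$ to $\nu$, and $\phi_{\rho,\rho}$ is the (self-)potential for $\rho$. Thus, the Sinkhorn drift (editor's term) is

$$v_\rho(x) = -\nabla \phi_{\rho,\nu}(x) + \nabla \phi_{\rho,\rho}(x).$$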

This explicit cross-minus-self structure can be realized empirically via barycentric projections using the OT plans induced by Sinkhorn scaling (Zhu et al., 2024, He et al., 12 Mar 2026, Gretton et al., 6 May 2026).
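A minimal sketch of this barycentric realization follows, assuming uniform weights and the quadratic cost $c(x, y) = \|x - y\|^2$. For that cost, the gradient of an entropic potential at a particle equals $2(x - T(x))$, where $T(x)$ is the barycentric projection under the Sinkhorn plan, so the cross-minus-self drift becomes $2\,(T_{xy}(x) - T_{xx}(x))$. The names `conditional_plan` and `sinkhorn_drift` are illustrative, not the API of any cited work.

```python
import numpy as np

def _logsumexp(z, axis):
    # Numerically stable log(sum(exp(z))) along one axis, keeping dimensions.
    m = np.max(z, axis=axis, keepdims=True)
    return m + np.log(np.sum(np.exp(z - m), axis=axis, keepdims=True))

def conditional_plan(x, y, eps, n_iter=200):
    """Row-conditional entropic plan pi(y_j | x_i) for uniform weights."""
    C = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    n, m = C.shape
    log_a, log_b = -np.log(n), -np.log(m)
    f, g = np.zeros((n, 1)), np.zeros((1, m))
    for _ in range(n_iter):
        f = -eps * _logsumexp(log_b + (g - C) / eps, axis=1)
        g = -eps * _logsumexp(log_a + (f - C) / eps, axis=0)
    # pi(y_j | x_i) is proportional to exp((g_j - C_ij) / eps); uniform b_j cancels.
    logits = (g - C) / eps
    return np.exp(logits - _logsumexp(logits, axis=1))

def sinkhorn_drift(x, y, eps, n_iter=200):
    """Cross-minus-self drift 2 * (T_xy(x_i) - T_xx(x_i)) at each particle x_i."""
    T_xy = conditional_plan(x, y, eps, n_iter) @ y  # cross barycentric projection
    T_xx = conditional_plan(x, x, eps, n_iter) @ x  # self barycentric projection
    return 2.0 * (T_xy - T_xx)

# Explicit Euler particle flow with a hypothetical step size of 0.1.
rng = np.random.default_rng(0)
x = rng.normal(size=(64, 2))
y = rng.normal(size=(64, 2)) + 3.0
for _ in range(20):
    x = x + 0.1 * sinkhorn_drift(x, y, eps=1.0)
```

When the two particle sets coincide, the cross and self projections are identical and the drift is zero, which matches the cross-minus-self structure described above.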

3. Sinkhorn Proxy Drift: Particle Approximation and One-Shot Proxies

Computing the exact Sinkhorn drift at each iteration is expensive, especially in large-batch or high-dimensional regimes. "Sinkhorn proxy drift" refers to a practical procedure where the drift is approximated as follows:

  • Compute softened pairwise costs between model (particle set $\{x_i\}_{i=1}^{n}$) and data (particle set $\{y_j\}_{j=1}^{m}$).
  • Use low-iteration Sinkhorn scaling (even a very small iteration count $K$), leading to one-sided or geometric-mean normalized "proxy" couplings (i.e., not enforcing full row and column marginals).
  • Form the drift field for each sample as a cross-minus-self sum:

$$\hat{v}(x_i) = \sum_{j} \Pi^{xy}_{ij}\,(y_j - x_i) \;-\; \sum_{k} \Pi^{xx}_{ik}\,(x_k - x_i),$$

where $\Pi^{xy}$ and $\Pi^{xx}$ are proxy Gibbs kernels or geometric-mean pseudo-plans (Gretton et al., 6 May 2026).

  • This proxy drift is consistent (vanishes if and only if model and target match) but is not always conservative; it generically fails to be a gradient field unless additional alignment conditions are met.

The method is computationally efficient, requires only a forward pass (no OT backpropagation), and remains stable at lower entropic regularization than unregularized OT, though it can underperform in resolving mass splits between well-separated modes (Gretton et al., 6 May 2026, He et al., 12 Mar 2026).
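The steps above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the procedure of the cited papers: the cost is squared Euclidean, `n_iter` counts row-then-column Sinkhorn rescaling sweeps, the final normalization is one-sided (rows only), and the self term keeps the diagonal. The names `proxy_weights` and `proxy_drift` are hypothetical.

```python
import numpy as np

def _logsumexp(z, axis):
    # Numerically stable log(sum(exp(z))) along one axis, keeping dimensions.
    m = np.max(z, axis=axis, keepdims=True)
    return m + np.log(np.sum(np.exp(z - m), axis=axis, keepdims=True))

def proxy_weights(x, y, eps, n_iter):
    """Row-normalized proxy coupling after n_iter Sinkhorn scaling sweeps."""
    C = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    log_k = -C / eps  # log Gibbs kernel, i.e. the "softened" pairwise costs
    for _ in range(n_iter):
        log_k = log_k - _logsumexp(log_k, axis=1)  # rescale rows
        log_k = log_k - _logsumexp(log_k, axis=0)  # rescale columns
    # One-sided normalization: the weights of each particle x_i sum to 1,
    # but column marginals are not enforced.
    return np.exp(log_k - _logsumexp(log_k, axis=1))

def proxy_drift(x, y, eps, n_iter=1):
    """Cross-minus-self proxy drift, forward pass only."""
    W_xy = proxy_weights(x, y, eps, n_iter)
    W_xx = proxy_weights(x, x, eps, n_iter)
    cross = W_xy @ y - x  # sum_j W_xy[i, j] * (y_j - x_i), since rows sum to 1
    self_term = W_xx @ x - x  # sum_k W_xx[i, k] * (x_k - x_i)
    return cross - self_term

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 2))
y = rng.normal(size=(64, 2)) + 3.0
v_one_sided = proxy_drift(x, y, eps=1.0, n_iter=0)  # plain softmax weights
v_few_sweeps = proxy_drift(x, y, eps=1.0, n_iter=3)  # closer to a balanced plan
```

With `n_iter=0` the weights reduce to a softmax over the Gibbs kernel, which is the one-sided case. Increasing `n_iter` moves the coupling toward a doubly stochastic plan at the price of extra sweeps.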

4. Neural and Algorithmic Realizations

The Neural Sinkhorn Gradient Flow (NSGF) framework parameterizes the time-dependent velocity field $v_\theta(x, t)$ via a neural network and trains it to regress to the empirical Sinkhorn-based velocity estimate using a velocity-matching loss:

$$\mathcal{L}(\theta) = \mathbb{E}_{t,\,x_t} \big\| v_\theta(x_t, t) - \hat{v}_t(x_t) \big\|^2,$$

where $\hat{v}_t$ is an empirical estimator of the true Sinkhorn drift, constructed from mini-batches of samples from the source and target distributions using empirical Sinkhorn plans (Zhu et al., 2024). The NSGF++ scheme introduces a two-phase transport: an initial Sinkhorn-driven flow phase intended to bring samples near the data manifold, followed by a straight-line refinement toward prescribed data points.
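The velocity-matching regression can be sketched as follows. To stay self-contained, the sketch replaces the neural network with an affine model on the features $[x, t, 1]$ and uses a synthetic pool of $(x_t, t, \hat{v})$ triples; in NSGF the targets would come from empirical Sinkhorn plans. All names, the synthetic targets, and the learning rate are assumptions made for illustration.

```python
import numpy as np

def features(x, t):
    # Affine features [x, t, 1]; a stand-in for a neural network.
    return np.concatenate([x, t[:, None], np.ones((len(x), 1))], axis=1)

def velocity(theta, x, t):
    """Model velocity v_theta(x, t) for a batch of positions and times."""
    return features(x, t) @ theta

def velocity_matching_loss(theta, x, t, v_hat):
    """Mean squared error between the model velocity and the target drift."""
    r = velocity(theta, x, t) - v_hat
    return float(np.mean(np.sum(r ** 2, axis=1)))

def loss_grad(theta, x, t, v_hat):
    # Exact gradient of the loss above for the affine model.
    phi = features(x, t)
    return 2.0 * phi.T @ (phi @ theta - v_hat) / len(x)

# Hypothetical pool of (x_t, t, v_hat) triples; v_hat is synthetic here.
rng = np.random.default_rng(0)
x = rng.normal(size=(256, 2))
t = rng.uniform(size=256)
v_hat = np.tile([1.0, -0.5], (256, 1)) + 0.1 * rng.normal(size=(256, 2))

theta = np.zeros((4, 2))
for _ in range(200):
    theta -= 0.1 * loss_grad(theta, x, t, v_hat)  # plain gradient descent
```

Once trained, the model velocity can be integrated over time to transport samples without recomputing Sinkhorn plans, which is the motivation for regressing onto the drift rather than evaluating it at sampling time.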

In batch-based settings, the "Sinkhorn drift" operation is commonly written as a short routine parameterized by the number of Sinkhorn iterations $K$. At the smallest $K$, the result reduces to a naive one-sided "drifting" update as in recent GMD algorithms; for moderate $K$, the coupling approaches the true doubly stochastic plan, yielding higher-fidelity drifts (He et al., 12 Mar 2026).

5. Theoretical Properties and Identifiability

The Sinkhorn divergence is strictly positive definite: $D_\varepsilon(\rho \,\|\, \nu) = 0$ if and only if $\rho = \nu$ (He et al., 12 Mar 2026, Gretton et al., 6 May 2026). Consequently, its gradient flow admits a unique equilibrium at the target law. In the context of proxy drift, the proxy vector field vanishes only if the empirical or population-level distributions match. This is the so-called identifiability property, and it resolves a well-documented gap in previous "drifting" frameworks based only on kernel means or one-sided normalization. The property holds in both continuous and empirical particle settings under mild nondegeneracy (e.g., distinct support points) (He et al., 12 Mar 2026, Gretton et al., 6 May 2026). For the approximated (proxy) field, the consistency property is preserved, though higher-order mass transportation properties may be compromised.

6. Sinkhorn Proxy Drift in the JKO Scheme and PDEs

In the time-discrete Jordan–Kinderlehrer–Otto (JKO) minimization scheme for constructing Wasserstein gradient flows of an energy $\mathcal{F}$, replacing the squared Wasserstein distance $W_2^2$ with the entropic (Sinkhorn) cost yields the entropic JKO step with time step $\tau$:

$$\rho^{k+1} \in \arg\min_{\rho} \Big\{ \mathcal{F}(\rho) + \frac{1}{2\tau}\,\mathrm{OT}_\varepsilon(\rho, \rho^{k}) \Big\}.$$

It is shown that in the diffusive regime $\tau, \varepsilon \to 0$ with $\varepsilon / \tau \to \alpha$, the limiting PDE acquires an extra linear diffusion term:

$$\partial_t \rho = \nabla \cdot \Big( \rho \, \nabla \frac{\delta \mathcal{F}}{\delta \rho}(\rho) \Big) + \frac{\alpha}{2}\,\Delta \rho,$$

where the constant in front of the Laplacian depends on how the cost and the regularizer are normalized. This additional $\tfrac{\alpha}{2}\Delta\rho$ drift is termed the "Sinkhorn proxy drift" in the PDE context. For $\alpha = 0$ (i.e., $\varepsilon = o(\tau)$), the classical Wasserstein gradient flow is recovered. The scaling $\varepsilon \sim \alpha \tau$ allows practitioners to stabilize Sinkhorn computations at the expense of diffusive bias, facilitating a trade-off between numerical tractability and fidelity to pure Wasserstein-driven evolution (Baradat et al., 18 Feb 2025).
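As a worked illustration of the diffusive bias, consider a potential energy $\mathcal{F}(\rho) = \int V \, d\rho$. The sketch below assumes the extra term takes the form $\tfrac{\alpha}{2}\Delta\rho$, with $\alpha$ the limit of $\varepsilon/\tau$ as reported by Baradat et al.; the exact constant depends on normalization conventions, and the stationary density exists only if it is integrable.

```latex
% Potential energy: the first variation is the potential itself.
\mathcal{F}(\rho) = \int V \, d\rho
\quad\Longrightarrow\quad
\frac{\delta \mathcal{F}}{\delta \rho} = V .

% Limiting equation with the extra diffusion term (a linear Fokker--Planck equation).
\partial_t \rho = \nabla \cdot (\rho \, \nabla V) + \frac{\alpha}{2} \, \Delta \rho .

% Stationary state: the flux vanishes,
% \rho \nabla V + (\alpha / 2) \nabla \rho = 0, hence for \alpha > 0
\rho_\infty(x) \propto \exp\!\Big( -\frac{2 V(x)}{\alpha} \Big) .
```

For $\alpha = 0$ the equation is pure transport along $-\nabla V$ and mass concentrates at the minimizers of $V$, whereas for $\alpha > 0$ the equilibrium is a Gibbs density spread around them. This makes the bias introduced by the scaling $\varepsilon \sim \alpha\tau$ concrete.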

7. Empirical and Practical Significance

In generative modeling tasks—particularly those prone to low-temperature pathologies such as mode collapse—Sinkhorn proxy drift methods consistently outperform one-sided and kernel-density-induced drifts. For example, Sinkhorn drifting reduces mean FID score on FFHQ-ALAE from 187.7 to 37.1 and mean EMD from 453.3 to 144.4 at the lowest temperature, while maintaining full class coverage on MNIST across temperature sweeps. The overhead of using Sinkhorn drift (i.e., a handful of forward Sinkhorn iterations per batch) is modest and does not alter the inference procedure at test time (He et al., 12 Mar 2026).

8. Limitations and Open Problems

Sinkhorn proxy drift provides a tractable and theoretically justified surrogate for optimal transport-driven flows, but the proxy field (when not fully Sinkhorn-scaled) may fail to be conservative, that is, it is not generally expressible as the gradient of any global functional. It may also fail to transport mass optimally between widely separated modes at finite entropic regularization. Empirical proxy drifts converge to the true WGF drift only asymptotically, as batch sizes and the number of Sinkhorn scaling iterations tend to infinity. Thus, practical deployments must tune entropic parameters and iteration budgets to balance statistical error, numerical stability, and approximation quality (Gretton et al., 6 May 2026, Baradat et al., 18 Feb 2025).
