Entropic Wasserstein Gradient Flows
- Entropic Wasserstein gradient flows are a framework for evolving probability measures using entropy functionals on Wasserstein space, modeling diffusion and nonlinear PDEs.
- They combine entropic regularization with optimal transport methods, enabling efficient numerical schemes such as the Sinkhorn algorithm.
- They bridge microscopic stochastic models with macroscopic variational principles, extending to discrete, unbalanced, and multi-phase systems in various applications.
Entropic Wasserstein gradient flows describe the evolution of probability measures driven by the interplay of entropy (or entropic-type functionals) and the geometry of optimal transport, typically formulated in Wasserstein space. They provide a rigorous variational framework for modeling diffusion, nonlinear evolution equations, and statistical mechanics phenomena, and underpin a diverse set of algorithms in scientific computing, statistics, and machine learning. In this context, “entropic” refers both to the entropic regularization of optimal transport and the central role of entropy-like functionals as driving energies for the flows.
1. Mathematical Foundations
The foundational structure of entropic Wasserstein gradient flows arises from the recognition that many evolution equations for probability densities can be viewed as gradient flows of entropy functionals on the Wasserstein space of probability measures. The quadratic Wasserstein space is endowed with a Riemannian (or metric) structure in which the squared distance between probability measures $\mu$ and $\nu$ is defined by

$$W_2^2(\mu, \nu) = \inf_{\pi \in \Pi(\mu, \nu)} \int |x - y|^2 \, d\pi(x, y),$$

where $\Pi(\mu, \nu)$ denotes the set of couplings with marginals $\mu$ and $\nu$.
A prototypical example is the heat equation, $\partial_t \rho = \Delta \rho$, which is the gradient flow of the Boltzmann entropy $\mathcal{H}(\rho) = \int \rho \log \rho \, dx$ in Wasserstein space. The time-discrete JKO (Jordan-Kinderlehrer-Otto) variational scheme (1004.4076) updates the distribution via

$$\rho_{k+1} \in \operatorname*{arg\,min}_{\rho} \left\{ \mathcal{H}(\rho) + \frac{1}{2\tau} W_2^2(\rho, \rho_k) \right\},$$

where $\tau > 0$ is the time step.
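As a quick numerical illustration of this variational picture, the sketch below (a periodic 1D grid with explicit Euler time stepping; all discretization choices are ours, for illustration only) checks that the Boltzmann entropy indeed decreases along the heat flow while mass is conserved.

```python
import numpy as np

# Periodic 1D grid; all discretization choices below are illustrative.
n = 200
dx = 1.0 / n
dt = 1e-6
x = np.arange(n) * dx

# Initial density: a narrow bump, normalized so that sum(rho) * dx = 1.
rho = np.exp(-((x - 0.5) ** 2) / (2 * 0.05 ** 2))
rho /= rho.sum() * dx

def boltzmann_entropy(rho):
    # H(rho) = integral of rho * log(rho): the driving functional of the flow.
    return float(np.sum(rho * np.log(rho)) * dx)

H0 = boltzmann_entropy(rho)
for _ in range(500):
    # Explicit Euler step for d rho / dt = Laplacian(rho), periodic boundary.
    lap = (np.roll(rho, 1) - 2 * rho + np.roll(rho, -1)) / dx ** 2
    rho = rho + dt * lap
H1 = boltzmann_entropy(rho)

print(H0, H1)  # H decreases along the flow; total mass stays 1
```

The stability condition dt/dx² < 1/2 is satisfied here, so the density stays positive and the entropy dissipation is visible already after a few hundred steps.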
This formulation extends to nonlinear diffusion and porous medium equations, where more general entropies (e.g., Rényi or power-type) serve as driving energies (1212.1129), and to settings with boundary conditions, interaction potentials, or additional constraints.
Entropic regularization, wherein an entropy penalty is added to the optimal transport cost, leads to strictly convex, smoother problems:

$$W_{2,\varepsilon}^2(\mu, \nu) = \inf_{\pi \in \Pi(\mu, \nu)} \int |x - y|^2 \, d\pi(x, y) + \varepsilon \, \mathrm{KL}(\pi \,\|\, \mu \otimes \nu),$$

where $\mathrm{KL}$ is the Kullback-Leibler divergence and $\varepsilon > 0$ controls the regularization strength. This facilitates computational tractability (via the Sinkhorn algorithm) and enables convergence analysis for relaxed schemes (2309.08598).
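Because the regularized problem reduces to diagonal matrix scaling, a minimal self-contained sketch of the Sinkhorn iteration is easy to give; the grid, bandwidths, and value of eps below are illustrative choices of ours, not values from the cited works.

```python
import numpy as np

def sinkhorn(mu, nu, C, eps, n_iter=2000):
    """Entropy-regularized OT by Sinkhorn matrix scaling (illustrative sketch).

    Solves min_pi <C, pi> + eps * sum(pi * (log(pi) - 1)) over couplings pi
    with marginals mu and nu, by alternating diagonal scalings of the
    Gibbs kernel K = exp(-C / eps).
    """
    K = np.exp(-C / eps)
    v = np.ones_like(nu)
    for _ in range(n_iter):
        u = mu / (K @ v)       # enforce the row marginal
        v = nu / (K.T @ u)     # enforce the column marginal
    return u[:, None] * K * v[None, :]

# Two discrete measures on a 1D grid, quadratic ground cost.
x = np.linspace(0.0, 1.0, 50)
mu = np.exp(-((x - 0.3) ** 2) / 0.01); mu /= mu.sum()
nu = np.exp(-((x - 0.7) ** 2) / 0.01); nu /= nu.sum()
C = (x[:, None] - x[None, :]) ** 2

pi = sinkhorn(mu, nu, C, eps=0.05)
print(np.abs(pi.sum(axis=1) - mu).max(), np.abs(pi.sum(axis=0) - nu).max())
```

The printed marginal errors shrink geometrically with the number of iterations, which is what makes each relaxed JKO step computationally cheap.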
2. Micro- and Macroscopic Origins
A rigorous link between microscopic random particle models and deterministic gradient flows of entropy in Wasserstein space is established via large-deviations principles. For $n$ i.i.d. Brownian particles, the probability that their empirical measure at time $\tau$ is close to a prescribed $\rho$ is exponentially small:

$$\mathbb{P}\left( L_n(\tau) \approx \rho \right) \sim \exp\left( -n \, J_\tau(\rho \mid \rho_0) \right),$$

where $J_\tau$ is a rate functional involving the entropy and the Wasserstein distance between $\rho_0$ and $\rho$ (1004.4076). As $\tau \to 0$, this large-deviation functional $\Gamma$-converges to the time-discretized JKO functional, providing a microscopic justification for Wasserstein gradient flows as the macroscopic limit of the most likely empirical evolution.
The entropic structure is thus physically grounded: the Wasserstein distance arises from Gaussian particle fluctuations, while the entropy term reflects the combinatorial likelihood of microstates (Sanov’s theorem). This correspondence persists in discrete settings, such as for finite Markov chains (1102.5238) and for discrete porous medium models (1212.1129), with appropriate modifications to metrics and entropy definitions.
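The micro-to-macro correspondence can be illustrated with a quick Monte Carlo experiment (all parameter choices below are ours): the empirical measure of i.i.d. Brownian particles spreads at exactly the diffusive rate predicted by the macroscopic heat flow.

```python
import numpy as np

rng = np.random.default_rng(0)

# n i.i.d. standard Brownian particles started from N(0, sigma0^2);
# all parameter choices here are illustrative.
n, sigma0, tau = 200_000, 1.0, 0.5
x0 = rng.normal(0.0, sigma0, size=n)
x_tau = x0 + np.sqrt(tau) * rng.normal(size=n)   # Brownian increment over [0, tau]

# Diffusive (heat-flow) prediction: the variance grows linearly in time.
var_expected = sigma0 ** 2 + tau
print(abs(x_tau.var() - var_expected))
```

For large $n$ the deviation is of order $n^{-1/2}$, consistent with the exponential concentration described by the large-deviation principle.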
3. Variational and Computational Approaches
Implementing entropic Wasserstein gradient flows, especially for large-scale or high-dimensional problems, involves both variational formulations and efficient numerical schemes.
Variational time discretization (JKO): Evolution is approximated by iterative minimization,

$$\rho_{k+1} \in \operatorname*{arg\,min}_{\rho} \left\{ \mathcal{E}(\rho) + \frac{1}{2\tau} W_2^2(\rho, \rho_k) \right\},$$

for an energy $\mathcal{E}$. This forms the theoretical underpinning for weak solutions and long-time behavior (1502.06216, 2105.05677).
Entropic regularization and Sinkhorn algorithm: Entropic regularization converts the Wasserstein distance to an entropic OT divergence, making each discrete JKO step a strictly convex, smooth optimization (often over couplings), solvable by Sinkhorn scaling or KL-proximal alternating minimization (1502.06216). On grids or for discrete data, this reduces to efficient matrix scaling, fast convolution (for translation-invariant costs), or Laplacian-based kernel multiplication (for manifold domains).
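To make the structure of one such step concrete, here is a minimal sketch of a single entropic JKO step for the Boltzmann entropy in the scaling form used by KL-proximal schemes. The closed-form prox update and all parameters (eps, tau, grid) are our illustrative choices, not the exact algorithm of the cited works.

```python
import numpy as np

def entropic_jko_step(rho_k, C, eps, tau, n_iter=1000):
    """One entropic JKO step (sketch):
        rho_{k+1} = argmin_rho  W_eps(rho_k, rho) + tau * E(rho),
    for the entropy E(rho) = sum(rho * (log(rho) - 1)), via alternating
    scalings of the coupling pi = diag(a) K diag(b), K = exp(-C / eps).
    The b-update is the closed-form KL prox of tau * E.
    """
    K = np.exp(-C / eps)
    b = np.ones_like(rho_k)
    for _ in range(n_iter):
        a = rho_k / (K @ b)            # fix the first marginal to rho_k
        s = K.T @ a
        b = s ** (-tau / (eps + tau))  # KL prox: new density = s**(eps/(eps+tau))
    return b * (K.T @ a)               # second marginal of pi = updated density

x = np.linspace(0.0, 1.0, 64)
C = (x[:, None] - x[None, :]) ** 2
rho0 = np.exp(-((x - 0.5) ** 2) / (2 * 0.03 ** 2)); rho0 /= rho0.sum()

rho1 = entropic_jko_step(rho0, C, eps=5e-3, tau=5e-3)
var = lambda p: float(p @ (x - p @ x) ** 2)
print(abs(rho1.sum() - 1.0), var(rho1) - var(rho0))
```

Each step behaves like a small amount of diffusion: the updated density keeps (approximately) unit mass and has strictly larger variance than its predecessor.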
Score-free, particle-based methods: Recent advances demonstrate explicit, score-free forward discretization schemes for gradient flows using iterated Schrödinger bridge steps (2406.10823). These schemes only require samples and Sinkhorn computation, circumventing the need for density or score estimation, and are particularly well-suited for high-dimensional and particle-based simulations.
ICNN-based scalable optimization: For very high-dimensional continuous distributions, convex neural network parameterizations enable pushing forward measures via optimal transport maps, making JKO steps feasible in practice (2106.00736, 2112.02424). These approaches combine stochastic gradient descent, sample-based functionals, and convex representation theory for modeling Wasserstein flows.
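The core structural idea behind ICNNs can be shown in a toy numpy sketch (this is not the full architecture of the cited works): nonnegative combination weights plus convex, nondecreasing activations make the network convex in its input, so its gradient can parameterize a transport map.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy input-convex network (ICNN) sketch: hidden-to-output weights are
# constrained nonnegative and the activation (ReLU) is convex and
# nondecreasing, so x -> f(x) is convex. Sizes and names are illustrative.
d, h = 4, 16
W0 = rng.normal(size=(h, d))           # input weights: unconstrained
A1 = np.abs(rng.normal(size=(1, h)))   # nonnegative combination weights
W1 = rng.normal(size=(1, d))           # direct affine passthrough
relu = lambda z: np.maximum(z, 0.0)

def f(x):
    z1 = relu(W0 @ x)                  # each entry is convex in x
    return float(A1 @ z1 + W1 @ x)     # nonneg. sum of convex terms + affine

# Numerical midpoint-convexity check at random points.
x, y = rng.normal(size=d), rng.normal(size=d)
ok = f((x + y) / 2) <= (f(x) + f(y)) / 2 + 1e-9
print(ok)
```

In the JKO setting, training such a network amounts to optimizing the convex potential whose gradient pushes $\rho_k$ forward to $\rho_{k+1}$.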
4. Extensions and Generalizations
The entropic Wasserstein gradient flow framework extends in several directions:
- Generalized distances and entropies: Beyond the $W_2$ metric, analogous gradient flows can be formulated in conic or spherical Hellinger-Kantorovich distances, which accommodate unbalanced mass transport and broader classes of nonlinear PDEs (1809.03430).
- Wasserstein mirror flows: Flows defined by combining the geometry of one functional (e.g., the squared Wasserstein distance) with the gradient of another (e.g., entropy) lead to mirror-descent analogues in measure spaces, providing new perspectives and acceleration properties (2307.16421). As its regularization parameter vanishes, the Sinkhorn algorithm is shown to interpolate a Wasserstein mirror gradient flow of the relative entropy.
- Non-gradient and constrained flows: Entropic regularization provides a principled extension to non-gradient PDEs (e.g., degenerate diffusions with drift), leveraging variational and stochastic control formulations and unifying classical and non-classical dissipative equations under a generalized framework (2104.04372, 2310.18678).
- Coupled and incompressible multi-phase systems: Systems where multiple phases or densities are evolved jointly under mutual constraints are modeled as coupled Wasserstein gradient flows, often via a fibered Wasserstein metric coupled by volume or mass conservation (2411.13969).
5. Discrete and Graph-Based Settings
Wasserstein gradient flows and their entropic analogues on discrete spaces require novel definitions of transport distances and entropy functionals:
- Discrete Benamou–Brenier formulations: Metrics analogous to $W_2$ can be constructed on finite graphs or Markov chains, with appropriate dynamic representations and continuity equations (1102.5238). These metrics ensure the continuous-time evolution (heat or Fokker–Planck flow) is realized as the gradient flow of entropy in the discrete geometry.
- Non-local and weighted transport metrics: For discrete porous medium equations, non-local transportation metrics parametrized by entropy functions capture the relevant dynamics and are compatible with Gromov–Hausdorff convergence to continuous Wasserstein metrics (1212.1129).
- Graph-based semi-discretizations: Fokker–Planck and related equations on graphs are approximated by gradient flows of discrete free energy, preserving entropy dissipation and convergence to discrete Gibbs measures (1608.02628).
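The discrete entropy-dissipation picture can be checked directly on a toy example (a 3-node graph with parameters of our choosing): the master equation of a reversible Markov chain monotonically decreases the free energy $\mathrm{KL}(p \| \pi)$.

```python
import numpy as np

# Continuous-time Markov chain on a toy 3-node graph (illustrative):
# generator Q with zero row sums; symmetric Q => uniform stationary law pi.
Q = np.array([[-2.0,  1.0,  1.0],
              [ 1.0, -2.0,  1.0],
              [ 1.0,  1.0, -2.0]])
pi = np.ones(3) / 3

def free_energy(p):
    # Discrete free energy: relative entropy KL(p || pi).
    return float(np.sum(p * np.log(p / pi)))

p = np.array([0.80, 0.15, 0.05])
dt = 1e-3
fe = [free_energy(p)]
for _ in range(2000):
    p = p + dt * (Q.T @ p)    # explicit Euler on the master equation p' = Q^T p
    fe.append(free_energy(p))

print(fe[0], fe[-1])
```

The recorded free-energy values decrease at every step and converge to zero as $p$ approaches the Gibbs (here, uniform) measure, mirroring the entropy-dissipation property preserved by the graph-based semi-discretizations above.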
6. Applications and Impact
Entropic Wasserstein gradient flows and their associated computational methods find applications across numerous domains:
- Statistical inference and generative modeling: The entropic Wasserstein framework enables particle-based, score-free algorithms for sampling from complex distributions (e.g., for Bayesian inference), training energy-based models, and scalable generative machine learning (1910.14216, 2106.00736, 2112.02424).
- Inverse problems and data assimilation: Grid-free, entropy-regularized particle methods address ill-posed inverse problems such as density deconvolution and epidemiological incidence reconstruction, offering robust convergence and scalability (2209.09936).
- Crowd and multi-agent simulation: Nonlinear diffusion and congestion models are naturally described as entropic gradient flows, permitting stable simulations of crowd motion and granular media, with consistent treatment of boundary and constraint effects (1502.06216, 2411.13969).
- Unbalanced transport and reaction-diffusion: Generalizations to unbalanced and coupled transport handle biochemical networks, reaction-diffusion systems, and image interpolation, via Hellinger-Kantorovich geometries (1809.03430).
- Network and graph-based modeling: Gradient flow structures on discrete and metric graphs enable analysis and simulation for diffusion, aggregation, and self-organization over networked domains, such as traffic, epidemics, and transport on complex structures (2105.05677).
7. Historical Context and Relations
The theory of entropic Wasserstein gradient flows builds on seminal advances in optimal transport and variational PDEs:
- The JKO scheme [Jordan, Kinderlehrer, Otto, 1998] introduced the variational time-discretization interpreting Fokker–Planck evolution as the gradient flow of entropy in Wasserstein space.
- Otto’s geometric calculus offered a formal Riemannian metric structure, while Ambrosio, Gigli, and Savaré established a rigorous metric space theory for gradient flows of entropy-like functionals.
- Subsequent developments incorporated large deviation theory, stochastic control (Schrödinger bridges), mirror descent, and extensions to discrete spaces, Riemannian manifolds, and unbalanced or constrained probability spaces.
Current research broadens these ideas to mirror flows, coupled multi-phase systems, and machine learning applications, while providing new stochastic, variational, and computational paradigms for high-dimensional analysis and simulation.
Summary Table: Core Elements of Entropic Wasserstein Gradient Flows
Setting | Driving Functional | Metric Structure | Evolution Equation | Numerical Scheme
---|---|---|---|---
Euclidean (continuous) | Entropy/KL | $W_2$ | Heat / Fokker–Planck flow | JKO/implicit Euler, SB
Discrete/Markov chain | Entropy | Maas metric | Markov semigroup / discrete heat flow | Discrete minimization
Entropic OT | Relative entropy + cost | Entropic OT divergence | Sinkhorn iterations / SB diffusion | Sinkhorn, KL-proximal
Multi-phase/coupled | Entropic OT, constraint | Fibered $W_2$ | PDE with pressure, volume constraint | Minimizing movement
Entropic Wasserstein gradient flows thus integrate microscopic statistical behaviors, variational PDE analysis, stochastic control, and efficient simulation schemes into a unified framework, with broad implications in mathematics, computational science, and machine learning.