Wasserstein Gradient Flow & Entropic Schemes
- Wasserstein gradient flow is a geometric framework where probability measures evolve as steepest descent curves of functionals in the 2-Wasserstein metric.
- Entropic regularization and KL-proximal splitting yield a fast, scalable numerical scheme that mitigates the computational cost of optimal transport.
- Applications include nonlinear diffusion, crowd dynamics, and multi-agent systems, ensuring stable approximations and preservation of weak solutions.
Wasserstein gradient flow is a geometric framework describing the evolution of probability measures as steepest descent curves of a functional with respect to the 2-Wasserstein metric. In "Entropic Wasserstein Gradient Flows," Peyré introduces a fast and scalable numerical scheme leveraging entropic regularization and KL-proximal splitting to efficiently and robustly approximate these flows for a broad class of convex and some nonconvex nonlinear evolution equations. The method maintains key modeling properties—such as stability and preservation of weak solutions—while overcoming the chief computational bottlenecks of traditional optimal transport-based time-discretization.
1. Numerical Scheme for Wasserstein Gradient Flows
The foundation of the approach is the time-discrete JKO scheme, which advances the probability density by solving

$$\mu_{k+1} \in \operatorname*{argmin}_{\mu} \; \tfrac{1}{2} W_2^2(\mu, \mu_k) + \tau F(\mu),$$

where $W_2$ is the 2-Wasserstein distance, $\tau > 0$ the time step, and $F$ a given energy functional (e.g., internal or interaction energy, entropy, or indicator constraints). This recursion is an implicit Euler discretization in probability space: each JKO step computes the minimizer of a convex functional involving the Wasserstein metric, a computation that is substantially expensive when the transport problem is solved exactly.
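A classical sanity check (from the original Jordan–Kinderlehrer–Otto analysis, stated here for orientation rather than taken from this paper): when $F$ is the entropy, the JKO iterates recover, as $\tau \to 0$, the solution of the heat equation,

$$F(\mu) = \int \mu \log \mu \,\mathrm{d}x \quad \Longrightarrow \quad \partial_t \mu = \Delta \mu,$$

so any implementation of the scheme can be validated against linear diffusion before tackling nonlinear energies.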
2. Entropic Regularization and KL Proximal Splitting
To make the JKO iterative step tractable, the paper applies entropic regularization to the optimal transport cost. The entropy-regularized (a.k.a. "Sinkhorn") version is

$$W_\gamma(\mu, \nu) = \min_{\pi \in \Pi(\mu,\nu)} \; \langle C, \pi \rangle + \gamma E(\pi),$$

with $E(\pi) = \sum_{i,j} \pi_{ij} (\log \pi_{ij} - 1)$ the negative entropy and $\gamma > 0$ a regularization parameter. The main benefits are:
- Strict convexity and smoothness: the added entropy makes the problem well-conditioned for optimization.
- The optimal transport fidelity term becomes a Kullback-Leibler (KL) divergence to a Gibbs kernel: $\langle C, \pi \rangle + \gamma E(\pi) = \gamma \, \mathrm{KL}(\pi \,\|\, K)$ up to an additive constant, with $K_{ij} = e^{-C_{ij}/\gamma}$.
- The computational complexity shifts from the supra-quadratic cost of exact optimal transport solvers to a near-linear per-iteration cost in the number of points (when the Gibbs kernel is applied by convolution), well suited to GPU or parallel computation.
The trade-off is that a large $\gamma$ yields smoother (blurred) flows, while a small $\gamma$ better approximates the unregularized Wasserstein case.
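To make the scaling iterations concrete, here is a minimal self-contained sketch of Sinkhorn's algorithm for the entropy-regularized transport problem above; the grid, cost matrix, and parameter values are illustrative choices, not taken from the paper.

```python
import numpy as np

def sinkhorn_plan(mu, nu, C, gamma, n_iter=2000):
    """Entropy-regularized OT plan via Sinkhorn's scaling iterations.

    Minimizes <C, pi> + gamma * E(pi) over couplings of (mu, nu)
    by alternately rescaling the rows and columns of the Gibbs kernel.
    """
    K = np.exp(-C / gamma)              # Gibbs kernel K_ij = exp(-C_ij / gamma)
    b = np.ones_like(nu)
    for _ in range(n_iter):
        a = mu / (K @ b)                # enforce row marginals:    pi @ 1  = mu
        b = nu / (K.T @ a)              # enforce column marginals: pi.T @ 1 = nu
    return a[:, None] * K * b[None, :]  # pi = diag(a) K diag(b)

# Two Gaussian bumps on a 1-D grid with squared-distance cost
x = np.linspace(0, 1, 50)
mu = np.exp(-(x - 0.3) ** 2 / 0.01); mu /= mu.sum()
nu = np.exp(-(x - 0.7) ** 2 / 0.01); nu /= nu.sum()
C = (x[:, None] - x[None, :]) ** 2
pi = sinkhorn_plan(mu, nu, C, gamma=0.05)
```

At convergence the plan's marginals match `mu` and `nu`; shrinking `gamma` sharpens the plan but slows the iteration, illustrating the trade-off above.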
3. Algorithmic Implementation
Each JKO step under entropic regularization is formulated as a KL-divergence proximal minimization over couplings, involving additional convex constraints and energy terms:

$$\min_{\pi} \; \mathrm{KL}(\pi \,\|\, K) + \sum_{i} \varphi_i(\pi),$$

where the $\varphi_i$ encode the marginal constraints (e.g., fixing row and column sums), and possibly other convex indicators or energy penalties.
The optimization is carried out by KL-proximal splitting schemes—in particular, a Bregman-Dykstra type algorithm that alternates KL projections for each constraint or proximal operator. All updates are element-wise and amount to iteratively multiplying by (or normalizing rows and columns of) the Gibbs kernel, meaning:
- The main operation is multiplication by the Gibbs kernel, which becomes a Gaussian (or other translation-invariant) filter on Euclidean domains.
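The translation-invariance claim can be checked directly: on a uniform 1-D grid with squared-distance cost, multiplying by the dense Gibbs kernel coincides with convolving against a sampled Gaussian. A sketch with illustrative sizes:

```python
import numpy as np

n, gamma = 64, 0.02
x = np.linspace(0, 1, n)
h = x[1] - x[0]

# Dense Gibbs kernel for the squared-distance cost on a 1-D grid
K = np.exp(-((x[:, None] - x[None, :]) ** 2) / gamma)

# The same linear operator applied as a filter: sample the Gaussian
# at every grid offset and convolve, keeping the central n entries
offsets = np.arange(-(n - 1), n) * h
g = np.exp(-(offsets ** 2) / gamma)

v = np.random.default_rng(0).random(n)
dense = K @ v                                      # O(n^2) matrix product
filtered = np.convolve(v, g)[n - 1 : 2 * n - 1]    # same result via convolution
```

In 2-D or 3-D the filter factorizes across axes (separable Gaussian convolution), which is where the near-linear cost per iteration comes from.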
For domains with geometric structure, such as manifolds or meshes, convolution with the kernel is approximated via heat kernel diffusion, i.e.,

$$K v \;\approx\; \Big(\mathrm{Id} + \tfrac{t}{s} L\Big)^{-s} v,$$

where $L$ is the (positive semidefinite) discrete Laplacian, $t$ a diffusion time playing the role of $\gamma$, and $s$ the number of implicit Euler substeps.
On arbitrary meshes, this translates to a sequence of sparse linear solves, which can be handled efficiently (e.g., via a pre-factorized Cholesky decomposition or multigrid).
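A minimal sketch of this idea on a 1-D grid, assuming a finite-difference Laplacian and SciPy's sparse LU in place of the Cholesky/multigrid solvers mentioned above; the factorization is computed once and reused for every substep.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

n, t, s = 100, 1e-3, 4      # grid points, diffusion time, implicit substeps
h = 1.0 / (n - 1)

# 1-D finite-difference Laplacian with Neumann ends: positive semidefinite
# with zero row sums, so the implicit heat steps conserve total mass
main = np.full(n, 2.0); main[[0, -1]] = 1.0
off = -np.ones(n - 1)
L = sp.diags([main, off, off], [0, 1, -1], format='csc') / h ** 2

# Factor (Id + (t/s) L) once; each substep is then a cheap sparse solve
solver = splu(sp.identity(n, format='csc') + (t / s) * L)

v = np.zeros(n); v[n // 2] = 1.0   # a spike of unit mass
for _ in range(s):
    v = solver.solve(v)            # one implicit Euler heat substep
```

Because the factored matrix has zero-row-sum off-diagonal structure, the result stays nonnegative and sums to the initial mass, mirroring the stability properties the scheme relies on.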
Algorithmic Advantages
- Speed: all updates involve only convolutions or matrix-vector products with structured sparse matrices.
- Parallelizability: all operations are map-reduce or otherwise highly parallel.
- Flexibility: applies to wide classes of functionals, constraints, and domains.
4. Applications
This entropic JKO scheme is illustrated for several modeling scenarios:
- Nonlinear diffusion equations: porous medium equations $\partial_t \mu = \Delta(\mu^m)$, $m > 1$, with functionals corresponding to nonlinear entropies or internal energies.
- Crowd motion and congestion: The inclusion of indicator functions or hard constraints on the maximum density produces flows with congestion effects. Sharp congestion fronts and shocks are captured robustly.
- Generalized flows: Simultaneous evolution of multiple densities (e.g., in multi-species models), whether with interaction potentials (e.g., attraction/repulsion) or summation constraints, is handled by imposing constraints in the proximal splitting.
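To show how these pieces fit together, here is a minimal, heavily hedged sketch of one entropic JKO step for the linear entropy energy $F(\mu) = \sum_i \mu_i(\log \mu_i - 1)$, whose gradient flow is linear (heat) diffusion. It uses generalized Sinkhorn-type scaling updates, in which the entropy's KL-proximal operator reduces to an elementwise power; the function, variable names, and parameter values are illustrative, not the paper's.

```python
import numpy as np

def entropic_jko_step(mu, C, gamma, tau, n_iter=2000):
    """One entropic JKO step for F(mu) = sum mu * (log mu - 1).

    Solves  min_pi KL(pi || K) + (tau/gamma) * F(pi^T 1)  s.t.  pi 1 = mu
    by alternating scaling updates; returns the new density pi^T 1.
    """
    K = np.exp(-C / gamma)
    lam = tau / gamma
    b = np.ones_like(mu)
    for _ in range(n_iter):
        a = mu / (K @ b)              # KL-projection onto {pi 1 = mu}
        s = K.T @ a                   # current column marginal, divided by b
        # prox of lam*F in KL:  argmin_nu KL(nu|s) + lam * sum nu(log nu - 1)
        # has the closed form nu = s**(1/(1+lam)); divide back out to get b
        b = s ** (1.0 / (1.0 + lam)) / s
    return b * (K.T @ a)

x = np.linspace(0, 1, 60)
C = (x[:, None] - x[None, :]) ** 2
mu = np.exp(-(x - 0.5) ** 2 / 0.005); mu /= mu.sum()
mu_next = entropic_jko_step(mu, C, gamma=0.02, tau=0.01)
```

One step spreads a peaked bump while approximately conserving its mass, as expected of a diffusion step; iterating the function advances the flow. For indicator-type energies (e.g., the density cap in crowd models) the plain alternating updates above are no longer sufficient, and the Dykstra-type corrections discussed earlier are needed.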
The approach applies to:
- Regular grids (images, Euclidean lattices)
- Polygonal domains and nonconvex shapes
- Meshes for surfaces or manifolds, using Laplacian-based diffusion
The same methodology extends efficiently to:
- Anisotropic or spatially varying diffusion (by using tensor-valued metrics or weighted Laplacians)
- Multi-component crowd models (by using multi-marginal KL projections).
5. Summary of Key Formulas
Operation | Formula / Description
---|---
Entropic JKO step | $\mu_{k+1} \in \operatorname{argmin}_{\mu} \, W_\gamma(\mu, \mu_k) + \tau F(\mu)$
Entropic OT distance | $W_\gamma(\mu, \nu) = \min_{\pi \in \Pi(\mu,\nu)} \langle C, \pi \rangle + \gamma E(\pi)$
Gibbs kernel (cost/regularization) | $K_{ij} = e^{-C_{ij}/\gamma}$
KL-proximal operator (for $f$) | $\operatorname{prox}^{\mathrm{KL}}_{f}(\bar\pi) = \operatorname{argmin}_{\pi} \, \mathrm{KL}(\pi \,\|\, \bar\pi) + f(\pi)$
Heat kernel approximation (general domains) | $K v \approx (\mathrm{Id} + \tfrac{t}{s} L)^{-s} v$, with $L$ the discrete Laplacian
Conclusion
The entropic Wasserstein gradient flow framework enables the robust, efficient, and parallelizable approximation of complex nonlinear evolution equations regulated by optimal transport geometry. By recasting the problem into a tractable KL-divergence proximal scheme with entropic regularization, it accommodates both theoretical advances (stable, weak solution approximations) and practical demands (speed, scalability, flexibility over domain and structure). This methodology has wide application in simulation of non-linear PDEs, multi-agent systems, and modeling on complex geometric domains, with robustness under tight constraints and highly nonuniform distributions.