Wasserstein Gradient Flows (WGF)
- Wasserstein Gradient Flows are continuous-time dynamical systems that characterize the steepest descent evolution of functionals over probability measures via the 2-Wasserstein metric.
- Discrete-time schemes like the JKO and forward–backward splitting methods provide practical approximations with provable convergence rates under convexity assumptions.
- This framework bridges optimal transport, PDE analysis, and machine learning, enabling rigorous and scalable optimization in infinite-dimensional spaces.
Wasserstein Gradient Flows (WGF) are continuous-time dynamical systems that characterize the steepest descent evolution of a functional over the space of probability measures endowed with the 2-Wasserstein metric. The WGF framework provides a rigorous, geometrically intrinsic generalization of gradient descent to infinite-dimensional spaces, with foundational relevance across optimal transport, partial differential equations, and probabilistic machine learning.
1. The 2-Wasserstein Space: Metric, Geometry, and Geodesics
The space of Borel probability measures on ℝᵈ with finite second moments,
$$\mathcal{P}_2(\mathbb{R}^d) = \Big\{ \mu : \int \|x\|^2 \, d\mu(x) < \infty \Big\},$$
equipped with the 2-Wasserstein distance,
$$W_2^2(\mu, \nu) = \inf_{\pi \in \Pi(\mu, \nu)} \int \|x - y\|^2 \, d\pi(x, y),$$
becomes a geodesic metric space, where $\Pi(\mu, \nu)$ is the set of couplings of μ and ν. When μ is absolutely continuous, the optimal transport map is given by the gradient of a convex function (Brenier's theorem), and constant-speed geodesics can be constructed as pushforwards via interpolated maps: $\mu_t = ((1-t)\,\mathrm{Id} + t\,T)_\# \mu$ for $t \in [0,1]$, where $T$ is the optimal transport map from μ to ν. The geodesic structure is central for defining “steepest descent” in this space (Salim et al., 2020).
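In one dimension the optimal coupling for the quadratic cost is the monotone one, so both the distance and the geodesic can be computed by sorting. The following is a minimal sketch under that assumption, for two empirical measures with the same number of equally weighted atoms; the function names `w2_empirical_1d` and `geodesic_1d` are hypothetical and chosen for illustration only.

```python
import numpy as np

def w2_empirical_1d(x, y):
    """W2 distance between two 1D empirical measures with equally many, equally weighted atoms."""
    xs, ys = np.sort(x), np.sort(y)  # monotone pairing is optimal in 1D
    return np.sqrt(np.mean((xs - ys) ** 2))

def geodesic_1d(x, y, t):
    """Atoms of mu_t = ((1 - t) Id + t T)_# mu, where T pairs sorted atoms of x with sorted atoms of y."""
    xs, ys = np.sort(x), np.sort(y)
    return (1.0 - t) * xs + t * ys

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=1000)   # samples of mu
y = rng.normal(3.0, 2.0, size=1000)   # samples of nu

d = w2_empirical_1d(x, y)
mid = geodesic_1d(x, y, 0.5)          # midpoint of the geodesic

# Constant speed: the midpoint sits at half the total distance from mu.
print(d, w2_empirical_1d(x, mid))
```

The midpoint check illustrates the constant-speed property $W_2(\mu, \mu_t) = t\,W_2(\mu, \nu)$; in higher dimensions the sorting step would have to be replaced by an actual optimal transport solver.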
2. Continuous-Time Formulation: Evolution Equation and Variational Characterization
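For a given functional $\mathcal{G} : \mathcal{P}_2(\mathbb{R}^d) \to (-\infty, +\infty]$, the curve $(\mu_t)_{t \ge 0}$ solving the Wasserstein gradient flow is characterized by the Evolution Variational Inequality (EVI). For a geodesically convex $\mathcal{G}$ it reads, for every reference measure $\nu \in \mathcal{P}_2(\mathbb{R}^d)$:
$$\frac{d}{dt} W_2^2(\mu_t, \nu) \le 2\big(\mathcal{G}(\nu) - \mathcal{G}(\mu_t)\big).$$
Under regularity conditions, this is equivalent to a PDE for the density $\mu_t$:
$$\partial_t \mu_t = \nabla \cdot \Big( \mu_t \, \nabla \frac{\delta \mathcal{G}}{\delta \mu}(\mu_t) \Big),$$
where $\frac{\delta \mathcal{G}}{\delta \mu}$ denotes the first variation of $\mathcal{G}$. For example, if $\mathcal{G}(\mu) = \int F \, d\mu + \int \mu \log \mu \, dx$ (a potential energy plus the negative entropy), the gradient flow yields the Fokker-Planck equation $\partial_t \mu_t = \nabla \cdot (\mu_t \nabla F) + \Delta \mu_t$, a prototypical diffusive evolution (Salim et al., 2020).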
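The Fokker-Planck example can be verified term by term. The derivation below is a sketch that assumes the objective is a potential energy with a smooth potential, written here as $F$, plus the negative entropy $\int \mu \log \mu \, dx$; the symbol names are chosen for illustration.

```latex
\begin{aligned}
\mathcal{G}(\mu) &= \int F \, d\mu + \int \mu \log \mu \, dx, \\
\frac{\delta \mathcal{G}}{\delta \mu}(\mu) &= F + \log \mu + 1, \\
\nabla \frac{\delta \mathcal{G}}{\delta \mu}(\mu) &= \nabla F + \frac{\nabla \mu}{\mu}, \\
\partial_t \mu_t &= \nabla \cdot \big( \mu_t \nabla F + \nabla \mu_t \big)
  = \nabla \cdot (\mu_t \nabla F) + \Delta \mu_t .
\end{aligned}
```

Setting the flux $\mu \nabla F + \nabla \mu$ to zero gives $\mu \propto e^{-F}$, so under these assumptions the stationary solution is the Gibbs density associated with $F$.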
3. Discrete-Time Schemes: JKO and Forward-Backward Splitting
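The canonical time-discretization of the WGF of a functional $\mathcal{G}$ is the Jordan–Kinderlehrer–Otto (JKO) implicit Euler scheme with step size $\gamma > 0$:
$$\mu_{n+1} \in \operatorname*{arg\,min}_{\mu \in \mathcal{P}_2(\mathbb{R}^d)} \Big\{ \mathcal{G}(\mu) + \frac{1}{2\gamma} W_2^2(\mu, \mu_n) \Big\}.$$
This yields a sequence $(\mu_n)_{n \ge 0}$ whose piecewise-constant interpolation converges to the continuous WGF as $\gamma \to 0$.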
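A case where the JKO step is explicit is a pure potential energy $\mu \mapsto \int F \, d\mu$ with convex $F$: the minimizer is the pushforward of $\mu_n$ by the Euclidean proximal map of $\gamma F$. The sketch below assumes the quadratic potential $F(x) = a x^2 / 2$ (so the proximal map is $x \mapsto x / (1 + \gamma a)$) and represents the measure by a handful of particles; it illustrates convergence of the scheme to the continuous flow $x'(t) = -a\,x(t)$ as the step size shrinks. All names are hypothetical.

```python
import numpy as np

def jko_step_quadratic_potential(particles, a, gamma):
    """One JKO step for G(mu) = ∫ F dmu with F(x) = a * x**2 / 2.

    For a potential energy the JKO minimizer is the pushforward of mu_n by
    prox_{gamma F}; for this F, prox_{gamma F}(x) = x / (1 + gamma * a).
    """
    return particles / (1.0 + gamma * a)

def run_jko(particles, a, total_time, n_steps):
    """Apply n_steps JKO steps of size total_time / n_steps."""
    gamma = total_time / n_steps
    for _ in range(n_steps):
        particles = jko_step_quadratic_potential(particles, a, gamma)
    return particles

x0 = np.linspace(-2.0, 2.0, 5)          # initial particle positions
a, total_time = 1.5, 1.0
exact = x0 * np.exp(-a * total_time)    # continuous-time gradient flow at time total_time

# Maximum particle error for decreasing step sizes gamma = 1/10, 1/100, 1/1000.
errors = [np.max(np.abs(run_jko(x0, a, total_time, n) - exact)) for n in (10, 100, 1000)]
print(errors)
```

The error shrinks as the number of steps grows, which is the particle-level counterpart of the convergence statement above for this simple energy.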
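When the objective function decomposes as $\mathcal{G} = \mathcal{E}_F + \mathcal{H}$, where $\mathcal{E}_F(\mu) = \int F \, d\mu$ is a potential energy with $F$ smooth and $\mathcal{H}$ is possibly nonsmooth but geodesically convex, the Forward–Backward (FB) proximal-gradient algorithm over $\mathcal{P}_2(\mathbb{R}^d)$ is defined as:
- Forward (gradient) step for $\mathcal{E}_F$: $\nu_{n+1} = (\mathrm{Id} - \gamma \nabla F)_\# \mu_n$,
- Backward (proximal) step for $\mathcal{H}$: $\mu_{n+1} = \operatorname{JKO}_{\gamma \mathcal{H}}(\nu_{n+1})$,
mirroring the classical Euclidean proximal-gradient method. Here, $\operatorname{JKO}_{\gamma \mathcal{H}}(\nu) = \operatorname*{arg\,min}_{\mu} \big\{ \mathcal{H}(\mu) + \frac{1}{2\gamma} W_2^2(\mu, \nu) \big\}$ is a JKO step for $\mathcal{H}$ only (Salim et al., 2020).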
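The correspondence with the Euclidean setting can be made explicit by writing the two iterations side by side. In the sketch below, $f$ and $h$ are assumed names for the smooth and nonsmooth parts of a Euclidean objective $f + h$, while $F$ and $\mathcal{H}$ play the same roles for the objective over measures.

```latex
\begin{aligned}
\text{Euclidean:} \quad
  & y_{n+1} = x_n - \gamma \nabla f(x_n),
  & x_{n+1} &= \operatorname*{arg\,min}_{x} \Big\{ h(x) + \tfrac{1}{2\gamma} \| x - y_{n+1} \|^2 \Big\}, \\
\text{Wasserstein:} \quad
  & \nu_{n+1} = (\mathrm{Id} - \gamma \nabla F)_\# \mu_n,
  & \mu_{n+1} &= \operatorname*{arg\,min}_{\mu} \Big\{ \mathcal{H}(\mu) + \tfrac{1}{2\gamma} W_2^2(\mu, \nu_{n+1}) \Big\}.
\end{aligned}
```

The only structural changes are that points become measures, the explicit gradient step becomes a pushforward, and the squared Euclidean distance in the proximal problem becomes $W_2^2$.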
4. Convergence Theory for Proximal Splitting and Rates
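Suppose $F$ is $L$-smooth and $\lambda$-strongly convex (with $\lambda \ge 0$, where $\lambda = 0$ is the merely convex case), and $\mathcal{H}$ is proper, lower semicontinuous, and convex along generalized geodesics. If $\gamma < 1/L$, the FB scheme satisfies a discrete EVI with respect to a minimizer $\mu_\star$ of $\mathcal{G} = \mathcal{E}_F + \mathcal{H}$:
$$W_2^2(\mu_{n+1}, \mu_\star) \le (1 - \gamma \lambda)\, W_2^2(\mu_n, \mu_\star) - 2\gamma \big( \mathcal{G}(\mu_{n+1}) - \mathcal{G}(\mu_\star) \big).$$
- If $\lambda = 0$, $\mathcal{G}(\mu_n) - \mathcal{G}(\mu_\star) \le \dfrac{W_2^2(\mu_0, \mu_\star)}{2 \gamma n}$ (sublinear convergence).
- If $\lambda > 0$, $W_2^2(\mu_n, \mu_\star) \le (1 - \gamma \lambda)^n \, W_2^2(\mu_0, \mu_\star)$ (linear convergence).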
This result establishes WGF-FB as an infinite-dimensional analog of the proximal gradient method, retaining convergence guarantees familiar from convex Euclidean optimization (Salim et al., 2020).
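The linear rate can be checked numerically in a one-dimensional Gaussian setting. The sketch below assumes the potential $F(x) = (x - b)^2 / (2v)$, so that the minimizer of potential energy plus negative entropy is $N(b, v)$ and $L = \lambda = 1/v$. It also assumes that the backward step can be restricted to Gaussians, in which case it reduces to minimizing $-\log s + (s - \sigma)^2 / (2\gamma)$ over the standard deviation $s$; this closed form is derived here for illustration and is not quoted from the source. Function names are hypothetical.

```python
import numpy as np

def fb_step_gaussian_1d(mean, std, target_mean, target_var, gamma):
    """One FB step on N(mean, std^2) for F(x) = (x - b)^2 / (2 v) plus negative entropy."""
    # Forward step: pushforward by the affine map x -> x - gamma * F'(x).
    shrink = 1.0 - gamma / target_var
    mean = shrink * mean + gamma * target_mean / target_var
    std = shrink * std
    # Backward step (assumed restricted to Gaussians): positive root of
    # s^2 - std * s - gamma = 0, the minimizer of -log(s) + (s - std)^2 / (2 gamma).
    std = 0.5 * (std + np.sqrt(std ** 2 + 4.0 * gamma))
    return mean, std

def w2_sq_gaussian_1d(m1, s1, m2, s2):
    """Squared W2 distance between N(m1, s1^2) and N(m2, s2^2)."""
    return (m1 - m2) ** 2 + (s1 - s2) ** 2

b, v = 2.0, 0.5                 # target N(b, v)
L = lam = 1.0 / v               # smoothness and strong convexity constants of F
gamma = 0.5 / L                 # step size below 1 / L
mean, std = -3.0, 4.0           # initial Gaussian
w0 = w2_sq_gaussian_1d(mean, std, b, np.sqrt(v))

bound_holds = True
for n in range(1, 51):
    mean, std = fb_step_gaussian_1d(mean, std, b, v, gamma)
    wn = w2_sq_gaussian_1d(mean, std, b, np.sqrt(v))
    # Compare against the linear rate (1 - gamma * lambda)^n * W2^2(mu_0, mu_star).
    bound_holds = bound_holds and bool(wn <= (1.0 - gamma * lam) ** n * w0 + 1e-12)

print(bound_holds, mean, std ** 2)
```

In this example the target $N(b, v)$ is a fixed point of the recursion, so the iterates converge to the minimizer itself rather than to a step-size-dependent approximation of it.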
5. Practical Implementation, Computational Aspects, and Examples
Continuous-time WGF enjoys exact decay rates, while discrete-time schemes (JKO, FB) match these rates up to step-size constraints. The main numerical challenge is evaluating the proximal map (JKO subproblem), which, depending on the functional being minimized, may admit:
- Closed-form solutions (e.g., negative entropy/heat flow),
- PDE-based solvers (for more complex energies),
- Entropic regularization or Sinkhorn algorithms for approximation.
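To illustrate the last option, the following is a minimal sketch of Sinkhorn scaling for the entropically regularized transport problem between two discrete measures. It only approximates the $W_2^2$ term that appears inside a JKO subproblem and is not a full JKO solver; the grid, the regularization strength `eps`, and the function name `sinkhorn_plan` are assumptions made for illustration.

```python
import numpy as np

def sinkhorn_plan(a, b, cost, eps, n_iters=1000):
    """Approximate entropic-OT plan between discrete measures a and b via Sinkhorn scaling."""
    K = np.exp(-cost / eps)              # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)                  # rescale to match the row marginal a
        v = b / (K.T @ u)                # rescale to match the column marginal b
    return u[:, None] * K * v[None, :]

# Two uniform discrete measures on shifted grids; the exact W2^2 is 0.25.
x = np.linspace(0.0, 1.0, 20)
y = np.linspace(0.5, 1.5, 20)
a = np.full(20, 1.0 / 20)
b = np.full(20, 1.0 / 20)
cost = (x[:, None] - y[None, :]) ** 2

plan = sinkhorn_plan(a, b, cost, eps=0.1)
approx_w2_sq = np.sum(plan * cost)       # transport cost of the regularized plan
print(approx_w2_sq)
```

The regularized plan is blurrier than the exact one, so its transport cost overestimates $W_2^2$; smaller `eps` tightens the approximation at the price of slower convergence and, for very small values, numerical underflow in the kernel.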
FB splitting reduces the implicit computation to the nonsmooth part $\mathcal{H}$ only, with the smooth part $F$ handled by a simple push-forward. In the canonical quadratic-plus-entropy example (sampling from a Gaussian), each FB step maintains Gaussianity, and closed-form recursions for mean and covariance yield linear W₂-convergence. Particle-based (sample-wise) push-forward strategies with optional heat flow accurately reflect continuous-time contraction, even in high dimensions (Salim et al., 2020).
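A particle-based variant can be sketched as follows. Note the substitution: instead of the JKO step for the entropy, each iteration applies an exact heat-flow step of duration $\gamma$ (adding Gaussian noise of variance $2\gamma$), which yields a Langevin-type update rather than the exact FB scheme and carries a step-size-dependent bias. The Gaussian target $N(b, v)$, the step size, and all names are assumptions made for illustration.

```python
import numpy as np

def forward_heat_step(particles, grad_F, gamma, rng):
    """Forward (gradient) step on each particle, then an exact heat-flow step of duration gamma."""
    pushed = particles - gamma * grad_F(particles)      # samples of (Id - gamma grad F)_# mu_n
    noise = rng.standard_normal(particles.shape)
    return pushed + np.sqrt(2.0 * gamma) * noise        # heat flow: add N(0, 2 gamma) noise

b, v = 2.0, 0.5                                         # target N(b, v), i.e. F(x) = (x - b)^2 / (2 v)

def grad_F(x):
    return (x - b) / v

rng = np.random.default_rng(0)
particles = rng.normal(-3.0, 4.0, size=20000)           # samples of the initial measure
gamma = 0.05
for _ in range(400):
    particles = forward_heat_step(particles, grad_F, gamma, rng)

sample_mean, sample_var = particles.mean(), particles.var()
# For this linear update the stationary variance is v / (1 - gamma / (2 v)), slightly above v.
biased_var = v / (1.0 - gamma / (2.0 * v))
print(sample_mean, sample_var, biased_var)
```

The particles contract toward the target mean at the rate of the continuous flow, while the variance settles slightly above $v$; the gap closes as $\gamma \to 0$, which is the bias that an exact backward step avoids in the Gaussian case.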
6. Extensions, Applications, and Open Directions
The FB splitting framework for Wasserstein gradient flows enables:
- Handling composite objectives with both smooth and nonsmooth contributions,
- Direct generalization from Euclidean optimization,
- Provable convergence under geodesic convexity,
- Scalability to high dimensions when approximate or closed-form JKO operators are available.
Ongoing research targets efficient algorithms for more general energy landscapes (including non-convex energies, non-Euclidean underlying domains), adaptive schemes, high-dimensional and large-scale applications, and connections to stochastic optimization and sampling (Salim et al., 2020).
Table: Summary of Classical vs. Proximal-Splitting WGF Schemes
| Method | Iteration Definition | Complexity per Step |
|---|---|---|
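| JKO | $\mu_{n+1} = \operatorname{JKO}_{\gamma \mathcal{G}}(\mu_n)$ | Full proximal (often hard/expensive) |
| FB-Splitting | First pushforward by $\mathrm{Id} - \gamma \nabla F$, then $\operatorname{Prox}_{\gamma \mathcal{H}}$ (the JKO step for $\mathcal{H}$) | Cheaper: only $\operatorname{Prox}_{\gamma \mathcal{H}}$ |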
The Wasserstein Proximal Gradient framework thus defines and analyzes an efficient and theoretically well-founded approach to composite optimization over the space of measures, with direct applicability to variational inference, sampling, and PDE evolution models (Salim et al., 2020).