- The paper shows that diffusion and flow matching models can be formulated as gradient flows and geodesics, respectively, in Wasserstein space, unifying two seemingly disparate generative approaches.
- It rigorously connects the Fokker-Planck equation with the JKO scheme, highlighting energy minimization principles underlying DDPM and score-based models.
- The study shows that flow matching provides efficient deterministic sampling by tracking optimal transport geodesics, in contrast to the entropy-driven paths of free-energy gradient flows.
The Geometry Behind Diffusion and Flow Matching: Gradient Flows and Geodesics in Wasserstein Space
Introduction
This paper provides a comprehensive geometric framework for analyzing two families of generative models, diffusion models and flow matching, by formalizing their relationship in terms of gradient flows and geodesics on the manifold of probability measures equipped with the quadratic Wasserstein metric. Specifically, the authors treat the space of probability measures with finite second moments, $\mathcal{P}_2(\mathbb{R}^d)$, as a formal infinite-dimensional Riemannian manifold whose structure is induced by $W_2$. They then systematically relate stochastic differential equation (SDE)-based diffusion generative models and deterministic flow matching models to complementary variational principles on this space.
Wasserstein Geometry and the Continuity Equation
The key insight is that $W_2$ not only defines a metric but also canonically endows $\mathcal{P}_2(\mathbb{R}^d)$ with the structure of a (formal) Riemannian manifold in the sense of Otto. The tangent space at a measure $p$ consists of gradient vector fields in $L^2(p)$, with the inner product
$$\langle v_1, v_2 \rangle_p = \int \langle v_1(x), v_2(x) \rangle \, p(x) \, dx.$$
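This geometric framing is critical: the Benamou-Brenier formula establishes that $W_2$ is the geodesic distance, with the minimal kinetic energy path between two distributions characterized by
$$W_2^2(\mu_0, \mu_1) = \inf_{(p_t, v_t)} \int_0^1 \int |v_t(x)|^2 \, p_t(x) \, dx \, dt,$$
where $(p_t, v_t)$ solves the continuity equation $\partial_t p_t + \nabla \cdot (p_t v_t) = 0$ with $p_0 = \mu_0$ and $p_1 = \mu_1$.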
The continuity equation is thus interpreted as the "equation of motion" on Wasserstein space: evolution of probability measures via a velocity field is the geometric analog of a curve on a Riemannian manifold.
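As a small numerical illustration of the Benamou-Brenier formula, the sketch below uses one-dimensional Gaussians, a setting chosen here for convenience and not taken from the paper. In 1D the optimal map between Gaussians is affine, each particle moves at constant velocity, and the kinetic action should match the closed-form squared $W_2$ distance. All function names and parameter values are hypothetical.

```python
import random


def w2_squared_gaussian_1d(m0, s0, m1, s1):
    """Closed-form squared W2 distance between N(m0, s0^2) and N(m1, s1^2)."""
    return (m0 - m1) ** 2 + (s0 - s1) ** 2


def kinetic_action_of_displacement(m0, s0, m1, s1, n_samples=200_000, seed=0):
    """Monte Carlo estimate of the kinetic action along the OT interpolation."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x0 = rng.gauss(m0, s0)
        x1 = m1 + (s1 / s0) * (x0 - m0)  # monotone (optimal) map in 1D
        velocity = x1 - x0  # constant in t along x_t = (1 - t) * x0 + t * x1
        total += velocity ** 2  # time integral over [0, 1] of a constant
    return total / n_samples


# Illustrative endpoints (assumed): N(0, 1) and N(3, 2^2).
closed_form = w2_squared_gaussian_1d(0.0, 1.0, 3.0, 2.0)
estimate = kinetic_action_of_displacement(0.0, 1.0, 3.0, 2.0)
print(closed_form, estimate)
```

The two printed numbers should agree up to Monte Carlo error; any other admissible path between the same endpoints would have a kinetic action at least as large.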
Gradient Flows, Free Energy, and Fokker-Planck Equation
Diffusion generative models, such as DDPM and score-based SDEs, are shown to correspond precisely to the gradient flow of the free energy functional
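$$\mathcal{F}[p] = \int V(x) \, p(x) \, dx + \int p(x) \log p(x) \, dx$$
with respect to the Wasserstein metric. Up to an additive constant, this is the KL divergence from $p$ to the target (equilibrium) distribution proportional to $e^{-V}$. The Fokker-Planck equation,
$$\partial_t p_t = \nabla \cdot (p_t \nabla V) + \Delta p_t,$$
is rigorously derived as the steepest descent (gradient flow) dynamics of $\mathcal{F}$.
A central theorem established is that the velocity field in the Fokker-Planck equation is the negative of the Wasserstein gradient of the free energy, $\operatorname{grad}_{W_2} \mathcal{F}$:
$$v_t = -\operatorname{grad}_{W_2} \mathcal{F}[p_t] = -\nabla \frac{\delta \mathcal{F}}{\delta p}[p_t] = -\nabla V - \nabla \log p_t.$$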
Thus, the evolution dictated by popular diffusion generative models is understood as a continuous path of steepest descent in distributional space, minimizing free energy monotonically. The rate of decrease is quantified by the relative Fisher information.
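The monotone decrease and its rate can be checked in closed form for a simple case. The sketch below assumes a quadratic potential $V(x) = x^2/2$ in one dimension (an Ornstein-Uhlenbeck process with standard normal equilibrium) and a Gaussian initial condition, so the marginals stay Gaussian. These choices, the function names, and the parameter values are illustrative assumptions, not the paper's setup.

```python
import math


def ou_moments(t, m0, s0):
    """Mean and std of p_t for dX = -X dt + sqrt(2) dW started at N(m0, s0^2)."""
    mean = m0 * math.exp(-t)
    var = 1.0 + (s0 ** 2 - 1.0) * math.exp(-2.0 * t)
    return mean, math.sqrt(var)


def kl_to_standard_normal(mean, std):
    """KL(N(mean, std^2) || N(0, 1)): the free energy up to a constant."""
    return 0.5 * (std ** 2 + mean ** 2 - 1.0 - math.log(std ** 2))


def relative_fisher_information(mean, std):
    """E_p |d/dx log(p / p_inf)|^2 for p = N(mean, std^2), p_inf = N(0, 1)."""
    return mean ** 2 + (std - 1.0 / std) ** 2


def dissipation_gap(t, m0=2.0, s0=0.5, h=1e-5):
    """|dKL/dt + Fisher| using a central finite difference in time."""
    kl_plus = kl_to_standard_normal(*ou_moments(t + h, m0, s0))
    kl_minus = kl_to_standard_normal(*ou_moments(t - h, m0, s0))
    dkl_dt = (kl_plus - kl_minus) / (2.0 * h)
    return abs(dkl_dt + relative_fisher_information(*ou_moments(t, m0, s0)))


times = [0.1 * k for k in range(1, 30)]
kl_values = [kl_to_standard_normal(*ou_moments(t, 2.0, 0.5)) for t in times]
max_gap = max(dissipation_gap(t) for t in times)
print(kl_values[0], kl_values[-1], max_gap)
```

In this Gaussian case the KL values decrease along the flow, and the finite-difference time derivative matches minus the relative Fisher information up to discretization error.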
The JKO Scheme and Discretizations
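The JKO (Jordan-Kinderlehrer-Otto) scheme, the implicit Euler discretization of Wasserstein gradient flows, emerges as a geometrically principled approach to progressive denoising. For step size $\tau > 0$ and free energy $\mathcal{F}$, the implicit step
$$p_{k+1} = \arg\min_{p \in \mathcal{P}_2(\mathbb{R}^d)} \left\{ \mathcal{F}[p] + \frac{1}{2\tau} W_2^2(p, p_k) \right\}$$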
recovers denoising diffusion probabilistic models (DDPM), DDIM, as well as energy matching and other generative frameworks as special cases, with the update interpretable as a proximal step in probability space rather than a heuristic "denoising" move.
Strong convergence results are available under suitable convexity and growth conditions on the potential $V$: as the step size $\tau \to 0$, the iterates recover the continuous Wasserstein gradient flow (Fokker-Planck dynamics).
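A minimal sketch of this convergence, under strong simplifying assumptions: the potential is $V(x) = x^2/2$ in one dimension, and the proximal minimization is restricted to Gaussian measures, for which both the free energy and $W_2$ have closed forms. The restriction to Gaussians, the function names, and the parameter values are assumptions made for illustration; this is not a general JKO solver.

```python
import math


def jko_step_gaussian(mean_k, std_k, tau):
    """One proximal step for V(x) = x^2 / 2, minimizing over 1D Gaussians only.

    Objective: F(m, s) + ((m - mean_k)^2 + (s - std_k)^2) / (2 * tau),
    with F(m, s) = (m^2 + s^2) / 2 - log(s) + const.
    """
    mean = mean_k / (1.0 + tau)  # root of m + (m - mean_k) / tau = 0
    # Positive root of (1 + tau) * s^2 - std_k * s - tau = 0.
    disc = math.sqrt(std_k ** 2 + 4.0 * tau * (1.0 + tau))
    std = (std_k + disc) / (2.0 * (1.0 + tau))
    return mean, std


def run_jko(mean0, std0, tau, total_time):
    """Iterate the proximal step until time total_time is reached."""
    mean, std = mean0, std0
    for _ in range(round(total_time / tau)):
        mean, std = jko_step_gaussian(mean, std, tau)
    return mean, std


def exact_fokker_planck(mean0, std0, t):
    """Exact Gaussian solution of the Fokker-Planck equation for V = x^2 / 2."""
    mean = mean0 * math.exp(-t)
    std = math.sqrt(1.0 + (std0 ** 2 - 1.0) * math.exp(-2.0 * t))
    return mean, std


def jko_error(tau, mean0=2.0, std0=0.5, total_time=1.0):
    """Largest moment mismatch between JKO iterates and the exact flow."""
    mean, std = run_jko(mean0, std0, tau, total_time)
    exact_mean, exact_std = exact_fokker_planck(mean0, std0, total_time)
    return max(abs(mean - exact_mean), abs(std - exact_std))


errors = [jko_error(tau) for tau in (0.1, 0.01, 0.001)]
print(errors)
```

The error should shrink roughly in proportion to the step size, consistent with the first-order accuracy expected of an implicit Euler scheme.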
Flow Matching, Optimal Transport Geodesics, and Benamou-Brenier
In contrast to diffusion models, flow matching approaches (e.g., conditional flow matching, OT flow) do not realize a gradient flow but rather track the Wasserstein geodesic (the OT interpolation) between two endpoint measures. This is formalized as a Benamou-Brenier minimization (minimum kinetic action path), which is a boundary value rather than initial value problem:
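$$\min_{(p_t, v_t)} \int_0^1 \int |v_t(x)|^2 \, p_t(x) \, dx \, dt$$
subject to $\partial_t p_t + \nabla \cdot (p_t v_t) = 0$, $p_0 = \mu_0$, $p_1 = \mu_1$.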
Empirically and theoretically, these OT geodesics are straight lines in distribution space, requiring less kinetic energy and yielding more efficient sampling with fewer integration steps compared to the tortuous (entropy-driven) paths traced by gradient flows of free energy. This reflects the optimality, in the transport metric, of flow matching paths.
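The step-count advantage can be seen in a toy case where the geodesic velocity field is known exactly. The sketch below assumes a 1D transport from $N(0, 1)$ to $N(m_1, s_1^2)$, whose optimal map is affine; in practice the velocity field is a trained network and the endpoints are not Gaussian, so exact one-step sampling is an idealization. Function names and values are hypothetical.

```python
def ot_velocity(x, t, m1, s1):
    """Velocity field of the W2 geodesic from N(0, 1) to N(m1, s1^2) in 1D."""
    # Invert x_t = (1 - t) * x0 + t * (m1 + s1 * x0) to recover the start point.
    x0 = (x - t * m1) / (1.0 - t + t * s1)
    # Displacement T(x0) - x0, which is constant along each particle's path.
    return m1 + (s1 - 1.0) * x0


def euler_sample(x0, n_steps, m1, s1):
    """Integrate dx/dt = v_t(x) from t = 0 to t = 1 with explicit Euler."""
    x, dt = x0, 1.0 / n_steps
    for k in range(n_steps):
        x += dt * ot_velocity(x, k * dt, m1, s1)
    return x


# Illustrative target (assumed): N(3, 2^2); the exact endpoint is 3 + 2 * x0.
starts = [-1.5, 0.0, 0.7]
one_step = [euler_sample(x0, 1, 3.0, 2.0) for x0 in starts]
eight_steps = [euler_sample(x0, 8, 3.0, 2.0) for x0 in starts]
targets = [3.0 + 2.0 * x0 for x0 in starts]
print(one_step, eight_steps, targets)
```

Because every particle travels in a straight line at constant velocity, a single Euler step already lands on the exact endpoint, and adding steps changes nothing beyond rounding error.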
Unified Perspective: Complementary Variational Principles
The paper's central contribution is to formalize that the two families of models, diffusions and flow matching, are neither unrelated nor merely analogous but are exact instantiations of complementary variational principles on Wasserstein space:
- Diffusion models (Fokker-Planck/score-based): Each step is an implicit Euler (JKO) update, defining a gradient flow of the free energy—an initial value problem.
- Flow Matching models: Each trajectory is the minimal action path between endpoints—an optimal transport geodesic, a boundary value problem.
Both reach the same endpoints (simple base to data distribution) but via distinct geometric paths.
The "probability flow ODE" connects the two, expressing the deterministic trajectory whose marginal evolution coincides with the SDE-induced Fokker-Planck flow, absorbing the stochastic diffusion into the velocity field via the score. Flow Matching appears as the special case where the optimal interpolation is a straight line with constant velocity.
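To make the probability flow ODE concrete, the sketch below integrates it for a 1D Ornstein-Uhlenbeck process with potential $V(x) = x^2/2$ and a Gaussian initial condition, where the score is available in closed form. This setting, the function names, and the parameter values are assumptions for illustration; in a trained model the score would come from a network.

```python
import math


def ou_moments(t, m0, s0):
    """Mean and variance of p_t for dX = -X dt + sqrt(2) dW from N(m0, s0^2)."""
    mean = m0 * math.exp(-t)
    var = 1.0 + (s0 ** 2 - 1.0) * math.exp(-2.0 * t)
    return mean, var


def probability_flow_velocity(x, t, m0, s0):
    """v_t(x) = -V'(x) - d/dx log p_t(x) for V(x) = x^2 / 2 and Gaussian p_t."""
    mean, var = ou_moments(t, m0, s0)
    score = -(x - mean) / var
    return -x - score


def integrate_probability_flow(x0, m0, s0, t_end, n_steps):
    """Explicit Euler integration of the deterministic probability flow ODE."""
    x, dt = x0, t_end / n_steps
    for k in range(n_steps):
        x += dt * probability_flow_velocity(x, k * dt, m0, s0)
    return x


def exact_flow_map(x0, m0, s0, t_end):
    """Exact ODE solution: the particle keeps its standardized position."""
    mean, var = ou_moments(t_end, m0, s0)
    return mean + math.sqrt(var) / s0 * (x0 - m0)


# Illustrative values (assumed): start at x0 = 2.5 under N(2, 0.5^2).
exact = exact_flow_map(2.5, 2.0, 0.5, 1.0)
coarse_error = abs(integrate_probability_flow(2.5, 2.0, 0.5, 1.0, 4) - exact)
fine_error = abs(integrate_probability_flow(2.5, 2.0, 0.5, 1.0, 4000) - exact)
print(coarse_error, fine_error)
```

Unlike a straight-line geodesic, this trajectory is curved, so a coarse Euler discretization leaves a visible error that only shrinks as the number of steps grows.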
Practical and Theoretical Implications
By grounding the connection between generative models and Wasserstein geometry, the paper justifies why diffusion models and flow matching models make different, but principled, choices regarding the trade-off between entropy, energy, and path optimality. The JKO viewpoint directly motivates energy-based and equilibrium matching approaches, where a single time-independent potential naturally encodes both transport and equilibrium. Conversely, the boundary value structure of flow matching delivers significant efficiency advantages for deterministic sampling.
This geometric lens provides:
- Formal explanations for empirical phenomena (e.g., why flow matching can require drastically fewer steps than diffusion).
- A systematic taxonomy of generative modeling algorithms based on their associated variational principles.
- Pathways for new algorithms (e.g., explicit JKO implementations, adaptive free energy schedules, hybrid flows/geodesics) by interpolating between the two principles.
- Foundational language to analyze questions of training dynamics, convergence, and sample quality in terms of geometric and functional-analytic properties.
Conclusion
The paper rigorously establishes that both diffusion and flow matching generative models are instantiations of fundamental geometric flows on the Wasserstein manifold of probability measures: gradient flows of free energy (Fokker-Planck/Wasserstein descent) or geodesics/minimal action curves (optimal transport). The continuity equation, the gradient flow formalism (Fokker-Planck/JKO), and the Benamou-Brenier formulation are shown to be not only implicit in existing models but also organizing principles that unify a wide class of generative methodologies in a single mathematical language.
This paradigm enables both a mechanistic understanding of algorithmic behavior and a blueprint for future developments in geometric, variational, and optimization-based generative modeling (2606.24157).