The Geometry Behind Diffusion and Flow Matching: Gradient Flows and Geodesics in Wasserstein Space

Published 23 Jun 2026 in cs.AI | (2606.24157v1)

Abstract: The space $\mathcal{P}_2(\mathbb{R}^d)$ of probability measures with finite second moment carries a natural geometry: the quadratic Wasserstein distance $W_2$ makes it a complete metric space and, following Otto, a (formal) Riemannian manifold whose geodesics are the optimal-transport interpolations. On this manifold, the gradient flow of the free energy $F(\rho) = \mathrm{KL}(\rho \,\|\, \pi)$ is exactly the Fokker-Planck equation, and its implicit-Euler discretization is the JKO scheme. This is the geometry underlying diffusion models: the forward process descends the free energy, and each denoising step realizes one JKO step, which recovers DDPM, DDIM, NCSN/SMLD, and Energy Matching; this is one scheme, not separate theories. The same manifold supports a second variational principle. Its geodesics - the minimum-action curves of the Benamou-Brenier formula - are precisely the optimal-transport paths that Flow Matching learns. Fixing both endpoints and following the geodesic, generation becomes a deterministic ODE along a straight line, hence far fewer sampling steps. Placing both families of models on one manifold makes their relationship exact: diffusion follows a free-energy gradient flow, an initial-value problem; optimal-transport Flow Matching follows a Wasserstein geodesic, a boundary-value problem. The two reach the same endpoints along different paths.

Authors (2)

Summary

The paper shows that diffusion and flow matching models can be reformulated as gradient flows and geodesics in Wasserstein space, unifying generative approaches that are usually treated separately.
It connects the Fokker-Planck equation with the JKO scheme, highlighting the free-energy minimization principle underlying DDPM and score-based models.
The study shows that flow matching provides efficient deterministic sampling by tracking optimal-transport geodesics, in contrast to the entropy-driven paths of free-energy gradient flows.

Introduction

This paper provides a comprehensive geometric framework for analyzing two families of generative models, diffusion models and flow matching, by formalizing their relationship in terms of gradient flows and geodesics on the manifold of probability measures equipped with the quadratic Wasserstein metric. Specifically, the authors interpret the space of probability measures with finite second moments, $\mathcal{P}_2(\mathbb{R}^d)$, as a formal infinite-dimensional Riemannian manifold, leveraging the structure induced by $W_2$, and then systematically relate stochastic differential equation (SDE)-based diffusion generative models and deterministic flow matching generative models to complementary variational principles on this space.

Wasserstein Geometry and the Continuity Equation

The key insight is that $W_2$ not only defines a metric, but also canonically endows $\mathcal{P}_2(\mathbb{R}^d)$ with the structure of a (formal) Riemannian manifold in the sense of Otto. The tangent space at a measure $p$ consists of gradient vector fields that are square-integrable with respect to $p$, with the inner product

$(v_1, v_2)_p = \int \langle v_1(x), v_2(x) \rangle p(x) dx.$

This geometric framing is critical: the Benamou-Brenier formula establishes that $W_2$ is the geodesic distance, with the minimal kinetic energy path between two distributions characterized by

$W_2^2(\mu_0, \mu_1) = \inf_{(p_t, v_t)} \int_0^1 \int |v_t(x)|^2 p_t(x) dx\,dt$

where $(p_t, v_t)$ solves the continuity equation $\partial_t p_t + \nabla \cdot (p_t v_t) = 0$ with endpoints $p_0 = \mu_0$ and $p_1 = \mu_1$.

The continuity equation is thus interpreted as the "equation of motion" on Wasserstein space: evolution of probability measures via a velocity field is the geometric analog of a curve on a Riemannian manifold.
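The Benamou-Brenier identity can be checked numerically in one dimension, where the optimal coupling simply pairs quantiles in order. The sketch below is illustrative only: the Gaussian endpoints, the particle count, and all function names are assumptions introduced here, not taken from the paper. It compares the closed-form $W_2^2$ between two Gaussians with the kinetic action of particles that move along straight lines between paired quantiles.

```python
import numpy as np
from statistics import NormalDist


def gaussian_w2_squared(m0, s0, m1, s1):
    """Closed-form W_2^2 between N(m0, s0^2) and N(m1, s1^2) in 1D."""
    return (m0 - m1) ** 2 + (s0 - s1) ** 2


def quantile_particles(mean, std, n):
    """Deterministic particles: the n mid-point quantiles of N(mean, std^2), sorted."""
    dist = NormalDist(mean, std)
    levels = (np.arange(n) + 0.5) / n
    return np.array([dist.inv_cdf(u) for u in levels])


def straight_line_action(x0, x1):
    """Kinetic action of x_t = (1 - t) x0 + t x1 over t in [0, 1].

    Each particle has constant velocity x1 - x0, so the time integral of
    mean(|v_t|^2) reduces to mean(|x1 - x0|^2).
    """
    velocity = x1 - x0
    return float(np.mean(velocity ** 2))


# Illustrative endpoints (assumed): N(0, 1) and N(3, 2^2).
x0 = quantile_particles(0.0, 1.0, 2000)
x1 = quantile_particles(3.0, 2.0, 2000)
action = straight_line_action(x0, x1)
closed_form = gaussian_w2_squared(0.0, 1.0, 3.0, 2.0)
print(f"particle action: {action:.4f}, closed-form W_2^2: {closed_form:.4f}")
```

The two numbers agree up to the discretization error of the quantile grid, which is the particle-level statement that the straight-line (displacement) interpolation attains the infimum in the formula.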

Gradient Flows, Free Energy, and Fokker-Planck Equation

Diffusion generative models, such as DDPM and score-based SDEs, are shown to correspond precisely to the gradient flow of the free energy functional

$F(p) = \int V(x)\, p(x)\, dx + \int p(x) \log p(x)\, dx$

with respect to the Wasserstein metric. Up to an additive constant, this is the KL divergence from $p$ to the target (equilibrium) distribution proportional to $e^{-V}$. The Fokker-Planck equation,

$\partial_t p_t = \nabla \cdot (p_t \nabla V) + \Delta p_t,$

is rigorously derived as the steepest descent (gradient flow) dynamics of $F$.

A central theorem established is that the "velocity field" in the Fokker-Planck equation is minus the Wasserstein gradient of the free energy, $\nabla_{W_2} F(p_t) = \nabla \frac{\delta F}{\delta p}(p_t)$:

$v_t = -\nabla \frac{\delta F}{\delta p}(p_t) = -\nabla V - \nabla \log p_t.$

Thus, the evolution dictated by popular diffusion generative models is understood as a continuous path of steepest descent in distributional space, minimizing free energy monotonically. The rate of decrease is quantified by the relative Fisher information.
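The dissipation statement can be made concrete in a case where everything is available in closed form. The sketch below assumes a quadratic potential $V(x) = x^2/2$ (so the target is $N(0, 1)$) and a Gaussian initial condition, for which the Fokker-Planck solution stays Gaussian. These choices and the function names are illustrative assumptions, not the paper's setup. It checks that the free energy decreases along the flow and that its time derivative matches minus the relative Fisher information.

```python
import numpy as np


def ou_moments(t, m0, var0):
    """Mean and variance at time t of dX = -X dt + sqrt(2) dW started at N(m0, var0)."""
    mean = m0 * np.exp(-t)
    var = 1.0 + (var0 - 1.0) * np.exp(-2.0 * t)
    return mean, var


def free_energy(mean, var):
    """KL( N(mean, var) || N(0, 1) ): the free energy for V(x) = x^2 / 2."""
    return 0.5 * (var + mean ** 2 - 1.0 - np.log(var))


def relative_fisher_information(mean, var):
    """E_p |d/dx log(p / pi)|^2 for p = N(mean, var) and pi = N(0, 1)."""
    return mean ** 2 + (var - 1.0) ** 2 / var


# Assumed initial condition N(2, 0.25); time grid on [0, 3].
times = np.linspace(0.0, 3.0, 3001)
means, variances = ou_moments(times, m0=2.0, var0=0.25)
energies = free_energy(means, variances)

# Finite-difference dF/dt along the flow, compared with -I(p_t | pi).
dF_dt = np.gradient(energies, times)
fisher = relative_fisher_information(means, variances)
max_gap = float(np.max(np.abs(dF_dt[1:-1] + fisher[1:-1])))
print(f"F(0) = {energies[0]:.4f}, F(3) = {energies[-1]:.6f}, max |dF/dt + I| = {max_gap:.2e}")
```

Only interior grid points are compared because `np.gradient` falls back to one-sided differences at the two ends of the grid.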

The JKO Scheme and Discretizations

The JKO (Jordan-Kinderlehrer-Otto) scheme, the implicit Euler discretization of Wasserstein gradient flows, emerges as a geometrically principled approach to progressive denoising. The implicit step

$p_{k+1} = \arg\min_{p} \left\{ F(p) + \frac{1}{2\tau} W_2^2(p, p_k) \right\}$

with step size $\tau > 0$ recovers denoising diffusion probabilistic models (DDPM), DDIM, as well as energy matching and other generative frameworks as special cases, with the update interpretable as a proximal step in probability space rather than a heuristic "denoising" move.

Strong convergence results are available under suitable convexity and growth conditions on $F$: as the step size $\tau \to 0$, the iterates recover the continuous Wasserstein gradient flow (Fokker-Planck dynamics).
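A JKO step can be written out explicitly in a toy setting. The sketch below makes two simplifying assumptions that are not from the paper: the potential is $V(x) = x^2/2$, and the minimization is restricted to one-dimensional Gaussians, where $W_2^2$ between $N(m, s^2)$ and $N(m_k, s_k^2)$ is $(m - m_k)^2 + (s - s_k)^2$. Under these assumptions the proximal problem has a closed-form minimizer, and the iterates can be compared with the exact Fokker-Planck moments as the step size shrinks. All names are hypothetical.

```python
import numpy as np


def free_energy(mean, std):
    """F = int V p + int p log p for p = N(mean, std^2), V(x) = x^2 / 2 (constants dropped)."""
    return 0.5 * (mean ** 2 + std ** 2) - np.log(std)


def jko_step_gaussian(mean_k, std_k, tau):
    """One JKO step restricted to 1D Gaussians (an assumption of this sketch).

    Minimizes F(mean, std) + ((mean - mean_k)^2 + (std - std_k)^2) / (2 tau).
    """
    # First-order condition in the mean: mean + (mean - mean_k) / tau = 0.
    mean = mean_k / (1.0 + tau)
    # First-order condition in std: positive root of (1 + tau) s^2 - std_k s - tau = 0.
    std = (std_k + np.sqrt(std_k ** 2 + 4.0 * tau * (1.0 + tau))) / (2.0 * (1.0 + tau))
    return mean, std


def run_jko(mean0, std0, tau, n_steps):
    """Iterate the restricted JKO step and return the (mean, std) path as an array."""
    path = [(mean0, std0)]
    for _ in range(n_steps):
        path.append(jko_step_gaussian(path[-1][0], path[-1][1], tau))
    return np.array(path)


# Assumed start N(3, 0.5^2); 1000 steps of size 1e-3 reach time t = 1.
path = run_jko(mean0=3.0, std0=0.5, tau=1e-3, n_steps=1000)
exact_mean = 3.0 * np.exp(-1.0)
exact_std = np.sqrt(1.0 + (0.5 ** 2 - 1.0) * np.exp(-2.0))
print("JKO   :", path[-1])
print("exact :", (exact_mean, exact_std))
```

Each step lowers the free energy because the previous iterate is always a feasible candidate in the minimization, which is the discrete counterpart of monotone free-energy descent.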

Flow Matching, Optimal Transport Geodesics, and Benamou-Brenier

In contrast to diffusion models, flow matching approaches (e.g., conditional flow matching, OT flow) do not realize a gradient flow but rather track the Wasserstein geodesic (the OT interpolation) between two endpoint measures. This is formalized as a Benamou-Brenier minimization (minimum kinetic action path), which is a boundary value rather than initial value problem:

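$\min_{(p_t, v_t)} \int_0^1 \int |v_t(x)|^2 p_t(x)\, dx\, dt$

subject to $\partial_t p_t + \nabla \cdot (p_t v_t) = 0$, $p_0 = \mu_0$, $p_1 = \mu_1$.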

Along these OT geodesics each particle travels in a straight line at constant velocity. The path therefore requires less kinetic energy and can be integrated with fewer steps than the tortuous (entropy-driven) paths traced by gradient flows of free energy. This reflects the optimality, in the transport metric, of flow matching paths.
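The link between constant-velocity paths and step count can be illustrated with particles. The sketch below is a toy under stated assumptions: one-dimensional samples (so sorting gives the optimal pairing), a particle-wise velocity rather than a learned network, and a hypothetical comparison path $x_t = x_0 + t^2 (x_1 - x_0)$ that shares the endpoints but is not a diffusion trajectory. It shows that one explicit Euler step is exact on the straight path and that the straight path has the smaller kinetic action.

```python
import numpy as np


def ot_pairing_1d(source, target):
    """In 1D the W_2-optimal coupling matches sorted source points to sorted target points."""
    return np.sort(source), np.sort(target)


def euler_transport(x0, velocity_fn, n_steps):
    """Integrate dx/dt = velocity_fn(t, x) from t = 0 to t = 1 with explicit Euler."""
    x, dt = x0.copy(), 1.0 / n_steps
    for k in range(n_steps):
        x = x + dt * velocity_fn(k * dt, x)
    return x


def path_action(x0, x1, schedule_rate, n_time=2001):
    """Kinetic action of x_t = x0 + a(t) (x1 - x0), where a'(t) = schedule_rate(t)."""
    t = np.linspace(0.0, 1.0, n_time)
    speed_sq = schedule_rate(t) ** 2 * np.mean((x1 - x0) ** 2)
    # Trapezoidal rule written out by hand.
    return float(np.sum(0.5 * (speed_sq[1:] + speed_sq[:-1]) * np.diff(t)))


rng = np.random.default_rng(0)
x0, x1 = ot_pairing_1d(rng.normal(0.0, 1.0, 1000), rng.normal(3.0, 2.0, 1000))
displacement = x1 - x0

# Straight path: constant velocity, so a single Euler step lands on the target.
one_step = euler_transport(x0, lambda t, x: displacement, n_steps=1)
# Hypothetical curved-in-time path a(t) = t^2: ten Euler steps still fall short.
slow_start = euler_transport(x0, lambda t, x: 2.0 * t * displacement, n_steps=10)

straight_action = path_action(x0, x1, lambda t: np.ones_like(t))
curved_action = path_action(x0, x1, lambda t: 2.0 * t)
print(f"straight action: {straight_action:.4f}, t^2-schedule action: {curved_action:.4f}")
```

In an actual flow matching model the velocity is a learned function of $(t, x)$ rather than a per-particle constant, so the one-step exactness here is an idealization of the geodesic case.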

Unified Perspective: Complementary Variational Principles

The paper's central contribution is to formalize that both families of models—diffusions and flow matching—are not unrelated or merely analogous, but are exactly instantiations of complementary variational principles on Wasserstein space:

Diffusion models (Fokker-Planck/score-based): Each step is an implicit Euler (JKO) update, defining a gradient flow of the free energy—an initial value problem.
Flow Matching models: Each trajectory is the minimal action path between endpoints—an optimal transport geodesic, a boundary value problem.

Both reach the same endpoints (simple base to data distribution) but via distinct geometric paths.

The "probability flow ODE" connects the two, expressing the deterministic trajectory whose marginal evolution coincides with the SDE-induced Fokker-Planck flow, absorbing the stochastic diffusion into the velocity field via the score. Flow Matching appears as the special case where the optimal interpolation is a straight line with constant velocity.
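The claim that a deterministic ODE can reproduce the marginals of the Fokker-Planck flow can be checked in the Gaussian case. The sketch below assumes $V(x) = x^2/2$ and a Gaussian start, so that the score $\nabla \log p_t$ is known in closed form; in practice it would be a learned quantity. The RK4 integrator and all names are illustrative assumptions. Particles are moved by $v_t(x) = -\nabla V(x) - \nabla \log p_t(x)$ with no noise, and their empirical mean and variance are compared with the exact moments of the corresponding SDE.

```python
import numpy as np


def ou_moments(t, m0, var0):
    """Exact mean and variance of dX = -X dt + sqrt(2) dW started at N(m0, var0)."""
    return m0 * np.exp(-t), 1.0 + (var0 - 1.0) * np.exp(-2.0 * t)


def probability_flow_velocity(t, x, m0, var0):
    """v_t(x) = -grad V(x) - grad log p_t(x) for V = x^2 / 2 and Gaussian p_t."""
    mean, var = ou_moments(t, m0, var0)
    score = -(x - mean) / var
    return -x - score


def integrate_rk4(x0, m0, var0, t_end, n_steps):
    """Classical RK4 integration of the probability flow ODE from 0 to t_end."""
    x, dt = x0.copy(), t_end / n_steps
    f = lambda t, y: probability_flow_velocity(t, y, m0, var0)
    for k in range(n_steps):
        t = k * dt
        k1 = f(t, x)
        k2 = f(t + 0.5 * dt, x + 0.5 * dt * k1)
        k3 = f(t + 0.5 * dt, x + 0.5 * dt * k2)
        k4 = f(t + dt, x + dt * k3)
        x = x + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return x


m0, var0, t_end = 2.0, 0.25, 1.0
rng = np.random.default_rng(0)
z = rng.standard_normal(500)
z = (z - z.mean()) / z.std()          # standardize so the sample moments are exact
x_start = m0 + np.sqrt(var0) * z

x_end = integrate_rk4(x_start, m0, var0, t_end, n_steps=400)
exact_mean, exact_var = ou_moments(t_end, m0, var0)
print(f"ODE particles: mean {x_end.mean():.6f}, var {x_end.var():.6f}")
print(f"SDE marginal : mean {exact_mean:.6f}, var {exact_var:.6f}")
```

This integrates the flow in the direction that relaxes toward the equilibrium $N(0, 1)$; running the same velocity field backward in time gives the deterministic sampler in the opposite direction.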

Practical and Theoretical Implications

By grounding the connection between generative models and Wasserstein geometry, the paper justifies why diffusion models and flow matching models make different, but principled, choices regarding the trade-off between entropy, energy, and path optimality. The JKO viewpoint directly motivates energy-based and equilibrium matching approaches, where a single time-independent potential naturally encodes both transport and equilibrium. Conversely, the boundary value structure of flow matching delivers significant efficiency advantages for deterministic sampling.

This geometric lens provides:

Formal explanations for empirical phenomena (e.g., why flow matching can require drastically fewer steps than diffusion).
A systematic taxonomy of generative modeling algorithms based on their associated variational principles.
Pathways for new algorithms (e.g., explicit JKO implementations, adaptive free energy schedules, hybrid flows/geodesics) by interpolating between the two principles.
Foundational language to analyze questions of training dynamics, convergence, and sample quality in terms of geometric and functional-analytic properties.

Conclusion

The paper rigorously establishes that both diffusion and flow matching generative models are instantiations of fundamental geometric flows on the Wasserstein manifold of probability measures: gradient flows of free energy (Fokker-Planck/Wasserstein descent) or geodesics/minimal action curves (optimal transport). The continuity equation, the gradient flow formalism (Fokker-Planck/JKO), and the Benamou-Brenier formulation are shown to be not only implicit in existing models but also organizing principles that unify a wide class of generative methodologies in a single mathematical language.

This paradigm enables both a mechanistic understanding of algorithmic behavior and a blueprint for future developments in geometric, variational, and optimization-based generative modeling (2606.24157).
