Variational Gradient Flow Formulations
- Variational gradient flow formulations provide a framework that interprets system evolution as the steepest descent of an energy functional over structured metric spaces.
- They underpin advanced computational methods through time-discretization schemes such as JKO and BDF2, enabling robust solutions for nonlinear PDEs and Bayesian inference.
- Under suitable convexity assumptions, the approach yields convergence and stability guarantees, driving applications in deep generative modeling, particle dynamics, and data assimilation.
A variational gradient flow formulation provides a powerful unifying principle for the analysis and computation of evolution equations that describe the irreversible relaxation of systems toward equilibrium in a geometric or probabilistic space. Such formulations view the trajectory of a system as a curve of steepest descent for a specified energy (or entropy) functional with respect to a chosen metric structure. The variational gradient flow approach has generated substantial advances in computational probability, nonlinear PDEs, Bayesian inference, and generative modeling, encompassing both classical and non-classical settings. This article systematically surveys major theoretical structures, concrete algorithmic instantiations, and representative applications of variational gradient flow formulations, drawing on foundational and recent research.
1. Core Concept: Gradient Flows as Variational Evolution
A gradient flow in a metric or Riemannian manifold $(X, d)$ is the evolution that most steeply decreases a given energy functional $F$ with respect to the geometry induced by $d$. In classical settings, such as $X = \mathbb{R}^n$ with the Euclidean distance, the gradient flow equation is the ODE
$$\dot{x}(t) = -\nabla F(x(t)),$$
whose solution evolves towards local minima of $F$.
The variational characterization generalizes this to abstract metric spaces and infinite-dimensional settings, such as spaces of measures. The time-continuous evolution is characterized variationally via the Evolution Variational Inequality (EVI), and discretely using the minimizing movement (JKO) scheme
$$x_{k+1} \in \arg\min_{x} \Big\{ F(x) + \tfrac{1}{2\tau}\, d^2(x, x_k) \Big\}.$$
As $\tau \to 0$, these time-discrete approximations converge to a continuous trajectory that is the curve of steepest descent of $F$ according to the metric $d$ (Matthes et al., 2017, Pietschmann et al., 2022).
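For intuition, a minimal Euclidean sketch of one minimizing-movement step is given below; the objective, the step size $\tau$, and the inner gradient-descent solver are illustrative assumptions rather than choices prescribed by the cited works.

```python
import numpy as np

def minimizing_movement_step(x_prev, grad_F, tau, inner_steps=200, lr=1e-2):
    """One JKO-style step: approximately minimize F(x) + |x - x_prev|^2 / (2*tau)
    by gradient descent on the penalized objective."""
    x = x_prev.copy()
    for _ in range(inner_steps):
        g = grad_F(x) + (x - x_prev) / tau   # gradient of the penalized objective
        x = x - lr * g
    return x

# Example: steepest descent for F(x) = 0.5 * |x|^2, whose minimizer is the origin.
grad_F = lambda x: x
x = np.array([2.0, -1.0])
for _ in range(50):
    x = minimizing_movement_step(x, grad_F, tau=0.1)
print(x)  # approaches [0, 0] as the number of outer steps grows
```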
2. Gradient Flows in the Space of Probability Measures
Gradient flows gain special significance in the space of probability measures equipped with the Wasserstein metric, denoted $\mathcal{P}_2(\mathbb{R}^d)$ with distance $W_2$. This structure underlies a large class of nonlinear PDEs and probabilistic inference methods.
Wasserstein Gradient Flow (WGF)
Given a free energy functional $\mathcal{F}(\mu)$ (e.g., the relative entropy/Kullback–Leibler (KL) divergence $\mathrm{KL}(\mu \,\|\, \pi)$ with respect to a target measure $\pi$), the WGF formulation produces the PDE
$$\partial_t \mu_t = \nabla \cdot \Big( \mu_t \, \nabla \tfrac{\delta \mathcal{F}}{\delta \mu}(\mu_t) \Big),$$
where $\tfrac{\delta \mathcal{F}}{\delta \mu}$ denotes the first variation with respect to $\mu$. For $\mathcal{F}$ the KL divergence, this specializes to the Fokker–Planck equation, and the corresponding stochastic process is the overdamped Langevin diffusion (Lambert et al., 2022, Yao et al., 2022, Trillos et al., 2017).
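At the sample level, this correspondence can be illustrated by simulating the overdamped Langevin diffusion with an Euler–Maruyama step, so that the particle cloud approximately follows the Fokker–Planck / Wasserstein gradient flow of the KL energy; the Gaussian target, step size, and particle count below are illustrative assumptions.

```python
import numpy as np

def langevin_step(X, grad_log_pi, dt, rng):
    """Euler-Maruyama step of overdamped Langevin dynamics; the particle density
    approximately evolves by the Wasserstein gradient flow of KL(. || pi)."""
    noise = rng.standard_normal(X.shape)
    return X + dt * grad_log_pi(X) + np.sqrt(2.0 * dt) * noise

# Illustrative target: standard Gaussian, for which grad log pi(x) = -x.
rng = np.random.default_rng(0)
X = 3.0 * rng.standard_normal((1000, 2))   # over-dispersed initial particle cloud
for _ in range(500):
    X = langevin_step(X, lambda x: -x, dt=1e-2, rng=rng)
print(X.mean(axis=0), X.var(axis=0))  # approximately [0, 0] and [1, 1]
```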
Mean-Field and Mixture Structures
Restriction to variational families, such as Gaussians or Gaussian mixtures, projects the infinite-dimensional gradient flow onto a submanifold, yielding ODEs for mean and covariance parameters with the Bures–Wasserstein metric structure (Lambert et al., 2022). For mean-field variational inference, iterative updates mimic the coordinate ascent VI (CAVI) form but now with fixed-point and convergence properties guaranteed by the geometric structure of WGF (Yao et al., 2022).
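For the Gaussian family $q_t = \mathcal{N}(m_t, \Sigma_t)$ with target $\pi$, a standard form of the projected dynamics follows by averaging the Wasserstein velocity field $\nabla \log \pi - \nabla \log q_t$ against $q_t$:
$$\dot m_t = \mathbb{E}_{X \sim q_t}\!\big[\nabla \log \pi(X)\big], \qquad \dot \Sigma_t = \mathbb{E}_{X \sim q_t}\!\big[\nabla \log \pi(X)\,(X - m_t)^{\top} + (X - m_t)\,\nabla \log \pi(X)^{\top}\big] + 2I.$$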
3. Variational Formulations Beyond Wasserstein: f-Divergences and Stein Geometry
The variational principle of gradient flows extends to broader choices of functionals and metrics beyond the Wasserstein framework.
General f-Divergence Flows and VGrow
If the energy is an $f$-divergence $D_f(\mu \,\|\, \pi)$, the continuous evolution is expressed as
$$\partial_t \mu_t = \nabla \cdot \Big( \mu_t \, \nabla \tfrac{\delta D_f(\cdot \,\|\, \pi)}{\delta \mu}(\mu_t) \Big),$$
specializing to different flows for KL, Jensen–Shannon, or other divergences. VGrow unifies a family of deep generative learning algorithms (VAEs, GANs, flow-based models) under this flow perspective, with training implemented by learning functional gradients via discriminative approximation (Gao et al., 2019).
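As a consistency check, for the KL case the first variation is $\log(\mu/\pi) + 1$, so the flow reduces to the Fokker–Planck equation referenced above:
$$\partial_t \mu_t = \nabla \cdot \Big( \mu_t \, \nabla \log \tfrac{\mu_t}{\pi} \Big) = \Delta \mu_t - \nabla \cdot \big( \mu_t \, \nabla \log \pi \big).$$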
Stein Variational Gradient Descent (SVGD)
In SVGD, the gradient flow is defined for the KL energy with an RKHS-induced Stein metric, producing a nonlinear deterministic evolution for the distribution of particles,
$$\partial_t \mu_t + \nabla \cdot (\mu_t \, \phi^{*}_{\mu_t}) = 0, \qquad \phi^{*}_{\mu_t}(x) = \mathbb{E}_{x' \sim \mu_t}\!\big[ k(x', x)\, \nabla_{x'} \log \pi(x') + \nabla_{x'} k(x', x) \big],$$
where $k$ is a reproducing kernel (Liu, 2017). Theoretical analysis shows monotonic decrease in KL and convergence in measure, with the Stein discrepancy controlling convergence.
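A minimal particle-level sketch of one SVGD update with an RBF kernel is shown below; the kernel bandwidth, step size, and Gaussian target are illustrative assumptions, not parameters fixed by the cited work.

```python
import numpy as np

def svgd_step(X, grad_log_pi, step=0.1, bandwidth=1.0):
    """One SVGD update: X_i <- X_i + step * phi*(X_i), where
    phi*(x) = (1/n) sum_j [ k(x_j, x) grad log pi(x_j) + grad_{x_j} k(x_j, x) ]
    and k is an RBF kernel with the given bandwidth."""
    n = X.shape[0]
    scores = grad_log_pi(X)                                   # (n, d)
    diffs = X[:, None, :] - X[None, :, :]                     # diffs[i, j] = x_i - x_j
    K = np.exp(-np.sum(diffs ** 2, axis=-1) / (2 * bandwidth ** 2))  # (n, n) kernel matrix
    # Attractive term (kernel-weighted scores) plus repulsive term (kernel gradients).
    phi = (K @ scores + np.sum(K[..., None] * diffs, axis=1) / bandwidth ** 2) / n
    return X + step * phi

# Illustrative target: standard Gaussian, for which grad log pi(x) = -x.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2)) + 5.0
for _ in range(300):
    X = svgd_step(X, lambda x: -x)
print(X.mean(axis=0))  # particles drift toward the target mean [0, 0]
```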
4. Computational and Algorithmic Realizations
Time Discretization Schemes: JKO, BDF2, and Beyond
The JKO (Jordan–Kinderlehrer–Otto) scheme provides a variational time-discretization matching the gradient flow at the level of measures and is central to both analysis and numerics (Matthes et al., 2017, Pietschmann et al., 2022). Variational second-order (BDF2) methods have been developed for higher-order accuracy whilst retaining variational structure and convergence guarantees in general metric settings (Matthes et al., 2017, Dong et al., 2024).
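In the Euclidean case the variational second-order construction can be sanity-checked directly. Taking the penalized functional in the form used in the metric-space literature (the constants below follow that convention and are stated here as an assumption),
$$x_{n+1} \in \arg\min_{x} \Big\{ F(x) + \tfrac{1}{\tau}\, d^2(x, x_n) - \tfrac{1}{4\tau}\, d^2(x, x_{n-1}) \Big\},$$
the optimality condition with $d(x, y) = \lvert x - y \rvert$ reads
$$\nabla F(x_{n+1}) + \frac{3 x_{n+1} - 4 x_n + x_{n-1}}{2\tau} = 0,$$
which is exactly the classical BDF2 discretization of $\dot{x} = -\nabla F(x)$.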
Variational Dual Formulations and Empirical Flows
Recent advances replace density-based objectives with variational dual (sample-based) objectives, leveraging variational characterizations of f-divergences and enabling gradient flows to be defined and computed for empirical distributions accessible only via samples. These methods scale naturally to high dimensions, harnessing stochastic minimax optimization and deep neural parameterizations for transport maps and dual functions (Fan et al., 2021).
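These objectives rest on the dual (variational) representation of $f$-divergences,
$$D_f(\mu \,\|\, \pi) = \sup_{T} \Big\{ \mathbb{E}_{x \sim \mu}\big[T(x)\big] - \mathbb{E}_{x \sim \pi}\big[f^{*}(T(x))\big] \Big\},$$
where $f^{*}$ is the convex conjugate of $f$ and the supremum runs over a class of test functions (in practice a neural network); both expectations can be estimated from samples alone, which is what allows the flow to be driven without access to densities.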
Mesh-Free and Neural Discretization
Structure-preserving time discretizations can be combined with mesh-free neural network parametrizations to discretize high-dimensional PDEs directly via the energy-dissipation law, allowing for stability and scalability in gradient-flow-based solvers (Hu et al., 2022).
5. Applications: Inference, Sampling, Generative Models, and PDEs
Gradient flow formulations underpin a range of contemporary algorithmic paradigms.
- Variational Inference and Bayesian Computation: Wasserstein and Fisher–Rao gradient flows yield natural dynamics toward approximate posteriors and continuous-time counterparts of the Bayesian update, with convergence rates determined by the log-concavity or convexity of the energy (Trillos et al., 2017, Yi et al., 6 May 2025).
- Deep Generative Modeling: VGrow, flow-based models, and GANs can all be conceptualized as realizing gradient flows for appropriate f-divergence functionals, providing a principled framework for generator training and convergence analysis (Gao et al., 2019).
- Particle-Based and Accelerated Flows: Stein particle flows, SVGD, and their accelerated variants employ variational principles in nonlinear functional geometries, with provable asymptotic convergence, practical acceleration, and adaptive RKHS/NN parameterizations (Liu, 2017, Stein et al., 30 Mar 2025, Zhang et al., 2024).
- Nonlinear and Degenerate PDEs: Classical and non-classical diffusion, porous-medium, and nonlinear aggregation equations admit variational gradient-flow formulations in extended metric or measure-theoretic spaces, including cases with Dirichlet boundary data (utilizing modified Wasserstein distances), time-fractional evolution, and one-homogeneous functionals (Meyer, 2023, Erbar et al., 2024, Duong et al., 2019, Briani et al., 2011).
- Variational Data Assimilation: The discrete, convex-optimization structure of gradient flows supports robust data assimilation and control for non-linear evolution problems via measurement-driven energy terms (Pietschmann et al., 2022).
6. Theoretical Guarantees and Convergence
Convergence and stability of variational gradient flows are dictated by the convexity properties of the driving functional in the relevant metric geometry.
- For functionals that are $\lambda$-convex along geodesics with $\lambda > 0$ (e.g., KL with a strongly log-concave target), the gradient flow converges exponentially to the unique minimizer with explicitly computable rates in Wasserstein or alternative information metrics (Lambert et al., 2022, Yao et al., 2022, Trillos et al., 2017); see the estimates displayed after this list.
- In the absence of strict convexity ($\lambda = 0$), subexponential (e.g., $O(1/t)$) decay rates are still available (Lambert et al., 2022).
- When gradient flows are restricted to finite-dimensional or function-approximator manifolds (e.g., Gaussian submanifolds or neural ansatzes), induced metrics such as Bures–Wasserstein or Fisher–Rao dictate the optimality guarantees and ODE structure for the parameters (Lambert et al., 2022, Yi et al., 6 May 2025).
- Discrete schemes (e.g., JKO, BDF) inherit analogous stability and convergence rates under mild regularity and convexity assumptions (Matthes et al., 2017, Dong et al., 2024).
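Concretely, for a $\lambda$-geodesically convex functional $F$ with $\lambda > 0$ and minimizer $x^{*}$, the EVI$_\lambda$ characterization yields the standard contraction and energy-decay estimates
$$d(x_t, x^{*}) \le e^{-\lambda t}\, d(x_0, x^{*}), \qquad F(x_t) - F(x^{*}) \le e^{-2\lambda t}\,\big(F(x_0) - F(x^{*})\big),$$
which specialize to the exponential rates cited above when $F$ is the KL divergence with a strongly log-concave target.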
7. Extension to Constraints, Boundary Data, and General Integrands
Gradient flow theory accommodates constraints, boundary behaviors, and general functionals:
- Flows for domains with Dirichlet boundary data are naturally formulated as variational evolutions with respect to modified Wasserstein distances allowing exchange of mass with reservoirs; the energy-dissipating structure and geodesic properties are preserved (Erbar et al., 2024).
- Flows associated with convex, possibly non-smooth and non-homogeneous functionals, including total variation, symmetric-gradient energies, and one-homogeneous functionals, are realized as maximal monotone gradient flows—well-posed in the sense of contraction semigroups—by abstract duality, relaxation, and obstacle problem techniques (Meyer, 2023, Briani et al., 2011).
- Projection-free and accelerated flows permit efficient treatment of nonconvex and quadratic constraints using projection-free tangent-space linearization combined with BDF discretizations; error analysis shows high-order constraint satisfaction and algorithmic efficiency (Dong et al., 2024).
The framework of variational gradient flow formulations unites diverse strands of modern applied mathematics, probability, and computational science, anchoring the analysis of nonlinear evolution, inference, and generative learning within a geometric, measure-theoretic, and variationally robust foundation (Lambert et al., 2022, Fan et al., 2021, Meyer, 2023, Matthes et al., 2017, Liu, 2017).