Continuous Normalizing Flows

Updated 18 June 2026

Continuous Normalizing Flows are generative models that use neural ODEs to create an invertible mapping from a simple base distribution to complex target distributions.
They compute exact changes in log-density through integration and employ optimal transport regularization to smooth training and enhance model performance.
CNFs are versatile, supporting applications in high-dimensional density estimation, stochastic processes, and flow matching on manifolds.

A continuous normalizing flow (CNF) is a generative model parameterized by a neural ordinary differential equation (ODE) that defines an invertible mapping between a simple base distribution and a complex target distribution via continuous-time transformation. CNFs extend the classical framework of normalizing flows by representing the flow as the solution of a time-dependent ODE, thereby bypassing the architectural constraints and determinant computations required by traditional finite-step flows. This continuous viewpoint admits efficient computation of exact change in log-density through trajectory integration, facilitates flexible modeling in high dimensions and non-Euclidean geometries, and supports integration with optimal transport, stochastic processes, and a variety of domain-specific constraints.

1. Mathematical Foundations and Basic Formulation

A CNF models a sample transformation through the ODE

$\frac{d}{dt} z(t) = f(z(t), t; \theta), \quad z(0) = x,$

where $f$ is a time-dependent vector field (usually parameterized by a neural network with parameters $\theta$) and $x$ is sampled from the data distribution. The field $f$ is trained so that $z(T)$ follows a tractable base density, typically $\mathcal{N}(0, I)$, and the induced law on $x$ is made highly flexible through the expressivity of $f$.

The instantaneous change-of-variables formula, which follows from Jacobi’s formula for the derivative of a determinant, governs the log-density along a trajectory: $\frac{d}{dt} \log p(z(t)) = -\mathrm{Tr}\left(\frac{\partial f}{\partial z}(z(t), t)\right),$ which, upon integration from $0$ to $T$, yields

$\log p(x) = \log p_Z(z(T)) + \int_0^T \mathrm{Tr}\left(\frac{\partial f}{\partial z}(z(t), t)\right) dt.$

This removes the need to compute high-dimensional Jacobian determinants at each step and supports scalable likelihood evaluation (Vidal et al., 2022, Mathieu et al., 2020).
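As a minimal sketch of the integrated formula, the snippet below evaluates $\log p(x)$ for a toy linear field $f(z, t) = Az$, whose Jacobian trace is simply $\mathrm{Tr}(A)$. The matrix `A`, the RK4 helper, and the function names are illustrative assumptions rather than part of any cited method; a real CNF would replace `A @ z` with a neural network.

```python
import numpy as np

def rk4_step(rhs, state, t, h):
    """One classical Runge-Kutta (RK4) step for d(state)/dt = rhs(state, t)."""
    k1 = rhs(state, t)
    k2 = rhs(state + 0.5 * h * k1, t + 0.5 * h)
    k3 = rhs(state + 0.5 * h * k2, t + 0.5 * h)
    k4 = rhs(state + h * k3, t + h)
    return state + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

def cnf_log_density(x, A, T=1.0, n_steps=200):
    """log p(x) for the toy field f(z, t) = A z, integrating z and the trace jointly."""
    d = x.shape[0]

    def rhs(state, t):
        z = state[:d]
        dz = A @ z               # f(z, t)
        dtrace = np.trace(A)     # Tr(df/dz), exact for a linear field
        return np.concatenate([dz, [dtrace]])

    state = np.concatenate([x, [0.0]])  # last entry accumulates the trace integral
    h = T / n_steps
    for k in range(n_steps):
        state = rk4_step(rhs, state, k * h, h)
    z_T, trace_integral = state[:d], state[d]
    log_base = -0.5 * z_T @ z_T - 0.5 * d * np.log(2 * np.pi)  # log N(z_T; 0, I)
    return log_base + trace_integral, z_T

A = np.array([[-0.5, 0.2], [0.0, -0.3]])
x = np.array([1.0, -2.0])
log_px, z_T = cnf_log_density(x, A)
```

For this linear case the result can be checked against the closed form $\log \mathcal{N}(e^{AT}x; 0, I) + T\,\mathrm{Tr}(A)$, since $z(T) = e^{AT}x$.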

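Sampling inverts the flow: draw $z(T) \sim p_Z$ and integrate the ODE backward in time, $x = z(0) = z(T) - \int_0^T f(z(t), t; \theta)\, dt$. Importantly, the ODE flow is guaranteed to be a diffeomorphism under mild regularity of $f$.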
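The invertibility claim can be illustrated numerically. The sketch below integrates a toy smooth field forward from $t = 0$ to $t = 1$ and then backward, and recovers the starting point up to solver error. The field, the weights `W` and `b`, and the fixed-step RK4 integrator are assumptions made for illustration only.

```python
import numpy as np

def field(z, t, W, b):
    """Toy smooth vector field f(z, t); W and b stand in for network weights."""
    return np.cos(t) * np.tanh(W @ z + b)

def integrate(z, t0, t1, W, b, n_steps=400):
    """Integrate dz/dt = f(z, t) from t0 to t1 with RK4; t1 < t0 runs the flow backward."""
    h = (t1 - t0) / n_steps  # negative step when integrating backward
    t = t0
    for _ in range(n_steps):
        k1 = field(z, t, W, b)
        k2 = field(z + 0.5 * h * k1, t + 0.5 * h, W, b)
        k3 = field(z + 0.5 * h * k2, t + 0.5 * h, W, b)
        k4 = field(z + h * k3, t + h, W, b)
        z = z + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return z

rng = np.random.default_rng(0)
W, b = rng.normal(size=(3, 3)), rng.normal(size=3)
x = rng.normal(size=3)
z_T = integrate(x, 0.0, 1.0, W, b)      # data-to-base direction
x_rec = integrate(z_T, 1.0, 0.0, W, b)  # base-to-data direction (sampling)
```

Here `x_rec` agrees with `x` only up to discretization error, which is why adaptive or higher-order solvers are commonly preferred when exact invertibility matters.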

2. Training Objectives and Regularization

Maximum Likelihood and KL-based Losses

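The standard CNF objective maximizes data likelihood: $\max_\theta \; \mathbb{E}_{x \sim p_{\mathrm{data}}}\left[\log p(x)\right],$ with $\log p(x)$ evaluated by the integral formula above. This is often equivalent to minimizing the KL divergence between the pushforward of data and a base distribution (Vidal et al., 2022, Onken et al., 2020).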

Optimal Transport Regularization

CNFs are underconstrained by maximum likelihood alone: infinitely many vector fields can match the terminal law in expectation. Regularization via optimal transport (OT) penalizes kinetic energy (Benamou–Brenier cost) and enforces “straightness” of trajectories: $\mathbb{E}_{x}\left[\int_0^T \tfrac{1}{2}\|f(z(t), t; \theta)\|^2\, dt\right].$ The OT-regularized objective includes a tunable parameter $\alpha$ trading kinetic cost and terminal divergence: $\min_\theta \; \mathbb{E}_{x}\left[\int_0^T \tfrac{1}{2}\|f(z(t), t; \theta)\|^2\, dt\right] + \alpha\, \mathrm{KL}\left(\rho_T \,\|\, p_Z\right),$ where $\rho_T$ denotes the law of $z(T)$ when $x$ is drawn from the data. Empirically, $\alpha$ must be tuned; large values stiffen optimization, while small values reduce expressivity. The JKO-Flow algorithm (named for the Jordan–Kinderlehrer–Otto scheme) transforms this into a proximal Wasserstein gradient flow framework, eliminating $\alpha$ tuning and replacing it with a sequence of subproblems each solved with fixed $\alpha$ (Vidal et al., 2022).
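A rough sketch of how such a regularized loss might be estimated on a batch is given below. It assumes a toy linear field $f(z, t) = Az$, forward Euler integration, and a weight named `alpha` on the terminal term (the symbol name is an assumption). The terminal KL divergence is replaced by the negative log-likelihood, which differs from it only by a constant that does not depend on the parameters.

```python
import numpy as np

def ot_regularized_loss(X, A, alpha, T=1.0, n_steps=100):
    """Estimate kinetic cost + alpha * NLL for the toy field f(z, t) = A z."""
    n, d = X.shape
    h = T / n_steps
    Z = X.copy()
    kinetic = np.zeros(n)
    for _ in range(n_steps):
        V = Z @ A.T                                 # f(z, t) for every sample in the batch
        kinetic += 0.5 * h * np.sum(V**2, axis=1)   # running integral of 0.5 * ||f||^2
        Z = Z + h * V                               # forward Euler step
    # log p(x) = log N(z(T); 0, I) + T * Tr(A) for a linear field
    log_p = -0.5 * np.sum(Z**2, axis=1) - 0.5 * d * np.log(2 * np.pi) + T * np.trace(A)
    nll = -log_p.mean()
    return kinetic.mean() + alpha * nll, kinetic.mean(), nll

rng = np.random.default_rng(1)
X = rng.normal(size=(64, 2))
total, kinetic_cost, nll = ot_regularized_loss(X, -0.5 * np.eye(2), alpha=5.0)
```

Returning the two terms separately makes it easy to see how a given choice of `alpha` shifts the balance between straight, low-energy trajectories and fit to the data.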

3. Implementation Algorithms and Practical Considerations

Solver Dynamics and Trace Computation

CNFs are integrated with ODE solvers (e.g., RK4, Dormand–Prince). The trace required for log-density can be computed:

Exactly with structured parameterization (e.g., OT-Flow’s analytic trace computation for quadratic potentials) (Onken et al., 2020).
Approximately with stochastic estimators such as Hutchinson’s trick, which requires only $\mathcal{O}(d)$ per-step computation, where $d$ is the data dimension (Falorsi, 2021, Mathieu et al., 2020).
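The stochastic option can be sketched as follows: Hutchinson's estimator averages $\epsilon^\top J \epsilon$ over random probes $\epsilon$ with identity covariance, so each probe needs only one Jacobian-vector product and the full Jacobian is never formed. The toy field $f(z) = \tanh(Wz)$, the Rademacher probes, and all names below are illustrative assumptions.

```python
import numpy as np

def hutchinson_trace(jvp, d, n_probes, rng):
    """Estimate Tr(J) as the average of eps^T (J eps) over Rademacher probes eps."""
    total = 0.0
    for _ in range(n_probes):
        eps = rng.choice([-1.0, 1.0], size=d)  # Rademacher probe, E[eps eps^T] = I
        total += eps @ jvp(eps)
    return total / n_probes

rng = np.random.default_rng(0)
d = 8
W = rng.normal(size=(d, d)) / np.sqrt(d)
z = rng.normal(size=d)
gate = 1.0 - np.tanh(W @ z) ** 2  # derivative of tanh evaluated at W z

def jvp(v):
    """Jacobian-vector product of f(z) = tanh(W z), computed without forming J."""
    return gate * (W @ v)

exact = np.sum(gate * np.diag(W))  # Tr(diag(gate) W), available here only because the field is a toy
estimate = hutchinson_trace(jvp, d, n_probes=4000, rng=rng)
```

In training, a single probe per ODE solve is a common choice, trading a noisier log-density for a cost close to one extra evaluation of the field.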

Adjoint sensitivity methods allow memory-efficient gradient computation with respect to network parameters by solving an augmented backward ODE (Vaitl et al., 2022).

Path-Gradient Estimation

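Path-gradient estimators avoid the variance contributions of score terms in stochastic variational inference for CNFs. Writing $x = \Phi_\theta(z)$ for the flow map applied to a base sample $z \sim p_Z$, $q_\theta$ for the model density, and $p$ for the target density, only the path terms of the total derivative are retained: $\mathbb{E}_{z \sim p_Z}\left[\nabla_x \left(\log q_\theta(x) - \log p(x)\right)\big|_{x = \Phi_\theta(z)}\, \frac{\partial \Phi_\theta(z)}{\partial \theta}\right].$ Empirically, path-gradient estimators accelerate convergence, reduce gradient variance, and achieve superior effective sample size compared to standard total-gradient approaches (Vaitl et al., 2022).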

Flow Matching and Linear Interpolation

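Recent approaches favor simulation-free training with flow-matching objectives: $\min_\theta \; \mathbb{E}_{t, x_0, x_1}\left\|v_\theta(x_t, t) - (x_1 - x_0)\right\|^2,$ where $x_t = (1 - t)\,x_0 + t\,x_1$, $x_0 \sim \mathcal{N}(0, I)$ and $x_1 \sim p_{\mathrm{data}}$ with $t \sim \mathcal{U}[0, 1]$, and $v_\theta$ is a neural network (Gao et al., 2024). This admits non-asymptotic convergence analysis in Wasserstein-2 distance under log-concavity, bounded support, or Gaussian mixture conditions.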
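A minimal sketch of a flow-matching loss with linear interpolation is shown below. It assumes the common convention that a base sample `x0` sits at $t = 0$ and a data sample `x1` at $t = 1$, with regression target `x1 - x0`; the function names and the point-mass toy target are hypothetical. No ODE is solved during training, which is what "simulation-free" refers to.

```python
import numpy as np

def flow_matching_loss(v, x0, x1, t):
    """Monte Carlo flow-matching loss for the straight path x_t = (1 - t) x0 + t x1."""
    xt = (1.0 - t)[:, None] * x0 + t[:, None] * x1
    target = x1 - x0  # velocity of the straight-line path
    return np.mean(np.sum((v(xt, t) - target) ** 2, axis=1))

rng = np.random.default_rng(0)
n, d = 1024, 2
c = np.array([3.0, -1.0])          # toy target: all data mass at the point c
x0 = rng.normal(size=(n, d))       # base samples
x1 = np.tile(c, (n, 1))            # data samples
t = rng.uniform(0.0, 0.9, size=n)  # stay away from t = 1, where v_exact is singular

def v_exact(x, t):
    """Velocity field that transports every point toward c along a straight line."""
    return (c - x) / (1.0 - t)[:, None]

def v_zero(x, t):
    """Baseline field that does not move anything."""
    return np.zeros_like(x)

loss_exact = flow_matching_loss(v_exact, x0, x1, t)
loss_zero = flow_matching_loss(v_zero, x0, x1, t)
```

For the point-mass target the exact field drives the loss to zero (up to rounding), while the zero field leaves it at roughly $\|c\|^2 + d$; in practice `v` would be a neural network fitted by stochastic gradient descent on this loss.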

4. Extensions to Geometry, Stochasticity, and Physics Constraints

Flows on Manifolds and Lie Groups

CNFs are extended to curved domains by parameterizing flows on Riemannian manifolds $(\mathcal{M}, g)$, replacing the Euclidean divergence with the Riemannian divergence: $\frac{d}{dt} \log p(z(t)) = -\mathrm{div}_g f(z(t), t), \quad \mathrm{div}_g f = \frac{1}{\sqrt{|g|}}\, \partial_i\left(\sqrt{|g|}\, f^i\right).$ These require manifold-aware ODE solvers, chart- or generator-based parameterizations of vector fields, and unbiased stochastic divergence estimation (Mathieu et al., 2020, Falorsi, 2021). Applications include learning distributions on spheres, Stiefel manifolds, and matrix Lie groups, with demonstrated improvements in density modeling and sampling for directional data, rotations, and positive-definite matrices.
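As a worked instance, take the unit sphere $S^2$ with spherical coordinates $(\vartheta, \varphi)$; this is an illustrative choice rather than a specific cited model. The metric is $g = d\vartheta^2 + \sin^2\vartheta\, d\varphi^2$, so $\sqrt{|g|} = \sin\vartheta$, and the coordinate expression $\frac{1}{\sqrt{|g|}}\, \partial_i\left(\sqrt{|g|}\, f^i\right)$ for the Riemannian divergence reduces to

$\mathrm{div}_g f = \frac{1}{\sin\vartheta}\, \frac{\partial}{\partial \vartheta}\left(\sin\vartheta\, f^{\vartheta}\right) + \frac{\partial f^{\varphi}}{\partial \varphi}.$

For the field with components $f^{\vartheta} = 1$, $f^{\varphi} = 0$, which pushes points from the north pole toward the south pole, this gives $\mathrm{div}_g f = \cot\vartheta$. The divergence is positive in the northern hemisphere, where trajectories spread apart and the log-density along them decreases, and negative in the southern hemisphere, where they converge.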

In lattice gauge theory, gauge-equivariant CNFs are built by parameterizing ODEs on matrix groups $G$ (e.g., SU($N$)), with vector fields constructed from Wilson loops and convolutional neural networks, achieving state-of-the-art effective sample sizes in high-dimensional lattice models (Gerdes et al., 2024).

Stochastic Dynamic CNFs

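CNFs have been adapted to time series by warping base stochastic processes (e.g., Wiener process) via dynamic normalizing flows. Writing $X_\tau = F_\theta(W_\tau, \tau)$ for the warped Wiener process $W_\tau$, Itô's lemma shows that the induced process is governed by a stochastic differential equation: $dX_\tau = \left[\partial_\tau F_\theta + \tfrac{1}{2}\Delta_w F_\theta\right](W_\tau, \tau)\, d\tau + \nabla_w F_\theta(W_\tau, \tau)\, dW_\tau,$ where $\Delta_w$ acts componentwise and $\nabla_w F_\theta$ is the Jacobian in $w$, providing exact likelihoods, continuous paths, and strong results on both synthetic and real-world time series (Deng et al., 2020).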
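The exact-likelihood property can be sketched in one dimension. The snippet assumes a hypothetical warp $F(w, \tau) = e^{a\tau} w + b\tau$ in place of a learned flow; observations are mapped back to Wiener values, scored with Gaussian transition densities, and corrected by the log-derivative of the warp.

```python
import numpy as np

def warp(w, tau, a, b):
    """Toy warp F(w, tau) = exp(a * tau) * w + b * tau, invertible in w for every tau."""
    return np.exp(a * tau) * w + b * tau

def unwarp(x, tau, a, b):
    """Inverse of the warp in its first argument."""
    return (x - b * tau) * np.exp(-a * tau)

def log_likelihood(xs, taus, a, b):
    """Exact log-density of observations xs at increasing times taus > 0 under X = F(W, tau)."""
    ws = unwarp(xs, taus, a, b)
    dw = np.diff(np.concatenate([[0.0], ws]))    # Wiener increments, with W_0 = 0
    dt = np.diff(np.concatenate([[0.0], taus]))
    log_wiener = np.sum(-0.5 * dw**2 / dt - 0.5 * np.log(2 * np.pi * dt))
    log_det = np.sum(a * taus)                   # sum of log |dF/dw| over observation times
    return log_wiener - log_det

taus = np.array([0.5, 1.0, 2.0])
ws = np.array([0.3, -0.1, 0.8])  # latent Wiener values at the observation times
xs = warp(ws, taus, a=0.4, b=-1.0)
ll = log_likelihood(xs, taus, a=0.4, b=-1.0)
```

Because the observation times enter only through `taus`, irregularly sampled series are handled without any change to the code.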

Physics-Informed Flows and PDE Constraints

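Physics-Informed Normalizing Flows (PINF) integrate CNFs with physical PDE constraints such as the Fokker–Planck equation: $\frac{\partial p(x, t)}{\partial t} = -\nabla \cdot \left(\mu(x, t)\, p(x, t)\right) + \frac{1}{2} \sum_{i,j} \frac{\partial^2}{\partial x_i \partial x_j}\left[D_{ij}(x, t)\, p(x, t)\right],$ with drift $\mu$ and diffusion matrix $D$. By enforcing the PDE intrinsically in the flow, PINF efficiently solves high-dimensional FP equations in a mesh-free and causality-free fashion, outperforming mesh-based or PINN approaches in moderate/high dimensions (Liu et al., 2023).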
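To make the PDE constraint concrete, the sketch below checks by finite differences that a known density satisfies a one-dimensional Fokker–Planck equation. It is not an implementation of PINF: the Ornstein–Uhlenbeck example (drift $-x$, diffusion coefficient $2$), the step size `h`, and the function names are all assumptions chosen because the solution is available in closed form.

```python
import numpy as np

def ou_density(x, t, x0=1.5):
    """Transition density of dX = -X dt + sqrt(2) dW started at X_0 = x0."""
    mean = x0 * np.exp(-t)
    var = 1.0 - np.exp(-2.0 * t)
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def fokker_planck_residual(p, x, t, h=1e-3):
    """Central-difference residual of dp/dt = d/dx(x p) + d^2 p/dx^2."""
    dp_dt = (p(x, t + h) - p(x, t - h)) / (2 * h)
    flux = lambda y: y * p(y, t)  # x * p; with drift mu = -x, -d/dx(mu p) = d/dx(x p)
    dflux_dx = (flux(x + h) - flux(x - h)) / (2 * h)
    d2p_dx2 = (p(x + h, t) - 2 * p(x, t) + p(x - h, t)) / h**2
    return dp_dt - dflux_dx - d2p_dx2

x = np.linspace(-2.0, 2.0, 41)
residual = fokker_planck_residual(ou_density, x, t=0.7)
```

A residual of this kind, evaluated on the density implied by a flow rather than on a closed-form solution, is the sort of quantity a physics-informed approach would aim to keep at zero.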

5. Applications in Generative Modeling, Scientific Computing, and Uncertainty Quantification

Density Estimation and Large-Scale Generation

CNFs are deployed in unsupervised density estimation and generative modeling, including high-dimensional image, physics, and protein structure data. Multi-Resolution CNFs (MRCNF) enable scalable modeling of large images by decomposing them into hierarchical detail channels, preserving invertibility and exact likelihoods at each scale (Voleti et al., 2021).

Bayesian Inference, Variational Methods, and Adversarial Applications

In Bayesian inference, CNFs serve as highly expressive variational posteriors in VAEs and are used in lattice field theory to accelerate sampling and inference by providing uncorrelated, high-quality samples in regimes where traditional methods are slowed by criticality (Vaitl et al., 2022, Caselle et al., 2023).

For adversarial purification, CNFs trained with conditional flow matching learn mappings from adversarial/noisy to clean data, surpassing diffusion approaches in clean accuracy preservation and robustness under a variety of threat models, and supporting integrated adversarial detection (Collaert et al., 19 May 2025).

Scientific Computing and High-Dimensional PDEs

CNFs have been applied to estimate partition functions, flux-tube observables, and rare-event probabilities in lattice models, mean-field games, and high-dimensional uncertainty quantification scenarios, achieving stable and scalable performance (Liu et al., 2023, Caselle et al., 2023).

6. Theoretical Guarantees, Convergence, and Practical Implications

Recent work has established non-asymptotic convergence rates for simulation-free CNFs with flow-matching training. Under strong log-concavity, bounded support, or (infinite) Gaussian mixture targets, CNF-based distribution estimators achieve Wasserstein-2 error rates scaling as $\widetilde{O}(n^{-1/(d+5)})$, where $n$ is sample size and $d$ is the dimension. The analysis accounts for discretization, velocity field approximation, and early stopping, assuming explicit Lipschitz regularity of the neural network parameterizations (Gao et al., 2024).

Typical implementations use subpolynomially growing neural networks in $n$, with practical Euler step sizes, and guarantee pushforward stability of the learned flows under controlled regularity.

7. Ongoing Directions and Limitations

Computational Trade-Offs

CNFs, being neural ODEs, can become stiff to integrate under challenging objectives or poorly chosen step sizes. The JKO-Flow proximal scheme and OT-Flow's regularization alleviate some of these issues by enforcing smoothness, and OT-Flow additionally offers exact, efficient trace computation (Vidal et al., 2022, Onken et al., 2020).

Extensions and Integrations

There is active development in adaptive scheduling of proximal steps, combining flows with Schrödinger bridge regularization, and extending physics-informed flows to mean-field games, rare-event estimation, and high-dimensional stochastic control (Vidal et al., 2022, Liu et al., 2023).

Limitations

Each JKO-Flow iteration requires a full solve of a CNF subproblem, though a small total number of steps suffices in practice.
Subproblems remain nonconvex and high-dimensional. Warm starting and inertia can accelerate convergence.
Out-of-distribution detection shares the same challenges as other likelihood-based generative models, with CNFs only partially mitigating spurious likelihood assignments (Voleti et al., 2021).

Empirical evaluations demonstrate that CNFs, especially when properly regularized and efficiently implemented, yield superior sample quality, faster convergence, and state-of-the-art performance in a range of high-dimensional generative modeling, uncertainty quantification, and scientific modeling tasks.