
Continuous Normalizing Flows (CNF)

Updated 21 January 2026
  • Continuous Normalizing Flows are generative models that use neural ODEs to transform simple base densities into complex data distributions with exact likelihood computation.
  • They model a continuous-time, invertible mapping via a time-dependent vector field, enabling efficient sampling, density estimation, and uncertainty quantification across diverse domains.
  • Recent advances optimize training through flow matching, regularization techniques, and manifold adaptations, significantly reducing computational overhead and improving convergence.

Continuous Normalizing Flows (CNF) are generative models that construct highly expressive, invertible mappings between simple base distributions and complex data distributions by integrating parametrized neural ordinary differential equations (ODEs). In the CNF framework, the transformation is defined as the solution of an ODE whose velocity field is modeled by a neural network, enabling continuous-time, diffeomorphic transport of probability densities. Unlike classical (discrete) normalizing flows, CNFs naturally accommodate exact change-of-variables formulas for the evolving density, facilitating tractable and scalable density estimation, sampling, and uncertainty quantification across diverse data domains—including Euclidean spaces, Riemannian manifolds, images, and physical systems.

1. Mathematical Foundations and Core Formulation

A continuous normalizing flow defines a path of random variables $z_t$ by integrating a time-dependent vector field $f(z_t, t; \theta)$:

$$\frac{dz_t}{dt} = f(z_t, t; \theta), \qquad z_0 \sim q_0,$$

where $q_0$ is a simple base density (e.g. a standard Gaussian) and $\theta$ parameterizes the neural vector field.

The evolution of the log-density along solution trajectories satisfies the instantaneous change-of-variables formula

$$\frac{d}{dt} \log q_t(z_t) = -\operatorname{tr}\left(\frac{\partial f(z_t, t)}{\partial z}\right),$$

which integrates to yield

$$\log q_T(z_T) = \log q_0(z_0) - \int_0^T \operatorname{tr}\left(\frac{\partial f(z_t, t)}{\partial z}\right) dt.$$

This ensures exact likelihood computation and invertibility via ODE integration. The machinery is readily adapted to Riemannian manifolds, where the divergence operator and base measure are replaced by their intrinsic geometric analogues (Mathieu et al., 2020, Falorsi, 2021, Ben-Hamu et al., 2022).
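The change-of-variables formula can be checked numerically on a toy example. The sketch below (all choices illustrative, not from any cited paper) uses a linear vector field $f(z, t) = Az$, for which the Jacobian is $A$ everywhere and the pushforward of a standard Gaussian stays Gaussian, giving a closed-form reference density:

```python
import numpy as np

# Toy check of the change-of-variables formula with a linear vector field
# f(z, t) = A z (a hypothetical choice): the Jacobian is A everywhere, so
# tr(df/dz) = tr(A) exactly, and the pushforward of a standard Gaussian
# stays Gaussian, giving a closed-form reference density.
A = np.array([[0.3, 0.1],
              [0.0, -0.2]])

def flow_with_logdensity(z0, T=1.0, n_steps=2000):
    """Euler-integrate dz/dt = A z together with d(log q)/dt = -tr(A)."""
    dt = T / n_steps
    z = z0.copy()
    delta_logq = 0.0
    for _ in range(n_steps):
        z = z + dt * (A @ z)
        delta_logq -= dt * np.trace(A)  # instantaneous change of variables
    return z, delta_logq

def log_gauss(z, cov):
    d = len(z)
    return -0.5 * (z @ np.linalg.solve(cov, z)
                   + np.log(np.linalg.det(cov)) + d * np.log(2 * np.pi))

z0 = np.array([0.5, -1.0])
zT, dlq = flow_with_logdensity(z0)
model_logq = log_gauss(z0, np.eye(2)) + dlq  # log q_T(z_T) from the CNF

# Reference: z_T = e^{A} z_0 has covariance e^{A} (e^{A})^T; build e^{A}
# from its power series.
Phi, term = np.eye(2), np.eye(2)
for k in range(1, 25):
    term = term @ A / k
    Phi = Phi + term
exact_logq = log_gauss(zT, Phi @ Phi.T)

print(abs(model_logq - exact_logq))  # agrees up to Euler discretization error
```

A real CNF replaces $A z$ with a neural network and obtains the trace term from automatic differentiation rather than in closed form.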

2. Algorithmic Strategies: Training, Optimization, and Numerical Issues

Training CNFs typically involves maximizing likelihood or minimizing a divergence (e.g. KL) between the model and the data, which requires computing log-densities and their gradients through ODE solutions. Backpropagation is enabled by the adjoint method for neural ODEs, yielding an efficient, constant-memory algorithm for gradient evaluation (Vaitl et al., 2022). Practical implementations use stochastic trace estimators (Hutchinson’s trick) to compute the Jacobian trace in high dimensions.
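Hutchinson’s trick estimates $\operatorname{tr}(J)$ as $\mathbb{E}[\varepsilon^\top J \varepsilon]$ for probe vectors $\varepsilon$ with identity covariance. A minimal sketch (the dimension and Jacobian below are arbitrary illustrations; a CNF would obtain $\varepsilon^\top J$ from one vector-Jacobian product via autodiff rather than forming $J$ explicitly):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hutchinson's trick: tr(J) = E[eps^T J eps] for probes eps with identity
# covariance. An explicit Jacobian stands in for the autodiff
# vector-Jacobian product, purely to illustrate the estimator.
d = 10
J = rng.standard_normal((d, d))

def hutchinson_trace(J, n_probes=20000):
    eps = rng.choice([-1.0, 1.0], size=(n_probes, J.shape[0]))  # Rademacher probes
    quad = np.einsum("ni,ij,nj->n", eps, J, eps)  # eps^T J eps per probe
    return quad.mean()

est = hutchinson_trace(J)
print(abs(est - np.trace(J)))  # Monte Carlo error shrinks as 1/sqrt(n_probes)
```

Rademacher probes are a common low-variance choice; Gaussian probes work as well.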

Recent advances address the high computational burden associated with ODE solver steps (“number of function evaluations,” NFE). Regularization schemes such as trajectory polynomial regularization (Huang et al., 2020) and optimal transport-based kinetic energy penalties or proximal splitting via the JKO scheme (Vidal et al., 2022, Onken et al., 2020) have been shown to dramatically reduce NFEs and solver stiffness, accelerating both training and inference.

Flow matching methods bypass the need for log-density evaluation at every step, training directly via regression to analytically prescribed or sampled velocity fields along linear or optimal-transport interpolants between base and target distributions (Gao et al., 2024, Dern et al., 3 Sep 2025, Cabezas et al., 2024). Path-gradient estimators further reduce variance in gradient estimates, improving convergence and sample efficiency, particularly near the optimum (Vaitl et al., 2022).
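The flow-matching recipe can be sketched end to end in one dimension. The example below is illustrative only (independent base/target pairs, a linear interpolant, and a small polynomial-feature model standing in for the neural velocity network): training is plain regression onto the conditional velocity, with no log-density or ODE solve, and sampling integrates the learned field afterward.

```python
import numpy as np

rng = np.random.default_rng(1)

# Minimal flow-matching sketch in 1-D (all choices illustrative):
# draw independent base/target pairs, build the linear interpolant
# x_t = (1 - t) x0 + t x1, and regress a velocity model onto the
# conditional target u = x1 - x0.
def sample_batch(n):
    x0 = rng.standard_normal(n)            # base N(0, 1)
    x1 = 3.0 + rng.standard_normal(n)      # target N(3, 1)
    t = rng.uniform(size=n)
    xt = (1.0 - t) * x0 + t * x1           # linear interpolant
    u = x1 - x0                            # conditional velocity target
    return xt, t, u

def features(x, t):
    # polynomial features standing in for a neural network
    return np.stack([np.ones_like(x), t, t**2, t**3,
                     x, x * t, x * t**2, x * t**3], axis=1)

xt, t, u = sample_batch(200000)
w, *_ = np.linalg.lstsq(features(xt, t), u, rcond=None)  # least-squares "training"

# Sampling: Euler-integrate dz/dt = v_w(z, t) from the base distribution.
z = rng.standard_normal(5000)
n_steps = 100
for k in range(n_steps):
    tk = np.full_like(z, k / n_steps)
    z = z + (1.0 / n_steps) * (features(z, tk) @ w)

print(z.mean(), z.std())  # should land near the target N(3, 1)
```

The regression target is noisy per sample, but its conditional mean is the marginal velocity field, which is all the sampler needs.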

3. Theoretical Guarantees and Convergence Rates

Theoretical analysis of CNF estimators trained via flow matching demonstrates non-asymptotic convergence (in Wasserstein-2 distance) under mild assumptions on the target distribution, including strong log-concavity, bounded support, or Gaussian mixtures (Gao et al., 2024). The rate

$$\mathbb{E}\left[\mathcal{W}_2(\hat p_{1-\delta}, p_1)\right] \lesssim n^{-1/(d+5)}\,\operatorname{polylog} n$$

matches minimax rates up to logarithmic factors and an extra polynomial dependence on dimension (due to time regularity constraints). This analysis relies on carefully bounding discretization, velocity-estimation, and early-stopping errors for neural-parametric velocity fields. Lipschitz-regular approximation by deep ReLU networks is essential for ensuring that statistical and numerical errors diminish at optimal rates.

4. Extensions: Manifolds, Symmetry, and Physical Systems

Continuous normalizing flows naturally extend to Riemannian manifolds by formulating the ODE in terms of vector fields tangent to the manifold and integrating with respect to the Riemannian volume element. The divergence term in the change-of-variables formula is replaced by the intrinsic divergence of the manifold's metric (Mathieu et al., 2020, Falorsi, 2021). Specialized parameterizations—such as equivariant architectures for matrix Lie groups—ensure compatibility with symmetries present in lattice gauge theory and other structured physical domains (Gerdes et al., 2024).
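The basic mechanics of a manifold flow can be sketched on the unit sphere: project an ambient vector field onto the tangent space at each point, then retract each Euler step back onto the manifold. The ambient field below is an arbitrary illustrative choice, not a construction from the cited works.

```python
import numpy as np

# Sketch of a manifold CNF step on the unit sphere S^2: the ambient field
# is projected onto the tangent space at each point, and each Euler step
# is followed by a retraction (renormalization) so trajectories stay on
# the manifold. The field b(z) below is an arbitrary illustrative choice.
def tangent_project(z, v):
    """Remove the radial component: P_z v = v - (z . v) z for |z| = 1."""
    return v - (z @ v) * z

def ambient_field(z):
    # hypothetical smooth field in R^3; only its tangential part acts
    return np.array([-z[1], z[0], 0.3])

def flow_on_sphere(z0, T=1.0, n_steps=200):
    z = z0 / np.linalg.norm(z0)
    dt = T / n_steps
    for _ in range(n_steps):
        v = tangent_project(z, ambient_field(z))
        z = z + dt * v
        z = z / np.linalg.norm(z)  # retraction back to the sphere
    return z

z0 = np.array([1.0, 0.0, 0.0])
zT = flow_on_sphere(z0)
print(np.linalg.norm(zT))  # trajectory stays on the unit sphere
```

A full Riemannian CNF would additionally track the intrinsic divergence of the tangential field to update the log-density, in place of the Euclidean trace term.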

Empirical results confirm that CNFs respecting manifold geometry and symmetry outperform traditional and ad hoc approaches on benchmark tasks, including high-dimensional spherical distributions, lattice field theory, and robotics pose inference (Mathieu et al., 2020, Ben-Hamu et al., 2022, Gerdes et al., 2024).

5. Computational Advances and Applications

Recent work has addressed the scaling and efficiency of CNFs. OT-Flow demonstrates 8× faster training and 24× faster inference on high-dimensional density estimation tasks by enforcing geodesic (OT-type) transport and providing an exact $O(d)$ trace formula (Onken et al., 2020). Trajectory polynomial regularization decreases NFEs by 42–71% for density estimation and 19–32% for variational autoencoding, without loss in test likelihood (Huang et al., 2020). JKO-Flow eliminates the need for global hyperparameter tuning by decomposing transport into a sequence of proximal gradient steps, accelerating convergence and making performance robust to step size (Vidal et al., 2022).

Applications of CNFs span image modeling (multi-resolution CNFs for high-resolution images (Voleti et al., 2021)), time-series modeling with dynamic normalizing flows based on stochastic process priors (Deng et al., 2020), conditional inference (InfoCNF with partitioned latent codes and adaptive ODE solvers (Nguyen et al., 2019)), uncertainty-aware human pose estimation (Liu et al., 4 May 2025), and efficient sampling for Boltzmann distributions in molecular systems via energy-weighted flow matching (Dern et al., 3 Sep 2025). In probabilistic inference, CNFs have been integrated with MCMC to provide adaptive global proposals that accelerate mode discovery and mixing in high dimensions (Cabezas et al., 2024).

6. Duality, Flow Matching, and Alternative Training Objectives

Dual formulations based on entropy-regularized optimal transport circumvent ODE integration entirely during training by learning only the scalar Entropy-Kantorovich potentials. The CNF is then constructed post hoc as the gradient of these potentials, recovering the entropic interpolation path (Finlay et al., 2020). Probability Path Divergence (PPD) minimization provides an additional flexible training criterion—sidestepping inner ODE solves and generalizing classic f-divergences to the space of paths (Ben-Hamu et al., 2022). These techniques are particularly advantageous for large-scale or computationally constrained scenarios.

Flow matching has enabled Boltzmann and lattice models to be trained using only energy evaluations, exploiting importance sampling and iterative/annealed proposal refinement for stable and scalable learning in challenging energy landscapes (Dern et al., 3 Sep 2025). In practice, this has resulted in 100×–1000× reductions in energy calls relative to prior state-of-the-art, while maintaining or surpassing sample quality (as assessed by negative log-likelihood and Wasserstein distance).
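The core ingredient such schemes reuse is self-normalized importance sampling against an unnormalized Boltzmann density $e^{-E(x)}$, which needs only energy evaluations. A minimal sketch (the energy and the Gaussian proposal are illustrative choices, not the cited method):

```python
import numpy as np

rng = np.random.default_rng(2)

# Self-normalized importance sampling against an unnormalized Boltzmann
# density exp(-E(x)), using only energy evaluations. E and the proposal
# are illustrative: the target is proportional to N(1, 0.25).
def energy(x):
    return 0.5 * ((x - 1.0) / 0.5) ** 2

x = rng.standard_normal(200000) * 2.0  # broad N(0, 4) proposal
log_q = -0.5 * (x / 2.0) ** 2 - np.log(2.0) - 0.5 * np.log(2 * np.pi)
log_w = -energy(x) - log_q             # unnormalized log-weights
w = np.exp(log_w - log_w.max())        # subtract max for numerical stability
w /= w.sum()
mean_est = np.sum(w * x)               # estimates E_target[x] = 1
print(mean_est)
```

In energy-based flow matching, such weights reweight proposal samples toward the Boltzmann target, and the proposal is itself refined iteratively by the partially trained flow.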

7. Open Problems and Future Research Directions

Ongoing challenges in CNFs include further reducing computational overhead (especially for very stiff dynamics), developing more robust and scalable training procedures for extremely high dimensions or complex geometries, handling unnormalized or energy-based targets, and integrating CNFs with stochastic differential equations for modeling diffusion processes.

Extensions to 3+1-dimensional lattice field theory, real-world manifolds, and conditional and structured generative modeling remain active areas of development (Gerdes et al., 2024). Incorporation of advanced architectures—such as Lie-group equivariant networks, hierarchical and multiscale designs, and domain-specific, physics-informed parameterizations—promises to further extend the power and applicability of continuous normalizing flows across machine learning, statistical mechanics, and scientific computing.
