Continuous-Time Normalizing Flow (CNF)

Updated 8 May 2026

Continuous-Time Normalizing Flow (CNF) is a generative model that constructs invertible mappings between complex target distributions and simple base distributions using continuous ODE integration.
CNFs enable exact likelihood computation and efficient sampling, and they extend to manifold-structured data with specialized numerical methods.
Training relies on maximum likelihood and flow-matching objectives, complemented by regularization techniques, to achieve competitive performance in density estimation and scientific computing.

Continuous-Time Normalizing Flow (CNF) is a parametric generative model that constructs invertible mappings between complex target distributions and tractable base distributions (commonly isotropic Gaussians) by integrating differential equations parameterized by neural networks. This continuous-time, ODE-based approach generalizes the classical discrete-layer normalizing flow architecture and enables exact likelihood computations, efficient sampling, and high expressivity on both Euclidean and manifold-structured data.

1. Mathematical Formulation and Core Principles

A CNF parameterizes an invertible map $\psi_{0 \to 1}\colon \mathbb{R}^d \to \mathbb{R}^d$, $x_0 \mapsto z(1)$, via the initial value problem $\frac{d}{dt} z(t) = v_\theta(z(t), t),\qquad z(0) = x_0,\qquad t\in[0,1]$, where $v_\theta$ is a time-dependent velocity field parameterized by a neural network with weights $\theta$, and $x_0$ is sampled from a base distribution $\rho_0$ (e.g., a standard Gaussian).

The log-density along the path $z(t)$ evolves according to the instantaneous change-of-variables formula $\frac{d}{dt}\log p(z(t)) = -\mathrm{Tr}\left( \frac{\partial v_\theta}{\partial z}(z(t), t) \right)$ or, equivalently, in integrated form:

$\log p(z(1)) = \log p(z(0)) - \int_{0}^{1} \mathrm{Tr}\left( \frac{\partial v_\theta}{\partial z}(z(t), t) \right) dt$
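The change-of-variables equation can be checked numerically by integrating the state and its log-density jointly. The following is a minimal sketch in Python with NumPy. It assumes a hypothetical linear field $v(z, t) = A z$, chosen because its Jacobian trace $\mathrm{Tr}(A)$ is constant and the exact flow $z(1) = e^{A} x_0$ is known. It also uses a hand-written fixed-step RK4 integrator rather than any particular ODE library.

```python
import numpy as np

def rk4_step(f, y, t, h):
    """One classical fourth-order Runge-Kutta step for dy/dt = f(y, t)."""
    k1 = f(y, t)
    k2 = f(y + 0.5 * h * k1, t + 0.5 * h)
    k3 = f(y + 0.5 * h * k2, t + 0.5 * h)
    k4 = f(y + h * k3, t + h)
    return y + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

# Hypothetical linear velocity field v(z, t) = A z; its Jacobian is A, so the trace is Tr(A).
A = np.array([[-0.5, 0.2],
              [0.0, 0.3]])

def augmented(y, t):
    """Joint dynamics of state and log-density: dz/dt = A z, d(log p)/dt = -Tr(A)."""
    z = y[:-1]
    return np.concatenate([A @ z, [-np.trace(A)]])

def push_forward(x0, logp0, n_steps=100):
    """Integrate (z, log p) on a fixed grid from t = 0 to t = 1."""
    y = np.concatenate([x0, [logp0]])
    h = 1.0 / n_steps
    for k in range(n_steps):
        y = rk4_step(augmented, y, k * h, h)
    return y[:-1], y[-1]

x0 = np.array([1.0, -1.0])
logp0 = -0.5 * x0 @ x0 - np.log(2.0 * np.pi)  # standard 2-D Gaussian log-density at x0
z1, logp1 = push_forward(x0, logp0)
# For this linear field, z(1) = expm(A) x0 and log p(z(1)) = log p(x0) - Tr(A).
```

For this field the accumulated change is exactly $-\mathrm{Tr}(A)$, matching $\log\lvert\det e^{A}\rvert = \mathrm{Tr}(A)$. A neural $v_\theta$ replaces $A z$ and makes the trace state-dependent.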

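Invertibility follows from the existence and uniqueness of ODE solutions (Picard–Lindelöf theorem) when $v_\theta$ is Lipschitz continuous in $z$. The inverse map $\psi_{0 \to 1}^{-1}$ is obtained by integrating the same ODE backward from $t = 1$ to $t = 0$.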
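In practice, inverting the flow amounts to integrating the same field backward in time. The sketch below assumes a hypothetical toy field $v(z, t) = \tanh(Wz + tb)$ with random weights, which is globally Lipschitz in $z$. The round trip is exact only up to the solver's discretization error.

```python
import numpy as np

def rk4_step(f, z, t, h):
    """One classical fourth-order Runge-Kutta step for dz/dt = f(z, t)."""
    k1 = f(z, t)
    k2 = f(z + 0.5 * h * k1, t + 0.5 * h)
    k3 = f(z + 0.5 * h * k2, t + 0.5 * h)
    k4 = f(z + h * k3, t + h)
    return z + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

rng = np.random.default_rng(0)
W = rng.normal(scale=0.5, size=(3, 3))  # hypothetical "network" weights
b = rng.normal(size=3)

def v(z, t):
    """Toy velocity field tanh(W z + t b): smooth and globally Lipschitz in z."""
    return np.tanh(W @ z + t * b)

def flow(z, t0, t1, n_steps=200):
    """Fixed-step RK4 from t0 to t1; choosing t1 < t0 integrates backward in time."""
    h = (t1 - t0) / n_steps
    for k in range(n_steps):
        z = rk4_step(v, z, t0 + k * h, h)
    return z

x0 = rng.normal(size=3)
x1 = flow(x0, 0.0, 1.0)      # forward map psi_{0->1}(x0)
x0_rec = flow(x1, 1.0, 0.0)  # inverse map: same ODE, integrated from t = 1 back to t = 0
```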

The same construction extends to a Riemannian manifold $M$. Here $v_\theta$ is restricted to tangent vectors, and the Euclidean divergence in the change-of-variables formula is replaced by the Riemannian divergence. This allows CNFs to operate on nontrivial topologies (Falorsi, 2021).

2. Training Objectives and Algorithms

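Maximum likelihood training maximizes the likelihood of target data under the model, which requires ODE solves for both the state and the accumulated log-density. The principal loss is the expected negative log-likelihood $\mathcal{L}_{\mathrm{MLE}}(\theta) = -\mathbb{E}_{x \sim p_{\mathrm{data}}}\left[ \log \rho_0(z(0)) - \int_{0}^{1} \mathrm{Tr}\left( \frac{\partial v_\theta}{\partial z}(z(t), t) \right) dt \right]$ with $z(1) = x$. It is subject to the ODE system detailed above, which is solved backward from $t = 1$ to $t = 0$.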
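The following is a minimal sketch of evaluating this loss. It assumes a hypothetical field $v_\theta(z, t) = \tanh(Wz + tb)$ with $\theta = (W, b)$, a standard Gaussian base $\rho_0$, and a fixed-step RK4 solver. For this field the Jacobian is $\mathrm{diag}(1 - \tanh^2(u))\,W$ with $u = Wz + tb$, so the trace can be computed exactly. Each data point is transported backward to $t = 0$ while the trace integral is accumulated.

```python
import numpy as np

def rk4_step(f, y, t, h):
    """One classical fourth-order Runge-Kutta step for dy/dt = f(y, t)."""
    k1 = f(y, t)
    k2 = f(y + 0.5 * h * k1, t + 0.5 * h)
    k3 = f(y + 0.5 * h * k2, t + 0.5 * h)
    k4 = f(y + h * k3, t + h)
    return y + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def make_augmented(W, b):
    """Hypothetical field v(z, t) = tanh(W z + t b), augmented with dc/dt = Tr(dv/dz)."""
    def aug(y, t):
        u = W @ y[:-1] + t * b
        s = np.tanh(u)
        trace = np.sum((1.0 - s ** 2) * np.diag(W))  # exact trace of diag(1 - tanh^2(u)) W
        return np.concatenate([s, [trace]])
    return aug

def log_likelihood(x, W, b, n_steps=100):
    """log p_theta(x): integrate (z, c) backward from z(1) = x, c(1) = 0 to t = 0.
    Then c(0) = -int_0^1 Tr dt, so log p(x) = log rho_0(z(0)) + c(0)."""
    aug = make_augmented(W, b)
    y = np.concatenate([x, [0.0]])
    h = -1.0 / n_steps
    for k in range(n_steps):
        y = rk4_step(aug, y, 1.0 + k * h, h)
    z0, c0 = y[:-1], y[-1]
    log_rho0 = -0.5 * z0 @ z0 - 0.5 * z0.size * np.log(2.0 * np.pi)  # standard Gaussian base
    return log_rho0 + c0

def nll_loss(X, W, b):
    """Monte Carlo estimate of the MLE loss: mean negative log-likelihood over rows of X."""
    return -np.mean([log_likelihood(x, W, b) for x in X])
```

This NumPy sketch only evaluates the loss. Gradients with respect to $\theta$ would come from an autodiff framework or an adjoint method.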

Flow-matching objectives avoid costly ODE solves during training. They recast the problem as regressing $v_\theta$ onto the velocity of a prescribed probability path, so that the distributions of particles transported by $v_\theta$ match that path. Take pairs $x_0 \sim \rho_0$ and $x_1 \sim \rho_1$ (the target distribution), joined by the linear interpolant $x_t = (1-t)\,x_0 + t\,x_1$. In this deterministic-interpolation setting the loss is $\mathcal{L}_{\mathrm{FM}}(\theta) = \mathbb{E}_{t \sim \mathcal{U}[0,1],\, x_0,\, x_1}\left\| v_\theta(x_t, t) - (x_1 - x_0) \right\|^2$. Because the regression target $x_1 - x_0$ is available in closed form, no ODE is solved during training. Weighted variants of the objective also enable training when no samples from the target are available, e.g., for Boltzmann distributions known only through energy evaluations (Dern et al., 3 Sep 2025).
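The following is a minimal sketch of a Monte Carlo estimate of the flow-matching loss. It assumes the linear interpolant $x_t = (1-t)x_0 + t x_1$ with independently paired samples. The `v_theta` callable and the zero-velocity stand-in model are hypothetical. Note that no ODE is integrated.

```python
import numpy as np

def flow_matching_loss(v_theta, x0, x1, rng):
    """Monte Carlo estimate of the linear-interpolation flow-matching loss.

    v_theta: callable (xt, t) -> predicted velocities, shapes (n, d), (n, 1) -> (n, d)
    x0: (n, d) base samples; x1: (n, d) target samples, paired row-wise.
    """
    n = x0.shape[0]
    t = rng.uniform(size=(n, 1))     # t ~ U[0, 1], one time per pair
    xt = (1.0 - t) * x0 + t * x1     # point on the straight-line path
    target = x1 - x0                 # time derivative of the interpolant
    residual = v_theta(xt, t) - target
    return float(np.mean(np.sum(residual ** 2, axis=1)))

rng = np.random.default_rng(0)
x0 = rng.normal(size=(256, 2))                # base samples
x1 = rng.normal(loc=3.0, size=(256, 2))       # stand-in "data" samples
zero_model = lambda xt, t: np.zeros_like(xt)  # hypothetical untrained model
loss = flow_matching_loss(zero_model, x0, x1, rng)
```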

KL divergence and optimal transport regularization:

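Instead of pure MLE, many practical CNF variants add a transport-cost penalty that favors flows with low kinetic energy: $\mathcal{L}(\theta) = \mathcal{L}_{\mathrm{MLE}}(\theta) + \alpha\, \mathbb{E}_{x \sim p_{\mathrm{data}}}\left[ \int_{0}^{1} \tfrac{1}{2} \left\| v_\theta(z(t), t) \right\|^2 dt \right]$ with $z(1) = x$ and a weight $\alpha > 0$. Taking a variational view, they also incorporate Wasserstein gradient flows or proximal-point (JKO) updates, which make optimization more robust to hyperparameter choices (Vidal et al., 2022, Onken et al., 2020).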
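The kinetic-energy penalty can be accumulated alongside the state as an extra ODE component. This sketch assumes a fixed-step RK4 solver, trajectories that end at a data point $z(1) = x$, and a hypothetical penalty weight `alpha`. The regularizers actually used by OT-Flow or the JKO-based method may include further terms.

```python
import numpy as np

def rk4_step(f, y, t, h):
    """One classical fourth-order Runge-Kutta step for dy/dt = f(y, t)."""
    k1 = f(y, t)
    k2 = f(y + 0.5 * h * k1, t + 0.5 * h)
    k3 = f(y + 0.5 * h * k2, t + 0.5 * h)
    k4 = f(y + h * k3, t + h)
    return y + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def kinetic_energy(v, x, n_steps=100):
    """int_0^1 0.5 * ||v(z(t), t)||^2 dt along the trajectory with z(1) = x.
    The augmented state (z, k) is integrated backward from t = 1 with
    dk/dt = 0.5 * ||v||^2 and k(1) = 0, so the integral equals -k(0)."""
    def aug(y, t):
        vz = v(y[:-1], t)
        return np.concatenate([vz, [0.5 * vz @ vz]])
    y = np.concatenate([x, [0.0]])
    h = -1.0 / n_steps
    for i in range(n_steps):
        y = rk4_step(aug, y, 1.0 + i * h, h)
    return -y[-1]

def regularized_loss(nll, kinetic, alpha=0.1):
    """MLE loss plus a kinetic-energy penalty with (hypothetical) weight alpha."""
    return nll + alpha * kinetic

c = np.array([3.0, 4.0])
ke = kinetic_energy(lambda z, t: c, np.zeros(2))  # constant field as a sanity check
```

For a constant field $v = c$ the penalty reduces to $\tfrac{1}{2}\lVert c\rVert^2$, which makes a convenient sanity check.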

3. Computational and Representational Aspects

Invertibility and efficient Jacobian calculation: CNFs are fully invertible, and the log-determinant of the flow map's Jacobian is obtained by integrating the trace of the network Jacobian over time. High-dimensional trace computations are handled either by Hutchinson estimators or, for special architectures, by closed-form or layerwise diagonal accumulation (Onken et al., 2020).
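Hutchinson's estimator replaces the exact trace with $\mathbb{E}[\varepsilon^\top J \varepsilon]$ for random probes $\varepsilon$ with zero mean and identity covariance. This sketch uses Rademacher probes. Here `matvec` is an assumed callable returning $J\varepsilon$ (in a real CNF, a Jacobian-vector product from an autodiff framework), and a random matrix stands in for the network Jacobian.

```python
import numpy as np

def hutchinson_trace(matvec, d, n_probes, rng):
    """Hutchinson estimate of Tr(J) with Rademacher probes: E[eps^T J eps] = Tr(J).

    matvec(eps) must return J @ eps; in a CNF this is a Jacobian-vector product
    of v_theta, so J itself is never formed.
    """
    eps = rng.choice([-1.0, 1.0], size=(n_probes, d))
    return float(np.mean([e @ matvec(e) for e in eps]))

rng = np.random.default_rng(0)
J = rng.normal(size=(5, 5))  # stand-in for the network Jacobian dv/dz
estimate = hutchinson_trace(lambda e: J @ e, 5, 20000, rng)
```

With Rademacher probes the estimate is exact for diagonal $J$, and its variance grows with the off-diagonal entries. Few probes (often a single one per sample) trade extra variance for lower cost.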

Numerical ODE integration: Adaptive or fixed-step ODE solvers (Dormand–Prince, RK4) are used in forward and backward passes. Two distinct training paradigms have been formalized:

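Optimize-Discretize (Opt-Disc): Derive the continuous adjoint equations first, then discretize the forward and adjoint ODEs separately. This is mesh-independent, but gradients can be inaccurate when the forward and adjoint discretization errors are not aligned.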
Discretize-Optimize (Disc-Opt): Discretize the ODE system with a fixed grid before training, ensuring exact gradients of the discrete objective and efficient memory usage. Proper time-step selection is crucial to maintain invertibility and accuracy (Onken et al., 2020).
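The difference between the two paradigms already appears on the scalar ODE $dz/dt = \theta z$. The sketch below assumes forward Euler on a fixed grid. Disc-Opt differentiates the discrete map $z_N = (1 + \theta h)^N z_0$ exactly via a backward (discrete adjoint) sweep. The gradient of the continuous solution $z(1) = e^{\theta} z_0$, which is what Opt-Disc approximates, differs from it on a coarse grid.

```python
import numpy as np

def euler_forward(theta, z0, n_steps):
    """Fixed-grid forward Euler for dz/dt = theta * z on [0, 1]; returns every state z_k."""
    h = 1.0 / n_steps
    zs = [z0]
    for _ in range(n_steps):
        zs.append(zs[-1] + h * theta * zs[-1])
    return zs

def disc_opt_gradient(theta, z0, n_steps):
    """Objective L = z_N and its exact gradient dL/dtheta through the Euler steps
    (a discrete adjoint, i.e. backpropagation through the solver)."""
    h = 1.0 / n_steps
    zs = euler_forward(theta, z0, n_steps)
    lam = 1.0   # adjoint dL/dz_N
    grad = 0.0
    for k in range(n_steps - 1, -1, -1):
        grad += lam * h * zs[k]   # explicit dependence of z_{k+1} on theta
        lam *= 1.0 + h * theta    # dL/dz_k = dL/dz_{k+1} * dz_{k+1}/dz_k
    return zs[-1], grad

theta, z0, n = 0.7, 2.0, 10
L, g = disc_opt_gradient(theta, z0, n)
cont_grad = np.exp(theta) * z0  # gradient of the exact ODE solution z(1) = e^theta z0
```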

Regularization for efficient integration: To reduce the number of function evaluations (NFE) an adaptive solver needs, methods such as Trajectory Polynomial Regularization penalize deviations of ODE trajectories from low-degree polynomials. Smoother trajectories let the solver's step-size adaptation take larger steps, reducing wall-clock time by up to 70% without compromising model accuracy (Huang et al., 2020).
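Below is a simplified illustration of a trajectory-polynomial penalty, assuming a least-squares polynomial fit to states sampled along one trajectory. The construction in Huang et al. (2020), including how time points and polynomial degree are chosen, may differ. `polynomial_deviation_penalty` is a hypothetical name.

```python
import numpy as np

def polynomial_deviation_penalty(ts, zs, degree=2):
    """Mean squared residual of a least-squares polynomial fit to a trajectory.

    ts: (K,) time points; zs: (K, d) states z(t_k) along one ODE trajectory.
    """
    V = np.vander(ts, degree + 1)                  # columns t^degree, ..., t, 1
    coef, *_ = np.linalg.lstsq(V, zs, rcond=None)  # one polynomial per state dimension
    residual = zs - V @ coef
    return float(np.mean(np.sum(residual ** 2, axis=1)))

ts = np.linspace(0.0, 1.0, 8)
curved = np.stack([np.sin(2 * np.pi * ts), np.cos(2 * np.pi * ts)], axis=1)
quadratic = np.stack([ts ** 2, 1.0 - 3.0 * ts], axis=1)
```

A trajectory that is already a low-degree polynomial in $t$ incurs (numerically) zero penalty. Strongly curved trajectories, which force small adaptive steps, are penalized.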

4. Extensions for Structure and Conditional Modeling

Manifold CNFs: CNFs extend to Riemannian settings by parameterizing vector fields tangent to the manifold, using either local chart embeddings or orthogonal projections in ambient space. The resulting flows are guaranteed to remain on the manifold and admit unbiased divergence estimators (Falorsi, 2021, Ben-Hamu et al., 2022). Applications include densities on spheres, Lie groups, Stiefel manifolds, and manifolds of positive-definite matrices.
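The following is a minimal sketch of the ambient-space projection approach on the unit sphere. The tangent projection $P_x u = u - (x^\top u)\,x$ removes the normal component of an ambient vector, and each step is followed by renormalization as a simple retraction. This scheme is chosen for illustration only; the cited papers' integrators (e.g., chart-based or exponential-map schemes) may differ, and the constant ambient field is hypothetical.

```python
import numpy as np

def project_to_tangent(x, u):
    """Orthogonal projection of ambient vector u onto the tangent space of the unit sphere at x."""
    return u - (x @ u) * x

def sphere_step(x, field, h):
    """One explicit Euler step along the projected (tangent) field, followed by
    renormalization, a simple retraction back onto the sphere."""
    v = project_to_tangent(x, field(x))
    y = x + h * v
    return y / np.linalg.norm(y)

field = lambda x: np.array([1.0, 0.5, -2.0])  # hypothetical ambient vector field (constant)
x = np.array([0.0, 0.0, 1.0])
for _ in range(100):
    x = sphere_step(x, field, 0.01)
```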

Conditional and compositional CNFs: Architectural innovations such as InfoCNF partition the latent space into supervised and unsupervised codes, enabling efficient class-conditional generation and reducing computation in high-dimensional conditional generative tasks (Nguyen et al., 2019). Multi-scale and multi-resolution variants further decompose the data hierarchically, making high-resolution images tractable to model (Voleti et al., 2021).

Flow-Matching on Manifolds: The Probability Path Divergence (PPD) enables direct CNF training on manifolds without ODE solves per iteration, with theoretical guarantees controlling classical divergences between true and model densities, and with state-of-the-art empirical sample quality on structured-data benchmarks (Ben-Hamu et al., 2022).

5. Recent Theoretical Developments

Nonparametric convergence: For CNFs trained on a finite sample with linear interpolation and a flow-matching objective, non-asymptotic Wasserstein-2 error bounds have been established for the learned distribution. These bounds hold when the target has bounded support, is strongly log-concave, or is a (finite or infinite) Gaussian mixture. The analysis relies on Lipschitz regularity of the velocity field and its neural-network estimator. It also requires early stopping near the terminal time $t = 1$, where time derivatives of the velocity field can become singular (Gao et al., 2024). Discretization error, estimation error (from finite samples and limited network capacity), and early-stopping bias are all quantified.

Energy-weighted flow matching: Some unnormalized densities cannot be sampled directly. For these, Energy-Weighted Flow Matching trains CNFs using only energy (potential) evaluations and importance weights. Its iterative and annealed algorithms achieve sample quality comparable to state-of-the-art score-based and flow-based Boltzmann samplers while requiring far fewer energy evaluations (Dern et al., 3 Sep 2025).
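The core idea, weighting flow-matching pairs by self-normalized importance weights $w_i \propto e^{-E(x_i)}/q(x_i)$, can be sketched as follows. This simplified version assumes a fixed proposal $q$ with a tractable density (e.g., the current CNF) and a toy quadratic energy, both hypothetical. Dern et al.'s iterative and annealed algorithms update the proposal and may weight terms differently.

```python
import numpy as np

def importance_weights(x1, energy, log_q):
    """Self-normalized weights w_i proportional to exp(-E(x_i)) / q(x_i), in log space.
    x1: (n, d) proposal samples; energy and log_q map (n, d) arrays to (n,) arrays."""
    log_w = -energy(x1) - log_q(x1)
    log_w = log_w - log_w.max()  # stabilize before exponentiating
    w = np.exp(log_w)
    return w / w.sum()

def weighted_flow_matching_loss(v_theta, x0, x1, w, rng):
    """Linear-interpolation flow-matching loss with per-pair weights w (summing to one)."""
    n = x0.shape[0]
    t = rng.uniform(size=(n, 1))
    xt = (1.0 - t) * x0 + t * x1
    residual = v_theta(xt, t) - (x1 - x0)
    return float(np.sum(w * np.sum(residual ** 2, axis=1)))

rng = np.random.default_rng(0)
x0 = rng.normal(size=(128, 2))                   # base samples
x1 = rng.normal(size=(128, 2))                   # samples from a hypothetical proposal q = N(0, I)
energy = lambda x: 0.5 * np.sum(x ** 2, axis=1)  # toy energy: target proportional to exp(-E) = N(0, I)
log_q = lambda x: -0.5 * np.sum(x ** 2, axis=1) - np.log(2 * np.pi)
w = importance_weights(x1, energy, log_q)
```

Here the proposal equals the target, so the weights come out uniform and the weighted loss reduces to ordinary flow matching. A mismatched proposal would shift weight toward low-energy samples.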

6. Applications and Empirical Performance

Density estimation and generative modeling: CNFs achieve state-of-the-art or competitive likelihoods on a broad range of datasets, from tabular benchmarks (UCI datasets, BSDS300) to images (MNIST, CIFAR-10). Recent variants also reduce model size, training time, and integration steps relative to earlier CNF baselines (Onken et al., 2020, Huang et al., 2020, Voleti et al., 2021).

Scientific computing and high-dimensional sampling: CNFs have been deployed in physics event generation, attaining order-of-magnitude improvements in unweighting efficiency for collider phase-space sampling. A hybrid technique ("RegFlow" distillation) transfers CNF performance into much faster architectures for practical event generation (Bothmann et al., 3 Apr 2026).

Stochastic process modeling: Dynamic CNF variants for continuous stochastic processes apply time-indexed invertible transforms to a simple base process such as a Wiener process. They support exact and efficient likelihoods and analytical interpolation, and they handle irregularly sampled time series (Deng et al., 2020).

7. Open Challenges and Future Directions

Scalability, ODE solver optimization, and the ability to handle multimodal, singular, or implicitly defined targets remain open engineering and theoretical problems. Recent research points toward tighter theoretical rates (nonparametric W2 bounds), direct energy-based training, and highly expressive structured flows on non-Euclidean spaces (Gao et al., 2024, Dern et al., 3 Sep 2025, Ben-Hamu et al., 2022). Regularization, solver selection strategies, and adaptive architectural mechanisms remain active topics for performance and efficiency gains (Huang et al., 2020, Onken et al., 2020).

References

(Falorsi, 2021): "Continuous normalizing flows on manifolds"
(Gao et al., 2024): "Convergence of Continuous Normalizing Flows for Learning Probability Distributions"
(Ben-Hamu et al., 2022): "Matching Normalizing Flows and Probability Paths on Manifolds"
(Vidal et al., 2022): "Taming Hyperparameter Tuning in Continuous Normalizing Flows Using the JKO Scheme"
(Dern et al., 3 Sep 2025): "Energy-Weighted Flow Matching: Unlocking Continuous Normalizing Flows for Efficient and Scalable Boltzmann Sampling"
(Onken et al., 2020): "Discretize-Optimize vs. Optimize-Discretize for Time-Series Regression and Continuous Normalizing Flows"
(Bothmann et al., 3 Apr 2026): "Monte Carlo Event Generation with Continuous Normalizing Flows"
(Voleti et al., 2021): "Multi-Resolution Continuous Normalizing Flows"
(Onken et al., 2020): "OT-Flow: Fast and Accurate Continuous Normalizing Flows via Optimal Transport"
(Nguyen et al., 2019): "InfoCNF: An Efficient Conditional Continuous Normalizing Flow with Adaptive Solvers"
(Huang et al., 2020): "Accelerating Continuous Normalizing Flow with Trajectory Polynomial Regularization"
(Deng et al., 2020): "Modeling Continuous Stochastic Processes with Dynamic Normalizing Flows"