Continuous Normalizing Flow Models

Updated 23 April 2026
  • Continuous normalizing flows are continuous-time, invertible transformations modeled by neural ODEs that capture complex probability distributions.
  • They leverage neural network-defined vector fields and ODE-based change-of-variables to compute exact log-likelihoods for efficient density estimation.
  • CNFs support applications in image synthesis, speech generation, and scientific sampling while requiring strategies to mitigate high computational costs.

A Continuous Normalizing Flow (CNF) is a family of invertible, differentiable transformations between probability distributions, constructed as the solution to a neural ordinary differential equation (ODE). CNFs generalize discrete normalizing flows by replacing a finite composition of flows with a continuous-time flow parameterized by a neural network–defined vector field. This framework enables exact evaluation of the log-likelihood via the change-of-variables formula and provides highly flexible, expressive density estimation and generative modeling. CNFs are foundational for scalable likelihood-based generative models, conditional density estimation, manifold modeling, and stochastic process learning.

1. Mathematical Foundation of CNFs

CNFs replace the discrete sequence of invertible maps in standard normalizing flows with continuous-time transformations governed by ODEs. The transformation of a data point $x$ is cast as an initial value problem:

$$\frac{dz(t)}{dt} = f(z(t), t; \theta), \qquad z(0) = x, \quad t \in [0, T]$$

where $f$ is a learnable vector field (typically a neural network) and $z(T)$ is the mapped latent variable. The corresponding change in density follows the instantaneous change-of-variables formula:

$$\frac{d}{dt}\log p(z(t)) = -\mathrm{Tr}\left( \frac{\partial f}{\partial z}(z(t), t; \theta) \right)$$

Integrating from $0$ to $T$, the log-likelihood of a data point $x$ is:

$$\log p(x) = \log p_{\mathrm{base}}(z(T)) + \int_{0}^{T} \mathrm{Tr}\left( \frac{\partial f}{\partial z}(z(t), t; \theta) \right) dt$$

where $p_{\mathrm{base}}$ is the base distribution, usually a standard normal. The invertibility and differentiability of the flow are ensured by the ODE formulation (Du et al., 2022, Kim et al., 2020, Onken et al., 2020).
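
A minimal sketch of this computation is shown below, assuming a small MLP vector field, a fixed-step Euler solver, and an exact Jacobian trace (only practical in low dimensions). The class and function names are illustrative and not taken from the cited papers.

```python
import torch
import torch.nn as nn

class VectorField(nn.Module):
    """f(z, t; theta): maps a batch of states (N, d) and a time t to velocities (N, d)."""
    def __init__(self, d, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d + 1, hidden), nn.Tanh(), nn.Linear(hidden, d))

    def forward(self, z, t):
        t_col = torch.ones_like(z[:, :1]) * t          # append time as an extra input
        return self.net(torch.cat([z, t_col], dim=1))

def exact_trace(field, z, t):
    """Tr(df/dz) computed one output dimension at a time via autograd (exact, O(d) backward passes)."""
    out = field(z, t)
    tr = torch.zeros(z.shape[0])
    for i in range(z.shape[1]):
        grad_i = torch.autograd.grad(out[:, i].sum(), z, create_graph=True)[0]
        tr = tr + grad_i[:, i]                         # diagonal entry for each sample
    return tr, out

def cnf_log_prob(field, x, T=1.0, steps=100):
    """log p(x) = log p_base(z(T)) + int_0^T Tr(df/dz) dt, via fixed-step Euler integration."""
    z = x.detach().requires_grad_(True)                # track grads so the trace is differentiable
    int_trace = torch.zeros(x.shape[0])
    dt = T / steps
    for k in range(steps):
        tr, v = exact_trace(field, z, k * dt)
        int_trace = int_trace + dt * tr                # accumulate the trace integral
        z = z + dt * v                                 # Euler step for the state
    base = torch.distributions.Normal(0.0, 1.0)        # standard normal base distribution
    return base.log_prob(z).sum(dim=1) + int_trace

d = 2
field = VectorField(d)
x = torch.randn(8, d)
print(cnf_log_prob(field, x).shape)                    # torch.Size([8])
```

Training would maximize `cnf_log_prob(field, x).mean()` over batches; adaptive ODE solvers and the adjoint method replace the Euler loop in practice.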

2. Training Objectives and Likelihood Computation

The primary objective is maximum likelihood estimation, where the model is trained to maximize the log-likelihood of data under the learned flow. This is efficiently implemented by augmenting the ODE state with the log-density and integrating both quantities jointly. Stochastic trace estimators, such as the Hutchinson method, reduce the cost of computing the Jacobian trace from $O(d^2)$ to $O(d)$ per evaluation (Kim et al., 2020, Onken et al., 2020).
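
A hedged sketch of the Hutchinson estimator is given below: it estimates $\mathrm{Tr}(\partial f/\partial z)$ with Rademacher probe vectors and one vector–Jacobian product per probe, and can stand in for the exact trace in the Section 1 sketch. The `field`, `z`, `t` arguments follow that sketch and are illustrative assumptions, not an interface from the cited papers.

```python
import torch

def hutchinson_trace(field, z, t, n_samples=1):
    """Unbiased estimate of Tr(df/dz) using vector-Jacobian products (O(d) cost per probe)."""
    if not z.requires_grad:
        z = z.requires_grad_(True)                     # make z a grad-tracking input if needed
    out = field(z, t)
    est = torch.zeros(z.shape[0])
    for _ in range(n_samples):
        eps = torch.randn_like(z).sign()               # Rademacher probe: entries are +1 or -1
        vjp = torch.autograd.grad(out, z, grad_outputs=eps,
                                  create_graph=True, retain_graph=True)[0]
        est = est + (vjp * eps).sum(dim=1)             # eps^T (df/dz) eps
    return est / n_samples
```

Because $\mathbb{E}[\varepsilon \varepsilon^\top] = I$ for Rademacher probes, the estimate is unbiased; averaging over a few probes trades compute for variance.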

Recent advancements introduce regularizations and alternative objectives to address computational bottlenecks:

  • Trajectory Polynomial Regularization penalizes deviation of ODE solutions from low-degree polynomials in time to reduce truncation error and the number of function evaluations (NFE), achieving 20–70% reduction in NFE without loss of accuracy (Huang et al., 2020).
  • Optimal Transport Regularization encourages geodesic (straight-line) flows in latent space by penalizing the kinetic energy of trajectories and enforcing Hamilton–Jacobi–Bellman constraints, thereby improving stability and efficiency (Onken et al., 2020, Vidal et al., 2022).
  • Temporal Optimization treats the ODE integration time horizon as a trainable parameter, enabling the model to minimize NFE by adapting the evolutionary time per batch (Du et al., 2022).

The loss function may include additional penalties or regularization terms for stability or computational efficiency, such as quadratic regularization on the integration time (Du et al., 2022).
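
As an illustration of how such penalties enter the objective, the sketch below adds a kinetic-energy (optimal-transport-style) regularizer to the negative log-likelihood. It reuses the illustrative `exact_trace` helper from the sketch in Section 1, and the weight `lam` is an assumed hyperparameter, not a value from the cited papers.

```python
import torch

def ot_regularized_loss(field, x, T=1.0, steps=100, lam=0.1):
    """Negative log-likelihood plus lam * int_0^T ||f(z(t), t)||^2 dt (fixed-step Euler)."""
    z = x.detach().requires_grad_(True)
    int_trace = torch.zeros(x.shape[0])
    kinetic = torch.zeros(x.shape[0])
    dt = T / steps
    for k in range(steps):
        tr, v = exact_trace(field, z, k * dt)          # helper from the Section 1 sketch
        kinetic = kinetic + dt * (v ** 2).sum(dim=1)   # kinetic energy of the trajectory
        int_trace = int_trace + dt * tr
        z = z + dt * v
    base = torch.distributions.Normal(0.0, 1.0)
    log_px = base.log_prob(z).sum(dim=1) + int_trace
    return (-log_px + lam * kinetic).mean()
```

Penalizing kinetic energy favors short, nearly straight trajectories, which is the mechanism by which the optimal-transport regularizers above reduce solver effort.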

3. Algorithmic and Architectural Innovations

To address the high computational cost and enhance model expressivity or conditional capabilities, several algorithmic innovations are employed:

  • Multi-Resolution CNFs structure flows as a pyramid of conditional models operating at different spatial scales, preserving exact likelihood and enabling efficient high-resolution image modeling with fewer parameters (Voleti et al., 2021).
  • Conditional CNFs and Gating Networks partition latent space into supervised and unsupervised codes, with adaptive ODE solver tolerances learned by small gating policies, significantly reducing conditioning network size and NFEs in tasks like image classification (Nguyen et al., 2019).
  • Energy-Weighted Flow Matching (EWFM) enables CNF training on Boltzmann distributions using only unnormalized energy evaluations, by leveraging importance sampling over flexible proposals and iterative/annealed training (Dern et al., 3 Sep 2025).

Recent works have also demonstrated scalable implementations on Riemannian manifolds via closed-form PDE losses (e.g., Probability Path Divergence), eliminating ODE solves during training while retaining theoretical guarantees (Ben-Hamu et al., 2022, Falorsi, 2021).

4. Theoretical Guarantees and Convergence

Rigorous analysis of CNFs has yielded non-asymptotic convergence guarantees for simulation-free, flow-matching CNF estimators. Under conditions such as bounded support, strong log-concavity, or mixture-of-Gaussian target distributions, the distance between the learned and true distributions can be bounded at an explicit non-asymptotic rate, with explicit error decomposition into discretization, estimation, and stopping terms (Gao et al., 2024). These results depend on Lipschitz regularity of the neural network–parameterized velocity field and cover regression-based estimators for velocity matching.
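
For concreteness, a minimal sketch of a regression-based velocity-matching (flow-matching) objective is given below, using the common linear interpolation path $x_t = (1-t)x_0 + t x_1$ with target velocity $x_1 - x_0$; the network interface and path choice are assumptions and may differ from the constructions analyzed in the cited work.

```python
import torch
import torch.nn as nn

class TimeVelocityNet(nn.Module):
    """Assumed velocity network v(x, t) taking per-sample times of shape (N, 1)."""
    def __init__(self, d, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d + 1, hidden), nn.Tanh(), nn.Linear(hidden, d))

    def forward(self, x, t):
        return self.net(torch.cat([x, t], dim=1))

def flow_matching_loss(v_net, x_data):
    """Regress the model velocity onto the target velocity along straight interpolation paths."""
    x1 = x_data                                # data samples
    x0 = torch.randn_like(x1)                  # base (standard normal) samples
    t = torch.rand(x1.shape[0], 1)             # per-sample times in [0, 1]
    xt = (1.0 - t) * x0 + t * x1               # point on the linear interpolation path
    target = x1 - x0                           # conditional target velocity
    return ((v_net(xt, t) - target) ** 2).sum(dim=1).mean()
```

Because this loss requires no ODE solve, training is simulation-free; sampling and likelihood evaluation still integrate the learned velocity field as in Section 1.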

Convexity in certain regularized CNF optimization subproblems—for example, convexity in integration time for quadratic penalization—ensures stable coordinate-descent schemes (Du et al., 2022). On manifolds, divergence estimators and vector field parameterizations are formulated to maintain measure-theoretic consistency and scalability (Falorsi, 2021).

5. Practical Applications

CNFs have been successfully applied in several domains:

  • Density Estimation and Generative Modeling: Exact likelihood computation and invertible sampling for image, speech, and time-series data (Kim et al., 2020, Onken et al., 2020, Kim et al., 2020, Deng et al., 2020).
  • Efficient Monte Carlo Event Generation: In high-energy physics, CNFs trained via flow matching attain improvements in unweighting efficiency of up to two orders of magnitude over traditional methods, with amortized wall-clock speedups when paired with fast coupling-layer flows (RegFlow) (Bothmann et al., 3 Apr 2026).
  • Speech Synthesis: CNF-based vocoders such as WaveNODE are parameter-efficient and trained by maximum likelihood, obviating the need for teacher networks, with synthesis rates up to 51k samples/sec (Kim et al., 2020).
  • Scientific Sampling: Sampling from high-dimensional unnormalized Boltzmann distributions with competitive sample quality and three orders of magnitude fewer energy evaluations than prior energy-only methods (Dern et al., 3 Sep 2025).
  • Reinforcement Learning: Expressive policy distributions for on-policy algorithms such as PPO are constructed using CNFs, with efficient surrogate objective computation and entropy regularization based on Brownian dynamics (Yang et al., 1 Feb 2026).
  • Modeling Stochastic Processes: Irregular time series and interpolation/extrapolation with closed-form or numerically exact likelihoods through time-indexed CNF warping of Gaussian processes (Deng et al., 2020).
  • Manifold Data Modeling: Scalable CNFs designed for non-Euclidean spaces—e.g., spheres, Lie groups, product manifolds—with unbiased divergence estimators and high effective sample size (Ben-Hamu et al., 2022, Falorsi, 2021).

6. Computational Complexity, Stability, and Limitations

While CNFs offer greater expressivity than discrete flows, they typically require more ODE function evaluations per data point, especially on large or complex datasets. Key strategies for complexity reduction include:

  • Temporal Optimization: Adaptive training of integration time can reduce function evaluations by 30–60% and training time by up to 2× without loss in held-out likelihood (Du et al., 2022); a schematic sketch appears at the end of this section.
  • Polynomial Trajectory Penalization: Enforcing near-polynomial ODE solution paths reduces local truncation error, with empirical NFE reductions up to 71% (Huang et al., 2020).
  • Optimal Transport Regularization: Encourages non-intersecting, nearly straight flows, allowing fewer solver steps and smaller parameter counts (Onken et al., 2020).
  • Manifold-Preserving Algorithms: Vector fields on manifolds may require specialized network architectures or projection steps, but tractable divergence estimation (e.g., stochastic trace estimators) keeps training feasible (Ben-Hamu et al., 2022, Falorsi, 2021).

Limitations of CNFs remain in terms of high-dimensional stiffness, solver-induced computational bottlenecks, and the tendency toward increased NFE as models learn more complex dynamics. While regularizations and architectural advances ameliorate many concerns, careful hyperparameter tuning (e.g., regularization strength, ODE tolerances) may still be necessary and is the subject of recent developments such as JKO iterative schemes for hyperparameter robustness (Vidal et al., 2022).
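
As a schematic illustration of the temporal-optimization strategy listed above, the sketch below makes the horizon $T$ a trainable parameter and adds an assumed quadratic penalty $\beta T^2$; it reuses the illustrative `exact_trace` helper from Section 1, and the exact objective in the cited work may differ.

```python
import torch

log_T = torch.nn.Parameter(torch.tensor(0.0))          # trainable horizon: T = exp(log_T) > 0

def temporal_nll(field, x, steps=100, beta=1e-2):
    """Negative log-likelihood with a trainable integration horizon and a quadratic penalty on T."""
    T = log_T.exp()
    z = x.detach().requires_grad_(True)
    int_trace = torch.zeros(x.shape[0])
    dt = T / steps                                      # step size depends on the learned T
    for k in range(steps):
        tr, v = exact_trace(field, z, k * dt)           # helper from the Section 1 sketch
        int_trace = int_trace + dt * tr
        z = z + dt * v
    base = torch.distributions.Normal(0.0, 1.0)
    nll = -(base.log_prob(z).sum(dim=1) + int_trace).mean()
    return nll + beta * T ** 2                          # quadratic regularization on integration time
```

Gradients with respect to `log_T` flow through the step size, so the model can shrink its horizon (and hence its effective solver work) whenever the data likelihood permits.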

7. Empirical Results and Benchmarks

The empirical performance of CNFs is documented across multiple benchmarks:

| Domain | Task/Metric | CNF Innovation | Empirical Finding | Reference |
|---|---|---|---|---|
| Density Estimation | Test NLL, bits/dim | Temporal Optimization | 1.5–2× faster, within 0.01–0.05 nats of baseline | (Du et al., 2022) |
| Generation | Image BPD, training time | Multi-Resolution CNF | Comparable BPD, faster convergence | (Voleti et al., 2021) |
| HEP Event Gen | Unweighting efficiency | Flow Matching CNF | Up to 184× improvement on a 5-jet process | (Bothmann et al., 3 Apr 2026) |
| Speech Synthesis | Subjective MOS, NFE | WaveNODE | MOS ≈ 3.5, 1/6 the parameters of Glow, double the speed | (Kim et al., 2020) |
| Boltzmann Sampling | NLL, sample quality, energy evals | EWFM/iEWFM/aEWFM | 10³× reduction in energy cost, SOTA quality | (Dern et al., 3 Sep 2025) |
| Manifold Modeling | NLL, KL, ESS on spheres and SO(3) | Manifold CNF, PPD | Outperforms vMF mixtures, scales to high-dimensional manifolds | (Ben-Hamu et al., 2022) |

Interpretation: CNFs, across architectures and regularization methods, exhibit competitive or superior likelihood-based performance with substantial computational gains over prior baselines in image, time-series, manifold, and scientific modeling contexts.


In summary, CNFs form a unifying framework for continuous-time, invertible generative modeling, combining mathematical tractability, training flexibility, and domain adaptability. Ongoing research focuses on further improving computational efficiency, theoretical scaling, and domain-specific architectures, broadening the applicability of CNFs across scientific, engineering, and machine learning domains.
