Normalizing Flows in Generative Modeling

Updated 27 April 2026
  • Normalizing Flows are deep generative models that use invertible, differentiable transformations to convert simple base distributions into complex, high-dimensional data densities.
  • They enable exact log-likelihood evaluation and efficient bidirectional sampling, making them vital for density estimation and probabilistic inference.
  • They employ diverse architectures—such as coupling, autoregressive, and residual flows—that balance computational efficiency with powerful, expressive modeling capabilities.

Normalizing Flows (NF) are a class of deep generative models that parameterize flexible probability distributions via a sequence of invertible, differentiable transformations. They stand out for providing exact log-likelihood evaluation and efficient sampling in both directions between data and latent spaces, underpinning their use in density estimation, generative modeling, probabilistic inference, and scientific computing.

1. Mathematical Definition and Core Principle

At the foundation of a normalizing flow lies the change-of-variables formula for probability density transformation. Let $x \in \mathbb{R}^D$ denote a data point and $z \in \mathbb{R}^D$ a latent variable (often assumed to have a simple base density, usually $\mathcal{N}(0, I)$). A normalizing flow defines a bijective mapping $f: \mathbb{R}^D \to \mathbb{R}^D$ such that

$$z = f(x), \qquad x = f^{-1}(z)$$

With invertibility, the induced density on $x$ is

$$p_X(x) = p_Z(f(x)) \cdot \left| \det \frac{\partial f(x)}{\partial x} \right|$$

or, equivalently, in terms of log-likelihood,

$$\log p_X(x) = \log p_Z(f(x)) + \log \left| \det \frac{\partial f(x)}{\partial x} \right|$$

Modeling is performed by composing $K$ simple invertible layers, each supporting tractable inverses and Jacobian determinants, yielding a flexible, highly expressive family of diffeomorphisms for density modeling (Papamakarios et al., 2019, Kobyzev et al., 2019, Zhai et al., 2024).
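To make the change-of-variables computation concrete, here is a minimal numerical sketch, assuming an elementwise affine map $f(x) = a \odot x + b$ and a standard Gaussian base; the function names are illustrative, not from any cited work.

```python
import numpy as np

def base_logpdf(z):
    """Log-density of the standard Gaussian base N(0, I)."""
    return -0.5 * np.sum(z**2 + np.log(2 * np.pi), axis=-1)

def flow_logpdf(x, a, b):
    """log p_X(x) = log p_Z(f(x)) + log|det df/dx| for f(x) = a*x + b (a != 0)."""
    z = a * x + b                          # forward map z = f(x)
    log_det = np.sum(np.log(np.abs(a)))    # diagonal Jacobian: sum_i log|a_i|
    return base_logpdf(z) + log_det

# For a = 2, b = -1 in 1D, this matches the exact Gaussian log-density of
# x = (z - b) / a with z ~ N(0, 1), i.e. N(x; 0.5, 0.25).
print(flow_logpdf(np.array([[0.3]]), np.array([2.0]), np.array([-1.0])))
```

In a real flow, $f$ is a composition of learned invertible layers and the log-determinant contributions of the layers are summed.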

2. Flow Architectures: Coupling, Autoregressive, Residual, ODE-based

2.1 Coupling Layers

Coupling layers [RealNVP, Glow] split the variables into two groups, transforming one group conditioned on the other:

$$\begin{cases} y_A = x_A \\ y_B = x_B \odot \exp\big(s(x_A)\big) + t(x_A) \end{cases}$$

The scale and translation functions $s(\cdot)$ and $t(\cdot)$ are typically realized as neural networks. The Jacobian is block-triangular, yielding efficient calculation of the determinant, $\det \frac{\partial y}{\partial x} = \prod_i \exp\big(s(x_A)\big)_i$ (Papamakarios et al., 2019, Kobyzev et al., 2019).
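A hedged PyTorch sketch of a single affine coupling layer in this style follows; the split sizes and the conditioner network are illustrative choices, not the exact RealNVP/Glow architecture.

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """One affine coupling layer: y_A = x_A, y_B = x_B * exp(s(x_A)) + t(x_A)."""

    def __init__(self, dim, hidden=64):
        super().__init__()
        self.d_a = dim // 2  # size of the untouched block x_A
        self.net = nn.Sequential(
            nn.Linear(self.d_a, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.d_a)),
        )

    def forward(self, x):
        x_a, x_b = x[:, :self.d_a], x[:, self.d_a:]
        s, t = self.net(x_a).chunk(2, dim=-1)   # scale and translation from x_A
        y_b = x_b * torch.exp(s) + t
        log_det = s.sum(dim=-1)                 # block-triangular Jacobian: sum_i s_i
        return torch.cat([x_a, y_b], dim=-1), log_det

    def inverse(self, y):
        y_a, y_b = y[:, :self.d_a], y[:, self.d_a:]
        s, t = self.net(y_a).chunk(2, dim=-1)
        x_b = (y_b - t) * torch.exp(-s)         # exact inverse, no iteration needed
        return torch.cat([y_a, x_b], dim=-1)
```

Because $y_A = x_A$ passes through unchanged, coupling layers are interleaved with permutations so that every variable is eventually transformed.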

2.2 Autoregressive Layers

Autoregressive flows (MAF, IAF, NAF) leverage triangular map structures:

$$y_i = \tau\big(x_i;\, c_i(x_{<i})\big), \qquad i = 1, \dots, D$$

Here, $\tau$ is an invertible univariate function whose parameters are conditioned on the preceding variables (Papamakarios et al., 2019, Kobyzev et al., 2019). The Jacobian is triangular, allowing

$$\det \frac{\partial y}{\partial x} = \prod_{i=1}^{D} \frac{\partial y_i}{\partial x_i}$$

Sampling is sequential in MAF but parallel in IAF, while likelihood computation is parallel in MAF and sequential in IAF.
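The trade-off between the two directions can be seen in a small illustrative sketch of an affine autoregressive layer; the per-dimension linear conditioners stand in for the masked networks (e.g., MADE) used in practice, and all names below are illustrative.

```python
import torch
import torch.nn as nn

class AffineAutoregressive(nn.Module):
    """y_i = x_i * exp(s_i) + t_i with (s_i, t_i) depending only on x_{<i}."""

    def __init__(self, dim):
        super().__init__()
        self.dim = dim
        # One tiny linear conditioner per dimension, taking x_{<i} as input.
        self.cond = nn.ModuleList([nn.Linear(max(i, 1), 2) for i in range(dim)])

    def _params(self, x_prev, i):
        inp = x_prev if i > 0 else torch.zeros(x_prev.shape[0], 1)
        s, t = self.cond[i](inp).unbind(dim=-1)
        return s, t

    def forward(self, x):
        """Density direction: every y_i depends only on the observed x, so all
        dimensions can be computed in parallel (written as a loop for clarity)."""
        ys, log_det = [], 0.0
        for i in range(self.dim):
            s, t = self._params(x[:, :i], i)
            ys.append(x[:, i] * torch.exp(s) + t)
            log_det = log_det + s               # triangular Jacobian: sum_i s_i
        return torch.stack(ys, dim=-1), log_det

    def inverse(self, y):
        """Sampling direction: inherently sequential, since x_i needs x_{<i}."""
        xs = []
        for i in range(self.dim):
            x_prev = torch.stack(xs, dim=-1) if i > 0 else y[:, :0]
            s, t = self._params(x_prev, i)
            xs.append((y[:, i] - t) * torch.exp(-s))
        return torch.stack(xs, dim=-1)
```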

2.3 Neural Spline and Piecewise-Bijective Flows

Expressivity can be increased by replacing affine maps with monotonic bijective splines (Neural Spline Flows) or mixture-based, piecewise-bijective transformations (Flow++, RAD).
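As an indication of how such bijections work, here is a minimal sketch of the simplest relative, a monotone piecewise-linear map on $[0, 1]$; the rational-quadratic splines of Neural Spline Flows follow the same pattern with smoother bins, and all names below are illustrative.

```python
import torch
import torch.nn.functional as F

def piecewise_linear(x, width_logits, height_logits):
    """Monotone piecewise-linear bijection on [0, 1].

    x: values in [0, 1], shape (batch,)
    width_logits, height_logits: unnormalized bin parameters, shape (batch, K)
    Softmax makes widths/heights positive and sum to one, so the map is
    strictly increasing; the Jacobian is the slope of the active bin.
    """
    widths = F.softmax(width_logits, dim=-1)
    heights = F.softmax(height_logits, dim=-1)
    x_edges = F.pad(widths.cumsum(-1), (1, 0))   # knots on the input axis
    y_edges = F.pad(heights.cumsum(-1), (1, 0))  # knots on the output axis

    # Find the bin containing each x.
    idx = torch.searchsorted(x_edges, x.unsqueeze(-1)).clamp(1, widths.shape[-1]) - 1
    w = widths.gather(-1, idx).squeeze(-1)
    h = heights.gather(-1, idx).squeeze(-1)
    x0 = x_edges.gather(-1, idx).squeeze(-1)
    y0 = y_edges.gather(-1, idx).squeeze(-1)

    slope = h / w
    y = y0 + slope * (x - x0)
    log_det = torch.log(slope)                   # per-element log|dy/dx|
    return y, log_det
```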

2.4 Residual and Continuous Flows

Residual Flows and continuous normalizing flows (CNFs, e.g., FFJORD, Deep Diffeomorphic Normalizing Flows) represent transformations as discretized (or fully continuous) flows generated by neural ODEs:

$$\frac{dz(t)}{dt} = g_\theta\big(z(t), t\big)$$

Jacobian log-determinants are computed by integrating the trace of the Jacobian of $g_\theta$ along the trajectory, leveraging properties of the Lie group of diffeomorphisms (Salman et al., 2018, Vidal et al., 2022).
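The trace term is usually estimated stochastically. A brief sketch of the Hutchinson estimator used in FFJORD-style models, assuming an arbitrary vector field `g` and PyTorch autograd (the linear test case is purely illustrative):

```python
import torch

def hutchinson_trace(g, z, num_samples=1):
    """Unbiased estimate of tr(dg/dz) per batch element, via E_v[v^T (dg/dz) v]."""
    z = z.requires_grad_(True)
    out = g(z)
    est = torch.zeros(z.shape[0])
    for _ in range(num_samples):
        v = torch.randn_like(z)  # Gaussian probe vectors; Rademacher also works
        # Vector-Jacobian product gives v^T (dg/dz); dotting with v gives v^T J v.
        vjp = torch.autograd.grad(out, z, grad_outputs=v, retain_graph=True)[0]
        est = est + (vjp * v).sum(dim=-1)
    return est / num_samples

# Check against a linear field g(z) = z A^T, whose Jacobian is A everywhere.
A = torch.randn(3, 3)
z = torch.randn(8, 3)
print(hutchinson_trace(lambda z: z @ A.T, z, num_samples=256).mean(), A.trace())
```

In a CNF, this estimate of the trace is integrated along the ODE trajectory together with $z(t)$ to obtain the change in log-density.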

3. Training Objectives and Learning Procedures

Maximum likelihood estimation (MLE) is canonical for NFs: maximize

$$\mathcal{L}(\theta) = \sum_{n=1}^{N} \log p_X(x_n; \theta) = \sum_{n=1}^{N} \left[ \log p_Z\big(f_\theta(x_n)\big) + \log \left| \det \frac{\partial f_\theta(x_n)}{\partial x_n} \right| \right]$$

For all invertible architectures above, this objective is amenable to unbiased stochastic gradient descent using automatic differentiation (Papamakarios et al., 2019, Kobyzev et al., 2019). For continuous-flow models, differentiability through black-box ODE solvers and efficient estimation of trace terms (via Hutchinson’s estimator) are used (Salman et al., 2018).
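A schematic training loop under this objective might look as follows; `flow` is assumed to be any invertible module returning $(z, \log|\det J|)$, and the data loader is hypothetical.

```python
import torch

def standard_normal_logprob(z):
    """Log-density of N(0, I), summed over dimensions."""
    return (-0.5 * (z**2 + torch.log(torch.tensor(2 * torch.pi)))).sum(dim=-1)

def train(flow, data_loader, num_epochs=10, lr=1e-3):
    opt = torch.optim.Adam(flow.parameters(), lr=lr)
    for _ in range(num_epochs):
        for x in data_loader:
            z, log_det = flow(x)                            # z = f_theta(x)
            log_px = standard_normal_logprob(z) + log_det   # change of variables
            loss = -log_px.mean()                           # negative log-likelihood
            opt.zero_grad()
            loss.backward()
            opt.step()
    return flow
```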

In variational inference, flows augment variational posteriors, most commonly as flexible families for the approximate posterior $q(z \mid x)$ within VAEs (Dong et al., 2022). More recent developments include hybrid loss functions involving optimal transport costs and Wasserstein gradient flows that bridge NFs and statistical physics (Vidal et al., 2022, Morel et al., 2022).
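For reference, in the standard flow-augmented posterior construction (notation here is generic rather than taken from the cited works), a length-$K$ flow $z_k = f_k(z_{k-1})$ applied to $z_0 \sim q_0(z_0 \mid x)$ yields

$$\log q_K(z_K \mid x) = \log q_0(z_0 \mid x) - \sum_{k=1}^{K} \log \left| \det \frac{\partial f_k(z_{k-1})}{\partial z_{k-1}} \right|,$$

and this density replaces the simple posterior in the usual ELBO.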

4. Expressivity, Universality, and Theoretical Properties

A key theoretical property of NFs is universality: with sufficient depth, triangular (autoregressive) flows or interleaved coupling flows are capable of approximating any target density that is positive and smooth a.e. (Papamakarios et al., 2019, Kobyzev et al., 2019). In practice, the representational power dramatically increases once every variable has mixed with all others—e.g., after at least three coupling layers, all marginals and conditionals can be represented nonlinearly (Wehenkel et al., 2020).

However, there are limitations:

  • Purely affine (linear) transformations, regardless of depth, cannot model independent non-Gaussian marginals (Wehenkel et al., 2020).
  • Standard flows may struggle to model distributions on lower-dimensional manifolds embedded in high-dimensional spaces due to the invertibility constraint; this is often addressed via noise injection or by learning flows on the data manifold (Postels et al., 2022).

Regularization and architectural details (e.g., Jacobian-norm penalty, Tikhonov regularization) help address failures on low-intrinsic-dimensionality data and stabilize training (Feinman et al., 2019).

5. Practical Considerations: Design, Implementation, and Empirical Regimes

5.1 Key Implementation Patterns

Efficient log-determinant calculation and invertibility constraints drive flow design:

  • Block- or strictly lower-triangular Jacobian matrices (as in coupling/AR layers) enable $O(D)$ determinant computation.
  • Learnable permutations (e.g., invertible $1 \times 1$ convolutions in Glow) ensure that all variables interact over multiple layers (Kobyzev et al., 2019, Papamakarios et al., 2019); a minimal sketch of the $1 \times 1$ convolution follows after this list.
  • For continuous flows, neural ODEs and their discretizations are standard; in practice, depth, step size, and regularization balance expressivity and computational cost (Salman et al., 2018, Vidal et al., 2022).
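As an example of the second point above, here is a hedged minimal sketch of a Glow-style invertible $1 \times 1$ convolution; the initialization and class name are illustrative, and the key fact used is that $\log|\det J| = h \cdot w \cdot \log|\det W|$ for an input of spatial size $h \times w$.

```python
import torch

class Invertible1x1Conv(torch.nn.Module):
    """Applies the same invertible channel-mixing matrix W at every pixel."""

    def __init__(self, num_channels):
        super().__init__()
        # Start from a random rotation so W is invertible at initialization.
        w, _ = torch.linalg.qr(torch.randn(num_channels, num_channels))
        self.W = torch.nn.Parameter(w)

    def forward(self, x):
        b, c, h, w = x.shape
        y = torch.einsum("ij,bjhw->bihw", self.W, x)   # channel mixing per pixel
        log_det = h * w * torch.slogdet(self.W)[1]     # shared by every sample
        return y, log_det

    def inverse(self, y):
        return torch.einsum("ij,bjhw->bihw", torch.inverse(self.W), y)
```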

5.2 Software and Modularity

Packages such as normflows (Stimper et al., 2023) implement all standard architectures (RealNVP, Glow, MAF, Neural Spline Flows, Residual Flows) as composable invertible layer stacks, supporting both research prototyping and application in larger probabilistic machine learning systems.
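For orientation, the following sketch mirrors the composable-stack pattern from the package's documentation; the class names are taken from its published examples and may differ between versions.

```python
import normflows as nf

# RealNVP-style flow on 2D data, built as a stack of invertible layers.
base = nf.distributions.base.DiagGaussian(2)

flows = []
for _ in range(16):
    # MLP conditioner producing scale/shift for the coupled coordinate.
    param_map = nf.nets.MLP([1, 64, 64, 2], init_zeros=True)
    flows.append(nf.flows.AffineCouplingBlock(param_map))
    flows.append(nf.flows.Permute(2, mode="swap"))  # mix the two coordinates

model = nf.NormalizingFlow(base, flows)

# Training step (x is a batch of data): the forward KL objective equals the
# negative log-likelihood under the change-of-variables formula.
# loss = model.forward_kld(x)
# loss.backward()
```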

5.3 Empirical Performance and Benchmarking

Flow models have achieved state-of-the-art log-likelihoods on image and tabular density-estimation benchmarks as well as in scientific domains (e.g., turbulence closure (Bezgin et al., 2021), Diagrammatic Monte Carlo (Leoni et al., 2024)). Scaling to high-dimensional problems remains challenging; autoregressive rational-quadratic spline flows demonstrate superior robustness on multimodal and correlated densities in high dimensions (Reyes-Gonzalez et al., 2022).

Recent advances have applied transformer-based architectures for flows (TARFlow, iTARFlow) to close the gap with diffusion and GAN-based models in image synthesis (Zhai et al., 2024, Chen et al., 21 Apr 2026, Chen et al., 27 Nov 2025). Incorporating multi-scale architectures, entropy-driven weighting/shuffling, and feature-dependent splitting improves memory efficiency and expressivity (Chen et al., 2024).

6. Extensions and Frontiers

6.1 Beyond Basic Flows: Mixtures and Latent Variables

Gradient Boosted Normalizing Flows (GBNF) implement mixtures of flows, dramatically increasing flexibility for multimodal tasks without deepening any single component (Giaquinto et al., 2020). Flows with variational latent representations further improve performance on highly multimodal or clustered data by conditioning the flow on a learned discrete/continuous latent variable, optimized via an ELBO objective (Dong et al., 2022).

6.2 Flows in Optimal Transport and Geometry

Optimal transport theory provides a principled lens for analyzing and improving flow maps: given that many invertible maps exist with identical likelihood, Monge maps correspond to the minimal transport cost. Recent algorithms post-process pretrained NFs to produce “OT-efficient” flows by learning a volume-preserving rearrangement of the latent base, regularized to be a geodesic in the diffeomorphism group (Morel et al., 2022). Deep Diffeomorphic Flows construct the flow directly as an integrated velocity field, enabling connections to geodesics and Riemannian geometry in function space (Salman et al., 2018).

6.3 Representation Learning and Alignment

A persistent challenge for NFs is the poor semantic coherence of internal representations under pure likelihood training. Alignment strategies targeting the generative pathway—such as reverse representation alignment (R-REPA)—inject semantic structure by aligning NF features with those from robust vision foundation models, yielding substantial improvements in both sample fidelity and discriminative use of NF features (Chen et al., 27 Nov 2025).

7. Applications, Open Problems, and Future Directions

Normalizing flows are established tools for:

  • Exact density estimation and log-likelihood evaluation
  • Generative modeling and image synthesis
  • Probabilistic and variational inference
  • Scientific computing (e.g., turbulence closure, Diagrammatic Monte Carlo)

Despite strong theoretical foundations and empirical performance, major open challenges include:

  • Efficient scaling to very high-dimensional and manifold-structured data
  • Universal discrete and hybrid flows for non-Euclidean or structured domains
  • Memory and compute bottlenecks in deep or ODE-based flows
  • Generalization bounds and theoretical expressivity quantification
  • Integration with modern inductive biases (transformers, symmetry, geometric priors)
  • Robustness of flows on marginal/conditional likelihoods and out-of-distribution generalization (Kobyzev et al., 2019, Zhai et al., 2024, Reyes-Gonzalez et al., 2022)

Recent work demonstrates that with architectural and algorithmic advances—including flow-transformer hybrids, entropy-based channel operations, and guidance/inference adaptations—normalizing flows are competitive with or surpass other leading generative modeling paradigms, while retaining exact likelihoods and tractable sampling (Zhai et al., 2024, Chen et al., 21 Apr 2026, Chen et al., 27 Nov 2025, Chen et al., 2024).
