
Conditional Normalizing Flows

Updated 28 August 2025
  • Conditional normalizing flows are generative models that use invertible, differentiable mappings to parameterize conditional probability distributions.
  • They leverage techniques such as affine and spline coupling layers to enable tractable likelihood evaluation and efficient sampling when conditioning on complex inputs.
  • They are applied in high-dimensional settings such as super-resolution and climate downscaling, offering robust uncertainty quantification and inference.

Conditional normalizing flows are a class of generative models that parameterize families of probability distributions over outputs, conditioned on auxiliary inputs, using sequences of invertible, differentiable mappings (flows). By jointly leveraging the change-of-variables formula and input-dependent transformations, these models facilitate exact likelihood evaluation and efficient sampling from complex conditional distributions. Recent research has advanced both the architectural foundations and application domains of conditional normalizing flows, demonstrating their effectiveness for high-dimensional structured prediction, inverse problems, statistical downscaling, scientific inference, and uncertainty quantification.

1. Foundational Principles and Formulation

Conditional normalizing flows extend standard normalizing flows by introducing side information ("conditions") into the invertible transformation. Given an auxiliary input $c$ (sometimes denoted $x$), a base random variable $z \sim p_0(z)$, and an output variable $y$, the flow applies an invertible map $y = f_\phi(z; c)$, so that the density over $y$ becomes:

$$p(y \mid c) = p_0\big(f_\phi^{-1}(y;c)\big)\,\left| \det \frac{\partial f_\phi^{-1}(y;c)}{\partial y} \right|$$

This permits the tractable computation of both conditional likelihoods and exact samples. The conditioning can be implemented in a variety of ways:

  • Directly modulating neural network weights or affine parameters by $c$
  • Concatenating $c$ to the inputs of coupling/transform layers
  • Adapting the prior in latent space, $p_0(z \mid c)$

Conditional flows thus provide explicit models for families of distributions $\{p(\cdot \mid c)\}$ and unify conditional generative modeling with invertibility (Winkler et al., 2019, Abdelhamed et al., 2019, Klein et al., 2022).
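
As a concrete illustration of this formulation, the following sketch (PyTorch; the `ConditionalAffineFlow` class and its architecture are illustrative, not drawn from the cited works) maps the condition $c$ to per-dimension scale and shift parameters and evaluates $\log p(y \mid c)$ via the change-of-variables formula, while the forward pass produces exact conditional samples.

```python
# Minimal sketch of a one-layer conditional flow: an MLP maps the condition c
# to element-wise scale/shift parameters, giving exact sampling and exact
# conditional log-likelihoods via the change-of-variables formula.
import torch
import torch.nn as nn

class ConditionalAffineFlow(nn.Module):
    def __init__(self, cond_dim, out_dim, hidden=64):
        super().__init__()
        # Illustrative conditioner network: c -> (log_scale, shift)
        self.net = nn.Sequential(
            nn.Linear(cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * out_dim),
        )

    def _params(self, c):
        log_s, t = self.net(c).chunk(2, dim=-1)
        return log_s, t

    def forward(self, z, c):            # sampling direction: y = f(z; c)
        log_s, t = self._params(c)
        return z * torch.exp(log_s) + t

    def log_prob(self, y, c):           # density direction: z = f^{-1}(y; c)
        log_s, t = self._params(c)
        z = (y - t) * torch.exp(-log_s)
        base = torch.distributions.Normal(0.0, 1.0)
        # log p(y|c) = log p0(z) + log|det d f^{-1}/d y| = log p0(z) - sum(log_s)
        return base.log_prob(z).sum(-1) - log_s.sum(-1)

flow = ConditionalAffineFlow(cond_dim=3, out_dim=2)
c = torch.randn(5, 3)
y = flow(torch.randn(5, 2), c)          # exact samples from p(y | c)
print(flow.log_prob(y, c))              # exact conditional log-likelihoods
```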

2. Architectural and Algorithmic Advances

Several architectural strategies have been developed to parameterize conditional normalizing flows:

  • Affine and Spline Coupling Layers: Conditional affine coupling layers modify the scale-and-shift networks $s, t$ to take both the identity (untransformed) part of the input and the condition as inputs: $y_2 = x_2 \odot \exp(s(x_1, c)) + t(x_1, c)$ (Winkler et al., 2019, Atanov et al., 2019). Neural spline flows generalize this to more expressive conditional nonlinear transformations (Zhao et al., 25 Mar 2025); a minimal coupling-layer sketch appears at the end of this section.
  • Multi-scale and Multi-stage Flows: Decomposition into deep unconditional flows (computationally heavy, class-agnostic) and lightweight conditional flows (label- or condition-dependent), as in the semi-conditional normalizing flow (SCNF) model, enables efficient inference with shared computation across classes (Atanov et al., 2019).
  • Continuous-time Flows: ODE-based flows allow the definition of conditional mappings via parametrized vector fields, facilitating applications to geometric or graph data while handling invariance constraints (Rozenberg et al., 2023).
  • Variational and Amortized Inference: Where exact optimization is impractical, variational distributions over latent variables or amortized inference architectures (e.g., inference networks for missing data) enable scalable conditional density modeling and conditional sampling (Moens et al., 2021, Whang et al., 2020, Cannella et al., 2020).
  • Hybrid and Composite Architectures: Stacking conditional flows (flows-of-flows) facilitates mappings between arbitrary pairs of distributions, with exact maximum likelihood objectives (Klein et al., 2022).

These advances underpin the flexibility of conditional flows in capturing complex multimodal dependencies and facilitate practical deployment in settings where conditioning variables are high-dimensional or structured.
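
To make the coupling-layer construction concrete, the sketch below (a generic `ConditionalCoupling` module with illustrative names, not the implementation of any cited paper) splits the input into halves, feeds the identity half together with the condition to the scale/shift network, and recovers the exact inverse and log-determinant in closed form. Stacking several such layers with alternating partitions yields the deep conditional flows used in practice.

```python
# Sketch of a conditional affine coupling layer (illustrative; assumes an
# even-dimensional input split into halves x1, x2).
import torch
import torch.nn as nn

class ConditionalCoupling(nn.Module):
    def __init__(self, dim, cond_dim, hidden=64):
        super().__init__()
        self.half = dim // 2
        # s and t see the identity half x1 together with the condition c
        self.net = nn.Sequential(
            nn.Linear(self.half + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * self.half),
        )

    def forward(self, x, c):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self.net(torch.cat([x1, c], dim=-1)).chunk(2, dim=-1)
        y2 = x2 * torch.exp(s) + t          # y2 = x2 ⊙ exp(s(x1, c)) + t(x1, c)
        log_det = s.sum(-1)                 # triangular Jacobian: sum of log-scales
        return torch.cat([x1, y2], dim=-1), log_det

    def inverse(self, y, c):
        y1, y2 = y[:, :self.half], y[:, self.half:]
        s, t = self.net(torch.cat([y1, c], dim=-1)).chunk(2, dim=-1)
        x2 = (y2 - t) * torch.exp(-s)       # exact inverse, reusing the same s, t
        return torch.cat([y1, x2], dim=-1), -s.sum(-1)

layer = ConditionalCoupling(dim=4, cond_dim=2)
x, c = torch.randn(8, 4), torch.randn(8, 2)
y, log_det = layer(x, c)
x_rec, _ = layer.inverse(y, c)
print(torch.allclose(x, x_rec, atol=1e-5))  # True: invertibility holds exactly
```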

3. Application Domains and Methods

Conditional normalizing flows have been successfully deployed across a wide spectrum of scientific and engineering domains:

| Application Domain | Conditioning Variable | Output Variable | Key Metrics / Outcomes |
|---|---|---|---|
| Image super-resolution | LR image $x$ | HR image $y$ | Bits/dim, PSNR, SSIM, high-freq fidelity |
| Statistical downscaling | Coarse climate $x$ | Fine climate $y$ | MAE, RMSE, CRPS, calibrated uncertainty |
| Survival analysis | Covariates $X$ | Event time $T$ | Concordance index, customized quantiles |
| Particle physics | Protected attr. $m$ | Discriminant $s$ | Decorrelation, background rejection |
| Scientific inference | Event features $y$ | Energy, direction $x$ | Entropy, KL-divergence diagnostics |
| Mean field control | State, time $(x, t)$ | Controlled state $y$ | Optimality, transport cost |
| Noise modeling | Sensor, gain, signal | Noise patch $n$ | NLL, PSNR/SSIM downstream |
| Inverse problems | Measurement $y^*$ | Source image $x$ | FID, MSE, MMSE, uncertainty bands |

In each domain, conditional flows leverage invertibility for calibrated uncertainty quantification and sampling, enable joint density/training objectives, and support generalized conditioning architectures (Winkler et al., 2019, Winkler et al., 31 May 2024, Glüsenkamp, 2023, Friedman et al., 2022).
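
As a brief illustration of sampling-based uncertainty quantification for a fixed condition, the sketch below uses a placeholder `sample_flow` function standing in for any trained conditional flow's sampling direction $y = f(z; c)$; only the Monte Carlo quantile procedure is the point.

```python
# Sketch: per-condition predictive intervals via Monte Carlo sampling from a
# conditional flow. `sample_flow` is a stand-in for a trained model's y = f(z; c).
import torch

def sample_flow(z, c):
    # Placeholder transform standing in for a trained conditional flow
    return z * (1.0 + c.norm(dim=-1, keepdim=True)) + c.sum(dim=-1, keepdim=True)

c = torch.tensor([[0.3, -1.2]])               # one fixed condition
z = torch.randn(10_000, 1)                    # base samples z ~ p0
y = sample_flow(z, c.expand(10_000, -1))      # draws from p(y | c)
lo, med, hi = torch.quantile(y, torch.tensor([0.05, 0.5, 0.95]))
print(f"90% predictive interval: [{lo:.2f}, {hi:.2f}], median {med:.2f}")
```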

4. Training Objectives and Inference

Training conditional normalizing flows commonly employs maximum likelihood estimation over paired outputs and conditions: $\mathcal{L} = \mathbb{E}_{(y,c) \sim \text{data}}\left[-\log p(y \mid c)\right]$, using the explicit change-of-variables expression for tractable and stable optimization (Winkler et al., 2019).
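
A hedged sketch of the resulting training loop follows (toy architecture and synthetic data, chosen only to make the snippet self-contained): the parameters of the conditioner network are fit by minimizing the average negative conditional log-likelihood with stochastic gradient descent.

```python
# Sketch of maximum-likelihood training for a toy conditional flow; only the
# objective (negative conditional log-likelihood) mirrors the text above.
import torch
import torch.nn as nn

cond_dim, out_dim = 2, 1
conditioner = nn.Sequential(nn.Linear(cond_dim, 32), nn.ReLU(),
                            nn.Linear(32, 2 * out_dim))

def neg_log_likelihood(y, c):
    log_s, t = conditioner(c).chunk(2, dim=-1)
    z = (y - t) * torch.exp(-log_s)                    # f^{-1}(y; c)
    log_p0 = torch.distributions.Normal(0.0, 1.0).log_prob(z).sum(-1)
    return -(log_p0 - log_s.sum(-1)).mean()            # -E[log p(y | c)]

opt = torch.optim.Adam(conditioner.parameters(), lr=1e-2)
for step in range(500):
    c = torch.randn(128, cond_dim)                     # synthetic conditions
    y = c.sum(-1, keepdim=True) + 0.5 * torch.randn(128, out_dim)  # synthetic targets
    loss = neg_log_likelihood(y, c)
    opt.zero_grad(); loss.backward(); opt.step()
print(f"final NLL: {loss.item():.3f}")
```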

For scenarios such as semi-supervised learning, the total objective combines log-likelihood terms for labeled pairs and marginal likelihood (via summation or integration over possible labels) for unlabeled instances:

$$\mathcal{L}_\text{semi-sup} = \sum_{(x_i, y_i) \in \mathcal{L}} \log p(x_i, y_i) + \sum_{x_j \in \mathcal{U}} \log \sum_{y=1}^{K} p(x_j \mid y)\, p(y)$$

(Atanov et al., 2019).
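
The unlabeled term is computed stably with a log-sum-exp over class-conditional log-densities; the sketch below uses a toy Gaussian stand-in for the per-class flow density $p(x \mid y)$ purely to keep the example self-contained.

```python
# Sketch of the semi-supervised objective: exact log-likelihoods for labeled
# pairs plus a marginal log-likelihood over K classes for unlabeled points.
import torch

K = 3
class_means = torch.randn(K, 2)               # stand-in parameters for p(x | y)
log_prior = torch.log(torch.ones(K) / K)      # uniform p(y)

def log_p_x_given_y(x):                       # (N, 2) -> (N, K)
    diff = x.unsqueeze(1) - class_means.unsqueeze(0)
    return -0.5 * (diff ** 2).sum(-1) - torch.log(torch.tensor(2 * torch.pi))

x_lab = torch.randn(4, 2); y_lab = torch.tensor([0, 1, 2, 1])
x_unl = torch.randn(6, 2)

labeled = (log_p_x_given_y(x_lab).gather(1, y_lab[:, None]).squeeze(1)
           + log_prior[y_lab]).sum()
unlabeled = torch.logsumexp(log_p_x_given_y(x_unl) + log_prior, dim=1).sum()
objective = labeled + unlabeled               # maximize, or minimize its negative
print(objective)
```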

Inference tasks such as conditional sampling, distribution mapping, or statistical CDF evaluation use the explicit invertibility. For missing data problems or in complex scientific settings, methods such as projected latent MCMC (Cannella et al., 2020), variational Schur complement sampling (Moens et al., 2021), and boundary-flux–based CDF estimation (Sastry et al., 2022) further extend inference capabilities, often combining exact and approximate approaches.

5. Theoretical and Practical Limitations

Several fundamental challenges and trade-offs are recognized:

  • Depth-Conditioning Trade-off: The expressivity of affine coupling and related flows can be characterized by a universal approximation property if ill-conditioning is permitted, but shallow networks may incur extremely high Jacobian condition numbers, hampering training (Koehler et al., 2020). Sufficient depth is therefore necessary for efficiently capturing complex joint/conditional dependencies; a small numerical sketch of the condition-number diagnostic appears at the end of this section.
  • Computation and Memory: Although conditional flows afford efficient density and sample computation for each condition, model scalability can still be limited by memory and compute (especially in high-dimensional settings); architectural re-use of shared computation (e.g., blockwise decoupling) improves efficiency (Atanov et al., 2019).
  • Training Instabilities: While maximum likelihood–based flows are more stable than adversarial models, choice of parameterization, partitioning, and invertibility constraints can influence gradient propagation and convergence—particularly relevant in deep or multi-modal settings (Koehler et al., 2020).
  • Amortization vs. Instance-specific Inference: Amortized inference (learning mappings from conditions to flow parameters or variational posteriors) speeds up generic tasks but may slightly reduce distributional fidelity compared to instance-specific optimization (Whang et al., 2020).
  • Generalization to Low-dimensional Data Manifolds: When the target distribution is supported on a low-dimensional manifold, standard flows (which are full-dimension invertible maps) can suffer from near-singular Jacobians, requiring careful regularization (Koehler et al., 2020).

Model and architectural choices, such as flow depth, type of coupling layer, and conditioning embedding strategy, are therefore application-specific and influenced by representational and optimization constraints.
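
For concreteness, the ill-conditioning issue noted above can be probed numerically; the sketch below (an illustrative shallow coupling map with random weights, not the construction analyzed in the cited work) evaluates the Jacobian of the map at a point and reports its condition number.

```python
# Numerical sketch of the conditioning diagnostic: the condition number of a
# randomly initialized, shallow coupling map's Jacobian at a sample input.
import torch
import torch.nn as nn
from torch.autograd.functional import jacobian

half = 2
net = nn.Sequential(nn.Linear(half, 32), nn.Tanh(), nn.Linear(32, 2 * half))

def coupling(x):                              # x: (2*half,), a single example
    x1, x2 = x[:half], x[half:]
    s, t = net(x1).chunk(2, dim=-1)
    return torch.cat([x1, x2 * torch.exp(s) + t])

x = torch.randn(2 * half)
J = jacobian(coupling, x)                     # (4, 4) Jacobian of the map
print("condition number:", torch.linalg.cond(J).item())
```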

6. Recent Research Directions and Outlook

Conditional normalizing flows have catalyzed a number of methodological and domain advances:

  • Integrated Variational Conditional Frameworks: Combinations of variational inference and CNFs yield scalable and efficient solvers for high-dimensional control/transport problems, as demonstrated in variational CNFs for mean field control (Zhao et al., 25 Mar 2025).
  • Algorithmic Generalization (Flowification): Broader classes of (not strictly invertible) neural networks can be "flowified" through stochastic inverse passes and explicit likelihood tracking, generalizing the applicability of flows to more architectures and suggesting new directions for conditional generative models (Máté et al., 2022).
  • Domain-specific Model Design: Implementations such as HIGlow for HI map generation (Friedman et al., 2022), IceCube event reconstruction (Glüsenkamp, 2023), climate downscaling (Winkler et al., 31 May 2024), and receptor-aware ligand sampling (Rozenberg et al., 2023) demonstrate the ability of CNFs to absorb auxiliary information, respect geometric or physical invariances, and support efficient parameter inference and uncertainty calibration.
  • Distillation and Efficiency Improvements: Knowledge distillation transfers the solution of complex CNF models to faster non-invertible architectures with minimal performance degradation, greatly improving deployment potential for resource-constrained settings such as real-time synthesis or scientific inference (Baranchuk et al., 2021).
  • Calibration, CDF Evaluation and Risk Analysis: Integrating divergence-theorem–based CDF calculation (Sastry et al., 2022) and entropy/KL-divergence diagnostics allows for robust uncertainty quantification and calibration, crucial in scientific domains and trustworthy AI applications.

Overcoming the limitations of invertibility, scaling to very high-dimensional outputs, and incorporating richer structural or physical priors are major ongoing research directions.

7. Summary Table: Core Modeling, Training, and Inference Aspects

| Aspect | Technical Approach | Citation(s) |
|---|---|---|
| Density Formulation | $p(y \mid c) = p_0\big(f^{-1}(y;c)\big)\left\lvert \det \frac{\partial f^{-1}(y;c)}{\partial y} \right\rvert$ | (Winkler et al., 2019) |
| Conditional Layer | Affine/spline coupling, conditional priors, direct input modulation | (Atanov et al., 2019, Zhao et al., 25 Mar 2025, Abdelhamed et al., 2019, Ausset et al., 2021) |
| Sampling | Sample $z \sim p_0$, apply $y = f(z;c)$ to draw from $p(y \mid c)$ | (Winkler et al., 2019, Winkler et al., 31 May 2024) |
| Marginalization | Efficient sum/integration over labels/classes via decoupled flows | (Atanov et al., 2019, Xiao et al., 2019) |
| Training Objective | Maximum likelihood, variational lower bounds, amortization | (Atanov et al., 2019, Whang et al., 2020, Zhao et al., 25 Mar 2025) |
| Inference Algorithms | MCMC (PL-MCMC), variational Schur complement, boundary-flux CDF | (Cannella et al., 2020, Moens et al., 2021, Sastry et al., 2022) |
| Domains of Application | Super-resolution, survival analysis, particle physics, inverse problems, control, denoising, scientific modeling | (Winkler et al., 2019, Glüsenkamp, 2023, Friedman et al., 2022, Ausset et al., 2021, Xiao et al., 2019, Abdelhamed et al., 2019) |

Conditional normalizing flows provide a theoretically principled and practically versatile framework for modeling, sampling, and inferring complex conditional distributions, unifying efficient invertibility, calibrated uncertainty, and exact likelihood training across a broad range of contemporary scientific and engineering challenges.