SurVAE Flows: Unifying VAEs and Normalizing Flows

Updated 27 April 2026

The paper introduces SurVAE Flows as a modular framework that integrates bijective, surjective, and stochastic transformations to generalize VAEs and normalizing flows.
It enables exact likelihood evaluation by composing layers that can alter data dimensionality, unifying techniques like dequantization, augmentation, and symmetrization.
Empirical results show that SurVAE models achieve competitive density estimation and anomaly detection performance while reducing latent dimensionality.

SurVAE Flows are a modular generative modeling framework that generalizes normalizing flows and variational autoencoders (VAEs) by combining bijective, surjective, and stochastic transformations within a single composable architecture. The formalism supports likelihood evaluation (exact for bijective and inference-surjective layers, a variational lower bound otherwise) with layers that may alter the data dimensionality, a property not available in standard normalizing flows. It also unifies previously disparate approaches, such as dequantization, augmentation, and symmetrization, within a single likelihood-based paradigm (Nielsen et al., 2020).

1. Foundational Principles and Mathematical Formalism

SurVAE Flows are constructed as compositions of three transformation types between random vectors $z \in \mathcal{Z}$ and data $x \in \mathcal{X}$:

Bijective (invertible) transformations: $x = f(z)$, $z = f^{-1}(x)$. Exact data density: $\log p_X(x) = \log p_Z(z) + \log |\det \nabla_x f^{-1}(x)|$, with $z = f^{-1}(x)$.
Surjective (dimension-altering) transformations: Either generative surjections (deterministic $z \to x$, stochastic $x \to z$) or inference surjections (deterministic $x \to z$, stochastic or deterministic $z \to x$). For inference surjections with deterministic $z = f^{-1}(x)$ and stochastic $x \sim p(x \mid z)$,

$\log p_X(x) = \log p_Z(z) + \lim_{q(z \mid x) \to \delta(z - f^{-1}(x))} \log \frac{p(x \mid z)}{q(z \mid x)}, \quad z = f^{-1}(x),$

where $p(x \mid z)$ is supported on the fiber $\{x : f^{-1}(x) = z\}$ and $f^{-1}$ denotes the many-to-one inference map rather than a true inverse.

Stochastic layers: $x \sim p(x \mid z)$ with inference $z \sim q(z \mid x)$. The classic (ELBO-based) VAE structure arises as a special case.
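
As a concrete illustration of an inference surjection, the sketch below evaluates an exact log-density for the absolute-value map $z = |x|$ in one dimension. The half-normal base density and the fair sign distribution are assumptions chosen for illustration (they are not taken from the cited papers); with these choices the resulting density on $x$ is the standard normal, which makes the result easy to check.

```python
import math

def half_normal_logpdf(z):
    """Log-density of an assumed base distribution on z >= 0: p_Z(z) = 2 * N(z; 0, 1)."""
    if z < 0:
        return -math.inf
    return math.log(2.0) - 0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

def abs_surjection_logpdf(x, prob_positive=0.5):
    """Exact log p_X(x) for the inference surjection z = |x|."""
    z = abs(x)  # deterministic inference direction x -> z
    # p(x | z) lives on the fiber {-z, +z}: mass prob_positive on +z, the rest on -z.
    sign_prob = prob_positive if x >= 0 else 1.0 - prob_positive
    return half_normal_logpdf(z) + math.log(sign_prob)

def standard_normal_logpdf(x):
    """Reference density used only to check the result."""
    return -0.5 * x * x - 0.5 * math.log(2.0 * math.pi)
```

No variational bound is involved here: the only extra term beyond the base density is the log-probability of the sign that the absolute value discards.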

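SurVAE Flows compute either the exact data log-density or a tractable variational bound, depending on the position and type of stochasticity in the chain (Nielsen et al., 2020, Baggenstoss et al., 2023). The central likelihood identity for a sequence of $L$ transformations, with $z_0 = x$, is

$\log p(x) = \mathbb{E}_{q(z_{1:L} \mid x)}\left[ \log p(z_L) + \sum_{l=1}^{L} \mathcal{V}_l(z_{l-1}, z_l) + \sum_{l=1}^{L} \mathcal{E}_l(z_{l-1}, z_l) \right]$

where $\mathcal{V}_l$ is a tractable likelihood contribution and $\mathcal{E}_l$ is a variational gap whose expectation is non-negative (zero for bijections and inference surjections). Dropping the $\mathcal{E}_l$ terms yields a lower bound, which is tight when every layer is a bijection or an inference surjection.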
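
The additive structure of this identity can be sketched with a two-layer chain in which both layers have zero gap. The layer functions, their parameters, and the half-normal base density below are hypothetical choices for illustration; each layer maps its input toward the latent and returns its likelihood contribution.

```python
import math

def affine_inverse(x, scale=2.0, shift=1.0):
    """Bijection x = scale * z + shift; returns z and log|det dz/dx|."""
    return (x - shift) / scale, -math.log(abs(scale))

def abs_inverse(x):
    """Inference surjection z = |x| with a fair sign; contribution is log p(sign | z)."""
    return abs(x), math.log(0.5)

def half_normal_logpdf(z):
    """Assumed base density on z >= 0: p_Z(z) = 2 * N(z; 0, 1)."""
    return math.log(2.0) - 0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

def flow_logpdf(x, layers, base_logpdf):
    """Sum the per-layer contributions, then add the base log-density of the final latent."""
    z, total = x, 0.0
    for layer in layers:
        z, contribution = layer(z)
        total += contribution
    return base_logpdf(z) + total

layers = [affine_inverse, abs_inverse]
log_px = flow_logpdf(3.0, layers, half_normal_logpdf)
```

With these particular choices the chain reproduces a Gaussian with mean 1 and standard deviation 2, so `log_px` can be compared against the closed-form Gaussian log-density.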

2. Relationship with PDF Projection and Normalizing Flows

The SurVAE framework generalizes the PDF projection theorem of Baggenstoss, which constructs maximum entropy densities on the preimage of lower-dimensional features $z = T(x)$ (Baggenstoss et al., 2023). In PDF projection, the back-projected density $G(x)$ is defined as

$G(x) = \frac{p_{0,x}(x)}{p_{0,z}(T(x))} \, g(T(x))$

where $g(z)$ is the density specified on the features, $p_{0,x}(x)$ is a reference prior and $p_{0,z}(z)$ is its push-forward under $T$.
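
A minimal numerical sketch of PDF projection follows, under assumptions chosen so that every term is available in closed form: a two-dimensional input, the hypothetical feature $T(x) = x_1 + x_2$, a standard normal reference density, and a Gaussian feature density. None of these choices come from the cited paper.

```python
import math

def normal_logpdf(v, mean=0.0, std=1.0):
    return -0.5 * ((v - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2.0 * math.pi)

def feature(x):
    """Hypothetical dimension-reducing feature T(x) = x1 + x2."""
    return x[0] + x[1]

def reference_logpdf(x):
    """Reference density on x: standard normal in two dimensions."""
    return normal_logpdf(x[0]) + normal_logpdf(x[1])

def reference_feature_logpdf(z):
    """Push-forward of the reference through T: the sum of two N(0, 1) variables is N(0, 2)."""
    return normal_logpdf(z, 0.0, math.sqrt(2.0))

def target_feature_logpdf(z):
    """Assumed density specified on the feature: N(1, 0.5^2)."""
    return normal_logpdf(z, 1.0, 0.5)

def projected_logpdf(x, feature_logpdf):
    """Back-projected log-density: reference(x) - pushforward(T(x)) + feature density(T(x))."""
    z = feature(x)
    return reference_logpdf(x) - reference_feature_logpdf(z) + feature_logpdf(z)
```

Two sanity checks are natural here: choosing the feature density equal to the push-forward returns the reference density unchanged, and the projected density integrates to one over the input space.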

Normalizing flows correspond to the case where the feature map $z = T(x)$ is invertible and dimension-preserving. SurVAE Flows extend this to deterministic surjections $z = T(x)$ (dimension-reducing or more general maps), specifying the fiber density $p(x \mid z)$ explicitly, rather than deriving it from a reference prior.

Layerwise, the SurVAE approach allows chaining surjective, bijective, and stochastic blocks, with the exact log-likelihood accruing additively as the sum of log-Jacobian determinants or fiber log-densities (Baggenstoss et al., 2023).

3. SurVAE Layers: Archetypes and Compositionality

SurVAE Flows formalize a range of canonical layers, including:

Layer Type	Forward (Gen.)	Inverse (Inf.)	Likelihood Contribution
Bijection	$x = f(z)$	$z = f^{-1}(x)$	$\log |\det \nabla_x f^{-1}(x)|$
Inference Surjection	$x \sim p(x \mid z)$	$z = f^{-1}(x)$	$\log \frac{p(x \mid z)}{q(z \mid x)}$ as $q(z \mid x) \to \delta(z - f^{-1}(x))$ (exact)
Generative Surjection	$x = f(z)$	$z \sim q(z \mid x)$	$\log \frac{p(x \mid z)}{q(z \mid x)}$ as $p(x \mid z) \to \delta(x - f(z))$ (ELBO contribution)
Stochastic	$x \sim p(x \mid z)$	$z \sim q(z \mid x)$	ELBO; see (Nielsen et al., 2020)

Key instantiations:

Dequantization/Rounding: $x = \lfloor z \rfloor$, $q(z \mid x)$ uniform or flow-based on $[x, x+1)^D$.
Slicing: drop dimensions or slice off variables (e.g., as in multi-scale RealNVP); log-likelihood includes auxiliary/reconstruction density.
Absolute Value: $z = |x|$, with $p(x \mid z)$ defined on the fiber $\{-z, +z\}$; the log-likelihood involves a fiber mass term $\log p(s \mid z)$ for the sign $s = \mathrm{sign}(x)$.
Sorting: $z = \mathrm{sort}(x)$; likelihood includes a term for the permutation index.

This architecture enables seamless composition: bijective and inference-surjective layers preserve an exact likelihood, while only stochastic or generative-surjective layers introduce a variational bound gap (Nielsen et al., 2020).
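
The bound gap introduced by a generative surjection can be made concrete with rounding, $x = \lfloor z \rfloor$, in one dimension. The sketch below assumes a Gaussian base density (a hypothetical choice) so that the exact probability of each integer is available through the error function, and compares it with the bound obtained from uniform dequantization. A midpoint rule stands in for Monte Carlo sampling of the dequantization noise so that the result is deterministic.

```python
import math

MEAN, STD = 2.0, 1.5  # assumed continuous base density p_Z = N(2, 1.5^2)

def normal_logpdf(v, mean, std):
    return -0.5 * ((v - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2.0 * math.pi)

def normal_cdf(v, mean, std):
    return 0.5 * (1.0 + math.erf((v - mean) / (std * math.sqrt(2.0))))

def exact_log_prob(x):
    """log P(x): the mass p_Z assigns to [x, x + 1), i.e. the set of z that round down to x."""
    return math.log(normal_cdf(x + 1, MEAN, STD) - normal_cdf(x, MEAN, STD))

def uniform_dequant_elbo(x, num_points=1000):
    """Lower bound with q(z | x) = Uniform[x, x + 1): E_q[log p_Z(z)], since log q(z | x) = 0."""
    total = 0.0
    for k in range(num_points):
        u = (k + 0.5) / num_points  # midpoint rule in place of random noise samples
        total += normal_logpdf(x + u, MEAN, STD)
    return total / num_points
```

For every integer the bound sits below the exact log-probability, and the gap shrinks where the base density is nearly flat across the unit interval. A more flexible, flow-based $q(z \mid x)$ would be the usual way to tighten it.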

4. Dimension-Altering Flows and Funnel Layers

Dimension reduction or expansion is achieved by inference surjection layers. The "funnel" layer is a canonical SurVAE block that reduces dimension with exact likelihoods (Klein et al., 2021). Typical funnel constructions include:

Convolutional funnel: Partition $x = (x_1, x_2)$; perform local linear/diffeomorphic mapping to $z$ (reduced dimension), and model the residual slice $x_2$ with a reconstruction density $p(x_2 \mid z)$.
MLP funnel: Linear/triangular mappings split into an invertible sub-block and a remainder; the Jacobian of the invertible sub-block is tractable.
Generalized surjective mapping: Any layer with a tractable right-inverse and log-determinant supports SurVAE composition.
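
A funnel-style layer can be sketched in its smallest form: a two-dimensional input $(x_1, x_2)$ reduced to a one-dimensional latent by a linear map that is invertible in $x_1$, with a Gaussian reconstruction density for the dropped coordinate $x_2$. All coefficients and the Gaussian choices below are hypothetical; this is an illustration of the likelihood bookkeeping, not the architecture used by Klein et al.

```python
import math

def normal_logpdf(v, mean=0.0, std=1.0):
    return -0.5 * ((v - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2.0 * math.pi)

A, B = 2.0, -0.5  # hypothetical linear funnel z = A * x1 + B * x2, requires A != 0
C, S = 0.3, 0.8   # hypothetical reconstruction density p(x2 | z) = N(C * z, S^2)

def funnel_logpdf(x1, x2):
    """Exact log p(x1, x2) = log p_Z(z) + log|dz/dx1| + log p(x2 | z)."""
    z = A * x1 + B * x2                      # two inputs reduced to one latent
    log_det = math.log(abs(A))               # Jacobian of z with respect to the kept slice x1
    log_recon = normal_logpdf(x2, C * z, S)  # density of the dropped slice given the latent
    return normal_logpdf(z) + log_det + log_recon
```

Because $(x_1, x_2) \mapsto (z, x_2)$ is a bijection with Jacobian determinant $A$, the expression is a properly normalized density on the full input space even though the latent has half the dimension.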

Empirically, funnel-equipped SurVAE Flows attain generative and anomaly detection performance close to or exceeding conventional flows, but with dramatically reduced latent space dimensionality. For instance, an F-NSF with a reduced latent dimensionality on CIFAR-10 achieved bits-per-dim of 1.71 versus 1.70 for a standard NSF, while significantly improving out-of-distribution detection scores (Klein et al., 2021).

5. Training and Computational Pipeline

The training procedure for SurVAE Flows parallels that of normalizing flows, except that each module computes either a log-Jacobian or a fiber/slice log-density: $\log p_X(x) = \log p_Z(z) + \mathcal{V}(x, z)$, where $\mathcal{V}(x, z)$ is the layer's log-Jacobian or fiber/slice log-density term. For cascaded architectures, layerwise contributions are summed, with gradients flowing through all blocks (bijective, surjective, stochastic) (Baggenstoss et al., 2023). For deterministic surjections satisfying the right-inverse condition, the resulting model admits exact maximum likelihood training without a variational bound or looseness term.
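
The maximum likelihood loop can be sketched on a toy one-dimensional model: an affine bijection followed by an absolute-value inference surjection and a half-normal base density. The dataset, learning rate, and step count are made up for illustration, and central finite differences stand in for the automatic differentiation a real implementation would use.

```python
import math

DATA = [0.5, 1.2, 2.9, 3.1, 1.8, 2.2]  # toy 1-D dataset, made up for illustration

def log_likelihood(x, params):
    """Exact log p(x): affine bijection, then abs-value surjection, then half-normal base."""
    log_scale, shift = params
    z = (x - shift) / math.exp(log_scale)  # inference direction of the affine bijection
    log_det = -log_scale                   # log|dz/dx| of the bijection
    z_abs = abs(z)                         # inference surjection z -> |z|
    log_sign = math.log(0.5)               # fair sign model on the fiber {-z, +z}
    log_base = math.log(2.0) - 0.5 * z_abs ** 2 - 0.5 * math.log(2.0 * math.pi)
    return log_base + log_det + log_sign

def nll(params):
    """Average negative log-likelihood over the dataset."""
    return -sum(log_likelihood(x, params) for x in DATA) / len(DATA)

def finite_difference_grad(fn, params, eps=1e-5):
    """Central-difference gradient; a stand-in for autodiff in this sketch."""
    grad = []
    for i in range(len(params)):
        up, down = list(params), list(params)
        up[i] += eps
        down[i] -= eps
        grad.append((fn(up) - fn(down)) / (2.0 * eps))
    return grad

def train(params, steps=1000, lr=0.1):
    """Plain gradient descent on the exact negative log-likelihood."""
    params = list(params)
    for _ in range(steps):
        grad = finite_difference_grad(nll, params)
        params = [p - lr * g for p, g in zip(params, grad)]
    return params

initial = [0.0, 0.0]
fitted = train(initial)
```

Since this particular chain is equivalent to a Gaussian, the fitted shift and scale should approach the sample mean and standard deviation of the data, which gives a simple check that the summed contributions form a valid training objective.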

Key hyperparameters include partitioning fraction (percentage of dimensions reduced per block), the expressivity of the reconstruction density $p(x_2 \mid z)$ over the dropped dimensions $x_2$, and regularization (e.g., orthogonality of Jacobian sub-blocks) (Klein et al., 2021).

6. Applications and Theoretical Impact

SurVAE Flows provide a systematic foundation for several previously ad hoc model extensions:

Dequantization (e.g., Flow++): Unified as variational or uniform surjective mappings (Nielsen et al., 2020).
Augmented flows/ANF: Expressed as generative surjections plus slicing.
Symmetrization/Sorting: Surjective layers that fold data onto a representative of each symmetry orbit.
Funnel-based compression: State-of-the-art density modeling with reduced representation complexity (Klein et al., 2021).
Sampling in discrete/combinatorial spaces: SurVAE Flow–augmented MCMC improves mixing and effective sample size by smooth transport into continuous latent spaces, then mapping back to discrete states, as demonstrated on Ising and quantized logistic regression benchmarks (Jaini et al., 2021).

The SurVAE Flow formalism thus subsumes and extends standard NFs and VAEs, affording new flexibility in architecture design, expressivity, and tractable inference in both continuous and discrete support domains.

7. Limitations and Open Challenges

While SurVAE Flows considerably expand the space of parameterizable likelihood-based generative models, challenging aspects remain:

Defining tractable, expressive conditional densities $p(x \mid z)$ for highly complex or nonlinear surjective maps remains nontrivial; traditional PDF projection techniques offer inversion tools not yet fully exploited in SurVAE flows (Baggenstoss et al., 2023).
The practical stability and expressiveness of very deep surjective chains have yet to be fully characterized for large-scale, real-world tasks.
For stochastic/generative-surjection layers, a variational gap is inevitable, mirroring conventional VAE limitations.
Large-scale empirical benchmarks have yet to systematically survey SurVAE Flows versus deep flows, VAEs, and other likelihood-based models on diverse modalities (Baggenstoss et al., 2023).

A plausible implication is that continued advances in fiber and slice density estimation, and connections with classical PDF projection, may further improve these architectures' robustness and tractability.

Key literature: "SurVAE Flows: Surjections to Bridge the Gap between VAEs and Flows" (Nielsen et al., 2020), "A Comparison of PDF Projection with Normalizing Flows and SurVAE" (Baggenstoss et al., 2023), "Funnels: Exact maximum likelihood with dimensionality reduction" (Klein et al., 2021), "Sampling in Combinatorial Spaces with SurVAE Flow Augmented MCMC" (Jaini et al., 2021).