Injective Probability Flow Regularized Autoencoder

Updated 2 March 2026

The paper introduces a framework blending injective probability flows with autoencoder regularization, enabling tractable likelihood estimation and high sample quality.
The methodology employs a smooth injective decoder, a left-inverse encoder, and Jacobian-based penalties to maintain reconstruction fidelity and local injectivity.
Empirical results on CelebA, CIFAR-10, and MNIST show improved reconstruction and sample FIDs compared to standard VAEs and β-VAEs.

The Injective Probability Flow Regularized Autoencoder (RAE) is a generative modeling framework that bridges injective probability flow models and regularized autoencoders, providing tractable likelihood estimation and superior sample quality without the architectural and computational constraints of bijective flow-based generative models. Unlike invertible normalizing flows that require dimensionality preservation between latent and data space, this approach only requires injectivity of the generator (decoder), unlocking efficient modeling of high-dimensional data manifolds with lower-dimensional latent spaces and explicit regularization directly tied to probability flow objectives (Kumar et al., 2020).

1. Model Structure and Injective Mapping

The foundational principle of the Injective Probability Flow RAE is the use of a smooth, differentiable, and one-to-one map $g_\theta: Z \to X$, where $Z \subset \mathbb{R}^d$ is the low-dimensional latent space and $X \subset \mathbb{R}^D$ is the high-dimensional data space with $D \gg d$. Injectivity ensures that every $x \in g_\theta(Z)$ has a unique $z$ such that $x = g_\theta(z)$. The encoder $h_\phi : X \to Z$ acts as a left-inverse on the image, that is, $h_\phi(g_\theta(z)) = z$, enabling unique inference for any $x \in g_\theta(Z)$. This construction pushes forward a Gaussian prior $p_z(z) = \mathcal{N}(z; 0, \sigma^2 I)$ through $g_\theta$ to define the induced density $p_x(x)$ on the data manifold.
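The left-inverse relationship can be illustrated with a toy example. The sketch below assumes a hypothetical linear decoder $g(z) = Az$ with a tall, full-column-rank matrix $A$ and takes the pseudoinverse as the encoder; the names `A`, `g`, and `h` are illustrative and are not the networks used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, D = 2, 5  # toy latent and data dimensions, d < D

# Hypothetical linear decoder g(z) = A z; injective iff A has full column rank.
A = rng.normal(size=(D, d))

def g(z):
    return A @ z

# Left-inverse encoder h(x) = A^+ x, with A^+ the Moore-Penrose pseudoinverse.
A_pinv = np.linalg.pinv(A)

def h(x):
    return A_pinv @ x

z = rng.normal(size=d)
x = g(z)      # a point on the image g(Z)
z_rec = h(x)  # recovers z, because h(g(z)) = z on the image

# For a point off the image, g(h(x_off)) is only its projection onto g(Z).
x_off = x + rng.normal(size=D)
x_proj = g(h(x_off))
```

In this toy setting `h` inverts `g` exactly on the image, while off-image points are mapped to a nearby point on the manifold, so `g(h(x_off))` generally differs from `x_off`.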

2. Likelihood Formulation and Relaxed Probability Flow

The probability flow perspective establishes the data probability density via a change-of-variables adapted to injective mappings. For $x = g_\theta(z)$, the pushforward density on $g_\theta(Z)$ is given by

$p_x(x) = \frac{p_z(z)}{\sqrt{\det[J_g(z)^T J_g(z)]}}$

where $J_g(z)$ is the Jacobian of $g_\theta$ at $z$. The log-likelihood can be expressed as

$\ln p_x(x) = \ln p_z(h_\phi(x)) - \frac{1}{2} \ln \det[J_g(h_\phi(x))^T J_g(h_\phi(x))]$
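As a worked instance of the log-likelihood above, the following sketch evaluates it for a hypothetical paraboloid decoder $g(z) = (z_1, z_2, z_1^2 + z_2^2)$, chosen only because its Jacobian is available in closed form and $\det[J_g^T J_g] = 1 + 4\lVert z \rVert^2$. The decoder, the encoder, and the value of `sigma2` are assumptions made for illustration.

```python
import numpy as np

sigma2 = 1.0  # prior variance (assumed value for this toy example)

def g(z):
    # Toy injective decoder from R^2 into R^3 (a paraboloid).
    return np.array([z[0], z[1], z[0] ** 2 + z[1] ** 2])

def h(x):
    # Left inverse of g on its image: read off the first two coordinates.
    return x[:2]

def jacobian_g(z):
    # Analytic 3x2 Jacobian of g.
    return np.array([[1.0, 0.0], [0.0, 1.0], [2.0 * z[0], 2.0 * z[1]]])

def log_prior(z):
    # Log-density of N(0, sigma2 * I) at z.
    d = z.shape[0]
    return -0.5 * (z @ z) / sigma2 - 0.5 * d * np.log(2.0 * np.pi * sigma2)

def log_density_on_manifold(x):
    # ln p_x(x) = ln p_z(h(x)) - 0.5 * ln det(J^T J), evaluated at z = h(x).
    z = h(x)
    J = jacobian_g(z)
    _, logdet = np.linalg.slogdet(J.T @ J)
    return log_prior(z) - 0.5 * logdet

x = g(np.array([0.5, -1.0]))
log_px = log_density_on_manifold(x)
```

For realistic decoders the $d \times d$ Gram matrix $J_g^T J_g$ has to be formed through automatic differentiation at every training example, which is the cost the bounds below are meant to avoid.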

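Computing the log-determinant is expensive. Because the logarithm is concave, Jensen's inequality applied to the eigenvalues of $J_g^T J_g$ bounds the log-determinant from above by $d$ times the log of their mean, and that mean is $\frac{1}{d} \lVert J_g \rVert_F^2$, which is cheap to estimate. The resulting lower bound on the log-likelihood, tight when all singular values of the Jacobian are equal, is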

$\ln p_x(x) \geq \ln p_z(h_\phi(x)) - \frac{d}{2} \ln \left( \frac{1}{d} \lVert J_g(h_\phi(x)) \rVert_F^2 \right)$

and, more loosely, a second lower bound parameterized by a fixed $\mu > 0$, which coincides with the first when $\mu = \frac{1}{d} \lVert J_g(h_\phi(x)) \rVert_F^2$:

$\ln p_x(x) \geq \ln p_z(h_\phi(x)) - \frac{1}{2\mu} \lVert J_g(h_\phi(x)) \rVert_F^2 - \frac{d}{2} \ln \mu + \frac{d}{2}$
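The ordering of the exact Jacobian term and its two bounds can be checked numerically. The sketch below uses a random matrix `J` as a stand-in for $J_g(h_\phi(x))$; the dimensions and the helper `mu_bound` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, D = 4, 10
J = rng.normal(size=(D, d))  # stand-in for the decoder Jacobian at h(x)

_, logdet = np.linalg.slogdet(J.T @ J)
fro2 = np.sum(J ** 2)  # squared Frobenius norm of J

# Exact Jacobian term of the log-likelihood.
exact_term = -0.5 * logdet

# Frobenius-norm lower bound (Jensen's inequality on the eigenvalues of J^T J).
frobenius_bound = -0.5 * d * np.log(fro2 / d)

def mu_bound(mu):
    # Looser bound obtained by linearizing the logarithm at mu.
    return -fro2 / (2.0 * mu) - 0.5 * d * np.log(mu) + 0.5 * d

# The mu-parameterized bound meets the Frobenius bound at mu = ||J||_F^2 / d.
mu_star = fro2 / d
```

Evaluating `mu_bound` at values other than `mu_star` gives strictly smaller numbers, which is why the choice of $\mu$ controls how loose the second bound is.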

3. Autoencoder-Style Objective and Regularization

Translating the likelihood lower bound into a learnable objective yields a regularized autoencoder formulation,

$\mathcal{L}(\theta, \phi; x) = \frac{1}{2\sigma^2} \lVert h_\phi(x) \rVert^2 + \mu \lVert x - g_\theta(h_\phi(x)) \rVert^2 + \frac{d}{2} \ln \left[ \max \left( \lVert J_g(h_\phi(x))v \rVert^2, \eta^2 \lVert v \rVert^2 \right) \right ] + \mu_{in} \left[ \frac{\lVert J_g(h_\phi(x))v \rVert}{\lVert v \rVert} - \eta \right ]_{-}^2$

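with $v \sim \mathcal{N}(0, I_d)$, where $[\cdot]_{-}$ denotes the negative part, so the last term is active only when the ratio falls below $\eta$. This formulation includes: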

Prior-norm penalty: Encourages latent codes to stay close to the Gaussian prior.
Reconstruction loss: Penalizes deviation between the data and its reconstruction.
Jacobian-smoothness loss: Derived from the log-Frobenius bound; favors a smooth $g_\theta$.
Injectivity penalty: Penalizes directions along which the Jacobian's stretch falls below $\eta > 0$, encouraging a lower bound on its smallest singular value and hence local injectivity.

Using a soft penalty for reconstruction, rather than a hard constraint, is critical for stable optimization.
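The four terms of the objective can be sketched for a single example as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names `jvp_fd` and `injflow_loss`, the default weights, and the linear toy decoder are assumptions, and the Jacobian-vector product is approximated by central finite differences.

```python
import numpy as np

def jvp_fd(g, z, v, eps=1e-4):
    """Central finite-difference estimate of the product J_g(z) v."""
    return (g(z + eps * v) - g(z - eps * v)) / (2.0 * eps)

def injflow_loss(x, g, h, v, sigma2=1.0, mu=1.0, mu_in=1.0, eta=0.1):
    """Per-example objective with one random probe vector v (illustrative)."""
    z = h(x)
    d = z.shape[0]
    Jv = jvp_fd(g, z, v)
    # Prior-norm penalty: ||h(x)||^2 / (2 sigma^2).
    prior = (z @ z) / (2.0 * sigma2)
    # Reconstruction loss: mu * ||x - g(h(x))||^2.
    recon = mu * np.sum((x - g(z)) ** 2)
    # Jacobian-smoothness loss, clipped from below at eta^2 ||v||^2.
    smooth = 0.5 * d * np.log(max(Jv @ Jv, eta ** 2 * (v @ v)))
    # Injectivity penalty: negative part of (||Jv|| / ||v|| - eta), squared.
    shortfall = min(np.linalg.norm(Jv) / np.linalg.norm(v) - eta, 0.0)
    inject = mu_in * shortfall ** 2
    return prior + recon + smooth + inject

# Toy usage with a hypothetical linear decoder and its pseudoinverse encoder.
rng = np.random.default_rng(0)
d, D = 3, 8
A = rng.normal(size=(D, d))
A_pinv = np.linalg.pinv(A)
g = lambda z: A @ z
h = lambda x: A_pinv @ x
x = g(rng.normal(size=d))
v = rng.normal(size=d)
loss = injflow_loss(x, g, h, v)
```

In a real training loop the same quantities would be computed on mini-batches with an automatic-differentiation framework so that gradients with respect to $\theta$ and $\phi$ are available.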

4. Algorithmic Implementation

Both the encoder $h_\phi$ and the decoder $g_\theta$ are built from five-layer convolutional or transposed-convolutional neural networks, with ELU activations and batch normalization. The latent dimension is $d=32$ for MNIST and $d=128$ for CIFAR-10 and CelebA. Training uses Adam with an initial learning rate of $1 \times 10^{-3}$, batch size 128, and 100k training steps. The regularization weights $\mu$ and $\mu_{in}$ start at 1 and increase progressively, with typical $\sigma^2 \in \{10^{-4}, \dotsc, 1\}$, $\lambda = 1$, and injectivity threshold $\eta = 0.1$. Jacobian-penalty terms are estimated with a single Hutchinson sample per example, using either automatic-differentiation-based Jacobian-vector products or finite-difference approximations.
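The Hutchinson-style estimate relies on the identity $\mathbb{E}_v \lVert J v \rVert^2 = \lVert J \rVert_F^2$ for $v \sim \mathcal{N}(0, I_d)$. The sketch below checks it on a hypothetical toy decoder $g(z) = \tanh(Wz)$ using finite-difference Jacobian-vector products; the decoder, the step size `eps`, and the number of probes are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, D = 4, 16
W = rng.normal(size=(D, d)) / np.sqrt(d)

def g(z):
    """Toy nonlinear decoder; accepts z of shape (d,) or (n, d)."""
    return np.tanh(z @ W.T)

def jvp_fd(z, v, eps=1e-4):
    """Central finite-difference approximation of J_g(z) v."""
    return (g(z + eps * v) - g(z - eps * v)) / (2.0 * eps)

z = rng.normal(size=d)

# Exact Jacobian of tanh(W z) for reference: diag(1 - tanh^2(W z)) W.
J = (1.0 - np.tanh(W @ z) ** 2)[:, None] * W
fro2_exact = np.sum(J ** 2)

# A single probe per example gives a noisy but unbiased estimate.
v = rng.normal(size=d)
single_estimate = np.sum(jvp_fd(z, v) ** 2)

# Averaging many probes recovers the squared Frobenius norm.
V = rng.normal(size=(20000, d))
mean_estimate = np.mean(np.sum(jvp_fd(z, V) ** 2, axis=1))
```

A single probe is noisy for any one example, but the noise averages out over mini-batches and training steps, which is what makes one sample per example workable in practice.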

5. Empirical Evaluation and Quantitative Performance

Evaluation is conducted by fitting either a single Gaussian or a 10-component Gaussian mixture model (GMM) in latent space post-training, then decoding to produce samples for Fréchet Inception Distance (FID) measurement. On CelebA, the injective flow model ("InjFlow") achieves the lowest reconstruction FID (28.5) and the lowest sample FID with a GMM sampler (40.6), outperforming VAE, β-VAE, and regularized AEs:

Model	Recon FID	Sample FID (GMM)
VAE	62.4	67.8
β-VAE	30.1	42.8
AE	30.2	43.5
AE+SN	31.2	43.3
InjFlow	28.5	40.6

Comparable improvements are reported on CIFAR-10, while on MNIST, InjFlow is competitive with β-VAE. Qualitatively, InjFlow reconstructions preserve fine-grained details (e.g., hair structure in CelebA) better than those of VAEs, which exhibit significant blurring. Random samples from InjFlow are sharper but in some cases show artifacts attributable to the injectivity constraint.
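The post-training sampling step can be sketched for the single-Gaussian case as follows. The latent codes and the linear `decode` function below are synthetic stand-ins for a trained encoder and decoder, and the 10-component GMM variant would replace the mean and covariance fit with a mixture-model fit.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, D = 5000, 3, 8

# Synthetic stand-ins for encoded training data and a trained decoder.
codes = rng.normal(size=(n, d)) @ np.diag([1.0, 0.5, 2.0]) + np.array([0.3, -1.0, 0.0])
A = rng.normal(size=(D, d))
decode = lambda z: z @ A.T

# Fit a single full-covariance Gaussian to the latent codes.
mean = codes.mean(axis=0)
cov = np.cov(codes, rowvar=False)

# Draw new latent vectors from the fitted Gaussian and decode them.
z_new = rng.multivariate_normal(mean, cov, size=1000)
x_new = decode(z_new)
```

The decoded samples `x_new` are what would be passed to an FID computation against held-out data.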

6. Significance and Theoretical Perspective

By relaxing the requirement on the generative mapping from bijectivity to injectivity, the model avoids the dimensionality constraints and high computational cost of fully invertible flow-based models, while retaining a closed-form, tractable probability density and efficient training on high-dimensional data. The regularization terms are not ad hoc: they are derived from lower bounds on the probability flow objective. The approach combines attributes of VAEs (dimensionality reduction, tractable likelihood), autoencoders (reconstruction-driven regularization), and flows (explicit density modeling), offering a unified and scalable approach to generative learning in the underdetermined regime ($d \ll D$) (Kumar et al., 2020).

A plausible implication is that this framework opens avenues for further research in generative modeling on complex manifolds, where strict invertibility is neither feasible nor desirable, and where explicit regularization of the generative map's geometry is critical for both sample quality and tractable likelihood estimation.

References

Kumar et al. (2020). Regularized Autoencoders via Relaxed Injective Probability Flow.
