
Injective Probability Flow Regularized Autoencoder

Updated 2 March 2026
  • The paper introduces a framework blending injective probability flows with autoencoder regularization, enabling tractable likelihood estimation and high sample quality.
  • The methodology employs a smooth injective decoder, a left-inverse encoder, and Jacobian-based penalties to maintain reconstruction fidelity and local injectivity.
  • Empirical results on CelebA, CIFAR-10, and MNIST show improved reconstruction and sample FIDs compared to standard VAEs and β-VAEs.

The Injective Probability Flow Regularized Autoencoder (RAE) is a generative modeling framework that bridges injective probability flow models and regularized autoencoders, providing tractable likelihood estimation and superior sample quality without the architectural and computational constraints of bijective flow-based generative models. Unlike invertible normalizing flows that require dimensionality preservation between latent and data space, this approach only requires injectivity of the generator (decoder), unlocking efficient modeling of high-dimensional data manifolds with lower-dimensional latent spaces and explicit regularization directly tied to probability flow objectives (Kumar et al., 2020).

1. Model Structure and Injective Mapping

The foundational principle of the Injective Probability Flow RAE is the use of a smooth, differentiable, one-to-one map $g_\theta: Z \to X$, where $Z \subset \mathbb{R}^d$ is the low-dimensional latent space and $X \subset \mathbb{R}^D$ is the high-dimensional data space with $D \gg d$. Injectivity ensures that every $x \in g_\theta(Z)$ has a unique $z$ such that $x = g_\theta(z)$. The encoder $h_\phi : X \to Z$ acts as a left-inverse on the image, enabling unique inference for any $x \in g_\theta(Z)$. This construction pushes forward a Gaussian prior $p_z(z) = \mathcal{N}(z; 0, \sigma^2 I)$ through $g_\theta$ to define the induced density $p_x(x)$ on the data manifold.
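To make the injective-decoder and left-inverse-encoder pairing concrete, here is a minimal NumPy sketch. The map `g`, the encoder `h`, and the helper `log_prior` are hypothetical toy stand-ins with $d = 2$ and $D = 3$; they are not the networks used in the paper.

```python
import numpy as np

def g(z):
    """Toy injective decoder R^2 -> R^3 (hypothetical stand-in for g_theta).

    The first two outputs copy z, so distinct z always give distinct x.
    """
    z = np.asarray(z, dtype=float)
    return np.array([z[0], z[1], np.sin(z[0]) + z[1] ** 2])

def h(x):
    """Toy left-inverse encoder: on the image of g, h(g(z)) == z."""
    return np.asarray(x, dtype=float)[:2]

def log_prior(z, sigma2=1.0):
    """Log-density of the Gaussian prior N(0, sigma2 * I) at z."""
    z = np.asarray(z, dtype=float)
    d = z.size
    return -0.5 * (z @ z) / sigma2 - 0.5 * d * np.log(2.0 * np.pi * sigma2)

z = np.array([0.3, -1.2])
x = g(z)          # a point on the 2-D manifold embedded in R^3
z_rec = h(x)      # unique latent code recovered by the encoder
```

Because `h` only needs to invert `g` on its image, it is free to behave arbitrarily off the manifold, which is what distinguishes a left-inverse from a full bijective inverse.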

2. Likelihood Formulation and Relaxed Probability Flow

The probability flow perspective establishes the data probability density via a change of variables adapted to injective mappings. For $x = g_\theta(z)$, the pushforward density on $g_\theta(Z)$ is given by

$$p_x(x) = \frac{p_z(z)}{\sqrt{\det[J_g(z)^T J_g(z)]}}$$

where $J_g(z)$ is the Jacobian of $g_\theta$ at $z$. The log-likelihood can be expressed as

$$\ln p_x(x) = \ln p_z(h_\phi(x)) - \frac{1}{2} \ln \det[J_g(h_\phi(x))^T J_g(h_\phi(x))]$$
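As an illustration of the exact log-likelihood, the following sketch evaluates it for a small toy decoder whose Jacobian is known in closed form. The functions `g`, `h`, `jacobian_g`, and `log_px` are hypothetical names chosen for this example, and forming the full $D \times d$ Jacobian like this is only practical at toy scale.

```python
import numpy as np

def g(z):
    """Toy injective decoder R^2 -> R^3 (hypothetical)."""
    return np.array([z[0], z[1], np.sin(z[0]) + z[1] ** 2])

def h(x):
    """Toy left-inverse encoder on the image of g."""
    return np.asarray(x, dtype=float)[:2]

def jacobian_g(z):
    """Analytic 3x2 Jacobian of the toy decoder g."""
    return np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [np.cos(z[0]), 2.0 * z[1]]])

def log_px(x, sigma2=1.0):
    """Exact log-density on the manifold: ln p_z(h(x)) - 0.5 * ln det(J^T J)."""
    z = h(x)
    d = z.size
    log_pz = -0.5 * (z @ z) / sigma2 - 0.5 * d * np.log(2.0 * np.pi * sigma2)
    J = jacobian_g(z)
    # slogdet is numerically safer than log(det(...)) for small determinants.
    _, logdet = np.linalg.slogdet(J.T @ J)
    return log_pz - 0.5 * logdet

z = np.array([0.3, -1.2])
value = log_px(g(z))
```

For this toy map, $J^T J$ is the identity plus a rank-one term, so its determinant is $1 + \cos^2 z_1 + 4 z_2^2$, which gives a closed-form value to check against.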

Computing the log-determinant is expensive; to address this, a concavity-based lower bound involving the Frobenius norm of the Jacobian is used, which yields a tractable estimate. The core bound, which is tight when all singular values of the Jacobian are equal, is

$$\ln p_x(x) \geq \ln p_z(h_\phi(x)) - \frac{d}{2} \ln \left( \frac{1}{d} \lVert J_g(h_\phi(x)) \rVert_F^2 \right)$$

and a further lower bound, parameterized by a fixed $\mu > 0$ and coinciding with the bound above when $\mu = \frac{1}{d} \lVert J_g(h_\phi(x)) \rVert_F^2$, is

$$\ln p_x(x) \geq \ln p_z(h_\phi(x)) - \frac{1}{2\mu} \lVert J_g(h_\phi(x)) \rVert_F^2 - \frac{d}{2} \ln \mu + \frac{d}{2}$$
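The chain of bounds can be checked numerically. The sketch below uses a random matrix `J` as a stand-in for the decoder Jacobian at one latent point (an assumption made purely for illustration) and compares the Jacobian-dependent terms of the three expressions; the prior term $\ln p_z$ is common to all of them and is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
D, d = 6, 3
J = rng.normal(size=(D, d))      # stand-in for J_g at some latent point

_, logdet = np.linalg.slogdet(J.T @ J)
fro2 = np.sum(J ** 2)            # ||J||_F^2, equal to trace(J^T J)

# Jacobian term of the exact log-likelihood.
exact_term = -0.5 * logdet

# Log-Frobenius bound: mean of log-eigenvalues <= log of mean eigenvalue.
log_frob_term = -0.5 * d * np.log(fro2 / d)

def mu_term(mu):
    """Jacobian term of the mu-parameterized bound, using ln t <= t/mu + ln mu - 1."""
    return -fro2 / (2.0 * mu) - 0.5 * d * np.log(mu) + 0.5 * d

mu_star = fro2 / d               # value of mu at which the two bounds meet
```

Here `exact_term >= log_frob_term >= mu_term(mu)` holds for every positive `mu`, and `mu_term(mu_star)` equals `log_frob_term`, so a fixed $\mu$ trades tightness for a term that is quadratic in the Jacobian.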

3. Autoencoder-Style Objective and Regularization

Translating the likelihood lower bound into a learnable objective yields a regularized autoencoder formulation,

$$\mathcal{L}(\theta, \phi; x) = \frac{1}{2\sigma^2} \lVert h_\phi(x) \rVert^2 + \mu \lVert x - g_\theta(h_\phi(x)) \rVert^2 + \frac{d}{2} \ln \left[ \max \left( \lVert J_g(h_\phi(x))v \rVert^2, \eta^2 \lVert v \rVert^2 \right) \right] + \mu_{in} \left[ \frac{\lVert J_g(h_\phi(x))v \rVert}{\lVert v \rVert} - \eta \right]_{-}^2$$

with $v \sim \mathcal{N}(0, I_d)$. This formulation includes:

  • Prior-norm penalty: Encouraging latent codes close to the Gaussian prior.
  • Reconstruction loss: Penalizing deviation between the data and its reconstruction.
  • Jacobian-smoothness loss: Penalty derived from the log-Frobenius bound, favoring smooth $g_\theta$.
  • Injectivity penalty: Enforces a lower bound $\eta > 0$ on the smallest singular value of the Jacobian, ensuring local injectivity.

Using a soft penalty for reconstruction, rather than a hard constraint, is critical for stable optimization.
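The four terms of the objective can be sketched for a single example as follows. This is an illustrative NumPy version under stated assumptions: `injflow_loss` and `jvp_fd` are hypothetical names, the decoder and encoder are a toy linear pair built from a fixed matrix `A`, and the Jacobian-vector product is approximated by central finite differences rather than automatic differentiation.

```python
import numpy as np

def jvp_fd(g, z, v, eps=1e-4):
    """Central finite-difference approximation of the product J_g(z) v."""
    return (g(z + eps * v) - g(z - eps * v)) / (2.0 * eps)

def injflow_loss(g, h, x, v, sigma2=1.0, mu=1.0, mu_in=1.0, eta=0.1):
    """Per-example objective with one probe vector v; returns (total, terms)."""
    z = h(x)
    d = z.size
    Jv = jvp_fd(g, z, v)
    jv2, v2 = Jv @ Jv, v @ v
    # [.]_- keeps only the negative part, so the penalty is zero once
    # the stretch ||Jv|| / ||v|| exceeds the threshold eta.
    shortfall = min(np.sqrt(jv2 / v2) - eta, 0.0)
    terms = {
        "prior": (z @ z) / (2.0 * sigma2),
        "reconstruction": mu * np.sum((x - g(z)) ** 2),
        "smoothness": 0.5 * d * np.log(max(jv2, eta ** 2 * v2)),
        "injectivity": mu_in * shortfall ** 2,
    }
    return sum(terms.values()), terms

# Toy linear decoder/encoder pair (hypothetical): g(z) = A z, h(x) = pinv(A) x.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
A_pinv = np.linalg.pinv(A)
g = lambda z: A @ z
h = lambda x: A_pinv @ x

x = g(np.array([0.5, -0.5]))
v = np.array([1.0, 0.0])     # in training, v would be drawn from N(0, I_d)
total, terms = injflow_loss(g, h, x, v)
```

In a real training loop the probe `v` would be resampled for every example and step, and the gradient with respect to the network parameters would come from an autodiff framework.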

4. Algorithmic Implementation

Both encoder $h_\phi$ and decoder $g_\theta$ are constructed using five-layer convolutional or transposed convolutional neural networks, with ELU activations and batch normalization. Latent dimension settings include $d = 32$ for MNIST and $d = 128$ for CIFAR-10 and CelebA. Training uses Adam optimization with an initial learning rate of $1 \times 10^{-3}$, batch size 128, and 100k training steps. The regularization weights $\mu$ and $\mu_{in}$ start at 1 and increase progressively, with typical $\sigma^2 \in \{10^{-4}, \dotsc, 1\}$, $\lambda = 1$, and injectivity threshold $\eta = 0.1$. Jacobian-penalty terms are estimated using single Hutchinson samples per example and either automatic-differentiation-based Jacobian-vector products or finite-difference approximations.
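The Hutchinson-style estimate rests on the identity $\mathbb{E}_v \lVert J v \rVert^2 = \lVert J \rVert_F^2$ for $v \sim \mathcal{N}(0, I_d)$. The sketch below checks it on a toy decoder using finite-difference Jacobian-vector products; `g` and `jvp_fd` are hypothetical names, and the 20000 probes are used only to show convergence, whereas the setup described above draws a single probe per example.

```python
import numpy as np

def g(z):
    """Toy decoder R^2 -> R^3, batched over leading axes (hypothetical)."""
    z = np.asarray(z, dtype=float)
    return np.stack([z[..., 0], z[..., 1],
                     np.sin(z[..., 0]) + z[..., 1] ** 2], axis=-1)

def jvp_fd(g, z, V, eps=1e-4):
    """Central finite-difference approximation of J_g(z) v for each row v of V."""
    return (g(z + eps * V) - g(z - eps * V)) / (2.0 * eps)

rng = np.random.default_rng(0)
z = np.array([0.3, -1.2])
V = rng.normal(size=(20000, 2))                    # probes v ~ N(0, I_d)

per_probe = np.sum(jvp_fd(g, z, V) ** 2, axis=1)   # ||J v||^2 for each probe
single_sample = per_probe[0]                       # one-probe estimate (noisy)
averaged = per_probe.mean()                        # Monte Carlo average

# Exact ||J||_F^2 from the analytic Jacobian [[1, 0], [0, 1], [cos z1, 2 z2]].
exact = 2.0 + np.cos(z[0]) ** 2 + 4.0 * z[1] ** 2
```

A single probe is unbiased but high-variance; averaging over a minibatch and over training steps is what makes the one-sample estimate workable in practice.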

5. Empirical Evaluation and Quantitative Performance

Evaluation is conducted by fitting either a single Gaussian or a 10-component Gaussian mixture model (GMM) in latent space post-training, then decoding to produce samples for Fréchet Inception Distance (FID) measurement. On CelebA, the injective flow model ("InjFlow") achieves the lowest reconstruction FID (28.5) and the lowest sample FID with a GMM sampler (40.6), outperforming VAE, β-VAE, and regularized AEs:

| Model   | Recon FID | Samples FID (GMM) |
|---------|-----------|-------------------|
| VAE     | 62.4      | 67.8              |
| β-VAE   | 30.1      | 42.8              |
| AE      | 30.2      | 43.5              |
| AE+SN   | 31.2      | 43.3              |
| InjFlow | 28.5      | 40.6              |

Comparable improvements are reported on CIFAR-10, while on MNIST, InjFlow is competitive with β-VAE. Qualitatively, InjFlow reconstructions consistently preserve fine-grained details (e.g., hair structure in CelebA) better than those of VAEs, which exhibit significant blurring. Random samples from InjFlow display enhanced sharpness but in some cases show artifacts attributable to the injectivity constraint.
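The post-hoc sampling procedure used in this evaluation (encode the training data, fit a density in latent space, sample from it, decode) can be sketched for the single-Gaussian case as follows. The linear `g` and `h`, the matrix `A`, and the synthetic data are all assumptions for illustration; a 10-component mixture could be fitted instead with a library such as scikit-learn's `GaussianMixture`, and the FID computation itself is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear decoder/encoder pair (hypothetical), batched over rows.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]])
A_pinv = np.linalg.pinv(A)
g = lambda z: z @ A.T            # latent (n, 2) -> data (n, 3)
h = lambda x: x @ A_pinv.T       # data (n, 3) -> latent (n, 2)

# Synthetic "training data" whose latent codes are not standard normal.
true_codes = rng.normal(loc=[2.0, -1.0], scale=[0.5, 1.5], size=(5000, 2))
data = g(true_codes)

# Step 1: encode the data and fit a single Gaussian to the codes.
codes = h(data)
mean = codes.mean(axis=0)
cov = np.cov(codes, rowvar=False)

# Step 2: sample new codes from the fitted Gaussian and decode them.
new_z = rng.multivariate_normal(mean, cov, size=16)
samples = g(new_z)
```

Fitting the sampler after training means the samples follow the distribution of codes the encoder actually produces, rather than the nominal prior.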

6. Significance and Theoretical Perspective

By relaxing the generative mapping requirement from bijectivity to injectivity, the model circumvents the dimensionality limitations and high computational costs of fully invertible flow-based models, while maintaining a closed-form, tractable probability density and efficient training on high-dimensional data. The regularization terms are not ad hoc but are rigorously derived from principled lower bounding of the probability flow objective. The approach synthesizes the best attributes of VAEs (dimensionality reduction, tractable likelihood), autoencoders (reconstruction-driven regularization), and flows (explicit density modeling), offering a unified and scalable solution for generative learning in the underdetermined regime ($d \ll D$) (Kumar et al., 2020).

A plausible implication is that this framework opens avenues for further research in generative modeling on complex manifolds, where strict invertibility is neither feasible nor desirable, and where explicit regularization of the generative map's geometry is critical for both sample quality and tractable likelihood estimation.
