Injective Probability Flow Regularized Autoencoder
- The paper introduces a framework blending injective probability flows with autoencoder regularization, enabling tractable likelihood estimation and high sample quality.
- The methodology employs a smooth injective decoder, a left-inverse encoder, and Jacobian-based penalties to maintain reconstruction fidelity and local injectivity.
- Empirical results on CelebA, CIFAR-10, and MNIST show improved reconstruction and sample FIDs compared to standard VAEs and β-VAEs.
The Injective Probability Flow Regularized Autoencoder (RAE) is a generative modeling framework that bridges injective probability flow models and regularized autoencoders, providing tractable likelihood estimation and superior sample quality without the architectural and computational constraints of bijective flow-based generative models. Unlike invertible normalizing flows that require dimensionality preservation between latent and data space, this approach only requires injectivity of the generator (decoder), unlocking efficient modeling of high-dimensional data manifolds with lower-dimensional latent spaces and explicit regularization directly tied to probability flow objectives (Kumar et al., 2020).
1. Model Structure and Injective Mapping
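The foundational principle of the Injective Probability Flow RAE is the use of a smooth, differentiable, one-to-one map $g: \mathcal{Z} \to \mathcal{X}$, where $\mathcal{Z} = \mathbb{R}^d$ is the low-dimensional latent space and $\mathcal{X} = \mathbb{R}^D$ is the high-dimensional data space with $d < D$. Injectivity ensures that every $x$ in the image $g(\mathcal{Z})$ has a unique $z \in \mathcal{Z}$ such that $g(z) = x$. The encoder $h: \mathcal{X} \to \mathcal{Z}$ acts as a left-inverse on the image, $h(g(z)) = z$, enabling unique inference for any $x \in g(\mathcal{Z})$. This construction pushes forward a Gaussian prior $p_Z$ through $g$ to define the induced density on the data manifold.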
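As a minimal illustration of pairing an injective decoder with a left-inverse encoder, the sketch below uses a toy map from $\mathbb{R}^2$ to $\mathbb{R}^3$. The functions `decoder` and `encoder` and the paraboloid map itself are assumptions chosen for illustration; they are not the convolutional networks used in the paper.

```python
def decoder(z):
    """Toy injective map from R^2 to R^3: lifts (z1, z2) onto a paraboloid."""
    z1, z2 = z
    return (z1, z2, z1 * z1 + z2 * z2)


def encoder(x):
    """Left-inverse of `decoder` on its image: drops the third coordinate."""
    x1, x2, _ = x
    return (x1, x2)


z = (0.5, -1.25)
x = decoder(z)             # a point on the 2-D manifold embedded in R^3
z_recovered = encoder(x)   # equals z, since the encoder undoes the decoder

# For a point off the manifold, encode-then-decode returns a point on it.
x_off = (0.5, -1.25, 10.0)
x_projected = decoder(encoder(x_off))
```

The encoder is only a left-inverse: `encoder(decoder(z))` returns `z` for every latent code, whereas `decoder(encoder(x))` returns `x` only when `x` already lies on the manifold.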
2. Likelihood Formulation and Relaxed Probability Flow
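The probability flow perspective establishes the data probability density via a change of variables adapted to injective mappings. For $x = g(z)$, where $g$ is the decoder and $z \in \mathbb{R}^d$ is the latent code with prior density $p_Z$, the pushforward density on the image of $g$ is given by

$$p_X(x) = p_Z(z)\,\det\!\left(J_g(z)^\top J_g(z)\right)^{-1/2},$$

where $J_g(z)$ is the Jacobian of $g$ at $z$. The log-likelihood can be expressed as

$$\log p_X(x) = \log p_Z(z) - \tfrac{1}{2}\log\det\!\left(J_g(z)^\top J_g(z)\right).$$

Computing the log-determinant is expensive; to address this, a concavity-based lower bound involving the Frobenius norm of the Jacobian is employed, which permits tractable estimation. Applying Jensen's inequality to the eigenvalues of $J_g^\top J_g$ gives $\log\det(J_g^\top J_g) \le d \log\!\left(\tfrac{1}{d}\|J_g\|_F^2\right)$, so the core bound, tight when all singular values of $J_g(z)$ are equal, is

$$\log p_X(x) \ge \log p_Z(z) - \tfrac{d}{2}\log\!\left(\tfrac{1}{d}\,\|J_g(z)\|_F^2\right).$$

The paper also gives an equivalent lower bound parameterized by a fixed parameter.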
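The relationship between the exact injective change-of-variables log-density and a Frobenius-norm lower bound can be checked numerically. The sketch below assumes a standard Gaussian prior, a hypothetical toy decoder from $\mathbb{R}^2$ to $\mathbb{R}^3$ with a hand-written Jacobian, and the Jensen-type bound $\log\det(J^\top J) \le d\log(\|J\|_F^2/d)$; all function names are illustrative.

```python
import math


def jacobian(z):
    """Analytic 3x2 Jacobian of the toy decoder (z1, z2) -> (z1, z2, z1^2 + z2^2)."""
    z1, z2 = z
    return [[1.0, 0.0], [0.0, 1.0], [2.0 * z1, 2.0 * z2]]


def gram(J):
    """Return the d x d matrix J^T J."""
    d = len(J[0])
    return [[sum(row[i] * row[j] for row in J) for j in range(d)] for i in range(d)]


def log_prior(z):
    """Log-density of a standard Gaussian prior on the latent space."""
    d = len(z)
    return -0.5 * sum(v * v for v in z) - 0.5 * d * math.log(2.0 * math.pi)


def exact_log_density(z):
    """log p(z) - 1/2 log det(J^T J), with the 2x2 determinant written out."""
    G = gram(jacobian(z))
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    return log_prior(z) - 0.5 * math.log(det)


def frobenius_lower_bound(z):
    """log p(z) - d/2 log(||J||_F^2 / d), a lower bound on the exact value."""
    J = jacobian(z)
    d = len(z)
    frob_sq = sum(v * v for row in J for v in row)
    return log_prior(z) - 0.5 * d * math.log(frob_sq / d)


z = (0.5, -1.25)
exact = exact_log_density(z)
bound = frobenius_lower_bound(z)   # never exceeds `exact`

# At the origin both singular values of the Jacobian equal 1, so the bound is tight.
gap_at_origin = exact_log_density((0.0, 0.0)) - frobenius_lower_bound((0.0, 0.0))
```

The bound replaces a determinant with a sum of squared Jacobian entries, which is the quantity that stochastic estimators can approximate cheaply.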
3. Autoencoder-Style Objective and Regularization
Translating the likelihood lower bound into a learnable objective yields a regularized autoencoder formulation. This formulation includes:
- Prior-norm penalty: Encouraging latent codes close to the Gaussian prior.
- Reconstruction loss: Penalizing deviation between the data and its reconstruction.
- Jacobian-smoothness loss: Penalty derived from the log-Frobenius bound, favoring a smooth decoder.
- Injectivity penalty: Enforces a lower bound on the smallest singular value of the Jacobian, ensuring local injectivity.
Using a soft penalty for reconstruction, rather than a hard constraint, is critical for stable optimization.
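The four terms listed above can be sketched as a single per-example loss. The exact functional form of each term in the paper is not reproduced here: the sketch below is one plausible instantiation on a toy decoder, and the weights `lam`, `gamma`, `eta`, the threshold `eps`, the squared-hinge form of the injectivity penalty, and all function names are assumptions made for illustration.

```python
import math


def decoder(z):
    """Hypothetical toy injective decoder from R^2 to R^3."""
    z1, z2 = z
    return (z1, z2, z1 * z1 + z2 * z2)


def encoder(x):
    """Left-inverse of the toy decoder on its image."""
    return (x[0], x[1])


def jacobian(z):
    """Analytic 3x2 Jacobian of the toy decoder."""
    z1, z2 = z
    return [[1.0, 0.0], [0.0, 1.0], [2.0 * z1, 2.0 * z2]]


def smallest_singular_value(J):
    """Smallest singular value of a 3x2 matrix via the eigenvalues of J^T J."""
    a = sum(r[0] * r[0] for r in J)
    b = sum(r[0] * r[1] for r in J)
    c = sum(r[1] * r[1] for r in J)
    trace, det = a + c, a * c - b * b
    disc = max(trace * trace - 4.0 * det, 0.0)
    lam_min = 0.5 * (trace - math.sqrt(disc))
    return math.sqrt(max(lam_min, 0.0))


def rae_loss_terms(x, lam=1.0, gamma=1.0, eta=1.0, eps=0.1):
    """Return the four assumed penalty terms and their sum for one example x."""
    z = encoder(x)
    x_hat = decoder(z)
    J = jacobian(z)
    d = len(z)
    frob_sq = sum(v * v for r in J for v in r)
    terms = {
        # Negative log of a standard Gaussian prior, up to an additive constant.
        "prior": 0.5 * sum(v * v for v in z),
        # Soft reconstruction penalty instead of a hard constraint.
        "reconstruction": lam * sum((a - b) ** 2 for a, b in zip(x, x_hat)),
        # Log-Frobenius term: d/2 * log(||J||_F^2 / d).
        "smoothness": gamma * 0.5 * d * math.log(frob_sq / d),
        # Squared hinge that activates when the smallest singular value drops below eps.
        "injectivity": eta * max(0.0, eps - smallest_singular_value(J)) ** 2,
    }
    terms["total"] = sum(terms.values())
    return terms


on_manifold = rae_loss_terms(decoder((0.5, -1.25)))
off_manifold = rae_loss_terms((0.0, 0.0, 3.0))
```

For a point on the toy manifold the reconstruction term vanishes, and the injectivity term stays at zero as long as the smallest singular value of the Jacobian exceeds `eps`.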
4. Algorithmic Implementation
Both encoder and decoder are constructed using five-layer convolutional or transposed convolutional neural networks, with ELU activations and batch normalization. Latent dimension settings include for MNIST and for CIFAR-10 and CelebA. Training uses Adam optimization with an initial learning rate of , batch size 128, and 100k training steps. The regularization weights and start at 1 and increase progressively, with typical , , and injectivity threshold . Jacobian-penalty terms are estimated using single Hutchinson samples per example and either automatic-differentiation-based Jacobian-vector products or finite-difference approximations.
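The Hutchinson-style estimate of a squared Frobenius norm mentioned above can be sketched with finite-difference Jacobian-vector products. The sketch assumes Rademacher probe vectors (the source does not specify the probe distribution), a central-difference step size of `1e-5`, and a hypothetical toy decoder; none of these choices is taken from the paper.

```python
import random


def decoder(z):
    """Hypothetical toy decoder from R^2 to R^3 (illustration only)."""
    z1, z2 = z
    return (z1, z2, z1 * z1 + z2 * z2)


def jvp_finite_difference(f, z, v, step=1e-5):
    """Approximate the Jacobian-vector product J_f(z) v by a central difference."""
    z_plus = [zi + step * vi for zi, vi in zip(z, v)]
    z_minus = [zi - step * vi for zi, vi in zip(z, v)]
    f_plus, f_minus = f(z_plus), f(z_minus)
    return [(a - b) / (2.0 * step) for a, b in zip(f_plus, f_minus)]


def hutchinson_frobenius_sq(f, z, num_samples=1, rng=random):
    """Estimate ||J_f(z)||_F^2 as the mean of ||J v||^2 over random sign vectors v."""
    total = 0.0
    for _ in range(num_samples):
        v = [rng.choice((-1.0, 1.0)) for _ in z]
        jv = jvp_finite_difference(f, z, v)
        total += sum(c * c for c in jv)
    return total / num_samples


z = [0.5, -1.25]
# One probe per example, as in the training setup described above: cheap but noisy.
single_probe = hutchinson_frobenius_sq(decoder, z, num_samples=1, rng=random.Random(0))
# Averaging many probes approaches the exact value 2 + 4 * (z1^2 + z2^2) for this toy map.
many_probes = hutchinson_frobenius_sq(decoder, z, num_samples=2000, rng=random.Random(0))
```

The estimator is unbiased because the probe vectors have identity covariance, so the expectation of $\|Jv\|^2$ equals the trace of $J^\top J$. With a single probe per example, the noise is averaged out across minibatches and training steps rather than within one example.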
5. Empirical Evaluation and Quantitative Performance
Evaluation is conducted by fitting either a single Gaussian or a 10-component Gaussian mixture model (GMM) in latent space post-training, then decoding to produce samples for Fréchet Inception Distance (FID) measurement. On CelebA, the injective flow model ("InjFlow") achieves the lowest reconstruction FID (28.5) and the lowest sample FID with a GMM sampler (40.6), outperforming VAE, β-VAE, and regularized AEs:
| Model | Reconstruction FID | Sample FID (GMM) |
|---|---|---|
| VAE | 62.4 | 67.8 |
| β-VAE | 30.1 | 42.8 |
| AE | 30.2 | 43.5 |
| AE+SN | 31.2 | 43.3 |
| InjFlow | 28.5 | 40.6 |
Comparable improvements are reported on CIFAR-10, while on MNIST InjFlow is competitive with β-VAE. Qualitatively, InjFlow reconstructions preserve fine-grained details (e.g., hair structure in CelebA) better than VAE reconstructions, which exhibit significant blurring. Random samples from InjFlow are sharper but in some cases show artifacts attributable to the injectivity constraint.
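The post-training sampling procedure used in this evaluation (fit a density to the latent codes, sample from it, then decode) can be sketched for the single-Gaussian case. The sketch below uses a hypothetical toy encoder and decoder with synthetic data, omits the 10-component GMM variant to stay within the standard library, and does not compute FID; all names and numbers are illustrative assumptions.

```python
import math
import random


def decoder(z):
    """Hypothetical toy decoder from R^2 to R^3."""
    z1, z2 = z
    return (z1, z2, z1 * z1 + z2 * z2)


def encoder(x):
    """Left-inverse of the toy decoder on its image."""
    return (x[0], x[1])


def fit_gaussian(codes):
    """Return the mean and 2x2 sample covariance of a list of 2-D latent codes."""
    n = len(codes)
    mean = [sum(c[i] for c in codes) / n for i in range(2)]
    cov = [[sum((c[i] - mean[i]) * (c[j] - mean[j]) for c in codes) / (n - 1)
            for j in range(2)] for i in range(2)]
    return mean, cov


def cholesky_2x2(cov):
    """Lower-triangular L with L L^T = cov for a 2x2 positive-definite matrix."""
    l11 = math.sqrt(cov[0][0])
    l21 = cov[1][0] / l11
    l22 = math.sqrt(cov[1][1] - l21 * l21)
    return [[l11, 0.0], [l21, l22]]


def sample_and_decode(mean, cov, num_samples, rng):
    """Draw latent codes from N(mean, cov) and push them through the decoder."""
    L = cholesky_2x2(cov)
    samples = []
    for _ in range(num_samples):
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        z = (mean[0] + L[0][0] * e1, mean[1] + L[1][0] * e1 + L[1][1] * e2)
        samples.append(decoder(z))
    return samples


rng = random.Random(0)
# Synthetic stand-in for training data: points on the toy manifold.
data = [decoder((rng.gauss(1.0, 0.5), rng.gauss(-2.0, 0.3))) for _ in range(500)]
codes = [encoder(x) for x in data]          # encode the data after training
mean, cov = fit_gaussian(codes)             # fit the latent-space density
generated = sample_and_decode(mean, cov, num_samples=5, rng=rng)
```

Fitting the sampler after training decouples sample generation from the prior used during training, which is why the same trained model can be evaluated with either a single Gaussian or a mixture.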
6. Significance and Theoretical Perspective
By relaxing the generative mapping requirement from bijectivity to injectivity, the model circumvents the dimensionality limitations and high computational costs of fully invertible flow-based models, while retaining a closed-form probability density with a tractable lower bound and efficient training on high-dimensional data. The regularization terms are not ad hoc: they are derived by lower bounding the probability flow objective. The approach combines attributes of VAEs (dimensionality reduction, tractable likelihood bounds), autoencoders (reconstruction-driven regularization), and flows (explicit density modeling), offering a unified and scalable approach to generative learning in the underdetermined regime, where the latent dimension $d$ is smaller than the data dimension $D$ (Kumar et al., 2020).
A plausible implication is that this framework opens avenues for further research in generative modeling on complex manifolds, where strict invertibility is neither feasible nor desirable, and where explicit regularization of the generative map's geometry is critical for both sample quality and tractable likelihood estimation.