- The paper introduces the VampPrior, a mixture of variational posteriors evaluated at learnable pseudo-inputs, to enrich latent representations.
- It presents a two-level hierarchical VAE architecture that mitigates inactive latent dimensions by coupling the prior to the approximate posterior in the ELBO's regularization term.
- Empirical results across six datasets show state-of-the-art or competitive performance, supporting the model's efficiency and flexibility.
Overview of "VAE with a VampPrior"
The paper "VAE with a VampPrior" by Jakub M. Tomczak and Max Welling presents an enhancement to the Variational Auto-Encoder (VAE) framework by introducing a novel prior distribution called the Variational Mixture of Posteriors (VampPrior). This new prior addresses several limitations of standard VAEs, including over-regularization and unused latent dimensions, by leveraging a mixture distribution conditioned on learnable pseudo-inputs.
Key Contributions
The authors propose several contributions to the deep generative modeling community:
- Introduction of the VampPrior: A mixture of variational posteriors conditioned on learnable pseudo-inputs, the VampPrior yields richer latent representations than the standard Gaussian prior.
- Hierarchical VAE Architecture: A two-level hierarchical VAE that places the VampPrior on the top latent layer, improving the ability to learn meaningful latent representations and mitigating inactive stochastic units (the factorization is sketched after this list).
- Empirical Validation: Experiments on six datasets show that the hierarchical VampPrior-based VAE achieves state-of-the-art or competitive results across settings.
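As a point of reference, the two-level construction factorizes roughly as below, with $z_2$ the top stochastic layer carrying the VampPrior and $z_1$ the lower layer:

```latex
% Generative model: VampPrior on the top latent layer z_2
p(x, z_1, z_2) = p_\theta(x \mid z_1, z_2)\, p_\theta(z_1 \mid z_2)\, p_\lambda(z_2)

% Inference model: the lower layer is inferred conditionally on both x and z_2
q(z_1, z_2 \mid x) = q_\phi(z_1 \mid x, z_2)\, q_\psi(z_2 \mid x)
```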
Methodology
- Regularization and ELBO: The authors re-examine the ELBO's regularization term, showing that a poorly chosen prior can over-regularize the encoder and lead to suboptimal representations. The VampPrior, being multimodal and coupled to the data through the encoder, improves these learning dynamics.
- Hierarchical Model: The architecture stacks two layers of stochastic latent variables, which helps the model capture more complex data distributions while keeping both layers active.
- Pseudo-Inputs: Pseudo-inputs are learnable vectors in input space, trained jointly with the model by backpropagation; only their number K is a hyperparameter. Using K pseudo-inputs, with K far smaller than the dataset size, gives a cheap approximation of the optimal prior, the aggregated posterior (a minimal implementation sketch follows this list).
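As an illustration of how the mixture prior can be evaluated in practice, here is a minimal PyTorch-style sketch. The `encoder` interface (returning the mean and log-variance of a diagonal Gaussian posterior), the layer sizes, and the initialization are assumptions made for this example, not the authors' exact implementation:

```python
import math
import torch
import torch.nn as nn

class VampPrior(nn.Module):
    """Sketch of a VampPrior: a uniform mixture of the encoder's variational
    posteriors evaluated at K learnable pseudo-inputs (illustrative only)."""

    def __init__(self, encoder, n_pseudo_inputs=500, input_dim=784):
        super().__init__()
        # Assumed: encoder(x) -> (mean, log-variance) of a diagonal Gaussian q(z|x).
        self.encoder = encoder
        # Pseudo-inputs live in data space and are trained by backpropagation
        # jointly with the rest of the model.
        self.pseudo_inputs = nn.Parameter(0.01 * torch.randn(n_pseudo_inputs, input_dim))

    def log_prob(self, z):
        """log p_lambda(z) = log (1/K) sum_k q(z | u_k) for a batch of latents z."""
        mu, logvar = self.encoder(self.pseudo_inputs)        # each: (K, latent_dim)
        z = z.unsqueeze(1)                                    # (batch, 1, latent_dim)
        mu, logvar = mu.unsqueeze(0), logvar.unsqueeze(0)     # (1, K, latent_dim)
        # Diagonal Gaussian log-density of z under each component, summed over dims.
        log_comp = -0.5 * ((z - mu) ** 2 / logvar.exp()
                           + logvar + math.log(2 * math.pi)).sum(dim=-1)   # (batch, K)
        # Uniform mixture: log-mean-exp over the K components.
        k = self.pseudo_inputs.shape[0]
        return torch.logsumexp(log_comp, dim=1) - math.log(k)
```

In a training loop, `log_prob` would stand in for the log-density of the standard normal prior when the KL term of the ELBO is estimated by Monte Carlo with samples from $q_\phi(z \mid x)$.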
Experimental Results
- Performance: Across datasets such as MNIST, OMNIGLOT, and Caltech 101 Silhouettes, the VampPrior-based VAE outperforms models with simple priors. The hierarchical design with the VampPrior reduces the number of inactive units and achieves improved test log-likelihoods.
- Comparison with Other Methods: Even when benchmarked against stronger models, such as those incorporating normalizing flows or autoregressive decoders, the VampPrior VAE remains competitive.
Implications and Future Directions
This work has several implications:
- Theoretical Insight: The coupling of the prior with the variational posterior aligns with principles akin to Empirical Bayes, allowing the prior to adapt during training.
- Broader Applicability: While primarily demonstrated on image data, the proposed methods could extend to other domains like text and audio, where sequence modeling can benefit from hierarchical latent structures.
- Enhancement with Other Techniques: Combining the hierarchical VampPrior VAE with other innovations, such as normalizing flows or adversarial training, represents an exciting avenue for research.
In conclusion, this paper provides a substantial advancement in VAE methodology by addressing intrinsic limitations through a novel prior construction and architectural refinement. The VampPrior enriches the latent space representation, opening new possibilities for effective and efficient learning in deep generative models.