Lagging Inference Networks and Posterior Collapse in Variational Autoencoders (1901.05534v2)

Published 16 Jan 2019 in cs.LG and stat.ML

Abstract: The variational autoencoder (VAE) is a popular combination of deep latent variable model and accompanying variational learning technique. By using a neural inference network to approximate the model's posterior on latent variables, VAEs efficiently parameterize a lower bound on marginal data likelihood that can be optimized directly via gradient methods. In practice, however, VAE training often results in a degenerate local optimum known as "posterior collapse" where the model learns to ignore the latent variable and the approximate posterior mimics the prior. In this paper, we investigate posterior collapse from the perspective of training dynamics. We find that during the initial stages of training the inference network fails to approximate the model's true posterior, which is a moving target. As a result, the model is encouraged to ignore the latent encoding and posterior collapse occurs. Based on this observation, we propose an extremely simple modification to VAE training to reduce inference lag: depending on the model's current mutual information between latent variable and observation, we aggressively optimize the inference network before performing each model update. Despite introducing neither new model components nor significant complexity over basic VAE, our approach is able to avoid the problem of collapse that has plagued a large amount of previous work. Empirically, our approach outperforms strong autoregressive baselines on text and image benchmarks in terms of held-out likelihood, and is competitive with more complex techniques for avoiding collapse while being substantially faster.

Authors (4)
  1. Junxian He (66 papers)
  2. Daniel Spokoyny (4 papers)
  3. Graham Neubig (342 papers)
  4. Taylor Berg-Kirkpatrick (106 papers)
Citations (266)

Summary

Analysis of Posterior Collapse in Variational Autoencoders through Modified Training Dynamics

The paper "Lagging Inference Networks and Posterior Collapse in Variational Autoencoders" explores the notable issue of posterior collapse encountered in variational autoencoders (VAEs). The research develops understanding from a training dynamics perspective and introduces a methodology to mitigate this problem efficiently and effectively.

Variational autoencoders have become a staple of unsupervised learning because they capture the structure of data distributions through latent variables. An inference network approximates the posterior distribution over these variables, which lets the model optimize a lower bound on the marginal data likelihood with gradient-based methods. In practice, however, VAE training is often impeded by posterior collapse: the learned model disregards the latent variables, and the approximate posterior collapses onto the prior distribution.
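For reference, the objective in question is the evidence lower bound (ELBO). A standard way to write it for a single observation x, with inference network q_phi and generative model p_theta (notation assumed here, not quoted from the paper), is:

```latex
% ELBO for one observation x: reconstruction term minus KL regularizer.
% Maximizing it over (theta, phi) lower-bounds the marginal log-likelihood.
\mathcal{L}(\theta, \phi; x)
  = \mathbb{E}_{q_\phi(z \mid x)}\!\bigl[\log p_\theta(x \mid z)\bigr]
  - \mathrm{KL}\!\bigl(q_\phi(z \mid x) \,\big\|\, p(z)\bigr)
  \;\le\; \log p_\theta(x)
```

When the KL term is driven to zero for every x, the latent variable carries no information about the data, which is exactly the collapsed regime the paper studies.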

The authors investigate the root of posterior collapse by studying the dynamics of training. They hypothesize, and demonstrate empirically, that in the initial phase of training the inference network lags behind the continually evolving true model posterior it is supposed to approximate. This lag encourages the model to ignore the latent encoding, and collapse follows.
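In the paper's terminology (paraphrased here rather than quoted), posterior collapse is the degenerate local optimum in which both the approximate and the true model posterior reduce to the prior:

```latex
% Posterior collapse: z becomes independent of x under both q_phi and p_theta.
q_\phi(z \mid x) \;=\; p_\theta(z \mid x) \;=\; p(z) \quad \text{for all } x
```

Inference lag, by contrast, is the persistent gap between q_phi(z|x) and the moving target p_theta(z|x) early in training; the paper's diagnosis is that this gap, rather than any inherent property of the model, is what drives optimization toward the collapsed solution.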

To counteract this, the paper proposes a simple modification of VAE training that directly targets inference lag: before each update of the generative model, the inference network is aggressively optimized, and the model's current mutual information between the latent variable and the observation is used to decide when this aggressive phase can end. The algorithm introduces no new model components and little additional complexity, making it an attractive alternative to more elaborate strategies for preventing collapse. A sketch of the training schedule follows.
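The sketch below gives a minimal, PyTorch-style rendition of this aggressive schedule: inner encoder updates until the inference objective stops improving, then a single decoder update, with the aggressive phase ending once mutual information plateaus. All names, signatures, and stopping heuristics here are illustrative assumptions, not the authors' released code.

```python
def train_vae_aggressive(model, train_loader, val_loader,
                         enc_opt, dec_opt, estimate_mutual_info,
                         num_epochs=20, max_inner_steps=100):
    """Hypothetical sketch of aggressive VAE training.

    model.negative_elbo(x)          -> scalar loss (-ELBO) for a batch x
    estimate_mutual_info(model, dl) -> estimate of I(x, z) on a data loader
    Both are assumed helpers, not part of any specific library.
    """
    aggressive = True          # start in the aggressive phase
    prev_mi = float("-inf")

    for _ in range(num_epochs):
        for x in train_loader:
            if aggressive:
                # Inner loop: update only the inference network (encoder)
                # until the batch loss stops improving, up to a step cap.
                best = float("inf")
                for _ in range(max_inner_steps):
                    enc_opt.zero_grad()
                    loss = model.negative_elbo(x)
                    loss.backward()
                    enc_opt.step()
                    if loss.item() >= best:
                        break
                    best = loss.item()

            # Standard update: always step the decoder; once the aggressive
            # phase is over, this reverts to ordinary joint VAE training.
            enc_opt.zero_grad()
            dec_opt.zero_grad()
            loss = model.negative_elbo(x)
            loss.backward()
            dec_opt.step()
            if not aggressive:
                enc_opt.step()

        # End the aggressive phase when mutual information stops increasing.
        mi = estimate_mutual_info(model, val_loader)
        if aggressive and mi <= prev_mi:
            aggressive = False
        prev_mi = mi
```

The appeal of the scheme, as the paper emphasizes, is that once the aggressive phase ends the procedure is exactly standard VAE training, so no extra machinery remains at convergence.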

Empirical results underscore the efficacy of the approach. It matches or exceeds strong autoregressive baselines on benchmark text and image datasets, namely Yahoo, Yelp, and OMNIGLOT, in terms of held-out log-likelihood. The method is also considerably faster than competing strategies for avoiding collapse, yielding meaningful reductions in training time.

Looking ahead, the paper points to potential gains in the robustness of VAEs for discrete data modeling. The simplicity of the proposed method could encourage further study of training dynamics and inform work on related latent variable models. By improving understanding and control of the training process, this research invites further investigation into the fundamental causes of optimization failures in deep learning models, thereby broadening their applicability across domains in AI and machine learning.