Avoiding Latent Variable Collapse With Generative Skip Models (1807.04863v2)

Published 12 Jul 2018 in stat.ML, cs.CL, and cs.LG

Abstract: Variational autoencoders learn distributions of high-dimensional data. They model data with a deep latent-variable model and then fit the model by maximizing a lower bound of the log marginal likelihood. VAEs can capture complex distributions, but they can also suffer from an issue known as "latent variable collapse," especially if the likelihood model is powerful. Specifically, the lower bound involves an approximate posterior of the latent variables; this posterior "collapses" when it is set equal to the prior, i.e., when the approximate posterior is independent of the data. While VAEs learn good generative models, latent variable collapse prevents them from learning useful representations. In this paper, we propose a simple new way to avoid latent variable collapse by including skip connections in our generative model; these connections enforce strong links between the latent variables and the likelihood function. We study generative skip models both theoretically and empirically. Theoretically, we prove that skip models increase the mutual information between the observations and the inferred latent variables. Empirically, we study images (MNIST and Omniglot) and text (Yahoo). Compared to existing VAE architectures, we show that generative skip models maintain similar predictive performance but lead to less collapse and provide more meaningful representations of the data.

Authors (4)
  1. Adji B. Dieng (12 papers)
  2. Yoon Kim (92 papers)
  3. Alexander M. Rush (115 papers)
  4. David M. Blei (110 papers)
Citations (169)

Summary

Avoiding Latent Variable Collapse with Generative Skip Models

The paper "Avoiding Latent Variable Collapse with Generative Skip Models" addresses a critical challenge in the application of Variational Autoencoders (VAEs): latent variable collapse. VAEs are prominent in unsupervised representation learning because they can model complex distributions over high-dimensional data. Despite this success, they are susceptible to latent variable collapse, in which the approximate posterior over the latent variables becomes independent of the data, collapsing to the prior. When this happens, the model may still generate plausible data, but it fails to learn informative latent representations.
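To make the collapse condition concrete, recall the evidence lower bound (ELBO) that VAEs maximize; the notation below is the standard VAE formulation rather than copied from the paper:

\mathcal{L}(\theta, \phi; x) = \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] - \mathrm{KL}\big(q_\phi(z \mid x) \,\|\, p(z)\big)

Collapse corresponds to q_\phi(z \mid x) = p(z) for every x: the KL term vanishes, z becomes independent of the data, and a sufficiently expressive likelihood p_\theta(x \mid z) can fit the data while ignoring z entirely.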

The authors propose a simple remedy: generative skip models (skipVAE), which add skip connections that feed the latent variables directly into the layers of the generative network, enforcing a strong dependence between the latent variables and the likelihood. Theoretically, the paper proves that generative skip models increase the mutual information between the observations and the inferred latent variables, which is key to avoiding collapse, particularly when the likelihood model is powerful.
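Below is a minimal PyTorch sketch of the idea, not the paper's exact architecture: the SkipDecoder name, layer sizes, and activations are illustrative. Each hidden layer, and the output layer, receives the latent z alongside the previous hidden state, so the likelihood cannot become independent of z.

import torch
import torch.nn as nn

class SkipDecoder(nn.Module):
    """Decoder whose every layer receives the latent z (illustrative sketch)."""

    def __init__(self, z_dim: int, h_dim: int, x_dim: int, n_layers: int = 3):
        super().__init__()
        self.input = nn.Linear(z_dim, h_dim)
        # Each hidden layer consumes [h, z] concatenated, not just h.
        self.hidden = nn.ModuleList(
            nn.Linear(h_dim + z_dim, h_dim) for _ in range(n_layers)
        )
        self.out = nn.Linear(h_dim + z_dim, x_dim)  # skip into the output too

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        h = torch.tanh(self.input(z))
        for layer in self.hidden:
            h = torch.tanh(layer(torch.cat([h, z], dim=-1)))
        return self.out(torch.cat([h, z], dim=-1))  # logits of p(x | z)

A standard VAE decoder would consume z only at the first layer; here every layer sees z directly, giving short gradient paths from the likelihood back to the latent variables.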

Empirical studies on image datasets (MNIST and Omniglot) and on text from the Yahoo corpus support the theoretical claims. Generative skip models match the predictive performance of standard VAEs while exhibiting less collapse, as measured by the mutual information between observations and latents and by the number of active latent units; skipVAE shows both higher mutual information and more active latent dimensions than comparable VAE architectures.
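For reference, the active-units count, a metric due to Burda et al. that is standard in this literature, can be computed as below; the function name and the conventional 0.01 threshold are illustrative assumptions, not code from the paper.

import torch

def active_units(posterior_means: torch.Tensor, threshold: float = 0.01) -> int:
    """Count latent dimensions the encoder actually uses (illustrative).

    posterior_means holds E_q[z | x] for each held-out example, with
    shape (num_examples, z_dim). A dimension is active when the variance
    of its posterior mean across examples exceeds the threshold, i.e.
    the encoder varies that coordinate with the input.
    """
    variances = posterior_means.var(dim=0)
    return int((variances > threshold).sum().item())

A collapsed dimension has (nearly) the same posterior mean for every input, so its variance across examples is near zero and it is not counted.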

Practically, skipVAE can be combined with more sophisticated VAE variants such as the semi-amortized VAE (SA-VAE), strengthening their ability to learn meaningful representations. This combination outperforms baseline VAEs at avoiding posterior collapse without sacrificing generative quality, easing the usual trade-off between likelihood-model capacity and latent representation quality that hampers VAEs.

The broader implications of this research extend to various applications of VAEs, including data generation, compression, and representation learning across diverse domains. By addressing the collapse phenomenon, generative skip models enable more effective deployment of VAEs in tasks requiring robust data representations. Future research could further refine this method, possibly exploring its integration with other generative models or its scalability to more complex data types.

In conclusion, generative skip models are a simple and effective mechanism for counteracting latent variable collapse in VAEs. The paper provides a practical remedy for a well-known problem and points toward more informative and stable representation learning frameworks.