- The paper presents δ-VAEs, which enforce a minimum ("committed") KL divergence between the posterior and the prior so that latent variables remain informative.
- It uses constrained variational families and sequential latent variables to keep the KL divergence strictly positive even with powerful autoregressive decoders.
- Experiments show that δ-VAEs achieve state-of-the-art log-likelihoods on benchmarks such as CIFAR-10 and downsampled ImageNet while keeping the latent variables in active use.
Preventing Posterior Collapse with δ-VAEs
The paper entitled "Preventing Posterior Collapse with δ-VAEs" addresses the critical challenge of posterior collapse in deep latent-variable generative models. Posterior collapse occurs when a VAE with a powerful decoder learns to ignore its latent variables: the approximate posterior collapses onto the prior, and the latents carry no information about the data. The paper introduces a novel approach called δ-VAEs to tackle this issue effectively.
Summary of Research
The δ-VAE framework maximizes the evidence lower bound (ELBO) while ensuring that the latent variables encode useful information. The authors enforce a minimum Kullback-Leibler (KL) divergence, denoted δ, between the posterior and the prior, thereby securing a committed rate of information transmission through the latents.
Crucially, the approach leaves the ELBO itself untouched and places no restriction on decoder capacity, so modern autoregressive decoders can be used as-is. The key insight is that constraining the variational family, rather than the objective, guarantees that the KL divergence remains strictly positive and hence that the latent variables stay informative.
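In standard notation, with x an observation and z its latent code, the training objective is the unmodified ELBO; what changes in a δ-VAE is that the rate term is bounded below by construction:

$$
\mathcal{L}(x) = \mathbb{E}_{q(z \mid x)}\big[\log p(x \mid z)\big] - \mathrm{KL}\big(q(z \mid x)\,\big\|\,p(z)\big),
\qquad
\min_{q \in \mathcal{Q}} \mathrm{KL}\big(q(z \mid x)\,\big\|\,p(z)\big) \geq \delta,
$$

where Q is the constrained variational family described in the methodology below.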
Methodology
- Constrained Variational Families: δ-VAEs use variational families whose KL divergence to the prior is bounded below by the committed rate δ. This is achieved through mismatched correlation structures: a first-order autoregressive (AR(1)) Gaussian process as the prior and a fully factorized (independent) Gaussian as the posterior, so that no member of the posterior family can exactly match the prior (see the first sketch after this list).
- Sequential Latent Variables: For sequential data such as text and images, the authors combine sequential latents with an anti-causal encoder, in which the latent at each step depends only on current and future observations. This biases the latents toward encoding future information, which the autoregressive decoder cannot already recover from its past context (see the second sketch after this list).
- Auxiliary Priors: To reduce the mismatch between the prior and the aggregate posterior, an auxiliary prior is trained to match the aggregate posterior over the training data; it is used for sampling and evaluation and leaves the standard VAE training objective unaffected.
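The following is a minimal numerical sketch of the committed-rate idea from the first bullet, not the paper's implementation. It builds the AR(1) prior covariance, computes the closed-form Gaussian KL against a fully factorized posterior, and lets an optimizer drive the KL as low as it can; the values of `alpha` and `n` are illustrative choices:

```python
# Numerical check of the committed rate: the KL between any fully factorized
# Gaussian posterior and an AR(1) Gaussian prior is bounded away from zero.
import numpy as np
from scipy.optimize import minimize

n, alpha = 8, 0.9  # latent length and AR(1) correlation (assumed values)

# AR(1) prior covariance with unit marginal variance: Sigma[i, j] = alpha^|i-j|
idx = np.arange(n)
Sigma_p = alpha ** np.abs(idx[:, None] - idx[None, :])
Sigma_p_inv = np.linalg.inv(Sigma_p)
_, logdet_p = np.linalg.slogdet(Sigma_p)

def kl_diag_vs_ar1(params):
    """KL( N(mu, diag(sigma^2)) || N(0, Sigma_p) ) in closed form."""
    mu, log_sigma = params[:n], params[n:]
    var_q = np.exp(2 * log_sigma)
    trace_term = np.sum(np.diag(Sigma_p_inv) * var_q)
    quad_term = mu @ Sigma_p_inv @ mu
    logdet_q = np.sum(2 * log_sigma)
    return 0.5 * (trace_term + quad_term - n + logdet_p - logdet_q)

# Let the posterior try its best to collapse onto the prior.
res = minimize(kl_diag_vs_ar1, x0=np.zeros(2 * n), method="L-BFGS-B")
print(f"minimum achievable KL (committed rate) ~= {res.fun:.3f} nats")
# With alpha = 0 the prior factorizes too and the minimum drops to zero;
# any alpha != 0 leaves a strictly positive floor the model cannot remove.
```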
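And a minimal sketch of the anti-causal encoder from the second bullet, assuming a PyTorch GRU over a 1-D sequence (all layer names and sizes are illustrative, not the paper's architecture): running a causal RNN over the reversed sequence makes each step's posterior parameters a function of the current and future inputs only.

```python
# Anti-causal encoder sketch: the latent at step t is computed only from
# x_t, ..., x_T, so it carries information about the future that a causal
# autoregressive decoder cannot already read off from its past context.
import torch
import torch.nn as nn

class AntiCausalEncoder(nn.Module):
    def __init__(self, input_dim: int, hidden_dim: int, latent_dim: int):
        super().__init__()
        self.rnn = nn.GRU(input_dim, hidden_dim, batch_first=True)
        self.to_mu = nn.Linear(hidden_dim, latent_dim)
        self.to_logvar = nn.Linear(hidden_dim, latent_dim)

    def forward(self, x):                         # x: (batch, time, input_dim)
        h, _ = self.rnn(torch.flip(x, dims=[1]))  # causal pass over the reversed sequence
        h = torch.flip(h, dims=[1])               # flip back: h[:, t] now summarizes x[t:]
        return self.to_mu(h), self.to_logvar(h)

enc = AntiCausalEncoder(input_dim=1, hidden_dim=32, latent_dim=4)
mu, logvar = enc(torch.randn(2, 16, 1))           # per-step posterior parameters
```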
Experimental Results
Empirical results demonstrate that δ-VAEs achieve state-of-the-art log-likelihoods on benchmarks such as CIFAR-10 and ImageNet 32×32 while keeping the latent variables in active use. The learned representations are also shown to be useful, as verified on downstream tasks such as classification.
The experimental setup also includes ablation studies comparing δ-VAEs to competing approaches such as β-VAEs and free bits, revealing that δ-VAEs consistently maintain a positive KL divergence and thereby prevent posterior collapse effectively.
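For concreteness, here is a schematic of how the compared objectives treat the rate term; `rec` and `kl` stand for the per-example reconstruction loss and KL divergence, and all names are illustrative rather than the paper's code:

```python
# Schematic per-example losses for the compared objectives.
def beta_vae_loss(rec: float, kl: float, beta: float) -> float:
    # beta != 1 re-weights the rate penalty, so the true ELBO
    # is no longer the quantity being optimized.
    return rec + beta * kl

def free_bits_loss(rec: float, kl: float, lam: float) -> float:
    # Below the threshold lam the rate term is constant (zero gradient);
    # nothing penalizes low rates, but nothing guarantees a high one either.
    return rec + max(kl, lam)

def delta_vae_loss(rec: float, kl: float) -> float:
    # The objective is the unmodified ELBO; kl >= delta is guaranteed by
    # the choice of prior/posterior families, not by altering the loss.
    return rec + kl
```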
Implications and Future Directions
The theoretical and practical implications of this research are manifold. Practically, δ-VAEs allow powerful autoregressive decoders to be combined with informative latent variables, enabling the generation of high-fidelity samples with meaningful latent encodings. Theoretically, the committed rate provides a simple and effective quantitative handle for ensuring latent-variable utility across diverse VAE applications.
Future work could explore alternative variational families and constraints to further improve the representational capacity and efficiency of δ-VAEs. Integrating the approach into larger and more complex architectures, such as those used in generative adversarial settings, could also yield benefits.
In conclusion, the paper takes a significant step toward resolving posterior collapse, combining theoretical guarantees with strong empirical performance, and positions δ-VAEs as a candidate standard approach for training VAEs with powerful decoders.