Semi-Amortized Variational Autoencoders (1802.02550v7)

Published 7 Feb 2018 in stat.ML, cs.CL, and cs.LG

Abstract: Amortized variational inference (AVI) replaces instance-specific local inference with a global inference network. While AVI has enabled efficient training of deep generative models such as variational autoencoders (VAE), recent empirical work suggests that inference networks can produce suboptimal variational parameters. We propose a hybrid approach, to use AVI to initialize the variational parameters and run stochastic variational inference (SVI) to refine them. Crucially, the local SVI procedure is itself differentiable, so the inference network and generative model can be trained end-to-end with gradient-based optimization. This semi-amortized approach enables the use of rich generative models without experiencing the posterior-collapse phenomenon common in training VAEs for problems like text generation. Experiments show this approach outperforms strong autoregressive and variational baselines on standard text and image datasets.

Citations (238)

Summary

  • The paper introduces a semi-amortized approach that refines variational parameters using both global and local inference methods.
  • It employs differentiable optimization through SVI updates to mitigate amortization gaps and prevent posterior collapse.
  • Empirical results show improved perplexity and log-likelihoods on text and image tasks compared to standard AVI and SVI baselines.

Overview of Semi-Amortized Variational Autoencoders

The paper "Semi-Amortized Variational Autoencoders" addresses critical challenges inherent in amortized variational inference (AVI) systems, particularly as they relate to the training of deep generative models such as Variational Autoencoders (VAEs). The primary focus of the paper is to mitigate the drawbacks posed by AVI's use of global inference networks, which are known to produce suboptimal variational parameters causing significant amortization gaps. Instead of replacing instance-specific local inference entirely with a global inference network, this research introduces a hybrid methodology—Semi-Amortized Variational Autoencoders (SA-VAE). This approach strategically utilizes amortized inference to initialize variational parameters and subsequently applies stochastic variational inference (SVI) to further refine these parameters.

The contribution of semi-amortized inference extends beyond raw performance gains; it also addresses the well-known problem of posterior collapse during training. Posterior collapse arises when the variational posterior approximates the prior too closely, leading the generative model to effectively ignore the latent variables. By refining the variational parameters iteratively, SA-VAE makes it possible to train expressive text and image models, a setting in which VAEs have often failed precisely because of this collapse.
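
In terms of the ELBO above, posterior collapse corresponds to the KL term being driven to (near) zero, so the approximate posterior stops carrying information about the input; schematically:

```latex
\mathrm{KL}\!\big(q_{\lambda}(z \mid x) \,\|\, p(z)\big) \approx 0
\quad\Longleftrightarrow\quad
q_{\lambda}(z \mid x) \approx p(z) \ \ \text{for (almost) all } x
```

When this happens, a sufficiently powerful decoder such as an autoregressive LSTM can model the data while ignoring z entirely, which is exactly the failure mode observed in text VAEs.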

Technical Contributions and Methodology

The semi-amortized approach stands out for how it combines two contrasting inference strategies: AVI is fast but can yield inaccurate variational parameters, while SVI optimizes instance-specific variational distributions more precisely at the cost of iterative computation per example. Crucially, the paper makes the local SVI procedure itself differentiable, so the inference network and generative model can be trained end to end by backpropagating through the SVI update steps.

A key technical ingredient is differentiating through the SVI refinement itself, building on earlier work on differentiating through gradient-based optimization (Domke, 2012, among others), so that the local updates remain differentiable with respect to both the inference network and the generative model.
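
To make this concrete, the following is a minimal PyTorch-style sketch of one SA-VAE training step with a diagonal-Gaussian posterior. The names (`encoder`, `decoder`, `elbo`), the number of refinement steps `K`, and the plain gradient-descent refinement are illustrative placeholders, deliberately simplified relative to the paper's actual implementation:

```python
import torch

def elbo(decoder, x, mu, logvar):
    # Single-sample Monte Carlo ELBO with a reparameterized draw from
    # q(z|x) = N(mu, diag(exp(logvar))).
    z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
    log_px_z = decoder(x, z)  # placeholder: returns log p(x|z) as a scalar
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return log_px_z - kl

def sa_vae_loss(encoder, decoder, x, K=10, svi_lr=1.0):
    # 1) Amortized initialization of the variational parameters.
    mu, logvar = encoder(x)
    # 2) K differentiable SVI refinement steps. create_graph=True keeps the
    #    updates on the autograd graph so gradients can later flow back
    #    through them into the encoder.
    for _ in range(K):
        neg_elbo = -elbo(decoder, x, mu, logvar)
        g_mu, g_logvar = torch.autograd.grad(
            neg_elbo, (mu, logvar), create_graph=True)
        mu = mu - svi_lr * g_mu
        logvar = logvar - svi_lr * g_logvar
    # 3) Final loss; calling .backward() on it reaches the decoder directly
    #    and the encoder through the unrolled refinement steps.
    return -elbo(decoder, x, mu, logvar)
```

Backpropagating through the unrolled gradient steps is where the connection to Domke (2012) enters: differentiating a gradient update with respect to earlier quantities involves Hessian-vector products, which autograd computes here by retaining the graph of each update.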

Experimental Results

Empirical evaluations demonstrate the efficacy of the SA-VAE approach over standard AVI and SVI baselines on text and image benchmarks. The reported results indicate that SA-VAE outperforms autoregressive baselines and improves substantially over VAEs that apply SVI refinement without end-to-end training. For text modeling, SA-VAE surpasses competitive LSTM language models while preserving non-trivial latent representations that capture variation in the text. Quantitatively, these gains appear as lower perplexity and higher log-likelihoods across the evaluated datasets.
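
For reference, the text-side metric is computed in the standard way: perplexity is the exponentiated per-token negative log-likelihood, and because latent-variable models only provide a lower bound (the ELBO) on the log-likelihood, perplexities computed from that bound are upper bounds on the true perplexity. Schematically, with T the total number of tokens in the evaluation set:

```latex
\mathrm{PPL} \;=\; \exp\!\Big(-\tfrac{1}{T} \sum_{i=1}^{N} \log p_{\theta}(x^{(i)})\Big),
\qquad
\log p_{\theta}(x^{(i)}) \;\ge\; \mathcal{L}(\theta, \lambda; x^{(i)})
```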

Implications and Future Work

The implications of this work are both practical and theoretical. Practically, SA-VAE can be applied immediately to tasks that require expressive generative models, trading a modest amount of extra computation per training example for better variational parameters and more meaningful use of the latent variables, including in complex, non-conjugate models. Theoretically, the paper motivates further investigation of hybrid inference schemes, in particular how to balance global (amortized) and local (instance-specific) parameter estimation. These insights point toward more general inference frameworks that retain the accuracy of iterative refinement without its full computational cost.

Looking ahead, future research could examine how well SA-VAE scales to other rich generative models and datasets. The framework could also be combined with hierarchical latent-variable models or with more expressive prior and posterior families to further reduce approximation gaps.

This paper strengthens the understanding of inference methodologies in variational autoencoders and provides a credible solution for tackling existing limitations associated with amortized inference in deep learning contexts.