Tighter Variational Bounds are Not Necessarily Better (1802.04537v3)

Published 13 Feb 2018 in stat.ML and cs.LG

Abstract: We provide theoretical and empirical evidence that using tighter evidence lower bounds (ELBOs) can be detrimental to the process of learning an inference network by reducing the signal-to-noise ratio of the gradient estimator. Our results call into question common implicit assumptions that tighter ELBOs are better variational objectives for simultaneous model learning and inference amortization schemes. Based on our insights, we introduce three new algorithms: the partially importance weighted auto-encoder (PIWAE), the multiply importance weighted auto-encoder (MIWAE), and the combination importance weighted auto-encoder (CIWAE), each of which includes the standard importance weighted auto-encoder (IWAE) as a special case. We show that each can deliver improvements over IWAE, even when performance is measured by the IWAE target itself. Furthermore, our results suggest that PIWAE may be able to deliver simultaneous improvements in the training of both the inference and generative networks.

Citations (192)

Summary

  • The paper shows that tighter ELBOs may impair inference network training by reducing the signal-to-noise ratio of gradient estimates.
  • Using both theoretical and empirical analyses, it reveals that increasing the number of importance-sampling particles (K) can degrade gradient reliability.
  • The study introduces novel autoencoder variants (PIWAE, MIWAE, CIWAE) to balance generative efficiency with robust inference performance.

Analyzing the Impact of Tighter Variational Bounds in Variational Inference

In the paper entitled "Tighter Variational Bounds are Not Necessarily Better," the authors present a rigorous assessment of the implications of tighter Evidence Lower Bounds (ELBOs) in the context of training deep generative models using variational inference methods. Traditionally, tighter ELBOs are presumed to improve training outcomes by providing closer approximations to the true model evidence. This research challenges that assumption, both theoretically and empirically, demonstrating that tighter bounds may adversely affect the learning efficacy of inference networks due to reduced signal-to-noise ratio (SNR) in the gradient estimates.
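For reference, the bounds at issue can be written in the standard VAE/IWAE notation (this formulation is the conventional one rather than a quotation from the paper): given a generative model p_θ(x, z) and an inference network q_φ(z | x), the single-sample ELBO and the K-sample IWAE bound are

```latex
\mathrm{ELBO}(\theta,\phi;x) = \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log \frac{p_\theta(x,z)}{q_\phi(z \mid x)}\right],
\qquad
\mathcal{L}_K(\theta,\phi;x) = \mathbb{E}_{z_{1:K} \,\overset{\text{i.i.d.}}{\sim}\, q_\phi(z \mid x)}\!\left[\log \frac{1}{K}\sum_{k=1}^{K} \frac{p_\theta(x,z_k)}{q_\phi(z_k \mid x)}\right],
```

with log p_θ(x) ≥ 𝓛_{K+1} ≥ 𝓛_K and 𝓛_1 = ELBO, so increasing K tightens the bound toward the true log evidence.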

Theoretical and Empirical Analysis

The paper develops a detailed theoretical foundation for how the choice of ELBO affects gradient estimation, focusing on the signal-to-noise ratio (SNR) of the gradient updates for the generative and inference networks under stochastic gradient ascent (SGA). The authors show that increasing the number of importance-sampling particles, K, while tightening the bound, degrades the SNR for the inference network: for large K, the magnitude of the expected gradient shrinks faster than its standard deviation, so the relative variance grows and the gradient estimates become increasingly unreliable.
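Concretely, the analysis centres on the SNR of a gradient estimate, defined as the ratio of the magnitude of its expectation to its standard deviation. Writing Δ_{M,K} for the estimator built from M independent groups of K importance samples, the headline asymptotic result (restated here from the paper; exact conditions should be checked against the original) is that the SNR grows with K for the generative parameters θ but shrinks with K for the inference parameters φ:

```latex
\mathrm{SNR}_{M,K}(\theta) = \left|\frac{\mathbb{E}\!\left[\Delta_{M,K}(\theta)\right]}{\sigma\!\left[\Delta_{M,K}(\theta)\right]}\right| = O\!\left(\sqrt{MK}\right),
\qquad
\mathrm{SNR}_{M,K}(\phi) = O\!\left(\sqrt{M/K}\right).
```

In particular, with a single group (M = 1) the inference-network SNR decays as 1/√K: the gradient estimate becomes dominated by noise precisely as the bound is tightened.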

Remarkably, while tighter bounds do benefit learning of the generative network, reducing the bias of the likelihood estimate and improving the quality of its gradient signal, their effect on the inference network is adverse: the SNR of the inference-network gradients falls as the bound tightens, so the practical quality of the updates deteriorates even though the objective itself improves. This suggests that distinct objectives may be preferable for the generative and inference networks, motivating methods that can balance these competing needs.

Novel Algorithms and Their Implications

In response to these findings, the authors introduce three innovative algorithms: Partially Importance Weighted Auto-Encoder (PIWAE), Multiply Importance Weighted Auto-Encoder (MIWAE), and Combination Importance Weighted Auto-Encoder (CIWAE). These methodologies extend existing IWAE frameworks, offering alternative strategies that account for the identified deficiencies in inference network training:

  1. PIWAE employs separate targets for the generative and inference networks, optimizing the IWAE and MIWAE objectives respectively, aiming to enhance the training efficiency and effectiveness of both networks simultaneously.
  2. MIWAE averages M > 1 independent K-particle IWAE bounds under a fixed total sample budget of MK, counteracting the detrimental effect of large K on the inference network without increasing the computational cost.
  3. CIWAE uses a convex combination of the VAE and IWAE objectives, where the combination parameter β allows an adaptable trade-off between the benefits of a tighter bound and more reliable gradient estimation. A minimal sketch of these three objectives follows the list.
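The sketch below expresses the three objectives in terms of log importance weights log w_{m,k} = log p_θ(x, z_{m,k}) − log q_φ(z_{m,k} | x). It is a minimal illustration assuming PyTorch; the function names and tensor shapes are our own and do not correspond to the authors' reference implementation.

```python
# Minimal sketch of the IWAE / MIWAE / CIWAE objectives in terms of log importance
# weights. Assumes PyTorch; names and shapes are illustrative only.
import math
import torch

def iwae_bound(log_w: torch.Tensor) -> torch.Tensor:
    """IWAE bound from K log-weights (last dim): log (1/K) * sum_k exp(log_w[k])."""
    K = log_w.shape[-1]
    return torch.logsumexp(log_w, dim=-1) - math.log(K)

def miwae_bound(log_w: torch.Tensor) -> torch.Tensor:
    """MIWAE bound: average of M independent K-particle IWAE bounds (log_w is M x K)."""
    return iwae_bound(log_w).mean()

def ciwae_bound(log_w: torch.Tensor, beta: float) -> torch.Tensor:
    """CIWAE bound: convex combination beta * ELBO + (1 - beta) * IWAE on K particles."""
    elbo = log_w.mean()  # standard single-sample-style ELBO term
    return beta * elbo + (1.0 - beta) * iwae_bound(log_w)

# PIWAE (schematically): update the generative parameters theta with the gradient of an
# IWAE bound and the inference parameters phi with the gradient of a MIWAE bound, both
# computed from the same MK samples.

# Toy usage with random log-weights: 64 particles total, grouped as M = 8, K = 8 for MIWAE.
log_w = torch.randn(8, 8)
print(iwae_bound(log_w.reshape(-1)))        # IWAE with K = 64
print(miwae_bound(log_w))                   # MIWAE with M = 8, K = 8
print(ciwae_bound(log_w.reshape(-1), 0.5))  # CIWAE with beta = 0.5, K = 64
```

Note that IWAE and MIWAE here use the same total number of samples (MK), so trading bound tightness (larger K) against inference-network SNR (larger M) comes at no extra computational cost.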

Empirical evaluations on benchmark datasets such as MNIST show that these algorithms can surpass standard IWAE even when performance is measured by the IWAE objective itself. This highlights the adaptability and robustness of the methods, and underlines how strongly the choice of objective shapes model training.

Implications for AI Development and Future Work

The paper's conclusions advise caution against indiscriminate adoption of tighter bounds without considering their nuanced impact on inference networks. For AI research, this implies the need for innovative ELBO designs that harmoniously integrate expressiveness and computational practicality, potentially motivating the development of dynamic and context-aware autoencoders. Specifically, advances could explore automated approaches that dynamically adjust ELBO tightness based on real-time assessment of model convergence and performance metrics.

Further, the research provides fertile ground for exploring distinct objectives for inference and generative networks, hinting at a paradigm shift towards tailored inferential strategies that enhance model robustness and adaptability. Such consideration might prove invaluable as AI continues to tackle increasingly complex generative tasks across domains, from natural language processing to computer vision.

In summary, Rainforth et al.'s paper calls into question conventional assumptions about ELBOs in variational inference and proposes compelling alternatives through novel autoencoder frameworks. As AI models continue to grow in complexity, the findings are a timely reminder of the importance of marrying theoretical insight with pragmatic algorithm design, so that both the inference and generative networks receive appropriate attention as benchmarks evolve.