- The paper introduces Boosting Variational Inference (BVI), which enhances traditional variational inference by modeling complex, multimodal posteriors with finite mixtures.
- BVI employs iterative gradient boosting to refine the posterior approximation, significantly reducing KL divergence and capturing covariance structures.
- Empirical results demonstrate that BVI outperforms Mean-Field VI in synthetic and real-world applications, offering robust theoretical guarantees and practical benefits.
An Overview of Boosting Variational Inference
The paper "Boosting Variational Inference" presents an innovative approach to enhancing the flexibility of variational inference (VI) through the development of Boosting Variational Inference (BVI). Variational inference has grown in prominence due to its efficacy in rapidly approximating posterior distributions in Bayesian models. However, the inherent limitations of traditional VI, particularly its inability to approximate multimodal and nonstandard posterior distributions accurately, prompt the need for approaches such as BVI.
Key Contributions
The principal contribution of this paper is the introduction of BVI, which approximates the posterior with a finite mixture of parametric base distributions, such as Gaussians. This contrasts with standard VI approaches like Mean-Field Variational Inference (MFVI), which assume the approximating distribution factorizes across parameters and therefore fail to capture multimodality and often underestimate posterior covariance. Crucially, BVI leverages gradient boosting techniques to produce an increasingly accurate approximation of the target posterior as components are added.
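To make the mixture family concrete, the following is a minimal Python sketch of such an approximating distribution. It is illustrative only: the class name, the blending rule for weights, and the use of full-covariance Gaussian components are assumptions for exposition, not the paper's implementation.

```python
import numpy as np
from scipy.stats import multivariate_normal

class MixtureApproximation:
    """Finite mixture q(theta) = sum_t w_t * N(theta; mu_t, Sigma_t).

    Illustrative sketch: component parameters and weights are assumed to
    come from some fitting procedure; this is not the paper's code.
    """

    def __init__(self):
        self.weights, self.means, self.covs = [], [], []

    def add_component(self, weight, mean, cov):
        # Blend the new component in: existing weights shrink by (1 - weight)
        # so the weights remain a valid convex combination.
        self.weights = [w * (1.0 - weight) for w in self.weights]
        self.weights.append(weight)
        self.means.append(np.asarray(mean))
        self.covs.append(np.asarray(cov))

    def log_pdf(self, theta):
        # log q(theta) via log-sum-exp over the mixture components.
        logs = [np.log(w) + multivariate_normal.logpdf(theta, m, c)
                for w, m, c in zip(self.weights, self.means, self.covs)]
        return np.logaddexp.reduce(logs)
```

A mean-field Gaussian is the special case of a single diagonal-covariance component; the mixture recovers multimodality precisely because additional components can be placed at separate modes.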
Methodological Innovation
The algorithm iteratively refines the approximation by adding new components to the mixture, much as boosting in machine learning adds weak learners to improve model accuracy. A key feature is that BVI trades additional computation for statistical accuracy, letting practitioners decide how much compute to spend refining the approximation. The authors demonstrate that each added component can further reduce the KL divergence between the approximation and the target posterior, as sketched below.
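The schematic loop below shows the structure of such a boosting procedure. It reuses the MixtureApproximation sketch above; the callback names (init_component, fit_new_component, choose_weight) and the fixed iteration count are placeholders, not the paper's exact algorithm.

```python
def boost_vi(log_target, init_component, fit_new_component, choose_weight,
             num_iters=10):
    """Schematic boosting loop for variational inference (a sketch).

    log_target:        unnormalized log posterior log p~(theta)
    init_component:    returns the first mixture component (mean, cov)
    fit_new_component: given the current mixture q, proposes a component
                       that points along the functional gradient of KL(q || p)
    choose_weight:     picks the mixing weight rho in (0, 1] for the new
                       component, e.g. by line search on the KL objective
    """
    q = MixtureApproximation()
    mu0, cov0 = init_component(log_target)
    q.add_component(1.0, mu0, cov0)

    for t in range(num_iters):
        # Each round trades extra compute for a better approximation:
        # propose a component where q underweights the target ...
        mu_t, cov_t = fit_new_component(log_target, q)
        # ... then pick its weight so KL(q || p) decreases.
        rho_t = choose_weight(log_target, q, mu_t, cov_t)
        q.add_component(rho_t, mu_t, cov_t)
    return q
```

Running more iterations spends more compute but can only add capacity to the mixture, which is the trade-off described above.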
Theoretical and Empirical Validation
The paper rigorously details theoretical properties that support the consistency of BVI. It begins by formulating the optimization problem over a flexible family of mixture distributions, then applies a Laplacian gradient boosting step, guided by the KL divergence, to iteratively improve the posterior approximation. Empirical results on toy examples and real-world applications, including sensor network localization and Bayesian logistic regression, show clear improvements over traditional methods, particularly in capturing complex, multimodal posteriors.
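For intuition, here is one plausible way to instantiate the fit_new_component step from the sketch above: place the new component where the residual log p~(theta) - log q(theta) is largest and take its covariance from a Laplace-style curvature estimate. This is an assumption-laden illustration of the general idea, not the paper's exact procedure; all names and the use of BFGS's inverse-Hessian estimate are hypothetical choices.

```python
import numpy as np
from scipy.optimize import minimize

def laplace_style_component(log_target, q, dim, n_restarts=5, seed=0):
    """Hypothetical instantiation of fit_new_component (a sketch).

    Centers a new Gaussian component where the unnormalized residual
    log p~(theta) - log q(theta) is largest, with covariance taken from
    the optimizer's inverse-Hessian approximation (Laplace-style fit).
    """
    rng = np.random.default_rng(seed)

    def neg_residual(theta):
        # Regions where q underweights the target have large residuals.
        return -(log_target(theta) - q.log_pdf(theta))

    best = None
    for _ in range(n_restarts):
        res = minimize(neg_residual, rng.normal(size=dim), method="BFGS")
        if best is None or res.fun < best.fun:
            best = res
    # BFGS returns an approximate inverse Hessian; reuse it (with a small
    # jitter for numerical stability) as the new component's covariance.
    return best.x, best.hess_inv + 1e-6 * np.eye(dim)
```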
Implications and Future Prospects
From a practical perspective, BVI's strength is its flexibility: it accommodates a diverse set of posterior shapes while mitigating the biases inherent in MFVI. Theoretically, the results suggest that finite mixtures are a step toward stronger guarantees for VI.
Looking forward, applying BVI in settings that demand highly accurate posterior estimation, such as deep probabilistic models or Bayesian neural networks, could prove valuable. Further work might improve computational efficiency in high-dimensional settings and extend the boosting paradigm to other approximate inference methods.
In conclusion, by addressing the core limitations of existing VI methods, the paper contributes a substantial advancement to the field of Bayesian inference, providing researchers with a potent tool for more accurately approximating complex posterior distributions.