Frequentist Consistency of Variational Bayes (1705.03439v3)

Published 9 May 2017 in stat.ML, cs.LG, math.ST, and stat.TH

Abstract: A key challenge for modern Bayesian statistics is how to perform scalable inference of posterior distributions. To address this challenge, variational Bayes (VB) methods have emerged as a popular alternative to the classical Markov chain Monte Carlo (MCMC) methods. VB methods tend to be faster while achieving comparable predictive performance. However, there are few theoretical results around VB. In this paper, we establish frequentist consistency and asymptotic normality of VB methods. Specifically, we connect VB methods to point estimates based on variational approximations, called frequentist variational approximations, and we use the connection to prove a variational Bernstein-von Mises theorem. The theorem leverages the theoretical characterizations of frequentist variational approximations to understand asymptotic properties of VB. In summary, we prove that (1) the VB posterior converges to the Kullback-Leibler (KL) minimizer of a normal distribution, centered at the truth and (2) the corresponding variational expectation of the parameter is consistent and asymptotically normal. As applications of the theorem, we derive asymptotic properties of VB posteriors in Bayesian mixture models, Bayesian generalized linear mixed models, and Bayesian stochastic block models. We conduct a simulation study to illustrate these theoretical results.

Citations (200)

Summary

  • The paper establishes the frequentist consistency of VB methods via a variational Bernstein–von Mises theorem.
  • It connects VB posteriors to frequentist point estimates based on variational approximations, proving that the resulting estimates are consistent and asymptotically normal about the true parameters.
  • Applications to Bayesian models show that VB offers computational efficiency while maintaining reliable inference despite underdispersion.

Insightful Overview of "Frequentist Consistency of Variational Bayes"

Variational Bayes (VB) methods have attracted attention as a scalable alternative to traditional Markov chain Monte Carlo (MCMC) methods for Bayesian inference, particularly when handling large datasets. While MCMC is known for its robustness and theoretical underpinnings, VB methods promise computational efficiency without substantial loss in prediction accuracy. Despite their empirical success, the theoretical properties of VB methods have been less explored. The paper by Wang and Blei fills this gap by investigating the frequentist properties of VB methods, focusing on their consistency and asymptotic normality.

The paper's core contribution is establishing the frequentist consistency of VB methods, leveraging the framework of frequentist variational approximations. The authors focus on mean-field VB, which approximates the true posterior with a product of independent factors, simplifying inference but potentially missing dependencies among latent variables. They introduce variational frequentist estimates (VFEs), point estimates based on variational approximations, and prove a variational Bernstein–von Mises theorem as a theoretical foundation for VB. The theorem shows that the VB posterior contracts to the true parameter and, suitably centered and scaled, is asymptotically normal.
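
Concretely, mean-field VB poses posterior inference as an optimization over a factorized family. The display below is a schematic restatement (notation ours, regularity conditions suppressed) of the setup and of the theorem's conclusion as stated in the abstract:

```latex
% Mean-field family and the VB optimization problem
\mathcal{Q} = \Big\{ q : q(\theta) = \textstyle\prod_{j} q_j(\theta_j) \Big\},
\qquad
q^{\ast} = \operatorname*{arg\,min}_{q \in \mathcal{Q}}
  \mathrm{KL}\big( q(\theta) \,\big\|\, p(\theta \mid x_{1:n}) \big).

% Variational Bernstein--von Mises theorem (schematic): the VB posterior
% merges, in total variation, with the KL minimizer over Q of a normal
% distribution centered at the true parameter theta_0, with Sigma_n
% shrinking at the parametric 1/n rate
\Big\| \, q^{\ast} - \operatorname*{arg\,min}_{q \in \mathcal{Q}}
  \mathrm{KL}\big( q \,\big\|\, \mathcal{N}(\theta_0, \Sigma_n) \big) \Big\|_{\mathrm{TV}}
  \;\xrightarrow{\;P\;}\; 0 .
```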

Key Theoretical Contributions

  1. Connection to Frequentist Estimation: The paper establishes a link between VB methods and frequentist point estimates derived from variational approximations. This connection is pivotal in proving the frequentist consistency of VB.
  2. Variational Bernstein–von Mises Theorem: This theorem is the paper's centerpiece, showing that the VB posterior converges to the Kullback-Leibler (KL) minimizer of a normal distribution centered at the true parameter. This result positions the VB posterior as a consistent estimator, akin to classical frequentist estimators.
  3. Applications Across Models: The paper demonstrates the applicability of its theoretical results to various Bayesian models, including Bayesian mixture models, generalized linear mixed models (GLMM), and stochastic block models (SBM).
  4. Addressing Underdispersion: A known issue with VB posteriors, particularly in the mean-field setting, is underdispersion: the VB posterior understates uncertainty relative to the exact posterior. The paper provides theoretical insight into this phenomenon and suggests that more expressive variational families might mitigate it (a toy illustration follows this list).
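
As a toy illustration of the underdispersion point (our own example, not from the paper): for a Gaussian target with precision matrix Λ, the mean-field KL minimizer is known in closed form, with the i-th factor equal to N(μ_i, 1/Λ_ii), so its marginal variances never exceed the exact ones. A minimal sketch:

```python
import numpy as np

# Toy "posterior": a bivariate Gaussian with correlation rho.
# For a Gaussian target N(mu, Sigma) with precision Lambda = Sigma^{-1},
# the mean-field KL minimizer q(th1)q(th2) is known in closed form:
# q_i = N(mu_i, 1 / Lambda_ii).  (See e.g. Bishop, PRML, Sec. 10.1.2.)
rho = 0.9
Sigma = np.array([[1.0, rho],
                  [rho, 1.0]])
Lambda = np.linalg.inv(Sigma)

true_marginal_var = np.diag(Sigma)        # exact posterior marginal variances
vb_marginal_var = 1.0 / np.diag(Lambda)   # mean-field VB marginal variances

print("true marginal variances:", true_marginal_var)  # [1.0, 1.0]
print("VB marginal variances:  ", vb_marginal_var)    # [0.19, 0.19]
# Here the VB variances equal (1 - rho^2): the stronger the posterior
# correlation the mean-field family ignores, the worse the underdispersion.
```

Note that the VB means are exact in this example; only the spread is too tight, matching the theorem's message that VB gets the center right while understating uncertainty.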

Simulation Studies

To validate the theoretical results, the authors conduct simulation studies on Poisson GLMMs and latent Dirichlet allocation (LDA). These studies illustrate the convergence properties of VB posteriors and highlight their computational efficiency compared to MCMC methods. The results align with theoretical predictions, showing that VB posteriors are consistent but underdispersed.
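
The paper's experiments use Poisson GLMMs and LDA; reproducing those would be lengthy, so the following is a deliberately simpler, self-contained analogue (our own toy setup, not the paper's): coordinate-ascent mean-field VB (CAVI) for i.i.d. Gaussian data with unknown mean and precision, under independent N(μ0, s0²) and Gamma(a0, b0) priors. It illustrates the same qualitative behavior: the variational posterior mean concentrates at the truth as n grows, with variance shrinking at the expected 1/n rate.

```python
import numpy as np

rng = np.random.default_rng(0)

def cavi_gaussian(x, mu0=0.0, s0sq=10.0, a0=1.0, b0=1.0, iters=50):
    """Mean-field VB, q(mu)q(tau), for x_i ~ N(mu, 1/tau) with
    independent priors mu ~ N(mu0, s0sq), tau ~ Gamma(a0, b0).
    Standard coordinate-ascent (CAVI) updates."""
    n, xbar = len(x), x.mean()
    E_tau = a0 / b0                       # initialize E[tau] at prior mean
    for _ in range(iters):
        # q(mu) = N(m, v): Gaussian update given the current E[tau]
        v = 1.0 / (1.0 / s0sq + n * E_tau)
        m = v * (mu0 / s0sq + E_tau * n * xbar)
        # q(tau) = Gamma(a, b): uses E[sum (x_i - mu)^2] under q(mu)
        a = a0 + n / 2.0
        b = b0 + 0.5 * (np.sum((x - m) ** 2) + n * v)
        E_tau = a / b
    return m, v, E_tau                    # VB mean, VB variance, E_q[tau]

true_mu, true_tau = 2.0, 4.0              # truth: x_i ~ N(2, 0.25)
for n in [50, 500, 5000]:
    x = rng.normal(true_mu, 1.0 / np.sqrt(true_tau), size=n)
    m, v, Et = cavi_gaussian(x)
    print(f"n={n:5d}  E_q[mu]={m:.3f}  Var_q[mu]={v:.2e}  E_q[tau]={Et:.3f}")
# As n grows, E_q[mu] -> 2.0 and Var_q[mu] shrinks at the 1/n rate,
# consistent with the consistency and asymptotic-normality conclusions.
```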

Implications and Future Directions

The findings of this paper have significant implications for the practical application of VB methods in Bayesian inference. By establishing that VB methods can offer consistent and asymptotically normal estimates, the paper boosts confidence in using VB despite its approximations. Future work could explore nonparametric settings, assess finite-sample properties, and investigate the impact of the local optima that typically arise in VB's non-convex optimization.

In summary, this paper robustly supports the theoretical validity of VB methods through a frequentist lens, reinforcing their role as a reliable tool in the modern statistical and machine learning arsenal. The results foster a deeper understanding of the trade-offs involved in choosing VB over MCMC, particularly when computational resources are a constraint.