- The paper introduces ADVI, automating variational inference by transforming latent variables to a real-coordinate space and leveraging automatic differentiation.
- The paper demonstrates ADVI's efficiency, showing faster convergence than traditional MCMC methods and better scaling to large datasets across diverse Bayesian models.
- The paper highlights ADVI's broad applicability to non-conjugate models, simplifying inference and freeing practitioners to focus on model design.
An Overview of "Automatic Variational Inference in Stan"
The paper "Automatic Variational Inference in Stan" by Alp Kucukelbir et al. presents a detailed methodology for automating the process of variational inference (VI) in Bayesian models using Stan, a probabilistic programming language. The authors introduce Automatic Differentiation Variational Inference (ADVI), a novel algorithmic framework designed to simplify the application of variational inference by leveraging automatic differentiation within Stan.
Core Contributions and Methodology
Variational inference, widely valued for its scalability and efficiency in approximate Bayesian inference, typically requires laborious, model-specific derivations. The authors address this challenge with ADVI, which automates both the choice of a suitable variational family and the optimization of the variational objective, abstracting away the intricacies of manual derivation.
Model Support and Generality:
ADVI is designed to operate without conjugacy assumptions, extending its utility to a diverse range of models. Specifically, it supports any differentiable probability model: one whose latent variables have continuous support and whose log-joint density has a well-defined gradient with respect to those variables.
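To make this requirement concrete, the following minimal Python sketch (a hypothetical toy model, not one of the paper's examples) defines a log-joint density with a continuous latent scale parameter and its closed-form gradient; in Stan the gradient would come from automatic differentiation rather than being hand-coded.

```python
import numpy as np

def log_joint(sigma, x):
    """log p(x, sigma) for x_i ~ Normal(0, sigma), sigma ~ Exponential(1)."""
    log_lik = np.sum(-0.5 * np.log(2 * np.pi) - np.log(sigma)
                     - x**2 / (2 * sigma**2))
    return log_lik - sigma  # Exponential(1) prior, up to an additive constant

def grad_log_joint(sigma, x):
    """d log p(x, sigma) / d sigma, well defined for all sigma > 0."""
    return np.sum(-1.0 / sigma + x**2 / sigma**3) - 1.0
```

The latent variable sigma lives on (0, ∞), so it satisfies the continuous-support requirement, but a Gaussian approximation cannot be placed on it directly; that is what the transformation step addresses next.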
Transformation Approach:
Central to ADVI is a transformation of the latent variables into a real-coordinate space, where a mean-field Gaussian variational distribution is assumed. The change of variables adds the log-Jacobian of the inverse transformation to the log-joint density, and the Gaussian in the transformed space induces an implicit non-Gaussian approximation in the original latent-variable space, which is how the framework respects the support constraints of the Bayesian posterior.
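The sketch below continues the toy model above, assuming the mapping zeta = log(sigma) for the positive scale parameter; the only addition to the log-joint is the log-Jacobian term.

```python
import numpy as np

def log_joint(sigma, x):
    """log p(x, sigma) from the previous sketch (sigma > 0)."""
    return (np.sum(-0.5 * np.log(2 * np.pi) - np.log(sigma)
                   - x**2 / (2 * sigma**2)) - sigma)

def log_joint_real(zeta, x):
    """Log-joint on the real line after the change of variables
    zeta = log(sigma); the log-Jacobian of sigma = exp(zeta) is zeta."""
    sigma = np.exp(zeta)   # inverse map back to the constrained space
    return log_joint(sigma, x) + zeta
```

A mean-field Gaussian over zeta then corresponds to a log-normal distribution over sigma, which is exactly the implicit non-Gaussian approximation described above.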
Stan's automatic differentiation capabilities are crucial to ADVI, enabling stochastic gradient ascent on the evidence lower bound (ELBO). Monte Carlo (MC) integration, combined with a reparameterization of the Gaussian variational distribution, yields unbiased estimates of the ELBO gradient, balancing accuracy with computational tractability.
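Putting the pieces together, here is a minimal sketch of the resulting stochastic optimization on the same toy model, using a one-sample MC gradient estimate via the reparameterization zeta = mu + exp(omega) * eta. The gradient clipping and fixed learning rate are crude stand-ins for ADVI's adaptive step-size sequence and are assumptions of this sketch, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 2.0, size=200)  # synthetic data; true sigma = 2

def grad_real(zeta):
    """Gradient of the transformed log-joint w.r.t. zeta: chain rule
    (d log p / d sigma) * (d sigma / d zeta), plus the log-Jacobian term.
    Hand-coded here; ADVI obtains it from Stan's autodiff."""
    sigma = np.exp(zeta)
    dlogp_dsigma = np.sum(-1.0 / sigma + x**2 / sigma**3) - 1.0
    return dlogp_dsigma * sigma + 1.0

mu, omega, lr = 0.0, 0.0, 1e-3  # q(zeta) = Normal(mu, exp(omega)^2)
for _ in range(5000):
    eta = rng.standard_normal()
    zeta = mu + np.exp(omega) * eta           # reparameterized draw
    g = np.clip(grad_real(zeta), -100, 100)   # crude clipping stands in for
                                              # ADVI's adaptive step size
    mu += lr * g                              # one-sample MC gradient in mu
    omega += lr * (g * eta * np.exp(omega) + 1.0)  # +1 from Gaussian entropy
print("approx. posterior median of sigma:", np.exp(mu))
```

Because the expectation is taken under a fixed standard normal after reparameterization, the gradient passes inside the expectation, and each single-sample estimate is unbiased; averaging more samples per step trades computation for lower variance.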
Empirical Evaluation
The paper provides a rigorous empirical evaluation of ADVI's performance across various Bayesian models, including hierarchical regression and matrix factorization tasks. On these tasks, ADVI demonstrates significant speed advantages over traditional MCMC methods, such as Hamiltonian Monte Carlo (HMC) and its adaptive variant, the No-U-Turn Sampler (NUTS), while maintaining competitive predictive accuracy.
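For orientation, invoking both methods on a Stan program from Python is brief; the sketch below assumes the CmdStanPy interface and hypothetical file names (model.stan, model_data.json), whereas the paper's experiments used Stan's own tooling.

```python
from cmdstanpy import CmdStanModel

# Hypothetical file names; any Stan program with continuous latent
# variables and its accompanying data would work here.
model = CmdStanModel(stan_file="model.stan")

# ADVI: mean-field Gaussian approximation, optimized stochastically.
advi_fit = model.variational(data="model_data.json", algorithm="meanfield")

# NUTS: asymptotically exact sampling, typically far slower at scale.
nuts_fit = model.sample(data="model_data.json", chains=4)
```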
Performance Highlights:
- In hierarchical generalized linear models, ADVI achieves faster convergence compared to NUTS and exhibits scalability advantages when applied to large datasets.
- ADVI also handles non-conjugate models: on a Gaussian mixture model fit to a dataset of 250,000 images, it outperforms traditional samplers in computational efficiency.
Implications and Future Work
ADVI represents a meaningful advancement in probabilistic programming by democratizing access to scalable VI methodologies. It removes the traditional barrier of model-specific derivations, allowing practitioners to focus on model design rather than inference mechanics, with clear implications for the exploration of complex Bayesian models.
As the landscape of probabilistic programming and Bayesian analysis continues to evolve, it is plausible to expect further enhancements in automated inference techniques. Future research might explore even broader classes of models, extend support to discrete parameter spaces, and improve the handling of complex, multimodal posteriors.
Overall, ADVI stands as a versatile and efficient tool, cementing the role of automated inference in applied Bayesian statistics and machine learning. Its integration with Stan underscores its potential as a foundational element in the broader toolkit for probabilistic modeling.