- The paper introduces SVI, a scalable algorithm that uses stochastic optimization on mini-batches to update variational parameters.
- It recasts variational inference as stochastic optimization: global parameter updates are computed from subsampled mini-batches rather than full passes over the data, enabling application to models such as LDA and the HDP topic model.
- Empirical results on large corpora (Nature, The New York Times, and Wikipedia) show that SVI converges faster than batch variational inference and achieves better per-word predictive likelihood.
Stochastic Variational Inference: A Scalable Approach for Approximate Bayesian Inference
The paper "Stochastic Variational Inference" by Hoffman, Blei, Wang, and Paisley introduces an efficient algorithm for variational inference, particularly tailored for handling large-scale data. The central contribution is a method known as Stochastic Variational Inference (SVI), designed to approximate posterior distributions in probabilistic models. This algorithm is particularly useful in scenarios involving extensive datasets, where traditional variational inference methods fall short due to scalability issues.
Overview of the Algorithm
Stochastic Variational Inference modifies traditional variational inference by incorporating stochastic optimization. Classical coordinate-ascent variational inference refines the global variational parameters iteratively, but each refinement requires a complete pass over the entire dataset. SVI instead follows noisy (natural-)gradient estimates of the variational objective computed from subsampled data points, drastically reducing the cost of each iteration.
The key idea is to treat the global update as a stochastic optimization problem. By adopting a Robbins-Monro scheme for the gradient steps, with step sizes that sum to infinity while their squares sum to a finite value, SVI is guaranteed to converge to a local optimum of the variational objective. Specifically, at each iteration, SVI (a runnable sketch of this loop follows the list):
- Samples a subset of data (a mini-batch) from the complete dataset.
- Optimizes the local variational parameters for this mini-batch.
- Computes an intermediate estimate of the global variational parameters, rescaling the mini-batch statistics as if the entire corpus consisted of copies of the sampled documents.
- Updates the global parameters as a weighted average of the previous parameters and the intermediate estimate, with the weight given by the current step size.
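To make these steps concrete, below is a minimal sketch of the SVI update loop for LDA in plain NumPy/SciPy. It illustrates the scheme described above rather than reproducing the authors' reference implementation: the function name, default hyperparameters, and the bag-of-words input format are choices made for this example, and the inner loop is the standard coordinate-ascent update for LDA's local variational parameters.

```python
import numpy as np
from scipy.special import digamma

def dirichlet_expectation(param):
    """E[log X] for X ~ Dirichlet(param); rows are independent Dirichlets if 2-D."""
    if param.ndim == 1:
        return digamma(param) - digamma(param.sum())
    return digamma(param) - digamma(param.sum(axis=1, keepdims=True))

def svi_lda(docs, vocab_size, num_topics=10, alpha=0.1, eta=0.01,
            tau=1.0, kappa=0.7, batch_size=64, num_iters=1000,
            local_iters=50, seed=0):
    """Minimal stochastic variational inference for LDA (illustrative sketch).

    docs: list of (word_ids, counts) NumPy-array pairs, where word_ids are the
          distinct term indices in a document and counts their frequencies.
    Returns lam, the K x V variational Dirichlet parameters over topic-word
    distributions. The global update is
        lam <- (1 - rho_t) * lam + rho_t * lam_hat,  rho_t = (t + tau) ** (-kappa).
    """
    rng = np.random.default_rng(seed)
    D = len(docs)                                        # corpus size
    K, V = num_topics, vocab_size
    lam = rng.gamma(100.0, 0.01, size=(K, V))            # global variational parameters

    for t in range(num_iters):
        rho = (t + tau) ** (-kappa)                      # Robbins-Monro step size
        batch = rng.choice(D, size=min(batch_size, D), replace=False)  # 1. sample a mini-batch

        Elog_beta = dirichlet_expectation(lam)           # E[log beta_kw], K x V
        lam_hat = np.full((K, V), eta)                   # 3. intermediate global estimate
        for d in batch:
            ids, cts = docs[d]
            gamma = np.full(K, alpha + cts.sum() / K)    # 2. local (per-document) E-step
            for _ in range(local_iters):
                Elog_theta = dirichlet_expectation(gamma)
                log_phi = Elog_theta[:, None] + Elog_beta[:, ids]   # K x N_d
                phi = np.exp(log_phi - log_phi.max(axis=0))
                phi /= phi.sum(axis=0)
                new_gamma = alpha + phi @ cts
                if np.abs(new_gamma - gamma).mean() < 1e-4:
                    gamma = new_gamma
                    break
                gamma = new_gamma
            # Rescale the sufficient statistics as if the whole corpus were
            # made of copies of this mini-batch.
            lam_hat[:, ids] += (D / len(batch)) * phi * cts

        lam = (1.0 - rho) * lam + rho * lam_hat          # 4. weighted-average update
    return lam
```

The rescaling by D / |batch| makes lam_hat an unbiased estimate of what a full-batch update would produce, and the decay exponent kappa in (0.5, 1] together with the delay tau >= 0 keeps the step sizes within the Robbins-Monro conditions mentioned above.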
Applications to Topic Models
The paper demonstrates the effectiveness of SVI using two well-known probabilistic topic models: Latent Dirichlet Allocation (LDA) and the Hierarchical Dirichlet Process (HDP) topic model.
- Latent Dirichlet Allocation (LDA): LDA is a generative model that represents documents as mixtures of topics, where each topic is a distribution over words. The primary computational challenge is inferring the posterior over the topics and the per-document topic proportions given the observed corpus. Using SVI, the paper shows that LDA can scale to corpora containing millions of documents, a scale that is infeasible with traditional batch variational inference (a brief usage-level sketch follows this list).
- Hierarchical Dirichlet Process (HDP) Topic Model: The HDP model extends LDA to allow an unbounded number of topics, effectively inferring the appropriate number of topics from the data. Applying SVI to the HDP involves the complexities of Bayesian nonparametric methods, where the posterior is defined over an infinite-dimensional parameter space. The authors show that SVI handles this setting via a truncated variational family, yielding scalable and efficient posterior inference.
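As a usage-level illustration (not part of the paper), scikit-learn's LatentDirichletAllocation exposes this kind of online, mini-batch variational training for LDA; the snippet below sketches how the step-size parameters map onto its arguments, using a tiny toy corpus. Online implementations of the HDP topic model also exist elsewhere (e.g., gensim's HdpModel), though their interfaces differ.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus for illustration; in practice X could cover millions of documents.
docs = [
    "the cat sat on the mat",
    "dogs and cats are popular pets",
    "stochastic optimization scales bayesian inference",
]
X = CountVectorizer().fit_transform(docs)   # document-term count matrix

lda = LatentDirichletAllocation(
    n_components=5,             # number of topics K
    learning_method="online",   # stochastic (mini-batch) variational updates
    batch_size=2,               # mini-batch size
    learning_offset=10.0,       # tau in the step size (t + tau) ** (-kappa)
    learning_decay=0.7,         # kappa; must lie in (0.5, 1] for convergence
    random_state=0,
)
lda.fit(X)                      # or lda.partial_fit(X_batch) for streaming data
topic_word = lda.components_    # unnormalized topic-word weights (analogue of lam above)
```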
Empirical Evaluation
The empirical section of the paper evaluates SVI on three large corpora: articles from Nature, The New York Times, and Wikipedia. The results show that SVI outperforms batch inference in per-word predictive log-likelihood on held-out documents. Notably, SVI not only converges faster but also attains better predictive scores, demonstrating its robustness and efficiency.
Implications and Future Directions
The introduction of SVI opens numerous avenues for practical and theoretical advancements in Bayesian inference and machine learning. Practically, SVI enables the application of complex probabilistic models to massive datasets without requiring extensive computational resources, democratizing access to advanced data analysis techniques.
On the theoretical front, the principles underlying SVI can be extended and refined. For instance, future research might explore:
- Non-conjugate Models: Extending SVI to handle non-conjugate priors and more complex hierarchical structures, thereby broadening the applicability of variational methods.
- Adaptive Learning Rates: Developing adaptive step-size schedules that dynamically adjust to the estimated gradient's variance, enhancing convergence rates and stability.
- Hybrid Approaches: Integrating stochastic variational methods with other inference techniques like Markov chain Monte Carlo (MCMC) to leverage the strengths of both paradigms.
Conclusion
"Stochastic Variational Inference" represents a significant step forward in developing scalable algorithms for Bayesian inference. By effectively combining variational inference with stochastic optimization, Hoffman et al. have provided a powerful tool for analyzing large-scale datasets with complex probabilistic models. This work not only advances the state-of-the-art in variational methods but also lays the groundwork for future research in scalable and efficient Bayesian inference techniques.