Batch and match: black-box variational inference with a score-based divergence (2402.14758v2)

Published 22 Feb 2024 in stat.ML, cs.AI, cs.LG, and stat.CO

Abstract: Most leading implementations of black-box variational inference (BBVI) are based on optimizing a stochastic evidence lower bound (ELBO). But such approaches to BBVI often converge slowly due to the high variance of their gradient estimates and their sensitivity to hyperparameters. In this work, we propose batch and match (BaM), an alternative approach to BBVI based on a score-based divergence. Notably, this score-based divergence can be optimized by a closed-form proximal update for Gaussian variational families with full covariance matrices. We analyze the convergence of BaM when the target distribution is Gaussian, and we prove that in the limit of infinite batch size the variational parameter updates converge exponentially quickly to the target mean and covariance. We also evaluate the performance of BaM on Gaussian and non-Gaussian target distributions that arise from posterior inference in hierarchical and deep generative models. In these experiments, we find that BaM typically converges in fewer (and sometimes significantly fewer) gradient evaluations than leading implementations of BBVI based on ELBO maximization.

Authors (7)
  1. Diana Cai
  2. Chirag Modi
  3. Loucas Pillaud-Vivien
  4. Charles C. Margossian
  5. Robert M. Gower
  6. David M. Blei
  7. Lawrence K. Saul

Summary

  • The paper introduces the Batch and Match (BaM) framework, which optimizes a score-based divergence and admits closed-form updates for Gaussian variational families with full covariance matrices.
  • For Gaussian targets, it proves that the variational parameters converge exponentially quickly to the target mean and covariance in the limit of infinite batch size.
  • Empirical evaluations on Gaussian and non-Gaussian targets show that BaM typically converges in fewer gradient evaluations than leading ELBO-based implementations of BBVI.

Exploring Score-based Divergence in Variational Inference: The Batch and Match Approach

Introduction to Score-based Variational Inference

Variational inference (VI) is a widely used method for approximating complex probabilistic models, especially for posterior inference in Bayesian frameworks. However, traditional VI approaches, particularly those that optimize the Kullback-Leibler (KL) divergence via a stochastic evidence lower bound (ELBO), often converge slowly because of the high variance of their gradient estimates and their sensitivity to hyperparameters. To address these limitations, the paper introduces "Batch and Match" (BaM), a framework for black-box variational inference (BBVI) built on a score-based divergence. Unlike conventional methods, BaM admits closed-form proximal updates for Gaussian variational families with full covariance matrices, and the paper supports its faster, more accurate convergence both theoretically and empirically.
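
For contrast with what follows, the sketch below shows the kind of Monte Carlo ELBO estimate that standard BBVI optimizes. It is a minimal illustration for a diagonal-Gaussian family (the log_joint callable and all names here are placeholders, not the paper's code); the noisy gradients of exactly this kind of objective are what motivate the score-based alternative.

```python
import numpy as np

def elbo_estimate(log_joint, mu, log_sigma, num_samples=32, rng=None):
    """Monte Carlo ELBO estimate for a diagonal-Gaussian variational family.

    ELBO = E_q[log p(x, z) - log q(z)], estimated with reparameterized
    samples z = mu + sigma * eps, eps ~ N(0, I).  `log_joint` maps a
    (num_samples, d) array of latents to a length-num_samples array.
    """
    rng = np.random.default_rng() if rng is None else rng
    sigma = np.exp(log_sigma)
    eps = rng.standard_normal((num_samples, mu.shape[0]))
    z = mu + sigma * eps                      # reparameterized samples from q
    log_q = -0.5 * np.sum(((z - mu) / sigma) ** 2 + 2 * log_sigma + np.log(2 * np.pi), axis=1)
    return np.mean(log_joint(z) - log_q)
```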

Theoretical Foundations: BaM Algorithm

BaM departs from traditional VI objectives by leveraging a score-based divergence, which measures the disagreement between the gradients of the log densities (the scores) of the target and variational distributions. This divergence has three properties that make it well suited to variational inference: it is non-negative, it vanishes only when the two distributions coincide, and it is invariant under affine transformations of the input.
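
As a rough illustration of the quantity being minimized, the snippet below estimates a plain (unweighted) score-based divergence from a batch of samples. The paper's divergence additionally weights the score differences so that the affine-invariance property above holds, so treat this as a simplified stand-in with hypothetical helper callables.

```python
import numpy as np

def score_divergence_estimate(score_p, score_q, sample_q, batch_size=64, rng=None):
    """Monte Carlo estimate of an (unweighted) score-based divergence:

        D(q; p) ~= (1/B) * sum_b || grad log q(x_b) - grad log p(x_b) ||^2,

    with x_1, ..., x_B drawn from q.  The paper's divergence includes a
    covariance weighting (omitted here) that makes it affine invariant.
    """
    rng = np.random.default_rng() if rng is None else rng
    xs = sample_q(batch_size, rng)            # batch of samples from q
    diffs = score_q(xs) - score_p(xs)         # per-sample score mismatch
    return np.mean(np.sum(diffs ** 2, axis=1))
```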

Central to the BaM algorithm are two alternating steps: a "batch" step that draws a batch of samples from the current variational approximation and evaluates the target scores at those points, and a "match" step that updates the variational approximation so that its scores match the target scores at these samples, regularized toward the previous iterate. Iterating these steps drives the approximation toward a distribution that minimizes the score-based divergence from the target; a schematic sketch follows.
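
The sketch below makes the alternation concrete for a full-covariance Gaussian family. It is not the paper's closed-form proximal update: the "match" step here is a plain least-squares fit of a Gaussian score to the target scores on the batch, with no proximal regularization, and score_p is an assumed callable returning the target score at each sample.

```python
import numpy as np

def batch_and_match_sketch(score_p, mu, Sigma, num_iters=50, batch_size=100, rng=None):
    """Schematic batch/match alternation (NOT the paper's proximal update).

    The "match" step fits the Gaussian whose score -P (x - mu), with
    P = Sigma^{-1}, best matches the target scores on the batch, via
    ordinary least squares; the paper instead solves a regularized
    proximal problem in closed form.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = mu.shape[0]
    for _ in range(num_iters):
        # Batch step: draw samples from the current Gaussian approximation.
        xs = rng.multivariate_normal(mu, Sigma, size=batch_size)
        s = score_p(xs)                              # target scores at the batch
        # Match step (simplified): regress scores on x, s(x) ~ W x + c,
        # then read off Sigma^{-1} = -W and mu = Sigma c.
        X = np.hstack([xs, np.ones((batch_size, 1))])
        coef, *_ = np.linalg.lstsq(X, s, rcond=None)
        W, c = coef[:d].T, coef[d]
        precision = -0.5 * (W + W.T)                 # symmetrize for stability
        # For a well-behaved (log-concave) target this is positive definite;
        # a practical implementation would add regularization here.
        Sigma = np.linalg.inv(precision)
        mu = Sigma @ c
    return mu, Sigma
```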

Convergence Analysis for Gaussian Targets

When the target distribution is Gaussian, BaM's iterates converge exponentially quickly to the target mean and covariance in the limit of infinite batch size. This guarantee holds for any fixed level of regularization and any initialization. Such strong convergence results, even in the simplified setting of Gaussian targets, provide a solid foundation for the method's empirical performance.
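
As a small supporting calculation (a restatement of standard facts, not the paper's rate proof), Gaussian scores are affine functions of x,

\nabla_x \log p(x) = -\Sigma_p^{-1}(x - \mu_p), \qquad \nabla_x \log q(x) = -\Sigma_q^{-1}(x - \mu_q),

so the score-based divergence between two Gaussians vanishes precisely when \mu_q = \mu_p and \Sigma_q = \Sigma_p. The Gaussian family therefore contains an exact minimizer, and the paper's analysis quantifies how quickly the BaM iterates approach this fixed point.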

Empirical Evaluation

Empirically, BaM is compared against leading BBVI methods on Gaussian and non-Gaussian targets, including posteriors arising from hierarchical and deep generative models. Whereas ELBO-based approaches often struggle in high dimensions and are sensitive to the learning rate, BaM converges faster and reaches higher accuracy. These benefits are most pronounced at larger batch sizes, and BaM is comparatively robust to the choice of initialization and regularization.

Future Directions

Despite its advantages, the application of BaM to non-Gaussian variational families and the analysis of its convergence in the finite-batch scenario remain open for exploration. Furthermore, extending the scope of score-based divergence beyond VI to other domains like goodness-of-fit testing could yield interesting insights due to its affine-invariance property.

Conclusion

The "Batch and Match" approach presents a significant step forward in the field of variational inference, addressing many of the shortcomings of existing methods. By centering on a score-based divergence and enabling efficient, closed-form updates, BaM not only speeds up convergence but also broadens the applicability of VI to more complex distributions. Its theoretical underpinning and empirical success lay the groundwork for further advancements in score-based methods for probabilistic modeling, with the potential to enhance a wide range of applications in statistics and machine learning.