Batch and match: black-box variational inference with a score-based divergence (2402.14758v2)
Abstract: Most leading implementations of black-box variational inference (BBVI) are based on optimizing a stochastic evidence lower bound (ELBO). But such approaches to BBVI often converge slowly due to the high variance of their gradient estimates and their sensitivity to hyperparameters. In this work, we propose batch and match (BaM), an alternative approach to BBVI based on a score-based divergence. Notably, this score-based divergence can be optimized by a closed-form proximal update for Gaussian variational families with full covariance matrices. We analyze the convergence of BaM when the target distribution is Gaussian, and we prove that in the limit of infinite batch size the variational parameter updates converge exponentially quickly to the target mean and covariance. We also evaluate the performance of BaM on Gaussian and non-Gaussian target distributions that arise from posterior inference in hierarchical and deep generative models. In these experiments, we find that BaM typically converges in fewer (and sometimes significantly fewer) gradient evaluations than leading implementations of BBVI based on ELBO maximization.
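To make the central object concrete, below is a minimal sketch of a score-based divergence between a Gaussian variational approximation q and a target p, estimated by Monte Carlo as E_q[||∇log q(x) − ∇log p(x)||²]. This is one simple instance of a score-based objective for illustration only; it is not necessarily the exact weighted divergence or the closed-form proximal update used by BaM, and all function names here are hypothetical.

```python
import numpy as np

def gaussian_score(x, mu, Sigma_inv):
    # Score of N(mu, Sigma): grad_x log N(x; mu, Sigma) = -Sigma^{-1} (x - mu)
    return -Sigma_inv @ (x - mu)

def score_divergence(mu_q, Sigma_q, mu_p, Sigma_p, n_samples=1000, seed=0):
    """Monte Carlo estimate of E_q || grad log q(x) - grad log p(x) ||^2,
    drawing the batch of samples from the variational distribution q."""
    rng = np.random.default_rng(seed)
    Sq_inv = np.linalg.inv(Sigma_q)
    Sp_inv = np.linalg.inv(Sigma_p)
    xs = rng.multivariate_normal(mu_q, Sigma_q, size=n_samples)
    total = 0.0
    for x in xs:
        diff = gaussian_score(x, mu_q, Sq_inv) - gaussian_score(x, mu_p, Sp_inv)
        total += diff @ diff
    return total / n_samples
```

The estimate is zero exactly when q matches p (the two scores coincide at every sample) and positive otherwise, which is the property that makes it usable as a variational objective; BaM's contribution is that, for full-covariance Gaussian families, its (regularized) version of such an objective admits a closed-form proximal update rather than requiring stochastic gradient steps.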
Authors: Diana Cai, Chirag Modi, Loucas Pillaud-Vivien, Charles C. Margossian, Robert M. Gower, David M. Blei, Lawrence K. Saul