Black Box Variational Inference with a Deterministic Objective: Faster, More Accurate, and Even More Black Box (2304.05527v4)
Abstract: Automatic differentiation variational inference (ADVI) offers fast and easy-to-use posterior approximation in multiple modern probabilistic programming languages. However, its stochastic optimizer lacks clear convergence criteria and requires tuning parameters. Moreover, ADVI inherits the poor posterior uncertainty estimates of mean-field variational Bayes (MFVB). We introduce "deterministic ADVI" (DADVI) to address these issues. DADVI replaces the intractable MFVB objective with a fixed Monte Carlo approximation, a technique known in the stochastic optimization literature as the "sample average approximation" (SAA). By optimizing an approximate but deterministic objective, DADVI can use off-the-shelf second-order optimization, and, unlike standard mean-field ADVI, is amenable to more accurate posterior covariances via linear response (LR). In contrast to existing worst-case theory, we show that, on certain classes of common statistical problems, DADVI and the SAA can perform well with relatively few samples even in very high dimensions, though we also show that such favorable results cannot extend to variational approximations that are too expressive relative to mean-field ADVI. We show on a variety of real-world problems that DADVI reliably finds good solutions with default settings (unlike ADVI) and, together with LR covariances, is typically faster and more accurate than standard ADVI.
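To make the core idea concrete, the sketch below is a minimal illustration of the fixed-sample (SAA) approach the abstract describes: draw a set of standard-normal base samples once, form the resulting deterministic Monte Carlo estimate of the mean-field ELBO, and hand it to an off-the-shelf deterministic optimizer. This is a hedged sketch, not the authors' reference implementation; the toy correlated-Gaussian target, the draw count `num_draws=30`, and the choice of JAX with SciPy's L-BFGS-B quasi-Newton routine are all illustrative assumptions.

```python
import jax
import jax.numpy as jnp
import numpy as np
from scipy.optimize import minimize


def make_dadvi_objective(log_p, dim, num_draws=30, seed=0):
    """Build a deterministic (fixed-sample) negative ELBO and its gradient."""
    # Draw the standard-normal base samples ONCE and reuse them at every
    # optimization step; this is the sample average approximation (SAA).
    zs = jax.random.normal(jax.random.PRNGKey(seed), (num_draws, dim))

    def neg_elbo(params):
        mu, log_sigma = params[:dim], params[dim:]
        sigma = jnp.exp(log_sigma)
        thetas = mu + sigma * zs  # reparameterized draws from q
        # Fixed-sample estimate of E_q[log p(theta)].
        expected_log_p = jnp.mean(jax.vmap(log_p)(thetas))
        # Entropy of the mean-field Gaussian (up to an additive constant).
        entropy = jnp.sum(log_sigma)
        return -(expected_log_p + entropy)

    return jax.jit(jax.value_and_grad(neg_elbo))


# Illustrative target: a correlated Gaussian standing in for a log posterior.
dim = 5
cov = 0.5 * jnp.eye(dim) + 0.5
prec = jnp.linalg.inv(cov)
log_p = lambda theta: -0.5 * theta @ prec @ theta

value_and_grad = make_dadvi_objective(log_p, dim)


def fun(x):
    val, grad = value_and_grad(jnp.asarray(x))
    return float(val), np.asarray(grad)


# Because the objective is deterministic, a standard quasi-Newton optimizer
# with an ordinary convergence criterion applies; no learning-rate tuning.
result = minimize(fun, np.zeros(2 * dim), jac=True, method="L-BFGS-B")
mu_hat, sigma_hat = result.x[:dim], np.exp(result.x[dim:])
print("posterior mean estimate:", mu_hat)
print("posterior sd estimate:  ", sigma_hat)
```

Since the fixed-sample objective is an ordinary smooth function, convergence can be declared with standard gradient-norm or function-change tolerances rather than the stopping heuristics stochastic ADVI needs, and linear response covariances can in principle be obtained from derivatives of the same fixed objective at the optimum.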