General bounds on the quality of Bayesian coresets (2405.11780v2)
Abstract: Bayesian coresets speed up posterior inference in the large-scale data regime by approximating the full-data log-likelihood function with a surrogate log-likelihood based on a small, weighted subset of the data. But while Bayesian coresets and methods for their construction are applicable in a wide range of models, existing theoretical analyses of the posterior inferential error incurred by coreset approximations apply only in restrictive settings, i.e., exponential family models or models with strong log-concavity and smoothness assumptions. This work presents general upper and lower bounds on the Kullback-Leibler (KL) divergence of coreset approximations that reflect the full range of applicability of Bayesian coresets. The lower bounds require only mild model assumptions typical of Bayesian asymptotic analyses, while the upper bounds require the log-likelihood functions to satisfy a generalized subexponentiality criterion that is weaker than conditions used in earlier work. The lower bounds are applied to obtain fundamental limitations on the quality of coreset approximations, and to provide a theoretical explanation for the previously observed poor empirical performance of importance-sampling-based construction methods. The upper bounds are used to analyze the performance of recent subsample-optimize methods. The flexibility of the theory is demonstrated with validation experiments involving multimodal, unidentifiable, heavy-tailed Bayesian posterior distributions.
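For concreteness, the surrogate the abstract describes can be sketched in a few lines of Python. This is a minimal illustration rather than the paper's construction: the function names, the uniform-subsampling rule with weights N/M, and the Gaussian example are all assumptions made for the sketch.

```python
import numpy as np

def coreset_log_likelihood(theta, data, idx, weights, log_lik_fn):
    # Surrogate log-likelihood: sum_m w_m * log p(x_{i_m} | theta),
    # replacing the sum over all N data points with a weighted sum
    # over the M coreset points.
    return float(np.sum(weights * log_lik_fn(theta, data[idx])))

def uniform_coreset(data, M, seed=0):
    # Uniform subsampling with weights N/M (an illustrative choice):
    # the surrogate is then an unbiased estimator of the full-data
    # log-likelihood at every theta.
    rng = np.random.default_rng(seed)
    N = len(data)
    idx = rng.choice(N, size=M, replace=False)
    weights = np.full(M, N / M)
    return idx, weights

# Hypothetical example: Gaussian model with unknown mean, unit variance.
def gaussian_log_lik(theta, x):
    return -0.5 * (x - theta) ** 2 - 0.5 * np.log(2 * np.pi)

rng = np.random.default_rng(1)
data = rng.normal(loc=2.0, scale=1.0, size=10_000)
idx, w = uniform_coreset(data, M=100)
full = np.sum(gaussian_log_lik(0.5, data))
surrogate = coreset_log_likelihood(0.5, data, idx, w, gaussian_log_lik)
print(full, surrogate)  # surrogate approximates the full-data value
```

With nonuniform subsampling probabilities p_i and weights proportional to 1/p_i, the same template yields the importance-sampling-based constructions whose limitations the paper's lower bounds explain; the subsample-optimize methods it analyzes instead treat the weights as free parameters to be optimized after subsampling.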