
Robust Approximate Sampling via Stochastic Gradient Barker Dynamics (2405.08999v1)

Published 14 May 2024 in stat.ML and cs.LG

Abstract: Stochastic Gradient Markov Chain Monte Carlo (SG-MCMC) algorithms are popular for Bayesian sampling in the presence of large datasets. However, they come with few theoretical guarantees, and assessing their empirical performance is non-trivial. In this context, it is crucial to develop algorithms that are robust to the choice of hyperparameters and to gradient heterogeneity, since in practice both the choice of step size and the behaviour of the target gradients induce hard-to-control biases in the invariant distribution. In this work we introduce the stochastic gradient Barker dynamics (SGBD) algorithm, which extends the recently developed Barker MCMC scheme, a robust alternative to Langevin-based sampling algorithms, to the stochastic gradient framework. We characterize the impact of stochastic gradients on the Barker transition mechanism and develop a bias-corrected version that, under suitable assumptions, eliminates the error due to gradient noise in the proposal. We illustrate the performance on a number of high-dimensional examples, showing that SGBD is more robust to hyperparameter tuning and to irregular behaviour of the target gradients than the popular stochastic gradient Langevin dynamics algorithm.
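To make the transition mechanism concrete, the following is a minimal sketch of an unadjusted Barker-style update driven by a stochastic gradient. It is an illustrative simplification, not the paper's bias-corrected SGBD: each coordinate draws a Gaussian increment and flips its sign toward the (noisy) gradient with the Barker probability 1/(1 + exp(-z·g)). The function name `sgbd_step` and its arguments are assumptions for this sketch.

```python
import numpy as np

def sgbd_step(x, stochastic_grad, step_size, rng):
    """One unadjusted Barker-style update using a noisy gradient estimate.

    x               : current state, shape (d,)
    stochastic_grad : callable returning a noisy gradient of the log-target
    step_size       : standard deviation of the Gaussian increment
    rng             : numpy random Generator
    """
    g = stochastic_grad(x)                       # minibatch gradient estimate
    z = rng.normal(0.0, step_size, size=x.shape) # symmetric increment
    # Barker acceptance of the increment's sign, coordinate-wise:
    # move in the direction of z with probability 1 / (1 + exp(-z_i * g_i)),
    # so moves aligned with the gradient are favoured.
    p = 1.0 / (1.0 + np.exp(-z * g))
    b = np.where(rng.uniform(size=x.shape) < p, 1.0, -1.0)
    return x + b * z
```

For small increments this behaves like a Langevin step with drift (step_size²/2)·g, but the bounded flip probability keeps the update stable when gradients are large or heavy-tailed, which is the robustness property the paper builds on.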
