
On Cyclical MCMC Sampling (2403.00230v1)

Published 1 Mar 2024 in stat.CO and stat.ML

Abstract: Cyclical MCMC is a novel MCMC framework recently proposed by Zhang et al. (2019) to address the challenge posed by high-dimensional, multimodal posterior distributions such as those arising in deep learning. The algorithm generates a nonhomogeneous Markov chain that tracks, cyclically in time, tempered versions of the target distribution. We show in this work that cyclical MCMC converges to the desired probability distribution when the Markov kernels used are fast mixing and sufficiently long cycles are employed. However, in the far more common setting of slow-mixing kernels, the algorithm may fail to produce samples from the desired distribution. In particular, in a simple mixture example with unequal variances, we show by simulation that cyclical MCMC fails to converge to the desired limit. Finally, we show that cyclical MCMC typically estimates the local shape of the target distribution around each mode well, even when it does not converge to the target.
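The tempering mechanism the abstract describes can be sketched as a random-walk Metropolis chain whose target at iteration t is a tempered density pi(x)^beta(t), with beta(t) cycling from a small value up to 1 within each cycle. The two-component Gaussian mixture with unequal variances below echoes the paper's counterexample setting, but the linear tempering schedule, the mixture parameters, and all function names are illustrative assumptions, not the paper's exact construction:

```python
import numpy as np

def log_mixture(x, means=(0.0, 5.0), sds=(0.5, 2.0), weights=(0.5, 0.5)):
    """Log-density of a two-component Gaussian mixture with unequal variances
    (hypothetical parameters, chosen to mimic the paper's mixture example)."""
    a = [np.log(w) - 0.5 * ((x - m) / s) ** 2 - np.log(s) - 0.5 * np.log(2 * np.pi)
         for w, m, s in zip(weights, means, sds)]
    return np.logaddexp(a[0], a[1])

def cyclical_mcmc(n_iters=20000, cycle_len=1000, beta_min=0.05, step=1.0, seed=0):
    """Random-walk Metropolis tracking the tempered targets pi(x)^beta(t),
    where beta(t) rises linearly from beta_min to 1 within each cycle.
    Only draws from the high-beta end of each cycle are kept as samples."""
    rng = np.random.default_rng(seed)
    x = 0.0
    samples = []
    for t in range(n_iters):
        phase = (t % cycle_len) / cycle_len          # position within the current cycle
        beta = beta_min + (1.0 - beta_min) * phase   # assumed linear tempering schedule
        prop = x + step * rng.normal()               # random-walk proposal
        log_alpha = beta * (log_mixture(prop) - log_mixture(x))
        if np.log(rng.uniform()) < log_alpha:        # Metropolis accept/reject
            x = prop
        if phase > 0.9:                              # keep draws where beta is close to 1
            samples.append(x)
    return np.array(samples)
```

During the low-beta portion of each cycle the tempered target is nearly flat, so the chain can drift between modes; the kept draws then come from near the untempered target. As the paper argues, when the kernel mixes slowly the draws collected this way need not have the correct relative mass across modes, even though they describe each mode's local shape well.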

References (28)
  1. Variational inference: A review for statisticians. arXiv preprint arXiv:1601.00670.
  2. Stochastic gradient Hamiltonian Monte Carlo. In Proceedings of the 31st International Conference on Machine Learning - Volume 32. ICML'14, JMLR.org.
  3. Markov Chains. Springer International Publishing.
  4. High-dimensional Bayesian inference via the unadjusted Langevin algorithm. Bernoulli 25 2854–2882.
  5. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In Proceedings of the 33rd International Conference on Machine Learning (M. F. Balcan and K. Q. Weinberger, eds.), vol. 48 of Proceedings of Machine Learning Research. PMLR.
  6. Simulating normalizing constants: From importance sampling to bridge sampling to path sampling. Statist. Sci. 13 163–185.
  7. Geyer, C. (1991). Markov chain Monte Carlo maximum likelihood. In Computing Science and Statistics: Proceedings of the 23rd Symposium on the Interface (E. M. Keramidas, ed.). Interface Foundation, Fairfax, 156–163.
  8. Annealing Markov chain Monte Carlo with applications to pedigree analysis. Journal of the American Statistical Association 90 909–920.
  9. Graves, A. (2011). Practical variational inference for neural networks. In Advances in Neural Information Processing Systems (J. Shawe-Taylor, R. Zemel, P. Bartlett, F. Pereira and K. Weinberger, eds.), vol. 24. Curran Associates, Inc.
  10. Hervé, L. (2008). Vitesse de convergence dans le théorème limite central pour des chaînes de Markov fortement ergodiques. Ann. Inst. Henri Poincaré Probab. Stat. 44 280–292.
  11. Exchange Monte Carlo method and application to spin glass simulations. Journal of the Physical Society of Japan 65 1604–1608.
  12. Aleatoric and epistemic uncertainty in machine learning: An introduction to concepts and methods. Machine Learning 110 457–506.
  13. Geometric ergodicity of Metropolis algorithms. Stoch. Proc. Appl. 85 341–361.
  14. Exponential bounds and stopping rules for MCMC and general Markov chains. In First International Conference on Performance Evaluation Methodologies and Tools, Pisa, Italy.
  15. A complete recipe for stochastic gradient MCMC. In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2. NIPS'15, MIT Press, Cambridge, MA, USA.
  16. Simulated tempering: A new Monte Carlo scheme. Europhysics Letters 19 451–458.
  17. Matthews, P. (1993). A slowly mixing Markov chain with implications for Gibbs sampling. Statistics & Probability Letters 17 231–236.
  18. Markov Chains and Stochastic Stability. 2nd ed. Cambridge University Press, Cambridge.
  19. Entropic gradient descent algorithms and wide flat minima. In International Conference on Learning Representations. URL https://openreview.net/forum?id=xjXg0bnoDmS
  20. Non-convex learning via stochastic gradient Langevin dynamics: A nonasymptotic analysis. In Proceedings of the 2017 Conference on Learning Theory (S. Kale and O. Shamir, eds.), vol. 65 of Proceedings of Machine Learning Research. PMLR.
  21. Monte Carlo Statistical Methods. 2nd ed. Springer Texts in Statistics, Springer-Verlag, New York.
  22. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research 15 1929–1958.
  23. Parallel tempering on optimized paths. In Proceedings of the 38th International Conference on Machine Learning (M. Meila and T. Zhang, eds.), vol. 139 of Proceedings of Machine Learning Research. PMLR. URL https://proceedings.mlr.press/v139/syed21a.html
  24. Bayesian learning via stochastic gradient Langevin dynamics. In Proceedings of the 28th International Conference on Machine Learning. ICML'11, Omnipress, USA.
  25. Whiteley, N. (2013). Stability properties of some particle filters. The Annals of Applied Probability 23 2500–2537.
  26. Sufficient conditions for torpid mixing of parallel and simulated tempering. Electron. J. Probab. 14 780–804. URL https://doi.org/10.1214/EJP.v14-638
  27. Conditions for rapid mixing of parallel and simulated tempering on multimodal distributions. Ann. Appl. Probab. 19 617–640.
  28. Cyclical stochastic gradient MCMC for Bayesian deep learning. arXiv preprint arXiv:1902.03932.