Gradient-based Discrete Sampling with Automatic Cyclical Scheduling (2402.17699v2)

Published 27 Feb 2024 in cs.LG and stat.ML

Abstract: Discrete distributions, particularly in high-dimensional deep models, are often highly multimodal due to inherent discontinuities. While gradient-based discrete sampling has proven effective, it is susceptible to becoming trapped in local modes because it relies on local gradient information. To tackle this challenge, we propose an automatic cyclical scheduling, designed for efficient and accurate sampling in multimodal discrete distributions. Our method contains three key components: (1) a cyclical step size schedule where large steps discover new modes and small steps exploit each mode; (2) a cyclical balancing schedule, ensuring "balanced" proposals for given step sizes and high efficiency of the Markov chain; and (3) an automatic tuning scheme for adjusting the hyperparameters in the cyclical schedules, allowing adaptability across diverse datasets with minimal tuning. We prove non-asymptotic convergence and inference guarantees for our method in general discrete distributions. Extensive experiments demonstrate the superiority of our method in sampling complex multimodal discrete distributions.
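To make the cyclical schedules concrete, below is a minimal Python sketch of the kind of cosine-shaped cycles the abstract describes. It assumes the cosine form popularized by cyclical SG-MCMC (Zhang et al., 2020) for the step size, and a balancing parameter that interpolates between 0.5 (the locally balanced choice for small steps) and a larger value paired with large steps; the paper's actual schedule shapes, ranges, and automatic tuning rule may differ, and the function names here are purely illustrative.

```python
import math

def cyclical_step_size(t, total_steps, num_cycles, alpha_max, alpha_min=0.0):
    """Cosine-shaped cyclical step size (assumed form, after cyclical
    SG-MCMC): each cycle restarts large to jump between modes, then
    decays toward alpha_min to exploit the current mode."""
    cycle_len = math.ceil(total_steps / num_cycles)
    pos = (t % cycle_len) / cycle_len  # position within the current cycle, in [0, 1)
    return alpha_min + 0.5 * (alpha_max - alpha_min) * (math.cos(math.pi * pos) + 1)

def cyclical_balancing(t, total_steps, num_cycles, beta_max=1.0, beta_min=0.5):
    """Hypothetical companion schedule for the balancing parameter:
    beta near beta_max is paired with large steps, and beta = 0.5
    (locally balanced) with small steps. This pairing is an assumption,
    not the paper's exact rule."""
    cycle_len = math.ceil(total_steps / num_cycles)
    pos = (t % cycle_len) / cycle_len
    return beta_min + 0.5 * (beta_max - beta_min) * (math.cos(math.pi * pos) + 1)

# Example: 10,000 sampling steps split into 5 cycles of 2,000 steps each.
for t in [0, 999, 1999]:
    a = cyclical_step_size(t, total_steps=10_000, num_cycles=5, alpha_max=2.0)
    b = cyclical_balancing(t, total_steps=10_000, num_cycles=5)
    print(f"step {t}: alpha={a:.3f}, beta={b:.3f}")
```

The key design idea the abstract points to is that the two schedules move together: restarting the step size at its maximum at the start of each cycle is what drives exploration of new modes, and adjusting the balancing parameter alongside it keeps the proposal well calibrated at every scale, rather than tuning a single fixed step size for the whole chain.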
