Faster Sampling via Stochastic Gradient Proximal Sampler (2405.16734v1)

Published 27 May 2024 in stat.ML and cs.LG

Abstract: Stochastic gradients have been widely integrated into Langevin-based methods to improve their scalability and efficiency in solving large-scale sampling problems. However, the proximal sampler, which exhibits much faster convergence than Langevin-based algorithms in the deterministic setting (Lee et al., 2021), has yet to be explored in its stochastic variants. In this paper, we study Stochastic Proximal Samplers (SPS) for sampling from non-log-concave distributions. We first establish a general framework for implementing stochastic proximal samplers and develop the corresponding convergence theory. We show that convergence to the target distribution is guaranteed as long as the second moment of the algorithm trajectory is bounded and the restricted Gaussian oracles can be well approximated. We then provide two implementable variants based on stochastic gradient Langevin dynamics (SGLD) and the Metropolis-adjusted Langevin algorithm (MALA), giving rise to SPS-SGLD and SPS-MALA. We further show that SPS-SGLD and SPS-MALA can achieve $\epsilon$-sampling error in total variation (TV) distance within $\tilde{\mathcal{O}}(d\epsilon^{-2})$ and $\tilde{\mathcal{O}}(d^{1/2}\epsilon^{-2})$ gradient complexities, respectively, which outperform the best-known result by at least an $\tilde{\mathcal{O}}(d^{1/3})$ factor. This performance gain is corroborated by our empirical studies on synthetic data of various dimensions, demonstrating the efficiency of the proposed algorithms.
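As a rough illustration of the alternating scheme described in the abstract (a Gaussian forward step followed by an approximate restricted Gaussian oracle, here approximated by a short inner SGLD run), the sketch below implements an SPS-SGLD-style sampler in NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the fixed inner step size, and the toy noisy-gradient target are illustrative choices only.

```python
import numpy as np

# Minimal sketch of a stochastic proximal sampler in the spirit of SPS-SGLD.
# All names here (sps_sgld, grad_f_stochastic, ...) are hypothetical.

def sps_sgld(grad_f_stochastic, x0, eta, n_outer, n_inner, inner_step, rng):
    """Alternate a Gaussian forward step with an approximate restricted
    Gaussian oracle (RGO) implemented by a few SGLD steps.

    grad_f_stochastic(x, rng) should return an unbiased estimate of grad f(x),
    e.g. a minibatch gradient of the negative log-density f.
    """
    d = x0.shape[0]
    x = x0.copy()
    samples = []
    for _ in range(n_outer):
        # Forward step: y ~ N(x, eta * I)
        y = x + np.sqrt(eta) * rng.standard_normal(d)

        # Approximate RGO: sample from pi(x | y) ∝ exp(-f(x) - ||x - y||^2 / (2*eta))
        # using a short inner run of stochastic gradient Langevin dynamics.
        z = y.copy()
        for _ in range(n_inner):
            grad = grad_f_stochastic(z, rng) + (z - y) / eta
            z = z - inner_step * grad + np.sqrt(2.0 * inner_step) * rng.standard_normal(d)
        x = z
        samples.append(x.copy())
    return np.array(samples)


if __name__ == "__main__":
    # Toy target: standard Gaussian, so grad f(x) = x; added noise mimics a minibatch estimate.
    rng = np.random.default_rng(0)
    noisy_grad = lambda x, rng: x + 0.1 * rng.standard_normal(x.shape)
    draws = sps_sgld(noisy_grad, x0=np.zeros(5), eta=0.5,
                     n_outer=2000, n_inner=10, inner_step=0.05, rng=rng)
    print(draws.mean(axis=0), draws.var(axis=0))
```

On the toy Gaussian target the empirical mean and variance of the draws should land near 0 and 1; step sizes and inner iteration counts here are arbitrary and would need tuning for the regimes analyzed in the paper.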

References (49)
  1. Faster high-accuracy log-concave sampling via algorithmic warm starts. arXiv preprint arXiv:2302.10249.
  2. Pattern Recognition and Machine Learning, volume 4. Springer.
  3. Coupling and convergence for Hamiltonian Monte Carlo.
  4. Stochastic gradient Hamiltonian Monte Carlo. In International Conference on Machine Learning, pages 1683–1691. PMLR.
  5. Improved analysis for a proximal algorithm for sampling. In Conference on Learning Theory, pages 2984–3014. PMLR.
  6. Fast mixing of Metropolized Hamiltonian Monte Carlo: Benefits of multi-step gradients. The Journal of Machine Learning Research, 21(1):3647–3717.
  7. Optimal convergence rate of Hamiltonian Monte Carlo for strongly logconcave distributions. Theory of Computing, 18(1):1–18.
  8. Convergence of Langevin MCMC in KL-divergence. In Algorithmic Learning Theory, pages 186–211. PMLR.
  9. Underdamped Langevin MCMC: A non-asymptotic analysis. In Conference on Learning Theory, pages 300–323. PMLR.
  10. User-friendly guarantees for the Langevin Monte Carlo with inaccurate gradient. Stochastic Processes and their Applications, 129(12):5278–5311.
  11. Utilising the CLT structure in stochastic gradient based sampling: Improved analysis and faster algorithms. In The Thirty Sixth Annual Conference on Learning Theory, pages 4072–4129. PMLR.
  12. Particle-based variational inference with preconditioned functional gradient flow. arXiv preprint arXiv:2211.13954.
  13. Analysis of Langevin Monte Carlo via convex optimization. The Journal of Machine Learning Research, 20(1):2666–2711.
  14. On the convergence of Hamiltonian Monte Carlo. arXiv preprint arXiv:1705.00166.
  15. Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization I: A generic algorithmic framework. SIAM Journal on Optimization, 22(4):1469–1492.
  16. Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM Journal on Optimization, 23(4):2341–2368.
  17. Monte Carlo sampling without isoperimetry: A reverse diffusion approach.
  18. Faster sampling without isoperimetry via diffusion-based Monte Carlo.
  19. Structured logconcave sampling with a restricted Gaussian oracle. In Conference on Learning Theory, pages 2993–3050. PMLR.
  20. Algorithmic theory of ODEs and sampling from well-conditioned logconcave densities. arXiv preprint arXiv:1812.06243.
  21. Lemarechal, C. (1978). Nonsmooth optimization and descent methods.
  22. Lemarechal, C. (2009). An extension of Davidon methods to nondifferentiable problems. In Nondifferentiable Optimization, pages 95–109. Springer.
  23. A proximal algorithm for sampling from non-smooth potentials. In 2022 Winter Simulation Conference (WSC), pages 3229–3240. IEEE.
  24. A proximal bundle variant with optimal iteration-complexity for a large range of prox stepsizes. SIAM Journal on Optimization, 31(4):2955–2986.
  25. A unified analysis of a class of proximal bundle methods for solving hybrid convex composite optimization problems. Mathematics of Operations Research.
  26. Is there an analog of Nesterov acceleration for gradient-based MCMC? Bernoulli, 27(3).
  27. Rapid mixing of Hamiltonian Monte Carlo on strongly log-concave distributions. arXiv preprint arXiv:1708.07114.
  28. Dimensionally tight bounds for second-order Hamiltonian Monte Carlo. Advances in Neural Information Processing Systems, 31.
  29. Mifflin, R. (1982). A modification and an extension of Lemaréchal's algorithm for nonsmooth minimization. Springer.
  30. High-order Langevin diffusion yields an accelerated MCMC algorithm. The Journal of Machine Learning Research, 22(1):1919–1959.
  31. Neal, R. (1992). Bayesian learning via stochastic dynamics. Advances in Neural Information Processing Systems, 5.
  32. Neal, R. M. (1993). Probabilistic inference using Markov chain Monte Carlo methods.
  33. Neal, R. M. (2010). MCMC using Hamiltonian dynamics. Handbook of Markov Chain Monte Carlo, 54:113–162.
  34. Robust stochastic approximation approach to stochastic programming. SIAM Journal on Optimization, 19(4):1574–1609.
  35. Nesterov, Y. (2013). Introductory Lectures on Convex Optimization: A Basic Course, volume 87. Springer Science & Business Media.
  36. Non-convex learning via stochastic gradient Langevin dynamics: A nonasymptotic analysis. In Conference on Learning Theory, pages 1674–1703. PMLR.
  37. Monte Carlo Statistical Methods, volume 2. Springer.
  38. Langevin diffusions and Metropolis-Hastings algorithms. Methodology and Computing in Applied Probability, 4:337–357.
  39. Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli, pages 341–363.
  40. Rockafellar, R. T. (1976). Monotone operators and the proximal point algorithm. SIAM Journal on Control and Optimization, 14(5):877–898.
  41. Rapid convergence of the unadjusted Langevin algorithm: Isoperimetry suffices. Advances in Neural Information Processing Systems, 32.
  42. Bayesian learning via stochastic gradient Langevin dynamics. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 681–688.
  43. Wolfe, P. (2009). A method of conjugate subgradients for minimizing nondifferentiable functions. In Nondifferentiable Optimization, pages 145–173. Springer.
  44. Minimax mixing time of the Metropolis-adjusted Langevin algorithm for log-concave sampling. The Journal of Machine Learning Research, 23(1):12348–12410.
  45. Langevin diffusions and the Metropolis-adjusted Langevin algorithm. Statistics and Probability Letters, 91:14–19.
  46. Global convergence of Langevin dynamics based algorithms for nonconvex optimization. In Advances in Neural Information Processing Systems, pages 3126–3137.
  47. On the convergence of Hamiltonian Monte Carlo with stochastic gradients. In International Conference on Machine Learning, pages 13012–13022. PMLR.
  48. Sampling from non-log-concave distributions via variance-reduced gradient Langevin dynamics. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 2936–2945. PMLR.
  49. Faster convergence of stochastic gradient Langevin dynamics for non-log-concave sampling. In Uncertainty in Artificial Intelligence, pages 1152–1162. PMLR.