An Improved Analysis of Langevin Algorithms with Prior Diffusion for Non-Log-Concave Sampling (2403.06183v1)
Abstract: Understanding the dimension dependency of computational complexity in high-dimensional sampling is a fundamental problem, from both a practical and a theoretical perspective. Compared with samplers whose stationary distribution is unbiased, e.g., the Metropolis-adjusted Langevin algorithm (MALA), biased samplers, e.g., underdamped Langevin dynamics (ULD), perform better in the low-accuracy regime precisely because of the lower dimension dependency in their complexities. Along this line, Freund et al. (2022) suggest that the modified Langevin algorithm with prior diffusion converges dimension-independently for strongly log-concave target distributions. Nonetheless, it remains open whether this property holds in more general settings. In this paper, we investigate the prior diffusion technique for target distributions satisfying a log-Sobolev inequality (LSI), which covers a much broader class of distributions than the strongly log-concave ones. In particular, we prove that the modified Langevin algorithm also achieves dimension-independent convergence in KL divergence under different step-size schedules. The core of our proof technique is a novel construction of an interpolating SDE, which enables a more accurate characterization of the discrete updates of the overdamped Langevin dynamics. Our theoretical analysis demonstrates the benefits of prior diffusion for a broader class of target distributions and provides new insights into developing faster sampling algorithms.
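To make the prior-diffusion update concrete, here is a minimal sketch in Python/NumPy. It assumes, as a simplification of this sketch rather than a statement of the paper's exact algorithm, that the target decomposes as π(x) ∝ exp(−f(x) − ‖x‖²/2), i.e., a smooth part f plus a standard Gaussian prior; within each step the gradient of f is frozen at the current iterate while the Ornstein–Uhlenbeck dynamics induced by the prior (linear drift plus Brownian noise) is integrated exactly. The function names (`lapd_sample`, `grad_f`) and the toy target are illustrative only.

```python
import numpy as np

def lapd_sample(grad_f, x0, step_size, n_steps, rng=None):
    """One chain of a Langevin sampler with prior diffusion (sketch).

    Sketch assumption: the target is pi(x) proportional to exp(-f(x) - ||x||^2 / 2).
    Per step, grad_f is frozen at the current iterate, and the Ornstein-Uhlenbeck
    part coming from the Gaussian prior is integrated exactly over the step
    rather than discretized.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    a = np.exp(-step_size)                              # exact OU contraction over one step
    noise_scale = np.sqrt(1.0 - np.exp(-2.0 * step_size))
    for _ in range(n_steps):
        xi = rng.standard_normal(x.shape)
        # frozen gradient of the smooth part + exactly integrated prior part
        x = a * x - (1.0 - a) * grad_f(x) + noise_scale * xi
    return x

# Toy example: pi(x) proportional to exp(-sum(log cosh(x_i)) - ||x||^2 / 2),
# so grad_f(x) = tanh(x); the target is symmetric, hence the empirical mean should be ~0.
if __name__ == "__main__":
    samples = np.stack([
        lapd_sample(np.tanh, x0=np.zeros(5), step_size=0.1, n_steps=2_000)
        for _ in range(200)
    ])
    print("empirical mean:", samples.mean(axis=0))
```

Compared with a plain unadjusted Langevin (Euler-Maruyama) step, only the treatment of the quadratic prior term changes: its drift and noise are solved in closed form over each step, which is the ingredient the abstract credits for removing the dimension dependency.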
- Faster high-accuracy log-concave sampling via algorithmic warm starts. arXiv preprint arXiv:2302.10249.
- Gradient flows: in metric spaces and in the space of probability measures. Springer Science & Business Media.
- Diffusions hypercontractives. In Séminaire de probabilités XIX 1983/84, pages 177–206. Springer.
- Analysis and geometry of Markov diffusion operators, volume 103. Springer.
- A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202.
- Bellman, R. (1968). Some inequalities for the square root of a positive definite matrix. Linear Algebra and its Applications, 1(3):321–324.
- Pattern recognition and machine learning, volume 4. Springer.
- Handbook of Markov chain Monte Carlo. CRC Press.
- Dimension-free log-Sobolev inequalities for mixture distributions. Journal of Functional Analysis, 281(11):109236.
- Fast mixing of Metropolized Hamiltonian Monte Carlo: Benefits of multi-step gradients. The Journal of Machine Learning Research, 21(1):3647–3717.
- Convergence of Langevin MCMC in KL-divergence. In Algorithmic Learning Theory, pages 186–211. PMLR.
- Dalalyan, A. (2017a). Further and stronger analogy between sampling and optimization: Langevin Monte Carlo and gradient descent. In Conference on Learning Theory, pages 678–689. PMLR.
- Dalalyan, A. S. (2017b). Theoretical guarantees for approximate sampling from smooth and log-concave densities. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79(3):651–676.
- User-friendly guarantees for the Langevin Monte Carlo with inaccurate gradient. Stochastic Processes and their Applications, 129(12):5278–5311.
- Analysis of Langevin Monte Carlo via convex optimization. The Journal of Machine Learning Research, 20(1):2666–2711.
- High-dimensional Bayesian inference via the unadjusted Langevin algorithm. Bernoulli, 25(4A):2854–2882.
- Log-concave sampling: Metropolis-Hastings algorithms are fast! In Conference on Learning Theory, pages 793–797. PMLR.
- Log-concave sampling: Metropolis-Hastings algorithms are fast. Journal of Machine Learning Research, 20(183):1–42.
- When is the convergence time of Langevin algorithms dimension independent? A composite optimization viewpoint. Journal of Machine Learning Research, 23(214):1–32.
- Markov chain Monte Carlo in practice. CRC Press.
- Reverse diffusion Monte Carlo. In International Conference on Learning Representations (ICLR).
- Faster sampling without isoperimetry via diffusion-based Monte Carlo. arXiv preprint arXiv:2401.06325.
- An introduction to variational methods for graphical models. Machine Learning, 37(2):183–233.
- The variational formulation of the Fokker–Planck equation. SIAM Journal on Mathematical Analysis, 29(1):1–17.
- Double randomized underdamped Langevin with dimension-independent convergence guarantee. Advances in Neural Information Processing Systems, 36.
- Is there an analog of Nesterov acceleration for MCMC? arXiv preprint arXiv:1902.00996.
- Sampling can be faster than optimization. Proceedings of the National Academy of Sciences, 116(42):20881–20885.
- Neal, R. M. (1993). Probabilistic inference using Markov chain Monte Carlo methods. Department of Computer Science, University of Toronto, Toronto, ON, Canada.
- Nesterov, Y. et al. (2018). Lectures on convex optimization, volume 137. Springer.
- Oksendal, B. (2013). Stochastic differential equations: an introduction with applications. Springer Science & Business Media.
- Non-convex learning via stochastic gradient Langevin dynamics: a nonasymptotic analysis. In Conference on Learning Theory, pages 1674–1703. PMLR.
- Risken, H. (1996). Fokker-Planck equation. In The Fokker-Planck Equation, pages 63–95. Springer.
- Monte Carlo statistical methods, volume 2. Springer.
- Langevin diffusions and Metropolis-Hastings algorithms. Methodology and Computing in Applied Probability, 4(4):337–357.
- Brownian dynamics as smart Monte Carlo simulation. The Journal of Chemical Physics, 69(10):4628–4633.
- Applied stochastic differential equations, volume 10. Cambridge University Press.
- The randomized midpoint method for log-concave sampling. Advances in Neural Information Processing Systems, 32.
- Consistency and fluctuations for stochastic gradient Langevin dynamics. Journal of Machine Learning Research, 17.
- Rapid convergence of the unadjusted Langevin algorithm: Isoperimetry suffices. Advances in Neural Information Processing Systems, 32.
- Bayesian learning via stochastic gradient Langevin dynamics. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 681–688.
- Wibisono, A. (2018). Sampling as optimization in the space of measures: The Langevin dynamics as a composite optimization problem. In Conference on Learning Theory, pages 2093–3027. PMLR.
- Wibisono, A. (2019). Proximal langevin algorithm: Rapid convergence under isoperimetry. arXiv preprint arXiv:1911.01469.
- Langevin diffusions and the Metropolis-adjusted Langevin algorithm. Statistics & Probability Letters, 91:14–19.
- Global convergence of Langevin dynamics based algorithms for nonconvex optimization. In Advances in Neural Information Processing Systems, pages 3126–3137.
- Faster convergence of stochastic gradient Langevin dynamics for non-log-concave sampling. In Uncertainty in Artificial Intelligence, pages 1152–1162. PMLR.