Zeroth-Order Sampling Methods for Non-Log-Concave Distributions: Alleviating Metastability by Denoising Diffusion (2402.17886v4)
Abstract: This paper considers the problem of sampling from a non-log-concave distribution, based on queries of its unnormalized density. It first describes a framework, Denoising Diffusion Monte Carlo (DDMC), based on the simulation of a denoising diffusion process whose score function is approximated by a generic Monte Carlo estimator. DDMC is an oracle-based meta-algorithm, where the oracle is assumed access to samples that generate a Monte Carlo score estimator. We then provide an implementation of this oracle based on rejection sampling, which turns DDMC into a true algorithm, termed Zeroth-Order Diffusion Monte Carlo (ZOD-MC). We provide convergence analyses by first constructing a general framework, i.e., a performance guarantee for DDMC, without assuming the target distribution to be log-concave or to satisfy any isoperimetric inequality. We then prove that ZOD-MC admits an inverse-polynomial dependence on the desired sampling accuracy, albeit still suffering from the curse of dimensionality. Consequently, for low-dimensional distributions, ZOD-MC is a very efficient sampler whose performance exceeds that of the latest samplers, including the also-denoising-diffusion-based RDMC and RSDMC. Lastly, we experimentally demonstrate the insensitivity of ZOD-MC to increasingly high barriers between modes and to discontinuities in non-convex potentials.
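For intuition, the sketch below illustrates the kind of procedure the abstract describes: a reverse (denoising) diffusion whose score is estimated at each step by Monte Carlo, with the required posterior samples of the clean variable obtained by rejection sampling that only queries the unnormalized density (a zeroth-order oracle). This is a minimal sketch under my own assumptions, not the paper's pseudocode: the function names, the time schedule, the Gaussian proposal, the fallback when no proposal is accepted, and the parameter `f_min` (an assumed lower bound on the potential, needed for a valid acceptance probability) are all illustrative choices.

```python
import numpy as np


def zod_mc_sketch(f, dim, n_steps=100, T=3.0, n_mc=64,
                  max_trials=2000, f_min=0.0, rng=None):
    """Illustrative ZOD-MC-style sampler (a sketch, not the paper's exact algorithm).

    f: zeroth-order oracle for the potential, i.e. the target density is
       proportional to exp(-f(x)); f_min is an assumed lower bound on f.
    Forward noising: Ornstein-Uhlenbeck, X_t = e^{-t} X_0 + sqrt(1 - e^{-2t}) Z.
    Reverse-time SDE is discretized with Euler-Maruyama; the score at each step is
    estimated via Tweedie's formula from posterior samples of X_0 | X_t obtained
    by rejection sampling that only evaluates f.
    """
    rng = np.random.default_rng() if rng is None else rng
    ts = np.linspace(T, 1e-3, n_steps + 1)   # reverse-time grid (assumed schedule)
    x = rng.standard_normal(dim)             # initialize from N(0, I), close to p_T

    def posterior_samples(x_t, t):
        """Rejection-sample x0 ~ p(x0 | X_t = x_t) using only evaluations of f."""
        shrink = np.exp(-t)
        var = 1.0 - np.exp(-2.0 * t)
        # Proposal: the OU transition likelihood viewed as a density in x0,
        # i.e. N(x_t / e^{-t}, (e^{2t} - 1) I).  Target / proposal is then
        # proportional to exp(-f(x0)), so accepting with probability
        # exp(-(f(x0) - f_min)) is a valid rejection step whenever f >= f_min.
        out = []
        for _ in range(max_trials):
            cand = x_t / shrink + (np.sqrt(var) / shrink) * rng.standard_normal(dim)
            if rng.random() < np.exp(-(f(cand) - f_min)):
                out.append(cand)
                if len(out) == n_mc:
                    break
        # Fallback if nothing was accepted (the proposal is very diffuse at large t).
        return np.array(out) if out else x_t[None, :]

    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        h = t_cur - t_next
        shrink = np.exp(-t_cur)
        var = 1.0 - np.exp(-2.0 * t_cur)
        x0 = posterior_samples(x, t_cur)
        # Tweedie's formula: score(x, t) = (e^{-t} E[x0 | X_t = x] - x) / (1 - e^{-2t}).
        score = (shrink * x0.mean(axis=0) - x) / var
        # Euler-Maruyama step of the reverse SDE dX = (X + 2 * score) dt + sqrt(2) dW.
        x = x + h * (x + 2.0 * score) + np.sqrt(2.0 * h) * rng.standard_normal(dim)
    return x


# Usage example (also an assumption): a 2D bimodal, non-log-concave target
# exp(-f), with f queried as a black box; f >= -log(2), so f_min = -log(2).
if __name__ == "__main__":
    m = np.array([2.0, 2.0])

    def f(x):
        return -np.logaddexp(-0.5 * np.sum((x - m) ** 2),
                             -0.5 * np.sum((x + m) ** 2))

    print(zod_mc_sketch(f, dim=2, f_min=-np.log(2.0), rng=np.random.default_rng(0)))
```

The rejection step is where the curse of dimensionality mentioned in the abstract enters: the acceptance rate of the Gaussian proposal degrades as the dimension grows, even though the dependence on the target accuracy remains inverse-polynomial.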
- Towards a theory of non-log-concave sampling: first-order stationarity guarantees for Langevin Monte Carlo. In Conference on Learning Theory, pages 2896–2923. PMLR, 2022.
- Linear convergence bounds for diffusion models via stochastic localization. arXiv preprint arXiv:2308.03686, 2023.
- Improved analysis of score-based generative modeling: User-friendly bounds under minimal smoothness assumptions. In International Conference on Machine Learning, pages 4735–4763. PMLR, 2023.
- Sampling is as easy as learning the score: theory for diffusion models with minimal data assumptions. In International Conference on Learning Representations, 2022.
- Improved analysis for a proximal algorithm for sampling. In Conference on Learning Theory, pages 2984–3014. PMLR, 2022.
- Y. Chen and R. Eldan. Localization schemes: A framework for proving mixing bounds for Markov chains. In 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS), pages 110–122. IEEE, 2022.
- Y. Chen and K. Gatmiry. A simple proof of the mixing of Metropolis-adjusted Langevin algorithm under smoothness and isoperimetry. arXiv preprint arXiv:2304.04095, 2023.
- S. Chewi. Log-concave sampling. 2023. Book draft available at https://chewisinho.github.io/.
- SVGD as a kernelized Wasserstein gradient flow of the chi-squared divergence. Advances in Neural Information Processing Systems, 33:2098–2109, 2020.
- Score diffusion models without early stopping: finite Fisher information is all you need. arXiv preprint arXiv:2308.12240, 2023.
- A. S. Dalalyan and A. Karagulyan. User-friendly guarantees for the Langevin Monte Carlo with inaccurate gradient. Stochastic Processes and their Applications, 129(12):5278–5311, 2019.
- A. S. Dalalyan and L. Riou-Durand. On sampling from a log-concave density using kinetic Langevin diffusions. Bernoulli, 26(3):1956–1988, 2020.
- V. De Bortoli. Convergence of denoising diffusion models under the manifold hypothesis. arXiv preprint arXiv:2208.05314, 2022.
- Log-concave sampling: Metropolis-Hastings algorithms are fast. Journal of Machine Learning Research, 20(183):1–42, 2019.
- On the ergodicity, bias and asymptotic normality of randomized midpoint sampling method. Advances in Neural Information Processing Systems, 33:7366–7376, 2020.
- Regularized Stein variational gradient flow. arXiv preprint arXiv:2211.07861, 2022.
- Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020.
- Monte Carlo sampling without isoperimetry: A reverse diffusion approach. arXiv preprint arXiv:2307.02037, 2023.
- Faster sampling without isoperimetry via diffusion-based Monte Carlo. arXiv preprint arXiv:2401.06325, 2024.
- Convergence of score-based generative modeling for general data distributions. In International Conference on Algorithmic Learning Theory, pages 946–985. PMLR, 2023.
- Structured logconcave sampling with a restricted Gaussian oracle. In M. Belkin and S. Kpotufe, editors, Proceedings of Thirty Fourth Conference on Learning Theory, volume 134 of Proceedings of Machine Learning Research, pages 2993–3050. PMLR, 15–19 Aug 2021.
- Towards faster non-asymptotic convergence for diffusion-based generative models. arXiv preprint arXiv:2306.09251, 2023.
- The mirror Langevin algorithm converges with vanishing bias. In International Conference on Algorithmic Learning Theory, pages 718–742. PMLR, 2022.
- Sqrt(d) dimension dependence of Langevin Monte Carlo. In International Conference on Learning Representations, 2021.
- X. H. Li and M. Tao. Automated construction of effective potential via algorithmic implicit bias. arXiv preprint arXiv:2401.03511, 2024.
- J. Liang and Y. Chen. A proximal algorithm for sampling. arXiv preprint arXiv:2202.13975, 2022.
- Q. Liu. Stein variational gradient descent as gradient flow. Advances in Neural Information Processing Systems, 30, 2017.
- L. Pardo. Statistical inference based on divergence measures. CRC press, 2018.
- Improved sampling via learned diffusions. In International Conference on Learning Representations, 2024.
- H. E. Robbins. An empirical Bayes approach to statistics. In Breakthroughs in Statistics: Foundations and Basic Theory, pages 388–394. Springer, 1992.
- A convergence theory for SVGD in the population limit under Talagrand’s inequality T1. In International Conference on Machine Learning, pages 19139–19152. PMLR, 2022.
- A. Schlichting. Poincaré and log-Sobolev inequalities for mixtures. Entropy, 21(1):89, 2019.
- R. Shen and Y. T. Lee. The randomized midpoint method for log-concave sampling. Advances in Neural Information Processing Systems, 32, 2019.
- Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning, 2015.
- Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations, 2021.
- S. Vempala and A. Wibisono. Rapid convergence of the unadjusted Langevin algorithm: Isoperimetry suffices. Advances in Neural Information Processing Systems, 32, 2019.
- K. Yingxi Yang and A. Wibisono. Convergence of the inexact Langevin algorithm and score-based generative models in KL divergence. arXiv preprint, 2022.
- Q. Zhang and Y. Chen. Path integral sampler: a stochastic control approach for sampling. In International Conference on Learning Representations, 2022.