A Separation in Heavy-Tailed Sampling: Gaussian vs. Stable Oracles for Proximal Samplers (2405.16736v1)
Abstract: We study the complexity of heavy-tailed sampling and present a separation result in terms of obtaining high-accuracy versus low-accuracy guarantees, i.e., samplers that require only $O(\log(1/\varepsilon))$ versus $\Omega(\text{poly}(1/\varepsilon))$ iterations to output a sample that is $\varepsilon$-close to the target in $\chi^2$-divergence. Our results are presented for proximal samplers based on Gaussian versus stable oracles. We show that proximal samplers based on the Gaussian oracle face a fundamental barrier: they necessarily achieve only low-accuracy guarantees when sampling from a class of heavy-tailed targets. In contrast, proximal samplers based on the stable oracle achieve high-accuracy guarantees, thereby overcoming this limitation. We also prove lower bounds for samplers under the stable oracle and show that our upper bounds cannot be fundamentally improved.
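To make the object of study concrete, below is a minimal one-dimensional sketch of the Gaussian-oracle proximal sampler the abstract refers to: a Gibbs chain that alternates a forward Gaussian step $y \sim \mathcal{N}(x, \eta)$ with a backward step through the restricted Gaussian oracle (RGO), which samples $x \propto \exp(-f(x) - (x-y)^2/(2\eta))$. The Student-t target, the step size $\eta$, and the rejection-based RGO implementation are illustrative assumptions, not the paper's construction; the paper's stable-oracle variant would replace the Gaussian step with an $\alpha$-stable increment and the RGO with the corresponding restricted stable oracle.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative heavy-tailed target: Student-t potential
# f(x) = ((nu + 1)/2) * log(1 + x^2/nu), so pi(x) ∝ exp(-f(x)).
# Since f >= 0 with f(0) = 0, exp(-f(x)) <= 1 and can serve directly
# as a rejection-sampling acceptance probability.
nu = 2.0

def f(x):
    return 0.5 * (nu + 1.0) * np.log1p(x * x / nu)

def rgo(y, eta):
    """Restricted Gaussian oracle: sample x ∝ exp(-f(x) - (x - y)^2 / (2*eta))
    by rejection sampling with proposal N(y, eta)."""
    while True:
        x = y + np.sqrt(eta) * rng.standard_normal()
        if rng.uniform() < np.exp(-f(x)):  # accept w.p. exp(-f(x)) <= 1
            return x

def proximal_sampler(x0, eta, n_iters):
    """Gaussian-oracle proximal sampler: alternate the forward Gaussian step
    y ~ N(x, eta) with the backward RGO step x ~ pi(x | y)."""
    x = x0
    for _ in range(n_iters):
        y = x + np.sqrt(eta) * rng.standard_normal()  # Gaussian (forward) oracle
        x = rgo(y, eta)                               # restricted Gaussian oracle
    return x

# Usage: run many independent chains and inspect the empirical tail.
samples = np.array([proximal_sampler(0.0, eta=0.5, n_iters=200)
                    for _ in range(2000)])
print("empirical tail P(|X| > 5):", np.mean(np.abs(samples) > 5))
```

The separation in the abstract concerns exactly this kind of chain: with the Gaussian forward step, the number of Gibbs iterations needed for $\varepsilon$-accuracy in $\chi^2$-divergence scales polynomially in $1/\varepsilon$ for a class of heavy-tailed targets, whereas an $\alpha$-stable forward step restores the $O(\log(1/\varepsilon))$ high-accuracy regime.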