Penalized Overdamped and Underdamped Langevin Monte Carlo Algorithms for Constrained Sampling (2212.00570v2)
Abstract: We consider the constrained sampling problem where the goal is to sample from a target distribution $\pi(x)\propto e^{-f(x)}$ when $x$ is constrained to lie in a convex body $\mathcal{C}$. Motivated by penalty methods from continuous optimization, we propose penalized Langevin Dynamics (PLD) and penalized underdamped Langevin Monte Carlo (PULMC) methods that convert the constrained sampling problem into an unconstrained one by introducing a penalty function for constraint violations. When $f$ is smooth and gradients are available, we obtain an $\tilde{\mathcal{O}}(d/\varepsilon^{10})$ iteration complexity for PLD to sample the target up to an $\varepsilon$-error, where the error is measured in total-variation (TV) distance and $\tilde{\mathcal{O}}(\cdot)$ hides logarithmic factors. For PULMC, we improve this to $\tilde{\mathcal{O}}(\sqrt{d}/\varepsilon^{7})$ when the Hessian of $f$ is Lipschitz and the boundary of $\mathcal{C}$ is sufficiently smooth. To our knowledge, these are the first convergence results for underdamped Langevin Monte Carlo methods in the constrained sampling setting that can handle non-convex $f$, and they achieve the best dimension dependency among existing methods with deterministic gradients. If unbiased stochastic estimates of the gradient of $f$ are available, we propose PSGLD and PSGULMC methods that can handle stochastic gradients and scale to large datasets without requiring Metropolis-Hastings correction steps. When $f$ is strongly convex and smooth, we obtain $\tilde{\mathcal{O}}(d/\varepsilon^{18})$ and $\tilde{\mathcal{O}}(d\sqrt{d}/\varepsilon^{39})$ iteration complexity for PSGLD and PSGULMC, respectively, in the $W_2$ distance. When $f$ is smooth and can be non-convex, we provide finite-time performance bounds and iteration complexity results. Finally, we illustrate the performance of our algorithms on Bayesian LASSO regression and Bayesian constrained deep learning problems.
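To make the penalty idea concrete, here is a minimal sketch of overdamped Langevin dynamics run on a penalized potential. It is not the paper's exact PLD algorithm: the quadratic squared-distance penalty $\mathrm{dist}(x,\mathcal{C})^2/(2\delta)$ with a fixed penalty parameter `delta`, the unit-ball constraint set, the `project_ball` and `penalized_langevin` names, and the Gaussian example target are all illustrative assumptions.

```python
import numpy as np

def project_ball(x, radius=1.0):
    """Euclidean projection onto the ball C = {x : ||x||_2 <= radius}."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else (radius / norm) * x

def penalized_langevin(grad_f, x0, n_iters=50_000, step=1e-3, delta=1e-2,
                       radius=1.0, seed=0):
    """Overdamped Langevin iterates targeting the penalized density
    pi_delta(x) ~ exp(-f(x) - dist(x, C)^2 / (2 * delta)).

    For a convex set C, the gradient of dist(x, C)^2 is 2 * (x - P_C(x)),
    so the penalty contributes (x - P_C(x)) / delta to the drift.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    samples = np.empty((n_iters, x.size))
    for k in range(n_iters):
        drift = grad_f(x) + (x - project_ball(x, radius)) / delta
        x = x - step * drift + np.sqrt(2.0 * step) * rng.standard_normal(x.size)
        samples[k] = x
    return samples

# Toy usage: Gaussian potential f(x) = ||x - mu||^2 / 2 restricted to the unit ball.
mu = np.array([1.5, 0.0])
samples = penalized_langevin(lambda x: x - mu, x0=np.zeros(2))
inside = np.linalg.norm(samples, axis=1) <= 1.0
print(f"fraction of iterates inside C: {inside.mean():.3f}")
```

Under these assumptions, shrinking `delta` concentrates the penalized density closer to $\mathcal{C}$ at the price of a stiffer drift (requiring a smaller step size). The underdamped variant (PULMC) would augment the state with a velocity variable, and the stochastic-gradient variants (PSGLD, PSGULMC) would replace `grad_f` with an unbiased minibatch estimate.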