Optimization, Isoperimetric Inequalities, and Sampling via Lyapunov Potentials (2410.02979v4)
Abstract: In this paper, we prove that optimizability of any function F using Gradient Flow from all initializations implies a Poincaré Inequality for the Gibbs measures mu_{beta} = e^{-beta F}/Z at low temperature. In particular, under mild regularity assumptions on the convergence rate of Gradient Flow, we establish that mu_{beta} satisfies a Poincaré Inequality with constant O(C'+1/beta) for beta >= Omega(d), where C' is the Poincaré constant of mu_{beta} restricted to a neighborhood of the global minimizers of F. Under an additional mild condition on F, we show that mu_{beta} satisfies a Log-Sobolev Inequality with constant O(beta max(S, 1) max(C', 1)), where S denotes the second moment of mu_{beta}. Here the asymptotic notation hides F-dependent parameters. At a high level, this establishes that optimizability via Gradient Flow from every initialization implies a Poincaré and a Log-Sobolev Inequality for the low-temperature Gibbs measure, which in turn imply sampling from all initializations. Analogously, we establish that under the same assumptions, if F can be optimized via Gradient Flow from every initialization outside some set S, then mu_{beta} satisfies a Weak Poincaré Inequality with parameters (O(C'+1/beta), O(mu_{beta}(S))) for beta >= Omega(d). At a high level, this shows that optimizability from 'most' initializations implies a Weak Poincaré Inequality, which in turn implies sampling from suitable warm starts. Our regularity assumptions are mild, and as a consequence we show that we can efficiently sample from several new, natural, and interesting classes of non-log-concave densities, an important setting with relatively few examples. As another corollary, we obtain efficient discrete-time sampling results for log-concave measures satisfying milder regularity conditions than smoothness, similar to Lehec (2023).
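For readers less familiar with the functional inequalities invoked above, the LaTeX block below records the standard definitions of the objects involved (Gibbs measure, Gradient Flow, Poincaré, Log-Sobolev, and one common formulation of the Weak Poincaré Inequality). This is a reference sketch only: normalization conventions and the exact Weak Poincaré parameterization may differ from the paper's by absolute constants.

```latex
% Standard definitions behind the abstract's statements; f ranges over smooth
% test functions (bounded f for the Weak Poincare Inequality). Conventions may
% differ from the paper's by absolute constants.
\begin{align*}
  &\text{Gibbs measure:}
      && \mu_\beta(dx) = \frac{e^{-\beta F(x)}}{Z_\beta}\,dx,
         \qquad Z_\beta = \int_{\mathbb{R}^d} e^{-\beta F(x)}\,dx,\\
  &\text{Gradient Flow:}
      && \tfrac{d}{dt} X_t = -\nabla F(X_t),\\
  &\text{Poincar\'e Inequality (constant } C_{\mathrm{PI}}\text{):}
      && \operatorname{Var}_{\mu_\beta}(f)
         \le C_{\mathrm{PI}}\, \mathbb{E}_{\mu_\beta}\big[\|\nabla f\|^2\big],\\
  &\text{Log-Sobolev Inequality (constant } C_{\mathrm{LSI}}\text{):}
      && \operatorname{Ent}_{\mu_\beta}(f^2)
         \le 2\, C_{\mathrm{LSI}}\, \mathbb{E}_{\mu_\beta}\big[\|\nabla f\|^2\big],\\
  &\text{Weak Poincar\'e Inequality (parameters } (C,\delta)\text{):}
      && \operatorname{Var}_{\mu_\beta}(f)
         \le C\, \mathbb{E}_{\mu_\beta}\big[\|\nabla f\|^2\big]
            + \delta\, \operatorname{osc}(f)^2.
\end{align*}
```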
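On the algorithmic side, "sampling from all initializations" refers to Langevin-type dynamics whose stationary distribution is mu_{beta}. The Python sketch below is a minimal, illustrative Unadjusted Langevin Algorithm (the Euler-Maruyama discretization of dX_t = -beta grad F(X_t) dt + sqrt(2) dB_t) applied to a toy double-well potential. The potential, step size, temperature, and iteration counts are arbitrary choices for illustration; the sketch makes no claim about the particular discretization or the quantitative guarantees analyzed in the paper.

```python
import numpy as np


def grad_F(x):
    """Gradient of the illustrative double-well potential F(x) = (||x||^2 - 1)^2 / 4."""
    return (np.dot(x, x) - 1.0) * x


def ula_sample(beta, dim, n_steps=20_000, step=1e-3, rng=None):
    """Unadjusted Langevin Algorithm targeting mu_beta proportional to exp(-beta * F).

    Iterates x_{k+1} = x_k - step * beta * grad_F(x_k) + sqrt(2 * step) * xi_k,
    the Euler-Maruyama discretization of the Langevin diffusion whose
    stationary distribution is mu_beta.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = rng.standard_normal(dim)  # arbitrary initialization ("from anywhere")
    for _ in range(n_steps):
        x = x - step * beta * grad_F(x) + np.sqrt(2.0 * step) * rng.standard_normal(dim)
    return x


if __name__ == "__main__":
    # At low temperature (large beta), samples concentrate near the set of
    # global minimizers of F, which here is the unit sphere ||x|| = 1.
    rng = np.random.default_rng(0)
    samples = np.array([ula_sample(beta=20.0, dim=2, rng=rng) for _ in range(100)])
    print("average ||x|| over 100 chains:", np.linalg.norm(samples, axis=1).mean())
```

As a general rule of thumb, larger beta sharpens concentration around the global minimizers but typically requires a smaller step size for the discretization to remain accurate.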