Error bounds for particle gradient descent, and extensions of the log-Sobolev and Talagrand inequalities (2403.02004v2)
Abstract: We prove non-asymptotic error bounds for particle gradient descent (PGD) (Kuntz et al., 2023), a recently introduced algorithm for maximum likelihood estimation of large latent variable models, obtained by discretizing a gradient flow of the free energy. We begin by showing that, for models satisfying a condition generalizing both the log-Sobolev and the Polyak–Łojasiewicz inequalities (LSI and PŁI, respectively), the flow converges exponentially fast to the set of minimizers of the free energy. We achieve this by extending a result well-known in the optimal transport literature (that the LSI implies the Talagrand inequality) and its counterpart in the optimization literature (that the PŁI implies the so-called quadratic growth condition), and applying it to our new setting. We also generalize the Bakry–Émery Theorem and show that the LSI/PŁI generalization holds for models with strongly concave log-likelihoods. For such models, we further control PGD's discretization error, obtaining non-asymptotic error bounds. While we are motivated by the study of PGD, we believe that the inequalities and results we extend may be of independent interest.
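To make the abstract's two ingredients concrete, here is first the classical implication that the paper extends (the Otto–Villani theorem, cited below), stated in one common normalization; the constants are a presentational assumption, and the paper's generalized statement for the free energy differs in form. If a probability measure μ satisfies a log-Sobolev inequality with constant λ > 0, then it satisfies Talagrand's T₂ inequality with the same constant:

$$\operatorname{Ent}_\mu(f^2) \le \frac{2}{\lambda}\int \lvert\nabla f\rvert^2\, d\mu \ \text{ for all smooth } f \quad\Longrightarrow\quad W_2(\nu,\mu)^2 \le \frac{2}{\lambda}\,\operatorname{KL}(\nu\,\|\,\mu) \ \text{ for all } \nu.$$

Second, a minimal sketch of a PGD-style update for a toy one-dimensional model, assuming a latent x ~ N(θ, 1) and an observation y ~ N(x, 1). The model, function names, and hyperparameters below are illustrative assumptions rather than the paper's code: the θ-update ascends the particle average of ∇_θ log p_θ(x, y), and each particle takes one unadjusted Langevin step targeting the posterior p_θ(· | y).

```python
import numpy as np

def grad_log_joint_theta(theta, x, y):
    # d/dtheta log p_theta(x, y) for the assumed toy model x ~ N(theta, 1), y ~ N(x, 1)
    return x - theta

def grad_log_joint_x(theta, x, y):
    # d/dx log p_theta(x, y) for the same toy model
    return (theta - x) + (y - x)

def pgd(y, theta0, num_particles=100, step=1e-2, num_steps=2000, seed=0):
    rng = np.random.default_rng(seed)
    theta = theta0
    x = rng.normal(size=num_particles)  # particle cloud approximating p_theta(. | y)
    for _ in range(num_steps):
        # theta step: gradient ascent on the particle average of grad_theta log p_theta(x, y)
        theta_next = theta + step * grad_log_joint_theta(theta, x, y).mean()
        # particle step: one unadjusted Langevin move targeting p_theta(. | y)
        noise = rng.normal(size=num_particles)
        x = x + step * grad_log_joint_x(theta, x, y) + np.sqrt(2.0 * step) * noise
        theta = theta_next
    return theta, x

if __name__ == "__main__":
    theta_hat, _ = pgd(y=1.5, theta0=0.0)
    print("estimated theta:", theta_hat)
```

For this toy model the marginal likelihood is maximized at θ* = y, so the printed estimate should land near 1.5 once the step size is small and the particle cloud has equilibrated; the paper's non-asymptotic bounds are what control the remaining discretization and particle error for models with strongly concave log-likelihoods.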
- “Interacting particle Langevin algorithm for maximum marginal likelihood estimation”, 2023
- Luigi Ambrosio, Nicola Gigli and Giuseppe Savaré “Gradient Flows: In Metric Spaces and in the Space of Probability Measures” Springer Science & Business Media, 2005
- Mihai Anitescu “Degenerate nonlinear programming with a quadratic growth condition” In SIAM Journal on Optimization 10, 2000, pp. 1116–1135
- Dominique Bakry and Michel Émery “Diffusions hypercontractives” In Séminaire de Probabilités XIX 1983/84, Lecture Notes in Mathematics, Springer, 1985
- Dominique Bakry, Ivan Gentil and Michel Ledoux “Analysis and Geometry of Markov Diffusion Operators” Springer, 2014
- Dmitri Burago, Yuri Burago and Sergei Ivanov “A Course in Metric Geometry” American Mathematical Society, 2001
- René Carmona “Lectures on BSDEs, Stochastic Control, and Stochastic Differential Games with Financial Applications” SIAM, 2016
- “Propagation of chaos: A review of models, methods and applications. I. Models and methods” In Kinetic and Related Models 15.6, 2022
- Rujian Chen “Approximate Bayesian Modeling with Embedded Gaussian Processes”, 2023
- “Convergence of Langevin MCMC in KL-divergence” In Proceedings of Algorithmic Learning Theory 83, 2018, pp. 186–211
- Sinho Chewi “Log-concave Sampling” Book draft, 2024 URL: https://chewisinho.github.io
- Arnak S. Dalalyan “Theoretical Guarantees for Approximate Sampling from Smooth and Log-Concave Densities” In Journal of the Royal Statistical Society Series B: Statistical Methodology 79.3, 2016, pp. 651–676
- Arthur P. Dempster, Nan M. Laird and Donald B. Rubin “Maximum likelihood from incomplete data via the EM Algorithm” In Journal of the Royal Statistical Society, Series B 39, 1977, pp. 1–38
- Steffen Dereich, Michael Scheutzow and Reik Schottstedt “Constructive quantization: Approximation by empirical measures” In Annales de l’Institut Henri Poincaré: Probabilités et statistiques 49, 2013, pp. 1183–1203
- Randal Douc, Éric Moulines and David Stoffer “Nonlinear Time Series: Theory, Methods and Applications with R Examples” CRC Press, 2014
- “High-dimensional Bayesian inference via the unadjusted Langevin algorithm” In Bernoulli 25.4A, 2019, pp. 2854–2882
- “Nonasymptotic convergence analysis for the unadjusted Langevin algorithm” In Annals of Applied Probability 27, 2017, pp. 1551–1587
- “Gradient flows for empirical Bayes in high-dimensional linear models”, 2023
- Nicolas Fournier “Convergence of the empirical measure in expected Wasserstein distance: non-asymptotic explicit bounds in ℝ^d” In ESAIM: Probability and Statistics 27, 2023, pp. 749–775
- Richard Jordan, David Kinderlehrer and Felix Otto “The variational formulation of the Fokker–Planck equation” In SIAM Journal on Mathematical Analysis 29, 1998, pp. 1–17
- Hamed Karimi, Julie Nutini and Mark Schmidt “Linear convergence of gradient and proximal-gradient methods under the Polyak-Łojasiewicz condition” In Machine Learning and Knowledge Discovery in Databases, 2016, pp. 795–811
- Diederik P. Kingma and Max Welling “An Introduction to Variational Autoencoders” In Foundations and Trends® in Machine Learning 12, 2019, pp. 307–392
- Juan Kuntz, Jen Ning Lim and Adam M. Johansen “Particle algorithms for maximum likelihood training of latent variable models” In Proceedings of The 26th International Conference on Artificial Intelligence and Statistics 206, 2023, pp. 5134–5180
- “Momentum particle maximum likelihood”, 2023
- Stanislaw Łojasiewicz “Une propriété topologique des sous-ensembles analytiques réels” In Les équations aux dérivées partielles 117, 1963, pp. 87–89
- “Is there an analog of Nesterov acceleration for gradient-based MCMC?” In Bernoulli 27.3 Bernoulli Society for Mathematical Statistics and Probability, 2021, pp. 1942–1992
- Radford M. Neal and Geoffrey E. Hinton “A View of the EM Algorithm that Justifies Incremental, Sparse, and other Variants” In Learning in Graphical Models Springer Netherlands, 1998, pp. 355–368
- Yurii Nesterov “A method of solving a convex programming problem with convergence rate O(1/k²)” In Doklady Akademii Nauk 269.3 Russian Academy of Sciences, 1983, pp. 543–547
- Yurii Nesterov “Introductory Lectures on Convex Optimization: A Basic Course” Springer Science & Business Media, 2003
- Bernt Øksendal “Stochastic Differential Equations: An Introduction with Applications” Springer Science & Business Media, 2013
- Felix Otto and Cédric Villani “Generalization of an Inequality by Talagrand and Links with the Logarithmic Sobolev Inequality” In Journal of Functional Analysis 173, 2000, pp. 361–400
- Boris T. Polyak “Gradient methods for the minimisation of functionals (in Russian)” In Zhurnal Vychislitel’noi Matematiki i Matematicheskoi Fiziki 3, 1963, pp. 643–653
- Herbert Robbins “An empirical Bayes approach to statistics” In Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability 3.1, 1956, pp. 157–164
- Filippo Santambrogio “Optimal Transport for Applied Mathematicians: Calculus of Variations, PDEs, and Modeling” Birkhäuser/Springer, 2015
- Louis Sharrock, Daniel Dodd and Christopher Nemeth “CoinEM: Tuning-free particle-based variational inference for latent variable models”, 2023
- Michel Talagrand “Transportation cost for Gaussian and other product measures” In Geometric & Functional Analysis 6 Springer, 1996, pp. 587–600
- Nicolás García Trillos, Bamdad Hosseini and Daniel Sanz-Alonso “From Optimization to Sampling Through Gradient Flows” In Notices of the American Mathematical Society 70.6, 2023
- Cédric Villani “Optimal Transport: Old and New” Springer Science & Business Media, 2009