
Fit Like You Sample: Sample-Efficient Generalized Score Matching from Fast Mixing Diffusions (2306.09332v3)

Published 15 Jun 2023 in cs.LG and cs.DS

Abstract: Score matching is an approach to learning probability distributions parametrized up to a constant of proportionality (e.g., Energy-Based Models). The idea is to fit the score of the distribution rather than the likelihood, thus avoiding the need to evaluate the constant of proportionality. While there is a clear algorithmic benefit, the statistical "cost" can be steep: recent work by Koehler et al. (2022) showed that for distributions with poor isoperimetric properties (a large Poincaré or log-Sobolev constant), score matching is substantially less statistically efficient than maximum likelihood. However, many natural, realistic distributions -- e.g., multimodal distributions as simple as a mixture of two Gaussians in one dimension -- have a poor Poincaré constant. In this paper, we show a close connection between the mixing time of a broad class of Markov processes with generator $\mathcal{L}$ and an appropriately chosen generalized score matching loss that tries to fit $\frac{\mathcal{L} p}{p}$. This allows us to adapt techniques for speeding up Markov chains to construct better score matching losses. In particular, "preconditioning" the diffusion can be translated to an appropriate "preconditioning" of the score loss. Lifting the chain by adding a temperature, as in simulated tempering, can be shown to result in a Gaussian-convolution annealed score matching loss, similar to Song and Ermon (2019). Moreover, we show that if the distribution being learned is a finite mixture of Gaussians in $d$ dimensions with a shared covariance, the sample complexity of annealed score matching is polynomial in the ambient dimension, the diameter of the means, and the smallest and largest eigenvalues of the covariance -- obviating the Poincaré-constant-based lower bounds for the basic score matching loss shown in Koehler et al. (2022).
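To make the annealed objective concrete, below is a minimal PyTorch sketch of Gaussian-convolution annealed (denoising) score matching in the spirit of Song and Ermon (2019), run on the two-Gaussian mixture the abstract highlights as a hard case for vanilla score matching. The network architecture, the noise ladder, and all hyperparameters here are illustrative assumptions, not the paper's implementation.

```python
import math
import torch

# A tiny score network conditioned on the noise level; the architecture is
# arbitrary and purely illustrative.
class ScoreNet(torch.nn.Module):
    def __init__(self, dim: int, hidden: int = 128):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim + 1, hidden), torch.nn.SiLU(),
            torch.nn.Linear(hidden, hidden), torch.nn.SiLU(),
            torch.nn.Linear(hidden, dim),
        )

    def forward(self, x, sigma):
        # Condition on log(sigma) so a single network serves every noise level.
        return self.net(torch.cat([x, sigma.log()], dim=-1))


def annealed_dsm_loss(model, x, sigmas):
    """Denoising score matching averaged over a ladder of Gaussian noise
    levels. For x_noisy = x + sigma * z with z ~ N(0, I), the regression
    target -z / sigma is the score of the Gaussian transition kernel, so
    minimizing this loss fits the score of the sigma-smoothed data density."""
    idx = torch.randint(len(sigmas), (x.shape[0],))
    sigma = sigmas[idx].unsqueeze(-1)            # (batch, 1)
    z = torch.randn_like(x)
    pred = model(x + sigma * z, sigma)
    target = -z / sigma
    # The sigma^2 weighting keeps every noise level on a comparable scale.
    return ((sigma * (pred - target)) ** 2).sum(dim=-1).mean()


# Toy data: an even mixture of two 1-D Gaussians with well-separated means,
# exactly the kind of multimodal target with a poor Poincaré constant.
n = 4096
signs = torch.randint(0, 2, (n, 1)) * 2 - 1      # +1 or -1, each w.p. 1/2
x = 4.0 * signs + torch.randn(n, 1)

model = ScoreNet(dim=1)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
# Geometric ladder of noise levels, from heavy smoothing down to almost none.
sigmas = torch.logspace(math.log10(5.0), math.log10(0.05), steps=10)

for step in range(500):
    opt.zero_grad()
    loss = annealed_dsm_loss(model, x, sigmas)
    loss.backward()
    opt.step()
```

The noise ladder plays a role analogous to the temperatures in simulated tempering: at large sigma the smoothed density is nearly unimodal and its score is easy to fit, which is what lets the annealed loss sidestep the poor Poincaré constant of the unsmoothed mixture.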

References (56)
  1. Nonlinear Bayesian estimation using Gaussian sum approximations. IEEE Transactions on Automatic Control, 17(4):439–448, 1972.
  2. Diffusions hypercontractives. In Séminaire de Probabilités XIX 1983/84: Proceedings, pages 177–206. Springer, 2006.
  3. Robustly learning mixtures of k arbitrary Gaussians. In Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing, pages 1234–1247, 2022.
  4. Statistical guarantees for the EM algorithm: From population to sample-based analysis. 2017.
  5. Minimum Stein discrepancy estimators. Advances in Neural Information Processing Systems, 32, 2019.
  6. Mario Bebendorf. A note on the Poincaré inequality for convex domains. Zeitschrift für Analysis und ihre Anwendungen, 22(4):751–756, 2003.
  7. Polynomial learning of distribution families. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, pages 103–112. IEEE, 2010.
  8. Convex Optimization. Cambridge University Press, 2004.
  9. On explicit $L^2$-convergence rate estimate for underdamped Langevin dynamics. Archive for Rational Mechanics and Analysis, 247(5):90, 2023.
  10. Dimension-free log-Sobolev inequalities for mixture distributions. Journal of Functional Analysis, 281(11):109236, 2021.
  11. Improved analysis of score-based generative modeling: User-friendly bounds under minimal smoothness assumptions. arXiv preprint arXiv:2211.01916, 2022.
  12. Optimal convergence rate of Hamiltonian Monte Carlo for strongly logconcave distributions. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2019.
  13. G. Constantine and T. Savits. A multivariate Faà di Bruno formula with applications. Transactions of the American Mathematical Society, 348(2):503–520, 1996.
  14. Sanjoy Dasgupta. Learning mixtures of Gaussians. In 40th Annual Symposium on Foundations of Computer Science (Cat. No. 99CB37039), pages 634–644. IEEE, 1999.
  15. Ten steps of EM suffice for mixtures of two Gaussians. In Conference on Learning Theory, pages 704–710. PMLR, 2017.
  16. Geometric bounds for eigenvalues of Markov chains. The Annals of Applied Probability, pages 36–61, 1991.
  17. Score-based generative modeling with critically-damped Langevin diffusion. arXiv preprint arXiv:2112.07068, 2021.
  18. Parallel tempering: Theory, applications, and new perspectives. Physical Chemistry Chemical Physics, 7(23):3910–3916, 2005.
  19. Simulated tempering Langevin Monte Carlo II: An improved proof using soft Markov chain decomposition. arXiv preprint arXiv:1812.00793, 2018.
  20. Riemann manifold Langevin and Hamiltonian Monte Carlo methods. Journal of the Royal Statistical Society Series B: Statistical Methodology, 73(2):123–214, 2011.
  21. A two-scale approach to logarithmic Sobolev inequalities and the hydrodynamic limit. In Annales de l'IHP Probabilités et statistiques, volume 45, pages 302–351, 2009.
  22. Björn Holmquist. The d-variate vector Hermite polynomial of order k. Linear Algebra and its Applications, 237:155–190, 1996.
  23. Mixture models, robustness, and sum of squares proofs. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pages 1021–1034, 2018.
  24. Exchange Monte Carlo method and application to spin glass simulations. Journal of the Physical Society of Japan, 65(6):1604–1608, 1996.
  25. Aapo Hyvärinen. Estimation of non-normalized statistical models by score matching. Journal of Machine Learning Research, 6(4), 2005.
  26. Score function features for discriminative learning: Matrix and tensor framework. arXiv preprint arXiv:1412.2863, 2014.
  27. Statistical efficiency of score matching: The view from isoperimetry. arXiv preprint arXiv:2210.00726, 2022.
  28. Beyond log-concavity: Provable guarantees for sampling multi-modal distributions using simulated tempering Langevin Monte Carlo. Advances in Neural Information Processing Systems, 31, 2018.
  29. Convergence for score-based generative modeling with polynomial complexity. arXiv preprint arXiv:2206.06227, 2022.
  30. Convergence of score-based generative modeling for general data distributions. In International Conference on Algorithmic Learning Theory, pages 946–985. PMLR, 2023.
  31. Tony Lelièvre. A general two-scale criteria for logarithmic Sobolev inequalities. Journal of Functional Analysis, 256(7):2211–2221, 2009.
  32. Preconditioned stochastic gradient Langevin dynamics for deep neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 30, 2016.
  33. Siwei Lyu. Interpretation and generalization of score matching. arXiv preprint arXiv:1205.2629, 2012.
  34. A complete recipe for stochastic gradient MCMC. Advances in Neural Information Processing Systems, 28, 2015.
  35. Markov chain decomposition for convergence rate analysis. Annals of Applied Probability, pages 581–606, 2002.
  36. Simulated tempering: A new Monte Carlo scheme. Europhysics Letters, 19(6):451, 1992.
  37. Estimating high order gradients of the data distribution by denoising. Advances in Neural Information Processing Systems, 34:25359–25369, 2021.
  38. Fast convergence for Langevin diffusion with manifold structure. arXiv preprint arXiv:2002.05576, 2020.
  39. Settling the polynomial learnability of mixtures of Gaussians. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, pages 93–102. IEEE, 2010.
  40. Radford M. Neal. Sampling from multimodal distributions using tempered transitions. Statistics and Computing, 6:353–366, 1996.
  41. A new criterion for the logarithmic Sobolev inequality and two applications. Journal of Functional Analysis, 243(1):121–157, 2007.
  42. Provable benefits of score matching. arXiv preprint arXiv:2306.01993, 2023.
  43. Diffusions, Markov Processes and Martingales: Volume 2, Itô Calculus, volume 2. Cambridge University Press, 2000.
  44. Yasumasa Saisho. Stochastic differential equations for multi-dimensional domain with reflecting boundary. Probability Theory and Related Fields, 74(3):455–477, 1987.
  45. Learning mixtures of arbitrary Gaussians. In Proceedings of the Thirty-Third Annual ACM Symposium on Theory of Computing, pages 247–257, 2001.
  46. Stochastic quasi-Newton Langevin Monte Carlo. In International Conference on Machine Learning, pages 642–651. PMLR, 2016.
  47. Generative modeling by estimating gradients of the data distribution. Advances in Neural Information Processing Systems, 32, 2019.
  48. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456, 2020.
  49. Replica Monte Carlo simulation of spin-glasses. Physical Review Letters, 57(21):2607, 1986.
  50. Henry Teicher. Identifiability of finite mixtures. The Annals of Mathematical Statistics, pages 1265–1269, 1963.
  51. Alexis Akira Toda. Operator reverse monotonicity of the inverse. The American Mathematical Monthly, 118(1):82–83, 2011.
  52. Aad W. Van der Vaart. Asymptotic Statistics, volume 3. Cambridge University Press, 2000.
  53. Sufficient conditions for torpid mixing of parallel and simulated tempering. 2009a.
  54. Conditions for rapid mixing of parallel and simulated tempering on multimodal distributions. 2009b.
  55. On the identifiability of finite mixtures. The Annals of Mathematical Statistics, 39(1):209–214, 1968.
  56. Sequential Markov chain Monte Carlo. arXiv preprint arXiv:1308.3861, 2013.