Iterated Denoising Energy Matching for Sampling from Boltzmann Densities (2402.06121v2)

Published 9 Feb 2024 in cs.LG and stat.ML

Abstract: Efficiently generating statistically independent samples from an unnormalized probability distribution, such as equilibrium samples of many-body systems, is a foundational problem in science. In this paper, we propose Iterated Denoising Energy Matching (iDEM), an iterative algorithm that uses a novel stochastic score matching objective leveraging solely the energy function and its gradient -- and no data samples -- to train a diffusion-based sampler. Specifically, iDEM alternates between (I) sampling regions of high model density from a diffusion-based sampler and (II) using these samples in our stochastic matching objective to further improve the sampler. iDEM is scalable to high dimensions as the inner matching objective is simulation-free and requires no MCMC samples. Moreover, by leveraging the fast mode mixing behavior of diffusion, iDEM smooths out the energy landscape enabling efficient exploration and learning of an amortized sampler. We evaluate iDEM on a suite of tasks ranging from standard synthetic energy functions to invariant $n$-body particle systems. We show that the proposed approach achieves state-of-the-art performance on all metrics and trains $2-5\times$ faster, which allows it to be the first method to train using energy on the challenging $55$-particle Lennard-Jones system.


Summary

  • The paper introduces a diffusion-based sampler that eliminates explicit MCMC sampling by leveraging a simulation-free stochastic score matching objective.
  • It employs an iterative process alternating between diffusion sampling and score matching to navigate high-dimensional energy landscapes.
  • Empirical tests on systems such as the 55-particle Lennard-Jones potential demonstrate state-of-the-art performance with training that is 2-5 times faster.

Iterated Denoising Energy Matching for Sampling from Boltzmann Densities: An Expert Analysis

The paper "Iterated Denoising Energy Matching for Sampling from Boltzmann Densities" presents a novel framework aimed at improving the generation of statistically independent samples from unnormalized density functions, a critical operation in many scientific domains. This research leverages Iterated Denoising Energy Matching (DEM), an algorithm that integrates denoising diffusion models into the sampling process, specifically targeting the energy landscapes encountered in high-dimensional systems.

Core Contributions

The paper introduces an approach that trains a diffusion-based sampler using only the energy function and its gradient, without relying on data samples. The primary method, iDEM, operates by alternating between sampling from the diffusion-based model and refining the sampler with a proposed stochastic score matching objective. Notably, iDEM forgoes explicit Markov chain Monte Carlo (MCMC) samples during training, reducing the computational overhead commonly associated with traditional sampling methods.
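To make the matching objective concrete, the sketch below shows one way a Monte Carlo score target and regression loss of this kind can be written in PyTorch. It is illustrative only, not the authors' implementation: `energy` (the unnormalized negative log-density), `score_net` (the learned score network), and the noise level `sigma_t` are assumed inputs, and the paper's exact noise schedule and estimator details are not reproduced.

```python
import math
import torch

def mc_score_estimate(energy, x_t, sigma_t, K=500):
    """Monte Carlo estimate of the score of the noised target at x_t.

    Averages exp(-energy) over K Gaussian perturbations of x_t and
    differentiates the log of that average with respect to x_t.
    """
    x = x_t.detach().requires_grad_(True)            # (B, D)
    eps = torch.randn(K, *x.shape, device=x.device)  # (K, B, D)
    log_w = -energy(x.unsqueeze(0) + sigma_t * eps)  # (K, B)
    log_avg = torch.logsumexp(log_w, dim=0) - math.log(K)
    (grad,) = torch.autograd.grad(log_avg.sum(), x)
    return grad.detach()                             # (B, D)

def dem_loss(score_net, energy, x_t, t, sigma_t):
    """Regress the network's predicted score onto the Monte Carlo target."""
    target = mc_score_estimate(energy, x_t, sigma_t)
    pred = score_net(x_t, t)
    return ((pred - target) ** 2).sum(dim=-1).mean()
```

Working in log space with `logsumexp` keeps the average of `exp(-energy)` numerically stable when energies vary over many orders of magnitude.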

Technical Approach

The iDEM algorithm leverages principles from denoising diffusion probabilistic models (DDPMs), adapted to the challenges of sampling from Boltzmann distributions. It utilizes a two-step iterative process:

  1. Sampling via Diffusion Models: iDEM generates candidate samples by integrating the reverse-time SDE of its diffusion model, starting from noise and moving toward regions of high model density, which lets it navigate the high-dimensional space efficiently.
  2. Stochastic Score Matching Objective: The central innovation is the objective that guides the sampler's refinement. It is entirely simulation-free, depending only on the energy function and its gradient, thereby eliminating the need for the sample data traditionally required by similar methods; a schematic loop combining both steps is sketched after this list.
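The following sketch shows how the two steps can be alternated in a training loop. It is a simplified illustration under stated assumptions, not the paper's code: `sigma` is an assumed noise-schedule function (accepting floats or batched tensors), the reverse SDE is integrated with plain Euler-Maruyama using a crude stand-in for the diffusion coefficient, and `dem_loss` is the function sketched in the previous section.

```python
import torch

def train_idem(score_net, energy, sigma, dim, n_outer=200, n_inner=100,
               batch=128, buffer_size=10_000, n_sde_steps=100, lr=1e-3):
    """Schematic alternation of (I) diffusion sampling and (II) energy matching."""
    opt = torch.optim.Adam(score_net.parameters(), lr=lr)
    buffer = sigma(1.0) * torch.randn(buffer_size, dim)  # initialise from the noisy prior

    for _ in range(n_outer):
        # (I) Sample from the current model by integrating the reverse-time SDE
        #     with Euler-Maruyama (the paper's integrator may differ).
        x = sigma(1.0) * torch.randn(batch, dim)
        dt = 1.0 / n_sde_steps
        with torch.no_grad():
            for k in range(n_sde_steps):
                t = 1.0 - k * dt
                g2 = sigma(t) ** 2  # stand-in for the squared diffusion coefficient
                x = x + g2 * score_net(x, torch.full((batch,), t)) * dt
                x = x + (g2 * dt) ** 0.5 * torch.randn_like(x)
        buffer = torch.cat([x, buffer])[:buffer_size]

        # (II) Inner loop: noise buffer points and regress the score network
        #      onto the energy-based Monte Carlo target (see dem_loss above).
        for _ in range(n_inner):
            x0 = buffer[torch.randint(buffer_size, (batch,))]
            t = torch.rand(batch)
            s = sigma(t).unsqueeze(-1)
            xt = x0 + s * torch.randn_like(x0)
            loss = dem_loss(score_net, energy, xt, t, s)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return score_net
```

In this sketch, a replay buffer of model samples lets the inner matching loop reuse outer-loop samples across many gradient steps.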

The result is lower computational cost and greater efficiency: by exploiting the fast mode-mixing behavior of diffusion, iDEM smooths the energy landscape and explores it without direct simulation, addressing a prominent limitation of previous neural sampling techniques.

Empirical Evaluation

The authors conducted comprehensive empirical evaluations on tasks ranging from synthetic energy functions to invariant $n$-body particle systems, including the challenging $55$-particle Lennard-Jones potential. The results are noteworthy: iDEM achieves state-of-the-art performance across the reported metrics while training 2-5 times faster than existing methods.
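For context on how such samplers are commonly scored, one standard distributional metric is the 2-Wasserstein distance between generated and reference sample sets, which can be computed with the POT (Python Optimal Transport) library. The snippet below is an illustrative helper under that assumption, not the paper's evaluation code; `samples_a` and `samples_b` are assumed to be NumPy arrays of shape (n, d).

```python
import numpy as np
import ot  # POT: Python Optimal Transport

def wasserstein2(samples_a, samples_b):
    """2-Wasserstein distance between two empirical sample sets."""
    a = np.full(len(samples_a), 1.0 / len(samples_a))   # uniform weights
    b = np.full(len(samples_b), 1.0 / len(samples_b))
    cost = ot.dist(samples_a, samples_b, metric="sqeuclidean")
    return float(np.sqrt(ot.emd2(a, b, cost)))
```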

Implications and Future Directions

The implications of iDEM are substantial for both theoretical and practical applications. Theoretically, iDEM advances the understanding of diffusion models in probabilistic inference, particularly in settings that lack predefined datasets. Practically, its ability to efficiently sample high-dimensional probability distributions positions it as a potentially valuable tool across scientific and engineering disciplines such as molecular simulation and statistical physics.

Future developments could explore improvements to the algorithm's robustness and scalability as scientific problems continue to grow in complexity and dimensionality. Integration with adaptive variance reduction techniques and more advanced SDE solvers could further broaden the algorithm's applicability to real-world, large-scale systems.

Conclusion

The research presented in this paper marks a significant step forward in probabilistic sampling by integrating modern machine learning techniques with the domain-specific requirements of Boltzmann distributions. Its simulation-free approach to sampling is likely to influence future research directions and practical implementations, and it makes a strong case for broader adoption of diffusion-based methodologies in demanding scientific computations. Through this work, the authors demonstrate both technical innovation and a keen awareness of the computational demands facing modern scientific research.