A Complete Recipe for Diffusion Generative Models (2303.01748v2)

Published 3 Mar 2023 in cs.LG, cs.CV, and stat.ML

Abstract: Score-based Generative Models (SGMs) have demonstrated exceptional synthesis outcomes across various tasks. However, the current design landscape of the forward diffusion process remains largely untapped and often relies on physical heuristics or simplifying assumptions. Utilizing insights from the development of scalable Bayesian posterior samplers, we present a complete recipe for formulating forward processes in SGMs, ensuring convergence to the desired target distribution. Our approach reveals that several existing SGMs can be seen as specific manifestations of our framework. Building upon this method, we introduce Phase Space Langevin Diffusion (PSLD), which relies on score-based modeling within an augmented space enriched by auxiliary variables akin to physical phase space. Empirical results exhibit the superior sample quality and improved speed-quality trade-off of PSLD compared to various competing approaches on established image synthesis benchmarks. Remarkably, PSLD achieves sample quality akin to state-of-the-art SGMs (FID: 2.10 for unconditional CIFAR-10 generation). Lastly, we demonstrate the applicability of PSLD in conditional synthesis using pre-trained score networks, offering an appealing alternative as an SGM backbone for future advancements. Code and model checkpoints can be accessed at https://github.com/mandt-lab/PSLD.
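
For context, the "complete recipe" the abstract alludes to originates in stochastic-gradient MCMC (Ma et al., 2015): every SDE with stationary distribution proportional to \exp(-H(z)) can be written as

\[ \mathrm{d}z = \big[-\big(D(z) + Q(z)\big)\nabla H(z) + \Gamma(z)\big]\,\mathrm{d}t + \sqrt{2D(z)}\,\mathrm{d}W_t, \qquad \Gamma_i(z) = \sum_j \frac{\partial}{\partial z_j}\big(D_{ij}(z) + Q_{ij}(z)\big), \]

where D(z) is positive semidefinite (diffusion) and Q(z) is skew-symmetric (curl); \Gamma vanishes when D and Q are constant. Choosing an augmented state z = (x, v) with a quadratic H and suitable constant D and Q yields phase-space forward diffusions of the kind PSLD builds on. The sketch below is a minimal illustration, not the paper's exact PSLD parameterization: it simulates an underdamped Langevin forward process with H(x, v) = (||x||^2 + ||v||^2)/2, whose stationary distribution is a standard Gaussian over both x and v. The friction coefficient gamma, step size, and step count are assumed illustrative values.

    import numpy as np

    def forward_phase_space_diffusion(x0, gamma=2.0, dt=1e-3, n_steps=1000, seed=0):
        """Euler-Maruyama simulation of an underdamped Langevin forward SDE:
            dx = v dt
            dv = (-x - gamma * v) dt + sqrt(2 * gamma) dW
        The augmented state (x, v) converges to a standard Gaussian."""
        rng = np.random.default_rng(seed)
        x = np.asarray(x0, dtype=float).copy()
        # Momentum initialized at zero for simplicity; actual phase-space
        # schemes may instead sample it from a narrow Gaussian.
        v = np.zeros_like(x)
        for _ in range(n_steps):
            noise = np.sqrt(2.0 * gamma * dt) * rng.standard_normal(x.shape)
            # Explicit Euler-Maruyama step: both updates use the old (x, v).
            x, v = x + v * dt, v + (-x - gamma * v) * dt + noise
        return x, v

    # Example: diffuse a small batch of 2-D "data" points toward the Gaussian prior.
    x_T, v_T = forward_phase_space_diffusion(np.array([[1.0, -0.5], [0.3, 2.0]]))

Reversing this process for generation would require a score network over the augmented state (x, v), which is where the paper's score-based modeling in phase space comes in.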
