On the Trajectory Regularity of ODE-based Diffusion Sampling (2405.11326v1)

Published 18 May 2024 in cs.LG and cs.CV

Abstract: Diffusion-based generative models use stochastic differential equations (SDEs) and their equivalent ordinary differential equations (ODEs) to establish a smooth connection between a complex data distribution and a tractable prior distribution. In this paper, we identify several intriguing trajectory properties in the ODE-based sampling process of diffusion models. We characterize an implicit denoising trajectory and discuss its vital role in forming the coupled sampling trajectory, which exhibits a strong shape regularity regardless of the generated content. We also describe a dynamic programming-based scheme that makes the time schedule used in sampling better fit the underlying trajectory structure. This simple strategy requires minimal modification to any given ODE-based numerical solver and incurs negligible computational cost, while delivering superior performance in image generation, especially with only $5\sim 10$ function evaluations.
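
The abstract's dynamic-programming scheme can be pictured as a shortest-path search over a dense candidate time grid: each candidate timestep is a node, a solver step from $t_i$ to $t_j$ is an edge carrying a local-error cost, and a $K$-step schedule is a minimum-cost path with exactly $K$ edges from $t_{\max}$ to $t_{\min}$. The sketch below illustrates that search under stated assumptions; the names (`dp_time_schedule`, `step_cost`) and the toy log-time cost in the usage example are illustrative placeholders, not the paper's actual cost, which it derives from the trajectory regularity it analyzes.

```python
import numpy as np

def dp_time_schedule(t_grid, step_cost, n_steps):
    """Pick an n_steps-step schedule from a dense, descending candidate
    grid t_grid (t_grid[0] = t_max, ..., t_grid[-1] = t_min) by
    minimizing the summed per-step cost with dynamic programming.

    step_cost(i, j) is any proxy for the local error of one solver
    step from t_grid[i] to t_grid[j] (an assumption here, not the
    paper's definition).
    """
    m = len(t_grid)
    INF = float("inf")
    # best[k, j]: minimal total cost of reaching grid index j in k steps.
    best = np.full((n_steps + 1, m), INF)
    prev = np.full((n_steps + 1, m), -1, dtype=int)
    best[0, 0] = 0.0
    for k in range(1, n_steps + 1):
        for j in range(1, m):
            for i in range(j):  # one solver step i -> j
                if best[k - 1, i] == INF:
                    continue
                c = best[k - 1, i] + step_cost(i, j)
                if c < best[k, j]:
                    best[k, j] = c
                    prev[k, j] = i
    assert best[n_steps, m - 1] < INF, "n_steps too large for this grid"
    # Backtrack from the terminal time t_min (grid index m - 1).
    schedule, idx = [t_grid[m - 1]], m - 1
    for k in range(n_steps, 0, -1):
        idx = prev[k, idx]
        schedule.append(t_grid[idx])
    return schedule[::-1]  # descending: t_max, ..., t_min

# Toy usage: penalize squared jumps in log-time as an assumed error
# proxy; the resulting 7-step schedule is near-uniform in log-time.
ts = np.linspace(80.0, 0.002, 200)  # dense EDM-style sigma grid
print(dp_time_schedule(ts, lambda i, j: np.log(ts[i] / ts[j]) ** 2, 7))
```

Any per-step cost works in this generic DP; plugging in a cost estimated from the observed trajectory structure, in the spirit the abstract describes, turns it into a schedule optimizer while leaving the underlying ODE solver itself untouched.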
