On the Trajectory Regularity of ODE-based Diffusion Sampling (2405.11326v1)
Abstract: Diffusion-based generative models use stochastic differential equations (SDEs) and their equivalent ordinary differential equations (ODEs) to establish a smooth connection between a complex data distribution and a tractable prior distribution. In this paper, we identify several intriguing trajectory properties in the ODE-based sampling process of diffusion models. We characterize an implicit denoising trajectory and discuss its vital role in forming the coupled sampling trajectory, which exhibits a strong shape regularity regardless of the generated content. We also describe a dynamic programming-based scheme that makes the time schedule used in sampling better fit the underlying trajectory structure. This simple strategy requires minimal modification to any given ODE-based numerical solver and incurs negligible computational cost, while delivering superior performance in image generation, especially with only $5{\sim}10$ function evaluations.
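The dynamic programming-based schedule search mentioned in the abstract can be sketched generically: given a fine grid of candidate time points and a pairwise cost for jumping directly from one grid point to another (e.g., an estimate of the local deviation from the underlying trajectory), select a fixed number of jumps from the start to the end of the grid that minimizes the summed cost. The function name, the cost-matrix interface, and the choice of cost are illustrative assumptions, not the paper's actual implementation.

```python
def optimal_schedule(cost, num_steps):
    """Pick `num_steps` jumps over a fine time grid (indices of `cost`),
    minimizing total per-jump cost via dynamic programming.

    cost[i][j] (i < j): penalty for stepping directly from grid point i
    to grid point j. Returns (total_cost, schedule), where schedule is
    the list of chosen grid indices from 0 to len(cost) - 1.
    """
    n = len(cost)
    INF = float("inf")
    # dp[k][j]: minimal cost to reach grid point j in exactly k jumps
    dp = [[INF] * n for _ in range(num_steps + 1)]
    prev = [[-1] * n for _ in range(num_steps + 1)]
    dp[0][0] = 0.0
    for k in range(1, num_steps + 1):
        for j in range(1, n):
            for i in range(j):
                if dp[k - 1][i] + cost[i][j] < dp[k][j]:
                    dp[k][j] = dp[k - 1][i] + cost[i][j]
                    prev[k][j] = i
    # Backtrack from the terminal grid point to recover the schedule.
    schedule = [n - 1]
    k, j = num_steps, n - 1
    while k > 0:
        j = prev[k][j]
        schedule.append(j)
        k -= 1
    schedule.reverse()
    return dp[num_steps][n - 1], schedule
```

With a convex per-jump cost such as `cost[i][j] = (j - i) ** 2`, the search favors evenly spaced steps; a trajectory-derived cost would instead concentrate steps where the sampling trajectory bends most. The exhaustive DP runs in O(num_steps · n²), which is negligible next to the network evaluations it schedules.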