Accelerating Diffusion Sampling with Optimized Time Steps (2402.17376v3)
Abstract: Diffusion probabilistic models (DPMs) have shown remarkable performance in high-resolution image synthesis, but their sampling efficiency is limited by the typically large number of sampling steps required. Recent advances in high-order numerical ODE solvers for DPMs have enabled high-quality image generation with far fewer sampling steps. Nevertheless, most sampling methods still employ uniform time steps, which is suboptimal when only a small number of steps is used. To address this issue, we propose a general framework for designing an optimization problem that seeks more appropriate time steps for a specific numerical ODE solver for DPMs. The optimization problem minimizes the distance between the ground-truth solution to the ODE and the approximate solution produced by the numerical solver, and it can be solved efficiently with the constrained trust region method in under $15$ seconds. Our extensive experiments on both unconditional and conditional sampling with pixel- and latent-space DPMs demonstrate that, when combined with the state-of-the-art sampling method UniPC, our optimized time steps significantly improve image generation performance in terms of FID scores on datasets such as CIFAR-10 and ImageNet, compared to uniform time steps.
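The core idea of the abstract can be illustrated with a toy analogue (not the paper's actual DPM objective): choose the time steps of a simple ODE solver so that the numerical solution best matches the known exact solution, under ordering constraints, using a constrained trust-region optimizer. Here we use an explicit Euler solver on the scalar ODE dx/dt = -x² and SciPy's `trust-constr` method as a stand-in for the paper's solver and objective; all names and constants below are illustrative.

```python
import numpy as np
from scipy.optimize import minimize, LinearConstraint

# Toy analogue of the paper's framework: optimize the interior time steps of an
# explicit Euler solver for dx/dt = -x^2 on [0, T], whose exact solution is
# x(T) = x0 / (1 + x0 * T). The paper poses an analogous (vector-valued)
# problem for DPM ODE solvers and also solves it with a constrained
# trust-region method.
T, x0, n_steps = 5.0, 1.0, 5

def euler_endpoint(interior):
    # Full time grid: fixed endpoints 0 and T, optimized interior points.
    ts = np.concatenate(([0.0], np.asarray(interior), [T]))
    x = x0
    for h in np.diff(ts):
        x = x + h * (-x * x)  # one explicit Euler step of dx/dt = -x^2
    return x

exact = x0 / (1.0 + x0 * T)

def objective(interior):
    # Squared distance between numerical and ground-truth endpoint.
    return (euler_endpoint(interior) - exact) ** 2

# Enforce 0 < t_1 < ... < t_{n-1} < T via pairwise gaps t_{i+1} - t_i >= 1e-3.
n_int = n_steps - 1
A = np.zeros((n_int - 1, n_int))
for i in range(n_int - 1):
    A[i, i], A[i, i + 1] = -1.0, 1.0
order = LinearConstraint(A, lb=1e-3, ub=np.inf)
bounds = [(1e-3, T - 1e-3)] * n_int

uniform = np.linspace(0.0, T, n_steps + 1)[1:-1]  # uniform interior steps
res = minimize(objective, uniform, method="trust-constr",
               constraints=[order], bounds=bounds)

print("uniform-step error:  ", abs(euler_endpoint(uniform) - exact))
print("optimized-step error:", abs(euler_endpoint(res.x) - exact))
```

In this toy problem the dynamics are fastest near t = 0, so the optimizer clusters steps early, mirroring the paper's observation that uniform time steps are suboptimal at small step budgets.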
- Brian D.O. Anderson. Reverse-time diffusion equation models. Stochastic Processes and their Applications, 12(3):313–326, 1982.
- Analytic-DPM: An analytic estimate of the optimal reverse variance in diffusion probabilistic models. In International Conference on Learning Representations, 2022.
- Align your latents: High-resolution video synthesis with latent diffusion models. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
- Improved analysis of score-based generative modeling: User-friendly bounds under minimal smoothness assumptions. In International Conference on Machine Learning, pages 4735–4763. PMLR, 2023a.
- PixArt-α: Fast training of diffusion transformer for photorealistic text-to-image synthesis, 2023b.
- Score approximation, estimation and distribution recovery of diffusion models on low-dimensional data. arXiv preprint arXiv:2302.07194, 2023c.
- The probability flow ODE is provably fast. arXiv preprint arXiv:2305.11798, 2023d.
- Sampling is as easy as learning the score: theory for diffusion models with minimal data assumptions. In International Conference on Learning Representations, 2023e.
- Valentin De Bortoli. Convergence of denoising diffusion models under the manifold hypothesis. Transactions on Machine Learning Research, 2022.
- ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.
- Diffusion models beat GANs on image synthesis. In Advances in Neural Information Processing Systems, pages 8780–8794, 2021.
- Fast diffusion probabilistic model sampling through the lens of backward error analysis. arXiv preprint arXiv:2304.11446, 2023.
- Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.
- GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Advances in Neural Information Processing Systems, pages 6626–6637, 2017.
- Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems, pages 6840–6851, 2020.
- Cascaded diffusion models for high fidelity image generation. Journal of Machine Learning Research, 23(47):1–33, 2022a.
- Video diffusion models. In Advances in Neural Information Processing Systems, 2022b.
- Gotta go fast when generating data with score-based models. arXiv preprint arXiv:2105.14080, 2021.
- Elucidating the design space of diffusion-based generative models. In Proc. NeurIPS, 2022.
- Soft truncation: A universal training technique of score-based diffusion model for high precision score estimation. In ICML, pages 11201–11228. PMLR, 2022.
- Auto-encoding variational Bayes. In International Conference on Learning Representations, 2014.
- Variational diffusion models. In Advances in Neural Information Processing Systems, 2021.
- The CIFAR-10 dataset. Online: http://www.cs.toronto.edu/kriz/cifar.html, 55, 2014.
- BDDM: Bilateral denoising diffusion models for fast and high-quality speech synthesis. In International Conference on Learning Representations, 2022.
- Convergence for score-based generative modeling with polynomial complexity. Advances in Neural Information Processing Systems, 35:22870–22882, 2022.
- Convergence of score-based generative modeling for general data distributions. In International Conference on Algorithmic Learning Theory, pages 946–985. PMLR, 2023.
- AutoDiffusion: Training-free optimization of time steps and architectures for automated diffusion model acceleration. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 7105–7114, 2023a.
- SciRE-Solver: Accelerating diffusion models sampling by score-integrand solver with recursive difference. 2023b.
- Microsoft COCO: Common objects in context. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13, pages 740–755. Springer, 2014.
- OMS-DPM: Optimizing the model schedule for diffusion probabilistic models. arXiv preprint arXiv:2306.08860, 2023.
- DPM-Solver: A fast ODE solver for diffusion probabilistic model sampling in around 10 steps. In Advances in Neural Information Processing Systems, pages 5775–5787, 2022.
- DPM-Solver++: Fast solver for guided sampling of diffusion probabilistic models, 2023.
- Knowledge distillation in iterative generative models for improved sampling speed. arXiv preprint arXiv:2101.02388, 2021.
- Diff-instruct: A universal approach for transferring knowledge from pre-trained diffusion models, 2023.
- On distillation of guided diffusion models. In NeurIPS 2022 Workshop on Score-Based Methods, 2022.
- GLIDE: towards photorealistic image generation and editing with text-guided diffusion models. In International Conference on Machine Learning (ICML), 2022.
- Improved convergence of score-based diffusion models via prediction-correction. arXiv preprint arXiv:2305.14164, 2023.
- Scalable diffusion models with transformers. arXiv preprint arXiv:2212.09748, 2022.
- Hierarchical text-conditional image generation with clip latents, 2022.
- High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10684–10695, 2022.
- Photorealistic text-to-image diffusion models with deep language understanding. In Advances in Neural Information Processing Systems, 2022.
- Progressive distillation for fast sampling of diffusion models. In International Conference on Learning Representations, 2022.
- Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning, pages 2256–2265. PMLR, 2015.
- Denoising diffusion implicit models. In International Conference on Learning Representations, 2021a.
- Transcormer: Transformer for sentence scoring with sliding language modeling. In Advances in Neural Information Processing Systems, 2022.
- Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations, 2021b.
- Consistency models. arXiv preprint arXiv:2303.01469, 2023.
- Learning to schedule in diffusion probabilistic models. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 2478–2488, 2023a.
- Diffusion-GAN: Training GANs with diffusion. In The Eleventh International Conference on Learning Representations, 2023b.
- Learning fast samplers for diffusion models by differentiating through sample quality. In International Conference on Learning Representations, 2022.
- Towards more accurate diffusion model acceleration with a timestep aligner. arXiv preprint arXiv:2310.09469, 2023.
- Tackling the generative learning trilemma with denoising diffusion GANs. In International Conference on Learning Representations, 2022.
- SA-Solver: Stochastic Adams solver for fast sampling of diffusion models, 2023.
- Fast sampling of diffusion models with exponential integrator. In The Eleventh International Conference on Learning Representations, 2023.
- UniPC: A unified predictor-corrector framework for fast sampling of diffusion models. arXiv preprint arXiv:2302.04867, 2023.