Bespoke Non-Stationary Solvers for Fast Sampling of Diffusion and Flow Models

Published 2 Mar 2024 in cs.LG, cs.AI, and cs.CV | arXiv:2403.01329v1

Abstract: This paper introduces Bespoke Non-Stationary (BNS) Solvers, a solver distillation approach that improves the sample efficiency of Diffusion and Flow models. BNS solvers are drawn from a family of non-stationary solvers that provably subsumes existing numerical ODE solvers, and consequently they demonstrate considerable improvement in sample approximation (PSNR) over these baselines. Compared to model distillation, BNS solvers benefit from a tiny parameter space ($<$200 parameters) and fast optimization (two orders of magnitude faster), maintain sample diversity, and, in contrast to previous solver distillation approaches, nearly close the gap to standard distillation methods such as Progressive Distillation in the low-to-medium NFE regime. For example, a BNS solver achieves 45 PSNR / 1.76 FID using 16 NFE on class-conditional ImageNet-64. We experimented with BNS solvers for conditional image generation, text-to-image generation, and text-to-audio generation, observing significant improvement in sample approximation (PSNR) in all cases.
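To make the idea of a non-stationary solver concrete, here is a minimal PyTorch sketch. This is not the paper's implementation: the names (`velocity`, `t_grid`, `coeffs`) are hypothetical, and the actual BNS parameterization differs in detail (it is constructed so that it provably subsumes standard ODE solvers and is distilled against a reference solution). The sketch only illustrates the core notion that each integration step applies its own learned coefficients instead of one fixed update rule shared across steps.

```python
import torch

@torch.no_grad()
def bns_sample(velocity, x, t_grid, coeffs):
    """Integrate dx/dt = velocity(x, t) with a learned non-stationary solver.

    Illustrative sketch, not the paper's exact method.
    velocity : callable (x, t) -> tensor, the pretrained flow/diffusion vector field
    x        : initial noise sample at t = t_grid[0]
    t_grid   : (n+1,) learned time discretization from 0 to 1
    coeffs   : (n, n) learned weights; step i mixes the velocity
               evaluations cached from steps 0..i
    """
    n = len(t_grid) - 1
    vels = []  # cache one velocity evaluation per visited time point
    for i in range(n):
        vels.append(velocity(x, t_grid[i]))
        dt = t_grid[i + 1] - t_grid[i]
        # Non-stationary multistep update: a step-specific linear combination
        # of all cached velocities. A plain Euler step is the special case
        # coeffs[i, i] = 1 with all other weights zero.
        update = sum(coeffs[i, j] * vels[j] for j in range(i + 1))
        x = x + dt * update
    return x
```

Under this illustrative parameterization, 16 NFE would give 17 time points plus roughly 16·17/2 = 136 usable mixing weights, which is on the order of the "$<$200 parameters" figure quoted in the abstract; the time grid and coefficients would be optimized so that few-step samples match a many-step reference trajectory.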
