One-Step Diffusion Distillation via Deep Equilibrium Models (2401.08639v1)

Published 12 Dec 2023 in cs.CV and cs.LG

Abstract: Diffusion models excel at producing high-quality samples but naively require hundreds of iterations, prompting multiple attempts to distill the generation process into a faster network. However, many existing approaches suffer from a variety of challenges: the process for distillation training can be complex, often requiring multiple training stages, and the resulting models perform poorly when utilized in single-step generative applications. In this paper, we introduce a simple yet effective means of distilling diffusion models directly from initial noise to the resulting image. Of particular importance to our approach is to leverage a new Deep Equilibrium (DEQ) model as the distilled architecture: the Generative Equilibrium Transformer (GET). Our method enables fully offline training with just noise/image pairs from the diffusion model while achieving superior performance compared to existing one-step methods on comparable training budgets. We demonstrate that the DEQ architecture is crucial to this capability, as GET matches a $5\times$ larger ViT in terms of FID scores while striking a critical balance of computational cost and image quality. Code, checkpoints, and datasets are available.
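
Below is a minimal, hypothetical sketch of the training setup the abstract describes: a DEQ-style student is fit fully offline on (noise, image) pairs pre-generated by the teacher diffusion model, and its forward pass is a fixed-point iteration over an equilibrium block conditioned on the input noise. All names here (ToyEquilibriumLayer, ToyDEQStudent, noise_bank, image_bank) are illustrative stand-ins; the actual GET is a transformer-based DEQ, and the paper's solver, loss, and injection details differ.

```python
# Toy sketch of offline one-step distillation with a DEQ-style student.
# Hypothetical names and toy MLP blocks; the paper's GET is a transformer DEQ.
import torch
import torch.nn as nn

class ToyEquilibriumLayer(nn.Module):
    """Stand-in equilibrium block f(z, x), iterated to a fixed point z* = f(z*, x)."""
    def __init__(self, dim):
        super().__init__()
        self.fc_z = nn.Linear(dim, dim)   # transforms the current equilibrium state z
        self.fc_x = nn.Linear(dim, dim)   # injects the conditioning input (the noise)
        self.norm = nn.LayerNorm(dim)

    def forward(self, z, x):
        return self.norm(torch.tanh(self.fc_z(z) + self.fc_x(x)))

class ToyDEQStudent(nn.Module):
    """One-step generator: solve the fixed point by iteration, then read out an image."""
    def __init__(self, dim, iters=16):
        super().__init__()
        self.f = ToyEquilibriumLayer(dim)
        self.readout = nn.Linear(dim, dim)
        self.iters = iters

    def forward(self, noise):
        z = torch.zeros_like(noise)
        with torch.no_grad():              # cheap forward solve without tracking gradients
            for _ in range(self.iters - 1):
                z = self.f(z, noise)
        z = self.f(z, noise)               # one differentiable step (cheap DEQ-style gradient)
        return self.readout(z)

# Offline distillation data: (noise, image) pairs sampled once from the teacher
# diffusion model. Random tensors stand in for real pre-generated pairs here.
dim, n_pairs = 64, 1024
noise_bank = torch.randn(n_pairs, dim)
image_bank = torch.randn(n_pairs, dim)

student = ToyDEQStudent(dim)
opt = torch.optim.AdamW(student.parameters(), lr=1e-3)

for step in range(100):
    idx = torch.randint(0, n_pairs, (32,))
    pred = student(noise_bank[idx])
    # The paper uses a perceptual-style reconstruction loss; plain MSE is used
    # here only to keep the sketch self-contained.
    loss = nn.functional.mse_loss(pred, image_bank[idx])
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The single differentiable solver step above stands in for the inexpensive gradient approximations commonly used to train DEQs; it is meant only to convey the structure of noise-to-image distillation through an equilibrium model, not the paper's exact training procedure.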
