Towards Faster Training of Diffusion Models: An Inspiration of A Consistency Phenomenon (2404.07946v1)

Published 14 Mar 2024 in cs.LG and cs.AI

Abstract: Diffusion models (DMs) are a powerful generative framework that has attracted significant attention in recent years. However, the high computational cost of training DMs limits their practical applications. In this paper, we start from a consistency phenomenon of DMs: we observe that DMs with different initializations, or even different architectures, can produce very similar outputs given the same noise inputs, which is rare in other generative models. We attribute this phenomenon to two factors: (1) the learning difficulty of DMs is lower when the timestep of the noise-prediction diffusion model approaches its upper bound (where the input becomes pure noise), which is the regime in which the structural information of the output is usually generated; and (2) the loss landscape of DMs is highly smooth, which implies that the model tends to converge to similar local minima and exhibit similar behavior patterns. This finding not only reveals the stability of DMs, but also inspires two strategies to accelerate their training. First, we propose a curriculum-learning-based timestep schedule, which leverages the noise rate as an explicit indicator of learning difficulty and gradually reduces the training frequency of the easier timesteps, thus improving training efficiency. Second, we propose a momentum decay strategy, which reduces the momentum coefficient during optimization, since large momentum may hinder convergence speed and cause oscillations due to the smoothness of the loss landscape. We demonstrate the effectiveness of the proposed strategies on various models and show that they can significantly reduce training time and improve the quality of the generated images.
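The curriculum timestep schedule described in the abstract can be illustrated with a short sketch. The Python/PyTorch snippet below is a minimal, hypothetical version: it treats high-noise timesteps (t near the upper bound, where the input is nearly pure noise) as "easier" and gradually lowers their sampling frequency as training progresses. The function names, the linear decay, and the `min_weight` floor are illustrative assumptions, not the paper's exact schedule.

```python
import torch

def curriculum_timestep_weights(step, total_steps, num_timesteps=1000,
                                min_weight=0.1):
    # "Easiness" grows with t: near t = num_timesteps the input is close to
    # pure noise, which the paper identifies as the easier regime to learn.
    t = torch.arange(num_timesteps, dtype=torch.float32)
    easiness = t / (num_timesteps - 1)          # normalized to [0, 1]
    progress = min(step / total_steps, 1.0)     # curriculum progress in [0, 1]
    # Down-weight easy timesteps as training progresses, keeping a small
    # floor (assumed) so every timestep is still visited occasionally.
    weights = 1.0 - progress * easiness * (1.0 - min_weight)
    return weights / weights.sum()

def sample_timesteps(batch_size, step, total_steps, num_timesteps=1000):
    # Draw a batch of training timesteps from the curriculum distribution.
    weights = curriculum_timestep_weights(step, total_steps, num_timesteps)
    return torch.multinomial(weights, batch_size, replacement=True)
```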

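The momentum decay strategy can likewise be sketched as a small optimizer hook. The snippet below linearly anneals Adam's first-moment coefficient (beta1) over training; the linear schedule and the end value are assumptions for illustration rather than the paper's reported settings.

```python
import torch

def decay_momentum(optimizer, step, total_steps, beta_start=0.9, beta_end=0.0):
    # Linearly anneal the first-moment coefficient (beta1) toward beta_end;
    # the schedule shape and end value are illustrative assumptions.
    progress = min(step / total_steps, 1.0)
    beta1 = beta_start + (beta_end - beta_start) * progress
    for group in optimizer.param_groups:
        group["betas"] = (beta1, group["betas"][1])

# Usage sketch (a standard noise-prediction training loop is assumed):
# optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, betas=(0.9, 0.999))
# for step in range(total_steps):
#     t = sample_timesteps(batch_size, step, total_steps)
#     decay_momentum(optimizer, step, total_steps)
#     loss = diffusion_loss(model, batch, t)   # hypothetical loss helper
#     loss.backward(); optimizer.step(); optimizer.zero_grad()
```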
Authors (4)
  1. Tianshuo Xu (9 papers)
  2. Peng Mi (10 papers)
  3. Ruilin Wang (10 papers)
  4. Yingcong Chen (35 papers)
Citations (5)