Tackling the Singularities at the Endpoints of Time Intervals in Diffusion Models (2403.08381v2)
Abstract: Most diffusion models assume that the reverse process follows a Gaussian distribution. However, this approximation has not been rigorously validated, especially at the singularities t=0 and t=1. Improperly handling these singularities leads to an average brightness issue in applications and prevents the generation of images with extreme brightness or darkness. We tackle the singularities from both theoretical and practical perspectives. First, we establish error bounds for the reverse process approximation and show that it retains its Gaussian characteristics at the singular time steps. Based on this theoretical insight, we confirm that the singularity at t=1 is conditionally removable, whereas the singularity at t=0 is an inherent property. Building on these conclusions, we propose SingDiffusion, a novel plug-and-play method for sampling at the initial singular time step. It not only resolves the average brightness issue for a wide range of diffusion models without extra training, but also improves their generation quality, achieving notably lower FID scores.
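To make the plug-and-play idea concrete, below is a minimal sketch of how a dedicated initial-step module could be combined with an unmodified pretrained sampler. This is an illustration under stated assumptions, not the authors' implementation: `sing_predict_x0` stands in for a small network trained to predict the clean image from pure noise at t=1 (where epsilon-prediction is ill-posed because the input carries no signal), while `standard_sampler`, `alpha_bar`, `prompt`, and `t_handoff` are hypothetical placeholders for the existing pipeline, its noise schedule, the conditioning, and the hand-off time.

```python
# Minimal sketch (hypothetical, not the authors' code): replace only the
# t = 1 sampling step, then hand off to an existing diffusion sampler.
import torch

def sample_with_initial_step(sing_predict_x0, standard_sampler,
                             alpha_bar, shape, prompt, t_handoff=0.999):
    # At t = 1 the marginal is pure Gaussian noise, so draw x_1 ~ N(0, I) directly.
    x1 = torch.randn(shape)

    # Dedicated singular step: predict x_0 from pure noise given the prompt,
    # then form the state at the hand-off time with the usual forward-process mixing.
    x0_hat = sing_predict_x0(x1, prompt)
    a = torch.as_tensor(alpha_bar(t_handoff))      # \bar{alpha} at the hand-off time
    x_handoff = a.sqrt() * x0_hat + (1.0 - a).sqrt() * torch.randn(shape)

    # Continue with the unmodified pretrained model/sampler for t < t_handoff.
    return standard_sampler(x_handoff, start_t=t_handoff, prompt=prompt)
```

Because only the first step is replaced and the pretrained model is left untouched, the module can be reused across different diffusion pipelines, which is the sense in which the method is plug-and-play.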