Guidance with Spherical Gaussian Constraint for Conditional Diffusion (2402.03201v4)

Published 5 Feb 2024 in cs.LG

Abstract: Recent advances in diffusion models attempt to handle conditional generative tasks by utilizing a differentiable loss function for guidance without the need for additional training. While these methods achieved certain success, they often compromise on sample quality and require small guidance step sizes, leading to longer sampling processes. This paper reveals that the fundamental issue lies in the manifold deviation during the sampling process when loss guidance is employed. We theoretically show the existence of manifold deviation by establishing a certain lower bound for the estimation error of the loss guidance. To mitigate this problem, we propose Diffusion with Spherical Gaussian constraint (DSG), drawing inspiration from the concentration phenomenon in high-dimensional Gaussian distributions. DSG effectively constrains the guidance step within the intermediate data manifold through optimization and enables the use of larger guidance steps. Furthermore, we present a closed-form solution for DSG denoising with the Spherical Gaussian constraint. Notably, DSG can seamlessly integrate as a plugin module within existing training-free conditional diffusion methods. Implementing DSG merely involves a few lines of additional code with almost no extra computational overhead, yet it leads to significant performance improvements. Comprehensive experimental results in various conditional generation tasks validate the superiority and adaptability of DSG in terms of both sample quality and time efficiency.
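The abstract's key idea can be illustrated with a short sketch: instead of nudging the denoising mean by a raw gradient step (which can push samples off the intermediate data manifold), DSG constrains the guided sample to the sphere of radius σ√n around the predicted mean, where high-dimensional Gaussian mass concentrates. The function below is a minimal, hypothetical NumPy illustration of that projection idea, not the authors' reference implementation; the names `dsg_step` and `guidance_scale`, and the exact way the noise and gradient directions are mixed, are assumptions for illustration.

```python
import numpy as np

def dsg_step(mu, sigma, grad_loss, guidance_scale=0.5, rng=None):
    """One guided denoising step with a Spherical Gaussian constraint (sketch).

    mu         : mean of p(x_{t-1} | x_t) predicted by the diffusion model
    sigma      : scalar std of that Gaussian
    grad_loss  : gradient of the differentiable guidance loss w.r.t. x
    """
    rng = np.random.default_rng() if rng is None else rng
    n = mu.size
    # In high dimensions, samples from N(mu, sigma^2 I) concentrate
    # on the sphere of radius sigma * sqrt(n) around mu.
    radius = sigma * np.sqrt(n)
    noise = rng.standard_normal(mu.shape)
    # Mix the stochastic direction with the descent direction of the loss.
    direction = (1.0 - guidance_scale) * noise - guidance_scale * grad_loss
    direction = direction / np.linalg.norm(direction)
    # Project the guided step back onto the concentration sphere,
    # keeping x_{t-1} on the intermediate data manifold.
    return mu + radius * direction
```

By construction, the returned sample always lies at distance σ√n from the predicted mean regardless of the gradient's magnitude, which is why larger guidance steps do not drive the trajectory off the manifold; this matches the abstract's claim that DSG needs only a few extra lines on top of an existing sampler.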

Authors (6)
  1. Lingxiao Yang
  2. Shutong Ding
  3. Yifan Cai
  4. Jingyi Yu
  5. Jingya Wang
  6. Ye Shi
Citations (18)
