Towards Accurate Guided Diffusion Sampling through Symplectic Adjoint Method (2312.12030v1)

Published 19 Dec 2023 in cs.CV and cs.AI

Abstract: Training-free guided sampling in diffusion models leverages off-the-shelf pre-trained networks, such as an aesthetic evaluation model, to guide the generation process. Current training-free guided sampling algorithms obtain the guidance energy function based on a one-step estimate of the clean image. However, since the off-the-shelf pre-trained networks are trained on clean images, the one-step estimation procedure of the clean image may be inaccurate, especially in the early stages of the generation process in diffusion models. This causes the guidance in the early time steps to be inaccurate. To overcome this problem, we propose Symplectic Adjoint Guidance (SAG), which calculates the gradient guidance in two inner stages. First, SAG estimates the clean image via $n$ function calls, where $n$ serves as a flexible hyperparameter that can be tailored to meet specific image quality requirements. Second, SAG uses the symplectic adjoint method to obtain the gradients accurately and memory-efficiently. Extensive experiments demonstrate that SAG generates images of higher quality than the baselines in both guided image and video generation tasks.
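As a rough illustration of the two-stage idea in the abstract — (1) estimate the clean sample via $n$ function calls instead of one, (2) steer the sampling update with the gradient of a guidance energy evaluated at that estimate — here is a toy one-dimensional sketch. Everything in it (the denoiser, the quadratic guidance energy, the linear schedule) is a hypothetical stand-in, not the paper's networks, and the symplectic adjoint machinery SAG uses to backpropagate through the $n$ calls memory-efficiently is replaced here by an analytic derivative.

```python
def denoiser(x, t):
    # Hypothetical one-step denoiser: at noise level t it pulls the
    # sample toward 0 (a stand-in for a pretrained diffusion network).
    return x * (1.0 - t)

def estimate_clean(x, t, n):
    # Stage 1 of SAG-style guidance: refine the clean-sample estimate
    # with n denoiser calls rather than a single one-step estimate.
    dt = t / n
    t_cur = t
    for _ in range(n):
        x = x + (denoiser(x, t_cur) - x) * dt / t_cur
        t_cur -= dt
    return x

def guidance_grad(x0, target):
    # Analytic gradient of the toy energy E(x0) = 0.5 * (x0 - target)^2,
    # standing in for an off-the-shelf evaluator (e.g. aesthetic score);
    # the paper instead obtains this gradient via the symplectic adjoint.
    return x0 - target

def sample(x, target, steps=10, n=4, scale=0.2):
    # Guided reverse process: at each step, estimate the clean sample
    # with n calls, then nudge the update along the guidance gradient.
    dt = 1.0 / steps
    t = 1.0
    for _ in range(steps):
        x0_hat = estimate_clean(x, t, n)          # stage 1
        g = guidance_grad(x0_hat, target)         # stage 2
        x = x + (denoiser(x, t) - x) * dt / t - scale * g
        t -= dt
    return x
```

With `scale=0.0` the loop reduces to plain unguided sampling, so the effect of the guidance term can be checked by comparing the two runs against the target.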

