
Boundary Guided Learning-Free Semantic Control with Diffusion Models (2302.08357v3)

Published 16 Feb 2023 in cs.CV

Abstract: In the existing literature, applying pre-trained generative denoising diffusion models (DDMs) to downstream tasks such as semantic image editing usually requires either fine-tuning the DDMs or learning auxiliary editing networks. In this work, we present BoundaryDiffusion, a method for efficient, effective, and lightweight semantic control with frozen pre-trained DDMs that learns no extra networks. As one of the first learning-free diffusion editing works, we begin by building a comprehensive understanding of the intermediate high-dimensional latent spaces, analyzing their probabilistic and geometric behaviors along the Markov chain both theoretically and empirically. We then identify the critical step for editing in the denoising trajectory, which characterizes the convergence of a pre-trained DDM, and introduce an automatic method to search for it. Finally, in contrast to the conventional view that DDMs have relatively poor semantic behaviors, we prove that this critical latent space already exhibits semantic subspace boundaries at the generic level in unconditional DDMs, which allows controllable manipulation by guiding the denoising trajectory towards the target boundary via a single-step operation. We conduct extensive experiments on multiple DDM architectures (DDPM, iDDPM) and datasets (CelebA, CelebA-HQ, LSUN-church, LSUN-bedroom, AFHQ-dog) at different resolutions (64, 256), achieving superior or state-of-the-art performance in various task scenarios (semantic image editing, text-based editing, unconditional semantic control), demonstrating the method's effectiveness.
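To make the single-step boundary guidance concrete, the sketch below shows one plausible way to push an intermediate diffusion latent across a semantic hyperplane before handing it back to the frozen denoiser. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name boundary_guided_edit, the boundary normal `normal`, the offset `bias`, and the `strength` parameter are all hypothetical, and the boundary is assumed to come from a linear classifier (e.g., an SVM) fit on latents collected at the critical mixing step.

```python
# Minimal sketch of single-step semantic boundary guidance on an intermediate
# diffusion latent x_t (NOT the authors' implementation; names and values are
# illustrative assumptions).
import torch


def boundary_guided_edit(x_t: torch.Tensor,
                         normal: torch.Tensor,
                         bias: float = 0.0,
                         strength: float = 2.0) -> torch.Tensor:
    """Push x_t toward the target side of the hyperplane {z : <n, z> + bias = 0}.

    `normal` is assumed to be the weight vector of a linear boundary (e.g., an
    SVM) fit on latents at the critical mixing step; `strength` is the desired
    signed distance of the edited latent from that boundary.
    """
    flat = x_t.flatten()
    n = normal.flatten()
    n = n / n.norm()                      # work with a unit normal
    signed_dist = flat @ n + bias         # current signed distance to boundary
    # Single-step move along the normal so the result lands `strength` units
    # past the boundary on the positive (target-attribute) side.
    shift = (strength - signed_dist) * n
    return (flat + shift).reshape(x_t.shape)


if __name__ == "__main__":
    torch.manual_seed(0)
    x_t = torch.randn(3, 64, 64)          # hypothetical intermediate latent
    normal = torch.randn(x_t.numel())     # hypothetical boundary normal
    edited = boundary_guided_edit(x_t, normal)
    print(edited.shape)                   # torch.Size([3, 64, 64])
```

In the pipeline described by the abstract, the edited latent would then continue through the standard reverse denoising steps of the frozen pre-trained DDM to produce the manipulated image, with no additional networks or fine-tuning involved.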
