Drag Your Noise: Interactive Point-based Editing via Diffusion Semantic Propagation (2404.01050v1)

Published 1 Apr 2024 in cs.CV, cs.GR, cs.HC, and cs.LG

Abstract: Point-based interactive editing serves as an essential tool to complement the controllability of existing generative models. A concurrent work, DragDiffusion, updates the diffusion latent map in response to user inputs, causing global latent map alterations. This results in imprecise preservation of the original content and unsuccessful editing due to gradient vanishing. In contrast, we present DragNoise, offering robust and accelerated editing without retracing the latent map. The core rationale of DragNoise lies in utilizing the predicted noise output of each U-Net as a semantic editor. This approach is grounded in two critical observations: firstly, the bottleneck features of U-Net inherently possess semantically rich features ideal for interactive editing; secondly, high-level semantics, established early in the denoising process, show minimal variation in subsequent stages. Leveraging these insights, DragNoise edits diffusion semantics in a single denoising step and efficiently propagates these changes, ensuring stability and efficiency in diffusion editing. Comparative experiments reveal that DragNoise achieves superior control and semantic retention, reducing the optimization time by over 50% compared to DragDiffusion. Our codes are available at https://github.com/haofengl/DragNoise.
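To make the mechanism concrete, here is a minimal PyTorch-style sketch of the two ideas the abstract describes: optimizing the semantically rich U-Net bottleneck feature at a single denoising step, then propagating the edited feature through the remaining steps instead of retracing the latent map. This is an illustration under stated assumptions, not the paper's implementation: the helper names (`sample_feature`, `step_toward`, `optimize_bottleneck`, `denoise_with_propagation`) and the `bottleneck_override` hook are hypothetical, the loss is a simplified DragGAN-style motion-supervision term, and `unet` is treated as a plain callable that returns predicted noise (a real Stable Diffusion U-Net also takes text embeddings). See https://github.com/haofengl/DragNoise for the actual code.

```python
import torch
import torch.nn.functional as F

def sample_feature(feat, pt):
    """Bilinearly sample a (1, C, H, W) feature map at an (x, y) point."""
    _, _, h, w = feat.shape
    # grid_sample expects sampling coordinates normalized to [-1, 1].
    grid = feat.new_tensor([[[[2.0 * pt[0] / (w - 1) - 1.0,
                               2.0 * pt[1] / (h - 1) - 1.0]]]])
    return F.grid_sample(feat, grid, align_corners=True).view(-1)

def step_toward(h, g, step=1.0):
    """Move point h one unit step toward point g."""
    dx, dy = g[0] - h[0], g[1] - h[1]
    norm = (dx * dx + dy * dy) ** 0.5
    return h if norm == 0 else (h[0] + step * dx / norm,
                                h[1] + step * dy / norm)

def optimize_bottleneck(base_feat, handles, targets, n_iters=80, lr=0.02):
    """Learn an additive residual on the bottleneck feature so the content
    at each handle point is dragged toward its target (motion supervision)."""
    delta = torch.zeros_like(base_feat, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(n_iters):
        feat = base_feat + delta
        loss = feat.new_zeros(())
        for h, g in zip(handles, targets):
            # Pull the feature one step ahead of the handle toward the
            # (detached) feature currently at the handle.
            loss = loss + F.l1_loss(sample_feature(feat, step_toward(h, g)),
                                    sample_feature(feat, h).detach())
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (base_feat + delta).detach()

def denoise_with_propagation(unet, scheduler, z, timesteps, t_edit, edited_feat):
    """Reverse-diffusion loop that substitutes the edited bottleneck at step
    t_edit and keeps substituting it afterwards, reflecting the observation
    that semantics fixed early in denoising change little in later steps."""
    for t in timesteps:  # ordered from high noise to low
        if t <= t_edit:
            # `bottleneck_override` is an assumed hook, not a standard kwarg.
            eps = unet(z, t, bottleneck_override=edited_feat)
        else:
            eps = unet(z, t)
        z = scheduler.step(eps, t, z).prev_sample  # diffusers-style scheduler
    return z
```

In this reading, the over-50% reduction in optimization time reported in the abstract comes from where the optimization lands: a single bottleneck feature map at one denoising step is updated and then copied forward, instead of back-propagating through the U-Net to retrace the full latent map.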

References (45)
  1. Image2stylegan: How to embed images into the stylegan latent space? In ICCV, pages 4432–4441, 2019.
  2. Histogan: Controlling colors of gan-generated and real images via color histograms. In CVPR, pages 7941–7950, 2021.
  3. Blended diffusion for text-driven editing of natural images. In CVPR, pages 18208–18218, 2022.
  4. Label-efficient semantic segmentation with diffusion models. In ICLR, 2022.
  5. Instructpix2pix: Learning to follow image editing instructions. In CVPR, pages 18392–18402, 2023.
  6. Ilvr: Conditioning method for denoising diffusion probabilistic models. In ICCV, pages 14367–14376, 2021.
  7. Diffusion models beat gans on image synthesis. NeurIPS, 34:8780–8794, 2021.
  8. Generative adversarial nets. NeurIPS, 27, 2014.
  9. PHOTOSWAP: Personalized subject swapping in images. In NeurIPS, 2023.
  10. Prompt-to-prompt image editing with cross-attention control. In ICLR, 2023.
  11. Denoising diffusion probabilistic models. NeurIPS, 33:6840–6851, 2020.
  12. Lora: Low-rank adaptation of large language models. In ICLR, 2022.
  13. Diffuse3d: Wide-angle 3d photography via bilateral diffusion. In ICCV, pages 8998–9008, 2023.
  14. A style-based generator architecture for generative adversarial networks. In CVPR, pages 4401–4410, 2019.
  15. Analyzing and improving the image quality of stylegan. In CVPR, pages 8110–8119, 2020.
  16. Imagic: Text-based real image editing with diffusion models. In CVPR, pages 6007–6017, 2023.
  17. BLIP-diffusion: Pre-trained subject representation for controllable text-to-image generation and editing. In NeurIPS, 2023.
  18. Parsing-conditioned anime translation: A new dataset and method. ACM TOG, 42(3):1–14, 2023.
  19. Freedrag: Point tracking is not you need for interactive point-based image editing. arXiv preprint arXiv:2307.04684, 2023.
  20. Sdedit: Guided image synthesis and editing with stochastic differential equations. In ICLR, 2022.
  21. Null-text inversion for editing real images using guided diffusion models. In CVPR, pages 6038–6047, 2023.
  22. Dragondiffusion: Enabling drag-style manipulation on diffusion models. arXiv preprint arXiv:2307.02421, 2023.
  23. Drag your gan: Interactive point-based manipulation on the generative image manifold. In SIGGRAPH, pages 1–11, 2023.
  24. Everything is there in latent space: Attribute editing and attribute style manipulation by stylegan latent space exploration. In ACM MM, pages 1828–1836, 2022.
  25. Styleclip: Text-driven manipulation of stylegan imagery. In ICCV, pages 2085–2094, 2021.
  26. High-resolution image synthesis with latent diffusion models. In CVPR, pages 10684–10695, 2022.
  27. U-net: Convolutional networks for biomedical image segmentation. In MICCAI, 2015.
  28. Interpreting the latent space of gans for semantic face editing. In CVPR, pages 9243–9252, 2020.
  29. Dragdiffusion: Harnessing diffusion models for interactive point-based image editing. arXiv preprint arXiv:2306.14435, 2023.
  30. Deep unsupervised learning using nonequilibrium thermodynamics. In ICML, pages 2256–2265, 2015.
  31. Editing out-of-domain gan inversion via differential activations. In ECCV, pages 1–17, 2022.
  32. Denoising diffusion implicit models. In ICLR, 2021.
  33. Emergent correspondence from image diffusion. arXiv preprint arXiv:2306.03881, 2023.
  34. P+: Extended textual conditioning in text-to-image generation. arXiv preprint arXiv:2303.09522, 2023.
  35. Make your own sprites: Aliasing-aware and cell-controllable pixelization. ACM TOG, 41(6):1–16, 2022.
  36. Gan inversion: A survey. IEEE TPAMI, 45(3):3121–3138, 2022.
  37. From continuity to editability: Inverting gans with consecutive images. In ICCV, pages 13910–13918, 2021.
  38. Rigid: Recurrent gan inversion and editing of real face videos. In ICCV, pages 13691–13701, 2023.
  39. Paint by example: Exemplar-based image editing with diffusion models. In CVPR, pages 18381–18391, 2023.
  40. Discovering interpretable latent space directions of gans beyond binary attributes. In CVPR, pages 12177–12185, 2021.
  41. Pastiche master: Exemplar-based high-resolution portrait style transfer. In CVPR, pages 7693–7702, 2022.
  42. Beyond textual constraints: Learning novel diffusion conditions with fewer examples. In CVPR, 2024.
  43. The unreasonable effectiveness of deep features as a perceptual metric. In CVPR, pages 586–595, 2018.
  44. Learning an interpretable stylized subspace for 3d-aware animatable artforms. IEEE TVCG, 2024.
  45. In-domain gan inversion for real image editing. In ECCV, pages 592–608, 2020.