GoodDrag: Towards Good Practices for Drag Editing with Diffusion Models (2404.07206v1)

Published 10 Apr 2024 in cs.CV, cs.AI, cs.GR, cs.LG, and cs.MM

Abstract: In this paper, we introduce GoodDrag, a novel approach to improve the stability and image quality of drag editing. Unlike existing methods that struggle with accumulated perturbations and often result in distortions, GoodDrag introduces an AlDD framework that alternates between drag and denoising operations within the diffusion process, effectively improving the fidelity of the result. We also propose an information-preserving motion supervision operation that maintains the original features of the starting point for precise manipulation and artifact reduction. In addition, we contribute to the benchmarking of drag editing by introducing a new dataset, Drag100, and developing dedicated quality assessment metrics, Dragging Accuracy Index and Gemini Score, utilizing Large Multimodal Models. Extensive experiments demonstrate that the proposed GoodDrag compares favorably against the state-of-the-art approaches both qualitatively and quantitatively. The project page is https://gooddrag.github.io.

Introducing GoodDrag: Improving Stability and Quality in Drag Editing with Alternating Drag and Denoising

Overview of GoodDrag

The paper presents GoodDrag, an advanced approach for enhancing the stability and image quality in drag editing. This novel method integrates an Alternating Drag and Denoising (AlDD) framework with an information-preserving motion supervision technique. The key innovations include:

  • AlDD Framework: Alternates between drag and denoising operations within the diffusion process, so that perturbations are corrected as they arise rather than accumulating into distortions (a minimal sketch of this schedule follows the list).
  • Information-Preserving Motion Supervision: Preserves the original features of the starting point throughout the manipulation, significantly reducing artifacts.
  • Drag100 Dataset and Dedicated Evaluation Metrics: Introduces a new dataset for benchmarking drag editing and develops new quality assessment metrics leveraging Large Multimodal Models (LMMs).
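
The alternating schedule at the heart of AlDD can be summarized in a few lines. The sketch below is a minimal illustration, not the authors' code: `drag_step` and `denoise_step` are hypothetical callables standing in for a feature-space drag update and a single diffusion denoising step, and the loop counts are placeholders rather than the paper's hyperparameters.

```python
# Hedged sketch of the AlDD schedule, not the authors' implementation.
# `drag_step` and `denoise_step` are hypothetical callables standing in for
# a feature-space drag update and one diffusion denoising step.

def aldd_edit(latent, handle_points, target_points,
              drag_step, denoise_step,
              num_denoise_steps=10, drags_per_step=5):
    """Interleave small batches of drag operations with denoising steps.

    Rather than applying every drag operation at one timestep and denoising
    afterward, AlDD alternates the two, so each denoising step can correct
    the small perturbations left by the preceding drags before they
    accumulate into visible distortion.
    """
    for t in range(num_denoise_steps):
        for _ in range(drags_per_step):
            # Nudge features at the handle points toward the targets,
            # tracking the updated handle locations.
            latent, handle_points = drag_step(latent, handle_points,
                                              target_points)
        # One denoising step inside the diffusion process.
        latent = denoise_step(latent, t)
    return latent
```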

Methodology

GoodDrag's methodological contributions are twofold. First, the Alternating Drag and Denoising (AlDD) framework distributes drag operations across multiple diffusion denoising steps rather than performing them all at once, as existing methods do; keeping each perturbation small lets the denoiser correct it before distortions accumulate beyond repair. Second, the information-preserving motion supervision addresses the feature-drifting issue common in drag editing by anchoring the supervision signal to the features of the original starting point, yielding more accurate, artifact-free results. A sketch of such an anchored loss follows.
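
The sketch below illustrates the anchoring idea under our own assumptions: `feat_cur` and `feat_orig` stand for diffusion UNet feature maps of the edited and original latents, and the loss matches the feature one step ahead of the handle against the detached feature of the original starting point. Function names and the L1 choice are illustrative, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def sample_feature(feat, point):
    """Bilinearly sample a feature vector at a fractional 2D point.
    feat: (1, C, H, W); point: tensor (x, y) in pixel coordinates."""
    _, _, h, w = feat.shape
    gx = 2.0 * point[0] / (w - 1) - 1.0   # normalize x to [-1, 1]
    gy = 2.0 * point[1] / (h - 1) - 1.0   # normalize y to [-1, 1]
    grid = torch.stack([gx, gy]).view(1, 1, 1, 2).to(feat)
    return F.grid_sample(feat, grid, align_corners=True).view(-1)

def motion_supervision_loss(feat_cur, feat_orig, handle, target,
                            step_size=1.0):
    """Sketch of an information-preserving motion supervision loss.

    feat_cur:  features of the current edited latent (gradients flow here)
    feat_orig: features of the original, unedited latent (kept frozen)
    handle, target: float tensors (x, y)

    The supervision target is the feature of the ORIGINAL handle point,
    detached from the graph, so repeated updates cannot drift away from
    the content the user intended to drag.
    """
    d = target - handle
    d = d / (d.norm() + 1e-8)                     # unit drag direction
    moved = handle + step_size * d                # one small step to target
    f_moved = sample_feature(feat_cur, moved)     # differentiable
    f_anchor = sample_feature(feat_orig, handle).detach()  # fixed anchor
    return F.l1_loss(f_moved, f_anchor)
```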

The paper also takes significant strides in benchmarking drag editing by introducing the Drag100 dataset alongside two novel evaluation metrics: the Dragging Accuracy Index (DAI) and the Gemini Score (GScore). These metrics, built with the help of Large Multimodal Models, offer a more reliable assessment of drag-editing quality than conventional No-Reference Image Quality Assessment methods.
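The paper defines DAI precisely; the sketch below is only a rough stand-in built on our own assumption that dragging accuracy can be approximated by comparing the patch around the original handle point with the patch around the target point after editing. The function name and patch-difference formula are illustrative, not the paper's formula; GScore, in turn, is obtained by prompting a Large Multimodal Model (Gemini) to rate the edit, so it has no meaningful offline sketch.

```python
import numpy as np

def dragging_accuracy(img_src, img_edit, handle, target, patch=16):
    """Illustrative stand-in for a dragging-accuracy measure (NOT the
    paper's DAI formula): after a successful drag, the content around the
    handle point in the source image should appear around the target point
    in the edited image, so we compare those two local patches."""
    def crop(img, pt):
        x, y = int(round(pt[0])), int(round(pt[1]))
        r = patch // 2
        return img[y - r:y + r, x - r:x + r].astype(np.float64)

    src_patch = crop(img_src, handle)
    edit_patch = crop(img_edit, target)
    # Mean absolute pixel difference; lower means a more accurate drag.
    return np.abs(src_patch - edit_patch).mean()
```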

Experimental Results

Extensive experiments demonstrate GoodDrag's superior performance over state-of-the-art approaches, both qualitatively and quantitatively: it achieves more precise manipulation with significantly fewer artifacts and improved stability. On the Drag100 dataset, evaluated with the newly proposed DAI and GScore metrics, GoodDrag consistently outperforms existing methods, and the dataset and metrics together give the field a comprehensive benchmarking framework.

Implications and Future Directions

GoodDrag's introduction of AlDD and information-preserving motion supervision contributes significantly to the theoretical understanding of drag editing challenges and solutions. Practically, it establishes a new baseline for drag editing algorithms, offering an efficient and effective tool for both academic research and practical applications.

The Drag100 dataset and the DAI and GScore metrics provide a robust framework for evaluating drag editing techniques, laying a foundation for future research and development in this area.

Speculating on future developments, integrating GoodDrag with other image editing tasks could unlock new applications and enhance existing workflows. Extending its capabilities to video editing presents an exciting avenue for research, potentially transforming the landscape of video manipulation technology.

In conclusion, GoodDrag represents an important step forward in drag editing, combining innovation in methodological approaches with advancements in evaluation frameworks. Its implications for both theory and practice signal a promising direction for future research in generative AI and image manipulation technologies.

Authors (4)
  1. Zewei Zhang
  2. Huan Liu
  3. Jun Chen
  4. Xiangyu Xu