Papers
Topics
Authors
Recent
Assistant
AI Research Assistant
Well-researched responses based on relevant abstracts and paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses.
Gemini 2.5 Flash
Gemini 2.5 Flash 59 tok/s
Gemini 2.5 Pro 50 tok/s Pro
GPT-5 Medium 28 tok/s Pro
GPT-5 High 29 tok/s Pro
GPT-4o 80 tok/s Pro
Kimi K2 181 tok/s Pro
GPT OSS 120B 454 tok/s Pro
Claude Sonnet 4.5 33 tok/s Pro
2000 character limit reached

Doubly Abductive Counterfactual Inference for Text-based Image Editing (2403.02981v2)

Published 5 Mar 2024 in cs.CV

Abstract: We study text-based image editing (TBIE) of a single image by counterfactual inference because it is an elegant formulation to precisely address the requirement: the edited image should retain the fidelity of the original one. Through the lens of the formulation, we find that the crux of TBIE is that existing techniques hardly achieve a good trade-off between editability and fidelity, mainly due to the overfitting of the single-image fine-tuning. To this end, we propose a Doubly Abductive Counterfactual inference framework (DAC). We first parameterize an exogenous variable as a UNet LoRA, whose abduction can encode all the image details. Second, we abduct another exogenous variable parameterized by a text encoder LoRA, which recovers the lost editability caused by the overfitted first abduction. Thanks to the second abduction, which exclusively encodes the visual transition from post-edit to pre-edit, its inversion -- subtracting the LoRA -- effectively reverts pre-edit back to post-edit, thereby accomplishing the edit. Through extensive experiments, our DAC achieves a good trade-off between editability and fidelity. Thus, we can support a wide spectrum of user editing intents, including addition, removal, manipulation, replacement, style transfer, and facial change, which are extensively validated in both qualitative and quantitative evaluations. Codes are in https://github.com/xuesong39/DAC.

Definition Search Book Streamline Icon: https://streamlinehq.com
References (44)
  1. Blended diffusion for text-driven editing of natural images. In CVPR, pages 18208–18218, 2022.
  2. Can we improve model robustness through secondary attribute counterfactuals? In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 4701–4712, 2021.
  3. Instructpix2pix: Learning to follow image editing instructions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18392–18402, 2023.
  4. Masactrl: Tuning-free mutual self-attention control for consistent image synthesis and editing. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 22560–22570, 2023.
  5. Prompt tuning inversion for text-driven image editing using diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023.
  6. An image is worth one word: Personalizing text-to-image generation using textual inversion. arXiv preprint arXiv:2208.01618, 2022.
  7. Making llama see and draw with seed tokenizer. arXiv preprint arXiv:2310.01218, 2023.
  8. Counterfactual visual explanations. In ICML, pages 2376–2384, 2019.
  9. Prompt-to-prompt image editing with cross attention control. arXiv preprint arXiv:2208.01626, 2022.
  10. Delta denoising score. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2328–2337, 2023.
  11. Denoising diffusion probabilistic models. NeurIPS, 33:6840–6851, 2020.
  12. Lora: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685, 2021.
  13. Learning the difference that makes a difference with counterfactually-augmented data. arXiv preprint arXiv:1909.12434, 2019.
  14. Imagic: Text-based real image editing with diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6007–6017, 2023.
  15. Diffusionclip: Text-guided diffusion models for robust image manipulation. In CVPR, pages 2426–2435, 2022.
  16. Counterfactual fairness. Advances in neural information processing systems, 30, 2017.
  17. Cones 2: Customizable image synthesis with multiple subjects. arXiv preprint arXiv:2305.19327, 2023.
  18. Latent consistency models: Synthesizing high-resolution images with few-step inference. arXiv preprint arXiv:2310.04378, 2023.
  19. Null-text inversion for editing real images using guided diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6038–6047, 2023.
  20. Counterfactual vqa: A cause-effect look at language bias. In CVPR, pages 12700–12710, 2021.
  21. Editing implicit assumptions in text-to-image diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023.
  22. Effective real image editing with accelerated iterative diffusion inversion. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 15912–15921, 2023.
  23. Causal inference in statistics: A primer. John Wiley & Sons, 2016.
  24. Learning transferable visual models from natural language supervision. In ICML, pages 8748–8763, 2021.
  25. Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:2204.06125, 2022.
  26. High-resolution image synthesis with latent diffusion models. In CVPR, pages 10684–10695, 2022.
  27. Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation. In CVPR, pages 22500–22510, 2023.
  28. Photorealistic text-to-image diffusion models with deep language understanding. In NeurIPS, pages 36479–36494, 2022.
  29. Herbert A Simon. Spurious correlation: A causal interpretation. Journal of the American statistical Association, pages 467–479, 1954.
  30. Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502, 2020.
  31. Consistency models. 2023.
  32. Generative multimodal models are in-context learners. arXiv preprint arXiv:2312.13286, 2023.
  33. Long-tailed classification by keeping the good and removing the bad momentum causal effect. NeurIPS, 33:1513–1524, 2020.
  34. A language for counterfactual generative models. In International Conference on Machine Learning, pages 10173–10182. PMLR, 2021.
  35. Plug-and-play diffusion features for text-driven image-to-image translation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1921–1930, 2023.
  36. Diffusers: State-of-the-art diffusion models. https://github.com/huggingface/diffusers, 2022.
  37. Edict: Exact diffusion inversion via coupled transformations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22532–22541, 2023.
  38. Not all steps are created equal: Selective diffusion distillation for image manipulation. In ICCV, pages 7472–7481, 2023.
  39. Chen Henry Wu and Fernando De la Torre. A latent space of stochastic diffusion models for zero-shot image editing and guidance. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 7378–7387, 2023.
  40. Fast diffusion model. arXiv preprint arXiv:2306.06991, 2023.
  41. Inversion-free image editing with natural language. arXiv preprint arXiv:2312.04965, 2023.
  42. Fairness in decision-making—the causal explanation formula. In AAAI, 2018.
  43. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 586–595, 2018.
  44. Sine: Single image editing with text-to-image diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6027–6037, 2023.
Citations (6)

Summary

We haven't generated a summary for this paper yet.

Lightbulb Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Don't miss out on important new AI/ML research

See which papers are being discussed right now on X, Reddit, and more:

“Emergent Mind helps me see which AI papers have caught fire online.”

Philip

Philip

Creator, AI Explained on YouTube