
High-Fidelity Diffusion-based Image Editing (2312.15707v3)

Published 25 Dec 2023 in cs.CV

Abstract: Diffusion models have attained remarkable success in image generation and editing. It is widely recognized that employing larger numbers of inversion and denoising steps in a diffusion model leads to improved image reconstruction quality. However, the editing performance of diffusion models does not improve correspondingly as denoising steps increase. This deficiency can be attributed to the conditional Markovian property of the editing process, where errors accumulate across denoising steps. To tackle this challenge, we first propose an innovative framework in which a rectifier module modulates the diffusion model weights with residual features, thereby providing compensatory information to bridge the fidelity gap. Furthermore, we introduce a novel learning paradigm aimed at minimizing error propagation during editing, which trains the editing procedure in a manner similar to denoising score matching. Extensive experiments demonstrate that our proposed framework and training strategy achieve high-fidelity reconstruction and editing results across various numbers of denoising steps, while exhibiting exceptional performance in both quantitative metrics and qualitative assessments. Moreover, we explore our model's generalization through several applications, such as image-to-image translation and out-of-domain image editing.
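The abstract describes a rectifier module that modulates diffusion-model weights using residual features, but gives no architectural details. As a rough, purely illustrative sketch under assumed details, the toy example below scales a frozen layer's weight matrix with per-output-channel factors derived from the residual between source-image features and their imperfect reconstruction. The names (`rectifier`, `proj`) and the tanh parameterization are hypothetical, not the paper's actual design.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen base weights of one (toy) diffusion-model layer.
W = rng.standard_normal((8, 8))
# Hypothetical projection inside the rectifier module.
proj = 0.1 * rng.standard_normal((8, 8))

def rectifier(residual_feat, proj):
    """Map residual features to small per-channel multiplicative scales.

    With a zero residual the scales are all 1, i.e. the base weights
    are left untouched and the model behaves as in plain reconstruction.
    """
    return 1.0 + np.tanh(proj @ residual_feat)

# Source-image features vs. their imperfect reconstruction.
x_src = rng.standard_normal(8)
x_rec = x_src + 0.05 * rng.standard_normal(8)
residual = x_src - x_rec            # "compensatory information"

scales = rectifier(residual, proj)  # shape (8,)
W_mod = W * scales[:, None]         # modulate frozen weights per channel

x = rng.standard_normal(8)          # layer input at some denoising step
y = W_mod @ x                       # rectified layer output
```

The key property this sketch captures is that the base weights stay frozen while a lightweight, residual-conditioned module supplies the correction, which is what lets the compensation shrink to identity when reconstruction is already faithful.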

Authors (3)
  1. Chen Hou (8 papers)
  2. Guoqiang Wei (14 papers)
  3. Zhibo Chen (176 papers)
Citations (3)