Wavelet-Guided Acceleration of Text Inversion in Diffusion-Based Image Editing (2401.09794v1)
Abstract: In the field of image editing, Null-text Inversion (NTI) enables fine-grained edits while preserving the structure of the original image by optimizing null embeddings during the DDIM sampling process. However, NTI is time-consuming, taking more than two minutes per image. To address this, we introduce a method that retains the principles of NTI while accelerating the image editing process. We propose the WaveOpt-Estimator, which determines the endpoint of text optimization from an image's frequency characteristics. Using wavelet transform analysis to identify these characteristics, we can restrict text optimization to specific timesteps of the DDIM sampling process. Adopting the Negative-Prompt Inversion (NPI) concept, we use a target prompt that represents the original image as the initial value for text optimization. This approach maintains performance comparable to NTI while reducing the average editing time by over 80%. Our method thus offers a promising path to efficient, high-quality image editing based on diffusion models.
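To make the endpoint-estimation idea concrete, here is a minimal sketch of how an image's wavelet-domain frequency profile could be mapped to a text-optimization stopping point. This is a hand-written heuristic for illustration only, not the paper's WaveOpt-Estimator; the function name `estimate_optimization_endpoint`, the Haar wavelet choice, and the numeric constants (`max_fraction`, the `0.05` normalizer) are assumptions introduced here.

```python
import numpy as np
import pywt  # PyWavelets, for the 2-D discrete wavelet transform


def estimate_optimization_endpoint(image, total_steps=50, max_fraction=0.6):
    """Heuristic sketch: map the share of high-frequency wavelet energy
    in a grayscale image (2-D array in [0, 1]) to the DDIM timestep at
    which text-embedding optimization can stop early."""
    # One-level 2-D DWT: cA holds low-frequency structure; the detail
    # subbands (cH, cV, cD) hold high-frequency content.
    cA, (cH, cV, cD) = pywt.dwt2(image, "haar")

    low_energy = np.sum(cA ** 2)
    high_energy = np.sum(cH ** 2) + np.sum(cV ** 2) + np.sum(cD ** 2)
    high_ratio = high_energy / (low_energy + high_energy + 1e-12)

    # Assumed mapping (not from the paper): detail-rich images get more
    # optimization steps, smooth images stop earlier.
    fraction = max_fraction * min(high_ratio / 0.05, 1.0)
    return max(1, round(fraction * total_steps))


# Usage: optimize embeddings only for the first `endpoint` of 50 DDIM
# steps, then run the remaining steps without optimization.
image = np.random.rand(256, 256)  # stand-in for a real grayscale image
endpoint = estimate_optimization_endpoint(image, total_steps=50)
```

In an NTI-style editing loop, the returned endpoint would cap the per-timestep embedding optimization, with the NPI-derived target prompt supplying the starting embedding; the remaining timesteps would proceed as plain DDIM sampling.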
- “Denoising diffusion probabilistic models,” Advances in Neural Information Processing Systems, vol. 33, pp. 6840–6851, 2020.
- “Generative adversarial networks,” 2014.
- “StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 5907–5915.
- “Generative adversarial networks: An overview,” IEEE Signal Processing Magazine, vol. 35, no. 1, pp. 53–65, 2018.
- “Large scale GAN training for high fidelity natural image synthesis,” 2019.
- “High-resolution image synthesis with latent diffusion models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 10684–10695.
- “Hierarchical text-conditional image generation with CLIP latents,” 2022.
- “Photorealistic text-to-image diffusion models with deep language understanding,” 2022.
- “GLIDE: Towards photorealistic image generation and editing with text-guided diffusion models,” arXiv preprint arXiv:2112.10741, 2021.
- “StyleDiffusion: Prompt-embedding inversion for text-based editing,” arXiv preprint arXiv:2303.15649, 2023.
- “Prompt-to-prompt image editing with cross attention control,” arXiv preprint arXiv:2208.01626, 2022.
- “Null-text inversion for editing real images using guided diffusion models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 6038–6047.
- “Denoising diffusion implicit models,” arXiv preprint arXiv:2010.02502, 2020.
- “Negative-prompt inversion: Fast image inversion for editing with text-guided diffusion models,” arXiv preprint arXiv:2305.16807, 2023.
- “Diffusion models beat GANs on image synthesis,” Advances in Neural Information Processing Systems, vol. 34, pp. 8780–8794, 2021.
- “DiffEdit: Diffusion-based semantic image editing with mask guidance,” arXiv preprint arXiv:2210.11427, 2022.
- “Plug-and-play diffusion features for text-driven image-to-image translation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 1921–1930.
- “MasaCtrl: Tuning-free mutual self-attention control for consistent image synthesis and editing,” arXiv preprint arXiv:2304.08465, 2023.
- “Imagic: Text-based real image editing with diffusion models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 6007–6017.
- “Improving negative-prompt inversion via proximal guidance,” arXiv preprint arXiv:2306.05414, 2023.