Fast Personalized Text-to-Image Syntheses With Attention Injection (2403.11284v1)
Abstract: Current personalized image generation methods typically require considerable fine-tuning time and often overfit the custom concept, producing images that resemble it but are difficult to edit via prompts. We propose a fast and effective approach that balances text-image consistency with identity consistency between the generated image and the reference image. Our method generates personalized images without any fine-tuning while preserving the inherent text-to-image generation ability of the diffusion model. Given a prompt and a reference image, we merge the custom concept into the generated image by manipulating the cross-attention and self-attention layers of the original diffusion model, yielding personalized images that match the text description. Comprehensive experiments highlight the superiority of our method.
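The abstract describes injecting the reference concept through the attention layers rather than through weight updates. The paper excerpt here includes no code, so below is a minimal sketch of one common way such self-attention injection is realized in Stable Diffusion-style UNets: keys and values cached from the reference image's denoising path are concatenated into the generation path's self-attention, so generated queries can attend to the concept's appearance without any fine-tuning. All names (`injected_self_attention`, `k_ref`, `v_ref`) are hypothetical illustrations, not the authors' API.

```python
import torch
import torch.nn.functional as F

def injected_self_attention(q, k, v, k_ref, v_ref, num_heads):
    """Self-attention with reference-feature injection (a hedged sketch).

    q, k, v:      (batch, seq, dim) features from the generation path.
    k_ref, v_ref: (batch, seq_ref, dim) features cached from the
                  reference image's diffusion (inversion) path.

    Concatenating the reference keys/values lets the generated queries
    attend to the custom concept's appearance without fine-tuning,
    while the text prompt still steers generation via cross-attention.
    """
    # Append reference keys/values along the sequence dimension.
    k_all = torch.cat([k, k_ref], dim=1)  # (batch, seq + seq_ref, dim)
    v_all = torch.cat([v, v_ref], dim=1)

    def split_heads(x):
        b, s, d = x.shape
        return x.view(b, s, num_heads, d // num_heads).transpose(1, 2)

    # Standard scaled dot-product attention over the extended K/V.
    attn = F.scaled_dot_product_attention(
        split_heads(q), split_heads(k_all), split_heads(v_all)
    )  # (batch, heads, seq, head_dim)
    b, h, s, hd = attn.shape
    return attn.transpose(1, 2).reshape(b, s, h * hd)
```

In practice such a function would replace the self-attention forward pass of selected UNet blocks at selected denoising steps; which layers and steps to inject is a design choice that controls the trade-off between identity consistency and prompt editability.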
Authors: Yuxuan Zhang, Yiren Song, Jinpeng Yu, Han Pan, Zhongliang Jing