Fast Personalized Text-to-Image Syntheses With Attention Injection (2403.11284v1)

Published 17 Mar 2024 in cs.CV

Abstract: Current personalized image generation methods mostly require considerable fine-tuning time and often overfit the concept, yielding images that resemble the custom concept but are difficult to edit with prompts. We propose a fast and effective approach that balances the text-image consistency and the identity consistency between the generated image and the reference image. Our method generates personalized images without any fine-tuning while preserving the inherent text-to-image ability of the diffusion model. Given a prompt and a reference image, we merge the custom concept into the generated image by manipulating the cross-attention and self-attention layers of the original diffusion model, producing personalized images that match the text description. Comprehensive experiments highlight the superiority of our method.
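To make the core idea concrete: injecting a reference concept through self-attention typically means letting the generated image's queries attend to keys/values derived from the reference image as well as its own. The sketch below is a minimal, hypothetical illustration of that pattern in NumPy; function names and the concatenation strategy are assumptions for clarity, not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    # Standard scaled dot-product attention.
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    return softmax(scores) @ v


def injected_self_attention(q, k, v, k_ref, v_ref):
    """Illustrative attention injection (assumed scheme):
    the generated image's queries attend to its own keys/values
    *and* to keys/values taken from the reference image, so
    reference-image features can flow into the generated tokens."""
    k_all = np.concatenate([k, k_ref], axis=0)
    v_all = np.concatenate([v, v_ref], axis=0)
    return attention(q, k_all, v_all)
```

In an actual diffusion U-Net the same substitution would be applied inside selected self-attention layers at each denoising step, which is what lets the method personalize without any fine-tuning.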

Authors (5)
  1. Yuxuan Zhang (119 papers)
  2. Yiren Song (30 papers)
  3. Jinpeng Yu (5 papers)
  4. Han Pan (6 papers)
  5. Zhongliang Jing (11 papers)
Citations (3)
