IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models (2403.13535v2)
Abstract: Generating personalized portraits with Stable Diffusion has emerged as a powerful tool, enabling users to create high-fidelity, custom character avatars from their own prompts. However, existing personalization methods face several challenges: test-time fine-tuning, the need for multiple input images, poor identity preservation, and limited diversity in the generated results. To overcome these challenges, we introduce IDAdapter, a tuning-free approach that improves both diversity and identity preservation in personalized image generation from a single face image. IDAdapter integrates a personalized concept into the generation process through a combination of textual and visual injections and a face identity loss. During training, we incorporate mixed features from multiple reference images of a specific identity to enrich identity-related content details, guiding the model to generate images with more diverse styles, expressions, and angles than previous works. Extensive evaluations demonstrate the effectiveness of our method, which achieves both diversity and identity fidelity in generated images.
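The abstract names two ingredients without giving formulas: mixed features drawn from multiple reference images, and a face identity loss. The following is a minimal sketch of one plausible reading, not the authors' implementation. It assumes each reference image has already been encoded into a fixed-length vector, that "mixing" means averaging L2-normalized vectors, and that the identity loss is one minus the cosine similarity between face embeddings. The function names `mix_features` and `identity_loss` are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def mix_features(reference_features):
    """Average per-image feature vectors into one unit-length vector.

    ASSUMPTION: simple averaging stands in for the paper's feature
    mixing, whose exact form the abstract does not specify.
    """
    if not reference_features:
        raise ValueError("need at least one reference feature")
    normalized = [l2_normalize(f) for f in reference_features]
    dim = len(normalized[0])
    if any(len(f) != dim for f in normalized):
        raise ValueError("all features must have the same length")
    mean = [sum(f[i] for f in normalized) / len(normalized) for i in range(dim)]
    return l2_normalize(mean)


def identity_loss(generated_embedding, reference_embedding):
    """Return 1 - cosine similarity between two face embeddings.

    ASSUMPTION: a cosine-based loss is a common choice for identity
    preservation; the abstract only says a face identity loss is used.
    """
    a = l2_normalize(generated_embedding)
    b = l2_normalize(reference_embedding)
    return 1.0 - sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings" of three photos of one identity.
    refs = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [1.0, 0.0, 0.1]]
    mixed = mix_features(refs)
    print("mixed feature:", mixed)
    print("identity loss vs. first reference:", identity_loss(mixed, refs[0]))
```

In this toy setup the loss is close to zero when the generated face embedding points in the same direction as the reference and grows as the two diverge. A real pipeline would obtain the embeddings from a pretrained face recognition model rather than from hand-written vectors.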
Authors:
- Siying Cui
- Jiankang Deng
- Jia Guo
- Xiang An
- Yongle Zhao
- Xinyu Wei
- Ziyong Feng