
DreamIdentity: Improved Editability for Efficient Face-identity Preserved Image Generation (2307.00300v1)

Published 1 Jul 2023 in cs.CV

Abstract: While large-scale pre-trained text-to-image models can synthesize diverse, high-quality human-centric images, preserving the face identity of a conditioned face image remains an intractable problem. Existing methods either require time-consuming optimization for each face identity or learn an efficient encoder at the cost of the model's editability. In this work, we present a method that is optimization-free for each face identity while preserving the editability of text-to-image models. Specifically, we propose a novel face-identity encoder that learns an accurate representation of human faces: it applies multi-scale face features followed by a multi-embedding projector to directly generate pseudo words in the text embedding space. In addition, we propose self-augmented editability learning to enhance the model's editability, achieved by constructing paired generated and edited face images using celebrity names, with the aim of transferring the mature ability of off-the-shelf text-to-image models on celebrity faces to unseen faces. Extensive experiments show that our method generates identity-preserved images across different scenes at a much faster speed.
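The encoder idea in the abstract can be illustrated with a minimal sketch: multi-scale face features (e.g., a shallow and a deep layer of a face backbone) are concatenated and mapped by a multi-embedding projector to several pseudo-word vectors living in the text-embedding space, where they can replace placeholder tokens in the prompt. All dimensions and the linear-projector form below are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (assumptions, not from the paper):
# two feature scales of a face encoder, M=2 pseudo words,
# and a text-embedding dimension of 768.
d_text = 768
num_pseudo_words = 2
feat_shallow = rng.standard_normal(512)  # early-layer (local detail) feature
feat_deep = rng.standard_normal(512)     # final-layer (global identity) feature

# Multi-scale representation: concatenate features from different depths.
multi_scale = np.concatenate([feat_shallow, feat_deep])  # shape (1024,)

# Multi-embedding projector, sketched here as one linear map per pseudo word.
W = rng.standard_normal((num_pseudo_words, d_text, multi_scale.size)) * 0.01

# Pseudo words in the text embedding space: one d_text vector per word,
# ready to stand in for placeholder tokens in the prompt embedding.
pseudo_words = W @ multi_scale  # shape (num_pseudo_words, d_text)
print(pseudo_words.shape)
```

In a full system these vectors would be spliced into the tokenized prompt (e.g., replacing a placeholder such as "S*") before it is fed to the diffusion model's text encoder.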

Authors (7)
  1. Zhuowei Chen (9 papers)
  2. Shancheng Fang (11 papers)
  3. Wei Liu (1135 papers)
  4. Qian He (65 papers)
  5. Mengqi Huang (29 papers)
  6. Yongdong Zhang (119 papers)
  7. Zhendong Mao (55 papers)
Citations (21)
