Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
119 tokens/sec
GPT-4o
56 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Concept-centric Personalization with Large-scale Diffusion Priors (2312.08195v1)

Published 13 Dec 2023 in cs.CV, cs.AI, and cs.MM

Abstract: Despite large-scale diffusion models being highly capable of generating diverse open-world content, they still struggle to match the photorealism and fidelity of concept-specific generators. In this work, we present the task of customizing large-scale diffusion priors for specific concepts as concept-centric personalization. Our goal is to generate high-quality concept-centric images while maintaining the versatile controllability inherent to open-world models, enabling applications in diverse tasks such as concept-centric stylization and image translation. To tackle these challenges, we identify catastrophic forgetting of guidance prediction from diffusion priors as the fundamental issue. Consequently, we develop a guidance-decoupled personalization framework specifically designed to address this task. We propose Generalized Classifier-free Guidance (GCFG) as the foundational theory for our framework. This approach extends Classifier-free Guidance (CFG) to accommodate an arbitrary number of guidances, sourced from a variety of conditions and models. Employing GCFG enables us to separate conditional guidance into two distinct components: concept guidance for fidelity and control guidance for controllability. This division makes it feasible to train a specialized model for concept guidance, while ensuring both control and unconditional guidance remain intact. We then present a null-text Concept-centric Diffusion Model as a concept-specific generator to learn concept guidance without the need for text annotations. Code will be available at https://github.com/PRIV-Creation/Concept-centric-Personalization.

Definition Search Book Streamline Icon: https://streamlinehq.com
References (57)
  1. Domain-agnostic tuning-encoder for fast personalization of text-to-image models. arXiv preprint arXiv:2307.06925, 2023.
  2. Coco-stuff: Thing and stuff classes in context. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1209–1218, 2018.
  3. Lsap: Rethinking inversion fidelity, perception and editability in gan latent space. arXiv preprint arXiv:2209.12746, 2022.
  4. What decreases editing capability? domain-specific hybrid refinement for improved gan inversion. arXiv preprint arXiv:2301.12141, 2023.
  5. Disenbooth: Disentangled parameter-efficient tuning for subject-driven text-to-image generation. arXiv preprint arXiv:2305.03374, 2023.
  6. Stargan v2: Diverse image synthesis for multiple domains. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2020.
  7. Diffedit: Diffusion-based semantic image editing with mask guidance. arXiv preprint arXiv:2210.11427, 2022.
  8. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
  9. Diffusion models beat gans on image synthesis. Advances in neural information processing systems, 34:8780–8794, 2021.
  10. Prompt tuning inversion for text-driven image editing using diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 7430–7440, 2023.
  11. Compositional visual generation with energy based models. Advances in Neural Information Processing Systems, 33:6637–6647, 2020.
  12. An image is worth one word: Personalizing text-to-image generation using textual inversion. arXiv preprint arXiv:2208.01618, 2022.
  13. Mix-of-show: Decentralized low-rank adaptation for multi-concept customization of diffusion models. arXiv preprint arXiv:2305.18292, 2023.
  14. Prompt-to-prompt image editing with cross attention control. arXiv preprint arXiv:2208.01626, 2022.
  15. Classifier-free diffusion guidance. arXiv preprint arXiv:2207.12598, 2022.
  16. Denoising diffusion probabilistic models. Advances in neural information processing systems, 33:6840–6851, 2020.
  17. Lora: Low-rank adaptation of large language models. arXiv preprint arXiv:2106.09685, 2021.
  18. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4401–4410, 2019.
  19. Alias-free generative adversarial networks. Advances in Neural Information Processing Systems, 34:852–863, 2021.
  20. Imagic: Text-based real image editing with diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6007–6017, 2023.
  21. Multi-concept customization of text-to-image diffusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1931–1941, 2023.
  22. Gligen: Open-set grounded text-to-image generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22511–22521, 2023.
  23. Accelerate neural style transfer with super-resolution. Multimedia Tools and Applications, 79(7):4347–4364, 2020.
  24. Compositional visual generation with composable diffusion models. In European Conference on Computer Vision, pages 423–439. Springer, 2022.
  25. Cones: Concept neurons in diffusion models for customized generation. arXiv preprint arXiv:2303.05125, 2023a.
  26. Cones 2: Customizable image synthesis with multiple subjects. arXiv preprint arXiv:2305.19327, 2023b.
  27. Dpm-solver++: Fast solver for guided sampling of diffusion probabilistic models. arXiv preprint arXiv:2211.01095, 2022.
  28. T2i-adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models. arXiv preprint arXiv:2302.08453, 2023.
  29. Glide: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv preprint arXiv:2112.10741, 2021.
  30. Controlling text-to-image diffusion by orthogonal finetuning. arXiv preprint arXiv:2306.07280, 2023.
  31. Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–8763. PMLR, 2021.
  32. Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research, 21(1):5485–5551, 2020.
  33. Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:2204.06125, 1(2):3, 2022.
  34. Pivotal tuning for latent-based editing of real images. ACM Transactions on graphics (TOG), 42(1):1–13, 2022.
  35. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10684–10695, 2022.
  36. U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18, pages 234–241. Springer, 2015.
  37. Dreambooth: Fine tuning text-to-image diffusion models for subject-driven generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22500–22510, 2023.
  38. Photorealistic text-to-image diffusion models with deep language understanding. Advances in Neural Information Processing Systems, 35:36479–36494, 2022.
  39. Laion-400m: Open dataset of clip-filtered 400 million image-text pairs. arXiv preprint arXiv:2111.02114, 2021.
  40. Laion-5b: An open large-scale dataset for training next generation image-text models. Advances in Neural Information Processing Systems, 35:25278–25294, 2022.
  41. Instantbooth: Personalized text-to-image generation without test-time finetuning. arXiv preprint arXiv:2304.03411, 2023.
  42. Continual diffusion: Continual customization of text-to-image diffusion with c-lora. arXiv preprint arXiv:2304.06027, 2023.
  43. Designing an encoder for stylegan image manipulation. ACM Transactions on Graphics (TOG), 40(4):1–14, 2021.
  44. Sketch-guided text-to-image diffusion models. In ACM SIGGRAPH 2023 Conference Proceedings, pages 1–11, 2023.
  45. Reconstruct-and-generate diffusion model for detail-preserving image denoising. arXiv preprint arXiv:2309.10714, 2023a.
  46. Stylediffusion: Controllable disentangled style transfer via diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 7677–7689, 2023b.
  47. Hsr-diff: hyperspectral image super-resolution via conditional diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 7083–7093, 2023.
  48. Chen Henry Wu and Fernando De la Torre. A latent space of stochastic diffusion models for zero-shot image editing and guidance. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 7378–7387, 2023.
  49. Parsing r-cnn for instance-level human analysis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 364–373, 2019.
  50. Renovating parsing r-cnn for accurate multiple human parsing. In European Conference on Computer Vision, pages 421–437. Springer, 2020.
  51. Quality-aware network for face parsing. arXiv preprint arXiv:2106.07368, 2021.
  52. Part decomposition and refinement network for human parsing. IEEE/CAA Journal of Automatica Sinica, 9(6):1111–1114, 2022a.
  53. Quality-aware network for human parsing. IEEE Transactions on Multimedia, 2022b.
  54. Deep learning technique for human parsing: A survey and outlook. arXiv preprint arXiv:2301.00394, 2023a.
  55. Zero-shot contrastive loss for text-guided diffusion image style transfer. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 22873–22882, 2023b.
  56. Adding conditional control to text-to-image diffusion models, 2023.
  57. Towards robust blind face restoration with codebook lookup transformer. In NeurIPS, 2022.
User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (5)
  1. Pu Cao (10 papers)
  2. Lu Yang (82 papers)
  3. Feng Zhou (195 papers)
  4. Tianrui Huang (3 papers)
  5. Qing Song (23 papers)
Citations (6)

Summary

We haven't generated a summary for this paper yet.

X Twitter Logo Streamline Icon: https://streamlinehq.com