
OMG: Occlusion-friendly Personalized Multi-concept Generation in Diffusion Models (2403.10983v2)

Published 16 Mar 2024 in cs.CV

Abstract: Personalization is an important topic in text-to-image generation, and multi-concept personalization is especially challenging. Current multi-concept methods struggle with identity preservation, occlusion, and harmony between foreground and background. In this work, we propose OMG, an occlusion-friendly personalized generation framework designed to seamlessly integrate multiple concepts within a single image. We propose a novel two-stage sampling solution. The first stage handles layout generation and collects visual comprehension information for handling occlusions. The second stage uses the acquired visual comprehension information and the designed noise blending to integrate multiple concepts while accounting for occlusions. We also observe that the denoising timestep at which noise blending begins is key to identity preservation and layout. Moreover, our method can be combined with various single-concept models, such as LoRA and InstantID, without additional tuning. In particular, LoRA models from civitai.com can be used directly. Extensive experiments demonstrate that OMG exhibits superior performance in multi-concept personalization.
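The noise blending mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes that each concept has a binary spatial mask (the kind of information the first stage might collect), that each single-concept model yields its own noise prediction, and that blending simply pastes each concept's prediction into its masked region once a chosen step is reached. The names `blend_noise`, `blended_prediction`, and `start_step` are hypothetical, and plain NumPy arrays stand in for real diffusion model outputs.

```python
import numpy as np


def blend_noise(base_noise, concept_noises, masks):
    """Paste each concept's noise prediction into its masked region.

    base_noise:     array of shape (C, H, W) from the base model (assumed).
    concept_noises: list of (C, H, W) arrays, one per concept model (assumed).
    masks:          list of (H, W) binary arrays, one per concept (assumed).
    Where masks overlap, the later concept in the list wins.
    """
    blended = base_noise.copy()
    for noise, mask in zip(concept_noises, masks):
        # Broadcast the (H, W) mask over the channel dimension.
        m = mask[None, :, :].astype(base_noise.dtype)
        blended = m * noise + (1.0 - m) * blended
    return blended


def blended_prediction(step, start_step, base_noise, concept_noises, masks):
    """Use the base prediction before start_step, the blended one afterwards.

    `step` counts denoising steps already performed (an assumed convention),
    so a larger start_step means blending begins later in sampling.
    """
    if step < start_step:
        return base_noise
    return blend_noise(base_noise, concept_noises, masks)


# Toy data: constant "noise" values make the blended regions easy to inspect.
base = np.zeros((4, 8, 8))
noise_a = np.ones((4, 8, 8))
noise_b = 2.0 * np.ones((4, 8, 8))

mask_a = np.zeros((8, 8))
mask_a[:, :4] = 1  # concept A occupies the left half
mask_b = np.zeros((8, 8))
mask_b[:, 6:] = 1  # concept B occupies the two rightmost columns

early = blended_prediction(2, 5, base, [noise_a, noise_b], [mask_a, mask_b])
late = blended_prediction(10, 5, base, [noise_a, noise_b], [mask_a, mask_b])
```

In this toy setup, `early` is just the base prediction, while `late` takes concept A's values in the left half, concept B's values in the rightmost columns, and the base values elsewhere. Varying `start_step` is a rough stand-in for the abstract's observation that the timestep at which blending begins affects identity preservation and layout.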

Authors (9)
  1. Zhe Kong (7 papers)
  2. Yong Zhang (660 papers)
  3. Tianyu Yang (67 papers)
  4. Tao Wang (700 papers)
  5. Kaihao Zhang (55 papers)
  6. Bizhu Wu (3 papers)
  7. Guanying Chen (32 papers)
  8. Wei Liu (1136 papers)
  9. Wenhan Luo (88 papers)
Citations (13)
