ObjectComposer: Consistent Generation of Multiple Objects Without Fine-tuning (2310.06968v1)

Published 10 Oct 2023 in cs.CV and cs.LG

Abstract: Recent text-to-image generative models can generate high-fidelity images from text prompts. However, these models struggle to generate the same objects with a consistent appearance across different contexts. Consistent object generation is important to many downstream tasks like generating comic book illustrations with consistent characters and setting. Numerous approaches attempt to solve this problem by extending the vocabulary of diffusion models through fine-tuning. However, even lightweight fine-tuning approaches can be prohibitively expensive to run at scale and in real time. We introduce a method called ObjectComposer for generating compositions of multiple objects that resemble user-specified images. Our approach is training-free, leveraging the abilities of preexisting models. We build upon the recent BLIP-Diffusion model, which can generate images of single objects specified by reference images. ObjectComposer enables the consistent generation of compositions containing multiple specific objects simultaneously, all without modifying the weights of the underlying models.
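The abstract does not say how individual objects are isolated or combined into one composition. The reference list does cite Otsu's threshold selection method (reference 7), so as a purely hypothetical illustration of one way a grayscale map could be split into object and background before compositing, here is a minimal pure-Python sketch. It is not ObjectComposer's pipeline; the function names `otsu_threshold`, `foreground_mask`, and `composite` are illustrative, and it assumes the object is brighter than its background and that pixels are flat lists of 8-bit values.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def otsu_threshold(pixels: Sequence[int]) -> int:
    """Return Otsu's threshold for 8-bit grayscale values (0-255).

    Pixels <= threshold form the lower class and pixels > threshold the
    upper class; the threshold maximizes the between-class variance.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_total = sum(i * h for i, h in enumerate(hist))

    sum_b = 0.0  # running intensity sum of the lower class
    w_b = 0      # running pixel count of the lower class
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_b += hist[t]
        if w_b == 0:
            continue  # lower class still empty
        w_f = total - w_b
        if w_f == 0:
            break  # upper class empty: no further split possible
        sum_b += t * hist[t]
        m_b = sum_b / w_b
        m_f = (sum_total - sum_b) / w_f
        between = w_b * w_f * (m_b - m_f) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def foreground_mask(pixels: Sequence[int], threshold: int) -> List[bool]:
    """Assumption: the object is brighter than the background."""
    return [p > threshold for p in pixels]


def composite(foreground: Sequence[T], background: Sequence[T],
              mask: Sequence[bool]) -> List[T]:
    """Take the foreground value where the mask is True, else the background."""
    if not (len(foreground) == len(background) == len(mask)):
        raise ValueError("inputs must have the same length")
    return [f if m else b for f, b, m in zip(foreground, background, mask)]


if __name__ == "__main__":
    gray = [10] * 4 + [200] * 4
    t = otsu_threshold(gray)
    mask = foreground_mask(gray, t)
    print(t, composite([1] * 8, [0] * 8, mask))
```

Running the script prints the chosen threshold and a composite that keeps the foreground only where the mask marks the bright region. Otsu's method is used here only because it is simple and appears in the reference list; a real multi-object pipeline would need masks per object and image-space or latent-space blending that the abstract does not describe.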

References (10)
  1. MultiDiffusion: Fusing diffusion paths for controlled image generation, 2023.
  2. An image is worth one word: Personalizing text-to-image generation using textual inversion, 2022.
  3. Prompt-to-Prompt image editing with cross attention control, 2022.
  4. Denoising diffusion probabilistic models, 2020.
  5. BLIP-Diffusion: Pre-trained subject representation for controllable text-to-image generation and editing, 2023.
  6. Null-text inversion for editing real images using guided diffusion models, 2022.
  7. Nobuyuki Otsu. A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man, and Cybernetics, 9(1):62–66, 1979.
  8. High-resolution image synthesis with latent diffusion models, 2022.
  9. DreamBooth: Fine tuning text-to-image diffusion models for subject-driven generation, 2023.
  10. Photorealistic text-to-image diffusion models with deep language understanding, 2022.