
Contrastive Prompts Improve Disentanglement in Text-to-Image Diffusion Models (2402.13490v1)

Published 21 Feb 2024 in cs.CV

Abstract: Text-to-image diffusion models have achieved remarkable performance in image synthesis, but the text interface does not always provide fine-grained control over certain image factors. For instance, changing a single token in a prompt can have unintended effects on the image. This paper shows that a simple modification of classifier-free guidance can help disentangle image factors in text-to-image models. The key idea of our method, Contrastive Guidance, is to characterize an intended factor with two prompts that differ in minimal tokens: the positive prompt describes the image to be synthesized, and the baseline prompt serves as a reference that disentangles the other factors. Contrastive Guidance is a general method whose benefits we illustrate in three scenarios: (1) guiding domain-specific diffusion models trained on an object class, (2) gaining continuous, rig-like controls for text-to-image generation, and (3) improving the performance of zero-shot image editors.
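The abstract does not spell out the guidance formula, but a natural reading of "a simple modification of classifier-free guidance" is to replace the usual unconditional branch with the baseline prompt, so that guidance pushes only along the direction separating the two minimally different prompts. The sketch below is an illustration under that assumption, not the paper's verified implementation; the denoiser eps_model, the function name contrastive_guidance, and the guidance weight w are hypothetical names.

def contrastive_guidance(eps_model, x_t, t, pos_cond, base_cond, w):
    """Hedged sketch of Contrastive Guidance.

    Standard classifier-free guidance combines the noise estimates as
    eps_uncond + w * (eps_cond - eps_uncond). The assumed contrastive
    variant swaps the unconditional branch for a baseline prompt that
    differs from the positive prompt in only a few tokens, so factors
    shared by both prompts cancel and only the intended factor is steered.
    """
    eps_pos = eps_model(x_t, t, pos_cond)    # positive prompt: the target image
    eps_base = eps_model(x_t, t, base_cond)  # baseline prompt: shared factors
    return eps_base + w * (eps_pos - eps_base)

Under this reading, setting w = 0 reproduces the baseline prompt and increasing w strengthens the intended factor, which is consistent with the "continuous, rig-like controls" the abstract describes.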

