Contrastive Prompts Improve Disentanglement in Text-to-Image Diffusion Models (2402.13490v1)
Abstract: Text-to-image diffusion models have achieved remarkable performance in image synthesis, but the text interface does not always provide fine-grained control over individual image factors. For instance, changing a single token in the prompt can have unintended effects on the image. This paper shows that a simple modification of classifier-free guidance can help disentangle image factors in text-to-image models. The key idea of our method, Contrastive Guidance, is to characterize an intended factor with two prompts that differ in minimal tokens: the positive prompt describes the image to be synthesized, while the baseline prompt serves as a "baseline" that disentangles the other factors. Contrastive Guidance is a general method whose benefits we illustrate in three scenarios: (1) guiding domain-specific diffusion models trained on an object class, (2) gaining continuous, rig-like controls for text-to-image generation, and (3) improving the performance of zero-shot image editors.
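The abstract suggests that Contrastive Guidance keeps the shape of classifier-free guidance but replaces the unconditional branch with a prediction conditioned on the baseline prompt, so the guidance direction isolates only the tokens in which the two prompts differ. A minimal sketch of that combination rule, under that assumption (the toy arrays stand in for a diffusion model's noise predictions and are purely illustrative):

```python
import numpy as np

def contrastive_guidance(eps_positive, eps_baseline, w):
    """Combine two conditional noise predictions.

    Standard classifier-free guidance extrapolates from the
    unconditional prediction toward the conditional one. Here the
    unconditional branch is assumed to be replaced by a prediction
    conditioned on the "baseline" prompt, which differs from the
    positive prompt in minimal tokens, so the guidance direction
    captures only the intended factor.
    """
    return eps_baseline + w * (eps_positive - eps_baseline)

# Hypothetical noise predictions for the two prompts, e.g.
# positive: "a photo of a smiling person", baseline: "a photo of a person".
eps_pos = np.array([1.0, 2.0])
eps_base = np.array([0.5, 1.5])

guided = contrastive_guidance(eps_pos, eps_base, w=2.0)
print(guided)  # -> [1.5 2.5]
```

Because the guidance weight `w` is a continuous scalar, sweeping it would naturally yield the "rig-like" controls the abstract mentions for scenario (2).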