
Scene Graph Conditioning in Latent Diffusion (2310.10338v1)

Published 16 Oct 2023 in cs.CV

Abstract: Diffusion models excel at image generation but offer limited fine-grained semantic control through text prompts. Additional techniques have been developed to address this limitation. However, conditioning diffusion models solely on text-based descriptions remains challenging because text is ambiguous and lacks structure. In contrast, scene graphs represent image content more precisely, which makes them better suited to fine-grained control and accurate synthesis in image generation models. Paired image and scene-graph data is scarce, which makes fine-tuning large diffusion models challenging. We propose multiple approaches to tackle this problem using ControlNet and Gated Self-Attention. We show that our proposed methods can generate images from scene graphs with much higher quality, outperforming previous methods. Our source code is publicly available at https://github.com/FrankFundel/SGCond
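
To make the contrast between a text prompt and a scene graph concrete, the following is a minimal sketch of a scene graph as a set of labeled objects plus (subject, predicate, object) relations. The `SceneGraph` class, its method names, and the example labels are hypothetical and chosen only for illustration; they are not taken from the paper or from the SGCond repository.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Hypothetical container: nodes are object labels, edges are relations."""

    objects: list[str] = field(default_factory=list)
    # Each relation is (subject index, predicate, object index).
    relations: list[tuple[int, str, int]] = field(default_factory=list)

    def add_object(self, label: str) -> int:
        """Add an object node and return its index."""
        self.objects.append(label)
        return len(self.objects) - 1

    def relate(self, subj: int, predicate: str, obj: int) -> None:
        """Add a directed relation between two existing objects."""
        for idx in (subj, obj):
            if not 0 <= idx < len(self.objects):
                raise IndexError(f"unknown object index: {idx}")
        self.relations.append((subj, predicate, obj))

    def triples(self) -> list[tuple[str, str, str]]:
        """Resolve relations to readable (subject, predicate, object) triples."""
        return [(self.objects[s], p, self.objects[o]) for s, p, o in self.relations]


# Two distinct "sheep" instances stay distinct nodes, which a flat text
# prompt such as "two sheep on grass" cannot express unambiguously.
graph = SceneGraph()
sheep_a = graph.add_object("sheep")
sheep_b = graph.add_object("sheep")
grass = graph.add_object("grass")
graph.relate(sheep_a, "standing on", grass)
graph.relate(sheep_b, "left of", sheep_a)

print(graph.triples())
```

Because relations refer to objects by index rather than by label, repeated object categories remain separate instances. That is the kind of structure the abstract contrasts with ambiguous text prompts.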



Authors (1)