TextCenGen: Attention-Guided Text-Centric Background Adaptation for Text-to-Image Generation (2404.11824v4)

Published 18 Apr 2024 in cs.CV

Abstract: Recent advances in text-to-image (T2I) generation have witnessed a shift from adapting text to fixed backgrounds to creating images around text. Traditional approaches are often limited to generating layouts within static images for effective text placement. Our proposed approach, TextCenGen, introduces dynamic adaptation of blank regions for text-friendly image generation, emphasizing text-centric design and visual harmony. Our method employs force-directed attention guidance in T2I models to generate images that strategically reserve whitespace for predefined text areas, even for text or icons placed at golden-ratio positions. Observing how cross-attention maps affect object placement, we detect and repel conflicting objects using a force-directed graph approach, combined with a Spatial Excluding Cross-Attention Constraint that keeps attention smooth in whitespace areas. On this novel graphic-design task, experiments indicate that TextCenGen outperforms existing methods, producing more harmonious compositions. Furthermore, our method significantly improves T2I model outputs on our specially collected prompt datasets covering varied text positions. These results demonstrate the efficacy of TextCenGen in creating more harmonious and integrated text-image compositions.
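The core mechanism the abstract describes, steering cross-attention so that salient objects are pushed away from a reserved text region while attention inside that region stays suppressed, can be sketched roughly as follows. This is a minimal illustration only: it assumes access to per-token cross-attention maps extracted from a diffusion U-Net (e.g. via forward hooks), and the function names, the inverse-distance repulsion term, and the mean-attention whitespace penalty are assumptions for exposition, not the paper's released implementation.

```python
# Hypothetical sketch of force-directed attention guidance.
# Assumptions (not from the paper's code):
# - `attn_maps`: dict mapping token index -> (H, W) cross-attention map
#   extracted from a diffusion U-Net during denoising;
# - `text_box`: reserved text region (x0, y0, x1, y1) in attention-map coords.
import torch

def attention_centroid(attn: torch.Tensor) -> torch.Tensor:
    """Center of mass of one (H, W) attention map, returned as (x, y)."""
    h, w = attn.shape
    ys = torch.arange(h, dtype=attn.dtype, device=attn.device)
    xs = torch.arange(w, dtype=attn.dtype, device=attn.device)
    total = attn.sum() + 1e-8
    cy = (attn.sum(dim=1) * ys).sum() / total  # row sums weighted by y
    cx = (attn.sum(dim=0) * xs).sum() / total  # column sums weighted by x
    return torch.stack([cx, cy])

def repulsion_loss(attn_maps: dict, text_box: tuple) -> torch.Tensor:
    """Penalize object attention whose centroid lies near the text box,
    a stand-in for the force-directed repulsion of conflicting objects."""
    x0, y0, x1, y1 = text_box
    loss = torch.zeros(())
    for attn in attn_maps.values():
        c = attention_centroid(attn)
        box_center = torch.tensor(
            [(x0 + x1) / 2.0, (y0 + y1) / 2.0],
            dtype=c.dtype, device=c.device,
        )
        dist = torch.norm(c - box_center)
        loss = loss + 1.0 / (dist + 1.0)  # stronger push when closer
    return loss

def whitespace_loss(attn_maps: dict, text_box: tuple) -> torch.Tensor:
    """Suppress cross-attention inside the reserved region, in the spirit
    of the Spatial Excluding Cross-Attention Constraint."""
    x0, y0, x1, y1 = text_box
    loss = torch.zeros(())
    for attn in attn_maps.values():
        loss = loss + attn[y0:y1, x0:x1].mean()
    return loss
```

In a guidance loop of this kind, one would typically backpropagate a weighted sum of these two losses into the current latent at each denoising step and take a small gradient step before the next model call, in the style of classifier or attention-map guidance; the weighting and step size here would be hyperparameters, and this sketch omits the batching and multi-head averaging a real pipeline needs.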

