TextCenGen: Attention-Guided Text-Centric Background Adaptation for Text-to-Image Generation

Published 18 Apr 2024 in cs.CV (arXiv:2404.11824v5)

Abstract: Text-to-image (T2I) generation has made remarkable progress in producing high-quality images, but a fundamental challenge remains: creating backgrounds that naturally accommodate text placement without compromising image quality. This capability is non-trivial for real-world applications like graphic design, where a clear visual hierarchy between content and text is essential. Prior work has primarily focused on arranging layouts within existing static images, leaving unexplored the potential of T2I models for generating text-friendly backgrounds. We present TextCenGen, a training-free method that dynamically adapts the background in blank regions for text-friendly image generation. Instead of directly reducing attention in text areas, which degrades image quality, we relocate conflicting objects before background optimization. Our method analyzes cross-attention maps to identify conflicting objects overlapping with text regions and uses a force-directed graph approach to guide their relocation, followed by attention-excluding constraints to ensure smooth backgrounds. Our method is plug-and-play, requiring no additional training, while balancing semantic fidelity and visual quality. Evaluated on our proposed text-friendly T2I benchmark of 27,000 images across four seed datasets, TextCenGen outperforms existing methods, achieving 23% lower saliency overlap in text regions while maintaining 98% of the semantic fidelity measured by CLIP score and our proposed Visual-Textual Concordance Metric (VTCM).
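The force-directed relocation idea can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify the force model, so the code below assumes object bounding boxes (which the method would derive from thresholded cross-attention maps) and applies a simple repulsive force that pushes any box overlapping the protected text region away from that region's center until the overlap is resolved. All function names and parameters here are illustrative.

```python
import numpy as np

def overlaps(a, b):
    """Axis-aligned overlap test for boxes given as (x0, y0, x1, y1)."""
    return not (a[2] <= b[0] or b[2] <= a[0] or a[3] <= b[1] or b[3] <= a[1])

def relocate_objects(obj_boxes, text_box, step=0.05, iters=200):
    """Push object boxes out of a protected text region.

    A toy stand-in for the force-directed relocation described in the
    abstract: each conflicting box receives a repulsive displacement
    along the line from the text-region center to its own center.
    Coordinates are normalized to [0, 1].
    """
    boxes = np.array(obj_boxes, dtype=float)
    tcx = (text_box[0] + text_box[2]) / 2.0
    tcy = (text_box[1] + text_box[3]) / 2.0
    for _ in range(iters):
        moved = False
        for b in boxes:
            if overlaps(b, text_box):
                cx, cy = (b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0
                d = np.array([cx - tcx, cy - tcy])
                n = np.linalg.norm(d)
                # If the centers coincide, pick an arbitrary direction.
                d = d / n if n > 1e-6 else np.array([1.0, 0.0])
                b[[0, 2]] += step * d[0]  # shift x-extent
                b[[1, 3]] += step * d[1]  # shift y-extent
                moved = True
        if not moved:
            break
    return np.clip(boxes, 0.0, 1.0)
```

In the actual method, the relocated positions would then condition the diffusion process (e.g., by steering cross-attention toward the new regions), rather than being used as literal pixel-space moves.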

