
Desigen: A Pipeline for Controllable Design Template Generation (2403.09093v1)

Published 14 Mar 2024 in cs.CV

Abstract: Templates serve as a good starting point for implementing a design (e.g., a banner or slide), but creating them manually takes great effort from designers. In this paper, we present Desigen, an automatic template creation pipeline that generates background images as well as harmonious layout elements over the background. Unlike natural images, a background image should preserve enough non-salient space for the overlaid layout elements. To equip existing advanced diffusion-based models with stronger spatial control, we propose two simple but effective techniques that constrain the saliency distribution and reduce the attention weight in desired regions during background generation. Conditioned on the background, we then synthesize the layout with a Transformer-based autoregressive generator. To achieve a more harmonious composition, we propose an iterative inference strategy that adjusts the synthesized background and layout over multiple rounds. We construct a design dataset of more than 40k advertisement banners to verify our approach. Extensive experiments demonstrate that the proposed pipeline generates high-quality templates comparable to those of human designers. Beyond single-page designs, we further show an application to presentation generation that outputs a set of theme-consistent slides. The data and code are available at https://whaohan.github.io/desigen.
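The attention-reduction idea described in the abstract can be illustrated with a minimal sketch. This is a simplified, hypothetical illustration (not the authors' implementation): given a cross-attention map between image locations and prompt tokens, we down-weight the prompt's content tokens at pixels reserved for layout elements and renormalize, so those regions attend less to salient prompt content and tend to stay non-salient background. The function name, token partitioning, and scale factor are all assumptions for illustration.

```python
import numpy as np

def suppress_attention(attn, region_mask, content_tokens, scale=0.1):
    """Reduce cross-attention to content tokens inside a reserved region.

    attn:           (N, T) attention weights, one row per image location,
                    each row summing to 1 over T prompt tokens.
    region_mask:    (N,) boolean, True where layout elements will be placed.
    content_tokens: indices of prompt tokens carrying salient content
                    (e.g., object words, as opposed to start/padding tokens).
    scale:          multiplicative reduction applied inside the region.
    """
    out = attn.copy()
    rows = np.where(region_mask)[0]
    # Scale down attention to content tokens at reserved locations only.
    out[np.ix_(rows, content_tokens)] *= scale
    # Renormalize affected rows so they remain valid distributions;
    # the removed mass shifts to non-content tokens.
    out[rows] /= out[rows].sum(axis=-1, keepdims=True)
    return out
```

In an actual diffusion pipeline this adjustment would be applied inside the cross-attention layers at each denoising step; the sketch only shows the per-map arithmetic.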
