PLay: Parametrically Conditioned Layout Generation using Latent Diffusion (2301.11529v2)
Abstract: Layout design is an important task in various design fields, including user interface, document, and graphic design. As this task requires tedious manual effort by designers, prior works have attempted to automate this process using generative models, but commonly fell short of providing intuitive user controls and achieving design objectives. In this paper, we build a conditional latent diffusion model, PLay, that generates parametrically conditioned layouts in vector graphic space from user-specified guidelines, which designers commonly use to represent their design intents in current practice. Our method outperforms prior works across three datasets on metrics including FID and FD-VG, and in a user study. Moreover, it brings a novel and interactive experience to professional layout design processes.
Authors: Chin-Yi Cheng, Forrest Huang, Gang Li, Yang Li