iPoster: Content-Aware Layout Generation for Interactive Poster Design via Graph-Enhanced Diffusion Models

Published 31 Mar 2026 in cs.HC and cs.AI | (2603.29469v1)

Abstract: We present iPoster, an interactive layout generation framework that empowers users to guide content-aware poster layout design by specifying flexible constraints. iPoster enables users to specify partial intentions within the intention module, such as element categories, sizes, positions, or coarse initial drafts. Then, the generation module instantly generates refined, context-sensitive layouts that faithfully respect these constraints. iPoster employs a unified graph-enhanced diffusion architecture that supports various design tasks under user-specified constraints. These constraints are enforced through masking strategies that precisely preserve user input at every denoising step. A cross content-aware attention module aligns generated elements with salient regions of the canvas, ensuring visual coherence. Extensive experiments show that iPoster not only achieves state-of-the-art layout quality but also offers a responsive and controllable framework for constrained poster layout design.


Authors (4)

Summary

The paper presents a unified framework that combines graph neural networks with diffusion models to generate high-quality, user-controlled poster layouts.
The methodology employs a dual-graph architecture to encode spatial relationships and enforces multiple design constraints through masking applied at every denoising step.
Experimental results show significant improvements in layout integrity and efficiency over GAN and transformer baselines on benchmark datasets.


Introduction and Motivation

The paper introduces iPoster, a content-aware, user-interactive poster layout generation system that fuses graph neural networks (GNNs) with denoising diffusion probabilistic models. The objective is to overcome two fundamental limitations of previous approaches: (1) the inability to integrate explicit user constraints and partial edits into the generation process, and (2) persistent layout artifacts such as occlusion of salient regions, undesirable overlap, and element misalignment. iPoster targets practical poster and graphic layout design workflows. Users encode element-level intentions, such as content categories, full size/position specifications, coarse drafts, or partial anchor elements, and a single unified model then synthesizes or refines layout configurations that satisfy these constraints.

System Architecture

iPoster’s architecture comprises two key modules: (1) a flexible user interaction module allowing arbitrary constraint composition, and (2) a generation module, built on a graph-enhanced conditional diffusion process, that is fast enough for interactive layout synthesis and iterative refinement.

The system encodes the relevant layout context by processing (i) the poster canvas and its visual saliency map via a ViT-based encoder, (ii) the element-level layout specification, and (iii) extracted bounding boxes of salient regions. To model spatial and topological relationships, iPoster constructs two task-specific graphs: a fully connected Bbox-Layout Module (BLM) and a spatially gridded Image-Layout Module (ILM), each capturing different relationships among layout elements, salient regions, and image patches.
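To make the two graphs concrete, here is a minimal sketch of how their edge lists might be built. It assumes boxes in normalized (x0, y0, x1, y1) coordinates, no self-loops in $G_\mathrm{BLM}$, and a $G_\mathrm{ILM}$ that links each layout element to the grid patches its box overlaps, plus 4-neighbour adjacency between patches. The function names and these connectivity rules are illustrative assumptions, not the paper's exact construction.

```python
from itertools import permutations


def build_blm_edges(num_elements, num_saliency):
    """Directed edges of a fully connected graph over element and saliency nodes.

    Nodes 0..num_elements-1 are layout elements; the following num_saliency
    nodes are salient-region boxes. Self-loops are omitted (an assumption).
    """
    return list(permutations(range(num_elements + num_saliency), 2))


def build_ilm_edges(element_boxes, grid_size):
    """Hypothetical ILM edges: element<->overlapped patch, patch<->4-neighbour patch.

    element_boxes: (x0, y0, x1, y1) tuples in normalized [0, 1] coordinates.
    Patch nodes are numbered row-major after the element nodes.
    """
    n_el = len(element_boxes)
    cell = 1.0 / grid_size

    def patch_id(r, c):
        return n_el + r * grid_size + c

    edges = []
    # Link each element to every patch whose grid cell intersects its box.
    for e, (x0, y0, x1, y1) in enumerate(element_boxes):
        for r in range(grid_size):
            for c in range(grid_size):
                px0, py0 = c * cell, r * cell
                if x0 < px0 + cell and px0 < x1 and y0 < py0 + cell and py0 < y1:
                    edges += [(e, patch_id(r, c)), (patch_id(r, c), e)]
    # Link spatially adjacent patches (right and down neighbours, both directions).
    for r in range(grid_size):
        for c in range(grid_size):
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < grid_size and cc < grid_size:
                    a, b = patch_id(r, c), patch_id(rr, cc)
                    edges += [(a, b), (b, a)]
    return edges
```

For example, `build_blm_edges(3, 2)` yields 20 directed edges (5 nodes, each linked to the other 4), while the ILM stays sparse because each element only touches the patches under its box.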

Figure 1: The overall training framework of the content-aware layout generation model, highlighting the Cross Content-aware Attention Module.

This information is fused in a Cross Content-aware Attention Module, implemented with GNNs. BLM captures high-level spatial alignments and saliency avoidance (through fully connected interactions with saliency nodes), while ILM enables fine-grained interaction between spatially adjacent patches and layout elements. The output features guide the diffusion model to sample context-aligned and constraint-compliant layouts.
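The Cross Content-aware Attention Module is described only as GNN-based. One plausible reading is attention-weighted message passing over the BLM and ILM graphs, sketched below with NumPy. The scaled dot-product scoring, the residual update, and the projection matrices `w_q`, `w_k`, `w_v` are assumptions, not the paper's specified layer.

```python
import numpy as np


def graph_attention_step(h, edges, w_q, w_k, w_v):
    """One attention-based message-passing step over a directed edge list.

    h: (N, d) node features (layout elements, saliency boxes, or patches).
    edges: (src, dst) pairs. Each node softmax-weights the values of its
    in-neighbours by scaled dot-product scores and adds them residually.
    w_q, w_k, w_v: (d, d) projection matrices (hypothetical parameters).
    """
    q, k, v = h @ w_q, h @ w_k, h @ w_v
    scale = np.sqrt(q.shape[1])
    out = np.array(h, dtype=float)  # copy; residual base for every node
    for dst in range(h.shape[0]):
        srcs = [s for s, t in edges if t == dst]
        if not srcs:
            continue  # nodes without incoming edges keep their features
        scores = np.array([q[dst] @ k[s] for s in srcs]) / scale
        weights = np.exp(scores - scores.max())  # numerically stable softmax
        weights /= weights.sum()
        out[dst] = h[dst] + weights @ v[srcs]
    return out
```

In a BLM/ILM setting, `h` would stack element, saliency, and patch features, and `edges` would come from the corresponding graph; only nodes with incoming edges, such as layout elements receiving messages from saliency or patch nodes, are updated.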

Figure 2: The construction process of $G_\mathrm{BLM}$ and $G_\mathrm{ILM}$, showing the connectivity among layout elements, saliency region nodes, and image patches.

Unified Constraint Incorporation

A primary technical contribution is the formalization of four recurring constraint paradigms motivated by graphic design practice: (i) category-to-size+position ($C \rightarrow S + P$), (ii) category+size-to-position ($C + S \rightarrow P$), (iii) completion from a partial set of fixed anchor elements, and (iv) refinement of an initial coarse layout.
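The four paradigms differ mainly in which fields the user fixes, which is what lets one model cover them. Below is a minimal sketch, assuming each element is a row `[category, x, y, w, h]` and that a mask entry of 1 marks a user-fixed field. The task labels, the row layout, and the treatment of refinement (no hard-fixed fields, with the coarse draft used to seed sampling) are illustrative assumptions.

```python
import numpy as np

# Hypothetical per-element layout row: [category, x, y, w, h].
FIELDS = ("category", "x", "y", "w", "h")


def constraint_mask(task, num_elements, anchors=()):
    """Return a (num_elements, 5) binary mask; 1 marks a user-fixed field."""
    mask = np.zeros((num_elements, len(FIELDS)), dtype=np.int8)
    if task == "C->S+P":        # categories given; sizes and positions generated
        mask[:, 0] = 1
    elif task == "C+S->P":      # categories and sizes given; positions generated
        mask[:, [0, 3, 4]] = 1
    elif task == "completion":  # anchor elements fully fixed; the rest generated
        mask[list(anchors), :] = 1
    elif task == "refinement":  # assumption: nothing hard-fixed; the draft seeds sampling
        pass
    else:
        raise ValueError(f"unknown task: {task}")
    return mask
```

For instance, `constraint_mask("C+S->P", 1)` gives `[[1, 0, 0, 1, 1]]`: category, width, and height are fixed, and only the position is left to the model.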

During both training and inference, constraints are injected through masking strategies. User-specified fields are preserved with a binary mask at each denoising step, while unconstrained attributes are generated by the model. This design enables a single unified model to handle a combinatorial space of constraint types, avoiding task-specific re-training or architectural changes.
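The per-step masking can be sketched as a reverse-diffusion loop that overwrites the constrained fields after every update. This is a minimal sketch under stated assumptions: `denoise_step` is a hypothetical placeholder for the trained graph-enhanced denoiser, and user values are re-imposed in clean form, matching the description of preserving user input exactly at each step.

```python
import numpy as np


def masked_sample(denoise_step, user_values, mask, num_steps, rng):
    """Reverse-diffusion sampling that re-imposes user constraints at every step.

    denoise_step(x, t) -> x_prev: one reverse update (placeholder for the model).
    user_values: array of user-specified fields (arbitrary where mask == 0).
    mask: same shape, 1 where the user fixed a value, 0 where the model generates.
    """
    x = rng.standard_normal(user_values.shape)  # start unconstrained fields from noise
    x = mask * user_values + (1 - mask) * x
    for t in reversed(range(num_steps)):
        x = denoise_step(x, t)
        # Overwrite constrained fields so the user's input is preserved exactly.
        x = mask * user_values + (1 - mask) * x
    return x
```

With a toy denoiser such as `lambda x, t: 0.5 * x`, the constrained entries come back exactly equal to the user's values while the free entries shrink toward zero. A common alternative in diffusion inpainting re-imposes a copy of the known values noised to the current timestep; this sketch takes the clean-value option.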

Figure 3: iPoster’s interactive framework, showing end-user constraint injection and iterative mask application during denoising-based layout synthesis.

Experimental Evaluation

The system’s efficacy is evaluated on the CGL and PKU benchmark datasets with a suite of content and graphic metrics: Occlusion (Occ) for overlap with salient regions, Readability (Rea) for the clarity of text regions, and graphic metrics (Loose/Strict Underlay Validness and Overlay) for layout integrity and element overlap. Lower is better for Occ, Rea, and Overlay; higher is better for the Underlay Validness scores.
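For intuition on what Occ and Overlay measure, here is a sketch using definitions common in content-aware layout benchmarks: Overlay as the mean pairwise IoU between non-underlay elements, and Occlusion as the mean saliency over the pixels covered by elements. These formulas are assumptions; the paper's exact normalization may differ, and Rea and Underlay Validness are omitted.

```python
from itertools import combinations

import numpy as np


def iou(a, b):
    """Intersection-over-union of two boxes given as (x0, y0, x1, y1)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def overlay(boxes):
    """Mean pairwise IoU (underlay elements assumed already filtered out). Lower is better."""
    pairs = list(combinations(boxes, 2))
    return sum(iou(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0


def occlusion(boxes, saliency):
    """Mean saliency over pixels covered by any element (integer pixel boxes). Lower is better."""
    covered = np.zeros(saliency.shape, dtype=bool)
    for x0, y0, x1, y1 in boxes:
        covered[y0:y1, x0:x1] = True
    return float(saliency[covered].mean()) if covered.any() else 0.0
```

Two 2×2 boxes offset by one pixel in each direction, for example, share a 1×1 intersection, so their Overlay is 1/7.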

Across all constraint modes, iPoster generally outperforms prior GAN, transformer, and retrieval-augmented baselines. Notably, it consistently achieves the lowest Overlay (Ove) scores, a substantial reduction in spatial overlap compared to CGL-GAN, RALF, and LayoutDiT. For example, on PKU under $C \rightarrow S + P$, Overlay drops to 0.0018, versus 0.0029 for LayoutDiT and 0.0095 for RALF. On the content metrics the margins are smaller: iPoster attains lower or comparable Occ and Rea scores, indicating that it reliably avoids covering salient regions and keeps text regions readable.
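To put the PKU Overlay numbers in relative terms, a quick calculation from the reported values:

$$
1 - \frac{0.0018}{0.0029} \approx 0.38, \qquad 1 - \frac{0.0018}{0.0095} \approx 0.81
$$

That is, in this setting iPoster's Overlay is roughly 38% lower than LayoutDiT's and roughly 81% lower than RALF's.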

Figure 4: Test examples for each constrained generation task, illustrating user intent adherence and high-quality, collision-free layouts.

From a computational perspective, iPoster (33M parameters, ~1.1 sec per sample on A100) is significantly more efficient than LLM-based systems (e.g., PosterLlama or PosterO at 8B parameters, >7 sec/sample) due to its lightweight architecture and fully unified constraint handling. This efficiency directly supports interactive design.
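The reported figures imply the following approximate ratios, taking the LLM-based systems at 8B parameters and 7 s per sample (a lower bound, since their latency is reported as >7 s):

$$
\frac{8 \times 10^{9}}{33 \times 10^{6}} \approx 242, \qquad \frac{7\ \mathrm{s}}{1.1\ \mathrm{s}} \approx 6.4
$$

So, on these reported numbers, iPoster is roughly 240 times smaller and more than 6 times faster per sample.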

Practical Integration and User Scenarios

The paper explores end-to-end user workflows in which designers incrementally build posters by specifying constraints, receiving immediate model feedback, and optionally refining or sampling alternative layouts. The system generates multiple candidate layouts and automatically renders finalized poster visuals from the structured layout output.

Figure 5: Application scenario showing user interface for constraint specification, candidate generation, and final rendered poster outcomes.

Discussion, Limitations, and Future Directions

iPoster advances state-of-the-art controllable layout generation by integrating GNN-driven spatial reasoning with diffusion-based generative modeling and a highly flexible constraint mechanism. It achieves strong alignment between user intent and automatic layout generation, supporting diverse practical workflows and outperforming transformer or GAN baselines in both quality and efficiency.

Remaining limitations include that iPoster models only intermediate or hierarchical visual attributes and does not capture higher-level design semantics or deep style transfer, which limits its expressiveness in advanced design contexts. Potential future directions include hierarchical or multi-scale graph representations, richer interaction paradigms such as sketch-based or natural-language constraint input, and tighter coupling with post-layout rendering engines for seamless end-to-end design automation.

Conclusion

iPoster establishes a robust, scalable, and user-centric framework for interactive graphic layout generation, bridging the gap between direct user manipulation and intelligent automation. By leveraging graph-enhanced diffusion models and a unified masking-based interface for arbitrary constraint handling, it delivers high-quality, responsive, and semantically consistent layout synthesis suitable for both professional and novice designers. Its architectural principles provide a foundation for further advances in controllable design generation and human-AI co-creation systems.