Poster Tree: Hierarchical Layouts

Updated 1 September 2025

Poster Tree is a hierarchical structure that encodes spatial, semantic, and relational design intents for poster layouts.
It integrates SVG primitives and vectorized shapes to map layout elements and organize design intent regions.
The framework supports multi-agent and LLM collaboration to refine content fidelity and achieve scalable poster generation.

A Poster Tree is a hierarchical intermediate representation used in recent layout generation frameworks for both visual and scientific posters. It encodes the spatial, semantic, and relational properties of poster components (text blocks, visual assets, and design intents), which supports logical consistency and content-aware synthesis through structured trees, vectorized shapes, and collaborative agent strategies. The concept underlies frameworks such as PosterO (Hsu et al., 6 May 2025) and PosterForest (Choi et al., 29 Aug 2025), where Poster Tree representations support generalized and scalable poster generation across diverse genres and formats.

1. Conceptual Foundations of the Poster Tree

The Poster Tree formalizes poster layouts through hierarchical data structures that link spatial arrangement with underlying content and intent. In PosterO, the tree organizes layout primitives and design intent regions using scalable vector graphics (SVG) elements, such as <rect>, <ellipse>, and <path>, with node hierarchies reflecting containment relations and design purpose allocation. PosterForest, in contrast, builds the Poster Tree by merging two trees: a semantic content tree parsed from document structure (title, sections, paragraphs, assets) and a spatial layout tree encoding arrangement (panels, rows, columns, subpanels). Each node integrates both semantic and geometric attributes, preserving document logic and visual relationships.

This dual encoding ensures that both frameworks:

Maintain the logical groupings native to input documents or image intent regions.
Capture spatial nesting, adjacency, and containment (e.g., underlay or enclosing SVG groups).
Support extensibility for diverse shape types and layout intent cues.
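
A node that carries both a semantic role and an SVG shape can be pictured as a small record with children. The sketch below is illustrative only: the class name `PosterNode`, its fields, and the `to_svg` helper are assumptions made for this example, not the data model of PosterO or PosterForest. Containment is expressed by wrapping a node and its children in an SVG `<g>` group.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PosterNode:
    """Hypothetical Poster Tree node: a semantic role plus SVG geometry."""
    node_id: str
    role: str                      # e.g. "intent", "title", "logo" (illustrative labels)
    tag: str                       # SVG primitive: "rect", "ellipse", or "path"
    attrs: Dict[str, str] = field(default_factory=dict)
    children: List["PosterNode"] = field(default_factory=list)

    def _shape(self, pad: str) -> str:
        # Render this node's own primitive as a self-closing SVG element.
        attr_str = "".join(f' {k}="{v}"' for k, v in self.attrs.items())
        return f'{pad}<{self.tag} id="{self.node_id}" class="{self.role}"{attr_str}/>'

    def to_svg(self, indent: int = 0) -> str:
        """Serialize the subtree; a node with children becomes a <g> group."""
        pad = "  " * indent
        if not self.children:
            return self._shape(pad)
        lines = [f'{pad}<g id="{self.node_id}-group">', self._shape(pad + "  ")]
        lines += [child.to_svg(indent + 1) for child in self.children]
        lines.append(f"{pad}</g>")
        return "\n".join(lines)


# Made-up coordinates: one intent region enclosing a title box and a round logo.
title = PosterNode("e1", "title", "rect",
                   {"x": "40", "y": "30", "width": "400", "height": "60"})
logo = PosterNode("e2", "logo", "ellipse",
                  {"cx": "420", "cy": "150", "rx": "30", "ry": "30"})
region = PosterNode("d1", "intent", "rect",
                    {"x": "20", "y": "20", "width": "460", "height": "200"},
                    [title, logo])
svg = region.to_svg()
print(svg)
```

Keeping geometry in plain SVG attributes means the same node type covers rectangles, ellipses, and Bézier paths without a separate class per shape.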

A plausible implication is that Poster Trees provide a universal interface for LLMs and agent systems to reason about and generate layouts according to both content and design constraints.

2. Tree Construction Methodologies

In PosterO, Poster Trees are constructed through three primary processes:

Universal Shape Vectorization: Layout elements are mapped to SVG primitives, supporting regular rectangles, vertical/rotated rectangles, ellipses, and complex curves via Bézier path abstractions.
Design Intent Vectorization: Regions available for content placement are extracted via semi-supervised U-Net detection, thresholded into polygons, and embedded into a latent intent space $Z_D$.
Hierarchical Node Representation: The tree is built by assembling design intent nodes ($N_D$) and element nodes ($N_E$) and structuring parent-child relations based on spatial containment. The containment condition is defined with respect to a margin threshold $\epsilon$.
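
A containment test with a margin $\epsilon$ can be sketched as follows. The exact condition used by PosterO is not reproduced here, so the form below is an assumption: a child box counts as contained when it lies inside the parent box enlarged by `eps` on every side. The axis-aligned box format, the default `eps`, and the rule of picking the smallest containing region as parent are likewise hypothetical choices for illustration.

```python
from typing import Dict, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def contains(parent: Box, child: Box, eps: float = 2.0) -> bool:
    """Assumed test: child lies within parent grown by eps on each side."""
    px0, py0, px1, py1 = parent
    cx0, cy0, cx1, cy1 = child
    return (cx0 >= px0 - eps and cy0 >= py0 - eps
            and cx1 <= px1 + eps and cy1 <= py1 + eps)


def area(box: Box) -> float:
    x0, y0, x1, y1 = box
    return (x1 - x0) * (y1 - y0)


def assign_parents(regions: Dict[str, Box], elements: Dict[str, Box],
                   eps: float = 2.0) -> Dict[str, Optional[str]]:
    """Attach each element to the smallest region that contains it, if any."""
    parents: Dict[str, Optional[str]] = {}
    for eid, ebox in elements.items():
        candidates = [rid for rid, rbox in regions.items()
                      if contains(rbox, ebox, eps)]
        parents[eid] = (min(candidates, key=lambda rid: area(regions[rid]))
                        if candidates else None)
    return parents


# Made-up boxes: d2 is nested inside d1; e2 pokes 1 unit past d1's bottom edge.
regions = {"d1": (0, 0, 500, 300), "d2": (50, 50, 250, 200)}
elements = {
    "e1": (60, 60, 200, 120),     # inside both regions
    "e2": (300, 100, 480, 301),   # inside d1 only thanks to the margin
    "e3": (400, 250, 600, 400),   # inside neither region
}
parents = assign_parents(regions, elements)
print(parents)
```

The margin lets slightly misaligned boxes still attach to their intended parent, which is the practical role a threshold of this kind plays.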

PosterForest, by contrast, uses multi-agent parsing and summarization:

A parser agent ($\mathcal{A}_{Parser}$) builds a Raw Document Tree from the input paper.
A summarization agent ($\mathcal{A}_{Summ}$) prunes and merges the raw tree into a Content Tree.
A layout agent ($\mathcal{A}_{Layout\_Init}$) produces a Layout Tree from the content tree, defining spatial regions.
The final Poster Tree, $\mathcal{T}_{poster}$, merges content and layout by integrating semantic and positional information (see equations (1)-(4) as specified).
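
The final merge step can be illustrated with a toy example. Everything here is a simplification assumed for the sketch: nodes are plain dictionaries, the Layout Tree is flattened to a map from node id to panel geometry, and nodes are matched by a shared `id` key. The function name `merge_trees` and all field names are hypothetical, and the agents that would produce the two inputs are not modeled.

```python
from typing import Any, Dict


def merge_trees(content: Dict[str, Any],
                layout: Dict[str, Dict[str, float]]) -> Dict[str, Any]:
    """Attach each content node's panel geometry (matched by id), recursively."""
    return {
        "id": content["id"],
        "kind": content["kind"],
        "text": content.get("text", ""),
        # None marks a content node for which the layout defines no region.
        "bbox": layout.get(content["id"]),
        "children": [merge_trees(child, layout)
                     for child in content.get("children", [])],
    }


# Made-up Content Tree: a root with a title and one section holding a figure.
content_tree = {
    "id": "poster", "kind": "root", "children": [
        {"id": "title", "kind": "title", "text": "An Example Paper"},
        {"id": "sec1", "kind": "section", "text": "Method summary",
         "children": [{"id": "fig1", "kind": "figure"}]},
    ],
}

# Made-up layout in normalized page coordinates.
layout = {
    "poster": {"x": 0.0, "y": 0.0, "w": 1.0, "h": 1.0},
    "title": {"x": 0.0, "y": 0.0, "w": 1.0, "h": 0.1},
    "sec1": {"x": 0.0, "y": 0.1, "w": 0.5, "h": 0.9},
    "fig1": {"x": 0.05, "y": 0.5, "w": 0.4, "h": 0.4},
}

poster_tree = merge_trees(content_tree, layout)
print(poster_tree["children"][1]["children"][0])
```

After the merge, every node exposes both what it says and where it goes, which is the property the later refinement stages rely on.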

This explicit construction methodology is critical for aligning logical structure and visual arrangement, minimizing loss of information and avoiding spatial disorder.

3. LLMs and Multi-Agent Collaboration

PosterO leverages LLMs via in-context learning for layout inference:

Prompts are built by concatenating $k$ intent-aligned example trees, selected through nearest neighbor retrieval in the intent feature embedding space.
Each prompt includes descriptive context (image resolution, design intent area, node ids) and a postscript instruction for generating a new tree for the test case.
LLMs predict SVG-based tree layouts; post-processing via conversational edits enables conversion into full poster designs (HTML/text/image elements).
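
The retrieval and prompt assembly steps can be sketched in a few lines. This is a minimal illustration under stated assumptions: Euclidean distance as the similarity measure, two-dimensional toy embeddings, placeholder SVG strings, and invented prompt wording. The names `retrieve` and `build_prompt` are hypothetical and do not come from PosterO.

```python
import math
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], str]  # (intent embedding, serialized SVG tree)


def retrieve(query: Sequence[float], pool: List[Example], k: int) -> List[Example]:
    """Return the k pool examples whose embeddings are nearest to the query."""
    return sorted(pool, key=lambda ex: math.dist(query, ex[0]))[:k]


def build_prompt(context: str, examples: List[Example]) -> str:
    """Concatenate example trees, then the test-case context and instruction."""
    parts = [f"Example {i}:\n{svg}"
             for i, (_, svg) in enumerate(examples, start=1)]
    parts.append(f"Context: {context}")
    parts.append("Generate the SVG layout tree for this test case.")
    return "\n\n".join(parts)


# Toy pool with made-up embeddings and placeholder trees.
pool = [
    ([0.0, 0.0], "<svg>A</svg>"),
    ([1.0, 1.0], "<svg>B</svg>"),
    ([0.2, 0.1], "<svg>C</svg>"),
]
query = [0.1, 0.1]
nearest = retrieve(query, pool, k=2)
prompt = build_prompt("image resolution and design intent area go here", nearest)
print(prompt)
```

Sorting the whole pool is adequate for a few-shot pool of modest size; a larger pool would call for an approximate nearest-neighbor index instead.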

PosterForest implements node-wise multi-agent collaboration:

A Content Agent and Layout Agent analyze, summarize, and refine tree nodes through iterative feedback loops.
Content is summarized to fit spatial panels; layout is adjusted to balance visual organization, conform to aspect ratios, and prevent overflow or under-utilization.
Repeated refinement ensures the final Poster Tree achieves logical consistency, content fidelity, and visual coherence.
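
The feedback loop between the two agents can be mimicked with deterministic stand-ins. In the sketch below, the "content agent" simply truncates text and the "layout agent" grows the panel by one unit, so no LLM is involved. The capacity model (characters per unit of height), the 20% trimming step, and all names are assumptions for illustration, and only overflow is handled, not under-utilization.

```python
from dataclasses import dataclass


@dataclass
class Panel:
    text: str
    height: int                 # panel height in arbitrary integer units
    chars_per_unit: int = 10    # toy capacity model (assumption)

    @property
    def capacity(self) -> int:
        return self.height * self.chars_per_unit


def content_agent(panel: Panel) -> None:
    """Stand-in summarizer: cut at most 20% of the text, never below capacity."""
    if len(panel.text) > panel.capacity:
        target = max(panel.capacity, len(panel.text) * 4 // 5)
        panel.text = panel.text[:target]


def layout_agent(panel: Panel, max_height: int) -> None:
    """Stand-in layout step: grow an overflowing panel by one unit, up to a cap."""
    if len(panel.text) > panel.capacity and panel.height < max_height:
        panel.height += 1


def refine(panel: Panel, max_height: int, max_rounds: int = 10) -> int:
    """Alternate the two agents until the text fits; return the rounds used."""
    for rounds_done in range(max_rounds):
        if len(panel.text) <= panel.capacity:
            return rounds_done
        content_agent(panel)
        layout_agent(panel, max_height)
    return max_rounds


panel = Panel(text="x" * 200, height=10)
rounds = refine(panel, max_height=14)
print(rounds, len(panel.text), panel.height)
```

Letting each side move only a little per round is what makes the exchange a negotiation: the text is shortened gradually while the panel is allowed to expand, rather than one side absorbing the whole mismatch.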

This hybrid agent-LLM approach allows for flexible adaptation to diverse input and design requirements, supporting both human evaluation and model-based judging.

4. Empirical Evaluation and Benchmarking

PosterO demonstrates state-of-the-art performance across benchmarks such as PKU PosterLayout and CGL. Key metrics include lower overlay ($Ove\downarrow$) and higher underlay effectiveness ($Und_l\uparrow$, $Und_s\uparrow$), matching target statistics from real-world designs. Standardized intent and saliency metrics (Int, Sal) are defined by:

$Int = \Sigma \left( \frac{|Cov_{\hat{L}} - Cov_L|}{1 - Cov_L}, \frac{|Con_{\hat{L}} - Con_L|}{Con_L} \right)$

$Sal = \Sigma \left( \frac{|Uti_{\hat{L}} - Uti_L|}{1 - Uti_L}, \frac{|Occ_{\hat{L}} - Occ_L|}{Occ_L} \right)$

where $L$ and $\hat{L}$ denote the ground-truth and generated layouts, respectively, and $Cov$, $Con$, $Uti$, and $Occ$ measure coverage, conflict, non-salient space utilization, and salient space occlusion.
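
The two formulas translate directly into code once $\Sigma$ is read as the sum of the two listed terms, which is the interpretation assumed here. The function names and all input values below are made up for illustration and are not reported results. Both scores are zero when the generated statistics equal the ground-truth ones.

```python
def intent_score(cov_pred: float, cov_ref: float,
                 con_pred: float, con_ref: float) -> float:
    """Int: coverage gap over (1 - Cov_L) plus conflict gap over Con_L."""
    return (abs(cov_pred - cov_ref) / (1 - cov_ref)
            + abs(con_pred - con_ref) / con_ref)


def saliency_score(uti_pred: float, uti_ref: float,
                   occ_pred: float, occ_ref: float) -> float:
    """Sal: utilization gap over (1 - Uti_L) plus occlusion gap over Occ_L."""
    return (abs(uti_pred - uti_ref) / (1 - uti_ref)
            + abs(occ_pred - occ_ref) / occ_ref)


# Illustrative values only: 0.1 / 0.5 + 0.05 / 0.2 and 0.06 / 0.6 + 0.01 / 0.1.
int_value = intent_score(cov_pred=0.4, cov_ref=0.5, con_pred=0.25, con_ref=0.2)
sal_value = saliency_score(uti_pred=0.34, uti_ref=0.4, occ_pred=0.11, occ_ref=0.1)
print(int_value, sal_value)
```

Note that the denominators require $Cov_L \neq 1$, $Uti_L \neq 1$, and nonzero $Con_L$ and $Occ_L$; a production version would need to guard those cases.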

PosterForest surpasses baselines such as P2P and Paper2Poster on content preservation (retaining figures/tables and avoiding excessive truncation), balanced panel sizing, and user preference ratings (including human and MLLM-as-Judge metrics). Ablation studies confirm the necessity of the hierarchical Poster Tree for maintaining order, readability, and spatial logic.

5. Generalized Poster Design and the Role of Datasets

To examine generalization, the PosterO framework introduces the PStylish7 dataset:

Seven poster purposes (artwork exhibition, cultural education, public safety, entertainment marketing, merchandising display, public advocacy, social-media interaction)
Eight element types, including four complex text variants (vertical, rotated, elliptical, and curved text)
152 few-shot learning samples and 100 test images, ensuring evaluations under sparse, diverse, and real-world conditions

The dataset requires layout generation models to address nontrivial shape variance and intent coverage/conflict measures. This provides a challenging environment for assessing the robustness and scalability of Poster Tree-based frameworks.

6. Impact and Future Research Directions

The Poster Tree's integration of content semantics, spatial arrangement, and intent vectorization enables scalable and generalized layout generation for posters across artistic, informational, and scientific domains. The use of SVG representations, latent design intent embeddings, and multi-agent/LLM collaboration offers a flexible blueprint for future models.

Further progress may plausibly involve:

More expressive and context-aware node attributes, enabling deeper design intent alignment.
Extension of the agent-LLM paradigm to multi-modal layouts and richer collaborations.
Expansion of datasets (such as PStylish7) for broader assessment of real-world variability.

The Poster Tree thus combines hierarchical data modeling, geometric abstraction, and collaborative agents, making it a central representation for content-aware poster layout synthesis in research and design environments.