Paper2Poster: Automated Poster Generation
- The paper introduces a systematic approach to transform scientific papers into coherent posters by integrating probabilistic graphical models, neural layout generators, and multi-agent pipelines.
- Paper2Poster is a comprehensive framework that converts textual and visual elements of academic papers into visually engaging posters using hierarchical encoding and retrieval-augmented strategies.
- It streamlines the design process by reducing manual poster creation bottlenecks while ensuring logical structure, content fidelity, and aesthetic consistency.
The Paper2Poster system is a family of computational methodologies and frameworks that automate the transformation of scientific papers into visually coherent, reader-engaging academic posters. Designed to optimize the summarization, integration, and layout of textual and visual content from long multimodal documents, Paper2Poster models leverage probabilistic graphical modeling, hierarchical multi-agent collaboration, neural layout generation, and retrieval-augmented design pipelines. These systems fundamentally address the bottleneck of manual poster creation while preserving scientific fidelity, logical structure, and the semantic interplay between elements (Qiang et al., 2016, Qiang et al., 2017, Sun et al., 21 May 2025, Inadumi et al., 27 Nov 2025, Choi et al., 29 Aug 2025).
1. Foundations and Historical Development
Automated poster generation was first formalized as a cognitive challenge in 2016–17 through data-driven frameworks using probabilistic graphical models (PGMs). In "Learning to Generate Posters of Scientific Papers" (Qiang et al., 2016) and its successor (Qiang et al., 2017), the process was decomposed into panel-level and within-panel models. The panel-level model infers panel layouts (size and aspect ratio) from the importance and type of paper content, while the within-panel model positions figures and tables via likelihood-weighted sampling subject to design constraints. This stage laid the groundwork for research pipelines that map papers to readable, informative, aesthetically valid poster layouts using learned statistical priors. The NJU-Fudan Poster-Paper dataset provided an annotated corpus for parameter training and standardized benchmarking.
2. Agentic and Neural Architectures
Recent systems employ multi-agent pipelines that combine deep learning, LLMs, and high-resolution vision-language processing. P2P (Sun et al., 21 May 2025) introduced an agentic approach with dedicated agents for figure extraction (DocLayout-YOLO), text summarization (LLMs), and orchestration into HTML/CSS or editable PPTX formats. PosterGen (Zhang et al., 24 Aug 2025) mirrored professional designer workflows with a cascaded agent pipeline of parsing, curation, layout, styling, and rendering, in which each module is tightly scoped and passes its output to the next as structured JSON.
PosterForest (Choi et al., 29 Aug 2025) advanced this paradigm with a training-free hierarchical intermediate representation, the Poster Tree, which encodes the nested structure and semantic linkage of document content and visual references and enables iterative refinement by content and layout specialist agents. This structure supports joint optimization of information density, logical consistency, and balanced spatial composition. Unlike in prior systems, the hierarchy is encoded explicitly, which keeps figures and tables with their explanatory text and preserves deep document structure.
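As a rough illustration of the idea, a hierarchical intermediate of this kind can be modeled as a small tree in which a figure is stored next to the text that explains it. The sketch below is not PosterForest's actual data structure; the `PosterNode` class, its fields, and the `word_count` and `figures` helpers are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PosterNode:
    """Hypothetical tree node: the poster root, a section, a text block, or a figure."""
    kind: str                      # "poster", "section", "text", or "figure"
    title: str = ""
    text: str = ""                 # body text, or the caption for a figure
    children: List["PosterNode"] = field(default_factory=list)

    def add(self, child: "PosterNode") -> "PosterNode":
        """Attach a child and return it, so nested sections can be built inline."""
        self.children.append(child)
        return child

    def word_count(self) -> int:
        """Words in this node's text plus all descendants (titles are not counted)."""
        return len(self.text.split()) + sum(c.word_count() for c in self.children)

    def figures(self) -> List["PosterNode"]:
        """All figure nodes in this subtree, in depth-first order."""
        found = [self] if self.kind == "figure" else []
        for child in self.children:
            found.extend(child.figures())
        return found


# Toy document: two sections, one of which owns a figure and its explanation.
root = PosterNode("poster", title="Example paper")
method = root.add(PosterNode("section", title="Method"))
method.add(PosterNode("text", text="We encode the document as a tree."))
method.add(PosterNode("figure", title="Figure 1", text="Pipeline overview."))
results = root.add(PosterNode("section", title="Results"))
results.add(PosterNode("text", text="Accuracy improves on both benchmarks."))

for section in root.children:
    print(section.title, section.word_count(), [f.title for f in section.figures()])
```

Because each figure is a sibling of its explanatory text, a layout step can place the two in the same panel, and per-section word counts give a simple signal for balancing information density.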
3. Layout Modeling and Generation Techniques
Graphical model-based frameworks (Qiang et al., 2017, Qiang et al., 2016) infer panel sizes and aspect ratios using conditional Gaussians conditioned on normalized text and visual ratios, and solve for layouts via binary partition tree algorithms that minimize shape and symmetry penalties. Neural systems such as Text2Poster (Jin et al., 2023) and Scan-and-Print (Hsu et al., 27 May 2025) automate layout via cascaded auto-encoders (for region detection/distribution) and patch-based vertex-level augmentation, respectively, to improve placement fidelity and computational efficiency. Scan-and-Print uses a Vertex-Based Layout Representation (VLR), converting boxes to vertices for mixup-based augmentation; it is reported to reduce encoder FLOPs by >95% and to achieve state-of-the-art metrics on the PKU and CGL benchmarks.
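The panel-attribute step of the graphical-model approach can be illustrated with a linear-Gaussian sketch: the mean of each panel's size and aspect ratio is a linear function of its text and graphic ratios, and the most likely value under a Gaussian is that mean. The weights below are placeholder values invented for this example, not parameters learned by Qiang et al.; the variance term and the binary partition tree layout search are omitted, and the `Panel` class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Panel:
    text_ratio: float     # share of the paper's text assigned to this panel
    graphic_ratio: float  # share of the paper's figure/table area assigned to it


# Placeholder (w_text, w_graphic, bias) triples; a real system would learn these.
SIZE_WEIGHTS = (0.6, 0.4, 0.0)
ASPECT_WEIGHTS = (-0.5, 1.0, 1.0)


def conditional_mean(panel: Panel, weights: Tuple[float, float, float]) -> float:
    """Mean of a linear-Gaussian conditional: w_text * t + w_graphic * g + bias."""
    w_text, w_graphic, bias = weights
    return w_text * panel.text_ratio + w_graphic * panel.graphic_ratio + bias


def infer_panels(panels: List[Panel]) -> List[Tuple[float, float]]:
    """Return (relative size, aspect ratio) per panel, with sizes summing to 1."""
    raw_sizes = [max(conditional_mean(p, SIZE_WEIGHTS), 1e-6) for p in panels]
    total = sum(raw_sizes)
    return [
        (size / total, conditional_mean(p, ASPECT_WEIGHTS))
        for size, p in zip(raw_sizes, panels)
    ]


panels = [Panel(0.5, 0.2), Panel(0.3, 0.6), Panel(0.2, 0.2)]
inferred = infer_panels(panels)
for size, aspect in inferred:
    print(f"size={size:.2f} aspect={aspect:.2f}")
```

With these placeholder weights, the figure-heavy second panel receives the largest share of the poster and the widest aspect ratio, which is the qualitative behavior the conditional model is meant to capture.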
Retrieval-augmented pipelines such as SciPostGen (Inadumi et al., 27 Nov 2025) use large paired datasets of paper-poster layouts. Document and poster encoders (DiT-base) map inputs to dense embeddings, and structurally aligned exemplars are retrieved by embedding similarity. Conditioned on these exemplars, LLM-based generators synthesize layouts in an HTML representation, optionally respecting user-specified constraints (partial element anchoring).
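The retrieval step can be sketched as a nearest-neighbor search over embeddings. In the example below, the three-dimensional vectors are toy stand-ins for encoder outputs, cosine similarity is assumed as the similarity measure, and `cosine` and `retrieve` are hypothetical helper names rather than part of any published system.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(
    query: Sequence[float], index: Dict[str, Sequence[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Return the k exemplars whose embeddings are most similar to the query."""
    scored = [(name, cosine(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy poster-layout index keyed by exemplar name.
index = {
    "poster_a": [1.0, 0.0, 0.0],
    "poster_b": [0.9, 0.1, 0.0],
    "poster_c": [0.0, 1.0, 0.0],
}
paper_embedding = [1.0, 0.0, 0.0]
top = retrieve(paper_embedding, index, k=2)
print([name for name, _ in top])
```

The retrieved exemplar layouts would then be placed in the generator's prompt as structural references; that prompting step is not shown here.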
4. Content Selection, Summarization, and Paraphrasing
Automated content selection is performed through methods such as deep submodular optimization (PostDoc (Jaisankar et al., 30 May 2024)), which maximizes coverage and diversity in multimodal embedding space while ensuring alignment between textual and visual modalities. Maximizing this objective under a length budget produces extractive summaries of bounded length, which are then paraphrased by LLMs into topic-titled, bullet-pointed sections. Figure and table selection is performed by matching extraction scores, with paraphrased captions generated automatically or retained from ground truth.
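Budgeted submodular selection of this kind is usually approximated greedily. The sketch below uses a classical facility-location coverage function as a stand-in; it is not the deep submodular function used by PostDoc, and the similarity matrix, sentence lengths, and function names are invented for illustration.

```python
from typing import List


def coverage(selected: List[int], sim: List[List[float]]) -> float:
    """Facility-location score: how well each sentence is covered by the selection."""
    if not selected:
        return 0.0
    return sum(max(row[k] for k in selected) for row in sim)


def greedy_select(sim: List[List[float]], lengths: List[int], budget: int) -> List[int]:
    """Repeatedly add the affordable sentence with the largest marginal gain."""
    selected: List[int] = []
    used = 0
    remaining = set(range(len(lengths)))
    while remaining:
        base = coverage(selected, sim)
        best, best_gain = None, 0.0
        for j in sorted(remaining):
            if used + lengths[j] > budget:
                continue  # sentence does not fit in the remaining word budget
            gain = coverage(selected + [j], sim) - base
            if gain > best_gain:
                best, best_gain = j, gain
        if best is None:
            break  # nothing affordable improves the objective
        selected.append(best)
        used += lengths[best]
        remaining.remove(best)
    return selected


# Toy pairwise similarities: sentences 0 and 1 overlap, as do sentences 2 and 3.
sim = [
    [1.0, 0.9, 0.1, 0.3],
    [0.9, 1.0, 0.1, 0.3],
    [0.1, 0.1, 1.0, 0.8],
    [0.3, 0.3, 0.8, 1.0],
]
lengths = [10, 12, 10, 10]   # words per sentence
summary = greedy_select(sim, lengths, budget=20)
print(summary)
```

The greedy pass picks one sentence from each cluster rather than two near-duplicates, which is the coverage-plus-diversity behavior a submodular objective is chosen for.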
5. Styling, Aesthetics, and Rendering
Poster design quality involves not only layout coherence but also typographic and color harmony. PosterCraft (Chen et al., 12 Jun 2025) employs a latent-diffusion backbone with a cascaded workflow: text-rendering pretraining (Text-Render-2M), region-aware supervised fine-tuning, aesthetic-text reinforcement learning (preference optimization), and vision-language feedback refinement. Losses are weighted for major/minor text regions, and optional layout tokens can bias the placement of figures/tables.
TextPainter (Gao et al., 2023) contributes visual-harmony and text-comprehension modules: a ResNet-34 style encoder and CLIP-based text encoder, fusing sentence-level and word-level semantic features into a style-GAN generator that harmonizes the rendered text with local and global backgrounds. Matching-based stylization (Text2Poster) maps text features to nearest style exemplars drawn from a clustered style library.
Rendering outputs span LaTeX Beamerposter, HTML/CSS, editable PPTX, and high-resolution PNG, with automated conversion between formats handled by tools such as LibreOffice in headless mode.
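A headless LibreOffice conversion can be driven from a script through the `soffice --headless --convert-to` command-line interface. The sketch below assumes `soffice` is installed and on the `PATH`; the helper names `build_convert_command` and `convert` are hypothetical, and the file names are placeholders.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_convert_command(
    source: Path, out_dir: Path, target_format: str = "pdf", binary: str = "soffice"
) -> List[str]:
    """Build the argument list for a headless LibreOffice conversion."""
    return [
        binary,
        "--headless",
        "--convert-to", target_format,
        "--outdir", str(out_dir),
        str(source),
    ]


def convert(source: Path, out_dir: Path, target_format: str = "pdf") -> Path:
    """Run the conversion and return the expected output path."""
    if shutil.which("soffice") is None:
        raise RuntimeError("LibreOffice (soffice) was not found on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_convert_command(source, out_dir, target_format), check=True)
    # LibreOffice writes <source stem>.<format> into the output directory.
    return out_dir / f"{source.stem}.{target_format}"


# Show the command that would be run for a placeholder poster file.
command = build_convert_command(Path("poster.pptx"), Path("out"))
print(" ".join(command))
```

Keeping command construction separate from execution makes the conversion step easy to test on machines where LibreOffice is not installed.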
6. Evaluation Protocols and Benchmarks
Evaluation is multidimensional:
- Quantitative: layout fidelity via mean Intersection-over-Union (mIoU), layout transportation similarity (LTSim), space utilization, occlusion and overlay minimization, text/visual region occupancy histograms.
- Qualitative: content fidelity, narrative, synergy, design coherence, and font legibility, scored by a VLM-as-Judge (GPT-4o, Claude).
- User studies: expert ratings vs. novice and original posters, measuring readability, informativeness, aesthetics, and overall engagement.
- Large instruction and benchmark sets: P2Pinstruct (30k+ examples) and SciPostGen (18k+ paired annotations), which support cross-domain and reference-free evaluation.
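Of the quantitative metrics listed above, mean IoU is the simplest to state in code. The sketch below assumes axis-aligned boxes given as `(x0, y0, x1, y1)` and assumes predicted and reference elements are already matched by index; individual benchmarks may match elements differently, so this is an illustration rather than any paper's exact protocol.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) with x0 < x1, y0 < y1


def iou(a: Box, b: Box) -> float:
    """Intersection-over-Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_iou(predicted: List[Box], reference: List[Box]) -> float:
    """Average IoU over element pairs matched by position in the two lists."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not predicted:
        return 0.0
    return sum(iou(p, r) for p, r in zip(predicted, reference)) / len(predicted)


predicted = [(0.0, 0.0, 2.0, 2.0), (2.0, 0.0, 4.0, 2.0)]
reference = [(0.0, 0.0, 2.0, 2.0), (3.0, 0.0, 5.0, 2.0)]
score = mean_iou(predicted, reference)
print(round(score, 3))
```

In this toy case the first panel matches exactly (IoU 1) and the second overlaps by half its width (IoU 1/3), giving a mean of 2/3.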
Recent systems report competitive or superior performance relative to human-created posters in content retention, layout consistency, and visual appeal. Ablations attribute the main gains to agentic reflection loops, hierarchical modeling, and domain-specific tuning.
7. Limitations and Future Directions
Current systems face challenges in cross-domain generalization (e.g., biology, physics), domain-aware figure description, advanced aesthetic modeling (fonts, color themes), and interactive design integration (tooltips, hyperlinks). Some pipelines require manual input for figure selection or refinement, and few support true dynamic or interactive poster outputs. Future work includes learned layout planning agents, poster-level style controllers, enhanced multimodal VLMs for scientific graphics, and semi-automatic workflows for author-guided draft editing. Continued standardization of benchmarks and expansion of paired datasets are essential for progress in automated poster generation.
Paper2Poster encapsulates the evolution from probabilistic modeling through multi-agent, neural, and retrieval-augmented pipelines to flexible, high-fidelity systems that advance the state of the art in automated scientific communication (Qiang et al., 2016, Qiang et al., 2017, Jin et al., 2023, Jaisankar et al., 30 May 2024, Sun et al., 21 May 2025, Choi et al., 29 Aug 2025, Hsu et al., 27 May 2025, Inadumi et al., 27 Nov 2025, Chen et al., 12 Jun 2025, Zhang et al., 24 Aug 2025, Gao et al., 2023).