
PaperBanana: Automating Scientific Illustration

This lightning talk presents PaperBanana, a multi-agent framework that automates the creation of publication-ready academic figures for AI scientists. The presentation covers the core problem of labor-intensive figure creation, introduces the novel reference-driven approach with five specialized agents, and demonstrates significant improvements in faithfulness, readability, and aesthetics on the new PaperBananaBench dataset.

Script

Imagine spending hours crafting the perfect methodology diagram, only to realize it needs a complete redesign when your approach evolves. Even with autonomous AI scientist systems generating research, creating publication-ready figures remains frustratingly manual and time-intensive.

The researchers observe that scientific figures must balance competing demands: they have to communicate complex technical content while meeting rigorous academic publication standards.

PaperBanana tackles this with a fundamentally different strategy.

The core insight is separating the problem into distinct roles that mirror how human researchers actually create figures. Rather than trying to solve everything at once, each agent has a focused responsibility.

This comparison reveals why existing approaches struggle with academic figures. The authors designed PaperBanana to address each limitation systematically through specialized components working together.

Let me walk you through the five-agent workflow that makes this possible.

The retriever agent makes a clever choice by focusing on structural similarity rather than topical match. A pipeline diagram from computer vision research can teach structure to a natural language processing methodology.
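To make the idea of structure-first retrieval concrete, here is a minimal sketch. It is not PaperBanana's actual retriever: the `Reference` fields, the `structure` labels, and the ranking rule are all assumptions chosen for illustration. The point is only that a structural match outranks a topical one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    """A reference figure in a hypothetical pool (field names are assumptions)."""
    title: str
    domain: str      # research topic, e.g. "computer vision"
    structure: str   # layout label, e.g. "pipeline" or "taxonomy"


def retrieve_by_structure(query_structure, query_domain, pool, k=2):
    """Rank references so that a structure match dominates a topic match."""
    def key(ref):
        # Tuples compare left to right: structure first, topic only as tie-breaker.
        return (ref.structure == query_structure, ref.domain == query_domain)

    return sorted(pool, key=key, reverse=True)[:k]


pool = [
    Reference("A", "nlp", "taxonomy"),
    Reference("B", "computer vision", "pipeline"),
]
# An NLP methodology that needs a pipeline layout still gets the vision figure.
best = retrieve_by_structure("pipeline", "nlp", pool, k=1)[0]
print(best.title)
```

In this toy pool, the computer vision pipeline is preferred over the topically closer NLP taxonomy, which mirrors the example in the talk.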

The next agents in the workflow tackle the crucial separation between content and presentation. The stylist automatically learns what makes figures look professional by analyzing visual patterns across the reference collection.

The critic agent closes the loop by checking whether the generated figure actually communicates what was intended. For statistical plots, the researchers discovered that generating code rather than images directly prevents numerical hallucinations.
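The critic's role can be sketched as a generate, critique, refine loop. This is an illustration under stated assumptions, not the paper's implementation: `generate` and `critique` are hypothetical callables standing in for model calls, and the `max_rounds` cap is an arbitrary choice.

```python
def refine_with_critic(description, generate, critique, max_rounds=3):
    """Regenerate a figure until the critic accepts it or the round budget runs out.

    generate(description) -> figure            (hypothetical generator call)
    critique(figure, description) -> (ok, feedback)   (hypothetical critic call)
    """
    figure = generate(description)
    for _ in range(max_rounds):
        ok, feedback = critique(figure, description)
        if ok:
            break
        # Fold the critic's feedback into the description and try again.
        description = f"{description}\nRevision note: {feedback}"
        figure = generate(description)
    return figure
```

The loop always returns the latest figure, so a critic that is never satisfied costs at most `max_rounds` extra generations.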

To properly evaluate this approach, the researchers created the first dedicated benchmark for academic diagram generation.

Creating this benchmark required careful curation from 2,000 papers, with human verification to ensure quality. The authors organized the data into four categories: agent reasoning, vision perception, generative learning, and science applications.

The evaluation framework carefully balances technical accuracy with visual quality. Faithfulness and readability receive primary weight, while conciseness and aesthetics provide secondary scoring.
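One way to read this primary and secondary split is as a two-tier comparison between two candidate figures. The sketch below is an assumption about how such a rule could work, not the benchmark's published scoring procedure: the per-dimension scores, the win-counting rule, and the `compare` function are all hypothetical.

```python
PRIMARY = ("faithfulness", "readability")
SECONDARY = ("conciseness", "aesthetics")


def compare(a, b):
    """Return 'a', 'b', or 'tie' for two dicts of per-dimension scores.

    Primary dimensions decide first; secondary dimensions only break ties.
    """
    for group in (PRIMARY, SECONDARY):
        wins_a = sum(a[d] > b[d] for d in group)
        wins_b = sum(b[d] > a[d] for d in group)
        if wins_a != wins_b:
            return "a" if wins_a > wins_b else "b"
    return "tie"


faithful = {"faithfulness": 5, "readability": 4, "conciseness": 2, "aesthetics": 2}
pretty = {"faithfulness": 4, "readability": 4, "conciseness": 5, "aesthetics": 5}
print(compare(faithful, pretty))
```

Under this rule, a figure that is more faithful wins even when its rival is cleaner and better looking, which is the priority ordering the talk describes.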

The results demonstrate substantial improvements across all evaluation dimensions.

These improvements span from technical accuracy to visual polish, with particularly strong gains in reducing visual clutter. The human preference results validate that automated judges align well with human perception of figure quality.

The ablation studies reveal surprising insights about what drives performance. The finding that random retrieval works as well as semantic retrieval suggests that exposure to diverse diagram structures matters more than topical similarity.

For statistical plots, the researchers discovered an important trade-off between visual appeal and numerical accuracy. Code generation provides better faithfulness while image generation excels at aesthetic presentation.
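The faithfulness advantage of code generation comes from the numbers being copied into a script rather than drawn pixel by pixel. The sketch below illustrates that idea only: `plot_script_for` is a hypothetical helper, the data values are made up for the example, and the emitted script assumes matplotlib is available wherever it is eventually run.

```python
def plot_script_for(labels, values, title):
    """Return a matplotlib script as text, with the data embedded verbatim."""
    return "\n".join([
        "import matplotlib.pyplot as plt",
        f"labels = {labels!r}",
        f"values = {values!r}",
        "fig, ax = plt.subplots()",
        "ax.bar(labels, values)",
        f"ax.set_title({title!r})",
        "fig.savefig('plot.png', dpi=300)",
    ])


# Illustrative values, not results from the paper.
script = plot_script_for(["baseline", "ours"], [0.42, 0.57], "Example scores")
print(script)
```

Because the values appear literally in the script, any mismatch with the source data can be checked by reading the code, which is not possible with a directly generated image.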

The authors are transparent about current limitations that point toward future research directions.

Producing editable vector output remains a significant challenge, and current evaluation methods still miss subtle technical inaccuracies.

The framework opens several promising research directions, from technical improvements in output format to broader applications in other domains requiring visual communication standards. The paradigm of separating retrieval for structure from style summarization could prove valuable beyond academic publishing.

PaperBanana represents a significant step toward fully automated scientific publishing workflows by tackling the challenging problem of publication-ready figure generation. For more details about this research and other cutting-edge AI developments, visit EmergentMind.com to explore the full paper and stay updated on the latest breakthroughs.