
GraphicBench: A Planning Benchmark for Graphic Design with Language Agents (2504.11571v1)

Published 15 Apr 2025 in cs.AI and cs.CL

Abstract: LLM-powered agents have unlocked new possibilities for automating human tasks. While prior work has focused on well-defined tasks with specified goals, the capabilities of agents in creative design tasks with open-ended goals remain underexplored. We introduce GraphicBench, a new planning benchmark for graphic design that covers 1,079 user queries and input images across four design types. We further present GraphicTown, an LLM agent framework with three design experts and 46 actions (tools) to choose from for executing each step of the planned workflows in web environments. Experiments with six LLMs demonstrate their ability to generate workflows that integrate both explicit design constraints from user queries and implicit commonsense constraints. However, these workflows often do not lead to successful execution outcomes, primarily due to challenges in: (1) reasoning about spatial relationships, (2) coordinating global dependencies across experts, and (3) retrieving the most appropriate action per step. We envision GraphicBench as a challenging yet valuable testbed for advancing LLM-agent planning and execution in creative design tasks.

Overview of GraphicBench and the GraphicTown Framework

The paper addresses an underexplored question: how well agents powered by LLMs perform complex, creative tasks with open-ended goals, taking graphic design as the setting. It introduces a benchmark called GraphicBench and a framework named GraphicTown, devised to assess the planning capabilities of LLM agents across various graphic design tasks.

Core Contributions

The significant contribution of this work lies in its development of GraphicBench, a substantial testbed featuring 1,079 user queries and input images spanning four distinct graphic design categories: book covers, business cards, postcards, and posters. The dataset is meticulously curated to ensure diverse representation of design types and user queries, presenting a comprehensive benchmark for evaluating the planning prowess of LLM agents in this creative domain.

Furthermore, the paper presents GraphicTown, an LLM agent framework structured to execute graphic design planning and generation tasks in web environments. The framework comprises three specialized design expert agents that can choose from a total of 46 defined actions (tools) to execute workflows. The proposed multi-step process consists of generating a design outline, recruiting experts, and generating an integrated workflow, followed by retrieving and executing an appropriate action for each step.
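The stages described above (outline, expert recruitment, integrated workflow, per-step action retrieval) can be sketched as a small pipeline. This is a minimal illustration, not GraphicTown's actual code: the names `Step`, `Workflow`, and `plan_design`, the expert labels `expert_a` and `expert_b`, and the stub functions are all hypothetical, and each stage that would call an LLM in practice is replaced here by a plain Python callable.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    expert: str                   # hypothetical label of the expert owning this step
    description: str              # natural-language description of the step
    action: Optional[str] = None  # filled in later by action retrieval


@dataclass
class Workflow:
    query: str
    steps: List[Step] = field(default_factory=list)


def plan_design(
    query: str,
    outline_fn: Callable[[str], List[str]],
    recruit_fn: Callable[[List[str]], Dict[str, str]],
    workflow_fn: Callable[[List[str], Dict[str, str]], List[Step]],
    retrieve_fn: Callable[[Step], str],
) -> Workflow:
    """Run the four stages in order and return the planned workflow."""
    outline = outline_fn(query)            # 1. design outline
    experts = recruit_fn(outline)          # 2. expert recruitment
    steps = workflow_fn(outline, experts)  # 3. integrated workflow
    for step in steps:                     # 4. action retrieval per step
        step.action = retrieve_fn(step)
    return Workflow(query=query, steps=steps)


# Stub stages standing in for LLM calls (assumptions for illustration only).
def demo_outline(query: str) -> List[str]:
    return ["background", "title text"]


def demo_recruit(outline: List[str]) -> Dict[str, str]:
    return {"background": "expert_a", "title text": "expert_b"}


def demo_workflow(outline: List[str], experts: Dict[str, str]) -> List[Step]:
    return [Step(expert=experts[item], description=f"create {item}") for item in outline]


def demo_retrieve(step: Step) -> str:
    return "placeholder_action"


workflow = plan_design(
    "Design a poster", demo_outline, demo_recruit, demo_workflow, demo_retrieve
)
for step in workflow.steps:
    print(step.expert, "|", step.description, "|", step.action)
```

Passing each stage in as a callable keeps the control flow visible and makes it easy to swap a stub for a real model call when experimenting.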

Experimental Evaluation

The experimental setup involves evaluating six LLMs on their ability to plan and execute design workflows. The findings suggest that while these models can effectively incorporate both explicit design constraints and implicit commonsense constraints in their planning, they frequently fall short of achieving successful execution outcomes. Common failure modes identified include difficulties in precise spatial reasoning, managing dependencies across experts, and retrieving suitable actions.
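To make the action-retrieval failure mode concrete, the sketch below picks one action per step by word overlap between the step description and each action's description. It is a deliberately simple stand-in, not the retrieval method used in the paper: the catalog entries, the function names, and the Jaccard scoring are all assumptions chosen for illustration.

```python
from typing import Dict, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split on whitespace into a set of words."""
    return set(text.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve_action(step_description: str, catalog: Dict[str, str]) -> str:
    """Return the name of the catalog action whose description best matches the step."""
    step_tokens = tokenize(step_description)
    return max(catalog, key=lambda name: jaccard(step_tokens, tokenize(catalog[name])))


# Hypothetical three-entry catalog; the real framework defines 46 actions.
CATALOG = {
    "add_text": "add a text box with given content",
    "resize_image": "resize an image to given width and height",
    "set_background_color": "set the background color of the canvas",
}

print(retrieve_action("resize the input image", CATALOG))
print(retrieve_action("change the background color", CATALOG))
```

A lexical matcher like this fails as soon as a step is phrased differently from the action description, which hints at why selecting the most appropriate of many actions at every step is a hard part of execution.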

The reported numerical results indicate that agents are efficient in executing steps and using experts, yet the resulting designs are often inadequate, which signals a need for stronger multi-step reasoning in LLM agents.

Theoretical and Practical Implications

Theoretically, the findings underscore the difficulty of translating user queries with high-level or vague requirements into detailed workflows that execute correctly. The challenge intensifies when workflows must also integrate commonsense reasoning and multi-agent coordination, which points to critical areas for advancement in LLM-agent capabilities.

From a practical standpoint, GraphicBench and the GraphicTown framework provide pivotal tools for developing and benchmarking future LLMs in creative domains. They offer a structured environment to advance research in automating graphic design processes while highlighting current model limitations and guiding future architectural enhancements.

Future Developments

There is considerable scope for future research to refine the spatial reasoning and global dependency management capabilities of LLM agents. Moreover, incorporating interactive elements within the framework could allow agents to seek clarification, refine user requirements dynamically, and potentially improve execution efficacy.

In summary, the paper lays the groundwork for advancing the application of LLMs in creative task automation, with GraphicBench serving as a valuable benchmark facilitating the evaluation and enhancement of LLMs in planning complex workflows in graphic design.

Authors (6)
  1. Dayeon Ki (10 papers)
  2. Tianyi Zhou (172 papers)
  3. Marine Carpuat (56 papers)
  4. Gang Wu (143 papers)
  5. Puneet Mathur (22 papers)
  6. Viswanathan Swaminathan (15 papers)