
REALM-Bench: A Real-World Planning Benchmark for LLMs and Multi-Agent Systems (2502.18836v1)

Published 26 Feb 2025 in cs.AI

Abstract: This benchmark suite provides a comprehensive evaluation framework for assessing both individual LLMs and multi-agent systems in real-world planning scenarios. The suite encompasses eleven designed problems that progress from basic to highly complex, incorporating key aspects such as multi-agent coordination, inter-agent dependencies, and dynamic environmental disruptions. Each problem can be scaled along three dimensions: the number of parallel planning threads, the complexity of inter-dependencies, and the frequency of unexpected disruptions requiring real-time adaptation. The benchmark includes detailed specifications, evaluation metrics, and baseline implementations using contemporary frameworks like LangGraph, enabling rigorous testing of both single-agent and multi-agent planning capabilities. Through standardized evaluation criteria and scalable complexity, this benchmark aims to drive progress in developing more robust and adaptable AI planning systems for real-world applications.

Summary

  • The paper presents REALM-Bench, a benchmark suite for evaluating both individual LLMs and multi-agent systems on real-world planning problems.
  • It details a sequential multi-agent pipeline using LangGraph to manage tasks, resource allocation, and real-time disruption handling in wedding logistics.
  • The implementation demonstrates modularity and scalability, offering actionable insights for dynamic scheduling and effective response to unexpected events.

LangGraph Implementation for Wedding Logistics

This appendix describes a LangGraph implementation of the wedding logistics problem, covering scenarios both with and without real-time disruptions. The implementation uses a collaborative multi-agent pipeline to manage location setup, task scheduling, resource management, constraint validation, and disruption handling.

Collaborative Agent Pipeline

The core of the implementation is a collaborative agent pipeline constructed using LangGraph. The pipeline consists of several specialized agents, each responsible for a specific aspect of the wedding logistics problem.

Agent Name | Backstory | Task Description | Task Expected Output
Locations and Time Setup Agent | Defines locations, travel times, and guest arrival schedules. | Sets up locations, travel times, and ensures accurate scheduling of arrivals. | Structured location data and expected arrival times.
Task Setup Agent | Manages the scheduling of required wedding tasks. | Schedules gift collection, clothes pickup, and photo session while adhering to temporal constraints. | Optimized task schedule aligned with constraints.
Disruption Update Agent | Monitors road closures and dynamically reroutes transportation as needed. | Identifies road closures or unexpected disruptions and adjusts travel plans accordingly. | Updated task schedule, ensuring minimal delays and timely arrivals with new updates.
Resource Management Agent | Allocates available transport resources efficiently. | Coordinates vehicle usage for guest transportation and task fulfillment. | Optimized vehicle allocation, ensuring timely arrivals.
Constraint Validation Agent | Verifies all scheduling constraints to ensure smooth execution. | Ensures all tasks are completed within operating hours and vehicle constraints are met. | Validated schedule with no conflicts.
Wedding Event Oversight Agent | Oversees the entire wedding logistics to ensure a smooth execution of tasks. | Monitors and ensures all tasks are completed on time, resolving any logistical issues. | A comprehensive wedding scheduling plan for people, tasks, and time.
Writer Agent | Specializes in writing text into .json files. | Writes the .json response into './p5_output.json'. | A .json file containing the given string.

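The agents are connected in a sequential pipeline, where the output of one agent serves as the input for the next. The pipeline notation abbreviates the agent names from the table: LT_Agent (Locations and Time Setup), TS_Agent (Task Setup), DU_Agent (Disruption Update), RM_Agent (Resource Management), CV_Agent (Constraint Validation), WEO_Agent (Wedding Event Oversight), and Writer_agent (Writer). In the scenario without disruption, the pipeline flows as follows: LT_Agent >> TS_Agent >> RM_Agent >> CV_Agent >> WEO_Agent >> Writer_agent. When disruptions are considered, the DU_Agent is inserted after the TS_Agent to handle rerouting and rescheduling: LT_Agent >> TS_Agent >> DU_Agent >> RM_Agent >> CV_Agent >> WEO_Agent >> Writer_agent.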
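The sequential hand-off described above can be sketched in plain Python. This is only an illustration of the ordering, not the paper's LangGraph code: the stub agents, the `State` dictionary, and the helper names `make_agent`, `build_pipeline`, and `run_pipeline` are all hypothetical, and real agents would call an LLM instead of appending their name to a trace.

```python
from typing import Callable, Dict, List

State = Dict[str, object]
Agent = Callable[[State], State]

# Agent order without disruption, as listed in the text.
BASE_ORDER = ["LT_Agent", "TS_Agent", "RM_Agent", "CV_Agent", "WEO_Agent", "Writer_agent"]


def make_agent(name: str) -> Agent:
    """Return a stub agent (hypothetical) that records its name in the shared state."""
    def run(state: State) -> State:
        trace = list(state.get("trace", []))
        trace.append(name)
        return {**state, "trace": trace}
    return run


def build_pipeline(with_disruption: bool) -> List[str]:
    """Return the agent order, inserting DU_Agent right after TS_Agent when requested."""
    order = list(BASE_ORDER)
    if with_disruption:
        order.insert(order.index("TS_Agent") + 1, "DU_Agent")
    return order


def run_pipeline(order: List[str], state: State) -> State:
    """Run each agent in turn; each agent's output is the next agent's input."""
    for name in order:
        state = make_agent(name)(state)
    return state


result = run_pipeline(build_pipeline(with_disruption=True), {})
print(" >> ".join(result["trace"]))
```

Keeping the order as a plain list makes the two scenarios differ by a single insertion, which mirrors how the text describes adding the DU_Agent to an otherwise unchanged pipeline.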

Meta Plan Implementation

The meta plan is implemented differently based on whether real-time search and disruption handling are included.

Without Real-time Search (Without Disruption)

In this scenario, the agents execute sequentially to generate a comprehensive wedding day schedule. The Locations and Time Setup Agent initializes the process by defining locations, travel times, and guest arrival schedules. The Task Setup Agent then creates an optimized task schedule, considering constraints such as gift collection after 12:00 PM and clothes pickup before 2:00 PM. The Resource Management Agent allocates vehicles and personnel to handle transportation needs. The Constraint Validation Agent ensures that all tasks are completed within the specified operating hours and vehicle constraints. Finally, the Wedding Event Oversight Agent consolidates all the information into a detailed scheduling plan. The Writer Agent then writes the final plan to a JSON file.
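The two temporal constraints mentioned above lend themselves to a small validation check. The sketch below is a hypothetical stand-in for what the Constraint Validation Agent verifies: the `validate` function, the schedule keys, and the sample times are invented for illustration, and it assumes "after 12:00 PM" includes 12:00 itself while "before 2:00 PM" excludes 2:00.

```python
from datetime import time
from typing import Dict, List

# Only these two limits come from the text; everything else is illustrative.
NOON = time(12, 0)
TWO_PM = time(14, 0)


def validate(schedule: Dict[str, time]) -> List[str]:
    """Return a list of violated constraints (empty if the schedule is valid)."""
    violations = []
    # Assumption: a start exactly at 12:00 counts as "after 12:00 PM".
    if schedule["gift_collection"] < NOON:
        violations.append("gift_collection must start after 12:00 PM")
    # Assumption: a pickup exactly at 2:00 PM is too late.
    if schedule["clothes_pickup"] >= TWO_PM:
        violations.append("clothes_pickup must happen before 2:00 PM")
    return violations


# Hypothetical schedules: one valid, one violating both constraints.
print(validate({"gift_collection": time(12, 30), "clothes_pickup": time(13, 0)}))
print(validate({"gift_collection": time(11, 0), "clothes_pickup": time(14, 30)}))
```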

Without Real-time Search (With Disruption)

When disruptions are introduced, the Disruption Update Agent plays a crucial role in dynamically rerouting transportation and adjusting the task schedule. For instance, if a road closure occurs between locations B and G, the Disruption Update Agent reroutes guests Alex and Jamie through location T to reach their destination W. The agent updates the schedule to reflect these changes, ensuring minimal delays and timely arrivals.
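One way to picture the rerouting step is as a shortest-path search on a road graph with the closed road removed. The sketch below uses Dijkstra's algorithm as a swapped-in technique; the source does not say how the Disruption Update Agent computes routes. The road network, the travel times in minutes, and the assumption that the guests start at location B are all invented for illustration; only the location names B, G, T, and W and the B-G closure come from the text.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Graph = Dict[str, Dict[str, int]]

# Hypothetical undirected road network; travel times (minutes) are invented.
ROADS: Graph = {
    "B": {"G": 10, "T": 15},
    "G": {"B": 10, "W": 10},
    "T": {"B": 15, "W": 20},
    "W": {"G": 10, "T": 20},
}


def close_road(graph: Graph, a: str, b: str) -> Graph:
    """Return a copy of the graph with the road between a and b removed."""
    closed = {node: dict(edges) for node, edges in graph.items()}
    closed[a].pop(b, None)
    closed[b].pop(a, None)
    return closed


def shortest_route(graph: Graph, start: str, goal: str) -> Optional[Tuple[int, List[str]]]:
    """Dijkstra's algorithm; returns (total minutes, path) or None if unreachable."""
    queue: List[Tuple[int, List[str]]] = [(0, [start])]
    visited = set()
    while queue:
        cost, path = heapq.heappop(queue)
        node = path[-1]
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, minutes in graph[node].items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + minutes, path + [neighbor]))
    return None


# Assumed starting point B; destination W as in the text.
print(shortest_route(ROADS, "B", "W"))
print(shortest_route(close_road(ROADS, "B", "G"), "B", "W"))
```

With these made-up travel times, the route goes through G while the B-G road is open and through T once it is closed, matching the detour described in the text.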

Conclusion

The LangGraph implementation provides a flexible and collaborative approach to managing complex wedding logistics. By employing a multi-agent pipeline, the system can handle various aspects of wedding planning, from initial setup to dynamic disruption handling. The use of specialized agents allows for modularity and scalability, making it easier to adapt the system to different wedding scenarios and constraints.
