
LLM-Empowered Embodied Agent for Memory-Augmented Task Planning in Household Robotics (2504.21716v1)

Published 30 Apr 2025 in cs.RO, cs.AI, and cs.CL

Abstract: We present an embodied robotic system with an LLM-driven agent-orchestration architecture for autonomous household object management. The system integrates memory-augmented task planning, enabling robots to execute high-level user commands while tracking past actions. It employs three specialized agents: a routing agent, a task planning agent, and a knowledge base agent, each powered by task-specific LLMs. By leveraging in-context learning, our system avoids the need for explicit model training. RAG enables the system to retrieve context from past interactions, enhancing long-term object tracking. A combination of Grounded SAM and LLaMa3.2-Vision provides robust object detection, facilitating semantic scene understanding for task planning. Evaluation across three household scenarios demonstrates high task planning accuracy and an improvement in memory recall due to RAG. Specifically, Qwen2.5 yields best performance for specialized agents, while LLaMA3.1 excels in routing tasks. The source code is available at: https://github.com/marc1198/chat-hsr.

LLM-Empowered Embodied Agent for Memory-Augmented Task Planning in Household Robotics

The paper presents a sophisticated approach to autonomous task planning through embodied agents in household robotics, utilizing LLMs to streamline object management and long-term memory integration. It addresses several critical obstacles in deploying robots in dynamic home environments, such as the unpredictability of user commands and the need for flexible object tracking and task execution.

Architecture Overview

The proposed framework comprises three specialized agents, orchestrated to fulfil user-defined commands effectively:

  1. Routing Agent: Determines the nature of incoming requests and assigns them to the appropriate subsequent agent for processing.
  2. Task Planning Agent: Operates on high-level user directives, formulating detailed, actionable task plans by leveraging in-context learning without explicit training. This agent benefits from robust scene understanding and contextual reasoning capabilities provided by LLMs.
  3. Knowledge Base Agent: Utilizes Retrieval-Augmented Generation (RAG) to address inquiries about past actions, thus ensuring reliable long-term memory recall and adaptive reasoning in changing environments.
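
The three-agent orchestration above can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's implementation: the agent classes, the keyword-based router, and the toy retrieval stand in for the task-specific LLMs (the paper routes with LLaMA3.1 and plans with Qwen2.5).

```python
# Illustrative sketch of the routing / task-planning / knowledge-base split.
# All names and heuristics here are assumptions made so the sketch runs
# without model access; the paper uses task-specific LLMs for each role.

STOP = {"the", "a", "to", "on", "did", "you", "where", "put"}

class TaskPlanningAgent:
    """Turns a high-level command into an ordered list of robot actions."""
    def handle(self, request: str) -> list[str]:
        # A real agent would prompt an LLM with in-context examples;
        # here we emit a fixed pick-and-place plan for illustration.
        return [f"detect objects for: {request}", "pick object", "place object"]

class KnowledgeBaseAgent:
    """Answers questions about past actions (RAG in the paper)."""
    def __init__(self, action_log: list[str]):
        self.action_log = action_log
    def handle(self, request: str) -> list[str]:
        # Naive keyword retrieval; the paper retrieves with RAG instead.
        words = {w.strip("?.,!") for w in request.lower().split()} - STOP
        return [a for a in self.action_log if words & set(a.lower().split())]

class RoutingAgent:
    """Dispatches each request to the appropriate specialized agent."""
    def __init__(self, planner, knowledge):
        self.planner, self.knowledge = planner, knowledge
    def route(self, request: str) -> list[str]:
        # The paper makes this decision with an LLM; a keyword heuristic
        # stands in here so the sketch is self-contained.
        if any(w in request.lower() for w in ("where", "did", "when", "last")):
            return self.knowledge.handle(request)
        return self.planner.handle(request)

log = ["moved the mug to the kitchen shelf", "placed the book on the desk"]
router = RoutingAgent(TaskPlanningAgent(), KnowledgeBaseAgent(log))
print(router.route("clean up the dining table"))
print(router.route("where did you put the mug?"))
```

The point of the split is that each request type gets a prompt and model tuned to it, rather than one monolithic LLM handling planning and recall in the same context.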

The integration of Grounded SAM with a vision-language model (LLaMA3.2-Vision) facilitates semantic scene understanding, providing accurate object detection and scene segmentation that grounds task execution in real-world conditions.

Experimental Validation

The experimentation is structured into three scenarios: "Dining Table Cleanup," "Living Room Cleanup," and "Desk Organization," each representing common household tasks. These scenarios reveal the system's capability for:

  • Flexible Task Planning: Achieving a high degree of accuracy in assigning correct destinations for objects during cleanup tasks.
  • Memory Recall: Through RAG, the knowledge base agent efficiently retrieves information, enhancing the performance of follow-up queries related to past object handling, thus offering a potent solution for memory-based reasoning.
  • Agent Coordination: Effective routing of task-specific queries showcases the modularity and inter-agent communication efficiency within the orchestration system.
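
The memory-recall mechanism can be illustrated with a minimal retrieval sketch. This is an assumption-laden toy: a real RAG pipeline embeds the action log with a dense encoder and feeds the retrieved entries into the LLM's context; here a bag-of-words cosine similarity stands in for the embedding model so the example is self-contained.

```python
# Toy retrieval over a robot action log, illustrating the RAG idea:
# rank stored memories by similarity to the query, return the top-k.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Bag-of-words stand-in for a dense embedding model."""
    return Counter(w.strip("?.,!") for w in text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, memory: list[str], k: int = 2) -> list[str]:
    q = embed(query)
    ranked = sorted(memory, key=lambda m: cosine(q, embed(m)), reverse=True)
    return ranked[:k]

memory = [
    "2025-04-30 10:02 placed the red mug on the kitchen shelf",
    "2025-04-30 10:05 moved the book to the desk drawer",
    "2025-04-30 10:09 threw the apple core into the trash bin",
]
print(retrieve("where is the red mug?", memory, k=1))
```

In the full system, the retrieved log entries would be prepended to the knowledge base agent's prompt, letting it answer follow-up questions about past object handling without retraining.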

The numerical results show that Qwen2.5 performs best on the specialized agent tasks, achieving an overall task planning accuracy of 77.2% under strict scoring and 84.3% under lenient scoring. The LLaMA3.1 model, by contrast, proves more reliable in routing tasks due to its structured execution approach.
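
A strict/lenient split like the one reported above can be scored as in the sketch below. The paper's exact criteria are not reproduced here; this hypothetical version assumes "strict" requires the single intended destination for each object, while "lenient" also accepts any destination from a set of reasonable alternatives.

```python
# Hedged sketch of dual-criterion scoring for object-destination plans.
# The gold annotations and the strict/lenient definitions are assumptions
# for illustration, not the paper's evaluation protocol.

def score(plans: dict[str, str], gold: dict[str, tuple[str, set[str]]]):
    strict = lenient = 0
    for obj, dest in plans.items():
        intended, acceptable = gold[obj]
        if dest == intended:
            strict += 1
        if dest == intended or dest in acceptable:
            lenient += 1
    n = len(plans)
    return strict / n, lenient / n

gold = {
    "mug":   ("kitchen shelf", {"dishwasher"}),
    "book":  ("bookshelf", {"desk"}),
    "apple": ("trash bin", set()),
}
plans = {"mug": "kitchen shelf", "book": "desk", "apple": "counter"}
s, l = score(plans, gold)
print(f"strict={s:.3f}, lenient={l:.3f}")  # → strict=0.333, lenient=0.667
```

By construction the lenient score upper-bounds the strict one, matching the 77.2% vs. 84.3% ordering reported in the paper.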

Implications and Future Directions

This work demonstrates the integration of LLMs into dynamic, real-world environments, effectively utilizing their reasoning and interaction capabilities. Such integration promises enhanced flexibility in robotic task planning and opens avenues for further exploration of long-term memory mechanisms in AI systems.

Future developments should consider:

  • Enhanced multimodal integration of perception systems into agent orchestration to improve robustness and autonomy.
  • Exploration of user-centric evaluations to better adapt task planning algorithms to individual preferences and varied environmental setups.
  • Investigating structured memory models like scene graphs to complement RAG for improved memory recall efficiency in complex sequences.

These efforts could substantially elevate the practical usability of household robotics, further aligning the capabilities of AI with human-centric environments and tasks.

Authors (4)
  1. Marc Glocker (1 paper)
  2. Peter Hönig (8 papers)
  3. Matthias Hirschmanner (4 papers)
  4. Markus Vincze (46 papers)