
AutoMisty: A Multi-Agent LLM Framework for Automated Code Generation in the Misty Social Robot (2503.06791v1)

Published 9 Mar 2025 in cs.RO, cs.AI, cs.HC, and cs.MA

Abstract: The social robot's open API allows users to customize open-domain interactions. However, it remains inaccessible to those without programming experience. In this work, we introduce AutoMisty, the first multi-agent collaboration framework powered by LLMs, to enable the seamless generation of executable Misty robot code from natural language instructions. AutoMisty incorporates four specialized agent modules to manage task decomposition, assignment, problem-solving, and result synthesis. Each agent incorporates a two-layer optimization mechanism, with self-reflection for iterative refinement and human-in-the-loop for better alignment with user preferences. AutoMisty ensures a transparent reasoning process, allowing users to iteratively refine tasks through natural language feedback for precise execution. To evaluate AutoMisty's effectiveness, we designed a benchmark task set spanning four levels of complexity and conducted experiments in a real Misty robot environment. Extensive evaluations demonstrate that AutoMisty not only consistently generates high-quality code but also enables precise code control, significantly outperforming direct reasoning with ChatGPT-4o and ChatGPT-o1. All code, optimized APIs, and experimental videos will be publicly released through the webpage: https://wangxiaoshawn.github.io/AutoMisty.html

Summary

  • The paper introduces a multi-agent framework that decomposes natural language instructions into executable code for the Misty social robot.
  • The paper refines 136 APIs and employs a dual-layer self-reflective feedback loop to enhance code accuracy and task success rates.
  • The paper demonstrates superior performance over baseline models by achieving a 100% task completion rate across 28 tasks of varying complexity.

The paper "AutoMisty: A Multi-Agent LLM Framework for Automated Code Generation in the Misty Social Robot" (2503.06791) introduces a framework designed to enable non-programmers to generate executable code for the Misty social robot using natural language instructions. The core problem addressed is the inaccessibility of robot programming, despite open APIs, for users without technical coding skills.

To address this, AutoMisty uses an LLM-powered multi-agent collaboration framework consisting of four specialized agent modules:

  1. Planner Agent: Analyzes the user's high-level instruction, decomposes it into manageable subtasks, creates a plan with an execution order, and assigns subtasks to the appropriate specialized agents. It also incorporates a human-in-the-loop step for plan validation.
  2. Action Agent: Handles tasks related to the robot's physical movements and actions.
  3. Touch Agent: Manages tasks triggered by or involving the robot's touch sensors.
  4. Audiovisual Agent: Deals with tasks involving audio processing (like speech recognition via Whisper) and visual processing, enabling the robot to "see" and "hear".
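The division of labor among the four agents can be illustrated with a minimal sketch. It is not the paper's implementation: in AutoMisty the Planner is an LLM, whereas here a hypothetical keyword table (`KEYWORDS`) and a naive split on ", then " stand in for decomposition and assignment. The names `Subtask`, `assign_agent`, `plan`, and `validate` are assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword table standing in for the Planner's LLM reasoning.
KEYWORDS = {
    "touch": {"touch", "touched", "pat", "tap"},
    "audiovisual": {"say", "speak", "hear", "listen", "see", "recognize"},
}


@dataclass
class Subtask:
    order: int          # position in the execution order
    description: str    # natural-language subtask
    agent: str          # "action", "touch", or "audiovisual"


def assign_agent(description: str) -> str:
    """Pick a specialized agent by whole-word keyword match; default to action."""
    words = set(re.findall(r"[a-z]+", description.lower()))
    for agent, keywords in KEYWORDS.items():
        if words & keywords:
            return agent
    return "action"


def plan(instruction: str) -> list[Subtask]:
    """Decompose an instruction into ordered subtasks (naive split, an assumption)."""
    steps = [s.strip() for s in instruction.split(", then ") if s.strip()]
    return [Subtask(i, step, assign_agent(step)) for i, step in enumerate(steps, start=1)]


def validate(subtasks: list[Subtask], approve) -> list[Subtask]:
    """Human-in-the-loop check: the plan proceeds only if the user approves it."""
    if not approve(subtasks):
        raise ValueError("plan rejected by user")
    return subtasks
```

Calling `plan("Wait for a touch on the head, then say hello, then turn your head left")` yields three subtasks routed to the touch, audiovisual, and action agents in that order.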

A key aspect of AutoMisty's implementation is the optimization of the Misty APIs. The authors found that the original APIs had issues like unclear descriptions and missing parameters, leading to LLM errors. They refined and restructured 136 APIs, adding comprehensive documentation (input/output formats, parameters, scenarios) to improve LLM comprehension and reduce hallucinations. These optimized APIs are then specifically assigned to the relevant subtask agents.
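One way to picture a refined API entry is as a small record that carries the documentation fields the summary lists (parameters, input/output formats, scenarios) plus the agent it is assigned to. The sketch below is an assumption about shape only: `ApiDoc`, `CATALOG`, `apis_for`, and the two example entries are hypothetical and are not taken from the Misty SDK or from the paper's 136 APIs.

```python
from dataclasses import dataclass, field


@dataclass
class ApiDoc:
    name: str
    agent: str                      # which subtask agent receives this API
    description: str
    params: dict[str, str] = field(default_factory=dict)  # name -> type and range
    returns: str = "None"
    scenario: str = ""              # when the API should be used

    def to_prompt(self) -> str:
        """Render the entry as text suitable for an in-context-learning prompt."""
        args = ", ".join(f"{k}: {v}" for k, v in self.params.items())
        return (
            f"{self.name}({args}) -> {self.returns}\n"
            f"  {self.description}\n"
            f"  Use when: {self.scenario}"
        )


# Hypothetical entries for illustration only.
CATALOG = [
    ApiDoc(
        name="move_head",
        agent="action",
        description="Rotate the head to the given angles.",
        params={"pitch": "float, degrees", "yaw": "float, degrees"},
        scenario="the task asks the robot to nod, look up, or turn its head",
    ),
    ApiDoc(
        name="on_head_touched",
        agent="touch",
        description="Register a callback fired when the head sensor is touched.",
        params={"callback": "callable"},
        scenario="the task starts with a touch on the head",
    ),
]


def apis_for(agent: str, catalog: list[ApiDoc] = CATALOG) -> list[ApiDoc]:
    """Return only the APIs assigned to one specialized agent."""
    return [api for api in catalog if api.agent == agent]
```

Filtering the catalog per agent keeps each prompt short, which is one plausible reason for assigning APIs to specific agents rather than exposing all of them to every agent.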

Each agent within the framework incorporates a two-layer optimization mechanism for robust code generation:

  • Layer 1 (Self-Reflective Feedback): Involves a Critic-Designer interaction where the Designer agent proposes solutions (using In-Context Learning with optimized APIs) and the Critic agent evaluates them for accuracy and feasibility. This forms an iterative loop for refinement until the Critic approves the output.
  • Layer 2 (Human-in-the-Loop): After Layer 1 approval, the generated output is presented to the user via a Drafter module. The user can provide feedback, triggering Layer 1 for further refinement if necessary, until the user is satisfied.
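The control flow of the two layers can be sketched as two nested loops. This is a minimal sketch under stated assumptions: the Designer, Critic, and user are modeled as plain callables (in AutoMisty the first two are LLM agents), and the round limits `max_rounds` and `max_user_rounds` are added here for safety and are not described in the source.

```python
from typing import Callable, Optional

Designer = Callable[[str, list[str]], str]      # (task, feedback so far) -> draft
Critic = Callable[[str, str], Optional[str]]    # (task, draft) -> objection, or None to approve
User = Callable[[str], Optional[str]]           # draft -> feedback, or None if satisfied


def self_reflect(task: str, designer: Designer, critic: Critic,
                 feedback: list[str], max_rounds: int = 5) -> str:
    """Layer 1: iterate Designer and Critic until the Critic approves."""
    for _ in range(max_rounds):
        draft = designer(task, feedback)
        objection = critic(task, draft)
        if objection is None:
            return draft
        feedback.append(objection)
    raise RuntimeError("Critic did not approve within max_rounds")


def two_layer(task: str, designer: Designer, critic: Critic, user: User,
              max_user_rounds: int = 3) -> str:
    """Layer 2: show each Critic-approved draft to the user; re-enter Layer 1 on feedback."""
    feedback: list[str] = []
    for _ in range(max_user_rounds):
        draft = self_reflect(task, designer, critic, feedback)
        comment = user(draft)
        if comment is None:
            return draft
        feedback.append(comment)
    raise RuntimeError("user was not satisfied within max_user_rounds")
```

Keeping a single shared `feedback` list means the Designer sees both the Critic's objections and the user's comments on every retry, which matches the description of user feedback triggering Layer 1 again.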

The framework also includes a memory module to store successful task executions and user preferences, improving teachability and future code generation consistency.
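The summary does not describe how the memory module stores or retrieves entries, so the following is only a sketch of the idea: a dictionary-backed store with exact lookup on normalized text. The class name `Memory` and its methods are hypothetical, and a real system would likely need fuzzier retrieval than this.

```python
from dataclasses import dataclass, field
from typing import Optional


def _key(text: str) -> str:
    """Normalize case and whitespace so trivially different phrasings match."""
    return " ".join(text.lower().split())


@dataclass
class Memory:
    tasks: dict[str, str] = field(default_factory=dict)        # instruction -> verified code
    preferences: dict[str, str] = field(default_factory=dict)  # preference name -> value

    def save_task(self, instruction: str, code: str) -> None:
        """Store code only after it has executed successfully."""
        self.tasks[_key(instruction)] = code

    def recall_task(self, instruction: str) -> Optional[str]:
        return self.tasks.get(_key(instruction))

    def save_preference(self, name: str, value: str) -> None:
        self.preferences[_key(name)] = value

    def recall_preference(self, name: str) -> Optional[str]:
        return self.preferences.get(_key(name))
```

For example, after `save_preference("Happy", "smile and raise both arms")`, a later request that mentions "happy" can reuse the stored expression instead of generating a new one.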

For practical deployment, AutoMisty uses a system verification step. Generated MistyCode is first checked in a local environment. If compilation or runtime errors occur, details are fed back to the Designer agent for debugging. Once the code passes local verification, it's sent to the Misty robot for execution. A User Proxy mechanism (inspired by AutoGen (Hossen et al., 29 Apr 2024)) allows the user to monitor the robot's real-world performance and provide feedback for further adjustments.
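The local verification step can be approximated with Python's built-in `compile` and `exec`. The sketch below assumes the generated code is Python and that running it in-process is acceptable; both are assumptions, and a real deployment would more likely isolate untrusted code in a subprocess or sandbox. The names `verify_locally` and `generate_verified`, and the attempt limit, are hypothetical.

```python
from typing import Callable, Optional


def verify_locally(source: str) -> Optional[str]:
    """Return None if the code compiles and runs, else a short error description."""
    try:
        code = compile(source, "<generated>", "exec")
    except SyntaxError as exc:
        return f"SyntaxError: {exc.msg} (line {exc.lineno})"
    try:
        # Fresh namespace so generated code cannot clobber the caller's globals.
        exec(code, {"__name__": "__generated__"})
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


def generate_verified(task: str,
                      designer: Callable[[str, Optional[str]], str],
                      max_attempts: int = 3) -> str:
    """Ask the designer for code, feeding back the last error until it passes."""
    error: Optional[str] = None
    for _ in range(max_attempts):
        source = designer(task, error)
        error = verify_locally(source)
        if error is None:
            return source   # only verified code would be sent on to the robot
    raise RuntimeError(f"code still failing after {max_attempts} attempts: {error}")
```

Separating the compile check from the run check lets the feedback distinguish syntax problems from runtime failures, which gives the Designer a more specific hint for debugging.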

To evaluate AutoMisty, the authors created a benchmark of 28 tasks classified into four complexity levels: Elementary, Simple, Compound, and Complex. They compared AutoMisty against direct use of ChatGPT-4o and ChatGPT-o1 using metrics like Task Completion (TC), Number of User Interactions (NUI, broken down into User Preference (UPI) and Technical Correctness (TCI) interactions), Code Efficiency (CE), and User Satisfaction (US).
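A per-level tally of these metrics could be computed as follows. This sketch assumes that NUI is the sum of UPI and TCI, which is how the breakdown above reads but is not stated as a formula in the summary; Code Efficiency and User Satisfaction are omitted because their scoring is not described. `TaskResult` and `summarize` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass

LEVELS = ("Elementary", "Simple", "Compound", "Complex")


@dataclass
class TaskResult:
    level: str        # one of LEVELS
    completed: bool   # Task Completion (TC)
    upi: int          # User Preference interactions
    tci: int          # Technical Correctness interactions

    @property
    def nui(self) -> int:
        """Number of User Interactions, assumed here to be UPI + TCI."""
        return self.upi + self.tci


def summarize(results: list[TaskResult]) -> dict[str, dict[str, float]]:
    """Return the completion rate and mean NUI for each complexity level present."""
    by_level: dict[str, list[TaskResult]] = defaultdict(list)
    for r in results:
        if r.level not in LEVELS:
            raise ValueError(f"unknown level: {r.level}")
        by_level[r.level].append(r)
    return {
        level: {
            "tc_rate": sum(r.completed for r in rs) / len(rs),
            "mean_nui": sum(r.nui for r in rs) / len(rs),
        }
        for level, rs in by_level.items()
    }
```

Reporting the two interaction counts separately, as the paper does, distinguishes retries caused by incorrect code from retries caused by the user simply wanting a different behavior.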

Experimental results showed that while all models performed well on Elementary and Simple tasks, AutoMisty significantly outperformed the direct LLM baselines on Compound and especially Complex tasks. AutoMisty achieved a 100% task completion rate across all four complexity levels, whereas ChatGPT-4o failed on many Complex tasks and ChatGPT-o1 failed on some. AutoMisty also demonstrated higher robustness and adaptability than the baselines. Ablation studies confirmed the importance of the Self-Reflective Feedback mechanism: enabling it improved task success rates on Complex tasks, at the cost of a slight increase in interaction count for those tasks. The teachability assessment showed that the system can learn and correctly retrieve previously saved user preferences for emotions, though some misclassifications occurred.

The authors highlight that AutoMisty's approach, leveraging optimized APIs and in-context learning within a multi-agent framework, provides strong generalization capabilities and allows for low-cost migration to other API-driven social robots. The framework effectively lowers the technical barrier for programming social robots, making customization accessible to non-technical users through natural language conversation. Future work aims to extend the framework to handle collaboration between multiple robots and humans.
