
LLM-Powered Hierarchical Language Agent for Real-time Human-AI Coordination (2312.15224v2)

Published 23 Dec 2023 in cs.AI and cs.HC

Abstract: AI agents powered by LLMs have made significant advances, enabling them to assist humans in diverse complex tasks and leading to a revolution in human-AI coordination. LLM-powered agents typically require invoking LLM APIs and employing artificially designed complex prompts, which results in high inference latency. While this paradigm works well in scenarios with minimal interactive demands, such as code generation, it is unsuitable for highly interactive and real-time applications, such as gaming. Traditional gaming AI often employs small models or reactive policies, enabling fast inference but offering limited task completion and interaction abilities. In this work, we consider Overcooked as our testbed, where players can communicate in natural language and cooperate to serve orders. We propose a Hierarchical Language Agent (HLA) for human-AI coordination that provides strong reasoning abilities while maintaining real-time execution. In particular, HLA adopts a hierarchical framework comprising three modules: a proficient LLM, referred to as Slow Mind, for intention reasoning and language interaction; a lightweight LLM, referred to as Fast Mind, for generating macro actions; and a reactive policy, referred to as Executor, for transforming macro actions into atomic actions. Human studies show that HLA outperforms baseline agents, including slow-mind-only and fast-mind-only agents, with stronger cooperation abilities, faster responses, and more consistent language communication.

Understanding Human-AI Coordination Through the Lens of LLMs

The Challenge of Interaction Latency

AI agents powered by LLMs have showcased impressive capabilities, and their adoption is widespread across sectors including content creation and robotics. Despite these advantages, a major challenge surfaces when applying them to real-time scenarios such as interactive gaming: inference latency. LLM-driven agents typically rely on LLM APIs coupled with complex prompts, incurring latencies ranging from several seconds to minutes. This latency critically undermines their effectiveness in domains that demand immediate interaction.
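The scale of the problem is easy to see with a back-of-the-envelope calculation. The timings below are assumptions chosen for illustration (the paper reports that LLM API calls can take seconds to minutes), not measurements from the paper:

```python
# Hypothetical per-decision latencies in milliseconds, illustrating the gap
# between a full LLM API call and a small reactive policy. All numbers here
# are assumed for illustration only.
LLM_CALL_LATENCY_MS = 3000       # assumed: a multi-second prompted API call
REACTIVE_POLICY_LATENCY_MS = 10  # assumed: a small local policy network
GAME_TICK_MS = 100               # assumed: a 10 Hz game loop

def ticks_missed(decision_latency_ms: int, tick_ms: int = GAME_TICK_MS) -> int:
    """Number of game ticks that elapse while the agent is still deciding."""
    return decision_latency_ms // tick_ms

print(ticks_missed(LLM_CALL_LATENCY_MS))        # 30 ticks stalled per decision
print(ticks_missed(REACTIVE_POLICY_LATENCY_MS)) # 0
```

Under these assumed numbers, a single prompted API call stalls the agent for dozens of game ticks, while a reactive policy answers within one tick.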

A Novel Hierarchical Language Agent

In response to this latency hurdle, the researchers introduce a Hierarchical Language Agent (HLA) capable of both reasoning and executing tasks in real time. HLA employs a hierarchical structure combining a proficient LLM, referred to as Slow Mind, for intention reasoning and language-based communication, with a lightweight LLM, termed Fast Mind, for generating macro actions. Additionally, a reactive policy known as Executor translates these macro actions into executable atomic actions. This structure efficiently parses human instructions into actionable commands, significantly enhancing human-AI collaboration in time-sensitive tasks.
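The three-module decomposition can be sketched structurally. The module names below follow the paper, but the interfaces, the toy stand-in functions, and the sequential control flow are simplifying assumptions (in the actual system the Slow Mind runs asynchronously alongside the faster modules):

```python
# A minimal structural sketch of HLA's three modules. Module names follow the
# paper; the interfaces and toy implementations are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable

@dataclass
class HierarchicalLanguageAgent:
    slow_mind: Callable[[str], str]       # proficient LLM: intention reasoning, chat
    fast_mind: Callable[[str], str]       # lightweight LLM: picks the next macro action
    executor: Callable[[str], list[str]]  # reactive policy: macro action -> atomic actions

    def step(self, human_message: str) -> tuple[str, list[str]]:
        intention = self.slow_mind(human_message)  # slow; asynchronous in the real system
        macro = self.fast_mind(intention)          # fast; keeps the agent responsive
        atomics = self.executor(macro)             # near-instant low-level control
        return macro, atomics

# Toy stand-ins so the sketch runs without any LLM API.
agent = HierarchicalLanguageAgent(
    slow_mind=lambda msg: f"intention: {msg.lower()}",
    fast_mind=lambda intent: "goto_chopping_board" if "chop" in intent else "idle",
    executor=lambda macro: ["turn_left", "move", "interact"] if macro != "idle" else [],
)
print(agent.step("Chop 3 tomatoes"))
```

The key design point is that only the Executor sits in the tight control loop; the two LLMs operate at progressively coarser timescales.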

Overcooked as a Real-Time Testbed

The approach is evaluated using the cooperative cooking game Overcooked as a testbed. Within this environment, the agent exhibits human-like cooperation through natural language communication, handling frequent human commands such as "Chop 3 tomatoes". The AI player promptly interprets and executes such instructions, demonstrating responsiveness and comprehension even of vague language cues, all within the game's time constraints.
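To make the command-to-macro-action mapping concrete, a toy rule-based parser for orders of the form "Chop 3 tomatoes" is shown below. This is only an illustration of the mapping: in HLA, this step is handled by the Fast Mind LLM, not by rules, and the macro-action names here are hypothetical:

```python
import re

# Toy parser mapping a natural-language order to a (macro_action, count) pair.
# HLA's Fast Mind performs this step with a lightweight LLM; the regex below is
# purely illustrative and the macro-action naming is a made-up convention.
def parse_command(command: str):
    """Return (macro_action, repetitions) for a simple 'verb N item' order."""
    m = re.match(r"(?i)\s*(chop|cook|serve)\s+(\d+)\s+(\w+)", command)
    if not m:
        return None  # vague or unsupported phrasing: defer to the LLM in the real system
    verb, count, item = m.group(1).lower(), int(m.group(2)), m.group(3).lower()
    return f"{verb}_{item}", count

print(parse_command("Chop 3 tomatoes"))  # ('chop_tomatoes', 3)
```

The limitation of such rules is exactly the paper's motivation for using an LLM here: phrasing that falls outside the pattern (or is deliberately vague) returns `None`, whereas the Fast Mind can still produce a sensible macro action.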

HLA Outperforms Baselines in Human Studies

An empirical evaluation compared HLA against baseline agents, each ablating a specific HLA component. The human studies quantified its performance, showing that HLA outperformed the baselines by a clear margin in game scores and responded to commands faster. Human participants significantly preferred HLA over the slow-mind-only and fast-mind-only agents, underscoring its stronger cooperative skills, quicker responsiveness, and more consistent language communication.

In sum, the Hierarchical Language Agent puts forward a robust framework for real-time human-AI coordination tasks. The paper underscores the importance of hierarchical reasoning and planning within AI systems for applications demanding high-frequency interaction and swift responses, paving a promising avenue for more dynamic and responsive AI-driven collaboration in real-time applications.

Authors (7)
  1. Jijia Liu (5 papers)
  2. Chao Yu (116 papers)
  3. Jiaxuan Gao (14 papers)
  4. Yuqing Xie (24 papers)
  5. Qingmin Liao (52 papers)
  6. Yi Wu (171 papers)
  7. Yu Wang (939 papers)
Citations (24)