Understanding Human-AI Coordination Through the Lens of LLMs
The Challenge of Interaction Latency
AI agents powered by large language models (LLMs) have demonstrated impressive capabilities and are being adopted across many domains, including content creation and robotics. Despite these strengths, a major obstacle emerges when they are applied to real-time settings such as interactive gaming: inference latency. LLM-driven agents typically rely on LLM APIs with long, complex prompts, producing response times that range from several seconds to minutes. This latency critically undermines their effectiveness in domains that demand immediate interaction.
A Novel Hierarchical Language Agent
To address this latency hurdle, researchers have introduced a Hierarchical Language Agent (HLA) that can both reason and act in real time. HLA combines three components in a hierarchy: a capable LLM, termed the Slow Mind, for intention reasoning and language-based communication with the human; a lightweight LLM, termed the Fast Mind, that generates macro actions; and a reactive policy, the Executor, that converts macro actions into executable atomic actions. This division of labor lets the agent parse human instructions into actionable commands without stalling its control loop, significantly enhancing human-AI collaboration in time-sensitive tasks.
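To make the division of labor concrete, here is a minimal sketch of how such a three-level hierarchy might be wired together. The class names, stub policies, and threading scheme are illustrative assumptions, not the authors' implementation; the point is that the slow component runs off the control path while the fast components are queried every game tick.

```python
# Minimal sketch of a Slow Mind / Fast Mind / Executor hierarchy.
# All names, signatures, and stub policies are illustrative assumptions.
import queue
import threading
import time


class SlowMind:
    """Stand-in for a large LLM: interprets intent and replies in language."""

    def interpret(self, message):
        time.sleep(1.0)  # emulate multi-second LLM latency
        return {"intent": message.lower(), "reply": f"Okay, I'll {message.lower()}."}


class FastMind:
    """Stand-in for a lightweight LLM: quickly proposes the next macro action."""

    def propose_macro_action(self, intent, state):
        if intent and "tomato" in intent["intent"]:
            return "chop_tomato"
        return "idle"


class Executor:
    """Reactive policy: expands a macro action into one atomic action per tick."""

    PLANS = {"chop_tomato": ["goto_tomato", "pick_up", "goto_board", "chop"]}

    def step(self, macro_action, state):
        atomic = self.PLANS.get(macro_action, ["wait"])
        return atomic[state["tick"] % len(atomic)]


class HierarchicalAgent:
    def __init__(self):
        self.slow, self.fast, self.executor = SlowMind(), FastMind(), Executor()
        self.intent = None
        self.inbox = queue.Queue()
        threading.Thread(target=self._slow_loop, daemon=True).start()

    def hear(self, message):
        self.inbox.put(message)  # human chat is handed to the Slow Mind asynchronously

    def _slow_loop(self):
        while True:
            self.intent = self.slow.interpret(self.inbox.get())  # slow, but off the control path

    def act(self, state):
        macro = self.fast.propose_macro_action(self.intent, state)
        return self.executor.step(macro, state)  # one atomic action per game tick


if __name__ == "__main__":
    agent = HierarchicalAgent()
    agent.hear("Chop 3 tomatoes")
    for tick in range(8):
        print(tick, agent.act({"tick": tick}))  # "wait" until the Slow Mind catches up
        time.sleep(0.3)
```

The key property this sketch tries to capture is that the expensive Slow Mind call never blocks `act()`, so the agent keeps producing atomic actions at the game's tick rate even while a language reply is still being computed.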
Overcooked as a Real-Time Testbed
The approach is evaluated in the cooperative cooking game Overcooked, which serves as a real-time testbed. Within this environment, the agent cooperates with humans through natural language communication, handling frequent commands such as "Chop 3 tomatoes". The AI player promptly interprets and executes such instructions, demonstrating both responsiveness and the ability to ground language cues, including vague ones, in concrete actions, all within the game's tight time constraints.
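As an illustration of the kind of grounding involved, the toy sketch below turns a chat command like "Chop 3 tomatoes" into a queue of macro actions for the Executor. The regex grammar and macro names are assumptions made purely for illustration; in the actual system, a lightweight prompted LLM, not hand-written rules, performs this step.

```python
# Toy illustration of grounding a chat command into repeated macro actions.
# The grammar and macro names are assumptions, not the paper's prompt design.
import re

MACRO_FOR_VERB = {"chop": "chop_{item}", "cook": "cook_{item}", "serve": "serve_dish"}


def parse_command(text):
    """Extract (verb, count, item) from a simple imperative command."""
    m = re.match(r"(?i)\s*(chop|cook|serve)\s*(\d+)?\s*(\w+?)(?:es|s)?\s*$", text)
    if not m:
        return None
    return m.group(1).lower(), int(m.group(2) or 1), m.group(3).lower()


def macro_queue(command):
    """Expand a parsed command into the macro actions the Executor will run."""
    parsed = parse_command(command)
    if parsed is None:
        return []
    verb, count, item = parsed
    return [MACRO_FOR_VERB[verb].format(item=item)] * count


if __name__ == "__main__":
    print(macro_queue("Chop 3 tomatoes"))  # ['chop_tomato', 'chop_tomato', 'chop_tomato']
```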
HLA Outperforms Baselines in Human Studies
An empirical evaluation compared HLA against baseline agents, each missing one of HLA's components. Human studies showed that HLA achieved markedly higher game scores and faster action responses than the baselines. Participants also strongly preferred HLA over the slow-mind-only and fast-mind-only agents, underscoring its stronger cooperative skills, quicker responsiveness, and more consistent language communication.
In summary, the Hierarchical Language Agent offers a robust framework for real-time human-AI coordination. The paper underscores the value of hierarchical reasoning and planning for AI systems that must sustain high-frequency interaction and respond quickly, pointing toward more dynamic and responsive AI-driven collaboration across a range of real-time applications.