
Leveraging Dual Process Theory in Language Agent Framework for Real-time Simultaneous Human-AI Collaboration (2502.11882v5)

Published 17 Feb 2025 in cs.AI, cs.CL, cs.HC, cs.LG, and cs.MA

Abstract: Agents built on LLMs have excelled in turn-by-turn human-AI collaboration but struggle with simultaneous tasks requiring real-time interaction. Latency issues and the challenge of inferring variable human strategies hinder their ability to make autonomous decisions without explicit instructions. Through experiments with current independent System 1 and System 2 methods, we validate the necessity of using Dual Process Theory (DPT) in real-time tasks. We propose DPT-Agent, a novel language agent framework that integrates System 1 and System 2 for efficient real-time simultaneous human-AI collaboration. DPT-Agent's System 1 uses a Finite-state Machine (FSM) and code-as-policy for fast, intuitive, and controllable decision-making. DPT-Agent's System 2 integrates Theory of Mind (ToM) and asynchronous reflection to infer human intentions and perform reasoning-based autonomous decisions. We demonstrate the effectiveness of DPT-Agent through further experiments with rule-based agents and human collaborators, showing significant improvements over mainstream LLM-based frameworks. DPT-Agent can effectively help LLMs convert correct slow thinking and reasoning into executable actions, thereby improving performance. To the best of our knowledge, DPT-Agent is the first language agent framework that achieves successful real-time simultaneous human-AI collaboration autonomously. Code of DPT-Agent can be found in https://github.com/sjtu-marl/DPT-Agent.

Leveraging Dual Process Theory (DPT) within language agent frameworks offers a structured approach to tackling the challenges inherent in real-time, simultaneous human-AI collaboration. Standard LLM-based agents often falter in such scenarios due to the high latency of large models and difficulties in autonomously inferring and adapting to dynamic human strategies without explicit instructions. The DPT-Agent framework (Zhang et al., 17 Feb 2025) provides a concrete implementation strategy by explicitly modeling the fast, intuitive System 1 and the slow, deliberative System 2 cognitive processes.

DPT-Agent Architecture for Real-time Collaboration

The core innovation of DPT-Agent lies in its hybrid architecture that decouples real-time action execution from slower, more complex reasoning processes, addressing the latency-capability trade-off observed when using LLMs alone for either System 1 or System 2 functions.

System 1: Fast, Reactive Control

System 1 is engineered for low-latency decision-making and continuous action output, crucial for interacting within dynamic, shared environments. Its implementation relies on:

  • Finite-state Machine (FSM): The FSM serves as the backbone of System 1. It operates based on pre-defined states (e.g., 'idle', 'fetching_ingredient', 'cooking', 'serving') and transitions triggered by environmental events or internal logic. The FSM continuously outputs "macro-actions" representing high-level goals suitable for the current state. This design ensures that the agent can always produce an action quickly, independent of potentially slow LLM inference times. The FSM provides a structured, predictable, and fast execution layer.
  • Code-as-Policy Generator: This component acts as the interface between System 2's deliberate reasoning and System 1's reactive execution. It receives high-level "Behavior Guidelines" and inferred human "Beliefs" from System 2. Based on these inputs and the current environmental state, it generates executable code snippets (specifically, Python lambda functions in the reference implementation). These code snippets dynamically modify the transition logic or conditional checks within the FSM. For instance, a generated lambda function might change the condition under which the FSM transitions from 'idle' to 'fetching_ingredient' based on the inferred human partner's likely next action.
  • Action Executor: This module translates the macro-actions emitted by the FSM into low-level, atomic actions executable within the specific environment (e.g., grid world movements). It may incorporate pathfinding algorithms (like A*) to navigate the environment efficiently to achieve the goal specified by the macro-action.

The FSM structure ensures constant reactivity, while the code-as-policy mechanism allows this reactivity to be intelligently guided and adapted by deeper reasoning, albeit with some delay inherent in the reasoning process itself.
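
To make the System 1 machinery concrete, the following minimal Python sketch shows an FSM whose transition predicates are plain callables that the Code-as-Policy generator can swap at runtime. The WorldState fields, state names, and macro-action strings are illustrative assumptions, not the paper's exact interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class WorldState:
    """Hypothetical environment snapshot; field names are illustrative."""
    items_to_cook: int = 0
    dish_ready: bool = False

Predicate = Callable[[WorldState, str], bool]

@dataclass
class FSM:
    current: str = "idle"
    # FSM state -> [(predicate, next FSM state, macro-action)]. Predicates take
    # (state, belief) and can be replaced at runtime by the Code-as-Policy
    # generator without ever pausing the action loop.
    transitions: Dict[str, List[Tuple[Predicate, str, str]]] = field(default_factory=dict)

    def step(self, state: WorldState, belief: str) -> str:
        """Evaluate the current state's predicates; always return a macro-action."""
        for predicate, nxt, macro in self.transitions.get(self.current, []):
            if predicate(state, belief):
                self.current = nxt
                return macro
        return "wait"  # the FSM always yields *some* action, keeping latency low

fsm = FSM(transitions={
    "idle": [
        # Hand-written default: start cooking whenever something needs cooking.
        (lambda state, belief: state.items_to_cook > 0, "cooking", "start_cooking"),
    ],
    "cooking": [
        (lambda state, belief: state.dish_ready, "serving", "serve_dish"),
    ],
    "serving": [
        (lambda state, belief: True, "idle", "return_to_station"),
    ],
})

# One System 1 tick: a cheap function call, with no LLM on the hot path.
macro = fsm.step(WorldState(items_to_cook=2), belief="Human is focusing on plating.")
print(fsm.current, macro)  # -> cooking start_cooking
```

A separate Action Executor would then expand a macro-action such as start_cooking into atomic environment steps, for example by running A* to the nearest stove.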

System 2: Deliberative Reasoning and Adaptation

System 2 operates in parallel to System 1, performing computationally intensive reasoning tasks without blocking the agent's real-time interaction capabilities.

  • Theory of Mind (ToM) Module: Implemented using an LLM, this module analyzes the history of interaction data (states, joint actions, rewards) to infer the human collaborator's mental state, intentions, and strategies. It outputs a natural language "Belief" summarizing its understanding of the human partner (e.g., "Human tends to prioritize chopping onions first," "Human follows a fixed pattern for delivery"). This belief provides crucial context for both refining the agent's own strategy and coordinating actions via System 1. The quality of the ToM inference heavily depends on the capability of the underlying LLM.
  • Asynchronous Reflection Module: This LLM-driven module facilitates long-term strategy adaptation and self-improvement. It processes the interaction history, performance feedback (e.g., task scores), and the current ToM "Belief" to generate updated "Behavior Guidelines" in natural language. These guidelines represent the agent's high-level strategy (e.g., "Prioritize assisting the human with their current sub-task," "Focus on plating dishes while the human fetches ingredients"). Crucially, this reflection process runs asynchronously, typically in a separate thread or process. This ensures that the potentially time-consuming reasoning involved in reflection does not introduce latency into System 1's action loop. The updated guidelines are then fed to the Code-as-Policy generator to influence future FSM modifications.
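
Because both modules run off the hot path, the practical question is how beliefs and guidelines flow between threads without blocking System 1. Below is a minimal sketch of that wiring, with a hypothetical call_llm helper standing in for a real LLM API; the prompts, history window, and queue-based hand-off are assumptions for illustration.

```python
import queue
import threading
import time

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real (slow) LLM API call."""
    time.sleep(2.0)  # simulate inference latency
    return "Guideline: focus on cooking while the human plates."

history: list = []                 # interaction log, appended by System 1
history_lock = threading.Lock()    # protects the shared history
guidelines: "queue.Queue[str]" = queue.Queue()  # System 2 -> System 1 channel

def system2_loop(stop: threading.Event) -> None:
    """ToM inference + reflection, run entirely off System 1's action loop."""
    while not stop.is_set():
        with history_lock:
            recent = list(history[-50:])  # snapshot under the lock
        belief = call_llm(f"Infer the human's strategy from: {recent}")
        guideline = call_llm(f"Given belief '{belief}', propose a behavior guideline.")
        guidelines.put(guideline)         # non-blocking hand-off to System 1

stop = threading.Event()
threading.Thread(target=system2_loop, args=(stop,), daemon=True).start()

# System 1 loop: acts every tick, applying new guidelines only when available.
for tick in range(5):
    try:
        new_guideline = guidelines.get_nowait()  # never blocks the action loop
        print(f"tick {tick}: updating FSM policy with: {new_guideline}")
    except queue.Empty:
        pass  # keep acting under the current FSM policy
    time.sleep(1.0)
stop.set()
```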

Integration via Code-as-Policy

The synergy between System 1 and System 2 is achieved through the Code-as-Policy generator. System 2 produces high-level insights (Beliefs about the human, strategic Behavior Guidelines), which are abstract and not directly executable. The generator translates these insights into concrete, executable modifications for the low-level FSM logic.

Consider a simplified example in a collaborative cooking task:

  1. System 2 (ToM): Observes the human repeatedly fetching plates. Generates Belief: "Human is focusing on plating."
  2. System 2 (Reflection): Based on Belief and goal, generates Guideline: "Agent should focus on cooking remaining items."
  3. Code-as-Policy Generator: Receives the Guideline and generates a lambda function for the FSM's 'idle' state transition logic: lambda state, belief: "plating" in belief and state.items_to_cook > 0.
  4. System 1 (FSM): In the 'idle' state, the FSM evaluates the new lambda function. If the condition is met (human is believed to be plating, items need cooking), it transitions to the 'cooking' state, overriding any previous default behavior.

This mechanism allows the fast FSM to adapt its behavior based on sophisticated reasoning without requiring the LLM to be queried for every single action.
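
The summary above leaves open how a generated lambda actually reaches the FSM. One hedged way to install it, reusing the FSM sketch from the System 1 section (the restricted-eval idiom and transition-table layout are assumptions, not the paper's published API):

```python
# Raw output from the Code-as-Policy generator: a string, not yet code.
generated = 'lambda state, belief: "plating" in belief and state.items_to_cook > 0'

# Evaluate with empty builtins so the snippet cannot reach arbitrary functions.
predicate = eval(generated, {"__builtins__": {}}, {})

# Install it as the new 'idle' transition, steering the agent toward cooking
# while the human partner is believed to be handling the plating.
fsm.transitions["idle"] = [(predicate, "cooking", "start_cooking")]
```

In practice this eval should sit behind the validation discussed under Implementation Considerations below.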

Implementation Considerations

Implementing the DPT-Agent framework requires careful consideration of several aspects:

  • FSM Design: The FSM must be carefully designed for the specific collaborative task, capturing the essential states and transitions. The granularity of macro-actions needs to balance flexibility and efficiency. Overly complex FSMs can be difficult to manage, while overly simple ones might limit the agent's capabilities.
  • Code Generation Robustness: The Code-as-Policy generator's reliance on LLMs means its output (lambda functions) might occasionally be syntactically incorrect, semantically flawed, or fail to capture the intended logic from the Behavior Guidelines. Robust error handling, validation, and potentially more constrained code generation techniques or templates are necessary; a validation sketch follows this list. The choice of LLM significantly impacts the success rate and quality of the generated code.
  • Asynchronous Architecture: Managing the asynchronous execution of System 2 (ToM and Reflection) requires careful engineering. Mechanisms for passing data (history, beliefs, guidelines) between the synchronous System 1 loop and the asynchronous System 2 processes must be efficient and thread-safe. The frequency of reflection updates also needs tuning: reflecting too often is computationally expensive, while reflecting too rarely slows adaptation.
  • Latency Management: While System 1 ensures low action latency, the adaptation latency depends on System 2's processing time (ToM inference, reflection, code generation). Using high-capability LLMs for System 2 improves reasoning quality but increases this adaptation latency. Choosing the right LLM involves a trade-off between reasoning sophistication and the speed at which the agent can adapt its FSM logic.
  • Computational Resources: System 1 (FSM execution, pathfinding) is generally lightweight. System 2, however, requires access to potentially large LLMs, demanding significant computational resources (GPU memory, processing power) for inference, especially for the ToM and Reflection modules. Deployment might necessitate dedicated hardware or cloud-based LLM APIs.
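
To make the code-generation robustness point concrete, here is a minimal validation sketch, assuming the lambda-string convention and WorldState type from the earlier sketches; compile_policy and the fallback policy are illustrative, not the paper's API.

```python
from typing import Callable, Optional

def compile_policy(source: str) -> Optional[Callable]:
    """Validate an LLM-generated lambda string; return None on any failure."""
    if not source.lstrip().startswith("lambda"):
        return None                                  # accept only a bare lambda
    try:
        fn = eval(source, {"__builtins__": {}}, {})  # no builtins reachable
    except SyntaxError:
        return None                                  # malformed LLM output
    try:
        # Smoke-test on a dummy state/belief pair; the result must be a bool.
        result = fn(WorldState(items_to_cook=1), "test belief")
    except Exception:
        return None                                  # wrong arity, bad attributes, ...
    return fn if isinstance(result, bool) else None

policy = compile_policy('lambda state, belief: "plating" in belief')
if policy is None:
    # Fall back to a hand-written default rather than stalling System 1.
    policy = lambda state, belief: state.items_to_cook > 0
```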

Application and Performance

The DPT-Agent framework was evaluated in the Overcooked environment, a standard benchmark for multi-agent coordination that requires real-time, simultaneous actions in a shared workspace.

  • Experimental Results: The paper reported that DPT-Agent significantly outperformed several baseline LLM agent frameworks, including ReAct and Reflexion (even when augmented with an FSM for fair comparison). Performance gains were observed in objective metrics like game scores when collaborating with both rule-based agents and human players. Subjective evaluations also indicated human preference for collaborating with DPT-Agent.
  • Claim: The paper asserts that DPT-Agent is the first language agent framework demonstrated to achieve successful autonomous real-time simultaneous human-AI collaboration in this challenging setting, adapting its strategy based on inferred human behavior without explicit instructions.
  • Potential Applications: Beyond simulated environments like Overcooked, the DPT-Agent architecture holds promise for real-world applications requiring tight human-AI coordination in shared spaces. This includes:
    • Collaborative Robotics: Robots working alongside humans in manufacturing, assembly, or logistics, where the robot needs to anticipate and adapt to human actions in real-time.
    • Advanced Driver-Assistance Systems (ADAS): AI systems that need to interact seamlessly and predictably with human drivers.
    • Complex Simulations and Training: AI agents acting as realistic collaborators or opponents in training simulators.
    • Augmenting Legacy Systems: Using the System 2 (LLM reasoning) + Code-as-Policy approach to intelligently modulate the behavior of existing control systems (potentially based on FSMs or similar logic) without a full rewrite.

Limitations

The framework, as presented, has limitations:

  • LLM Dependency: Performance is heavily tied to the underlying LLM's ability to perform complex ToM reasoning, strategic reflection, and correct code generation. Weaker models may lead to significant performance degradation or errors.
  • Control Flexibility: Using lambda functions to modify FSM logic, while practical for current LLMs, might be less flexible or expressive than directly generating or modifying the FSM structure itself, although the latter is a harder generation task.
  • ToM Challenges: Reliable ToM inference remains a difficult task for LLMs, prone to errors or hallucinations.
  • Scope of Evaluation: Human studies were conducted on a limited scale, warranting broader validation.

Conclusion

The DPT-Agent framework (Zhang et al., 17 Feb 2025) offers a practical architectural pattern for building language agents capable of real-time, simultaneous collaboration with humans. By explicitly separating fast, reactive control (System 1 FSM) from slower, deliberative reasoning (System 2 LLM modules for ToM and Reflection) and integrating them via a Code-as-Policy mechanism, it addresses the critical latency and adaptability challenges. This DPT-inspired approach enables agents to maintain real-time responsiveness while leveraging the sophisticated reasoning capabilities of LLMs for autonomous adaptation and coordination in shared dynamic environments.

Authors (13)
  1. Shao Zhang
  2. Xihuai Wang
  3. Wenhao Zhang
  4. Chaoran Li
  5. Junru Song
  6. Tingyu Li
  7. Lin Qiu
  8. Xuezhi Cao
  9. Xunliang Cai
  10. Wen Yao
  11. Weinan Zhang
  12. Xinbing Wang
  13. Ying Wen