
LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models (2310.03903v2)

Published 5 Oct 2023 in cs.CL and cs.MA

Abstract: The emergent reasoning and Theory of Mind (ToM) abilities demonstrated by LLMs make them promising candidates for developing coordination agents. In this study, we introduce a new LLM-Coordination Benchmark aimed at a detailed analysis of LLMs within the context of Pure Coordination Games, where participating agents need to cooperate for the most gain. This benchmark evaluates LLMs through two distinct tasks: (1) \emph{Agentic Coordination}, where LLMs act as proactive participants for cooperation in 4 pure coordination games; (2) \emph{Coordination Question Answering (QA)}, where LLMs are prompted to answer 198 multiple-choice questions from the 4 games for evaluation of three key reasoning abilities: Environment Comprehension, ToM Reasoning, and Joint Planning. Furthermore, to enable LLMs for multi-agent coordination, we introduce a Cognitive Architecture for Coordination (CAC) framework that can easily integrate different LLMs as plug-and-play modules for pure coordination games. Our findings indicate that LLM agents equipped with GPT-4-turbo achieve comparable performance to state-of-the-art reinforcement learning methods in games that require commonsense actions based on the environment. Besides, zero-shot coordination experiments reveal that, unlike RL methods, LLM agents are robust to new unseen partners. However, results on Coordination QA show a large room for improvement in the Theory of Mind reasoning and joint planning abilities of LLMs. The analysis also sheds light on how the ability of LLMs to understand their environment and their partner's beliefs and intentions plays a part in their ability to plan for coordination. Our code is available at \url{https://github.com/eric-ai-lab/LLM_coordination}.

Insights into Multi-Agent Coordination with LLMs

This paper investigates the potential of LLMs to facilitate multi-agent coordination, a critical component of collaborative artificial intelligence. The authors present the LLM-Coordination Framework (LLM-Co) as a method for enabling LLMs to engage effectively in coordination games, and they examine five pertinent aspects of coordination: Theory of Mind (ToM), Situated Reasoning, Sustained Coordination, Robustness to Partners, and Explicit Assistance. LLMs are evaluated across several game environments, highlighting their strengths and limitations.

LLM-Coordination Framework

The LLM-Co Framework is designed to enable LLMs, like GPT-4, to interact and perform tasks in dynamic multi-agent game environments. It provides a structured approach by translating game details and rules into a textual format that LLMs can process effectively. The framework supports continuous gameplay across environments by helping LLMs infer actionable steps based on the current game state and the feasible actions available.
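The loop described above, verbalizing the game state and feasible actions, querying the model, and validating its choice, can be sketched as follows. The function names (`describe_state`, `choose_action`) and the state/action formats are illustrative assumptions, not the authors' actual API.

```python
# Illustrative sketch of an LLM-Co-style agent step. Names and formats
# are assumed for illustration, not taken from the paper's codebase.

def describe_state(state, feasible_actions):
    """Render the current game state and legal moves as text for the LLM."""
    lines = [
        f"Your position: {state['self']}",
        f"Partner position: {state['partner']}",
        "Feasible actions: " + ", ".join(feasible_actions),
    ]
    return "\n".join(lines)

def choose_action(state, feasible_actions, query_llm):
    """One step of the agent loop: verbalize state, ask the LLM, validate."""
    prompt = describe_state(state, feasible_actions) + \
        "\nPick exactly one feasible action."
    reply = query_llm(prompt)
    # Accept the first feasible action the model names; fall back to a
    # safe default if the reply mentions no legal action.
    for action in feasible_actions:
        if action in reply:
            return action
    return feasible_actions[0]
```

Validating the model's reply against the feasible-action list is one simple way to keep free-form LLM output grounded in the environment's actual action space.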

Game Environments and Evaluations

The evaluations were conducted in three game environments: Collab Escape, Collab Capture, and Overcooked-AI. Each environment presents unique challenges requiring agents to display Theory of Mind, sustained coordination over extended tasks, and the ability to assist explicitly.

  1. Theory of Mind and Situated Reasoning: The paper introduced an LLM-ToM-Reasoning Test Set to measure the ToM and situated reasoning capabilities of LLMs. It was observed that GPT-4 outperforms others, approaching near-human reasoning levels, demonstrating its capacity to accurately predict partner intentions.
  2. Sustained Coordination: The LLM-Co agents, particularly those using GPT-4, were capable of sustained coordination, outperforming existing RL-based methods in coordination-heavy tasks without pre-training or task-specific fine-tuning.
  3. Robustness to Partners: LLM-Co agents were evaluated against varied partner types, including RL baselines trained with human data. Results show that they adaptively align with partner behavior without compromising coordination efficiency.
  4. Explicit Assistance: The paper explored scenarios requiring proactive help to enhance joint task completion effectiveness. They introduced specific Overcooked-AI layouts that require explicit assistance, demonstrating the adaptability of LLM-Co agents to these requirements with appropriate directive prompts.
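A directive prompt of the kind point 4 alludes to can be sketched as below; the wording is hypothetical, not quoted from the paper's actual prompts.

```python
# Hypothetical assistance directive for an Overcooked-AI-style layout
# where one agent cannot reach an ingredient dispenser. Wording is
# illustrative, not taken from the paper.
DIRECTIVE = (
    "You and your partner share a kitchen. Your partner cannot reach the "
    "onion dispenser. When you are not cooking, pass onions across the "
    "counter so your partner can assemble the soup."
)

def build_prompt(state_text, directive=DIRECTIVE):
    """Prepend the assistance directive to the verbalized game state."""
    return directive + "\n\n" + state_text
```

Prepending such a directive steers the agent toward proactive helping without changing the underlying action-selection loop.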

Implications and Future Developments

The positive outcomes from this research indicate a promising direction for using LLMs as collaborative AI agents. They can process complex instructions, adapt to unforeseen partner actions, and execute long-term plans, making them suitable for real-world multi-agent tasks. Future work will likely explore the scalability of such frameworks across diverse agents and environments, potentially integrating real-world variables and constraints.

Conclusion

This paper underscores the utility of LLMs, principally GPT-4, in multi-agent coordination. By developing structured frameworks like LLM-Co and evaluating them against comprehensive scenarios, the research highlights the emergent reasoning capabilities of LLMs in collaborative tasks. These findings lay the groundwork for LLMs to serve as reliable agents in both virtual and real-world applications requiring sophisticated coordination.

Authors
  1. Saaket Agashe
  2. Yue Fan
  3. Xin Eric Wang
  4. Anthony Reyna