MindAgent: Emergent Gaming Interaction (2309.09971v2)

Published 18 Sep 2023 in cs.AI, cs.HC, and cs.MA

Abstract: LLMs have the capacity to perform complex scheduling in a multi-agent system and can coordinate these agents to complete sophisticated tasks that require extensive collaboration. However, despite the introduction of numerous gaming frameworks, the community lacks adequate benchmarks for building general multi-agent collaboration infrastructure that encompasses both LLM and human-NPC collaboration. In this work, we propose a novel infrastructure, MindAgent, to evaluate emergent planning and coordination capabilities for gaming interaction. In particular, our infrastructure leverages an existing gaming framework to i) require understanding of the coordinator for a multi-agent system, ii) collaborate with human players via proper instructions without fine-tuning, and iii) establish in-context learning from few-shot prompts with feedback. Furthermore, we introduce CUISINEWORLD, a new gaming scenario and related benchmark that evaluates multi-agent collaboration efficiency and supervises multiple agents playing the game simultaneously. We conduct comprehensive evaluations with a new auto-metric, CoS, for calculating collaboration efficiency. Finally, our infrastructure can be deployed in real-world gaming scenarios in a customized VR version of CUISINEWORLD and adapted to the broader existing Minecraft gaming domain. We hope our findings on LLMs and the new infrastructure for general-purpose scheduling and coordination can help shed light on how such skills can be acquired by learning from large language corpora.

MindAgent: Emergent Gaming Interaction

The paper "MindAgent: Emergent Gaming Interaction" introduces a novel gaming infrastructure named MindAgent, which aims to explore the potential of LLMs for planning and coordination in multi-agent systems. The primary objective is to enhance collaboration among agents in gaming environments, fostering efficient task completion without tailored fine-tuning of the models. This is achieved by leveraging the inherent capabilities of LLMs, typically trained on diverse text corpora, to comprehend and execute multi-agent tasks within dynamic and interactive gaming settings.

Overview of MindAgent

MindAgent's design capitalizes on the emergent capabilities of LLMs, using minimal prompting strategies to elicit multi-agent task management and coordination. The infrastructure provides a flexible framework that integrates with existing gaming systems, enabling seamless inclusion of human players and Non-Player Characters (NPCs) within the setup. A significant part of its innovation lies in its ability to evaluate planning competencies using new benchmarks, most prominently 'CuisineWorld,' a diverse virtual kitchen scenario that emulates collaborative cooking tasks. The paper introduces the collaboration score (CoS) metric, which quantifies collaboration efficiency by measuring task completion under varying task loads.
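
At a high level, CoS aggregates task-completion ratios measured under several task-arrival settings into a single score. The snippet below is a minimal Python sketch of that idea, assuming CoS is the mean per-interval completion ratio; the function name and input format are illustrative, not the paper's reference implementation.

```python
from typing import Sequence

def collaboration_score(completed: Sequence[int], attempted: Sequence[int]) -> float:
    """Hypothetical CoS-style metric: average the per-interval completion
    ratio over M task-arrival settings (an assumption for illustration,
    not the paper's reference implementation)."""
    if not completed or len(completed) != len(attempted):
        raise ValueError("expected one (completed, attempted) pair per interval")
    ratios = [c / a if a else 0.0 for c, a in zip(completed, attempted)]
    return sum(ratios) / len(ratios)

# Example: completion counts measured at three task-arrival rates.
print(collaboration_score(completed=[8, 5, 3], attempted=[10, 10, 10]))  # ~0.53
```

Under this reading, a higher CoS indicates that the dispatcher keeps more concurrent orders on track even as tasks arrive faster.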

Key Contributions

Given the paper's extensive examinations and findings, the contributions can be summarized as follows:

  1. CuisineWorld Benchmark: A detailed gaming environment, CuisineWorld serves as an interactive and robust platform for testing LLMs' planning capabilities. The scenario comprises multiple tasks of varying complexity and necessitates coordination among multi-agent teams, broadening the range of LLM applications in gaming.
  2. MindAgent Infrastructure: Central to the paper is the introduction of MindAgent, a novel infrastructure facilitating LLM-driven agent coordination in dynamic environments. By leveraging in-context learning, it advances multi-agent planning without extensive fine-tuning, optimizing efficiency across multi-task settings (a sketch of such a dispatcher loop appears after this list).
  3. Comprehensive Evaluation: The framework evaluates GPT-4, Claude, and LLaMA, demonstrating the varying degrees of collaboration efficiency these models achieve when guided by the infrastructure. Moreover, human collaboration experiments illustrate the practical applications and scalability of MindAgent in interactive human-AI gaming.
  4. Generalization and Adaptability: The paper extends beyond virtual kitchens, adapting MindAgent's techniques to real-world gaming domains like Minecraft. This exemplifies the system's potential to generalize across different gaming scenarios and interact with VR systems, indicating a broad scope for future applications.
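
To make the in-context dispatching idea in contribution 2 concrete, the following is a minimal sketch of an LLM-as-coordinator loop: the model receives the game rules, a few demonstrations, the current state, and feedback from the previous step, and returns one action per agent. The `build_prompt` and `dispatch_loop` helpers, the prompt layout, and the `env`/`query_llm` interfaces are assumptions for illustration, not MindAgent's actual prompts or API.

```python
def build_prompt(rules: str, demos: str, state: str, feedback: str) -> str:
    """Assemble a dispatcher prompt from game rules, few-shot demos,
    the current game state, and feedback on the previous dispatch."""
    return (
        f"Game rules and recipes:\n{rules}\n\n"
        f"Few-shot demonstrations:\n{demos}\n\n"
        f"Current state:\n{state}\n\n"
        f"Feedback on last dispatch:\n{feedback}\n\n"
        "Assign one action to each agent, e.g. agent_1: goto(stove); agent_2: chop(tomato)."
    )

def dispatch_loop(env, query_llm, rules: str, demos: str, max_steps: int = 50) -> None:
    """Query the LLM once per game step and feed environment feedback
    back into the next prompt (hypothetical env/query_llm interfaces)."""
    feedback = "none"
    for _ in range(max_steps):
        prompt = build_prompt(rules, demos, env.describe_state(), feedback)
        plan = query_llm(prompt)   # e.g. "agent_1: goto(stove); agent_2: chop(tomato)"
        feedback = env.step(plan)  # environment validates actions and reports errors
```

Feeding the environment's responses back into the next prompt is what allows an un-finetuned model to correct invalid dispatches over successive steps, in the spirit of the few-shot-prompting-with-feedback setup the paper describes.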

Results and Implications

The evaluation indicates that LLMs, particularly GPT-4, exhibit emergent collaboration capabilities in multi-agent settings. These models achieve substantial task completion rates, especially when aided by structured prompts and environmental feedback. Notably, GPT-4's performance in dispatching agents demonstrates emergent task comprehension and dynamic task prioritization. Furthermore, the integration into Minecraft underscores the adaptability of the framework across varied gaming environments, showcasing its potential for broader gaming applications.

The research emphasizes how LLMs can assume generalist roles in multi-agent planning, potentially paving the way for more flexible game AI that learns through interaction rather than static datasets. The exploration into voice-chat collaboration and VR integration hints at a future where human-AI interplay becomes more immersive and seamless. By incorporating these insights, game developers can exploit the efficiencies of LLMs, ultimately crafting games with enhanced interactivity and player engagement.

Conclusion

The development of MindAgent represents a pivotal step forward in understanding and optimizing LLM functionalities within gaming contexts. This paper's contributions not only elucidate the inherent planning and coordination capabilities of LLMs but also underscore their emerging potential to revolutionize gaming AI. The insights gleaned from MindAgent and CuisineWorld benchmarks can significantly impact future gaming systems, auguring a more collaborative AI framework adaptable across various interactive domains. This research thus serves as a critical foundation for subsequent studies, promising significant advancements in AI-driven gaming mechanics and player experience.

Authors (11)
  1. Ran Gong
  2. Qiuyuan Huang
  3. Xiaojian Ma
  4. Hoi Vo
  5. Zane Durante
  6. Yusuke Noda
  7. Zilong Zheng
  8. Song-Chun Zhu
  9. Demetri Terzopoulos
  10. Li Fei-Fei
  11. Jianfeng Gao