A Survey on Game Playing Agents and Large Models: Methods, Applications, and Challenges (2403.10249v1)

Published 15 Mar 2024 in cs.AI

Abstract: The swift evolution of Large-scale Models (LMs), either language-focused or multi-modal, has garnered extensive attention in both academia and industry. But despite the surge in interest in this rapidly evolving area, there are scarce systematic reviews on their capabilities and potential in distinct impactful scenarios. This paper endeavours to help bridge this gap, offering a thorough examination of the current landscape of LM usage in regard to complex game-playing scenarios and the challenges still open. Here, we seek to systematically review the existing architectures of LM-based Agents (LMAs) for games and summarize their commonalities, challenges, and any other insights. Furthermore, we present our perspective on promising future research avenues for the advancement of LMs in games. We hope to assist researchers in gaining a clear understanding of the field and to generate more interest in this highly impactful research direction. A corresponding resource, continuously updated, can be found in our GitHub repository.

A Comprehensive Survey on Game Playing Agents Leveraging Large Models

Introduction to LMs in Game Playing Agents

The advent of Large Models (LMs) in the field of game playing has ushered in a significant shift in how artificial intelligence can learn, reason, and engage within complex digital environments. Game Playing Agents (GPAs) constitute a critical area of research, providing insights into the capabilities of LMs and presenting a multitude of challenges and opportunities for future advancements. This survey aims to meticulously review the landscape of LM-based Agents (LMAs) in gaming, elucidating common methodologies, applications, challenges, and prospective research directions. The focus is primarily on the utilization of LMs across the stages of gameplay, encompassing perception, inference, and behavior, with the overarching goal of improving authenticity and interaction within game scenarios.

Perception in Game Playing Agents

Perception serves as the foundation for GPAs, enabling them to interpret and react to the game environment. This includes:

  • Semantic Understanding: Initial LMAs predominantly focused on textual information processing within games, necessitating a comprehensive understanding of semantics for effective gameplay. Advanced approaches now integrate visual and, potentially, auditory data to enhance the multi-modal perception capabilities of agents.
  • Visual Perception: The incorporation of Multi-modal LLMs (MLLMs) is considered a substantial improvement, enabling agents to process visual cues more effectively, thus opening avenues for richer interaction and engagement within games.

The Role of Inference in Enhancing GPAs

Inference encompasses the cognitive processes that allow GPAs to make decisions based on perceived information. This involves:

  • Learning and Reasoning: The application of learned knowledge to new scenarios is critical for the generalization capabilities of GPAs. LMs exhibit promising results in adapting learned behaviors to new tasks, showcasing potential in ongoing learning and adaptation.
  • Decision-making and Reflection: Advanced LMs enable GPAs to undertake complex decision-making processes, involving multi-hop inference and long-term planning. Reflection mechanisms further allow for the evaluation and improvement of decisions based on feedback.
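The reflection mechanism above is, at its core, a propose-evaluate-revise loop. A minimal sketch follows; `propose` and `evaluate` are placeholder callables standing in for an LM call and an environment (or critic) check, not functions from any surveyed framework.

```python
def reflect_loop(propose, evaluate, max_rounds=3):
    """Propose an action, evaluate it, and feed the critique back.

    propose(feedback) -> candidate action (a string here)
    evaluate(action)  -> (ok: bool, feedback: str)
    """
    feedback = None
    action = None
    for _ in range(max_rounds):
        action = propose(feedback)       # LM call, conditioned on critique
        ok, feedback = evaluate(action)  # environment or critic check
        if ok:
            return action
    return action  # best effort after max_rounds

# Tiny stub "LM": attacks first, switches to fleeing once criticized.
history = []
def propose(feedback):
    history.append(feedback)
    return "flee" if feedback else "attack"

def evaluate(action):
    if action == "attack":
        return False, "enemy too strong"
    return True, "ok"

chosen = reflect_loop(propose, evaluate)
```

Real systems replace the stubs with an LM prompt that includes the prior action and its critique, but the control flow is the same.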

Behaviors and Actions of GPAs

Action execution in GPAs is pivotal for interacting with the game environment, characterized by:

  • Generative Programming Techniques: These are employed for task execution, where iterative prompting and program generation based on language instructions play significant roles.
  • Dialogue Interactions: Communication, either between agents or with human players, forms a crucial interaction channel within games.
  • Consistency in Action: Ensuring that actions are coherent and contextually appropriate across diverse game scenarios is paramount for the viability of LMAs in gaming.
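The iterative prompting and program generation pattern from the first bullet can be sketched as a generate-execute-refine loop, in the spirit of Voyager-style skill acquisition. This is an assumption-laden toy: `generate` stands in for an LM that emits code, and the "game API" is just a bare Python namespace.

```python
def generate_and_refine(generate, max_tries=3):
    """Ask the LM for code, run it, and feed any error back for a retry.

    generate(error) -> source string; a stub stands in for the LM here.
    """
    error = None
    for _ in range(max_tries):
        src = generate(error)
        try:
            scope = {}
            exec(src, scope)             # run in an isolated namespace
            return scope.get("result"), None
        except Exception as exc:
            error = str(exc)             # error message becomes the next prompt
    return None, error

attempts = []
def fake_lm(error):
    attempts.append(error)
    # First draft has a bug; the "LM" fixes it once it sees the traceback text.
    return "result = 1/0" if error is None else "result = 42"

result, err = generate_and_refine(fake_lm)
```

Production agents sandbox the execution and verify the program against task success criteria rather than mere absence of exceptions, but the feedback loop has this shape.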

Addressing Challenges in Game Playing Agents

A myriad of challenges confront LMAs in gaming, such as:

  • Hallucination: The propensity of LMs to generate outputs that may not align with the game's reality represents a critical hurdle.
  • Error Correction: The ability of GPAs to identify and rectify errors autonomously is essential for sustained gameplay and learning.
  • Generalization: The capacity to extend learned behaviors and strategies to new, unseen tasks remains a significant challenge, reflective of the broader goal of achieving AGI.
  • Interpretability: Ensuring that the decision-making process of GPAs is transparent and understandable is crucial for debugging, improvement, and user trust.
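One common mitigation for hallucination, closely tied to the error-correction point above, is a grounding check: validate every LM-proposed action against the game's legal action set before execution, re-prompting with the valid options on failure. The sketch below is illustrative only; `fake_lm` is a stub for the model.

```python
def grounded_action(propose, legal_actions, max_tries=3):
    """Reject hallucinated actions by checking against the legal action set,
    re-prompting with the valid options when the proposal is invalid."""
    hint = None
    for _ in range(max_tries):
        action = propose(hint)
        if action in legal_actions:
            return action
        hint = f"'{action}' is invalid; choose one of {sorted(legal_actions)}"
    return None  # give up; caller should fall back to a safe default

calls = []
def fake_lm(hint):
    calls.append(hint)
    # First proposal hallucinates an unavailable spell; the corrective hint
    # steers the second proposal into the legal set.
    return "cast fireball" if hint is None else "move north"

act = grounded_action(fake_lm, legal_actions={"move north", "move south", "wait"})
```

The same pattern extends to checking proposed API calls, item names, or map coordinates against the true game state, which also makes the agent's failures easier to inspect.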

Future Directions

The future trajectory of research in GPAs and LMs is oriented towards enhancing multi-modal perception, achieving authentic gaming experiences, leveraging external tools for comprehensive interaction, and excelling in real-time gaming scenarios. Further investigation into the application of LMs in game testing, level and story generation, and mechanics tuning presents a fertile ground for exploration.

Conclusion

The integration of LMs into game playing agents marks a substantial leap towards creating intelligent systems capable of navigating and interacting within complex digital environments. Despite the progress, the path towards fully autonomous, adaptable, and intelligent game agents presents numerous challenges and opportunities for innovation. As the field continues to evolve, the continued exploration of these avenues will undoubtedly yield significant advancements in artificial intelligence, game design, and beyond.

Authors (7)
  1. Xinrun Xu
  2. Yuxin Wang
  3. Chaoyi Xu
  4. Ziluo Ding
  5. Jiechuan Jiang
  6. Zhiming Ding
  7. Börje F. Karlsson
Citations (11)