Suspicion-Agent: Playing Imperfect Information Games with Theory of Mind Aware GPT-4 (2309.17277v3)
Abstract: Unlike perfect information games, where all elements are known to every player, imperfect information games emulate the real-world complexity of decision-making under uncertain or incomplete information. GPT-4, the recent breakthrough in large language models (LLMs) trained on massive passive data, is notable for its knowledge retrieval and reasoning abilities. This paper examines whether GPT-4's learned knowledge can be applied to imperfect information games. To this end, we introduce Suspicion-Agent, an innovative agent that leverages GPT-4's capabilities to play imperfect information games. With appropriate prompt engineering for its different functions, Suspicion-Agent, built on GPT-4, demonstrates remarkable adaptability across a range of imperfect information card games. Importantly, GPT-4 displays a strong higher-order theory of mind (ToM) capacity, meaning it can understand others and intentionally influence their behavior. Leveraging this, we design a planning strategy that enables GPT-4 to play competently against different opponents, adapting its gameplay style as needed, while requiring only the game rules and descriptions of observations as input. In our experiments, we qualitatively showcase the capabilities of Suspicion-Agent across three different imperfect information games and then quantitatively evaluate it on Leduc Hold'em. The results show that Suspicion-Agent can potentially outperform traditional algorithms designed for imperfect information games, without any specialized training or examples. To encourage and foster deeper insights within the community, we make our game-related data publicly available.
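The abstract describes a planning strategy in which the agent, given only the game rules and textual observations, uses first- and second-order theory of mind before choosing an action. A minimal sketch of that decision loop is below. Everything here is an assumption for illustration: `call_llm` is a hypothetical stand-in (stubbed with fixed replies) for a real GPT-4 API request, and the prompt wording, `Observation` fields, and three-stage structure are not taken from the paper's actual prompts.

```python
# Hypothetical sketch of a ToM-aware decision loop, NOT the paper's code.
from dataclasses import dataclass, field

@dataclass
class Observation:
    game_rule: str                 # natural-language rules (only game knowledge given)
    public_state: str              # community card, pot size, legal actions, etc.
    private_state: str             # the agent's own hole card
    opponent_history: list = field(default_factory=list)  # opponent's past actions

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; a real system would send the prompt to GPT-4.
    Stubbed with canned replies so the sketch runs offline."""
    if "estimate the opponent's hand" in prompt:
        return "The opponent raised twice; they likely hold a strong card."
    if "what does the opponent think" in prompt:
        return "Seeing my calls, the opponent may believe my card is weak."
    return "raise"

def suspicion_agent_act(obs: Observation) -> str:
    # 1) First-order ToM: infer the opponent's likely hand from their behavior.
    belief = call_llm(
        f"Rules: {obs.game_rule}\nHistory: {obs.opponent_history}\n"
        "Please estimate the opponent's hand."
    )
    # 2) Second-order ToM: infer what the opponent believes about us.
    counter_belief = call_llm(
        f"History: {obs.opponent_history}\n"
        "Given my actions so far, what does the opponent think about my hand?"
    )
    # 3) Plan: choose an action conditioned on both belief levels.
    return call_llm(
        f"Rules: {obs.game_rule}\nPublic: {obs.public_state}\n"
        f"Private: {obs.private_state}\nOpponent belief: {belief}\n"
        f"Opponent's view of me: {counter_belief}\n"
        "Choose one legal action."
    )
```

The point of the three stages is that the final action prompt is conditioned on both belief levels, which is what lets the agent adapt its style (e.g. bluffing when the opponent is inferred to read it as weak) rather than acting on its private card alone.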