
LLM as a Mastermind: A Survey of Strategic Reasoning with Large Language Models (2404.01230v1)

Published 1 Apr 2024 in cs.CL

Abstract: This paper presents a comprehensive survey of the current status and opportunities for LLMs in strategic reasoning, a sophisticated form of reasoning that requires understanding and predicting adversary actions in multi-agent settings and adjusting strategies accordingly. Strategic reasoning is distinguished by its focus on the dynamic and uncertain nature of interactions among multiple agents, where comprehending the environment and anticipating the behavior of others is crucial. We explore the scopes, applications, methodologies, and evaluation metrics related to strategic reasoning with LLMs, highlighting the rapid development of this area and the interdisciplinary approaches that enhance LLMs' decision-making performance. The survey aims to systematize and clarify the scattered literature on this subject, underscoring the importance of strategic reasoning as a critical cognitive capability and offering insights into future research directions and potential improvements.

Authors (10)
  1. Yadong Zhang (22 papers)
  2. Shaoguang Mao (27 papers)
  3. Tao Ge (53 papers)
  4. Xun Wang (96 papers)
  5. Adrian de Wynter (20 papers)
  6. Yan Xia (169 papers)
  7. Wenshan Wu (17 papers)
  8. Ting Song (9 papers)
  9. Man Lan (26 papers)
  10. Furu Wei (291 papers)
Citations (32)