
Can Large Language Models Play Games? A Case Study of A Self-Play Approach (2403.05632v1)

Published 8 Mar 2024 in cs.AI

Abstract: LLMs harness extensive data from the Internet, storing a broad spectrum of prior knowledge. While LLMs have proven beneficial as decision-making aids, their reliability is hampered by limitations such as flawed reasoning and hallucination. Monte-Carlo Tree Search (MCTS), on the other hand, is a heuristic search algorithm that provides reliable decision-making solutions through recursive rollouts and self-play. However, the effectiveness of MCTS relies heavily on heuristic pruning and external value functions, particularly in complex decision scenarios. This work introduces an approach that bolsters LLMs with MCTS self-play to efficiently solve deterministic turn-based zero-sum games (DTZG), such as chess and Go, without additional training: LLMs are utilized as both action pruners and proxies for value functions. We theoretically prove that the suboptimality of the estimated value in the proposed method scales with $\tilde{\mathcal O}\Bigl(\frac{|\tilde {\mathcal A}|}{\sqrt{N}} + \epsilon_\mathrm{pruner} + \epsilon_\mathrm{critic}\Bigr)$, where $N$ is the number of simulations, $|\tilde {\mathcal A}|$ is the cardinality of the action space pruned by the LLM, and $\epsilon_\mathrm{pruner}$ and $\epsilon_\mathrm{critic}$ quantify the errors incurred by adopting LLMs as the action-space pruner and value-function proxy, respectively. Our experiments in chess and Go demonstrate that the method addresses challenges beyond the scope of MCTS alone and improves on the direct application of LLMs.
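The scheme described in the abstract can be pictured as standard UCT search with two plug-in points: a pruner that shrinks the action set at each node and a critic that scores a newly expanded leaf instead of a random rollout. The sketch below is illustrative only, not the authors' implementation: the function and parameter names (`prune_actions`, `value_estimate`, `mcts_value`) are made up for this example, a toy game of Nim stands in for chess/Go, and the LLM pruner and critic are replaced by trivial stand-ins so the code runs on its own.

```python
import math

def mcts_value(root, legal_moves, apply_move, is_terminal, terminal_value,
               prune_actions, value_estimate, n_sims=400, c=1.4):
    """UCT search for a deterministic turn-based zero-sum game.

    Returns the estimated value of `root` in [-1, 1] from the perspective of
    the player to move. `prune_actions` and `value_estimate` are the two
    plug-in points that the paper fills with LLM queries.
    """
    N, W = {}, {}  # per-state visit counts and total backed-up value

    def simulate(s):
        # First visit: expand and score with the critic instead of a rollout.
        if s not in N:
            N[s], W[s] = 0, 0.0
            v = terminal_value(s) if is_terminal(s) else value_estimate(s)
            N[s] += 1
            W[s] += v
            return v
        if is_terminal(s):
            v = terminal_value(s)
            N[s] += 1
            W[s] += v
            return v
        # The pruner shrinks the action set before UCB1 selection.
        best, best_ucb = None, -math.inf
        for m in prune_actions(s, legal_moves(s)):
            child = apply_move(s, m)
            if child not in N:
                best = m              # expand unvisited children first
                break
            q = -W[child] / N[child]  # negate: zero-sum, turn alternates
            ucb = q + c * math.sqrt(math.log(N[s]) / N[child])
            if ucb > best_ucb:
                best, best_ucb = m, ucb
        v = -simulate(apply_move(s, best))
        N[s] += 1
        W[s] += v
        return v

    for _ in range(n_sims):
        simulate(root)
    return W[root] / N[root]

# Toy stand-in for chess/Go: Nim with 1-3 stones per turn, last stone wins.
# A position is losing for the player to move iff the pile is a multiple of 4.
nim = dict(
    legal_moves=lambda s: [m for m in (1, 2, 3) if m <= s],
    apply_move=lambda s, m: s - m,
    is_terminal=lambda s: s == 0,
    terminal_value=lambda s: -1.0,          # player to move already lost
    prune_actions=lambda s, moves: moves,   # trivial "pruner" stand-in
    value_estimate=lambda s: 0.0,           # uninformed "critic" stand-in
)
```

With the stand-ins replaced by LLM calls, the search corresponds to the bound above: more simulations $N$ shrink the first error term, while pruner and critic quality control $\epsilon_\mathrm{pruner}$ and $\epsilon_\mathrm{critic}$. On the Nim toy, `mcts_value(5, **nim)` converges to a positive value (winning position) and `mcts_value(4, **nim)` to a negative one.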

Authors (4)
  1. Hongyi Guo (14 papers)
  2. Zhihan Liu (22 papers)
  3. Yufeng Zhang (67 papers)
  4. Zhaoran Wang (164 papers)
Citations (3)