
Amortized Planning with Large-Scale Transformers: A Case Study on Chess (2402.04494v2)

Published 7 Feb 2024 in cs.LG, cs.AI, and stat.ML

Abstract: This paper uses chess, a landmark planning problem in AI, to assess transformers' performance on a planning task where memorization is futile – even at a large scale. To this end, we release ChessBench, a large-scale benchmark dataset of 10 million chess games with legal move and value annotations (15 billion data points) provided by Stockfish 16, the state-of-the-art chess engine. We train transformers with up to 270 million parameters on ChessBench via supervised learning and perform extensive ablations to assess the impact of dataset size, model size, architecture type, and different prediction targets (state-values, action-values, and behavioral cloning). Our largest models learn to predict action-values for novel boards quite accurately, implying highly non-trivial generalization. Despite performing no explicit search, our resulting chess policy solves challenging chess puzzles and achieves a surprisingly strong Lichess blitz Elo of 2895 against humans (grandmaster level). We also compare to Leela Chess Zero and AlphaZero (trained without supervision via self-play) with and without search. We show that, although a remarkably good approximation of Stockfish's search-based algorithm can be distilled into large-scale transformers via supervised learning, perfect distillation is still beyond reach, thus making ChessBench well-suited for future research.


Summary

  • The paper demonstrates that a transformer trained on millions of annotated chess positions can achieve grandmaster-level play without traditional search algorithms.
  • It systematically analyzes the effects of model size and dataset scale on generalization, offering detailed ablation studies and scaling insights.
  • It reveals that learned action-values suffice for complex chess strategy, showcasing the potential of neural approximators in replacing explicit search methods.

Introduction

The breakthroughs in AI over the past few years have been driven by large-scale models trained on massive datasets. This paper applies that paradigm to chess, a domain traditionally dominated by engines that combine deep search algorithms with handcrafted heuristics. The authors hypothesize that strong chess play can be achieved purely from learned action-values via supervised learning, without any explicit search.

Methodology

The team constructed ChessBench by annotating 10 million games from Lichess with legal-move and value annotations from Stockfish 16, the state-of-the-art chess engine. Transformers with up to 270 million parameters were then trained via supervised learning to predict win probabilities for board positions and board-move pairs. The researchers systematically varied dataset size, model size, architecture type, and prediction targets (state-values, action-values, and behavioral cloning) to understand their impact on generalization and playing strength.
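One concrete way to set up such value targets, used by the paper, is to cast value prediction as classification: win probabilities are discretized into uniform bins and the model is trained with cross-entropy over the bin indices. A minimal sketch (the bin count of 128 here is illustrative):

```python
def prob_to_bin(p: float, num_bins: int = 128) -> int:
    """Discretize a win probability p in [0, 1] into one of num_bins
    uniform classes, turning value prediction into classification."""
    return min(int(p * num_bins), num_bins - 1)

def bin_to_prob(k: int, num_bins: int = 128) -> float:
    """Map a class index back to its bin's midpoint probability."""
    return (k + 0.5) / num_bins
```

Training then minimizes a standard classification loss over these classes; at inference, a predicted class index is mapped back to a win probability via the bin midpoint.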

Results and Contributions

The model achieves a Lichess blitz Elo of 2895 against human opponents, placing it at grandmaster level. It also outperforms AlphaZero's policy and value networks (evaluated without Monte Carlo Tree Search) and GPT-3.5-turbo-instruct on a selected set of metrics, reinforcing the view of neural predictors as strong generalizers. Key contributions include:

  • A neural predictor trained on millions of annotated board states capturing Stockfish 16's expertise.
  • A grandmaster-level policy derived from this predictor, capable of tackling challenging chess puzzles, without any domain-specific enhancements or search algorithms.
  • Demonstrated significance of scale for achieving robust generalization and performance in chess, accompanied by a detailed analysis through rigorous ablations and scaling studies.
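The grandmaster-level policy described above is search-free: each legal move is scored by the learned action-value predictor and the highest-scoring move is played, with no look-ahead. A minimal sketch, where `predict_action_value` stands in for the trained transformer:

```python
def greedy_policy(board, legal_moves, predict_action_value):
    """Search-free play: score every legal move with the learned
    action-value predictor (win probability after the move) and
    pick the best. No tree search is performed."""
    return max(legal_moves, key=lambda move: predict_action_value(board, move))

# Toy illustration with a stub predictor (not the real model):
scores = {"e2e4": 0.55, "d2d4": 0.53, "g1f3": 0.52}
best = greedy_policy("startpos", list(scores), lambda b, m: scores[m])
```

The policy's strength therefore rests entirely on the quality of the action-value predictions, which is exactly what the scaling ablations probe.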

Discussion and Implications

Despite these strong results, the paper acknowledges limitations stemming from the model's lack of game history (such as repetition rules), which traditional engines exploit for strategic planning; this exposes minor weaknesses in play. These were mitigated through pragmatic workarounds that do not amount to introducing a search algorithm. The paper also characterizes the scale that would be needed for such neural predictors to close the remaining gap to oracle-powered engines like Stockfish 16, noting that perfect distillation is still out of reach.

The work underlines a shift in AI research, suggesting that highly complex algorithmic reasoning—once deemed exclusive to sophisticated search-based systems—can be distilled into transformer models using supervised learning. As such, transformers are not simply pattern recognition systems, but powerful approximators capable of substituting intricate algorithmic processes. The implications for future research are profound, promising further shifts in how AI is applied across varied cognitive domains.

