
ChessGPT: Bridging Policy Learning and Language Modeling (2306.09200v2)

Published 15 Jun 2023 in cs.LG and cs.AI

Abstract: When solving decision-making tasks, humans typically depend on information from two key sources: (1) historical policy data, which provides interaction replay from the environment, and (2) analytical insights in natural language form, exposing the invaluable thought process or strategic considerations. Despite this, the majority of preceding research focuses on only one source: it either uses historical replay exclusively to directly learn policy or value functions, or trains language models on a mere language corpus. In this paper, we argue that a powerful autonomous agent should cover both sources. Thus, we propose ChessGPT, a GPT model bridging policy learning and language modeling by integrating data from these two sources in chess games. Specifically, we build a large-scale game and language dataset related to chess. Leveraging the dataset, we showcase two model examples, ChessCLIP and ChessGPT, integrating policy learning and language modeling. Finally, we propose a full evaluation framework for evaluating language models' chess ability. Experimental results validate our model and dataset's effectiveness. We open-source our code, model, and dataset at https://github.com/waterhorse1/ChessGPT.


Summary

  • The paper introduces a hybrid framework that integrates historical chess games with language insights to improve strategic move predictions.
  • It presents two models: ChessCLIP, which contrastively aligns game states with language annotations, and ChessGPT, a generative LLM fine-tuned on the mixed game-language corpus.
  • Evaluation shows enhanced model performance in tracking chess moves, aligning value judgments, and generating optimal policies compared to baselines.

An Overview of ChessGPT: Bridging Policy Learning and Language Modeling

The paper "ChessGPT: Bridging Policy Learning and LLMing" explores the intersection of policy learning and LLMing by leveraging the complexities of the game of chess. The research aims to create a robust autonomous agent capable of integrating both historical policy data and language insights, which are vital to human decision-making. Traditional approaches have predominantly focused on either learning policy through historical data or training LLMs using a textual corpus. This work seeks to fill this gap by employing a hybrid methodology that combines these elements.

The paper introduces two models, ChessCLIP and ChessGPT, both built on a large-scale dataset amalgamating gameplay and language data related to chess. ChessCLIP bridges the gap between policy (chess game states) and language annotations through a contrastive learning approach, while ChessGPT applies generative pretrained transformer techniques to chess-related datasets.

Dataset and Methodology

The paper curates a comprehensive dataset divided into several categories:

  1. Game Data: This includes professional-player games, computer engine matches, and player-versus-player encounters, constituting a vast repository of actual chess games represented in Portable Game Notation (PGN).
  2. Language Data: Extracted from blogs, forums, books, and other chess-related literature to form a language corpus specific to chess.
  3. Mixed Game-Language Data: Features annotated PGNs in which language descriptions directly correlate with game states, providing a dual-modality dataset (made concrete in the sketch after this list).
  4. Instruction-Tuning and Conversation Data: Contains conversational chess data and instructional tuning prompts generated using LLMs like GPT-4.
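
To make the mixed game-language format concrete, here is a minimal sketch that pairs board positions with their inline commentary using the open-source python-chess library. The file name and data layout are illustrative assumptions, not the paper's actual preprocessing code.

```python
import chess.pgn

# Sketch: extract (position, commentary) pairs from an annotated PGN file.
# "annotated_games.pgn" is a placeholder file name.
with open("annotated_games.pgn") as f:
    game = chess.pgn.read_game(f)

pairs = []  # (FEN after the move, commentator's remark)
board = game.board()
for node in game.mainline():
    board.push(node.move)
    if node.comment:  # keep only positions a commentator described
        pairs.append((board.fen(), node.comment.strip()))

for fen, text in pairs[:3]:
    print(fen, "->", text)
```

Each pair couples a concrete game state with natural-language reasoning about it, which is exactly the dual-modality signal this data category aims to capture.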

The two models leverage this dataset in distinct ways. ChessCLIP employs a pretraining scheme akin to Contrastive Language-Image Pre-Training (CLIP) to align chess positions with their respective language annotations. ChessGPT, on the other hand, is a fine-tuned version of an existing LLM, integrating policy-learning tasks directly into the model's generative process.
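
A CLIP-style objective for this setting can be sketched as a symmetric contrastive loss over matched (board, annotation) pairs: embed both modalities, then push matching pairs together and mismatched pairs apart. This is a generic sketch of contrastive alignment with placeholder encoders, not ChessCLIP's published architecture or loss.

```python
import torch
import torch.nn.functional as F

def clip_style_loss(board_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of matched (board, text) pairs.

    board_emb, text_emb: (N, D) outputs of two placeholder encoders,
    where row i of each tensor is a ground-truth match.
    """
    board_emb = F.normalize(board_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = board_emb @ text_emb.t() / temperature  # (N, N) similarities
    targets = torch.arange(len(logits), device=logits.device)
    # Classify each board against all annotations in the batch, and each
    # annotation against all boards, then average the two losses.
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.t(), targets)) / 2
```

In practice, the board side might encode a FEN string or move sequence and the text side the commentary; any such encoder choice here would be an assumption beyond what the summary states.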

Evaluation and Results

A thorough evaluation framework is proposed, dividing model performance into three domains: chess-modeling ability, value-judgment ability, and policy proficiency. Chess-modeling tasks assess the model's capacity to accurately track game states and predict legal moves. Value-judgment tasks measure the alignment between model evaluations and established heuristics or human judgments. Policy proficiency evaluates the model's ability to select strong moves.
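
The legal-move part of this evaluation can be approximated with python-chess: replay a shared move prefix, then check whether the model's proposed continuation parses as a legal move. This is a minimal sketch of one such check, not the paper's full evaluation harness.

```python
import chess

def is_legal_continuation(san_prefix, candidate_san):
    """Replay a SAN move prefix, then test whether the candidate
    next move is legal in the resulting position."""
    board = chess.Board()
    for san in san_prefix:
        board.push_san(san)  # raises ValueError on an illegal prefix
    try:
        board.parse_san(candidate_san)
        return True
    except ValueError:  # covers illegal and unparseable moves
        return False

print(is_legal_continuation(["e4", "e5"], "Nf3"))   # True
print(is_legal_continuation(["e4", "e5"], "O-O"))   # False: castling blocked
```

A model's output would be fed in as `candidate_san` after whatever prompting scheme the evaluator uses; that interface is left abstract here.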

The results indicate that ChessGPT and ChessCLIP outperform baseline models in various tasks, validating the dataset's utility and the models' efficacy in bridging policy learning with natural language processing. ChessCLIP shows particular promise in correlating textual annotations with board positions, a task inherently challenging given the abstract nature of strategic commentary.

Implications and Future Directions

The implications of integrating policy learning with LLMs extend beyond theoretical insights, offering practical applications such as enhanced chess AI assistants and new paradigms for educational tools. Bridging these domains could provide insights into broader challenges in AI, such as incorporating natural language guidance into decision-making systems across various applications.

Future development may involve exploring more sophisticated training with Reinforcement Learning from Human Feedback (RLHF), expanding the dataset with richer annotations, and enhancing model interpretability. Moreover, the mixed-modality dataset concept pioneered in this work could apply to other complex domains beyond chess.

In conclusion, "ChessGPT: Bridging Policy Learning and LLMing" offers a novel and innovative approach to integrating two traditionally separate areas of AI research, laying the groundwork for future explorations into the synergy between decision-making processes and language interpretations. This work signifies a meaningful step towards creating more nuanced models that mirror human-like problem-solving capabilities.
