ChessGPT: Bridging Policy Learning and Language Modeling (2306.09200v2)
Abstract: When solving decision-making tasks, humans typically rely on information from two key sources: (1) historical policy data, which provides interaction replays from the environment, and (2) analytical insights in natural language, which expose the invaluable thought processes or strategic considerations behind decisions. Despite this, the majority of prior research focuses on only one source: it either uses historical replays exclusively to learn policy or value functions directly, or trains LLMs on language corpora alone. In this paper, we argue that a powerful autonomous agent should cover both sources. Hence, we propose ChessGPT, a GPT model that bridges policy learning and language modeling by integrating data from both sources in chess games. Specifically, we build a large-scale game and language dataset related to chess. Leveraging this dataset, we showcase two model examples, ChessCLIP and ChessGPT, which integrate policy learning and language modeling. Finally, we propose a full evaluation framework for assessing an LLM's chess ability. Experimental results validate the effectiveness of our model and dataset. We open-source our code, model, and dataset at https://github.com/waterhorse1/ChessGPT.
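As a minimal illustration of the first data source, game replays can be cast as a language-modeling problem by serializing PGN movetext into next-move prediction examples. The snippet below is a hedged sketch under assumed conventions (`movetext_to_examples` is a hypothetical helper, not the paper's actual pipeline), showing how one game yields many (context, next-move) training pairs:

```python
import re

# One game's PGN movetext, standing in for a record from a large-scale
# game dataset (e.g. online chess game dumps).
PGN_MOVETEXT = "1. e4 e5 2. Nf3 Nc6 3. Bb5 a6"

def movetext_to_examples(movetext: str):
    """Split PGN movetext into (context, next-move) pairs for LM training."""
    # Drop move-number tokens like "1." and keep only the SAN moves.
    moves = [t for t in movetext.split() if not re.fullmatch(r"\d+\.", t)]
    examples = []
    for i in range(1, len(moves)):
        context = " ".join(moves[:i])  # all moves played so far
        examples.append((context, moves[i]))  # predict the next move
    return examples

pairs = movetext_to_examples(PGN_MOVETEXT)
print(pairs[0])   # ('e4', 'e5')
print(pairs[-1])  # ('e4 e5 Nf3 Nc6 Bb5', 'a6')
```

A causal language model trained on such serialized games learns a policy implicitly: conditioning on the move history and sampling the next token amounts to selecting a move, which is what lets the same GPT architecture consume both game replays and natural-language analysis.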