Emergent Mind

ChessGPT: Bridging Policy Learning and Language Modeling

(2306.09200)
Published Jun 15, 2023 in cs.LG and cs.AI

Abstract

When solving decision-making tasks, humans typically rely on information from two key sources: (1) historical policy data, which provides interaction replays from the environment, and (2) analytical insights in natural language, which expose the underlying thought process or strategic considerations. Despite this, most prior research focuses on only one source: it either uses historical replays exclusively to learn policy or value functions directly, or trains language models on a language corpus alone. In this paper, we argue that a powerful autonomous agent should draw on both sources. We therefore propose ChessGPT, a GPT model that bridges policy learning and language modeling by integrating data from these two sources for the game of chess. Specifically, we build a large-scale game and language dataset related to chess. Leveraging this dataset, we showcase two models, ChessCLIP and ChessGPT, that integrate policy learning and language modeling. Finally, we propose a full evaluation framework for assessing a language model's chess ability. Experimental results validate the effectiveness of our model and dataset. We open-source our code, model, and dataset at https://github.com/waterhorse1/ChessGPT.
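To make the "policy data as text" idea concrete, here is a minimal, hypothetical sketch (not the authors' actual pipeline) of one common preprocessing step for chess game data: stripping move numbers and the result marker from PGN movetext so the bare SAN moves can be fed to a GPT-style model as a flat token sequence. The function name `pgn_to_tokens` is illustrative, not from the paper.

```python
import re

def pgn_to_tokens(movetext: str) -> list[str]:
    """Strip move numbers and the result marker from PGN movetext,
    leaving the bare SAN moves as a model-ready token sequence."""
    # Drop move numbers like "1." or "12..." ...
    cleaned = re.sub(r"\d+\.(\.\.)?", " ", movetext)
    # ... and the game-result markers (1-0, 0-1, 1/2-1/2, *).
    cleaned = re.sub(r"(1-0|0-1|1/2-1/2|\*)", " ", cleaned)
    return cleaned.split()

print(pgn_to_tokens("1. e4 e5 2. Nf3 Nc6 3. Bb5 a6 1/2-1/2"))
# ['e4', 'e5', 'Nf3', 'Nc6', 'Bb5', 'a6']
```

In practice one would parse games with a full PGN library (e.g. python-chess, cited below) to validate moves and recover board state, but the sketch shows how a game replay becomes ordinary text that can be mixed with natural-language commentary in a single training corpus.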

