Impartial Games: A Challenge for Reinforcement Learning (2205.12787v4)
Abstract: While AlphaZero-style reinforcement learning (RL) algorithms excel in various board games, in this paper we show that they face challenges on impartial games where players share pieces. We present a concrete example - the children's game of Nim - and other impartial games that seem to be a stumbling block for AlphaZero-style and similar self-play reinforcement learning algorithms. Our work builds on the challenges that data distribution poses to the ability of neural networks to learn parity functions, exacerbated by the problem of noisy labels. Our findings are consistent with recent studies showing that AlphaZero-style algorithms are vulnerable to adversarial attacks and adversarial perturbations, which demonstrates the difficulty of learning to master a game in all legal states. We show that Nim can be learned on small boards, but the learning progress of AlphaZero-style algorithms slows down dramatically as the board size increases. Intuitively, the difference between impartial games like Nim and partisan games like Chess and Go can be explained as follows: if a small part of the board is covered in an impartial game, it is typically impossible to predict whether the position is won or lost, because there is often zero correlation between the visible part of a partly blanked-out position and its correct evaluation. This starkly contrasts with partisan games, where a partly blanked-out board position typically still provides abundant, or at least non-trivial, information about the value of the fully uncovered position.
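As background for why Nim's evaluation behaves like a parity function, here is a minimal sketch of Bouton's classical result (Bouton, 1901, cited below): the player to move loses under perfect play exactly when the XOR (nim-sum) of the heap sizes is zero. The function names are illustrative, not from the paper.

```python
from functools import reduce
from operator import xor

def nim_value(heaps):
    """Nim-sum: bitwise XOR of all heap sizes (Bouton, 1901)."""
    return reduce(xor, heaps, 0)

def is_winning(heaps):
    """A position is won for the player to move iff the nim-sum is nonzero."""
    return nim_value(heaps) != 0

# Changing a single bit of a single heap flips the outcome:
# the evaluation depends on every bit of every heap, so no
# proper subset of the board determines the position's value.
print(is_winning([1, 2, 3]))  # False: 1 ^ 2 ^ 3 == 0, a lost position
print(is_winning([1, 2, 2]))  # True:  1 ^ 2 ^ 2 == 1
```

This global, XOR-based dependence is what the abstract refers to: covering one heap leaves the remaining heaps with essentially zero correlation to the true evaluation.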
- Provable advantage of curriculum learning on parity targets with mixed inputs. arXiv preprint arXiv:2306.16921, 2023.
- Louis Victor Allis et al. Searching for solutions in games and artificial intelligence. Ponsen & Looijen Wageningen, 1994.
- On pruning search trees of impartial games. Artificial Intelligence, 283:103262, 2020.
- Elwyn R Berlekamp, John H Conway, and Richard K Guy. Winning ways for your mathematical plays, volume 1. AK Peters/CRC Press, 2001.
- Elwyn R Berlekamp, John H Conway, and Richard K Guy. Winning ways for your mathematical plays, volume 2. AK Peters/CRC Press, 2002.
- Elwyn R Berlekamp, John H Conway, and Richard K Guy. Winning ways for your mathematical plays, volume 3. AK Peters/CRC Press, 2003.
- Elwyn R Berlekamp, John H Conway, and Richard K Guy. Winning ways for your mathematical plays, volume 4. AK Peters/CRC Press, 2004.
- Charles L Bouton. Nim, a game with a complete mathematical theory. The Annals of Mathematics, 3(1/4):35–39, 1901.
- Polygames: Improved zero learning. ICGA Journal, (Preprint):1–13, 2020.
- A mathematical model for curriculum learning for parities. 2023.
- Learning parities with neural networks. Advances in Neural Information Processing Systems, 33, 2020.
- Policy improvement by planning with Gumbel. In International Conference on Learning Representations, 2021.
- Martin Gardner. Mathematical games: Of sprouts and brussels sprouts; games with a topological flavor. Scientific American, 217(1):112–115, 1967.
- Are AlphaZero-like agents robust to adversarial perturbations? Advances in Neural Information Processing Systems, 35:11229–11240, 2022.
- OpenSpiel: A framework for reinforcement learning in games. arXiv preprint arXiv:1908.09453, 2019.
- RLlib: Abstractions for distributed reinforcement learning. In International Conference on Machine Learning, pages 3053–3062. PMLR, 2018.
- Chess AI: Competing paradigms for machine intelligence. arXiv preprint arXiv:2109.11602, 2021.
- Ray: A distributed framework for emerging AI applications. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), pages 561–577, 2018.
- Yu Nasu. Efficiently updatable neural-network-based evaluation functions for computer shogi. The 28th World Computer Shogi Championship Appeal Document, 2018.
- Richard J Nowakowski. Games of no chance, volume 29. Cambridge University Press, 1998.
- PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems, 32, 2019.
- Karl R Popper. Objective knowledge, volume 360. Oxford University Press Oxford, 1972.
- Ran Raz. Fast learning requires good memory: A time-space lower bound for parity learning. Journal of the ACM (JACM), 66(1):1–18, 2018.
- Intrinsic chess ratings. In Twenty-Fifth AAAI Conference on Artificial Intelligence, 2011.
- Matthew Sadler and Natasha Regan. Game Changer: AlphaZero's Groundbreaking Chess Strategies and the Promise of AI. New in Chess, Alkmaar, The Netherlands, 2019.
- Thomas J Schaefer. On the complexity of some two-person perfect-information games. Journal of Computer and System Sciences, 16(2):185–225, 1978.
- Mastering Atari, Go, chess and shogi by planning with a learned model. Nature, 588(7839):604–609, 2020.
- Failures of gradient-based deep learning. In International Conference on Machine Learning, pages 3067–3075. PMLR, 2017.
- A novel approach to solving goal-achieving problems for board games. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 10362–10369, 2022.
- Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489, 2016.
- Mastering the game of Go without human knowledge. Nature, 550(7676):354–359, 2017.
- A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science, 362(6419):1140–1144, 2018.
- Chris Thornton. Parity: the problem that won’t go away. In Conference of the Canadian Society for Computational Studies of Intelligence, pages 362–374. Springer, 1996.
- ELF OpenGo: An analysis and open reimplementation of AlphaZero. In International Conference on Machine Learning, pages 6244–6253. PMLR, 2019.
- Games solved: Now and in the future. Artificial Intelligence, 134(1-2):277–311, 2002.
- Julien Lemoine and Simon Viennot. A further computer analysis of sprouts. 2007.
- Adversarial policies beat superhuman Go AIs. 2023.
- David J Wu. Accelerating self-play learning in Go. arXiv preprint arXiv:1902.10565, 2019.
- Exploring parity challenges in reinforcement learning through curriculum learning with noisy labels, 2023.