An Evolutionary Framework for Connect-4 as Test-Bed for Comparison of Advanced Minimax, Q-Learning and MCTS (2405.16595v1)
Abstract: A major challenge in decision-making domains with large state spaces is to effectively select actions that maximize utility. In recent years, approaches such as reinforcement learning (RL) and search algorithms have been successful in tackling this issue, despite their differences. RL defines a learning framework in which an agent explores and interacts with its environment. Search algorithms provide a formalism for finding a solution. However, it is often difficult to evaluate the performance of such approaches in a practical way. Motivated by this problem, we focus on one game domain, namely Connect-4, and develop a novel evolutionary framework to evaluate three classes of algorithms: RL, Minimax and Monte Carlo tree search (MCTS). The contribution of this paper is threefold: i) we implement advanced versions of these algorithms and provide a systematic comparison with their standard counterparts, ii) we develop a novel evaluation framework, which we call the Evolutionary Tournament, and iii) we conduct an extensive evaluation of the relative performance of each algorithm and compare our findings. We evaluate different metrics and show that MCTS achieves the best results in terms of win percentage, with Minimax and Q-Learning ranked second and third, respectively, although the latter is the fastest to make a decision.
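For context, the two learning/search building blocks compared in the abstract rest on standard, well-known update and selection rules. The sketch below is purely illustrative and is not the authors' implementation: the parameter names (`alpha`, `gamma`, `c`) and the dictionary-based Q-table are generic assumptions, not values or structures taken from the paper.

```python
import math

def q_update(Q, s, a, r, s_next, next_actions, alpha=0.1, gamma=0.95):
    """One tabular Q-learning backup:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    Q is a dict keyed by (state, action); unseen pairs default to 0.0."""
    q_sa = Q.get((s, a), 0.0)
    best_next = max((Q.get((s_next, a2), 0.0) for a2 in next_actions), default=0.0)
    Q[(s, a)] = q_sa + alpha * (r + gamma * best_next - q_sa)

def uct_score(child_value_sum, child_visits, parent_visits, c=math.sqrt(2)):
    """UCB1-style selection score used by MCTS (UCT):
    mean value plus an exploration bonus that shrinks with visits."""
    if child_visits == 0:
        return float("inf")  # expand unvisited children first
    exploitation = child_value_sum / child_visits
    exploration = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploitation + exploration
```

In a Connect-4 setting, `s` would typically be a hashable encoding of the board and `a` a column index; the exploration constant `c` and the learning rate `alpha` are tuning choices, not prescriptions from the paper.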