Comparing Reinforcement Learning and Human Learning using the Game of Hidden Rules (2306.17766v1)
Abstract: Reliable real-world deployment of reinforcement learning (RL) methods requires a nuanced understanding of their strengths and weaknesses and of how they compare to those of humans. As human-machine systems become more prevalent, their design depends on a task-oriented understanding of both human learning (HL) and RL. An important line of research is therefore characterizing how the structure of a learning task affects learning performance. While increasingly complex benchmark environments have driven improvements in RL capabilities, such environments are ill-suited to the dedicated study of task structure. To address this challenge, we present a learning environment built to support rigorous study of the impact of task structure on HL and RL. We demonstrate the environment's utility through example experiments in which task structure reveals performance differences between humans and RL algorithms.