A Design Trajectory Map of Human-AI Collaborative Reinforcement Learning Systems: Survey and Taxonomy (2405.10214v1)
Abstract: Driven by algorithmic advances in reinforcement learning and the growing number of implementations of human-AI collaboration, Collaborative Reinforcement Learning (CRL) has been receiving increasing attention. Despite this recent upsurge, the area has rarely been studied systematically. In this paper, we provide an extensive survey, investigating CRL methods based on both interactive reinforcement learning algorithms and human-AI collaborative frameworks that were proposed in the past decade. Using synergistic analysis methods, we elucidate and discuss both the growth of the field and the state of the art; we conceptualise the existing frameworks from the perspectives of design patterns, collaborative levels, parties and capabilities, and review interactive methods and algorithmic models. Specifically, we create a new Human-AI CRL Design Trajectory Map as a systematic modelling tool for selecting among existing CRL frameworks, for designing new CRL systems, and for improving future CRL designs. Furthermore, we elaborate on generic Human-AI CRL challenges, providing the research community with a guide towards novel research directions. The aim of this paper is to empower researchers with a systematic framework for the design of efficient and 'natural' human-AI collaborative methods, so that the potential of both humans and AI can be realised as fully as possible.