Incorporating Human Flexibility through Reward Preferences in Human-AI Teaming (2312.14292v2)

Published 21 Dec 2023 in cs.AI, cs.LG, and cs.MA

Abstract: Preference-based Reinforcement Learning (PbRL) has made significant strides in single-agent settings but has not been studied in multi-agent frameworks. At the same time, modeling cooperation between multiple agents, specifically in Human-AI Teaming settings, while ensuring successful task completion is a challenging problem. To this end, we perform the first investigation of multi-agent PbRL by extending single-agent PbRL to a two-agent teaming setting, formulated as a Human-AI PbRL Cooperation Game in which the RL agent queries the human-in-the-loop to elicit the task objective and the human's preferences over the joint team behavior. Under this game formulation, we first introduce the notion of Human Flexibility, which evaluates team performance based on whether the human prefers to follow a fixed policy or to adapt to the RL agent on the fly. Second, we study the RL agent's varying degrees of access to the human policy. We highlight a special case along these two dimensions, which we call Specified Orchestration, where the human is least flexible and the agent has complete access to the human policy. We motivate the need to account for Human Flexibility, and the usefulness of Specified Orchestration, through a gamified user study. We evaluate state-of-the-art PbRL algorithms for Human-AI cooperative setups on robot-locomotion domains that explicitly require forced cooperation. Our findings highlight the challenges that arise in PbRL as Human Flexibility and the agent's access to the human policy vary. Finally, drawing on insights from our user study and empirical results, we conclude that Specified Orchestration can be seen as an upper bound on PbRL performance for future research in Human-AI teaming scenarios.
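To make the formulation concrete, below is a minimal, hedged sketch of the preference-learning step used by standard single-agent PbRL methods (in the style of Christiano et al., 2017), lifted to a two-agent team by conditioning the learned reward on the joint (state, human action, agent action). The class and function names, network shapes, and the PyTorch dependency are illustrative assumptions, not the paper's implementation; the paper's Human-AI PbRL Cooperation Game additionally varies Human Flexibility and the agent's access to the human policy, which this sketch does not model.

```python
# Illustrative sketch only: a Bradley-Terry preference loss over joint
# (state, human action, agent action) trajectory segments. Names, shapes,
# and hyperparameters are assumptions, not the paper's code.

import torch
import torch.nn as nn


class JointRewardModel(nn.Module):
    """r_hat(s, a_human, a_agent): scalar reward over the joint team behavior."""

    def __init__(self, obs_dim: int, act_dim_human: int, act_dim_agent: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim_human + act_dim_agent, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, a_h, a_a):
        # Concatenate the joint observation-action tuple and score each step.
        return self.net(torch.cat([obs, a_h, a_a], dim=-1)).squeeze(-1)


def preference_loss(reward_model, seg0, seg1, label):
    """Bradley-Terry cross-entropy on a pair of joint-trajectory segments.

    seg0, seg1: tuples (obs, a_h, a_a), each tensor of shape (T, dim).
    label: 1.0 if the human prefers seg1, 0.0 if seg0 (soft labels for ties).
    """
    ret0 = reward_model(*seg0).sum()   # predicted return of segment 0
    ret1 = reward_model(*seg1).sum()   # predicted return of segment 1
    logits = torch.stack([ret0, ret1])
    # P[seg1 preferred] under the Bradley-Terry model
    p1 = torch.softmax(logits, dim=0)[1]
    return -(label * torch.log(p1 + 1e-8) + (1 - label) * torch.log(1 - p1 + 1e-8))
```

In a full PbRL loop, segments of joint behavior would be sampled from rollouts, compared by the human-in-the-loop (or a simulated human policy), and the learned reward would then be optimized with an off-the-shelf RL algorithm; how much of the human's policy the agent can observe, and how flexible the human is, are exactly the dimensions the paper studies.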

Authors (5)
  1. Siddhant Bhambri (16 papers)
  2. Mudit Verma (25 papers)
  3. Anil Murthy (2 papers)
  4. Subbarao Kambhampati (126 papers)
  5. Upasana Biswas (4 papers)