Learning to Switch Among Agents in a Team via 2-Layer Markov Decision Processes (2002.04258v3)
Abstract: Reinforcement learning agents have mostly been developed and evaluated under the assumption that they operate fully autonomously, taking every action themselves. In this work, our goal is to develop algorithms that, by learning to switch control between agents, allow existing reinforcement learning agents to operate under different automation levels. To this end, we first formally define the problem of learning to switch control among agents in a team via a 2-layer Markov decision process. We then develop an online learning algorithm that uses upper confidence bounds on the agents' policies and on the environment's transition probabilities to find a sequence of switching policies. The total regret of our algorithm with respect to the optimal switching policy is sublinear in the number of learning steps. Moreover, whenever multiple teams of agents operate in similar environments, our algorithm benefits greatly from maintaining shared confidence bounds on the environments' transition probabilities and enjoys a better regret bound than problem-agnostic algorithms. Simulation experiments on an obstacle avoidance task illustrate our theoretical findings and demonstrate that, by exploiting the specific structure of the problem, our algorithm outperforms problem-agnostic baselines.
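The switching idea in the abstract can be illustrated, in heavily simplified form, as an optimistic per-state choice of which agent should act next. The sketch below is not the paper's actual algorithm — the paper additionally maintains confidence bounds on the environment's transition probabilities and plans over a 2-layer MDP — and the class name `SwitchingUCB` and its parameters are illustrative assumptions only. It shows the core "optimism in the face of uncertainty" mechanism: each (state, agent) pair gets an upper confidence bound, and control is handed to the agent with the highest bound.

```python
import numpy as np

class SwitchingUCB:
    """Toy sketch: UCB-style switching among agents, per environment state.

    A hypothetical simplification of the paper's approach; it keeps only
    reward confidence bounds, not transition-probability bounds.
    """

    def __init__(self, n_states, n_agents, delta=0.05):
        self.counts = np.zeros((n_states, n_agents))       # times each agent controlled each state
        self.reward_sums = np.zeros((n_states, n_agents))  # accumulated reward per (state, agent)
        self.delta = delta                                 # confidence level
        self.t = 0                                         # global step counter

    def select_agent(self, state):
        """Hand control to the agent with the largest upper confidence bound."""
        self.t += 1
        n = self.counts[state]
        # Give control to each agent once before trusting the bounds.
        if n.min() == 0:
            return int(np.argmin(n))
        mean = self.reward_sums[state] / n
        bonus = np.sqrt(2.0 * np.log(self.t / self.delta) / n)
        return int(np.argmax(mean + bonus))  # optimism in the face of uncertainty

    def update(self, state, agent, reward):
        """Record the outcome of letting `agent` act in `state`."""
        self.counts[state, agent] += 1
        self.reward_sums[state, agent] += reward
```

In a toy single-state setting where one agent consistently earns more reward, the switcher concentrates control on that agent after a short exploration phase, which is the bandit-level intuition behind the sublinear-regret guarantee stated in the abstract.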