Learning Safety Constraints from Demonstrations with Unknown Rewards (2305.16147v2)
Abstract: We propose Convex Constraint Learning for Reinforcement Learning (CoCoRL), a novel approach for inferring shared constraints in a Constrained Markov Decision Process (CMDP) from a set of safe demonstrations with possibly different reward functions. While previous work is limited to demonstrations with known rewards or fully known environment dynamics, CoCoRL can learn constraints from demonstrations with different unknown rewards without knowledge of the environment dynamics. CoCoRL constructs a convex safe set based on demonstrations, which provably guarantees safety even for potentially sub-optimal (but safe) demonstrations. For near-optimal demonstrations, CoCoRL converges to the true safe set with no policy regret. We evaluate CoCoRL in gridworld environments and a driving simulation with multiple constraints. CoCoRL learns constraints that lead to safe driving behavior. Importantly, we can safely transfer the learned constraints to different tasks and environments. In contrast, alternative methods based on Inverse Reinforcement Learning (IRL) often exhibit poor performance and learn unsafe policies.
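The convex safe set described above rests on the fact that, in a CMDP, constraint values are linear in a policy's discounted feature expectations, so any convex combination of safe policies is itself safe. The following is a minimal sketch of the resulting safety check, not the authors' implementation: it tests whether a candidate policy's feature-expectation vector lies in the convex hull of the demonstrations' feature expectations via a small feasibility LP. All names and the example feature vectors are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(point, demo_features):
    """Return True if `point` lies in the convex hull of the rows of
    `demo_features`. Solves the feasibility LP: find lambda >= 0 with
    sum(lambda) = 1 and lambda @ demo_features = point."""
    n, d = demo_features.shape
    # Stack the equality constraints: demo_features^T lambda = point, 1^T lambda = 1
    A_eq = np.vstack([demo_features.T, np.ones((1, n))])
    b_eq = np.append(point, 1.0)
    res = linprog(c=np.zeros(n), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * n, method="highs")
    return res.status == 0  # 0 = optimal, i.e. the LP is feasible

# Hypothetical feature expectations of three safe demonstration policies
demos = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(in_convex_hull(np.array([0.5, 0.5]), demos))  # inside the hull
print(in_convex_hull(np.array([2.0, 2.0]), demos))  # outside the hull
```

A policy whose feature expectations pass this check is guaranteed safe under any linear constraints that the demonstrations satisfy, which is why the construction works even when the demonstrations are sub-optimal, as long as they are safe.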