Bayesian Constraint Inference from User Demonstrations Based on Margin-Respecting Preference Models (2403.02431v1)
Abstract: It is crucial for robots to be aware of the presence of constraints in order to acquire safe policies. However, explicitly specifying all constraints in an environment can be a challenging task. State-of-the-art constraint inference algorithms learn constraints from demonstrations, but tend to be computationally expensive and prone to instability issues. In this paper, we propose a novel Bayesian method that infers constraints based on preferences over demonstrations. The main advantages of our proposed approach are that it 1) infers constraints without calculating a new policy at each iteration, 2) uses a simpler and more realistic ranking of groups of demonstrations, without requiring pairwise comparisons over all demonstrations, and 3) adapts to cases where there are varying levels of constraint violation. Our empirical results demonstrate that our proposed Bayesian approach infers constraints of varying severity more accurately than state-of-the-art constraint inference methods.
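To make the abstract's description concrete, below is a minimal sketch of preference-based Bayesian constraint inference: a margin-augmented Bradley-Terry-style likelihood over ranked groups of demonstrations drives a Metropolis-Hastings chain over binary constraint indicators, with no policy recomputation inside the loop. This is an illustrative assumption-laden simplification (tabular states, binary constraints, hypothetical function names such as `violation_cost` and `metropolis_hastings`), not the paper's exact model.

```python
import numpy as np

# Hypothetical sketch: tabular states, binary constraint indicators, and a
# margin-augmented Bradley-Terry likelihood over ranked demonstration groups.
# The paper's exact likelihood, priors, and proposal are not reproduced here.

def violation_cost(trajectory, constraint_mask):
    """Count visits to states currently hypothesized as constrained."""
    return sum(constraint_mask[s] for s in trajectory)

def group_cost(group, constraint_mask):
    """Average violation cost over a group of demonstrations."""
    return np.mean([violation_cost(t, constraint_mask) for t in group])

def log_likelihood(ranked_groups, constraint_mask, beta=1.0, margin=1.0):
    """Margin-respecting preference likelihood (assumed form): a group ranked
    as safer should have lower violation cost than the next group by at
    least `margin`."""
    ll = 0.0
    for better, worse in zip(ranked_groups[:-1], ranked_groups[1:]):
        diff = group_cost(worse, constraint_mask) - group_cost(better, constraint_mask)
        ll += -np.log1p(np.exp(-beta * (diff - margin)))  # log-sigmoid with margin
    return ll

def infer_constraints(ranked_groups, n_states, n_iters=5000, rng=None):
    """Metropolis-Hastings over binary constraint masks; note that no new
    policy is computed at any iteration, only the preference likelihood."""
    rng = np.random.default_rng(0) if rng is None else rng
    mask = np.zeros(n_states, dtype=int)
    ll = log_likelihood(ranked_groups, mask)
    posterior_counts = np.zeros(n_states)
    for _ in range(n_iters):
        proposal = mask.copy()
        s = rng.integers(n_states)
        proposal[s] = 1 - proposal[s]            # flip one constraint indicator
        ll_new = log_likelihood(ranked_groups, proposal)
        if np.log(rng.random()) < ll_new - ll:   # accept/reject on likelihood ratio
            mask, ll = proposal, ll_new
        posterior_counts += mask
    return posterior_counts / n_iters            # posterior constraint probabilities
```

In this sketch, `ranked_groups` is an ordered list (safest first) of groups of trajectories, each trajectory a list of state indices; the returned vector gives a per-state posterior probability of being constrained, so varying degrees of constraint violation show up as intermediate probabilities.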