Inverse Constraint Learning and Generalization by Transferable Reward Decomposition (2306.12357v2)
Abstract: We present the problem of inverse constraint learning (ICL), which recovers constraints from demonstrations so that constrained skills can be autonomously reproduced in new scenarios. However, ICL is ill-posed, which leads to inaccurate inference of constraints from demonstrations. To address this, we introduce a transferable constraint learning (TCL) algorithm that jointly infers a task-oriented reward and a task-agnostic constraint, enabling the generalization of learned skills. TCL additively decomposes the overall reward into a task reward and its residual, which serves as a soft constraint, and maximizes the policy divergence between task- and constraint-oriented policies to obtain a transferable constraint. Evaluating our method against five baselines in three simulated environments, we show that TCL outperforms state-of-the-art IRL and ICL algorithms, achieving up to $72\%$ higher task-success rates with accurate decomposition than the next best approach in novel scenarios. We further demonstrate the robustness of TCL on two real-world robotic tasks.
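To make the decomposition idea in the abstract concrete, the sketch below illustrates it on a toy discrete MDP: an overall reward is split additively into a task reward and a residual that acts as a soft constraint, and a divergence between the policies induced by the two parts is computed. This is a minimal, assumption-laden illustration only; all names (`r_task`, `r_residual`, `softmax_policy`, the Boltzmann policy choice, and the per-state KL objective) are hypothetical and are not the authors' TCL implementation.

```python
# Illustrative sketch of additive reward decomposition with a soft-constraint
# residual and a policy-divergence term, on a toy discrete MDP.
# All quantities are placeholders; this is NOT the paper's TCL algorithm.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 3

# Overall reward assumed to be recovered by an IRL step (random placeholder).
r_overall = rng.normal(size=(n_states, n_actions))

# Task reward assumed to be learned jointly (random placeholder); the residual
# is treated as a soft constraint: r_overall = r_task + r_residual.
r_task = rng.normal(size=(n_states, n_actions))
r_residual = r_overall - r_task

def softmax_policy(reward, beta=1.0):
    """Boltzmann policy induced by a reward table (state-wise softmax)."""
    logits = beta * reward
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

pi_task = softmax_policy(r_task)             # task-oriented policy
pi_constraint = softmax_policy(-r_residual)  # constraint-oriented policy (avoids penalized actions)

# Per-state KL divergence between the two policies; the abstract states that
# TCL maximizes such a divergence so the residual captures behavior that the
# task reward alone does not explain.
kl = np.sum(pi_task * (np.log(pi_task) - np.log(pi_constraint)), axis=1)
print("mean policy divergence:", kl.mean())
```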