Safe CoR: A Dual-Expert Approach to Integrating Imitation Learning and Safe Reinforcement Learning Using Constraint Rewards (2407.02245v1)
Abstract: In the realm of autonomous agents, ensuring safety and reliability in complex and dynamic environments remains a paramount challenge. Safe reinforcement learning addresses these concerns by introducing safety constraints, but it still struggles in intricate settings such as complex driving scenarios. To overcome these challenges, we present the safe constraint reward (Safe CoR) framework, a novel method that utilizes two types of expert demonstrations: reward expert demonstrations, which focus on performance optimization, and safe expert demonstrations, which prioritize safety. By exploiting a constraint reward (CoR), our framework guides the agent to balance the performance objective (reward sum) against safety constraints. We evaluate the proposed framework in diverse environments, including Safety Gym, MetaDrive, and the real-world Jackal platform. Safe CoR improves the performance of baseline algorithms by $39\%$ and reduces constraint violations by $88\%$ on the real-world Jackal platform, demonstrating the framework's efficacy. We expect this approach to yield significant advances in real-world performance, contributing to safe and reliable autonomous agents.
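The abstract describes the constraint reward only at a high level. Below is a minimal sketch of how a CoR-style shaping term derived from two expert discriminators could be combined with the environment reward; the GAIL-style discriminators, the log-odds form, the names `w_reward`/`w_safe`, and the shaping coefficient are all illustrative assumptions, not the authors' actual formulation.

```python
# Sketch of a constraint-reward (CoR) style shaping term, assuming a GAIL-like
# setup with two discriminators: one trained on reward-expert demonstrations
# and one on safe-expert demonstrations. All specifics here are assumptions.
import torch
import torch.nn as nn


class Discriminator(nn.Module):
    """Scores how expert-like a (state, action) pair is, in [0, 1]."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1))


def constraint_reward(d_reward: Discriminator,
                      d_safe: Discriminator,
                      obs: torch.Tensor,
                      act: torch.Tensor,
                      w_reward: float = 0.5,
                      w_safe: float = 0.5) -> torch.Tensor:
    """Illustrative CoR: weighted log-odds of the two expert discriminators.

    High values indicate behavior that resembles both the reward expert
    (performance) and the safe expert (constraint satisfaction).
    """
    eps = 1e-8
    p_r = d_reward(obs, act).clamp(eps, 1 - eps)
    p_s = d_safe(obs, act).clamp(eps, 1 - eps)
    return (w_reward * torch.log(p_r / (1 - p_r))
            + w_safe * torch.log(p_s / (1 - p_s)))


# Usage example: augment the environment reward with the CoR term, while the
# cost signal would still be handled separately by the safe-RL constraint.
obs = torch.randn(32, 8)          # batch of observations (placeholder dims)
act = torch.randn(32, 2)          # batch of actions
env_reward = torch.randn(32, 1)   # reward from the environment
d_r, d_s = Discriminator(8, 2), Discriminator(8, 2)
shaped_reward = env_reward + 0.1 * constraint_reward(d_r, d_s, obs, act)
```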
Authors: Hyeokjin Kwon, Gunmin Lee, Junseo Lee, Songhwai Oh