Safe Deep Policy Adaptation (2310.08602v3)
Abstract: A critical goal of autonomy and artificial intelligence is enabling autonomous robots to rapidly adapt in dynamic and uncertain environments. Classic adaptive control and safe control provide stability and safety guarantees but are limited to specific system classes. In contrast, policy adaptation based on reinforcement learning (RL) offers versatility and generalizability but presents safety and robustness challenges. We propose SafeDPA, a novel RL and control framework that simultaneously tackles policy adaptation and safe reinforcement learning. SafeDPA jointly learns an adaptive policy and dynamics models in simulation, predicts environment configurations, and fine-tunes the dynamics models with few-shot real-world data. A safety filter based on Control Barrier Functions (CBFs) is applied on top of the RL policy to ensure safety during real-world deployment. We provide theoretical safety guarantees for SafeDPA and show its robustness to learning errors and extra perturbations. Comprehensive experiments on (1) classic control problems (Inverted Pendulum), (2) simulation benchmarks (Safety Gym), and (3) a real-world agile robotics platform (RC Car) demonstrate that SafeDPA outperforms state-of-the-art baselines in both safety and task performance. In particular, SafeDPA exhibits notable generalizability, achieving a 300% increase in safety rate over the baselines under unseen disturbances in real-world experiments.
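To make the deployment-time safety filter concrete, below is a minimal sketch of a generic CBF quadratic-program filter layered on an RL policy. This is not SafeDPA's exact formulation: the single-integrator dynamics, the circular-obstacle barrier `h`, the gain `alpha`, and the actuator bound `u_max` are all illustrative placeholders chosen for brevity. The filter solves a small QP that minimally perturbs the RL action so that the barrier condition `h_dot >= -alpha * h` holds.

```python
# Minimal sketch of a CBF quadratic-program safety filter on top of an RL
# policy. Assumes single-integrator dynamics x_dot = u and a circular
# obstacle; these are illustrative placeholders, not SafeDPA's learned
# dynamics model.
import cvxpy as cp
import numpy as np


def cbf_filter(x, u_rl, x_obs, radius, alpha=1.0, u_max=1.0):
    """Project the RL action onto the set of actions satisfying the
    CBF condition h_dot(x, u) >= -alpha * h(x)."""
    h = np.sum((x - x_obs) ** 2) - radius ** 2   # barrier: h(x) >= 0 is safe
    grad_h = 2.0 * (x - x_obs)                   # dh/dx
    u = cp.Variable(2)
    objective = cp.Minimize(cp.sum_squares(u - u_rl))  # stay close to RL action
    constraints = [
        grad_h @ u >= -alpha * h,                # CBF condition (x_dot = u)
        cp.norm(u, "inf") <= u_max,              # actuator limits
    ]
    cp.Problem(objective, constraints).solve()
    return u.value


# Usage: the RL policy commands motion straight toward the obstacle;
# the filter minimally deflects the command to keep h(x) >= 0.
x = np.array([0.0, 0.0])
x_obs = np.array([1.0, 0.0])
u_rl = np.array([1.0, 0.0])
print(cbf_filter(x, u_rl, x_obs, radius=0.5))
```

In SafeDPA the analogous constraint is built from the learned, few-shot fine-tuned dynamics model rather than known dynamics, which is what makes the theoretical guarantees robust to learning errors.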
Authors: Wenli Xiao, Tairan He, John Dolan, Guanya Shi