
Safe Deep Policy Adaptation (2310.08602v3)

Published 8 Oct 2023 in cs.RO, cs.AI, and cs.LG

Abstract: A critical goal of autonomy and artificial intelligence is enabling autonomous robots to rapidly adapt in dynamic and uncertain environments. Classic adaptive control and safe control provide stability and safety guarantees, but are limited to specific system classes. In contrast, policy adaptation based on reinforcement learning (RL) offers versatility and generalizability, but presents safety and robustness challenges. We propose SafeDPA, a novel RL and control framework that simultaneously tackles policy adaptation and safe reinforcement learning. SafeDPA jointly learns an adaptive policy and dynamics models in simulation, predicts environment configurations, and fine-tunes the dynamics models with few-shot real-world data. A safety filter based on Control Barrier Functions (CBFs) is introduced on top of the RL policy to ensure safety during real-world deployment. We provide theoretical safety guarantees for SafeDPA and show its robustness against learning errors and extra perturbations. Comprehensive experiments on (1) classic control problems (Inverted Pendulum), (2) simulation benchmarks (Safety Gym), and (3) a real-world agile robotics platform (RC Car) demonstrate the superiority of SafeDPA over state-of-the-art baselines in both safety and task performance. In particular, SafeDPA shows notable generalizability, achieving a 300% increase in safety rate over the baselines under unseen disturbances in real-world experiments.
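
The abstract's key safety mechanism is a CBF-based filter that sits on top of the learned RL policy and minimally modifies each action so the dynamics satisfy a barrier condition. Below is a minimal sketch of such a filter, assuming control-affine dynamics x_dot = f(x) + g(x)u and a known barrier function h with h(x) >= 0 on the safe set; the names `f`, `g`, `h`, `grad_h`, and `alpha` are illustrative assumptions, not the paper's actual API or learned models.

```python
# Minimal CBF-QP safety filter sketch (assumed structure, not the
# paper's implementation): project the RL action onto the set of
# actions satisfying  L_f h(x) + L_g h(x) u + alpha * h(x) >= 0.
import numpy as np
import cvxpy as cp

def cbf_filter(u_rl, x, f, g, h, grad_h, alpha=1.0):
    u = cp.Variable(len(u_rl))
    Lf_h = grad_h(x) @ f(x)   # drift contribution to h-dot
    Lg_h = grad_h(x) @ g(x)   # control-dependent contribution
    objective = cp.Minimize(cp.sum_squares(u - u_rl))  # stay close to RL action
    constraints = [Lf_h + Lg_h @ u + alpha * h(x) >= 0]
    cp.Problem(objective, constraints).solve()
    return u.value if u.value is not None else u_rl  # fall back if infeasible

# Toy usage: 1D double integrator kept below a wall at position 1
# (hypothetical system chosen only to exercise the filter).
f = lambda x: np.array([x[1], 0.0])
g = lambda x: np.array([[0.0], [1.0]])
h = lambda x: 1.0 - x[0] - 0.5 * x[1]
grad_h = lambda x: np.array([-1.0, -0.5])
u_safe = cbf_filter(np.array([2.0]), np.array([0.5, 0.2]), f, g, h, grad_h)
```

The quadratic program keeps the filtered action as close as possible to the RL policy's action, so task performance is sacrificed only when the barrier condition demands it; in the toy example above, the aggressive action 2.0 is clipped down to the largest value the constraint allows.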

Authors (4)
  1. Wenli Xiao (14 papers)
  2. Tairan He (22 papers)
  3. John Dolan (14 papers)
  4. Guanya Shi (54 papers)
Citations (8)