Balance Reward and Safety Optimization for Safe Reinforcement Learning: A Perspective of Gradient Manipulation (2405.01677v2)
Abstract: Ensuring the safety of Reinforcement Learning (RL) is crucial for its deployment in real-world applications. Nevertheless, managing the trade-off between reward and safety during exploration presents a significant challenge: policy adjustments that improve reward performance may adversely affect safety performance. In this study, we aim to resolve this conflict by leveraging the theory of gradient manipulation. We first analyze the conflict between reward and safety gradients. We then balance reward and safety optimization by proposing a soft switching policy optimization method, for which we provide a convergence analysis. Based on this theoretical examination, we provide a safe RL framework that overcomes the aforementioned challenge, and we develop a Safety-MuJoCo Benchmark to assess the performance of safe RL algorithms. Finally, we evaluate the effectiveness of our method on the Safety-MuJoCo Benchmark and a popular safe RL benchmark, OmniSafe. Experimental results demonstrate that our algorithms outperform several state-of-the-art baselines in terms of balancing reward and safety optimization.
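The two ingredients the abstract names, detecting a conflict between the reward and safety gradients and softly switching between the two objectives, can be sketched as follows. This is an illustrative sketch only, not the paper's exact update rule: the projection step follows the PCGrad-style gradient surgery cited below (removing from the reward gradient its component along a conflicting safety gradient), and the sigmoid switching weight on the constraint violation is an assumption introduced here for illustration.

```python
import numpy as np

def project_conflicting(g_r, g_c):
    """Gradient-surgery-style manipulation: if the reward gradient g_r
    conflicts with the safety-cost gradient g_c (negative inner product),
    remove from g_r its component along g_c so the update no longer
    increases the expected cost to first order."""
    dot = np.dot(g_r, g_c)
    if dot < 0:  # conflict detected
        g_r = g_r - (dot / np.dot(g_c, g_c)) * g_c
    return g_r

def soft_switch_update(g_r, g_c, cost, cost_limit, temperature=1.0):
    """Soft switching between objectives (illustrative): a sigmoid weight
    on the constraint violation blends reward ascent with cost descent,
    instead of a hard if/else switch between the two optimizers."""
    w = 1.0 / (1.0 + np.exp(-(cost - cost_limit) / temperature))  # in (0, 1)
    # Ascend reward when safe (w -> 0), descend cost when violating (w -> 1).
    return (1.0 - w) * g_r + w * (-g_c)
```

With `g_r = [1, 0]` and `g_c = [-1, 1]`, the inner product is negative, and the projected reward gradient becomes `[0.5, 0.5]`, which is orthogonal to `g_c`; a hard switch would instead discard the reward gradient entirely whenever the constraint is violated, which is the discontinuity the soft weight avoids.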
- Constrained policy optimization. In International Conference on Machine Learning, 22–31. PMLR.
- On the theory of policy gradient methods: Optimality, approximation, and distribution shift. The Journal of Machine Learning Research, 22(1): 4431–4506.
- Convex optimization. Cambridge University Press.
- Safe learning in robotics: From learning-based control to safe reinforcement learning. Annual Review of Control, Robotics, and Autonomous Systems, 5: 411–444.
- Balancing Constraints and Rewards with Meta-Gradient D4PG. In ICLR.
- A Lyapunov-based approach to safe reinforcement learning. Advances in Neural Information Processing Systems, 31.
- Lyapunov-based safe policy optimization for continuous control. arXiv preprint arXiv:1901.10031.
- Safe reinforcement learning via formal methods: Toward safe control through proof and learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32.
- Constrained reinforcement learning for vehicle motion planning with topological reachability analysis. Robotics, 11(4): 81.
- A human-centered safe robot reinforcement learning framework with interactive behaviors. Frontiers in Neurorobotics, 17.
- Safe multi-agent reinforcement learning for multi-robot control. Artificial Intelligence, 319: 103905.
- A review of safe reinforcement learning: Methods, theory and applications. arXiv preprint arXiv:2205.10330.
- OmniSafe: An Infrastructure for Accelerating Safe Reinforcement Learning Research. arXiv preprint arXiv:2305.09304.
- Deep reinforcement learning for autonomous driving: A survey. IEEE Transactions on Intelligent Transportation Systems, 23(6): 4909–4926.
- Reinforcement learning in robotics: A survey. The International Journal of Robotics Research, 32(11): 1238–1274.
- Learning-based model predictive control for safe exploration. In 2018 IEEE Conference on Decision and Control (CDC), 6059–6066. IEEE.
- Temporal logic guided safe reinforcement learning using control barrier functions. arXiv preprint arXiv:1903.09885.
- Combining trust region and line search techniques. In Advances in Nonlinear Programming: Proceedings of the 1996 International Conference on Nonlinear Programming, 153–175. Springer.
- Benchmarking safe exploration in deep reinforcement learning. arXiv preprint arXiv:1910.01708, 7(1): 2.
- Trust region policy optimization. In International Conference on Machine Learning, 1889–1897. PMLR.
- Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587): 484–489.
- Safe exploration for optimization with Gaussian processes. In International Conference on Machine Learning, 997–1005. PMLR.
- Policy gradient methods for reinforcement learning with function approximation. Advances in Neural Information Processing Systems, 12.
- MuJoCo: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, 5026–5033. IEEE.
- CRPO: A new approach for safe reinforcement learning with convergence guarantee. In International Conference on Machine Learning, 11480–11491. PMLR.
- Constrained update projection approach to safe policy optimization. Advances in Neural Information Processing Systems, 35: 9111–9124.
- Projection-Based Constrained Policy Optimization. In International Conference on Learning Representations.
- Gradient surgery for multi-task learning. Advances in Neural Information Processing Systems, 33: 5824–5836.
- On the Convergence of Stochastic Multi-Objective Gradient Manipulation and Beyond. Advances in Neural Information Processing Systems, 35: 38103–38115.
- Zhou, X. 2018. On the Fenchel duality between strong convexity and Lipschitz continuous gradient. arXiv preprint arXiv:1803.06573.
- Gradient-adaptive pareto optimization for constrained reinforcement learning. In AAAI, volume 37, 11443–11451.
- Shangding Gu
- Bilgehan Sel
- Yuhao Ding
- Lu Wang
- Qingwei Lin
- Ming Jin
- Alois Knoll