Gradient Shaping for Multi-Constraint Safe Reinforcement Learning (2312.15127v1)
Abstract: Online safe reinforcement learning (RL) trains a policy that maximizes task performance while satisfying safety constraints through interaction with the environment. In this paper, we address the challenges of solving multi-constraint (MC) safe RL problems. We approach safe RL from the perspective of Multi-Objective Optimization (MOO) and propose a unified framework for MC safe RL algorithms that centers on the manipulation of gradients derived from the constraints. Building on this framework, and recognizing the significance of redundant and conflicting constraint conditions, we introduce the Gradient Shaping (GradS) method for general Lagrangian-based safe RL algorithms, improving training efficiency in terms of both reward and constraint satisfaction. Extensive experiments demonstrate that our method encourages exploration, learns policies with better safety and reward performance across a range of challenging MC safe RL tasks, and scales well with the number of constraints.
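The abstract's core idea — reshaping per-constraint gradients before the policy update, pruning redundant constraint directions and resolving conflicting ones — can be illustrated with a minimal sketch. This is an assumption-laden illustration, not the paper's actual GradS algorithm: it drops near-parallel ("redundant") constraint gradients by cosine similarity and projects the reward gradient off any remaining constraint gradient it conflicts with, in the spirit of PCGrad-style gradient surgery. The function name and thresholds are hypothetical.

```python
import numpy as np

def shape_gradients(reward_grad, constraint_grads, redundancy_thresh=0.95):
    """Illustrative gradient-shaping sketch (not the paper's exact GradS).

    1) Discard constraint gradients nearly parallel to one already kept
       (cosine similarity above `redundancy_thresh`), treating them as
       redundant constraint conditions.
    2) For each kept constraint gradient that conflicts with the reward
       gradient (negative inner product), project that conflicting
       component out of the reward gradient.
    """
    eps = 1e-12

    # Step 1: filter redundant constraint gradients by cosine similarity.
    kept = []
    for g in constraint_grads:
        gn = g / (np.linalg.norm(g) + eps)
        if all(np.dot(gn, k / (np.linalg.norm(k) + eps)) < redundancy_thresh
               for k in kept):
            kept.append(g)

    # Step 2: remove reward-gradient components that oppose a constraint.
    shaped = reward_grad.astype(float).copy()
    for g in kept:
        dot = np.dot(shaped, g)
        if dot < 0.0:  # reward and constraint gradients conflict
            shaped = shaped - (dot / (np.dot(g, g) + eps)) * g
    return shaped, kept
```

After shaping, the returned gradient has a non-negative inner product with every kept constraint gradient, so a step along it does not locally increase any retained constraint violation — the property that motivates gradient manipulation in the MOO view of MC safe RL.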