
Safe and Balanced: A Framework for Constrained Multi-Objective Reinforcement Learning (2405.16390v1)

Published 26 May 2024 in cs.AI and cs.LG

Abstract: In numerous reinforcement learning (RL) problems involving safety-critical systems, a key challenge lies in balancing multiple objectives while simultaneously meeting all stringent safety constraints. To tackle this issue, we propose a primal-based framework that orchestrates policy optimization between multi-objective learning and constraint adherence. Our method employs a novel natural policy gradient manipulation technique to optimize multiple RL objectives and overcome conflicting gradients between tasks, since a simple weighted-average gradient direction can hurt individual tasks' performance when the gradients of different task objectives are misaligned. When a hard constraint is violated, our algorithm steps in to rectify the policy and minimize the violation. We establish theoretical convergence and constraint-violation guarantees in the tabular setting. Empirically, our proposed method also outperforms prior state-of-the-art methods on challenging safe multi-objective reinforcement learning tasks.
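The abstract describes two mechanisms: combining per-objective policy gradients so that conflicting directions do not degrade individual tasks, and switching to a constraint-rectification step whenever a hard constraint is violated. The sketch below is not the authors' algorithm; it is a minimal illustration of that switching structure, assuming a PCGrad-style projection as a stand-in for the paper's gradient manipulation and a threshold test as a stand-in for its constraint check. The function names (combine_gradients, update_step) and parameters are hypothetical.

```python
# Hypothetical sketch (not the paper's implementation): one policy update that
# switches between multi-objective gradient combination and constraint rectification.
import numpy as np

def combine_gradients(grads):
    """Resolve conflicts among per-objective gradients (PCGrad-style projection)."""
    combined = []
    for i, g_i in enumerate(grads):
        g = g_i.copy()
        for j, g_j in enumerate(grads):
            if i == j:
                continue
            dot = np.dot(g, g_j)
            if dot < 0:  # conflicting gradients: drop the component along g_j
                g -= dot / (np.dot(g_j, g_j) + 1e-12) * g_j
        combined.append(g)
    return np.mean(combined, axis=0)

def update_step(theta, objective_grads, constraint_value, constraint_limit,
                constraint_grad, lr=0.05):
    """If the hard constraint is violated, rectify the policy toward feasibility;
    otherwise ascend a conflict-resolved combination of the objective gradients."""
    if constraint_value > constraint_limit:
        # Constraint-violation phase: descend the constraint cost only.
        return theta - lr * constraint_grad
    # Multi-objective phase: update along the combined objective direction.
    return theta + lr * combine_gradients(objective_grads)
```

In this toy form, the constraint-rectification branch ignores the task objectives entirely, mirroring the idea that feasibility takes priority whenever a hard constraint is breached; the actual method operates on natural policy gradients and comes with the convergence and violation guarantees stated above.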
