
In Search for Architectures and Loss Functions in Multi-Objective Reinforcement Learning (2407.16807v1)

Published 23 Jul 2024 in cs.LG and cs.AI

Abstract: Multi-objective reinforcement learning (MORL) is essential for addressing the intricacies of real-world RL problems, which often require trade-offs between multiple utility functions. However, MORL is challenging due to unstable learning dynamics with deep learning-based function approximators. The research path most taken has been to explore different value-based loss functions for MORL to overcome this issue. Our work empirically explores model-free policy learning loss functions and the impact of different architectural choices. We introduce two approaches: Multi-objective Proximal Policy Optimization (MOPPO), which extends PPO to MORL, and Multi-objective Advantage Actor Critic (MOA2C), which acts as a simple baseline in our ablations. Our proposed approach is straightforward to implement, requiring only small modifications at the level of the function approximator. We conduct comprehensive evaluations on the MORL Deep Sea Treasure, Minecart, and Reacher environments and show that MOPPO effectively captures the Pareto front. Our extensive ablation studies and empirical analyses reveal the impact of different architectural choices, underscoring the robustness and versatility of MOPPO compared to popular MORL approaches such as Pareto Conditioned Networks (PCN) and Envelope Q-learning in terms of MORL metrics, including hypervolume and expected utility.
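The hypervolume metric mentioned in the abstract measures the region of objective space dominated by a policy set's Pareto front relative to a reference point. As an illustration only (the paper's exact evaluation code is not given here; the function name and the 2-objective maximization convention are assumptions), a minimal sweep-line sketch for two objectives:

```python
def hypervolume_2d(points, ref):
    """Hypervolume (area) dominated by a set of 2-objective return
    vectors, relative to a reference point, assuming maximization."""
    # Discard points that do not strictly dominate the reference point.
    pts = [p for p in points if p[0] > ref[0] and p[1] > ref[1]]
    # Sort by the first objective, descending, and sweep over the second.
    pts.sort(key=lambda p: p[0], reverse=True)
    area, prev_y = 0.0, ref[1]
    for x, y in pts:
        if y > prev_y:  # point extends the dominated region upward
            area += (x - ref[0]) * (y - prev_y)
            prev_y = y
    return area

# Staircase front {(3,1), (2,2), (1,3)} w.r.t. reference (0,0):
front = [(3, 1), (2, 2), (1, 3)]
print(hypervolume_2d(front, (0, 0)))  # → 6.0
```

Dominated points (e.g. (1, 1) in the example above) contribute nothing, so the metric rewards both the quality and the spread of the recovered Pareto front, which is why it is paired with expected utility in the paper's evaluation.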

References (36)
  1. A distributional view on multi-objective policy optimization. In International conference on machine learning, pages 11–22. PMLR, 2020.
  2. Dynamic weights in multi-objective deep reinforcement learning. In International conference on machine learning, pages 11–20. PMLR, 2019.
  3. MO-Gym: A library of multi-objective reinforcement learning environments. In Proceedings of the 34th Benelux Conference on Artificial Intelligence BNAIC/Benelearn 2022, 2022.
  4. Sample-efficient multi-objective learning via generalized policy improvement prioritization. arXiv preprint arXiv:2301.07784, 2023.
  5. TorchRL: A data-driven decision-making library for PyTorch. arXiv preprint arXiv:2306.00577, 2023.
  6. Magnetic control of tokamak plasmas through deep reinforcement learning. Nature, 602(7897):414–419, 2022.
  7. A toolkit for reliable benchmarking and research in multi-objective reinforcement learning. In Proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS 2023), 2023.
  8. Deep whole-body control: learning a unified policy for manipulation and locomotion. In Conference on Robot Learning, pages 138–149. PMLR, 2023.
  9. Multi-task deep reinforcement learning with PopArt. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 3796–3803, 2019.
  10. Q-Pensieve: Boosting sample efficiency of multi-objective RL through memory sharing of Q-snapshots. arXiv preprint arXiv:2212.03117, 2022.
  11. D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  12. Reward-conditioned policies. arXiv preprint arXiv:1912.13465, 2019.
  13. On the generalization of representations in reinforcement learning. arXiv preprint arXiv:2203.00543, 2022.
  14. S. Levine. Reinforcement learning and control as probabilistic inference: Tutorial and review. arXiv preprint arXiv:1805.00909, 2018.
  15. S. Liu and L. N. Vicente. Accuracy and fairness trade-offs in machine learning: A stochastic multi-objective approach. Computational Management Science, 19(3):513–537, 2022.
  16. I. Loshchilov and F. Hutter. SGDR: Stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983, 2016.
  17. Multi-objective reinforcement learning: Convexity, stationarity and Pareto optimality. In The Eleventh International Conference on Learning Representations, 2022.
  18. Understanding and preventing capacity loss in reinforcement learning. arXiv preprint arXiv:2204.09560, 2022.
  19. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.
  20. Asynchronous methods for deep reinforcement learning. In International conference on machine learning, pages 1928–1937. PMLR, 2016.
  21. V. Nair and G. E. Hinton. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), pages 807–814, 2010.
  22. Learning the Pareto front with hypernetworks. arXiv preprint arXiv:2010.04104, 2020.
  23. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35:27730–27744, 2022.
  24. J. Platt and A. Barr. Constrained differential optimization. In Neural Information Processing Systems, 1987.
  25. Pareto conditioned networks. arXiv preprint arXiv:2204.05036, 2022.
  26. Actor-critic multi-objective reinforcement learning for non-linear utility functions. Autonomous Agents and Multi-Agent Systems, 37(2):23, 2023.
  27. Multi-objective reinforcement learning for the expected utility of the return. In Proceedings of the Adaptive and Learning Agents workshop at FAIM, volume 2018, 2018.
  28. Progressive neural networks. arXiv preprint arXiv:1606.04671, 2016.
  29. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
  30. MuJoCo: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 5026–5033. IEEE, 2012. doi: 10.1109/IROS.2012.6386109.
  31. On the limitations of scalarisation for multi-objective reinforcement learning of Pareto fronts. In AI 2008: Advances in Artificial Intelligence: 21st Australasian Joint Conference on Artificial Intelligence, Auckland, New Zealand, December 1–5, 2008, Proceedings 21, pages 372–378. Springer, 2008.
  32. Potential-based multiobjective reinforcement learning approaches to low-impact agents for ai safety. Engineering Applications of Artificial Intelligence, 100:104186, 2021.
  33. Deep reinforcement learning and the deadly triad. arXiv preprint arXiv:1812.02648, 2018.
  34. Prediction-guided multi-objective reinforcement learning for continuous robot control. In International conference on machine learning, pages 10607–10616. PMLR, 2020.
  35. A generalized algorithm for multi-objective reinforcement learning and policy adaptation. Advances in neural information processing systems, 32, 2019.
  36. Quality assessment of MORL algorithms: A utility-based approach. In Benelearn 2015: Proceedings of the 24th Annual Machine Learning Conference of Belgium and the Netherlands, 2015.