
UCB-driven Utility Function Search for Multi-objective Reinforcement Learning (2405.00410v2)

Published 1 May 2024 in cs.LG

Abstract: In Multi-objective Reinforcement Learning (MORL), agents are tasked with optimising decision-making behaviours that trade off between multiple, possibly conflicting, objectives. MORL based on decomposition is a family of solution methods that employ a number of utility functions to decompose the multi-objective problem into individual single-objective problems, which are solved simultaneously in order to approximate a Pareto front of policies. We focus on the case of linear utility functions parameterised by weight vectors w. We introduce a method based on the Upper Confidence Bound (UCB) to efficiently search for the most promising weight vectors during different stages of the learning process, with the aim of maximising the hypervolume of the resulting Pareto front. The proposed method is shown to outperform various MORL baselines on MuJoCo benchmark problems across different random seeds. The code is online at: https://github.com/SYCAMORE-1/ucb-MOPPO.
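
To make the core idea concrete, below is a minimal sketch of UCB-driven weight-vector selection for two objectives. It is not the paper's exact algorithm (see the linked repository for that): the discretised candidate weight grid, the UCB1 scoring rule, and the `hypervolume_improvement` stub, which stands in for the expensive step of training a policy on the scalarised reward w·r and measuring the resulting Pareto-front hypervolume gain, are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical discretisation of the 2-objective weight simplex: each row is a
# candidate weight vector w with w >= 0 and sum(w) == 1.
grid = np.linspace(0.0, 1.0, 11)
candidates = np.stack([grid, 1.0 - grid], axis=1)

n_arms = len(candidates)
counts = np.zeros(n_arms)   # how often each weight vector has been selected
totals = np.zeros(n_arms)   # cumulative bandit payoff per weight vector

def hypervolume_improvement(w):
    """Stub for the expensive step: train a policy on the scalarised reward
    w . r, re-evaluate the Pareto front, and return the hypervolume gain.
    Simulated here with noise so the sketch runs end to end."""
    return max(0.0, rng.normal(loc=1.0 - abs(w[0] - 0.5), scale=0.1))

for t in range(1, 101):
    # UCB1 score: empirical mean payoff plus an exploration bonus; arms that
    # have never been tried get an infinite score and are pulled first.
    means = np.divide(totals, counts, out=np.zeros_like(totals), where=counts > 0)
    bonus = np.where(counts > 0,
                     np.sqrt(2.0 * np.log(t) / np.maximum(counts, 1.0)),
                     np.inf)
    arm = int(np.argmax(means + bonus))

    totals[arm] += hypervolume_improvement(candidates[arm])
    counts[arm] += 1

print("most frequently selected weight vector:", candidates[int(np.argmax(counts))])
```

The bandit loop above only illustrates the selection rule; the paper's method additionally varies which weight vectors are considered promising across different stages of training.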

