Utility-Based Reinforcement Learning: Unifying Single-objective and Multi-objective Reinforcement Learning (2402.02665v1)

Published 5 Feb 2024 in cs.LG

Abstract: Research in multi-objective reinforcement learning (MORL) has introduced the utility-based paradigm, which makes use of both environmental rewards and a function that defines the utility derived by the user from those rewards. In this paper we extend this paradigm to the context of single-objective reinforcement learning (RL), and outline multiple potential benefits including the ability to perform multi-policy learning across tasks relating to uncertain objectives, risk-aware RL, discounting, and safe RL. We also examine the algorithmic implications of adopting a utility-based approach.
