Utility-Based Reinforcement Learning: Unifying Single-objective and Multi-objective Reinforcement Learning (2402.02665v1)
Published 5 Feb 2024 in cs.LG
Abstract: Research in multi-objective reinforcement learning (MORL) has introduced the utility-based paradigm, which makes use of both environmental rewards and a function that defines the utility derived by the user from those rewards. In this paper we extend this paradigm to the context of single-objective reinforcement learning (RL), and outline multiple potential benefits including the ability to perform multi-policy learning across tasks relating to uncertain objectives, risk-aware RL, discounting, and safe RL. We also examine the algorithmic implications of adopting a utility-based approach.
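To make the paradigm concrete, the sketch below (illustrative only, not code from the paper; the function names `episode_return` and `u` and the specific utility shape are hypothetical) shows the core idea the abstract describes: the environment yields a vector-valued return, and a user-supplied utility function maps that vector to the scalar quantity the agent would seek to maximise.

```python
import numpy as np

def episode_return(reward_vectors, gamma=0.99):
    """Discounted vector-valued return of one episode (one component per objective)."""
    ret = np.zeros_like(reward_vectors[0], dtype=float)
    for t, r in enumerate(reward_vectors):
        ret += (gamma ** t) * np.asarray(r, dtype=float)
    return ret

def u(return_vec, weights=(0.7, 0.3)):
    """A stand-in user utility over the return vector.

    Here a (non-linear) mix of the first objective and the square root of the
    second; any user-defined utility could take its place.
    """
    w1, w2 = weights
    return w1 * return_vec[0] + w2 * np.sqrt(max(return_vec[1], 0.0))

# Example: two-objective rewards collected over a short episode.
rewards = [(1.0, 0.0), (0.5, 4.0), (0.0, 1.0)]
ret = episode_return(rewards)
print("vector return:", ret, " utility:", u(ret))
```

In the single-objective setting the paper argues for, the same structure applies with a one-component reward, with the utility function encoding user preferences such as risk sensitivity or safety constraints over the return.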