Multi-objective Reinforcement Learning with Nonlinear Preferences: Provable Approximation for Maximizing Expected Scalarized Return

Published 5 Nov 2023 in cs.LG and cs.AI (2311.02544v4)

Abstract: We study multi-objective reinforcement learning with nonlinear preferences over trajectories. That is, we maximize the expected value of a nonlinear function over accumulated rewards (expected scalarized return or ESR) in a multi-objective Markov Decision Process (MOMDP). We derive an extended form of Bellman optimality for nonlinear optimization that explicitly considers time and current accumulated reward. Using this formulation, we describe an approximation algorithm for computing an approximately optimal non-stationary policy in pseudopolynomial time for smooth scalarization functions with a constant number of rewards. We prove the approximation analytically and demonstrate the algorithm experimentally, showing that there can be a substantial gap between the optimal policy computed by our algorithm and alternative baselines.
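The idea in the abstract of a Bellman-style recursion that tracks time and accumulated reward can be illustrated with a minimal sketch. This is not the paper's algorithm: the toy MOMDP in `TRANSITIONS`, the sum-of-square-roots `scalarize` function, the `grid` rounding step, and the undiscounted finite horizon are all assumptions chosen for illustration. The value function is indexed by (state, accumulated reward, steps left), and accumulated reward is snapped to a grid so that the number of distinct augmented states stays bounded.

```python
from functools import lru_cache
import math

# Hypothetical toy MOMDP with one state, two actions, two objectives.
# TRANSITIONS[state][action] = list of (probability, next_state, reward_vector)
TRANSITIONS = {
    0: {
        "a": [(1.0, 0, (1.0, 0.0))],  # rewards objective 1 only
        "b": [(1.0, 0, (0.0, 1.0))],  # rewards objective 2 only
    },
}


def scalarize(acc):
    """Assumed smooth nonlinear preference: sum of square roots."""
    return sum(math.sqrt(x) for x in acc)


def make_solver(transitions, scalarize, grid=0.5):
    """Return value(state, acc, steps_left) -> (expected scalarized return, action)."""

    def snap(acc):
        # Round each accumulated reward to the nearest multiple of `grid`.
        return tuple(round(x / grid) * grid for x in acc)

    @lru_cache(maxsize=None)
    def value(state, acc, steps_left):
        if steps_left == 0:
            # The nonlinear function is applied once, to the total accumulated reward.
            return scalarize(acc), None
        best_v, best_a = -math.inf, None
        for action, outcomes in transitions[state].items():
            v = 0.0
            for prob, nxt, reward in outcomes:
                new_acc = snap(tuple(a + r for a, r in zip(acc, reward)))
                v += prob * value(nxt, new_acc, steps_left - 1)[0]
            if v > best_v:
                best_v, best_a = v, action
        return best_v, best_a

    return value


value = make_solver(TRANSITIONS, scalarize)
best_value, first_action = value(0, (0.0, 0.0), 2)
```

In this toy instance the chosen action in the same state differs depending on what has already been accumulated: with one step left, `value(0, (1.0, 0.0), 1)` selects `"b"` while `value(0, (0.0, 1.0), 1)` selects `"a"`. That dependence is why the resulting policy is non-stationary. A coarser `grid` shrinks the table at the cost of approximation error.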
