
Demonstration Guided Multi-Objective Reinforcement Learning (2404.03997v1)

Published 5 Apr 2024 in cs.LG and cs.AI

Abstract: Multi-objective reinforcement learning (MORL) is increasingly relevant because it mirrors real-world scenarios that require trade-offs between multiple objectives. The challenges of traditional reinforcement learning are amplified in MORL by the need to cater to diverse user preferences. To address the difficulty of training policies from scratch in MORL, we introduce demonstration-guided multi-objective reinforcement learning (DG-MORL). This novel approach utilizes prior demonstrations, aligns them with user preferences via corner weight support, and incorporates a self-evolving mechanism to refine suboptimal demonstrations. Our empirical studies demonstrate DG-MORL's superiority over existing MORL algorithms, establishing its robustness and efficacy, particularly under challenging conditions. We also provide an upper bound on the algorithm's sample complexity.
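The abstract does not give implementation details, but the alignment step it describes (matching demonstrations to user preferences via weights) can be illustrated with a minimal sketch. The following assumes linear scalarization of vector returns, a common MORL convention; the function names and the example data are illustrative, not taken from the paper:

```python
import numpy as np

def scalarize(vector_return, weight):
    """Linear scalarization: project a multi-objective return onto a preference weight."""
    return float(np.dot(vector_return, weight))

def best_demo_for_weight(demo_returns, weight):
    """Pick the demonstration whose vector return maximizes scalarized utility
    under the given preference weight (a sketch of the alignment idea above)."""
    utilities = [scalarize(r, weight) for r in demo_returns]
    return int(np.argmax(utilities))

# Two demonstrations with 2-objective returns (e.g. speed vs. energy efficiency).
demos = [np.array([10.0, 2.0]), np.array([4.0, 9.0])]
# A preference weight favoring the second objective.
w = np.array([0.2, 0.8])
print(best_demo_for_weight(demos, w))  # -> 1 (the demo stronger on objective 2)
```

Corner weights in the paper refine this idea: rather than a single fixed weight, the algorithm evaluates demonstrations at the weight vectors where the best scalarized policy changes, covering the preference space more systematically.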

