REValueD: Regularised Ensemble Value-Decomposition for Factorisable Markov Decision Processes (2401.08850v2)

Published 16 Jan 2024 in cs.LG and cs.AI

Abstract: Discrete-action reinforcement learning algorithms often falter in tasks with high-dimensional discrete action spaces due to the vast number of possible actions. A recent advancement leverages value-decomposition, a concept from multi-agent reinforcement learning, to tackle this challenge. This study delves deep into the effects of this value-decomposition, revealing that whilst it curtails the over-estimation bias inherent to Q-learning algorithms, it amplifies target variance. To counteract this, we present an ensemble of critics to mitigate target variance. Moreover, we introduce a regularisation loss that helps to mitigate the effects that exploratory actions in one dimension can have on the value of optimal actions in other dimensions. Our novel algorithm, REValueD, tested on discretised versions of the DeepMind Control Suite tasks, showcases superior performance, especially in the challenging humanoid and dog tasks. We further dissect the factors influencing REValueD's performance, evaluating the significance of the regularisation loss and the scalability of REValueD with increasing sub-actions per dimension.
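
To make the abstract's mechanics concrete, the sketch below illustrates the two core ideas it describes: a critic that outputs per-dimension utilities whose mean forms a global Q-value (value decomposition applied to a factorised action space), and an ensemble of such critics whose averaged bootstrap target damps the target variance the paper attributes to decomposition. This is a minimal, hypothetical PyTorch illustration, not the authors' implementation; the names (`FactoredQNetwork`, `decomposed_q`, `ensemble_target`) are invented for this sketch, and the paper's regularisation loss is specific to its method and not reproduced here.

```python
# Hypothetical sketch of value decomposition with an ensemble of critics
# for a factorisable action space, based only on the abstract above.
import torch
import torch.nn as nn

class FactoredQNetwork(nn.Module):
    """One critic: outputs a utility vector U_i(s, .) per action dimension."""
    def __init__(self, state_dim, n_dims, n_sub_actions, hidden=256):
        super().__init__()
        self.n_dims, self.n_sub = n_dims, n_sub_actions
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_dims * n_sub_actions),
        )

    def forward(self, state):
        # Shape (batch, n_dims, n_sub_actions): one utility row per dimension.
        return self.net(state).view(-1, self.n_dims, self.n_sub)

def decomposed_q(utilities, actions):
    # Global value = mean of the chosen per-dimension utilities, so each
    # dimension contributes one term rather than enumerating the joint space.
    chosen = utilities.gather(2, actions.unsqueeze(-1)).squeeze(-1)
    return chosen.mean(dim=1)

def ensemble_target(critics, next_state, reward, done, gamma=0.99):
    # Averaging greedy decomposed values over the ensemble is one plausible
    # way to reduce target variance; the paper's exact scheme may differ.
    with torch.no_grad():
        greedy = torch.stack(
            [c(next_state).max(dim=2).values.mean(dim=1) for c in critics]
        )
        return reward + gamma * (1.0 - done) * greedy.mean(dim=0)
```

Because each dimension maximises its own utility independently, the greedy joint action costs O(n_dims * n_sub_actions) instead of O(n_sub_actions ** n_dims), which is what makes this decomposition attractive in high-dimensional discrete action spaces.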

Authors (2)
  1. David Ireland (6 papers)
  2. Giovanni Montana (74 papers)
Citations (3)
