
Growing Q-Networks: Solving Continuous Control Tasks with Adaptive Control Resolution (2404.04253v1)

Published 5 Apr 2024 in cs.LG, cs.AI, and cs.RO

Abstract: Recent reinforcement learning approaches have shown surprisingly strong capabilities of bang-bang policies for solving continuous control benchmarks. The underlying coarse action space discretizations often yield favourable exploration characteristics, while final performance does not visibly suffer in the absence of action penalization, in line with optimal control theory. In robotics applications, smooth control signals are commonly preferred to reduce system wear and improve energy efficiency, but action costs can be detrimental to exploration during early training. In this work, we aim to bridge this performance gap by growing discrete action spaces from coarse to fine control resolution, taking advantage of recent results in decoupled Q-learning to scale our approach to high-dimensional action spaces up to dim(A) = 38. Our work indicates that adaptive control resolution in combination with value decomposition yields simple critic-only algorithms that achieve surprisingly strong performance on continuous control tasks.
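
The abstract describes two key ingredients: a discrete action space whose per-dimension resolution grows from coarse (bang-bang) to fine over training, and a decoupled (value-decomposed) critic that keeps Q-learning tractable in high-dimensional action spaces. The sketch below shows one plausible way to combine the two in PyTorch; the class name, the nine-level grid, the masking scheme, and the midpoint-doubling schedule are assumptions for illustration, not the paper's actual implementation.

```python
# Minimal sketch of a growing, decoupled Q-critic (assumptions noted above).
import torch
import torch.nn as nn


class DecoupledQNetwork(nn.Module):
    """One independent Q-head per action dimension over a shared set of
    discrete action levels in [-1, 1]; a boolean mask restricts selection
    to the currently active (coarse) levels and is widened during training."""

    def __init__(self, obs_dim: int, act_dim: int, max_bins: int = 9, hidden: int = 256):
        super().__init__()
        self.torso = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Value decomposition: one linear Q-head per action dimension.
        self.heads = nn.ModuleList(nn.Linear(hidden, max_bins) for _ in range(act_dim))
        self.register_buffer("levels", torch.linspace(-1.0, 1.0, max_bins))
        # Start with bang-bang control: only the two extreme levels are active.
        self.register_buffer("active", torch.zeros(max_bins, dtype=torch.bool))
        self.active[[0, -1]] = True

    def grow(self) -> None:
        """Refine the resolution by activating midpoints between active levels."""
        idx = self.active.nonzero().flatten()
        for lo, hi in zip(idx[:-1], idx[1:]):
            self.active[(lo + hi) // 2] = True

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        z = self.torso(obs)
        q = torch.stack([head(z) for head in self.heads], dim=1)  # (B, act_dim, max_bins)
        # Inactive levels get -inf so they are never chosen by a max/argmax.
        return q.masked_fill(~self.active, float("-inf"))

    @torch.no_grad()
    def act(self, obs: torch.Tensor) -> torch.Tensor:
        """Greedy continuous action: per-dimension argmax mapped back to levels."""
        return self.levels[self(obs).argmax(dim=-1)]


# Example: a 38-dimensional action space, grown from 2 to 9 levels per dimension.
critic = DecoupledQNetwork(obs_dim=60, act_dim=38)
for _ in range(3):
    critic.grow()
action = critic.act(torch.randn(1, 60))  # shape (1, 38), values on the level grid
```

Calling grow() on a fixed step schedule, or when a performance criterion is met, is one natural way to anneal from bang-bang exploration toward smooth, fine-grained control while keeping the per-dimension argmax cheap.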
