Solving Continuous Control via Q-learning (2210.12566v2)

Published 22 Oct 2022 in cs.LG, cs.AI, and cs.RO

Abstract: While there has been substantial success in solving continuous control with actor-critic methods, simpler critic-only methods such as Q-learning find limited application in the associated high-dimensional action spaces. However, most actor-critic methods come at the cost of added complexity: heuristics for stabilisation, compute requirements, and wider hyperparameter search spaces. We show that a simple modification of deep Q-learning largely alleviates these issues. By combining bang-bang action discretization with value decomposition, framing single-agent control as cooperative multi-agent reinforcement learning (MARL), this simple critic-only approach matches the performance of state-of-the-art continuous actor-critic methods when learning from features or pixels. We extend classical bandit examples from cooperative MARL to provide intuition for how decoupled critics leverage state information to coordinate joint optimization, and demonstrate surprisingly strong performance across a variety of continuous control tasks.
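
The following is a minimal sketch (not the authors' implementation) of the idea summarised in the abstract: each action dimension is discretized to the two bang-bang extremes {-1, +1} and assigned its own Q-head, and the joint action-value is formed by a linear value decomposition (a VDN-style sum over per-dimension utilities), trained with an ordinary DQN-style TD loss. Network sizes, the two-action bang-bang encoding, and the batch/environment interface are assumptions made for illustration.

```python
import torch
import torch.nn as nn


class DecoupledBangBangQ(nn.Module):
    """Critic-only network: one 2-way Q-head per action dimension (bang-bang)."""

    def __init__(self, obs_dim: int, action_dims: int, hidden: int = 256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Each head scores the two bang-bang actions {-1, +1} for its dimension.
        self.heads = nn.ModuleList(nn.Linear(hidden, 2) for _ in range(action_dims))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        """Per-dimension Q-values, shape (batch, action_dims, 2)."""
        z = self.trunk(obs)
        return torch.stack([head(z) for head in self.heads], dim=1)

    def greedy_action(self, obs: torch.Tensor) -> torch.Tensor:
        """Independent per-dimension argmax, mapped from indices {0, 1} to {-1, +1}."""
        q = self.forward(obs)
        return q.argmax(dim=-1).float() * 2.0 - 1.0


def td_loss(q_net, target_net, batch, gamma: float = 0.99) -> torch.Tensor:
    """DQN-style update with a decomposed joint value Q_tot(s, a) = sum_i Q_i(s, a_i)."""
    obs, act_idx, rew, next_obs, done = batch  # act_idx: (batch, dims) indices in {0, 1}
    q = q_net(obs)                                                   # (batch, dims, 2)
    q_taken = q.gather(-1, act_idx.unsqueeze(-1)).squeeze(-1).sum(dim=-1)
    with torch.no_grad():
        # Decoupled max: maximise each dimension independently, then sum.
        next_q = target_net(next_obs).max(dim=-1).values.sum(dim=-1)
        target = rew + gamma * (1.0 - done) * next_q
    return nn.functional.mse_loss(q_taken, target)
```

The point of the decomposition is that the greedy joint action over 2^N bang-bang combinations reduces to N independent 2-way maximisations, so the standard Q-learning max remains tractable even in high-dimensional action spaces; the shared trunk provides the state conditioning that lets the decoupled heads coordinate.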
