
Adapting Double Q-Learning for Continuous Reinforcement Learning (2309.14471v1)

Published 25 Sep 2023 in cs.LG and cs.AI

Abstract: The majority of off-policy reinforcement learning algorithms use overestimation bias control techniques. Most of these techniques are rooted in heuristics and primarily address the consequences of overestimation rather than its fundamental origins. In this work we present a novel approach to bias correction, similar in spirit to Double Q-Learning. We propose using a policy in the form of a mixture with two components. Each policy component is maximized and assessed by separate networks, which removes any basis for the overestimation bias. Our approach shows promising near-SOTA results on a small set of MuJoCo environments.
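
The abstract's core mechanism, where each mixture component is maximized by one network and assessed by a separate one, mirrors the decoupled action selection and evaluation of tabular Double Q-Learning. Below is a minimal sketch of how such a decoupled update could look in a deterministic actor-critic setting; the names (pi1, q1_targ, update_step), the loss forms, and the use of target networks are illustrative assumptions, not the authors' exact method.

```python
import torch
import torch.nn.functional as F

def decoupled_targets(q1_targ, q2_targ, pi1, pi2, r, s_next, done, gamma=0.99):
    """Double-Q-style targets: each component's action is scored by the
    other critic, so no network ever evaluates its own maximizer."""
    with torch.no_grad():
        a1 = pi1(s_next)  # approximate maximizer of Q1
        a2 = pi2(s_next)  # approximate maximizer of Q2
        y1 = r + gamma * (1.0 - done) * q2_targ(s_next, a1)  # Q2 assesses pi1
        y2 = r + gamma * (1.0 - done) * q1_targ(s_next, a2)  # Q1 assesses pi2
    return y1, y2

def update_step(q1, q2, q1_targ, q2_targ, pi1, pi2,
                critic_opt, actor_opt, batch, gamma=0.99):
    s, a, r, s_next, done = batch
    y1, y2 = decoupled_targets(q1_targ, q2_targ, pi1, pi2,
                               r, s_next, done, gamma)

    # Each critic regresses onto its own decoupled target.
    critic_loss = F.mse_loss(q1(s, a), y1) + F.mse_loss(q2(s, a), y2)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Each policy component is trained to maximize only its own critic.
    actor_loss = -(q1(s, pi1(s)).mean() + q2(s, pi2(s)).mean())
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
```

At interaction time the behavior policy would be the two-component mixture described in the abstract, for example sampling pi1 or pi2 with equal probability at each step; q1_targ and q2_targ would be slowly updated (e.g., Polyak-averaged) copies of the critics, a standard choice in off-policy actor-critic methods rather than something the abstract specifies.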

References (7)
  1. Fujimoto, S., van Hoof, H., and Meger, D. Addressing function approximation error in actor-critic methods. In International Conference on Machine Learning, pp. 1587–1596. PMLR, 2018.
  2. Haarnoja, T., Zhou, A., Hartikainen, K., Tucker, G., Ha, S., Tan, J., Kumar, V., Zhu, H., Gupta, A., Abbeel, P., and Levine, S. Soft actor-critic algorithms and applications. arXiv preprint arXiv:1812.05905, 2018.
  3. van Hasselt, H. Double Q-learning. In Advances in Neural Information Processing Systems, 23:2613–2621, 2010.
  4. van Hasselt, H., Guez, A., and Silver, D. Deep reinforcement learning with double Q-learning. In Proceedings of the AAAI Conference on Artificial Intelligence, 30(1), 2016. doi: 10.1609/aaai.v30i1.10295.
  5. Kuznetsov, A., Shvechikov, P., Grishin, A., and Vetrov, D. Controlling overestimation bias with truncated mixture of continuous distributional quantile critics. In International Conference on Machine Learning, pp. 5556–5566. PMLR, 2020.
  6. Lan, Q., Pan, Y., Fyshe, A., and White, M. Maxmin Q-learning: Controlling the estimation bias of Q-learning. In 8th International Conference on Learning Representations (ICLR 2020), Addis Ababa, Ethiopia, 2020. URL https://openreview.net/forum?id=Bkg0u3Etwr.
  7. Thrun, S. and Schwartz, A. Issues in using function approximation for reinforcement learning. In Proceedings of the Fourth Connectionist Models Summer School, pp. 255–263. Hillsdale, NJ, 1993.
