Adaptive Ensemble Q-learning: Minimizing Estimation Bias via Error Feedback (2306.11918v1)

Published 20 Jun 2023 in cs.LG and cs.AI

Abstract: The ensemble method is a promising way to mitigate the overestimation issue in Q-learning, where multiple function approximators are used to estimate the action values. It is known that the estimation bias hinges heavily on the ensemble size (i.e., the number of Q-function approximators used in the target), and that determining the 'right' ensemble size is highly nontrivial because of the time-varying nature of the function approximation errors during the learning process. To tackle this challenge, we first derive an upper bound and a lower bound on the estimation bias, based on which the ensemble size is adapted to drive the bias to nearly zero, thereby coping with the impact of the time-varying approximation errors. Motivated by these theoretical findings, we advocate combining the ensemble method with Model Identification Adaptive Control (MIAC) for effective ensemble size adaptation. Specifically, we devise Adaptive Ensemble Q-learning (AdaEQ), a generalized ensemble method with two key steps: (a) approximation error characterization, which serves as the feedback for flexibly controlling the ensemble size, and (b) ensemble size adaptation tailored towards minimizing the estimation bias. Extensive experiments show that AdaEQ improves learning performance over existing methods on the MuJoCo benchmark.
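
To make the error-feedback loop concrete, below is a minimal Python sketch of the two steps named in the abstract. It is an illustration under stated assumptions, not the paper's implementation: the error proxy (comparing ensemble target values against an independent reference estimate), the function names, and the tolerance/size limits are all hypothetical.

import numpy as np

# Minimal sketch of the AdaEQ idea from the abstract (not the authors' code):
# (a) characterize the estimation error of the ensemble target, and
# (b) use that signal as feedback to adapt the ensemble size so the bias is
# driven toward zero. The error proxy, names, and thresholds are assumptions.

def characterize_error(target_q, reference_q):
    """Proxy for the estimation bias: mean gap between ensemble target values
    and an independent reference estimate (e.g. Monte Carlo returns).
    Positive values suggest overestimation, negative values underestimation."""
    return float(np.mean(np.asarray(target_q) - np.asarray(reference_q)))

def adapt_ensemble_size(size, bias, tol=0.05, min_size=2, max_size=10):
    """Error-feedback rule: using more Q-heads in the target (e.g. via a min
    or in-target average) makes it more pessimistic, so grow the ensemble when
    the bias is positive and shrink it when the bias is negative."""
    if bias > tol:
        return min(size + 1, max_size)
    if bias < -tol:
        return max(size - 1, min_size)
    return size

# Toy usage with synthetic numbers standing in for one training iteration.
rng = np.random.default_rng(0)
target_q = rng.normal(loc=0.3, size=128)     # ensemble target estimates
reference_q = rng.normal(loc=0.0, size=128)  # reference value estimates
bias = characterize_error(target_q, reference_q)
print(bias, adapt_ensemble_size(size=4, bias=bias))

In a full training loop, the adapted size would then determine how many Q-approximators enter the next TD target, closing the feedback loop the abstract describes.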

