Iterated $Q$-Network: Beyond One-Step Bellman Updates in Deep Reinforcement Learning (2403.02107v6)
Abstract: Most Reinforcement Learning methods are heavily affected by the computational effort and data requirements needed to obtain effective estimates of action-value functions, which in turn determine the quality of the overall performance and the sample-efficiency of the learning procedure. Typically, action-value functions are estimated through an iterative scheme that alternates the application of an empirical approximation of the Bellman operator and a subsequent projection step onto a considered function space. It has been observed that this scheme can potentially be generalized to carry out multiple iterations of the Bellman operator at once, benefiting the underlying learning algorithm. However, until now, it has been challenging to implement this idea effectively, especially in high-dimensional problems. In this paper, we introduce iterated $Q$-Network (i-QN), a novel principled approach that enables multiple consecutive Bellman updates by learning a tailored sequence of action-value functions where each serves as the target for the next. We show that i-QN is theoretically grounded and that it can be seamlessly used in value-based and actor-critic methods. We empirically demonstrate the advantages of i-QN in Atari $2600$ games and MuJoCo continuous control problems.
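The core idea described in the abstract, a chain of action-value estimates where each regresses toward the Bellman backup of its predecessor, can be illustrated with a minimal tabular sketch. Everything below (the toy MDP, the `K` window size, the learning rate, all variable names) is an illustrative assumption, not the paper's actual deep-learning implementation:

```python
import numpy as np

# Toy MDP with random transitions and rewards (illustrative only).
n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))             # R[s, a]

def bellman_optimality(Q):
    # (T Q)(s, a) = R(s, a) + gamma * sum_{s'} P(s'|s, a) * max_{a'} Q(s', a')
    return R + gamma * P @ Q.max(axis=1)

# i-QN-style chain (sketch): maintain K + 1 consecutive estimates; each
# Q_k is trained toward the Bellman backup of Q_{k-1}, so K Bellman
# iterations are learned at once rather than a single one.
K, lr = 4, 0.5
Qs = [np.zeros((n_states, n_actions)) for _ in range(K + 1)]
for _ in range(200):
    for k in range(1, K + 1):
        target = bellman_optimality(Qs[k - 1])
        Qs[k] += lr * (target - Qs[k])  # gradient step on squared error
    Qs[0] = Qs[1].copy()                # shift the window of targets forward

Q_star = Qs[-1]                          # approximate optimal action-values
```

In this tabular setting the chain simply performs damped value iteration, so `Q_star` approaches the fixed point of the Bellman optimality operator; the paper's contribution is making this multi-step scheme work with neural function approximation.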