Episodic Reinforcement Learning with Expanded State-reward Space (2401.10516v1)

Published 19 Jan 2024 in cs.LG and cs.AI

Abstract: Empowered by deep neural networks, deep reinforcement learning (DRL) has demonstrated tremendous empirical success in various domains, including games, health care, and autonomous driving. Despite these advancements, DRL remains data-inefficient, as effective policies demand vast numbers of environmental samples. Recently, episodic control (EC)-based model-free DRL methods have improved sample efficiency by recalling past experiences from episodic memory. However, existing EC-based methods suffer from a potential misalignment between the state and reward spaces because they neglect the information-rich (past) retrieved states, which can cause inaccurate value estimation and degraded policy performance. To tackle this issue, we introduce an efficient EC-based DRL framework with an expanded state-reward space, in which both the expanded states used as input and the expanded rewards used in training contain historical and current information. Specifically, we reuse the historical states retrieved by EC as part of the input states and integrate the retrieved MC-returns into the immediate reward of each interactive transition. As a result, our method simultaneously makes full use of the retrieved information and better evaluates state values through a Temporal Difference (TD) loss. Empirical results on challenging Box2D and MuJoCo tasks demonstrate the superiority of our method over a recent sibling method and common baselines. Additional Q-value comparison experiments further verify our method's effectiveness in alleviating Q-value overestimation.
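
The mechanism the abstract describes (retrieving a historical state from episodic memory, concatenating it onto the current state, and blending the retrieved Monte-Carlo return into the immediate reward before the TD update) can be illustrated with a minimal sketch. This is not the authors' implementation: the nearest-neighbour retrieval, the concatenation-based state expansion, and the mixing coefficient `beta` are assumptions introduced here for clarity.

```python
import numpy as np

class EpisodicMemory:
    """Hypothetical episodic memory storing (state, MC-return) pairs and
    retrieving the nearest stored state by Euclidean distance."""

    def __init__(self):
        self.states, self.returns = [], []

    def add(self, state, mc_return):
        self.states.append(np.asarray(state, dtype=np.float64))
        self.returns.append(float(mc_return))

    def retrieve(self, state):
        """Return the closest stored state and its Monte-Carlo return."""
        state = np.asarray(state, dtype=np.float64)
        dists = [np.linalg.norm(state - s) for s in self.states]
        i = int(np.argmin(dists))
        return self.states[i], self.returns[i]


def expand_transition(memory, state, reward, beta=0.5):
    """Build the expanded state (current state concatenated with the retrieved
    historical state) and the expanded reward (immediate reward blended with
    the retrieved MC-return). `beta` is an illustrative mixing coefficient."""
    retrieved_state, retrieved_return = memory.retrieve(state)
    expanded_state = np.concatenate([np.asarray(state, dtype=np.float64),
                                     retrieved_state])
    expanded_reward = reward + beta * retrieved_return
    return expanded_state, expanded_reward


# Toy usage: populate memory from a finished episode, then expand a new transition.
memory = EpisodicMemory()
memory.add([0.1, 0.2], mc_return=3.0)
memory.add([0.9, 0.8], mc_return=1.5)

exp_s, exp_r = expand_transition(np.array([0.15, 0.25]), memory=memory, state=np.array([0.15, 0.25]), reward=0.7) if False else expand_transition(memory, np.array([0.15, 0.25]), 0.7)
print(exp_s, exp_r)  # expanded state has twice the original dimensionality
```

In the full method, the critic's TD target would then be formed from the expanded reward and the expanded next state, e.g. y = r_expanded + gamma * Q(s'_expanded, a'), so that both the value input and the learning signal carry historical information.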
