Highway Graph to Accelerate Reinforcement Learning (2405.11727v1)

Published 20 May 2024 in cs.LG

Abstract: Reinforcement Learning (RL) algorithms often suffer from low training efficiency. A common strategy to mitigate this issue is to incorporate a model-based planning algorithm, such as Monte Carlo Tree Search (MCTS) or Value Iteration (VI), into the environment model. The major limitation of VI is the need to iterate over a large tensor, which still leads to intensive computation. We focus on improving the training efficiency of RL algorithms by improving the efficiency of the value-learning process. In deterministic environments with discrete state and action spaces, a non-branching sequence of transitions moves the agent through intermediate states without deviation; we call such a sequence a highway. On such non-branching highways, the value-updating process can be merged into a single step instead of iterating the value step by step. Based on this observation, we propose a novel graph structure, named the highway graph, to model state transitions. The highway graph compresses the transition model into a concise graph whose edges can represent multiple state transitions, supporting value propagation across multiple time steps in each iteration. We thus obtain a more efficient value-learning approach by running the VI algorithm on highway graphs. By integrating the highway graph into RL (as a model-based off-policy RL method), RL training can be remarkably accelerated in the early stages (within 1 million frames). Comparison against various baselines on four categories of environments shows that our method outperforms both representative and novel model-free and model-based RL algorithms, achieving 10 to more than 150 times higher training efficiency while maintaining an equal or superior expected return, as confirmed by carefully conducted analyses. Moreover, a deep neural network-based agent can be trained using the highway graph, resulting in better generalization and lower storage costs.
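The abstract's core idea is that a non-branching chain of deterministic transitions can be collapsed into a single edge, so one value-iteration update propagates value across the whole chain at once. The following is a minimal illustrative sketch (not the authors' implementation) under the assumption that each merged edge stores the accumulated discounted reward of the chain and the total discount factor applied to the destination state; the `HighwayEdge` structure and `highway_value_iteration` function are hypothetical names introduced here for illustration.

```python
# Illustrative sketch: value iteration on a "highway graph" where each edge
# compresses a non-branching chain of k deterministic transitions into one hop
# carrying the accumulated discounted reward and the total discount gamma**k.
from dataclasses import dataclass

@dataclass
class HighwayEdge:
    target: int      # index of the destination (branching) state
    reward: float    # sum_{i=0}^{k-1} gamma**i * r_i along the merged chain
    discount: float  # gamma**k, applied to the value of the destination state

def highway_value_iteration(edges, num_states, iters=100, tol=1e-6):
    """edges[s] is the list of HighwayEdge objects leaving state s (one per highway)."""
    values = [0.0] * num_states
    for _ in range(iters):
        delta = 0.0
        for s in range(num_states):
            if not edges[s]:          # terminal or absorbing state: value stays 0
                continue
            best = max(e.reward + e.discount * values[e.target] for e in edges[s])
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:               # stop once the value function has converged
            break
    return values

# Toy example: state 0 reaches terminal state 1 via a 3-step highway whose last
# step yields reward 1. With gamma = 0.9 the merged edge carries reward
# 0.9**2 * 1 and discount 0.9**3, so state 0 receives its value in one sweep.
gamma = 0.9
edges = [[HighwayEdge(target=1, reward=gamma**2 * 1.0, discount=gamma**3)], []]
print(highway_value_iteration(edges, num_states=2))  # state 0 -> ~0.81
```

In a plain step-by-step VI over the unmerged chain, the same value would need three sweeps to reach state 0; the merged edge delivers it in one, which is the source of the speedup the abstract describes.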
