Event Tables for Efficient Experience Replay (2211.00576v2)
Abstract: Experience replay (ER) is a crucial component of many deep reinforcement learning (RL) systems. However, uniform sampling from an ER buffer can lead to slow convergence and unstable asymptotic behaviors. This paper introduces Stratified Sampling from Event Tables (SSET), which partitions an ER buffer into Event Tables, each capturing important subsequences of optimal behavior. We prove a theoretical advantage over the traditional monolithic buffer approach and combine SSET with an existing prioritized sampling strategy to further improve learning speed and stability. Empirical results in challenging MiniGrid domains, benchmark RL environments, and a high-fidelity car racing simulator demonstrate the advantages and versatility of SSET over existing ER buffer sampling approaches.
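The abstract's central idea, splitting the replay buffer into event tables and drawing a fixed share of each minibatch from every table, can be illustrated with a minimal sketch. This is not the authors' implementation. The class name `EventTableBuffer`, the predicate mapping `event_fns`, the `history_len` parameter, and the per-table `fractions` are all hypothetical choices made for illustration. The sketch assumes that an event is detected by a user-supplied predicate on a transition, and that the most recent `history_len` transitions of the current episode are copied into that event's table when it fires. It leaves out prioritization and any correction for the resulting sampling bias.

```python
import random
from collections import deque


class EventTableBuffer:
    """Toy replay buffer: one default table plus one table per named event."""

    def __init__(self, event_fns, history_len=4, capacity=1000, seed=0):
        # event_fns (assumed interface): name -> predicate(transition) -> bool
        self.event_fns = event_fns
        self.history_len = history_len
        self.default = deque(maxlen=capacity)  # holds every transition
        self.tables = {name: deque(maxlen=capacity) for name in event_fns}
        self.episode = []  # transitions of the episode in progress
        self.rng = random.Random(seed)

    def add(self, transition, done=False):
        self.default.append(transition)
        self.episode.append(transition)
        for name, fn in self.event_fns.items():
            if fn(transition):
                # Copy the steps leading up to (and including) the event.
                self.tables[name].extend(self.episode[-self.history_len:])
        if done:
            self.episode = []

    def sample(self, batch_size, fractions):
        # fractions (assumed interface): name -> share of the batch taken
        # from that event table; the remainder comes from the default table.
        batch = []
        for name, frac in fractions.items():
            table = self.tables[name]
            if table:
                k = int(batch_size * frac)
                batch.extend(self.rng.choices(list(table), k=k))
        remaining = batch_size - len(batch)
        batch.extend(self.rng.choices(list(self.default), k=remaining))
        return batch


# Usage: a 10-step episode whose last step yields a positive reward.
buf = EventTableBuffer({"goal": lambda t: t["reward"] > 0}, history_len=3)
for step in range(10):
    reward = 1.0 if step == 9 else 0.0
    buf.add({"step": step, "reward": reward}, done=(step == 9))
batch = buf.sample(8, {"goal": 0.5})
```

Here half of each minibatch comes from the three transitions that led to the rewarded step, so those transitions are replayed far more often than under uniform sampling from a single buffer.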