Snapshot Reinforcement Learning: Leveraging Prior Trajectories for Efficiency (2403.00673v2)
Abstract: Deep reinforcement learning (DRL) algorithms require substantial samples and computational resources to achieve high performance, which restricts their practical application and poses challenges for further development. Given limited resources, it is essential to leverage existing computational work (e.g., learned policies, samples) to improve sample efficiency and reduce the computational cost of DRL algorithms. Previous approaches to reusing prior computation require intrusive modifications to existing algorithms and models, are tailored to specific algorithms, and therefore lack flexibility and generality. In this paper, we present the Snapshot Reinforcement Learning (SnapshotRL) framework, which improves sample efficiency simply by altering the environment, without modifying algorithms or models. By allowing student agents to start episodes from states selected along teacher trajectories, SnapshotRL effectively uses teacher trajectories to assist student training, enabling student agents to explore a larger state space early in training. We propose a simple and effective SnapshotRL baseline algorithm, S3RL, which integrates well with existing DRL algorithms. Our experiments show that combining S3RL with TD3, SAC, and PPO on the MuJoCo benchmark significantly improves sample efficiency and average return, without extra samples or additional computational resources.
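As a rough illustration of the environment-level mechanism described above, the sketch below shows how states recorded along a teacher trajectory might be replayed as episode start states through a Gymnasium wrapper around a MuJoCo task, leaving the learning algorithm untouched. All identifiers (`SnapshotResetWrapper`, `collect_snapshots`, `snapshot_prob`) are hypothetical and not taken from the paper; the sketch assumes the underlying MuJoCo environment exposes `set_state` and its simulator data, and it is not the authors' S3RL implementation.

```python
# Minimal sketch of the SnapshotRL idea: with some probability, an episode
# starts from a state recorded along a teacher trajectory instead of the
# default initial state. Hypothetical names; not the paper's S3RL code.
import numpy as np
import gymnasium as gym


class SnapshotResetWrapper(gym.Wrapper):
    """Resets the env to a (qpos, qvel) snapshot taken from teacher trajectories."""

    def __init__(self, env, snapshots, snapshot_prob=0.5, seed=None):
        super().__init__(env)
        self.snapshots = snapshots          # list of (qpos, qvel) pairs
        self.snapshot_prob = snapshot_prob  # chance of starting from a snapshot
        self.rng = np.random.default_rng(seed)

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        if self.snapshots and self.rng.random() < self.snapshot_prob:
            qpos, qvel = self.snapshots[self.rng.integers(len(self.snapshots))]
            # Gymnasium MuJoCo envs expose set_state(qpos, qvel) on the unwrapped env.
            self.env.unwrapped.set_state(qpos, qvel)
            # Relies on the env's internal _get_obs() to rebuild the observation.
            obs = self.env.unwrapped._get_obs()
        return obs, info


def collect_snapshots(env, teacher_policy, num_steps=1000):
    """Roll out a teacher policy and record simulator states along the way."""
    snapshots = []
    obs, _ = env.reset()
    for _ in range(num_steps):
        action = teacher_policy(obs)
        obs, _, terminated, truncated, _ = env.step(action)
        data = env.unwrapped.data
        snapshots.append((data.qpos.copy(), data.qvel.copy()))
        if terminated or truncated:
            obs, _ = env.reset()
    return snapshots


if __name__ == "__main__":
    base_env = gym.make("Hopper-v4")
    # Stand-in teacher: a random policy; in practice this would be a pre-trained agent.
    snapshots = collect_snapshots(base_env, lambda obs: base_env.action_space.sample())
    env = SnapshotResetWrapper(gym.make("Hopper-v4"), snapshots, snapshot_prob=0.5)
    obs, info = env.reset()  # any off-the-shelf DRL algorithm can now train on `env`
```

Because the modification lives entirely in the environment wrapper, an unmodified TD3, SAC, or PPO implementation (e.g., from Stable-Baselines3 or CleanRL) can be pointed at the wrapped environment without any algorithmic changes, which is the flexibility the framework aims for.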