Reverse Forward Curriculum Learning for Extreme Sample and Demonstration Efficiency in Reinforcement Learning (2405.03379v1)
Abstract: Reinforcement learning (RL) offers a promising framework for learning policies through environment interaction, but it often requires an infeasible amount of interaction data to solve complex tasks from sparse rewards. One remedy is to augment RL with offline data demonstrating the desired tasks, but past work often requires large amounts of high-quality demonstration data that are difficult to obtain, especially in domains such as robotics. Our approach consists of a reverse curriculum followed by a forward curriculum. Unlike past work, our approach can efficiently leverage more than one demonstration via a per-demonstration reverse curriculum generated through state resets. The reverse curriculum yields an initial policy that performs well on a narrow initial state distribution and helps overcome difficult exploration problems. A forward curriculum then accelerates training of this initial policy to perform well on the task's full initial state distribution, further improving demonstration and sample efficiency. We show that the combination of a reverse curriculum and a forward curriculum in our method, RFCL, enables significant improvements in demonstration and sample efficiency compared with various state-of-the-art learning-from-demonstration baselines, even solving previously unsolvable tasks that require high precision and control.