CRISP: Curriculum Inducing Primitive Informed Subgoal Prediction for Hierarchical Reinforcement Learning (2304.03535v5)
Abstract: Hierarchical reinforcement learning (HRL) is a promising approach that uses temporal abstraction to solve complex long-horizon problems. However, simultaneously learning a hierarchy of policies is unstable, since it is challenging to train the higher-level policy while the lower-level primitive is non-stationary. In this paper, we present CRISP, a novel HRL algorithm that effectively generates a curriculum of achievable subgoals for evolving lower-level primitives using reinforcement learning and imitation learning. CRISP uses the lower-level primitive to periodically relabel a handful of expert demonstrations via a novel primitive informed parsing (PIP) approach, thereby mitigating non-stationarity. Since our approach assumes access to only a handful of expert demonstrations, it is suitable for most robotic control tasks. Experimental evaluations on complex robotic maze navigation and robotic manipulation tasks demonstrate that inducing a hierarchical curriculum significantly improves sample efficiency and yields efficient goal-conditioned policies for solving temporally extended tasks. Additionally, we perform real-world robotic experiments on complex manipulation tasks and show that CRISP generalizes impressively to real-world scenarios.
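The mechanism the abstract describes, re-parsing expert demonstrations with the current lower-level primitive so that subgoal difficulty tracks the primitive's competence, can be sketched in a few lines. The snippet below is a minimal illustration under stated assumptions, not the paper's exact procedure: the `goal_critic` callable, the reachability `threshold`, and the greedy segmentation rule are all assumptions introduced for the sketch.

```python
def primitive_informed_parse(demo_states, goal_critic, threshold=0.0):
    """Minimal sketch of a PIP-style demonstration parser (hypothetical).

    Walks along an expert demonstration and, from each segment start,
    picks as the next subgoal the furthest demonstration state the
    current lower-level primitive still rates as reachable, i.e. its
    goal-conditioned critic value is at least `threshold`. Re-running
    this as the primitive improves yields progressively harder, yet
    still achievable, subgoals: a curriculum.
    """
    subgoals, start = [], 0
    while start < len(demo_states) - 1:
        end = start + 1
        # Extend the segment while the primitive's critic still judges
        # the next candidate state reachable from the segment start.
        while (end + 1 < len(demo_states)
               and goal_critic(demo_states[start], demo_states[end + 1]) >= threshold):
            end += 1
        subgoals.append(demo_states[end])
        start = end
    return subgoals
```

In this sketch, the resulting (state, subgoal) pairs would supervise the higher-level policy (for example, with an imitation loss) while the primitive continues to train with reinforcement learning, and the parse would be recomputed periodically, matching the relabeling loop described in the abstract.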
- Utsav Singh
- Vinay P. Namboodiri