SPIRE: Synergistic Planning, Imitation, and Reinforcement Learning for Long-Horizon Manipulation (2410.18065v1)
Abstract: Robot learning has proven to be a general and effective technique for programming manipulators. Imitation learning can teach robots solely from human demonstrations but is bottlenecked by the capabilities of those demonstrations, while reinforcement learning uses exploration to discover better behaviors, though the space of possible improvements can be too large to search from scratch. For both techniques, the learning difficulty increases in proportion to the length of the manipulation task. Accounting for this, we propose SPIRE, a system that first uses Task and Motion Planning (TAMP) to decompose tasks into smaller learning subproblems, and then combines imitation and reinforcement learning to maximize the strengths of each. We develop novel strategies to train learning agents when they are deployed in the context of a planning system. We evaluate SPIRE on a suite of long-horizon, contact-rich robot manipulation problems and find that it outperforms prior approaches integrating imitation learning, reinforcement learning, and planning by 35% to 50% in average task performance, is six times more data efficient in the number of human demonstrations needed to train proficient agents, and learns to complete tasks nearly twice as efficiently. See https://sites.google.com/view/spire-corl-2024 for more details.
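The abstract describes SPIRE's architecture only at a high level: a TAMP planner splits a long-horizon task into short subproblems, and each hard subproblem gets a policy that is warm-started with imitation learning and then improved with reinforcement learning. The minimal Python sketch below illustrates that decompose-then-learn control flow under stated assumptions; every name in it (`Subproblem`, `tamp_decompose`, `bc_pretrain`, `rl_finetune`) is a hypothetical stand-in, not the authors' actual API.

```python
"""Illustrative sketch (not the authors' code) of the pipeline the abstract
describes: TAMP decomposes a long-horizon task into short segments; each
contact-rich segment's policy is warm-started by behavior cloning on the
matching demonstration slices, then fine-tuned with RL. All components here
are toy stand-ins."""

from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Toy stand-in for a policy: maps an observation to an action.
Policy = Callable[[list], list]


@dataclass
class Subproblem:
    name: str
    demos: List[list] = field(default_factory=list)  # demo segments for this subproblem
    contact_rich: bool = True  # True -> learn a policy; False -> planner executes it


def tamp_decompose(task: str) -> List[Subproblem]:
    """Stand-in for TAMP: split the task into short, named segments."""
    return [
        Subproblem("grasp", contact_rich=True),
        Subproblem("transit", contact_rich=False),  # free-space motion: no learning needed
        Subproblem("insert", contact_rich=True),
    ]


def bc_pretrain(demos: List[list]) -> Policy:
    """Stand-in for imitation learning on the segment's demonstrations."""
    return lambda obs: obs  # a real system would fit a policy network here


def rl_finetune(policy: Policy, segment: Subproblem) -> Policy:
    """Stand-in for RL fine-tuning that explores beyond the demonstrations."""
    return policy  # a real system would improve the policy via exploration


def spire_train(task: str) -> Dict[str, Policy]:
    """Train one policy per contact-rich segment; the planner handles the rest."""
    policies: Dict[str, Policy] = {}
    for seg in tamp_decompose(task):
        if seg.contact_rich:
            policies[seg.name] = rl_finetune(bc_pretrain(seg.demos), seg)
    return policies


if __name__ == "__main__":
    print(sorted(spire_train("peg-insertion")))  # -> ['grasp', 'insert']
```

The key design point the sketch captures is that learning is confined to short, planner-delimited segments, which is what makes both the imitation and the RL subproblems tractable compared with learning the full task end to end.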