Offline Imitation Learning Through Graph Search and Retrieval (2407.15403v1)

Published 22 Jul 2024 in cs.RO, cs.AI, and cs.LG

Abstract: Imitation learning is a powerful machine learning approach for a robot to acquire manipulation skills. Nevertheless, many real-world manipulation tasks involve precise and dexterous robot-object interactions, which make it difficult for humans to collect high-quality expert demonstrations. As a result, a robot has to learn skills from suboptimal demonstrations and unstructured interactions, which remains a key challenge. Existing works typically use offline deep reinforcement learning (RL) to address this challenge, but in practice these algorithms are unstable and fragile due to the deadly triad issue. To overcome this problem, we propose GSR, a simple yet effective algorithm that learns from suboptimal demonstrations through Graph Search and Retrieval. We first use pretrained representations to organize the interaction experience into a graph and perform a graph search to calculate the values of different behaviors. Then, we apply a retrieval-based procedure to identify the best behavior (action) in each state and use behavior cloning to learn that behavior. We evaluate our method on both simulated and real-world robotic manipulation tasks with complex visual inputs, covering various precise and dexterous manipulation skills with objects of different physical properties. GSR achieves a 10% to 30% higher success rate and over 30% higher proficiency than the baselines. Our project page is at https://zhaohengyin.github.io/gsr.
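
The abstract outlines a three-step pipeline: build a graph over the dataset using pretrained embeddings, run a graph search to assign values, then retrieve the highest-value action per state as the behavior-cloning target. The sketch below is one plausible instantiation of that description, assuming a frozen pretrained encoder has already produced per-state embeddings; the function names, the epsilon-ball neighbor query, the distance threshold `eps`, and the goal-reward convention are all illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of a GSR-style pipeline, following the three steps in the
# abstract. All names (`neighbors`, `graph_values`, `relabel_actions`), the
# threshold `eps`, and the reward convention are illustrative assumptions;
# this is not the paper's code.
import numpy as np

def neighbors(embeddings, i, eps):
    """Indices of dataset states within `eps` of state i in the pretrained
    embedding space (the graph-building step, assumed here to be a simple
    epsilon-ball query)."""
    dists = np.linalg.norm(embeddings - embeddings[i], axis=1)
    return np.nonzero(dists < eps)[0]

def graph_values(embeddings, next_idx, rewards, eps=0.5, gamma=0.99, iters=200):
    """Graph-search step: value iteration over the retrieval graph. A state's
    successors are the recorded next states of every transition that starts
    at one of its embedding-space neighbors."""
    n = len(embeddings)
    nbrs = [neighbors(embeddings, i, eps) for i in range(n)]
    V = np.asarray(rewards, dtype=float).copy()  # terminal states keep r(s)
    for _ in range(iters):
        V_new = V.copy()
        for i in range(n):
            succ = [next_idx[j] for j in nbrs[i] if next_idx[j] is not None]
            if succ:  # Bellman backup: V(s) = r(s) + gamma * max_s' V(s')
                V_new[i] = rewards[i] + gamma * max(V[s] for s in succ)
        if np.max(np.abs(V_new - V)) < 1e-6:
            break
        V = V_new
    return V, nbrs

def relabel_actions(V, nbrs, actions, next_idx):
    """Retrieval step: for each state, keep the action (taken from any
    similar state in the dataset) whose recorded successor has the highest
    graph value. The result is the relabeled dataset for behavior cloning."""
    best = {}
    for i, cand in enumerate(nbrs):
        cand = [j for j in cand if next_idx[j] is not None]
        if cand:
            best[i] = actions[max(cand, key=lambda j: V[next_idx[j]])]
    return best

# Toy usage: two one-step trajectories whose start states are similar.
emb = np.array([[0.00], [0.10], [1.00], [1.05]])  # pretrained embeddings
nxt = [2, 3, None, None]                          # successor index per state
rew = np.array([0.0, 0.0, 1.0, 1.0])              # assumed goal reward of 1
acts = ["push", "pull", None, None]               # action taken at each state
V, nbrs = graph_values(emb, nxt, rew, eps=0.3)
targets = relabel_actions(V, nbrs, acts, nxt)     # {0: 'push', 1: 'push'}
```

A behavior-cloning policy would then be fit to the relabeled pairs in `targets`. Note that no bootstrapped value function is ever trained on out-of-distribution actions, which is how the abstract argues GSR sidesteps the deadly-triad instability of offline deep RL.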
