Hierarchical Visual Policy Learning for Long-Horizon Robot Manipulation in Densely Cluttered Scenes (2312.02697v1)

Published 5 Dec 2023 in cs.RO

Abstract: In this work, we focus on addressing long-horizon manipulation tasks in densely cluttered scenes. Such tasks require policies to effectively manage severe occlusions among objects and continually produce actions based on visual observations. We propose a vision-based Hierarchical policy for Cluttered-scene Long-horizon Manipulation (HCLM). It employs a high-level policy and three options to select and instantiate three parameterized action primitives: push, pick, and place. We first train the pick and place options by behavior cloning (BC). Subsequently, we use hierarchical reinforcement learning (HRL) to train the high-level policy and push option. During HRL, we propose a Spatially Extended Q-update (SEQ) to augment the updates for the push option and a Two-Stage Update Scheme (TSUS) to alleviate the non-stationary transition problem in updating the high-level policy. We demonstrate that HCLM significantly outperforms baseline methods in terms of success rate and efficiency across diverse tasks. We also highlight our method's ability to generalize to more cluttered environments with additional blocks.

Summary

  • The paper proposes HCLM, a hierarchical policy that decomposes long-horizon manipulation tasks into manageable, vision-based subtasks.
  • It integrates behavior cloning for initial skill acquisition with hierarchical reinforcement learning to refine high-level decision making.
  • Experimental results demonstrate that HCLM outperforms baselines in diverse cluttered environments, highlighting its adaptability and efficiency.

Introduction

Robotic manipulation in densely cluttered environments is a challenging area of research with significant implications for real-world applications. Robots assisting in domestic or office settings must navigate spaces filled with obstacles and manipulate objects precisely over extended periods. One approach to this problem is to break a complex task into smaller, more manageable subtasks, each addressed by a specific action or skill.

Vision-Based Hierarchical Policy Learning

The paper presents a vision-based Hierarchical policy for Cluttered-scene Long-horizon Manipulation (HCLM), aimed at long-horizon manipulation tasks amid dense clutter. The core of the HCLM framework is a two-level policy structure: a high-level policy that selects which manipulation primitive to use (push, pick, or place) and three low-level options that instantiate these decisions from visual input, as sketched below.
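To make this structure concrete, here is a minimal Python sketch of the control loop. Every name in it (HighLevelPolicy, Option, PrimitiveAction, run_episode) is a hypothetical stand-in, and the environment is assumed to follow a gym-style interface; the sketch only illustrates the dispatch between the high-level policy and the options, not the authors' actual implementation.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class PrimitiveAction:
    """One parameterized primitive chosen by the policy."""
    name: str           # "push", "pick", or "place"
    params: np.ndarray  # e.g. pixel location and rotation angle

class Option:
    """Low-level option that instantiates one primitive from vision."""
    def __init__(self, name: str):
        self.name = name

    def act(self, observation: np.ndarray) -> PrimitiveAction:
        # Placeholder: a learned model (e.g. a fully convolutional net)
        # would map the visual observation to primitive parameters.
        return PrimitiveAction(self.name, np.zeros(3))

class HighLevelPolicy:
    """Chooses which primitive (option) to execute at each step."""
    def select(self, observation: np.ndarray) -> str:
        # Placeholder: a learned model would score each primitive here.
        return "pick"

def run_episode(env, high_level, options, max_steps=50):
    """Alternate high-level selection with low-level execution.
    `env` is assumed to follow a gym-style (obs, reward, done, info) API."""
    obs = env.reset()
    for _ in range(max_steps):
        primitive = high_level.select(obs)    # high-level decision
        action = options[primitive].act(obs)  # low-level instantiation
        obs, reward, done, _ = env.step(action)
        if done:
            break
```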

Training Approach and Architectural Components

Within the HCLM framework, the pick and place options are first trained by behavior cloning (BC) from expert demonstrations, which sidesteps the exploration difficulty of learning these skills from scratch. Hierarchical reinforcement learning (HRL) is then used to train the high-level policy and the push option. A novel feature of the HRL stage is the Spatially Extended Q-update (SEQ), which augments the updates for the push option. In addition, a Two-Stage Update Scheme (TSUS) alleviates the non-stationary transition problem that arises when the high-level policy is updated while the push option beneath it is still changing. Together, these components let the policy compose pushing, picking, and placing into efficient long-horizon behavior.
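The following sketch shows what the two training phases might look like in PyTorch. The names are assumptions made for illustration: OptionNet is a toy one-layer stand-in for a fully convolutional model, bc_step is a generic behavior-cloning step, and push_q_step is a plain DQN-style update. The paper's SEQ and TSUS modify this baseline in ways the summary does not detail, so they appear only in comments.

```python
import torch
import torch.nn as nn

class OptionNet(nn.Module):
    """Toy stand-in for a fully convolutional option network that maps
    an image observation to a dense per-pixel score/Q map."""
    def __init__(self, in_channels: int = 4):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, 1, kernel_size=1)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.conv(obs)  # (B, 1, H, W)

def bc_step(option, optimizer, obs, target_idx):
    """Phase 1 (BC): cross-entropy between the option's pixel logits
    and the expert's chosen pixel index (shape (B,), dtype long)."""
    logits = option(obs).flatten(1)  # (B, H*W)
    loss = nn.functional.cross_entropy(logits, target_idx)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def push_q_step(push_net, target_net, optimizer, batch, gamma=0.99):
    """Phase 2 (HRL): one DQN-style update on the push option's dense
    Q map. The paper's SEQ additionally augments this update spatially,
    and TSUS staggers the high-level policy's updates; neither exact
    formulation is reproduced here."""
    obs, pixel_idx, reward, next_obs, done = batch  # done: float 0/1
    q_sa = push_net(obs).flatten(1).gather(1, pixel_idx.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_q = target_net(next_obs).flatten(1).max(dim=1).values
        target = reward + gamma * (1.0 - done) * next_q
    loss = nn.functional.smooth_l1_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```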

Performance and Adaptability Evaluation

The proposed HCLM framework was evaluated on six cluttered-scene manipulation tasks, where it delivered significant improvements over several baselines in both success rate and efficiency. HCLM also adapted well to environments with varying levels of clutter, maintaining a high success rate even as the number of additional blocks increased. Ablations of its individual components underscored their collective contribution to the method's overall effectiveness.

Conclusion

The HCLM policy marks a clear step forward for robotic manipulation in unstructured, cluttered environments. By combining behavior cloning for initial skill acquisition with hierarchical reinforcement learning for refinement, robots can solve long-horizon tasks with notable skill and adaptability. Future work includes extending the hierarchical framework with additional primitives and devising more flexible solutions to the non-stationary transition problem that complicates high-level policy updates.