Hierarchical Visual Policy Learning for Long-Horizon Robot Manipulation in Densely Cluttered Scenes (2312.02697v1)

Published 5 Dec 2023 in cs.RO

Abstract: In this work, we focus on addressing long-horizon manipulation tasks in densely cluttered scenes. Such tasks require policies to effectively manage severe occlusions among objects and continually produce actions based on visual observations. We propose a vision-based Hierarchical policy for Cluttered-scene Long-horizon Manipulation (HCLM). It employs a high-level policy and three options to select and instantiate three parameterized action primitives: push, pick, and place. We first train the pick and place options by behavior cloning (BC). Subsequently, we use hierarchical reinforcement learning (HRL) to train the high-level policy and push option. During HRL, we propose a Spatially Extended Q-update (SEQ) to augment the updates for the push option and a Two-Stage Update Scheme (TSUS) to alleviate the non-stationary transition problem in updating the high-level policy. We demonstrate that HCLM significantly outperforms baseline methods in terms of success rate and efficiency across diverse tasks. We also highlight our method's ability to generalize to more cluttered environments with additional blocks.

Summary

  • The paper proposes HCLM, a hierarchical policy that decomposes long-horizon manipulation tasks into manageable, vision-based subtasks.
  • It integrates behavior cloning for initial skill acquisition with hierarchical reinforcement learning to refine high-level decision making.
  • Experimental results demonstrate that HCLM outperforms baselines in diverse cluttered environments, highlighting its adaptability and efficiency.

Introduction

Robotic manipulation in densely cluttered environments is a challenging area of research with significant implications for real-world applications. Robots assisting in domestic or office settings must navigate spaces filled with obstacles and manipulate objects precisely over extended periods. One approach to this problem is to break a complex task into smaller, more manageable subtasks, each addressed by a specific action or skill.

Vision-Based Hierarchical Policy Learning

The paper presents a vision-based Hierarchical policy for Cluttered-scene Long-horizon Manipulation (HCLM), aimed at long-horizon manipulation tasks amid dense clutter. The core of the HCLM framework is a two-level policy structure: a high-level policy that selects which manipulation primitive to use (push, pick, or place) and three low-level options that instantiate these decisions from visual input, as sketched below.
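To make this structure concrete, here is a minimal Python sketch of the control loop. Every name in it (HighLevelPolicy, Option, PrimitiveAction, run_episode) is a hypothetical stand-in, and the environment is assumed to follow a gym-style interface; the sketch only illustrates the dispatch between the high-level policy and the options, not the authors' actual implementation.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class PrimitiveAction:
    """One parameterized primitive chosen by the policy."""
    name: str           # "push", "pick", or "place"
    params: np.ndarray  # e.g. pixel location and rotation angle

class Option:
    """Low-level option that instantiates one primitive from vision."""
    def __init__(self, name: str):
        self.name = name

    def act(self, observation: np.ndarray) -> PrimitiveAction:
        # Placeholder: a learned model (e.g. a fully convolutional net)
        # would map the visual observation to primitive parameters.
        return PrimitiveAction(self.name, np.zeros(3))

class HighLevelPolicy:
    """Chooses which primitive (option) to execute at each step."""
    def select(self, observation: np.ndarray) -> str:
        # Placeholder: a learned model would score each primitive here.
        return "pick"

def run_episode(env, high_level, options, max_steps=50):
    """Alternate high-level selection with low-level execution.
    `env` is assumed to follow a gym-style (obs, reward, done, info) API."""
    obs = env.reset()
    for _ in range(max_steps):
        primitive = high_level.select(obs)    # high-level decision
        action = options[primitive].act(obs)  # low-level instantiation
        obs, reward, done, _ = env.step(action)
        if done:
            break
```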

Training Approach and Architectural Components

Within the HCLM framework, the pick and place options are first trained by behavior cloning (BC) from expert demonstrations, which sidesteps the exploration difficulty of learning these skills from scratch. Hierarchical reinforcement learning (HRL) is then used to train the high-level policy and the push option. A novel feature of the HRL stage is the Spatially Extended Q-update (SEQ), which augments the updates for the push option. In addition, a Two-Stage Update Scheme (TSUS) alleviates the non-stationary transition problem that arises when the high-level policy is updated while the push option beneath it is still changing. Together, these components let the policy compose pushing, picking, and placing into efficient long-horizon behavior.
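The following sketch shows what the two training phases might look like in PyTorch. The names are assumptions made for illustration: OptionNet is a toy one-layer stand-in for a fully convolutional model, bc_step is a generic behavior-cloning step, and push_q_step is a plain DQN-style update. The paper's SEQ and TSUS modify this baseline in ways the summary does not detail, so they appear only in comments.

```python
import torch
import torch.nn as nn

class OptionNet(nn.Module):
    """Toy stand-in for a fully convolutional option network that maps
    an image observation to a dense per-pixel score/Q map."""
    def __init__(self, in_channels: int = 4):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, 1, kernel_size=1)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.conv(obs)  # (B, 1, H, W)

def bc_step(option, optimizer, obs, target_idx):
    """Phase 1 (BC): cross-entropy between the option's pixel logits
    and the expert's chosen pixel index (shape (B,), dtype long)."""
    logits = option(obs).flatten(1)  # (B, H*W)
    loss = nn.functional.cross_entropy(logits, target_idx)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def push_q_step(push_net, target_net, optimizer, batch, gamma=0.99):
    """Phase 2 (HRL): one DQN-style update on the push option's dense
    Q map. The paper's SEQ additionally augments this update spatially,
    and TSUS staggers the high-level policy's updates; neither exact
    formulation is reproduced here."""
    obs, pixel_idx, reward, next_obs, done = batch  # done: float 0/1
    q_sa = push_net(obs).flatten(1).gather(1, pixel_idx.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_q = target_net(next_obs).flatten(1).max(dim=1).values
        target = reward + gamma * (1.0 - done) * next_q
    loss = nn.functional.smooth_l1_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```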

Performance and Adaptability Evaluation

The proposed HCLM framework was evaluated on six cluttered-scene manipulation tasks, where it delivered significant improvements over several baselines in both success rate and efficiency. HCLM also adapted well to environments with varying levels of clutter, maintaining a high success rate even as the number of additional blocks increased. Ablations of its individual components underscored their collective contribution to the method's overall effectiveness.

Conclusion

The HCLM policy marks a clear step forward for robotic manipulation in unstructured, cluttered environments. By combining behavior cloning for initial skill acquisition with hierarchical reinforcement learning for refinement, robots can solve long-horizon tasks with notable skill and adaptability. Future work includes extending the hierarchical framework with additional primitives and devising more flexible solutions to the non-stationary transition problem that complicates high-level policy updates.