Hierarchical Planning for Long-Horizon Manipulation Using Geometric and Symbolic Scene Graphs
The paper introduces a novel approach to hierarchical planning for long-horizon manipulation tasks that leverages both geometric and symbolic scene graphs. The authors address the challenges robots face when tasked with long-term autonomy in everyday environments, particularly the high-dimensional search space of continuous robot actions. Traditional task-and-motion planning (TAMP) approaches often rely on predefined symbolic rules and known dynamics models, making them less adaptable to dynamic sensory input and more resource-intensive due to costly search procedures. Hierarchical planning with a neuro-symbolic framework offers a promising alternative.
Key Components
At the heart of the proposed method is a visually grounded hierarchical planning algorithm that operates directly on visual observations to produce both high-level task plans and low-level motions. This is achieved through structured, object-centric scene graph representations at two levels (a minimal data-structure sketch follows the list):
- Geometric Scene Graphs: These graphs encode the 6-DoF poses of entities in the environment and their spatial relations. They are derived from RGB input using a 6-DoF pose estimation model and serve as the foundational geometric abstraction for manipulation tasks.
- Symbolic Scene Graphs: These graphs represent abstract semantic relations among entities, structuring the manipulation scene in terms of logic-based inter-object relationships. They are derived from the geometric scene graphs through a symbol mapping function.
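To make the two representations concrete, here is a minimal Python sketch of a geometric scene graph and a hand-written symbol mapping function that derives symbolic relations (such as on and clear) from object poses. All names, thresholds, and predicates are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class GeometricSceneGraph:
    """Nodes carry 6-DoF poses; spatial-relation edges are derived from them."""
    poses: dict    # object name -> 4x4 homogeneous transform (pose in world frame)
    extents: dict  # object name -> (dx, dy, dz) axis-aligned bounding-box size

def symbol_map(geo: GeometricSceneGraph, z_tol: float = 0.01) -> set:
    """Hand-written symbol mapping: geometric scene graph -> symbolic relations.

    Emits on(a, b) when a rests on top of b's bounding box, and clear(a)
    when nothing rests on a. Tolerances and predicates are illustrative.
    """
    symbols = set()
    has_object_on_top = set()
    names = list(geo.poses)
    for a in names:
        for b in names:
            if a == b:
                continue
            pa, pb = geo.poses[a][:3, 3], geo.poses[b][:3, 3]
            top_of_b = pb[2] + geo.extents[b][2] / 2
            bottom_of_a = pa[2] - geo.extents[a][2] / 2
            # a is "on" b if vertically adjacent and overlapping in the xy-plane
            if (abs(bottom_of_a - top_of_b) < z_tol
                    and np.all(np.abs(pa[:2] - pb[:2]) < np.array(geo.extents[b][:2]) / 2)):
                symbols.add(("on", a, b))
                has_object_on_top.add(b)
    for a in names:
        if a not in has_object_on_top:
            symbols.add(("clear", a))
    return symbols

# Example: a red cube stacked on a blue cube (5 cm cubes)
T = lambda x, y, z: np.array([[1, 0, 0, x], [0, 1, 0, y], [0, 0, 1, z], [0, 0, 0, 1.0]])
geo = GeometricSceneGraph(
    poses={"blue": T(0, 0, 0.025), "red": T(0, 0, 0.075)},
    extents={"blue": (0.05, 0.05, 0.05), "red": (0.05, 0.05, 0.05)},
)
print(symbol_map(geo))  # {('on', 'red', 'blue'), ('clear', 'red')}
```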
The planner operates on these graph representations using Graph Neural Networks (GNNs), which handle both levels of abstraction efficiently: high-level task plans are predicted over the symbolic graph, while low-level motions are generated from the grounded geometry of the geometric graph.
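As a rough illustration of how a GNN planner consumes such graphs, the numpy sketch below runs two rounds of message passing over node features (which could encode poses in the geometric graph or one-hot predicates in the symbolic one) and reads out a score per (object, action) pair, with the highest-scoring pair nominating the next subgoal. The random weights, feature sizes, and readout head are placeholder assumptions for exposition, not the authors' trained architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def gnn_round(h, adj, w_msg, w_upd):
    """One message-passing round: compute messages, sum over neighbors, update."""
    messages = relu(h @ w_msg)                   # per-node messages
    aggregated = adj @ messages                  # sum of neighbors' messages
    return relu(np.concatenate([h, aggregated], axis=1) @ w_upd)

# Toy scene graph with 3 object nodes and 8-dim node features.
node_feats = rng.normal(size=(3, 8))
adj = np.array([[0, 1, 1],                       # adjacency of the scene graph
                [1, 0, 0],
                [1, 0, 0]], dtype=float)

d = node_feats.shape[1]
w_msg = 0.1 * rng.normal(size=(d, d))            # placeholder weights; in the
w_upd = 0.1 * rng.normal(size=(2 * d, d))        # paper these would be trained
w_out = 0.1 * rng.normal(size=(d, 4))            # 4 hypothetical action types

h = node_feats
for _ in range(2):                               # two rounds of message passing
    h = gnn_round(h, adj, w_msg, w_upd)

# Readout: score each (object, action) pair; argmax nominates the next subgoal.
scores = h @ w_out                               # shape (num_objects, num_actions)
obj_idx, action_idx = np.unravel_index(np.argmax(scores), scores.shape)
print(f"next subgoal: action {action_idx} on object {obj_idx}")
```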
Numerical Results and Evaluation
Experiments in the paper demonstrate that the hierarchical planning model scales to long-horizon tasks while maintaining performance and efficiency. On a real robot setup, the algorithm achieves over a 70% task success rate and nearly a 90% subgoal completion rate. Additionally, the framework requires roughly four orders of magnitude less computation time than standard search-based task-and-motion planners, highlighting the advantage of learned models for efficient inference.
Implications and Future Work
The hierarchical planning approach using geometric and symbolic scene graphs presents several important implications for AI and robotics:
- Improved Efficiency: The method showcases improved computational efficiency and the ability to generalize to novel task goals, potentially transforming how long-horizon tasks are approached in dynamic and unstructured environments.
- Integration of Learning and Planning: The approach bridges the gap between neural network-based learning and symbolic reasoning, paving the way for more advanced neuro-symbolic methods in robotic planning.
- Robustness and Adaptability: By using scene graph structures, the method demonstrates robustness to unseen tasks and adaptability, which is critical for autonomy in increasingly complex environments.
Future research could explore learning the symbol mapping function automatically, enhancing the framework’s flexibility and its ability to generalize to diverse environments. Additionally, extending the representation to include articulated and deformable objects could further broaden its applicability in real-world scenarios.
Overall, the paper contributes a substantial step forward in the development of manipulation planning algorithms, leveraging the potential of hierarchical structures and neuro-symbolic integration to address the demands of long-horizon tasks in robotics.