Hierarchical Planning for Long-Horizon Manipulation Using Geometric and Symbolic Scene Graphs
The paper introduces a novel approach to hierarchical planning for long-horizon manipulation tasks that leverages both geometric and symbolic scene graphs. The authors address the challenges robots face when tasked with long-term autonomy in everyday environments, particularly the high-dimensional search space of continuous robot actions. Traditional task-and-motion planning (TAMP) approaches often rely on predefined symbolic rules and known dynamics models, making them less adaptable to dynamic sensory input and more resource-intensive due to costly search procedures. Hierarchical planning with a neuro-symbolic framework offers a promising alternative.
Key Components
At the heart of the proposed method is a visually grounded hierarchical planning algorithm that operates directly on visual observations to produce both high-level task plans and low-level motions. This is achieved through structured, object-centric scene graph representations at two levels (a minimal data-structure sketch follows the list):
- Geometric Scene Graphs: These graphs encode the 6-DoF poses of entities in the environment and their spatial relations. They are derived from RGB input using a 6-DoF pose estimation model and serve as the foundational geometric abstraction for manipulation tasks.
- Symbolic Scene Graphs: These graphs represent abstract semantic relations among entities, structuring the manipulation scene in terms of logic-based inter-object relationships. They are derived from the geometric scene graphs through a symbol mapping function.
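To make the two representations concrete, here is a minimal Python sketch of a geometric scene graph and a hand-written symbol mapping function that derives symbolic relations (such as on and clear) from object poses. All names, thresholds, and predicates are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class GeometricSceneGraph:
    """Nodes carry 6-DoF poses; spatial-relation edges are derived from them."""
    poses: dict    # object name -> 4x4 homogeneous transform (pose in world frame)
    extents: dict  # object name -> (dx, dy, dz) axis-aligned bounding-box size

def symbol_map(geo: GeometricSceneGraph, z_tol: float = 0.01) -> set:
    """Hand-written symbol mapping: geometric scene graph -> symbolic relations.

    Emits on(a, b) when a rests on top of b's bounding box, and clear(a)
    when nothing rests on a. Tolerances and predicates are illustrative.
    """
    symbols = set()
    has_object_on_top = set()
    names = list(geo.poses)
    for a in names:
        for b in names:
            if a == b:
                continue
            pa, pb = geo.poses[a][:3, 3], geo.poses[b][:3, 3]
            top_of_b = pb[2] + geo.extents[b][2] / 2
            bottom_of_a = pa[2] - geo.extents[a][2] / 2
            # a is "on" b if vertically adjacent and overlapping in the xy-plane
            if (abs(bottom_of_a - top_of_b) < z_tol
                    and np.all(np.abs(pa[:2] - pb[:2]) < np.array(geo.extents[b][:2]) / 2)):
                symbols.add(("on", a, b))
                has_object_on_top.add(b)
    for a in names:
        if a not in has_object_on_top:
            symbols.add(("clear", a))
    return symbols

# Example: a red cube stacked on a blue cube (5 cm cubes)
T = lambda x, y, z: np.array([[1, 0, 0, x], [0, 1, 0, y], [0, 0, 1, z], [0, 0, 0, 1.0]])
geo = GeometricSceneGraph(
    poses={"blue": T(0, 0, 0.025), "red": T(0, 0, 0.075)},
    extents={"blue": (0.05, 0.05, 0.05), "red": (0.05, 0.05, 0.05)},
)
print(symbol_map(geo))  # {('on', 'red', 'blue'), ('clear', 'red')}
```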
The planner operates on these graph representations using Graph Neural Networks (GNNs), which handle both levels of abstraction efficiently: high-level task plans are predicted over the symbolic graph, while low-level motions are generated from the grounded geometry of the geometric graph.
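As a rough illustration of how a GNN planner consumes such graphs, the numpy sketch below runs two rounds of message passing over node features (which could encode poses in the geometric graph or one-hot predicates in the symbolic one) and reads out a score per (object, action) pair, with the highest-scoring pair nominating the next subgoal. The random weights, feature sizes, and readout head are placeholder assumptions for exposition, not the authors' trained architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def gnn_round(h, adj, w_msg, w_upd):
    """One message-passing round: compute messages, sum over neighbors, update."""
    messages = relu(h @ w_msg)                   # per-node messages
    aggregated = adj @ messages                  # sum of neighbors' messages
    return relu(np.concatenate([h, aggregated], axis=1) @ w_upd)

# Toy scene graph with 3 object nodes and 8-dim node features.
node_feats = rng.normal(size=(3, 8))
adj = np.array([[0, 1, 1],                       # adjacency of the scene graph
                [1, 0, 0],
                [1, 0, 0]], dtype=float)

d = node_feats.shape[1]
w_msg = 0.1 * rng.normal(size=(d, d))            # placeholder weights; in the
w_upd = 0.1 * rng.normal(size=(2 * d, d))        # paper these would be trained
w_out = 0.1 * rng.normal(size=(d, 4))            # 4 hypothetical action types

h = node_feats
for _ in range(2):                               # two rounds of message passing
    h = gnn_round(h, adj, w_msg, w_upd)

# Readout: score each (object, action) pair; argmax nominates the next subgoal.
scores = h @ w_out                               # shape (num_objects, num_actions)
obj_idx, action_idx = np.unravel_index(np.argmax(scores), scores.shape)
print(f"next subgoal: action {action_idx} on object {obj_idx}")
```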
Numerical Results and Evaluation
Experiments in the paper demonstrate that the hierarchical planning model scales to long-horizon tasks while maintaining performance and efficiency. On a real robot setup, the algorithm achieves over a 70% task success rate and nearly a 90% subgoal completion rate. Additionally, the framework requires roughly four orders of magnitude less computation time than standard search-based task-and-motion planners, highlighting the advantage of learned models for efficient inference.
Implications and Future Work
The hierarchical planning approach using geometric and symbolic scene graphs presents several important implications for AI and robotics:
- Improved Efficiency: The method showcases improved computational efficiency and the ability to generalize to novel task goals, potentially transforming how long-horizon tasks are approached in dynamic and unstructured environments.
- Integration of Learning and Planning: The approach bridges the gap between neural network-based learning and symbolic reasoning, paving the way for more advanced neuro-symbolic methods in robotic planning.
- Robustness and Adaptability: By using scene graph structures, the method demonstrates robustness to unseen tasks and adaptability, which is critical for autonomy in increasingly complex environments.
Future research could explore learning the symbol mapping function automatically, enhancing the framework’s flexibility and its ability to generalize to diverse environments. Additionally, extending the representation to include articulated and deformable objects could further broaden its applicability in real-world scenarios.
Overall, the paper contributes a substantial step forward in the development of manipulation planning algorithms, leveraging the potential of hierarchical structures and neuro-symbolic integration to address the demands of long-horizon tasks in robotics.