Prime the search: Using large language models for guiding geometric task and motion planning by warm-starting tree search (2506.07062v1)

Published 8 Jun 2025 in cs.RO and cs.AI

Abstract: The problem of relocating a set of objects to designated areas amidst movable obstacles can be framed as a Geometric Task and Motion Planning (G-TAMP) problem, a subclass of task and motion planning (TAMP). Traditional approaches to G-TAMP have relied either on domain-independent heuristics or on learning from planning experience to guide the search, both of which typically demand significant computational resources or data. In contrast, humans often use common sense to intuitively decide which objects to manipulate in G-TAMP problems. Inspired by this, we propose leveraging LLMs, which have common sense knowledge acquired from internet-scale data, to guide task planning in G-TAMP problems. To enable LLMs to perform geometric reasoning, we design a predicate-based prompt that encodes geometric information derived from a motion planning algorithm. We then query the LLM to generate a task plan, which is then used to search for a feasible set of continuous parameters. Since LLMs are prone to mistakes, instead of committing to LLM's outputs, we extend Monte Carlo Tree Search (MCTS) to a hybrid action space and use the LLM to guide the search. Unlike the previous approach that calls an LLM at every node and incurs high computational costs, we use it to warm-start the MCTS with the nodes explored in completing the LLM's task plan. On six different G-TAMP problems, we show our method outperforms previous LLM planners and pure search algorithms. Code can be found at: https://github.com/iMSquared/prime-the-search


Leveraging LLMs for Enhanced Geometric Task and Motion Planning

The paper "Prime the search: Using large language models for guiding geometric task and motion planning by warm-starting tree search," published in The International Journal of Robotics Research, proposes guiding Geometric Task and Motion Planning (G-TAMP) by combining large language models (LLMs) with Monte Carlo Tree Search (MCTS). Because traditional task and motion planning methods demand significant computation or data, the authors use the common sense knowledge that LLMs acquire from internet-scale data to help decide which objects to manipulate.

Overview of G-TAMP Challenges

G-TAMP problems require a robot to relocate a set of objects to designated areas amidst movable obstacles. Traditional approaches fall into two groups: pure planning algorithms, which rely on domain-independent heuristics, and learning-based methods, which need extensive planning experience to guide the search. Both typically demand significant computational resources or data, and they often struggle to use domain-specific information efficiently when resolving infeasibility.

Proposal for LLM Integration

The key insight of the paper is that LLMs, pre-trained on internet-scale data, can supply the kind of common sense reasoning that humans use to decide which objects to manipulate. The authors propose a system called Search Tree augmented by Language Model (STaLM), in which the LLM guides MCTS by warm-starting the tree search. The LLM is queried for a task plan, and the nodes explored while trying to complete that plan seed the search tree. MCTS then continues from this tree rather than committing to the LLM's output, so a mistaken plan can still be corrected. Because the LLM is not called at every node, the computational cost of LLM queries stays low.
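The warm-start idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Node` class, the `warm_start` and `ucb_select` functions, the action labels, and the use of plain UCB1 with an exploration constant `c` are all hypothetical choices made for the example, and the sketch covers only the discrete task level, not continuous parameters.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One search-tree node; `children` maps a discrete action to a child."""
    visits: int = 0
    total_value: float = 0.0
    children: dict = field(default_factory=dict)


def warm_start(root, llm_plan, plan_return):
    """Insert the LLM's plan as a path from the root and back up its return.

    `llm_plan` is a list of hypothetical action labels; `plan_return` is the
    return observed while trying to complete that plan (assumed given).
    """
    path = [root]
    node = root
    for action in llm_plan:
        node = node.children.setdefault(action, Node())
        path.append(node)
    for visited in path:
        visited.visits += 1
        visited.total_value += plan_return
    return root


def ucb_select(node, c=1.4):
    """Pick the child action with the highest UCB1 score."""
    def score(child):
        if child.visits == 0:
            return math.inf  # unvisited children are tried first
        mean = child.total_value / child.visits
        return mean + c * math.sqrt(math.log(node.visits) / child.visits)
    return max(node.children, key=lambda a: score(node.children[a]))


if __name__ == "__main__":
    root = Node()
    warm_start(root, ["pick(box_2)", "place(box_2, shelf)", "pick(target)"], 1.0)
    # A competing action that was tried once and failed.
    root.children["pick(box_1)"] = Node(visits=1, total_value=0.0)
    root.visits += 1
    print(ucb_select(root))
```

In this sketch the seeded path starts with non-zero statistics, so later selection steps favor it when the plan went well, yet the search remains free to expand other actions if the plan turns out to be infeasible.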

Methodology

The design uses a predicate-based prompt that encodes geometric information derived from a motion planning algorithm, which lets the LLM reason about geometry when producing a task plan. The paper also extends MCTS to a hybrid action space of discrete task actions and continuous parameters. This lets the search look for feasible continuous parameters for the LLM-generated plan and explore alternative actions when no feasible parameters are found.
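To make the predicate-based prompt concrete, here is a small Python sketch of how geometric facts might be rendered as text for an LLM. It is an assumption-laden illustration: the function `build_prompt`, the predicate names `occludes_pick` and `occludes_place`, the object names, and the prompt wording are hypothetical and are not taken from the paper. In the paper the facts come from a motion planning algorithm; here they are simply passed in as data.

```python
def build_prompt(goal, predicates):
    """Render geometric facts as predicate lines followed by the request.

    `predicates` is a list of (name, args) pairs. The names used in the
    example below are hypothetical, not the paper's predicate set.
    """
    facts = [f"{name}({', '.join(args)})" for name, args in predicates]
    lines = ["Facts:"] + [f"- {fact}" for fact in facts]
    lines += [f"Goal: {goal}", "List the objects to move, in order."]
    return "\n".join(lines)


if __name__ == "__main__":
    # Facts assumed to have been computed beforehand by a motion planner.
    example_facts = [
        ("occludes_pick", ("box_2", "target")),
        ("occludes_place", ("box_3", "target", "shelf")),
    ]
    print(build_prompt("in(target, shelf)", example_facts))
```

Encoding geometry as short symbolic facts keeps the prompt compact and lets a text-only model reason about which obstacles block which actions without seeing raw poses or meshes.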

Numerical Results and Computational Efficiency

On six different G-TAMP problems, the paper reports that STaLM outperforms previous LLM planners and pure search algorithms. Its efficiency comes from how it uses the LLM: unlike the previous approach that calls an LLM at every node and incurs high computational costs, STaLM queries the LLM only to warm-start the search tree.

Implications and Future Directions

This research suggests a way to integrate LLMs into robotic planning that draws on common sense knowledge instead of large amounts of planning data. Combining LLMs with MCTS could support practical applications in dynamic environments and improve the adaptability of autonomous systems. Future work might incorporate multi-modal foundation models to address two limitations: conveying geometry to the model and estimating state under uncertainty, both of which matter for real-world deployment.

Conclusion

The paper argues, with supporting experiments, that LLMs can usefully guide robotic planning. Warm-starting the tree search with an LLM-generated task plan keeps the benefits of search while adding common sense guidance at low query cost. The work is an example of the growing intersection of AI and robotics and of the potential for LLMs to improve the efficiency of task and motion planning.


Authors (4)

Dongryung Lee
Sejune Joo
Kimin Lee
Beomjoon Kim