
NL2Plan: Robust LLM-Driven Planning from Minimal Text Descriptions (2405.04215v1)

Published 7 May 2024 in cs.AI

Abstract: Today's classical planners are powerful, but modeling input tasks in formats such as PDDL is tedious and error-prone. In contrast, planning with LLMs allows for almost any input text, but offers no guarantees on plan quality or even soundness. In an attempt to merge the best of these two approaches, some work has begun to use LLMs to automate parts of the PDDL creation process. However, these methods still require various degrees of expert input. We present NL2Plan, the first domain-agnostic offline LLM-driven planning system. NL2Plan uses an LLM to incrementally extract the necessary information from a short text prompt before creating a complete PDDL description of both the domain and the problem, which is finally solved by a classical planner. We evaluate NL2Plan on four planning domains and find that it solves 10 out of 15 tasks - a clear improvement over a plain chain-of-thought reasoning LLM approach, which only solves 2 tasks. Moreover, in two out of the five failure cases, instead of returning an invalid plan, NL2Plan reports that it failed to solve the task. In addition to using NL2Plan in end-to-end mode, users can inspect and correct all of its intermediate results, such as the PDDL representation, increasing explainability and making it an assistive tool for PDDL creation.
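The pipeline the abstract describes (short text prompt → incremental LLM extraction → complete PDDL domain and problem → classical planner) can be sketched as below. This is a minimal illustration, not NL2Plan's actual implementation: `query_llm`, the extraction steps, and the dummy planner are all hypothetical stand-ins.

```python
# Sketch of an LLM-to-classical-planner pipeline in the spirit of
# NL2Plan. All names are hypothetical; query_llm stubs a real LLM
# call, and solve() stands in for a planner such as Fast Downward.

def query_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call."""
    return f"<LLM output for: {prompt[:40]}>"

def extract_information(task: str) -> dict:
    """Incrementally extract the pieces a PDDL model needs."""
    return {
        "types": query_llm(f"List the object types in: {task}"),
        "predicates": query_llm(f"List the predicates in: {task}"),
        "actions": query_llm(f"List the actions in: {task}"),
    }

def build_pddl(info: dict, task: str) -> tuple[str, str]:
    """Ask the LLM to emit PDDL domain and problem descriptions."""
    domain = query_llm(f"Write a PDDL domain using: {info}")
    problem = query_llm(f"Write a PDDL problem for: {task}")
    return domain, problem

def solve(domain: str, problem: str):
    """Placeholder classical planner: returns a plan (a list of
    ground actions) or None when no plan is found."""
    return ["(pick-up a)", "(stack a b)"]  # dummy plan for illustration

def nl2plan_sketch(task: str):
    info = extract_information(task)
    domain, problem = build_pddl(info, task)
    plan = solve(domain, problem)
    # Key property from the abstract: when the planner fails, the
    # system reports failure rather than returning an unsound plan.
    return plan

print(nl2plan_sketch("Stack block a on block b."))
```

Because the PDDL model is an explicit intermediate artifact, a user can inspect or correct it before the planner runs, which is what makes the system usable as an assistive PDDL-creation tool.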

Authors (3)
  1. Elliot Gestrin (1 paper)
  2. Marco Kuhlmann (13 papers)
  3. Jendrik Seipp (6 papers)
Citations (2)