Generalized Planning in PDDL Domains with Pretrained Large Language Models (2305.11014v2)

Published 18 May 2023 in cs.AI

Abstract: Recent work has considered whether LLMs can function as planners: given a task, generate a plan. We investigate whether LLMs can serve as generalized planners: given a domain and training tasks, generate a program that efficiently produces plans for other tasks in the domain. In particular, we consider PDDL domains and use GPT-4 to synthesize Python programs. We also consider (1) Chain-of-Thought (CoT) summarization, where the LLM is prompted to summarize the domain and propose a strategy in words before synthesizing the program; and (2) automated debugging, where the program is validated with respect to the training tasks, and in case of errors, the LLM is re-prompted with four types of feedback. We evaluate this approach in seven PDDL domains and compare it to four ablations and four baselines. Overall, we find that GPT-4 is a surprisingly powerful generalized planner. We also conclude that automated debugging is very important, that CoT summarization has non-uniform impact, that GPT-4 is far superior to GPT-3.5, and that just two training tasks are often sufficient for strong generalization.
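The abstract describes a generate–validate–re-prompt pipeline: GPT-4 synthesizes a Python planning program, the program is checked against the training tasks, and any error is fed back to the model for another attempt. Below is a minimal, self-contained sketch of that loop. The LLM call is stubbed out (`fake_llm`), and the toy tasks, function names, and feedback strings are hypothetical illustrations, not the paper's actual prompts, domains, or feedback format.

```python
# Hypothetical sketch of the automated-debugging loop from the abstract.
# `fake_llm` stands in for a GPT-4 call: its first attempt has an
# off-by-one bug; once feedback appears in the prompt it returns a fix.

def fake_llm(prompt):
    if "Feedback:" in prompt:
        return "def plan(task):\n    return list(range(task))"
    return "def plan(task):\n    return list(range(task - 1))"  # buggy


def validate(program_src, training_tasks):
    """Run the synthesized program on each training task and return an
    error message (or None), in the spirit of the paper's feedback types:
    syntax errors, runtime errors, and invalid plans."""
    namespace = {}
    try:
        exec(program_src, namespace)           # catch Python syntax errors
    except SyntaxError as e:
        return f"Python syntax error: {e}"
    plan_fn = namespace["plan"]
    for task, expected in training_tasks:
        try:
            plan = plan_fn(task)               # catch runtime errors
        except Exception as e:
            return f"Program raised {e!r} on task {task}"
        if plan != expected:                   # catch invalid plans
            return f"Invalid plan {plan} for task {task}; expected {expected}"
    return None


def synthesize_generalized_planner(domain_prompt, training_tasks,
                                   max_debug_rounds=4):
    """Generate a program, validate it on the training tasks, and
    re-prompt with error feedback until it passes or rounds run out."""
    prompt = domain_prompt
    for _ in range(max_debug_rounds):
        program = fake_llm(prompt)
        feedback = validate(program, training_tasks)
        if feedback is None:
            return program                     # generalizes on training tasks
        prompt += f"\nFeedback: {feedback}\nPlease fix the program."
    return None


tasks = [(3, [0, 1, 2]), (5, [0, 1, 2, 3, 4])]
program = synthesize_generalized_planner(
    "Summarize the domain, then write plan(task).", tasks)
```

In this toy run the first candidate fails validation, the error message is appended to the prompt, and the second candidate passes on both training tasks, mirroring the abstract's finding that a couple of training tasks plus automated debugging can suffice.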

Authors (6)
  1. Tom Silver (31 papers)
  2. Soham Dan (41 papers)
  3. Kavitha Srinivas (25 papers)
  4. Joshua B. Tenenbaum (257 papers)
  5. Leslie Pack Kaelbling (94 papers)
  6. Michael Katz (21 papers)
Citations (93)