SayCanPay: Heuristic Planning with Large Language Models using Learnable Domain Knowledge (2308.12682v2)

Published 24 Aug 2023 in cs.AI

Abstract: LLMs have demonstrated impressive planning abilities due to their vast "world knowledge". Yet, despite recent progress, obtaining plans that are both feasible (grounded in affordances) and cost-effective (in plan length) remains a challenge. This contrasts with heuristic planning methods that employ domain knowledge (formalized in action models such as PDDL) and heuristic search to generate feasible, optimal plans. Inspired by this, we propose to combine the power of LLMs and heuristic planning by leveraging the world knowledge of LLMs and the principles of heuristic search. Our approach, SayCanPay, employs LLMs to generate actions (Say) guided by learnable domain knowledge that evaluates actions' feasibility (Can) and long-term reward/payoff (Pay), and heuristic search to select the best sequence of actions. Our contributions are (1) a novel framing of the LLM planning problem in the context of heuristic planning, (2) integrating grounding and cost-effective elements into the generated plans, and (3) using heuristic search over actions. Our extensive evaluations show that our model surpasses other LLM planning approaches.
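The abstract frames plan decoding as heuristic search over LLM-proposed actions, where each candidate action is weighted by its generation probability (Say), a learned feasibility estimate (Can), and a long-term payoff estimate (Pay). Below is a minimal sketch of such a decoding loop, assuming hypothetical callables say, can, and pay as stand-ins for the paper's trained models; the score combination and stop condition here are illustrative, not the authors' exact implementation.

```python
# Hypothetical sketch of a SayCanPay-style beam search over actions.
# The say/can/pay callables are assumptions: the paper trains dedicated
# Can (feasibility) and Pay (payoff) models; here they are stub inputs.

from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Candidate:
    actions: List[str]   # partial plan so far
    score: float         # accumulated Say * Can * Pay score

def saycanpay_beam_search(
    goal: str,
    say: Callable[[str, List[str]], List[Tuple[str, float]]],  # proposes (action, P_say)
    can: Callable[[str, List[str], str], float],               # feasibility in [0, 1]
    pay: Callable[[str, List[str], str], float],               # payoff estimate in [0, 1]
    beam_width: int = 3,
    max_steps: int = 10,
) -> List[str]:
    """Beam search over LLM-proposed actions, each scored by Say * Can * Pay."""
    beam = [Candidate(actions=[], score=1.0)]
    for _ in range(max_steps):
        expansions: List[Candidate] = []
        for cand in beam:
            for action, p_say in say(goal, cand.actions):
                combined = (p_say
                            * can(goal, cand.actions, action)
                            * pay(goal, cand.actions, action))
                expansions.append(Candidate(cand.actions + [action],
                                            cand.score * combined))
        beam = sorted(expansions, key=lambda c: c.score, reverse=True)[:beam_width]
        # "done" is an assumed stop token for this sketch.
        if beam and beam[0].actions and beam[0].actions[-1] == "done":
            break
    return beam[0].actions if beam else []

# Toy usage with trivial stubs (illustrative only):
if __name__ == "__main__":
    toy_say = lambda goal, hist: [("pick up block", 0.6), ("done", 0.4)]
    toy_can = lambda goal, hist, a: 1.0
    toy_pay = lambda goal, hist, a: 0.9 if a == "done" else 0.5
    print(saycanpay_beam_search("stack blocks", toy_say, toy_can, toy_pay))
```

With beam_width set to 1 this reduces to the greedy variant; widening the beam trades compute for plans that score better under the combined Say-Can-Pay heuristic.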

Authors (3)
  1. Rishi Hazra (15 papers)
  2. Pedro Zuidberg Dos Martires (22 papers)
  3. Luc De Raedt (55 papers)
Citations (21)