
PROC2PDDL: Open-Domain Planning Representations from Texts (2403.00092v2)

Published 29 Feb 2024 in cs.CL

Abstract: Planning in a text-based environment continues to be a major challenge for AI systems. Recent approaches have used LLMs to predict a planning domain definition (e.g., PDDL) but have only been evaluated in closed-domain simulated environments. To address this, we present Proc2PDDL, the first dataset containing open-domain procedural texts paired with expert-annotated PDDL representations. Using this dataset, we evaluate state-of-the-art models on defining the preconditions and effects of actions. We show that Proc2PDDL is highly challenging, with GPT-3.5's success rate close to 0% and GPT-4's around 35%. Our analysis shows both syntactic and semantic errors, indicating LMs' deficiency in both generating domain-specific programs and reasoning about events. We hope this analysis and dataset help future progress towards integrating the best of LMs and formal planning.
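
To make the prediction task concrete, the sketch below shows the kind of output the evaluated models are asked to produce: given procedural text and an action header, fill in the :precondition and :effect bodies of a PDDL action schema. The "boil-water" action, its predicates, and the procedural sentence are hypothetical illustrations invented here, not reproduced from the Proc2PDDL annotations; Python is used only to assemble and print the PDDL string.

```python
# Minimal sketch of the Proc2PDDL-style prediction target.
# The action name, parameters, predicates, and text below are
# hypothetical illustrations, not drawn from the actual dataset.

procedural_text = "Fill the pot with water and heat it over the fire until it boils."

# The model is given the action header (name and typed parameters)...
action_header = (
    "(:action boil-water\n"
    "  :parameters (?p - pot ?w - water ?f - fire)"
)

# ...and must predict syntactically valid, semantically faithful
# :precondition and :effect bodies grounded in the procedural text.
predicted_body = (
    "\n  :precondition (and (contains ?p ?w) (near ?p ?f) (burning ?f))"
    "\n  :effect (and (boiled ?w) (not (cold ?w))))"
)

print(action_header + predicted_body)
```

Outputs of this form can then be judged both syntactically (does the completed action parse as valid PDDL?) and semantically (do the predicted preconditions and effects match the expert annotation?), which corresponds to the two error types the abstract reports.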
