PROC2PDDL: Open-Domain Planning Representations from Texts (2403.00092v2)
Abstract: Planning in a text-based environment continues to be a major challenge for AI systems. Recent approaches have used LLMs to predict a planning domain definition (e.g., PDDL) but have only been evaluated in closed-domain simulated environments. To address this, we present Proc2PDDL, the first dataset containing open-domain procedural texts paired with expert-annotated PDDL representations. Using this dataset, we evaluate state-of-the-art models on defining the preconditions and effects of actions. We show that Proc2PDDL is highly challenging, with GPT-3.5's success rate close to 0% and GPT-4's around 35%. Our analysis reveals both syntactic and semantic errors, indicating LMs' deficiency in both generating domain-specific programs and reasoning about events. We hope this analysis and dataset help future progress towards integrating the best of LMs and formal planning.
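For readers unfamiliar with the task, a PDDL action definition of the kind the dataset annotates specifies an action's parameters, preconditions, and effects. The fragment below is a minimal illustrative sketch; the domain, predicate, and action names are invented for illustration and are not taken from Proc2PDDL:

```pddl
; Hypothetical domain fragment illustrating preconditions and effects.
; All names here are invented examples, not from the Proc2PDDL dataset.
(define (domain survival-example)
  (:requirements :strips :typing)
  (:types item location)
  (:predicates
    (at ?i - item ?l - location)   ; item ?i is at location ?l
    (has ?i - item)                ; the agent is holding item ?i
    (flammable ?i - item)
    (burning ?i - item))
  (:action light-fire
    :parameters (?i - item)
    :precondition (and (has ?i) (flammable ?i))
    :effect (and (burning ?i) (not (flammable ?i)))))
```

Given procedural text such as a how-to guide, the evaluated models must produce the `:precondition` and `:effect` clauses for each action, which a classical planner can then use to search for a valid plan.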