Learning adaptive planning representations with natural language guidance (2312.08566v1)

Published 13 Dec 2023 in cs.AI, cs.CL, and cs.RO

Abstract: Effective planning in the real world requires not only world knowledge, but the ability to leverage that knowledge to build the right representation of the task at hand. Decades of hierarchical planning techniques have used domain-specific temporal action abstractions to support efficient and accurate planning, almost always relying on human priors and domain knowledge to decompose hard tasks into smaller subproblems appropriate for a goal or set of goals. This paper describes Ada (Action Domain Acquisition), a framework for automatically constructing task-specific planning representations using task-general background knowledge from language models (LMs). Starting with a general-purpose hierarchical planner and a low-level goal-conditioned policy, Ada interactively learns a library of planner-compatible high-level action abstractions and low-level controllers adapted to a particular domain of planning tasks. On two language-guided interactive planning benchmarks (Mini Minecraft and ALFRED Household Tasks), Ada strongly outperforms other approaches that use LMs for sequential decision-making, offering more accurate plans and better generalization to complex tasks.


Summary

  • The paper introduces the Ada framework that automatically constructs symbolic planning operators and low-level controllers using language models for task-specific planning.
  • It demonstrates bi-level planning where high-level plans built as PDDL operators are refined by learned low-level control policies in interactive environments.
  • Experimental results on Mini Minecraft and ALFRED benchmarks highlight significant improvements in planning efficiency and operator generalization.

This paper presents the Action Domain Acquisition (Ada) framework, which addresses the challenge of constructing effective task-specific planning representations using task-general background knowledge from language models (LMs). The primary goal is to automatically construct hierarchical task domains that support efficient and accurate planning without relying extensively on human-engineered priors. Ada builds these domain abstractions by interactively learning a library of planner-compatible high-level action definitions, verified through execution in the environment.
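The interactive library-learning loop described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the helper names (`propose_with_lm`, `try_to_execute`) and the score-and-filter threshold are assumptions, and real operators carry typed parameters rather than flat predicate sets.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Operator:
    """A PDDL-style symbolic operator: name, preconditions, effects."""
    name: str
    preconditions: frozenset  # predicates that must hold before execution
    effects: frozenset        # predicates made true after execution

def learn_operator_library(tasks, propose_with_lm, try_to_execute, min_successes=2):
    """Hypothetical sketch of Ada's outer loop: an LM proposes candidate
    operators for each task, and a candidate enters the library only after
    plans that use it are verified by execution enough times."""
    library, tallies = set(), {}
    for task in tasks:
        for op in propose_with_lm(task, library):   # LM-proposed candidates
            if try_to_execute(op, task):            # grounded verification
                tallies[op] = tallies.get(op, 0) + 1
                if tallies[op] >= min_successes:    # keep reliable operators
                    library.add(op)
    return library
```

The key design point the sketch captures is that the LM only *proposes* abstractions; membership in the library is earned through grounded execution, which filters out hallucinated or inapplicable operators.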

Key Contributions:

  • Adaptive Representation Learning: Ada automatically constructs symbolic planning operators that specify preconditions and effects for actions, facilitating traditional symbolic planning. Simultaneously, it learns local controllers that can execute these operators via low-level actions in interactive environments.
  • Hierarchical Action Spaces: Ada leverages LMs to extract potential high-level actions in the format of Planning Domain Definition Language (PDDL) operators and uses these to build a compositional library that can adapt to varied planning tasks.
  • Bi-Level Planning: Once Ada has defined a library of actions, it employs a hierarchical approach to planning. High-level plans are constructed using symbolic operators, and low-level control policies are learned or refined to achieve subgoals imposed by these high-level plans.
  • Task Environment and Language Benchmarking: Evaluations on the language-guided benchmarks Mini Minecraft and ALFRED demonstrate Ada’s effectiveness in adapting general planning strategies to specific goals expressed in natural language, with substantial improvements over existing language-model-driven planning methods.
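The bi-level scheme in the contributions above can be sketched as follows. This is an illustrative simplification under stated assumptions: states and subgoals are modeled as sets of ground predicates, and the helpers `symbolic_plan` and `low_level_policy` are hypothetical stand-ins for the symbolic planner and the learned goal-conditioned controller.

```python
def bi_level_plan(goal, state, symbolic_plan, low_level_policy, max_steps=50):
    """Hypothetical sketch of bi-level planning: the symbolic planner
    produces a sequence of high-level operators, and each operator's
    effects become a subgoal that the low-level policy must reach."""
    for op in symbolic_plan(goal, state):             # high-level PDDL-style plan
        subgoal = op.effects                          # operator effects as subgoal
        for _ in range(max_steps):
            if subgoal <= state:                      # subgoal satisfied (subset)
                break
            state = low_level_policy(state, subgoal)  # take one low-level action
        else:
            return None                               # subgoal unreachable: replan
    return state
```

A failure at the low level (the `else` branch) is exactly the signal used to refine or discard an operator, tying plan execution back into the library-learning process.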

Experimental Results:

  • Mini Minecraft and ALFRED Benchmarks: Ada outperforms baselines such as low-level-planning-only approaches (e.g., direct goal translation), subgoal sequence prediction, and code-based policy prediction across both simple and complex planning tasks.
  • Ada achieves 100% success on mining and crafting tasks in Mini Minecraft, including compositional tasks requiring up to 26 steps. On ALFRED, Ada reaches a 79% task completion rate, a significant advance over baselines that struggled to exceed 21%.
  • Operator Generalization and Planning Efficiency: The framework shows that LLM-driven operators enable the generalization of learned actions to solve previously unseen tasks, effectively handling ambiguity and underspecification in human language instructions.

Challenges and Future Directions:

  • The framework currently relies on a predefined set of high-level predicates for initial state representation, limiting adaptability in the absence of pre-specified domain knowledge.
  • Expanding the role of multimodal LLMs may address challenges related to perceptual input integration and improve the handling of geometric or fine-grained motor tasks.

In conclusion, Ada leverages LMs to transition from generalized language knowledge to functional domain-specific planning representations, significantly advancing the capabilities of AI systems in constructing scalable and adaptive planning strategies from linguistic specifications. Importantly, the approach highlights the potential synergy between structured symbolic planning frameworks and the rich background knowledge encapsulated within modern LMs.