VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning (2410.23156v2)
Abstract: Broadly intelligent agents should form task-specific abstractions that selectively expose the essential elements of a task while abstracting away the complexity of the raw sensorimotor space. In this work, we present Neuro-Symbolic Predicates, a first-order abstraction language that combines the strengths of symbolic and neural knowledge representations. We outline an online algorithm for inventing such predicates and learning abstract world models. We compare our approach to hierarchical reinforcement learning, vision-language model planning, and symbolic predicate invention approaches on both in- and out-of-distribution tasks across five simulated robotic domains. Results show that our approach offers better sample complexity, stronger out-of-distribution generalization, and improved interpretability.
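To make the idea concrete, below is a minimal sketch of what one neuro-symbolic predicate might look like: a named Boolean classifier over raw state that mixes a symbolic geometric check with a neural vision-language model query. All names here (`RobotState`, `vlm_query`, `holding`, the height threshold) are illustrative assumptions, not the paper's released code; the VLM call is stubbed so the sketch runs standalone.

```python
from dataclasses import dataclass, field

@dataclass
class RobotState:
    # Raw sensorimotor observation: an RGB image plus low-level object poses.
    image: bytes
    poses: dict[str, tuple[float, float, float]] = field(default_factory=dict)

def vlm_query(image: bytes, question: str) -> bool:
    """Placeholder for a vision-language model call (e.g., an API request
    that returns a yes/no answer). Stubbed here so the example is runnable."""
    return True  # a real implementation would parse the VLM's response

def holding(state: RobotState, obj: str) -> bool:
    """A hypothetical neuro-symbolic predicate: grounds the abstract
    planning atom Holding(obj) in perception."""
    # Symbolic part: the object should sit near the gripper's grasp height.
    x, y, z = state.poses[obj]
    near_gripper = z > 0.3  # illustrative threshold, not from the paper
    # Neural part: ask the VLM to visually confirm the grasp.
    grasped = vlm_query(state.image, f"Is the robot gripper holding the {obj}?")
    return near_gripper and grasped

# Usage: a planner can evaluate predicates like this to lift raw
# observations into an abstract symbolic state for high-level planning.
state = RobotState(image=b"", poses={"block1": (0.1, 0.2, 0.45)})
print(holding(state, "block1"))  # -> True with the stubbed VLM
```

The design point this sketch illustrates is that each predicate is a small executable program: the symbolic portion is cheap and interpretable, while the neural portion handles perceptual questions that are hard to express in closed form.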