Goal-Space Planning with Subgoal Models
Abstract: This paper investigates a new approach to model-based reinforcement learning using background planning: mixing (approximate) dynamic programming updates and model-free updates, similar to the Dyna architecture. Background planning with learned models is often worse than model-free alternatives, such as Double DQN, even though the former uses significantly more memory and computation. The fundamental problem is that learned models can be inaccurate and often generate invalid states, especially when iterated over many steps. In this paper, we avoid this limitation by constraining background planning to a set of (abstract) subgoals and learning only local, subgoal-conditioned models. This goal-space planning (GSP) approach is more computationally efficient, naturally incorporates temporal abstraction for faster long-horizon planning, and avoids learning the transition dynamics entirely. We show that our GSP algorithm can propagate value from an abstract space in a manner that helps a variety of base learners learn significantly faster in different domains.
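The core mechanism described in the abstract, planning over a small set of abstract subgoals using local, subgoal-conditioned models rather than a full transition model, can be illustrated with a minimal sketch. The example below is a hypothetical toy instance, not the paper's implementation: the arrays `r_gg` and `gamma_gg` stand in for learned subgoal-to-subgoal models (expected reward and accumulated discount for reaching subgoal `j` from subgoal `i`), and value iteration is run entirely in this abstract space.

```python
import numpy as np

# Hypothetical local subgoal models for a 3-subgoal chain 0 -> 1 -> 2.
# r_gg[i, j]: expected reward accumulated while travelling from
#             subgoal i to subgoal j (here, -1 per leg).
# gamma_gg[i, j]: accumulated discount for that leg; 0.0 marks
#             subgoal j as unreachable from subgoal i.
r_gg = np.array([
    [0.0, -1.0,  0.0],
    [0.0,  0.0, -1.0],
    [0.0,  0.0,  0.0],
])
gamma_gg = np.array([
    [0.0, 0.9, 0.0],
    [0.0, 0.0, 0.9],
    [0.0, 0.0, 0.0],
])
# Fixed values for terminal subgoals (subgoal 2 yields value 10).
v_fixed = np.array([0.0, 0.0, 10.0])

def plan_subgoal_values(r_gg, gamma_gg, v_fixed, iters=50):
    """Value iteration in the abstract subgoal space.

    Because planning is over a handful of subgoals rather than
    ground states, each sweep is cheap, and temporal abstraction
    is built in: one 'step' is an entire subgoal-to-subgoal leg.
    """
    v = v_fixed.copy()
    for _ in range(iters):
        reachable = gamma_gg > 0.0
        # Backup through each reachable subgoal-to-subgoal leg.
        q = np.where(reachable, r_gg + gamma_gg * v[None, :], -np.inf)
        best = q.max(axis=1)
        # Subgoals with no outgoing legs keep their fixed value.
        v = np.where(np.isfinite(best), best, v_fixed)
    return v

v = plan_subgoal_values(r_gg, gamma_gg, v_fixed)
# v[1] = -1 + 0.9 * 10 = 8.0; v[0] = -1 + 0.9 * 8.0 = 6.2
```

These subgoal values can then be used to guide a model-free base learner, for instance as bootstrap targets, which is the "propagate value from an abstract space" role the abstract describes; the projection step itself is omitted here.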