Planning with a Learned Policy Basis to Optimally Solve Complex Tasks (2403.15301v2)

Published 22 Mar 2024 in cs.LG and cs.AI

Abstract: Conventional reinforcement learning (RL) methods can successfully solve a wide range of sequential decision problems. However, learning policies that can generalize predictably across multiple tasks in a setting with non-Markovian reward specifications is a challenging problem. We propose to use successor features to learn a policy basis so that each (sub)policy in it solves a well-defined subproblem. In a task described by a finite state automaton (FSA) that involves the same set of subproblems, the combination of these (sub)policies can then be used to generate an optimal solution without additional learning. In contrast to other methods that combine (sub)policies via planning, our method asymptotically attains global optimality, even in stochastic environments.
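
The abstract sketches the approach at a high level and gives no pseudocode. As a rough, non-authoritative illustration of the kind of policy composition it refers to, the snippet below shows generalized policy improvement (GPI) over a basis of successor-feature functions, a standard construction in the successor-features literature. All names here (gpi_action, the toy basis, the weight vector w) are hypothetical, and the paper's key step of planning over the FSA to decide how the basis is combined in each automaton state is deliberately left out.

```python
import numpy as np

def gpi_action(successor_features, state, task_weights, num_actions):
    """Greedy action under generalized policy improvement (GPI).

    successor_features: list of callables psi_i(state, action) -> np.ndarray (d,),
        one per basis (sub)policy.
    task_weights: np.ndarray (d,) such that Q_i(s, a) = psi_i(s, a) @ task_weights.
    """
    # Q-values for every basis policy and action: shape (num_policies, num_actions).
    q_values = np.array([
        [psi(state, a) @ task_weights for a in range(num_actions)]
        for psi in successor_features
    ])
    # GPI: act greedily w.r.t. the per-action maximum over basis policies.
    return int(q_values.max(axis=0).argmax())

# Toy usage with two hypothetical basis policies, 2-dimensional features, 3 actions.
psis = [
    lambda s, a: np.array([1.0 if a == 0 else 0.0, 0.0]),
    lambda s, a: np.array([0.0, 1.0 if a == 2 else 0.0]),
]
w = np.array([0.2, 1.0])  # weights of the current (sub)task
print(gpi_action(psis, state=None, task_weights=w, num_actions=3))  # -> 2
```

In the setting described by the abstract, planning over the finite state automaton would supply, for each automaton state, the values with which the learned policy basis is combined; the sketch above collapses that step into a single fixed weight vector.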

Authors (5)
  1. Guillermo Infante (3 papers)
  2. David Kuric (3 papers)
  3. Anders Jonsson (47 papers)
  4. Vicenç Gómez (39 papers)
  5. Herke van Hoof (38 papers)
