Bounding the Optimal Value Function in Compositional Reinforcement Learning (2303.02557v2)

Published 5 Mar 2023 in cs.LG

Abstract: In the field of reinforcement learning (RL), agents are often tasked with solving a variety of problems differing only in their reward functions. In order to quickly obtain solutions to unseen problems with new reward functions, a popular approach involves functional composition of previously solved tasks. However, previous work using such functional composition has primarily focused on specific instances of composition functions whose limiting assumptions allow for exact zero-shot composition. Our work unifies these examples and provides a more general framework for compositionality in both standard and entropy-regularized RL. We find that, for a broad class of functions, the optimal solution for the composite task of interest can be related to the known primitive task solutions. Specifically, we present double-sided inequalities relating the optimal composite value function to the value functions for the primitive tasks. We also show that the regret of using a zero-shot policy can be bounded for this class of functions. The derived bounds can be used to develop clipping approaches for reducing uncertainty during training, allowing agents to quickly adapt to new tasks.
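The abstract describes double-sided bounds on the optimal value function of a composite task, expressed in terms of the known primitive-task value functions, and notes that these bounds can be used to clip value estimates while training on a new task. The sketch below is only an illustration of that clipping idea, not the paper's exact algorithm: the bound constructions (`composite_lower_bound`, `composite_upper_bound`) and the slack term are hypothetical placeholders standing in for the inequalities derived in the paper.

```python
import numpy as np

def clipped_td_target(reward, next_q_estimate, lower, upper, gamma=0.99):
    """One-step TD target for the composite task, clipped into [lower, upper]."""
    target = reward + gamma * next_q_estimate
    return np.clip(target, lower, upper)

def composite_lower_bound(primitive_qs):
    # Hypothetical example: a pointwise max over primitive-task values,
    # standing in for the paper's lower bound on the composite value.
    return np.max(primitive_qs, axis=0)

def composite_upper_bound(primitive_qs, slack=0.0):
    # Hypothetical example: a pointwise sum plus a slack term, standing in
    # for the correction constants in the paper's upper bound.
    return np.sum(primitive_qs, axis=0) + slack

# Usage: two already-solved primitive tasks, with their Q-values at the
# sampled next state-action pairs stacked along axis 0.
primitive_qs = np.array([[1.0, 0.5, 2.0],
                         [0.8, 1.2, 0.3]])
lower = composite_lower_bound(primitive_qs)
upper = composite_upper_bound(primitive_qs, slack=0.5)
target = clipped_td_target(reward=1.0,
                           next_q_estimate=np.array([1.5, 0.2, 3.0]),
                           lower=lower, upper=upper)
```

The intent mirrors the abstract's claim: restricting value estimates to a provably valid interval reduces uncertainty early in training, so an agent adapting to a new composite reward starts closer to the optimal solution than an unconstrained learner would.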
