
Contextual Pre-planning on Reward Machine Abstractions for Enhanced Transfer in Deep Reinforcement Learning (2307.05209v4)

Published 11 Jul 2023 in cs.AI and cs.LG

Abstract: Recent studies show that deep reinforcement learning (DRL) agents tend to overfit to the task on which they were trained and fail to adapt to minor environment changes. To expedite learning when transferring to unseen tasks, we propose a novel approach to representing the current task using reward machines (RMs), state machine abstractions that induce subtasks based on the current task's rewards and dynamics. Our method provides agents with symbolic representations of optimal transitions from their current abstract state and rewards them for achieving these transitions. These representations are shared across tasks, allowing agents to exploit knowledge of previously encountered symbols and transitions, thus enhancing transfer. Empirical results show that our representations improve sample efficiency and few-shot transfer in a variety of domains.
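To make the abstract's core idea concrete, the following is a minimal sketch of a reward machine: a finite-state machine whose transitions fire on high-level propositions (abstract events) and emit rewards, inducing subtasks over the underlying environment. All names here (`RewardMachine`, `step`, the propositions) are illustrative assumptions, not the paper's actual implementation or API.

```python
from dataclasses import dataclass, field

@dataclass
class RewardMachine:
    """Sketch of a reward machine: abstract states, transitions keyed on
    high-level propositions, and a scalar reward per transition."""
    initial_state: str
    # (abstract state, proposition) -> (next abstract state, reward)
    transitions: dict = field(default_factory=dict)

    def step(self, state, proposition):
        # Propositions with no matching transition leave the abstract
        # state unchanged and yield zero reward.
        return self.transitions.get((state, proposition), (state, 0.0))

# Hypothetical two-subtask machine: pick up a key, then open the door.
rm = RewardMachine(
    initial_state="u0",
    transitions={
        ("u0", "got_key"): ("u1", 0.5),
        ("u1", "opened_door"): ("u_terminal", 1.0),
    },
)

state = rm.initial_state
state, r1 = rm.step(state, "got_key")      # advance to "u1", reward 0.5
state, r2 = rm.step(state, "opened_door")  # reach terminal state, reward 1.0
```

Under this reading, the paper's contribution is to give the agent a symbolic view of which transition out of its current abstract state is optimal, and to reward it for achieving that transition; because the same symbols and transitions recur across tasks, this knowledge transfers.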

Authors (4)
  1. Guy Azran
  2. Mohamad H. Danesh
  3. Stefano V. Albrecht
  4. Sarah Keren