
Offline Skill Graph (OSG): A Framework for Learning and Planning using Offline Reinforcement Learning Skills (2306.13630v1)

Published 23 Jun 2023 in cs.RO, cs.AI, and cs.LG

Abstract: Reinforcement Learning has received wide interest due to its success in competitive games. Yet, its adoption in everyday applications (e.g., industrial, home, and healthcare settings) remains limited. In this paper, we address this limitation by presenting a framework for planning over offline skills and solving complex tasks in real-world environments. Our framework comprises three modules that together enable the agent to learn from previously collected data and generalize over it to solve long-horizon tasks. We demonstrate our approach by testing it on a robotic arm that is required to solve complex tasks.
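
The abstract describes planning over skills learned with offline RL, but the three modules are not detailed here. As a rough, illustrative sketch only, the code below plans a shortest sequence of pretrained skills over a hand-built directed graph of abstract states and then executes them in order. The class names, graph layout, and robotic-arm skills (reach_object, grasp, place_in_bin) are hypothetical assumptions for demonstration, not the paper's actual design.

```python
# Illustrative sketch: plan a sequence of pretrained "skills" over a directed
# graph of abstract states, then execute them in order. Skill names, graph
# structure, and the policy stubs are assumptions, not taken from the paper.
from collections import deque
from typing import Callable, Dict, List, Optional


class Skill:
    """A pretrained (e.g., offline-RL) policy that moves the agent
    from one abstract state to another."""

    def __init__(self, name: str, source: str, target: str,
                 policy: Callable[[], None]):
        self.name = name
        self.source = source   # abstract state where the skill applies
        self.target = target   # abstract state the skill reaches
        self.policy = policy   # stand-in for a learned policy rollout

    def execute(self) -> None:
        self.policy()


class SkillGraph:
    """Directed graph whose edges are skills; planning is a breadth-first
    search over abstract states."""

    def __init__(self) -> None:
        self.edges: Dict[str, List[Skill]] = {}

    def add_skill(self, skill: Skill) -> None:
        self.edges.setdefault(skill.source, []).append(skill)

    def plan(self, start: str, goal: str) -> Optional[List[Skill]]:
        """Return a shortest skill sequence from start to goal, or None."""
        frontier = deque([(start, [])])
        visited = {start}
        while frontier:
            state, path = frontier.popleft()
            if state == goal:
                return path
            for skill in self.edges.get(state, []):
                if skill.target not in visited:
                    visited.add(skill.target)
                    frontier.append((skill.target, path + [skill]))
        return None


if __name__ == "__main__":
    g = SkillGraph()
    # Hypothetical skills for a robotic-arm pick-and-place scenario.
    g.add_skill(Skill("reach_object", "home", "at_object", lambda: print("reach")))
    g.add_skill(Skill("grasp", "at_object", "holding", lambda: print("grasp")))
    g.add_skill(Skill("place_in_bin", "holding", "done", lambda: print("place")))

    plan = g.plan("home", "done")
    if plan is not None:
        for skill in plan:
            skill.execute()
```

In this sketch the planner only chains skills by matching abstract states; any notion of learned preconditions, value estimates, or module boundaries from the paper would replace these stubs.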

