Offline Skill Graph (OSG): A Framework for Learning and Planning using Offline Reinforcement Learning Skills (2306.13630v1)
Abstract: Reinforcement Learning has received wide interest due to its success in competitive games. Yet its adoption in everyday applications (e.g., industrial, home, or healthcare settings) remains limited. In this paper, we address this limitation by presenting a framework for planning over offline skills and solving complex tasks in real-world environments. Our framework comprises three modules that together enable the agent to learn from previously collected data and generalize over it to solve long-horizon tasks. We demonstrate our approach on a robotic arm that is required to solve complex tasks.
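To make the core idea concrete, the sketch below illustrates one plausible reading of "planning over offline skills": abstract states form the nodes of a graph, pretrained offline-RL policies form the edges, and a long-horizon task is solved by searching for a chain of skills from the current abstract state to the goal. The `Skill`, `SkillGraph`, and `plan` names, the symbolic start/end states, and the BFS planner are illustrative assumptions, not the paper's actual modules or interfaces.

```python
# Minimal sketch (assumed interfaces, not the paper's implementation) of
# planning over a graph of pretrained offline-RL skills.

from dataclasses import dataclass, field
from collections import deque
from typing import Callable, Dict, List, Optional


@dataclass
class Skill:
    """A frozen offline-RL skill with symbolic endpoints."""
    name: str
    start_symbol: str   # abstract state from which the skill can be executed
    end_symbol: str     # abstract state the skill reliably reaches
    policy: Callable[[object], object] = field(default=lambda obs: None)  # pretrained policy


class SkillGraph:
    """Directed graph whose nodes are abstract states and edges are skills."""

    def __init__(self, skills: List[Skill]):
        self.edges: Dict[str, List[Skill]] = {}
        for s in skills:
            self.edges.setdefault(s.start_symbol, []).append(s)

    def plan(self, start: str, goal: str) -> Optional[List[Skill]]:
        """Breadth-first search for the shortest chain of skills from start to goal."""
        frontier = deque([(start, [])])
        visited = {start}
        while frontier:
            state, path = frontier.popleft()
            if state == goal:
                return path
            for skill in self.edges.get(state, []):
                if skill.end_symbol not in visited:
                    visited.add(skill.end_symbol)
                    frontier.append((skill.end_symbol, path + [skill]))
        return None  # goal not reachable with the available skills


if __name__ == "__main__":
    # Hypothetical manipulation skills; in practice each policy would be trained
    # offline (e.g., with CQL or TD3+BC) on previously collected data.
    skills = [
        Skill("reach_handle", "arm_home", "gripper_at_handle"),
        Skill("open_drawer", "gripper_at_handle", "drawer_open"),
        Skill("pick_object", "drawer_open", "object_grasped"),
    ]
    graph = SkillGraph(skills)
    plan = graph.plan("arm_home", "object_grasped")
    print([s.name for s in plan])  # ['reach_handle', 'open_drawer', 'pick_object']
```

At execution time, each skill's policy would be rolled out in sequence until its end condition is reached, handing control to the next skill in the plan.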