Creating Multi-Level Skill Hierarchies in Reinforcement Learning (2306.09980v2)

Published 16 Jun 2023 in cs.LG and cs.AI

Abstract: What is a useful skill hierarchy for an autonomous agent? We propose an answer based on a graphical representation of how the interaction between an agent and its environment may unfold. Our approach uses modularity maximisation as a central organising principle to expose the structure of the interaction graph at multiple levels of abstraction. The result is a collection of skills that operate at varying time scales, organised into a hierarchy, where skills that operate over longer time scales are composed of skills that operate over shorter time scales. The entire skill hierarchy is generated automatically, with no human intervention, including the skills themselves (their behaviour, when they can be called, and when they terminate) as well as the hierarchical dependency structure between them. In a wide range of environments, this approach generates skill hierarchies that are intuitively appealing and that considerably improve the learning performance of the agent.
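The central mechanism described in the abstract is modularity maximisation over the agent's interaction graph. As a rough illustration only (not the authors' implementation), the sketch below builds a hypothetical four-room grid-world transition graph and applies the Louvain method from networkx, which yields a nested sequence of partitions. One possible reading, assumed here for illustration, is that each community is a candidate skill's initiation set and states with edges leaving the community are candidate termination states or subgoals. The environment layout, the graph construction, and the community-to-skill mapping are all assumptions made for this example.

```python
# Minimal sketch (not the paper's implementation): build an interaction graph
# for a hypothetical four-room grid world, then expose its nested community
# structure with the Louvain method (modularity maximisation).
# Requires networkx >= 2.8 for louvain_partitions.
import networkx as nx
from networkx.algorithms.community import louvain_partitions, modularity


def four_rooms_graph(size=11, doorways=((5, 2), (2, 5), (5, 9), (9, 5))):
    """Undirected graph whose nodes are free cells and whose edges are the
    single-step transitions available to the agent (illustrative layout)."""
    walls = {(5, y) for y in range(size)} | {(x, 5) for x in range(size)}
    walls -= set(doorways)  # doorway cells stay passable
    G = nx.Graph()
    for x in range(size):
        for y in range(size):
            if (x, y) in walls:
                continue
            for dx, dy in ((1, 0), (0, 1)):  # connect to right and upper neighbours
                nbr = (x + dx, y + dy)
                if 0 <= nbr[0] < size and 0 <= nbr[1] < size and nbr not in walls:
                    G.add_edge((x, y), nbr)
    return G


G = four_rooms_graph()

# louvain_partitions yields one partition per level of the Louvain hierarchy:
# fine-grained communities first, progressively coarser (longer-time-scale) ones later.
for level, partition in enumerate(louvain_partitions(G, seed=0)):
    print(f"level {level}: {len(partition)} communities, "
          f"modularity = {modularity(G, partition):.3f}")
```

Louvain's successive coarsening levels give a hierarchy in which each higher-level community is a union of lower-level ones, mirroring the composition of long-time-scale skills from shorter-time-scale skills described in the abstract.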
