
SkiLD: Unsupervised Skill Discovery Guided by Factor Interactions (2410.18416v1)

Published 24 Oct 2024 in cs.LG and cs.RO

Abstract: Unsupervised skill discovery carries the promise that an intelligent agent can learn reusable skills through autonomous, reward-free environment interaction. Existing unsupervised skill discovery methods learn skills by encouraging distinguishable behaviors that cover diverse states. However, in complex environments with many state factors (e.g., household environments with many objects), learning skills that cover all possible states is impossible, and naively encouraging state diversity often leads to simple skills that are not ideal for solving downstream tasks. This work introduces Skill Discovery from Local Dependencies (SkiLD), which leverages state factorization as a natural inductive bias to guide the skill learning process. The key intuition guiding SkiLD is that skills that induce diverse interactions between state factors are often more valuable for solving downstream tasks. To this end, SkiLD develops a novel skill learning objective that explicitly encourages the mastering of skills that effectively induce different interactions within an environment. We evaluate SkiLD in several domains with challenging, long-horizon sparse reward tasks including a realistic simulated household robot domain, where SkiLD successfully learns skills with clear semantic meaning and shows superior performance compared to existing unsupervised reinforcement learning methods that only maximize state coverage.

References (67)
  1. Modular multitask reinforcement learning with policy sketches. In International conference on machine learning, pages 166–175. PMLR, 2017.
  2. Hindsight experience replay. Advances in neural information processing systems, 30, 2017.
  3. The option-critic architecture. In Proceedings of the AAAI conference on artificial intelligence, volume 31, 2017.
  4. Effectively learning initiation sets in hierarchical reinforcement learning. Advances in Neural Information Processing Systems, 36, 2024.
  5. A causal analysis of harm. Advances in Neural Information Processing Systems, 35:2365–2376, 2022.
  6. From dependency to causality: a machine learning approach. J. Mach. Learn. Res., 16(1):2437–2457, 2015.
  7. The perils of trial-and-error reward design: misdesign through overfitting and invalid task specifications. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 5920–5929, 2023.
  8. Decision-theoretic planning: Structural assumptions and computational leverage. Journal of Artificial Intelligence Research, 11:1–94, 1999.
  9. Context-specific independence in bayesian networks. arXiv preprint arXiv:1302.3562, 2013.
  10. A causal approach to tool affordance learning. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 8394–8399. IEEE, 2020.
  11. Explore, discover and learn: Unsupervised discovery of state-covering skills. In International Conference on Machine Learning, pages 1317–1327. PMLR, 2020.
  12. Specializing versatile skill libraries using local mixture of experts. In Conference on Robot Learning, pages 1423–1433. PMLR, 2022.
  13. Hypothesis-driven skill discovery for hierarchical deep reinforcement learning. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 5572–5579. IEEE, 2020.
  14. Granger-causal hierarchical skill discovery. arXiv preprint arXiv:2306.09509, 2023.
  15. Automated discovery of functional actual causes in complex environments. arXiv preprint arXiv:2404.10883, 2024.
  16. Attention option-critic. arXiv preprint arXiv:2201.02628, 2022.
  17. Disentangling controlled effects for hierarchical reinforcement learning. In Bernhard Schölkopf, Caroline Uhler, and Kun Zhang, editors, Proceedings of the First Conference on Causal Learning and Reasoning, volume 177 of Proceedings of Machine Learning Research, pages 178–200. PMLR, 11–13 Apr 2022. URL https://proceedings.mlr.press/v177/corcoll22a.html.
  18. What can ai learn from human exploration? intrinsically-motivated humans and agents in open-world exploration. In NeurIPS 2023 workshop: Information-Theoretic Principles in Cognitive Systems, 2023.
  19. Diversity is all you need: Learning skills without a reward function. arXiv preprint arXiv:1802.06070, 2018.
  20. Discovering faster matrix multiplication algorithms with reinforcement learning. Nature, 610(7930):47–53, 2022.
  21. Learning dynamic attribute-factored world models for efficient multi-object reinforcement learning. Advances in Neural Information Processing Systems, 36, 2024.
  22. Clic: Curriculum learning and imitation for object control in nonrewarding environments. IEEE Transactions on Cognitive and Developmental Systems, 13(2):239–248, 2019.
  23. Latent space policies for hierarchical reinforcement learning. In International Conference on Machine Learning, pages 1851–1860. PMLR, 2018.
  24. Joseph Y Halpern. Actual causality. MIT Press, 2016.
  25. Causes and explanations: A structural-model approach. part i: Causes. The British journal for the philosophy of science, 2005.
  26. When waiting is not an option: Learning options with a deliberation cost. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.
  27. Disentangled unsupervised skill discovery for efficient hierarchical reinforcement learning. In Workshop on Reinforcement Learning Beyond Rewards@ Reinforcement Learning Conference 2024.
  28. Causal policy gradient for whole-body mobile manipulation. arXiv preprint arXiv:2305.04866, 2023.
  29. Causality-driven hierarchical structure discovery for reinforcement learning. Advances in Neural Information Processing Systems, 35:20064–20076, 2022.
  30. Planning for multi-object manipulation with graph neural network relational classifiers. In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 1822–1829. IEEE, 2023.
  31. Object-centric slot diffusion. arXiv preprint arXiv:2303.10834, 2023.
  32. Mini-behavior: A procedurally generated benchmark for long-horizon decision-making in embodied ai. arXiv preprint arXiv:2310.01824, 2023.
  33. Champion-level drone racing using deep reinforcement learning. Nature, 620(7976):982–987, 2023.
  34. Options of interest: Temporal abstraction with interest functions. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 4444–4451, 2020.
  35. Unsupervised skill discovery with bottleneck option learning. In Marina Meila and Tong Zhang, editors, Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pages 5572–5582. PMLR, 18–24 Jul 2021. URL https://proceedings.mlr.press/v139/kim21j.html.
  36. Deep Laplacian-based options for temporally-extended exploration. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett, editors, Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Learning Research, pages 17198–17217. PMLR, 23–29 Jul 2023. URL https://proceedings.mlr.press/v202/klissarov23a.html.
  37. Exploration in deep reinforcement learning: A survey. Information Fusion, 85:1–22, 2022. ISSN 1566-2535. doi: https://doi.org/10.1016/j.inffus.2022.03.003. URL https://www.sciencedirect.com/science/article/pii/S1566253522000288.
  38. Urlb: Unsupervised reinforcement learning benchmark, 2021.
  39. Cic: Contrastive intrinsic control for unsupervised skill discovery. arXiv preprint arXiv:2202.00161, 2022.
  40. Hierarchical reinforcement learning with hindsight. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=ryzECoAcY7.
  41. Hierarchical empowerment: Towards tractable empowerment-based skill-learning. arXiv preprint arXiv:2307.02728, 2023.
  42. igibson 2.0: Object-centric simulation for robot learning of everyday household tasks, 2021.
  43. igibson 2.0: Object-centric simulation for robot learning of everyday household tasks. In Aleksandra Faust, David Hsu, and Gerhard Neumann, editors, Proceedings of the 5th Conference on Robot Learning, volume 164 of Proceedings of Machine Learning Research, pages 455–465. PMLR, 08–11 Nov 2022. URL https://proceedings.mlr.press/v164/li22b.html.
  44. Dynamics-aware quality-diversity for efficient learning of skill repertoires. In 2022 International Conference on Robotics and Automation (ICRA), pages 5360–5366. IEEE, 2022.
  45. Behavior from the void: Unsupervised active pre-training. Advances in Neural Information Processing Systems, 34:18459–18473, 2021.
  46. Learning to identify critical states for reinforcement learning from videos. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 1955–1965, 2023.
  47. Weakly-supervised disentanglement without compromises. In International Conference on Machine Learning, pages 6348–6359. PMLR, 2020.
  48. Data-efficient hierarchical reinforcement learning. Advances in neural information processing systems, 31, 2018.
  49. Lipschitz-constrained unsupervised skill discovery. In International Conference on Learning Representations, 2021.
  50. Controllability-aware unsupervised skill discovery. arXiv preprint arXiv:2302.05103, 2023.
  51. End-to-end hierarchical reinforcement learning with integrated subgoal discovery. IEEE Transactions on Neural Networks and Learning Systems, 33(12):7778–7790, 2021.
  52. Judea Pearl. Causality. Cambridge university press, 2009.
  53. Counterfactual data augmentation using locally factored dynamics. Advances in Neural Information Processing Systems, 33:3976–3990, 2020.
  54. Mocoda: Model-based counterfactual data augmentation. Advances in Neural Information Processing Systems, 35:18143–18156, 2022.
  55. Exploiting contextual independence in probabilistic inference. Journal of Artificial Intelligence Research, 18:263–313, 2003.
  56. Learning abstract world models for value-preserving planning with options. In NeurIPS 2023 Workshop on Generalization in Planning, 2023.
  57. Proximal policy optimization algorithms, 2017.
  58. Causal influence detection for improving efficiency in reinforcement learning. Advances in Neural Information Processing Systems, 34:22905–22918, 2021.
  59. Learning disentangled skills for hierarchical reinforcement learning through trajectory autoencoder with weak labels. Expert Systems with Applications, page 120625, 2023.
  60. Between mdps and semi-mdps: A framework for temporal abstraction in reinforcement learning. Artificial intelligence, 112(1-2):181–211, 1999.
  61. Deep reinforcement learning for robotics: A survey of real-world successes. arXiv preprint arXiv:2408.03539, 2024.
  62. Feudal networks for hierarchical reinforcement learning. In International Conference on Machine Learning, pages 3540–3549. PMLR, 2017.
  63. Elden: Exploration via local dependencies. Advances in Neural Information Processing Systems, 36, 2024.
  64. Tianshou: A highly modularized deep reinforcement learning library. Journal of Machine Learning Research, 23(267):1–6, 2022. URL http://jmlr.org/papers/v23/21-1127.html.
  65. Outracing champion gran turismo drivers with deep reinforcement learning. Nature, 602(7896):223–228, 2022.
  66. Self-supervised visual reinforcement learning with object-centric representations. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=xppLmXCbOw1.
  67. Hierarchical reinforcement learning by discovering intrinsic options. arXiv preprint arXiv:2101.06521, 2021.
Authors (8)
  1. Zizhao Wang
  2. Jiaheng Hu
  3. Caleb Chuck
  4. Stephen Chen
  5. Roberto Martín-Martín
  6. Amy Zhang
  7. Scott Niekum
  8. Peter Stone

Summary

  • The paper introduces SkiLD, a novel unsupervised method that discovers skills by leveraging local dependencies among state factors.
  • It combines a state dependency graph with a diversity indicator to guide exploration and enhance skill utility.
  • Empirical results show SkiLD outperforms methods like DIAYN and CSD on complex downstream tasks.

Analysis of "SkiLD: Unsupervised Skill Discovery Guided by Factor Interactions"

This essay provides an overview of the paper "SkiLD: Unsupervised Skill Discovery Guided by Factor Interactions." The authors present SkiLD, a methodology for unsupervised skill discovery in complex environments characterized by multiple state factors. Rather than focusing solely on reaching diverse states, the approach leverages interactions between state factors to improve the diversity and downstream usefulness of the learned skills.

Technical Overview

SkiLD tackles a limitation of existing unsupervised skill discovery methods, which typically emphasize state diversity but struggle in environments with many state factors. In such environments, thorough state coverage is computationally infeasible, and naively maximizing state diversity yields simplistic skills of limited use for downstream tasks.

To address this, SkiLD (Skill Discovery from Local Dependencies) leverages state factorization as an inductive bias to guide skill learning. The core intuition is that skills inducing diverse interactions among state factors are more valuable for downstream tasks, so SkiLD adopts a skill learning objective that explicitly rewards inducing such interactions.

Methodological Contributions

  1. Skill Representation: A skill is specified as the combination of a state-factor dependency graph and a diversity indicator. The dependency graph encodes which interactions among state factors the skill should induce during execution.
  2. Graph-Selection Policy: A high-level policy selects target dependency graphs, directing exploration and skill-policy training toward graphs that are novel or not yet mastered.
  3. Skill Policy: Conditioned on the selected skill, a low-level policy learns to realize the desired interactions; a diversity reward additionally encourages reaching varied states for each interaction type (see the reward sketch after this list).
  4. Learning Local Dependencies: SkiLD identifies local dependencies using a learned dynamics model, applying pointwise conditional mutual information to decide whether an interaction occurred at a given transition (a minimal detection sketch follows below).
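
To make item 4 concrete, here is a minimal, self-contained sketch of PCMI-based local dependency detection. An edge i → j is declared at a transition when log p(s'_j | s, a) − log p(s'_j | s with factor i masked, a) exceeds a threshold, i.e. when knowing factor i noticeably sharpens the prediction of factor j's next value. The toy linear-Gaussian model, the masking interface, and the threshold value are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

class ToyFactoredDynamics:
    """Stand-in for a learned factored model p(s'_j | s, a).

    Each next-state factor is a noisy linear function of the current state,
    so log-likelihoods are Gaussian with a closed form; a real model would
    be a neural network with one prediction head per factor.
    """

    def __init__(self, influence: np.ndarray, noise_std: float = 0.1):
        self.influence = influence   # (n, n): influence[j, i] = weight of s_i on s'_j
        self.noise_std = noise_std

    def log_prob(self, j: int, s: np.ndarray, a: np.ndarray,
                 s_next_j: float, mask: np.ndarray) -> float:
        """log p(s'_j | masked state, action); masked-out factors are zeroed."""
        mean = float(self.influence[j] @ (s * mask))   # toy model ignores a
        var = self.noise_std ** 2
        return -0.5 * ((s_next_j - mean) ** 2 / var + np.log(2.0 * np.pi * var))


def local_dependency_graph(model, s, a, s_next, threshold=2.0):
    """Binary graph G with G[i, j] = 1 if factor i locally influenced s'_j.

    Pointwise conditional mutual information for the edge i -> j:
        pcmi = log p(s'_j | s, a) - log p(s'_j | s without factor i, a)
    """
    n = len(s)
    graph = np.zeros((n, n), dtype=int)
    full = np.ones(n)
    for j in range(n):
        lp_full = model.log_prob(j, s, a, s_next[j], full)
        for i in range(n):
            masked = full.copy()
            masked[i] = 0.0                    # counterfactually drop factor i
            lp_masked = model.log_prob(j, s, a, s_next[j], masked)
            if lp_full - lp_masked > threshold:
                graph[i, j] = 1
    return graph


if __name__ == "__main__":
    # Factor 1 depends on factor 0; factor 0 depends only on itself.
    W = np.array([[1.0, 0.0], [0.8, 1.0]])
    model = ToyFactoredDynamics(W)
    s = np.array([2.0, 1.0])
    s_next = W @ s                             # noiseless transition for clarity
    print(local_dependency_graph(model, s, np.zeros(1), s_next))
```

In this toy run the recovered graph matches the generating weights: factor 0 influences both factors, while factor 1 influences only itself. The detector operates per transition, which is what makes the dependencies "local": the same pair of factors may or may not interact depending on the current state (e.g., a robot arm influences an object only while in contact with it).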
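The skill reward can then be assembled from the detected graph. The sketch below combines a graph-achievement term (did the transition induce the target dependency graph?) with a diversity bonus from a skill discriminator, following the paper's description at a high level; the gating of the bonus, the weighting, and the function names (skill_reward, select_target_graph, q(z | s', g)) are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def skill_reward(achieved_graph: np.ndarray,
                 target_graph: np.ndarray,
                 diversity_log_prob: float,
                 bonus_weight: float = 0.1) -> float:
    """Reward for the low-level skill policy (illustrative form).

    achieved_graph:     binary dependency graph detected on this transition
    target_graph:       graph the high-level policy asked the skill to induce
    diversity_log_prob: log q(z | s', g) from a learned discriminator that
                        tries to recover the diversity indicator z from the
                        outcome state; maximizing it pushes skills sharing a
                        graph toward distinguishable outcome states

    The diversity bonus is gated on inducing the target graph, so the policy
    first learns to produce the interaction and only then diversifies.
    """
    induced = bool(np.array_equal(achieved_graph, target_graph))
    return float(induced) * (1.0 + bonus_weight * diversity_log_prob)


def select_target_graph(visit_counts: dict, candidates: list) -> np.ndarray:
    """Toy stand-in for the high-level graph-selection policy: pick the
    candidate dependency graph induced least often so far (count-based
    novelty substitutes for the learned selection criterion)."""
    return min(candidates, key=lambda g: visit_counts.get(g.tobytes(), 0))
```

A training loop would call select_target_graph at the start of each episode, run the skill policy with skill_reward as its reward signal, and update visit_counts from the graphs actually induced; preferring rarely induced graphs keeps exploration focused on interactions the agent has not yet mastered.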

Empirical Evaluation

SkiLD is evaluated in environments with considerable state complexity, including a realistic simulated household domain with long-horizon, sparse-reward tasks. Empirical results show that SkiLD outperforms existing methods such as DIAYN and CSD in both interaction diversity and downstream task performance, achieving higher success rates on complex tasks such as mixing ingredients or manipulating household objects.

Implications and Future Work

Practically, SkiLD offers a structured approach to building skill repertoires in environments with many interactive elements, such as robotics and digital game worlds. Theoretically, it opens a path toward integrating richer causal models to further exploit state-space factorization in RL environments.

Future research could extend SkiLD's applicability by employing disentangled representation learning to relax the assumption that a state factorization is given. Improvements in local dependency identification could likewise enhance the robustness of the framework across varied domains.

In summary, SkiLD represents a significant advancement in unsupervised skill discovery, enriching the agent's skill set by focusing on factor interactions rather than mere state diversity. This work contributes to bridging the gap between unsupervised skill discovery and its effective utilization in complex real-world tasks.
