SkiLD: Unsupervised Skill Discovery Guided by Factor Interactions (2410.18416v1)
Abstract: Unsupervised skill discovery carries the promise that an intelligent agent can learn reusable skills through autonomous, reward-free environment interaction. Existing unsupervised skill discovery methods learn skills by encouraging distinguishable behaviors that cover diverse states. However, in complex environments with many state factors (e.g., household environments with many objects), learning skills that cover all possible states is infeasible, and naively encouraging state diversity often yields simple skills that are poorly suited to downstream tasks. This work introduces Skill Discovery from Local Dependencies (SkiLD), which leverages state factorization as a natural inductive bias to guide skill learning. The key intuition behind SkiLD is that skills inducing diverse interactions between state factors are often more valuable for solving downstream tasks. To this end, SkiLD develops a novel skill learning objective that explicitly encourages mastering skills that induce different interactions within an environment. We evaluate SkiLD in several domains with challenging, long-horizon sparse-reward tasks, including a realistic simulated household robot domain, where SkiLD learns skills with clear semantic meaning and outperforms existing unsupervised reinforcement learning methods that only maximize state coverage.
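The abstract does not spell out the objective, but the stated idea (reward skills whose induced factor interactions are distinguishable) can be sketched in the style of discriminator-based skill discovery: a classifier infers the skill from the pattern of local dependencies a rollout induced, rather than from the visited state. Everything below is a hypothetical toy construction for illustration, not SkiLD's actual implementation: the binary "dependency bits," the naive-Bayes discriminator, and the simulated rollouts are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_skills, n_deps = 4, 6  # toy sizes: 4 skills, 6 possible factor-pair interactions

# Count-based Bernoulli discriminator q(z | interaction pattern).
hits = np.ones((n_skills, n_deps))   # Laplace-smoothed counts of induced bits
trials = np.full(n_skills, 2.0)

# Simulated rollouts: skill z reliably induces dependency bit z, plus noise.
for _ in range(800):
    z = int(rng.integers(n_skills))
    pattern = (rng.random(n_deps) < 0.1).astype(float)
    pattern[z] = 1.0
    hits[z] += pattern
    trials[z] += 1.0

def skill_posterior(pattern):
    """q(z | pattern): naive Bayes over independent Bernoulli dependency bits."""
    p = hits / trials[:, None]                             # per-skill bit probabilities
    lik = np.prod(np.where(pattern > 0, p, 1.0 - p), axis=1)
    return lik / lik.sum()

# Intrinsic reward for executing skill z=2 and observing this interaction
# pattern: log q(z | pattern) - log p(z), high when the induced interactions
# identify the skill.
pattern = np.zeros(n_deps)
pattern[2] = 1.0
reward = float(np.log(skill_posterior(pattern)[2]) - np.log(1.0 / n_skills))
```

Maximizing such a reward pushes each skill toward inducing a distinctive set of factor interactions, which matches the abstract's intuition that interaction diversity, not raw state diversity, is the useful signal in factored environments.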
Authors: Zizhao Wang, Jiaheng Hu, Caleb Chuck, Stephen Chen, Roberto Martín-Martín, Amy Zhang, Scott Niekum, Peter Stone