Goal Exploration via Adaptive Skill Distribution for Goal-Conditioned Reinforcement Learning (2404.12999v1)
Abstract: Exploration efficiency poses a significant challenge in goal-conditioned reinforcement learning (GCRL) tasks, particularly those with long horizons and sparse rewards. A primary limitation on exploration efficiency is the agent's inability to leverage environmental structural patterns. In this study, we introduce a novel framework, Goal Exploration via Adaptive Skill Distribution (GEASD), designed to capture these patterns through an adaptive skill distribution learned during training. The distribution is optimized to increase the local entropy of achieved goals within a contextual horizon, promoting goal-spreading behavior and facilitating deep exploration in states containing familiar structural patterns. Our experiments reveal marked improvements in exploration efficiency with the adaptive skill distribution compared to a uniform skill distribution. Additionally, the learned skill distribution demonstrates robust generalization, achieving substantial exploration progress in unseen tasks containing similar local structures.
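To make the mechanism the abstract describes concrete, here is a minimal Python sketch: each skill is scored by how much it spreads achieved goals, measured as a kernel-density (Parzen-window) entropy estimate over a recent "contextual horizon," and skills are then sampled from a softmax over those scores rather than uniformly. All names (`local_entropy`, `adaptive_skill_distribution`), the KDE-based entropy estimator, and the softmax parameterization are illustrative assumptions, not the paper's actual algorithm.

```python
# Hypothetical sketch of an adaptive skill distribution driven by the local
# entropy of achieved goals. Names and design choices are assumptions made
# for illustration; see the paper for the actual GEASD formulation.
import numpy as np

def local_entropy(goals: np.ndarray, bandwidth: float = 0.5) -> float:
    """Parzen-window entropy proxy over goals achieved in the contextual horizon.

    goals: (n, d) array of achieved goals. Returns the mean negative
    log of an unnormalized Gaussian kernel density (the missing normalizing
    constant only shifts the estimate, which is fine for ranking skills).
    """
    n = len(goals)
    if n < 2:
        return 0.0
    # Pairwise squared distances between achieved goals.
    diffs = goals[:, None, :] - goals[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    # Unnormalized Gaussian kernel density at each goal.
    log_kernel = -sq_dists / (2.0 * bandwidth ** 2)
    log_density = np.log(np.exp(log_kernel).sum(axis=1) / n + 1e-12)
    return float(-log_density.mean())

def adaptive_skill_distribution(entropy_scores: np.ndarray,
                                temperature: float = 1.0) -> np.ndarray:
    """Softmax over per-skill entropy scores: higher spread -> higher probability."""
    logits = entropy_scores / temperature
    logits = logits - logits.max()  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# Toy usage: skills whose rollouts spread achieved goals more widely are
# sampled more often than under a uniform skill distribution.
rng = np.random.default_rng(0)
num_skills, horizon, goal_dim = 4, 32, 2
scores = np.empty(num_skills)
for k in range(num_skills):
    # Stand-in rollouts: each skill produces goals with a different spread.
    rollout_goals = rng.normal(scale=0.5 + k, size=(horizon, goal_dim))
    scores[k] = local_entropy(rollout_goals)
print(adaptive_skill_distribution(scores))  # favors the widest-spreading skill
```

In this sketch the temperature controls how far the skill distribution departs from uniform: a high temperature recovers uniform sampling, while a low one concentrates probability on the skills that most increase goal spread.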