Balancing Both Behavioral Quality and Diversity in Unsupervised Skill Discovery (2309.17203v2)
Abstract: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. Unsupervised skill discovery aims to uncover diverse and exploratory skills without extrinsic reward, so that the discovered skills can efficiently adapt to multiple downstream tasks. However, recent advanced methods struggle to balance behavioral exploration and diversity, particularly when the agent dynamics are complex and potential skills are hard to discern (e.g., robot behavior discovery). In this paper, we propose \textbf{Co}ntrastive \textbf{m}ulti-objective \textbf{S}kill \textbf{D}iscovery \textbf{(ComSD)}, which discovers exploratory and diverse behaviors through a novel intrinsic incentive, named the contrastive multi-objective reward. It combines a novel contrastive-learning-based diversity reward, which effectively drives agents to discern existing skills, with a particle-based exploration reward that encourages agents to reach and learn new behaviors. Moreover, a novel dynamic weighting mechanism between these two rewards is proposed to balance diversity and exploration, further improving behavioral quality. Extensive experiments and analysis demonstrate that ComSD can generate diverse behaviors at different exploratory levels for complex multi-joint robots, enabling state-of-the-art performance across 32 challenging downstream adaptation tasks, which recent advanced methods cannot achieve. Code will be released after publication.
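The abstract describes an intrinsic reward built from two parts: a contrastive (InfoNCE-style) diversity term that scores how well a state matches its own skill against other skills, and a particle-based exploration term estimated from nearest-neighbor distances in state space, combined under a dynamic weight. The sketch below is a minimal illustration of that general recipe, not the paper's actual implementation; all function names, the batch-wise k-NN estimator, and the scalar weight `alpha` are assumptions for illustration.

```python
import numpy as np

def knn_exploration_reward(states, k=3):
    # Particle-based entropy estimate: reward each state by its distance
    # to the k-th nearest neighbor within the batch (larger = more novel).
    dists = np.linalg.norm(states[:, None, :] - states[None, :, :], axis=-1)
    dists.sort(axis=1)                 # column 0 is the zero self-distance
    return np.log(1.0 + dists[:, k])   # non-negative, grows with sparsity

def contrastive_diversity_reward(state_emb, skill_emb, temperature=0.5):
    # InfoNCE-style score: how strongly each state embedding aligns with
    # its own skill embedding relative to the other skills in the batch.
    logits = state_emb @ skill_emb.T / temperature
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return np.diag(log_probs)          # log-probability of the matched pair

def combined_reward(states, state_emb, skill_emb, alpha):
    # alpha in [0, 1] trades off diversity against exploration; in the
    # paper this weight is adjusted dynamically rather than held fixed.
    return alpha * contrastive_diversity_reward(state_emb, skill_emb) \
        + (1.0 - alpha) * knn_exploration_reward(states)
```

In practice the embeddings would come from learned encoders trained with the contrastive objective, and the weight would be scheduled or adapted during pretraining; the sketch only shows how the two signals compose into a single intrinsic reward.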
Authors: Xin Liu, Yaran Chen, Dongbin Zhao