Constrained Ensemble Exploration for Unsupervised Skill Discovery (2405.16030v1)
Abstract: Unsupervised Reinforcement Learning (RL) provides a promising paradigm for learning useful behaviors via reward-free pre-training. Existing methods for unsupervised RL mainly conduct empowerment-driven skill discovery or entropy-based exploration. However, empowerment often leads to static skills, and pure exploration only maximizes state coverage rather than learning useful behaviors. In this paper, we propose a novel unsupervised RL framework based on an ensemble of skills, where each skill performs partition exploration guided by state prototypes. Each skill thus explores its clustered area locally, while the ensemble of skills maximizes the overall state coverage. We adopt state-distribution constraints between each skill's occupancy and its desired cluster to learn distinguishable skills. We provide theoretical analysis of the state entropy and the resulting skill distributions. In extensive experiments on several challenging tasks, our method learns well-explored ensemble skills and achieves superior performance on various downstream tasks compared to previous methods.
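To make the partition-exploration idea concrete, here is a minimal sketch of how per-skill intrinsic rewards could be formed: embedded states are hard-assigned to their nearest prototype, a k-nearest-neighbor distance serves as a particle-based entropy proxy within each partition, and a fixed penalty discourages a skill from straying outside its assigned cluster. This is an illustration under assumed design choices (nearest-prototype assignment, kNN bonus, constant penalty), not the paper's implementation; all function names are hypothetical.

```python
import numpy as np

def assign_clusters(embs, prototypes):
    """Hard-assign each embedded state to its nearest prototype."""
    # (batch, n_prototypes) pairwise Euclidean distances
    dists = np.linalg.norm(embs[:, None, :] - prototypes[None, :, :], axis=-1)
    return dists.argmin(axis=1)

def knn_novelty(embs, k=3):
    """k-nearest-neighbor distance: a particle-based proxy for state entropy."""
    d = np.linalg.norm(embs[:, None, :] - embs[None, :, :], axis=-1)
    d.sort(axis=1)          # column 0 holds the zero self-distance
    return d[:, k]          # distance to the k-th nearest neighbor

def skill_reward(embs, prototypes, skill_id, penalty=1.0):
    """Hypothetical per-skill intrinsic reward: a kNN exploration bonus,
    reduced by a constant penalty when the state falls outside the
    skill's assigned partition (a crude stand-in for an occupancy constraint)."""
    in_cluster = assign_clusters(embs, prototypes) == skill_id
    bonus = knn_novelty(embs)
    return np.where(in_cluster, bonus, bonus - penalty)

# Toy usage: 8 prototypes (one per skill) over 16-d state embeddings.
rng = np.random.default_rng(0)
embs = rng.normal(size=(256, 16))    # embedded states from one rollout
protos = rng.normal(size=(8, 16))    # e.g., obtained by clustering encoder features
rewards = skill_reward(embs, protos, skill_id=2)
```

Note that the abstract describes constraints between state distributions (skill occupancy vs. desired cluster), whereas this sketch approximates them with a per-step penalty; how the prototypes are learned is likewise left abstract here.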
Authors: Chenjia Bai, Rushuai Yang, Qiaosheng Zhang, Kang Xu, Yi Chen, Ting Xiao, Xuelong Li