Dataset Clustering for Improved Offline Policy Learning (2402.09550v1)
Abstract: Offline policy learning aims to discover decision-making policies from previously collected datasets without additional online interaction with the environment. Because the training dataset is fixed, its quality is a crucial factor in the performance of the learned policy. This paper studies a dataset characteristic that we refer to as multi-behavior: the dataset is collected using multiple policies that exhibit distinct behaviors. In contrast, a uni-behavior dataset is collected using a single policy. We observe that policies learned from a uni-behavior dataset typically outperform those learned from multi-behavior datasets, even though the uni-behavior dataset has fewer examples and less diversity. We therefore propose a behavior-aware deep clustering approach that partitions multi-behavior datasets into several uni-behavior subsets, thereby benefiting downstream policy learning. Our approach is flexible and effective: it adaptively estimates the number of clusters and achieves high clustering accuracy, with an average Adjusted Rand Index (ARI) of 0.987 across various continuous control task datasets. Finally, we present examples of improved policy learning enabled by dataset clustering and discuss several scenarios in which our approach might benefit the offline policy learning community.
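The clustering accuracy above is reported as the Adjusted Rand Index, which compares a predicted partition of the dataset with a ground-truth one. It equals 1.0 when the two partitions agree up to a renaming of cluster labels, is close to 0 for a random assignment, and can be negative. The sketch below is a minimal from-scratch implementation of the standard Hubert and Arabie (1985) formula. It is not the authors' evaluation code; the function name `adjusted_rand_index` is hypothetical, and we assume the ground-truth labels identify which behavior policy generated each sample.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Hypothetical helper: Hubert-Arabie ARI between two flat clusterings.

    Assumes labels_true[k] is the behavior policy that generated sample k
    and labels_pred[k] is the cluster assigned to it.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)

    # Contingency counts n_ij, row sums a_i, and column sums b_j.
    pairs = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Number of sample pairs placed together in both partitions.
    index = sum(comb(c, 2) for c in pairs.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two samples: trivially identical

    # Expected index under random labeling with fixed cluster sizes.
    expected = sum_rows * sum_cols / total_pairs
    max_index = (sum_rows + sum_cols) / 2

    # The denominator is zero only when both partitions are identical
    # (all singletons or a single cluster), so the score is 1.0 there.
    if max_index == expected:
        return 1.0
    return (index - expected) / (max_index - expected)
```

For example, `adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])` returns 1.0, because swapping cluster names does not change the partition. `adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2])` returns 4/7 (about 0.571), because splitting one behavior across two clusters is penalized.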