Behavioral Cloning via Search in Embedded Demonstration Dataset (2306.09082v1)
Abstract: Behavioral cloning uses a dataset of demonstrations to learn a behavioral policy. To overcome various learning and policy adaptation problems, we propose using a latent space to index a demonstration dataset, instantly retrieve similar, relevant experiences, and copy behavior from those situations. Actions from a selected similar situation can be performed by the agent until the representations of the agent's current situation and the selected experience diverge in the latent space. Thus, we formulate our control problem as a search problem over a dataset of experts' demonstrations. We test our approach on the BASALT MineRL dataset in the latent representation of a Video PreTraining (VPT) model and compare it to state-of-the-art Minecraft agents. Our approach effectively recovers meaningful demonstrations and exhibits human-like behavior in the Minecraft environment across a wide variety of scenarios. Experimental results show that the performance of our search-based approach is comparable to that of trained models, while allowing zero-shot task adaptation by changing the demonstration examples.
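The control loop described in the abstract lends itself to a compact sketch: embed the current observation, find its nearest neighbor among embedded demonstration frames, and replay the expert's subsequent actions until the latent representations diverge beyond a threshold, at which point the search is repeated. The Python sketch below illustrates this idea under stated assumptions; the `SearchBCAgent` class, the cosine-distance divergence test, and the threshold value are illustrative choices, not the paper's implementation.

```python
import numpy as np

# Minimal sketch of search-based behavioral cloning, assuming the
# demonstration dataset has already been embedded by a pretrained
# encoder (e.g., a VPT-style model). `embeddings[i]` is the latent
# vector of demonstration frame i and `actions[i]` is the expert
# action recorded at that frame. All names here are hypothetical.

class SearchBCAgent:
    def __init__(self, embeddings, actions, divergence_threshold=0.5):
        # Normalize latents so nearest-neighbor search reduces to a
        # maximum dot product (cosine similarity).
        norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
        self.embeddings = embeddings / np.clip(norms, 1e-8, None)
        self.actions = actions
        self.threshold = divergence_threshold
        self.ref_idx = None  # index of the demonstration frame being copied

    def _search(self, latent):
        # Brute-force nearest neighbor over all demonstration frames;
        # a large dataset would call for an approximate index instead.
        return int(np.argmax(self.embeddings @ latent))

    def act(self, latent):
        latent = latent / max(np.linalg.norm(latent), 1e-8)
        if self.ref_idx is None or self.ref_idx >= len(self.actions):
            self.ref_idx = self._search(latent)
        # Cosine distance between the agent's current situation and
        # the demonstration frame it is currently copying.
        divergence = 1.0 - float(self.embeddings[self.ref_idx] @ latent)
        if divergence > self.threshold:
            # Situations have diverged: re-index into the dataset.
            self.ref_idx = self._search(latent)
        action = self.actions[self.ref_idx]
        self.ref_idx += 1  # keep replaying the selected trajectory
        return action
```

Re-searching only after divergence, rather than at every step, keeps the copied behavior temporally coherent, and swapping in a different set of demonstration arrays is what enables the zero-shot task adaptation mentioned in the abstract.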
Authors: Federico Malato, Florian Leopold, Ville Hautamäki, Andrew Melnik