Transformer-based deep imitation learning for dual-arm robot manipulation (2108.00385v2)
Abstract: Deep imitation learning is promising for solving dexterous manipulation tasks because it requires neither an environment model nor pre-programmed robot behavior. However, its application to dual-arm manipulation remains challenging: the additional robot manipulator increases the number of state dimensions, which distracts the neural network and degrades its performance. We address this issue with a self-attention mechanism that computes dependencies between elements of a sequential input and focuses on the important ones. A Transformer, a variant of the self-attention architecture, is applied to deep imitation learning to solve dual-arm manipulation tasks in the real world. The proposed method was evaluated on dual-arm manipulation tasks with a real robot. The experimental results demonstrate that the Transformer-based deep imitation learning architecture attends to the important features among the sensory inputs, thereby reducing distractions and improving manipulation performance compared with a baseline architecture without self-attention.
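To make the idea concrete, below is a minimal sketch of how a Transformer encoder can apply self-attention across dual-arm sensory inputs before predicting the next bimanual action. This is not the authors' released code: the token layout (one token per modality), the dimensions, and the plain regression head are illustrative assumptions, and the paper's actual architecture (e.g., its output distribution and image feature extractor) may differ.

```python
# Sketch: self-attention over dual-arm sensory "tokens" for imitation learning.
# All module names, dimensions, and the regression head are assumptions for
# illustration, not the paper's exact architecture.
import torch
import torch.nn as nn


class DualArmTransformerPolicy(nn.Module):
    def __init__(self, image_feat_dim=64, arm_state_dim=7, action_dim=14,
                 d_model=128, nhead=4, num_layers=2):
        super().__init__()
        # Project each sensory element to a common embedding size so the
        # self-attention layers can treat them as a sequence of tokens.
        self.embed_image = nn.Linear(image_feat_dim, d_model)
        self.embed_left = nn.Linear(arm_state_dim, d_model)
        self.embed_right = nn.Linear(arm_state_dim, d_model)
        # Learned embeddings indicating which modality each token comes from.
        self.token_type = nn.Parameter(torch.zeros(3, d_model))

        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)

        # Simple regression head over the pooled tokens -> next dual-arm action.
        self.head = nn.Linear(d_model, action_dim)

    def forward(self, image_feat, left_state, right_state):
        # image_feat: (B, image_feat_dim); left/right_state: (B, arm_state_dim)
        tokens = torch.stack([
            self.embed_image(image_feat) + self.token_type[0],
            self.embed_left(left_state) + self.token_type[1],
            self.embed_right(right_state) + self.token_type[2],
        ], dim=1)                          # (B, 3, d_model)
        encoded = self.encoder(tokens)     # self-attention across modalities
        pooled = encoded.mean(dim=1)       # (B, d_model)
        return self.head(pooled)           # predicted dual-arm action


if __name__ == "__main__":
    policy = DualArmTransformerPolicy()
    img = torch.randn(8, 64)    # e.g., features extracted from a camera image
    left = torch.randn(8, 7)    # left-arm state (joint angles, etc.)
    right = torch.randn(8, 7)   # right-arm state
    action = policy(img, left, right)
    print(action.shape)         # torch.Size([8, 14])
```

In a behavioral-cloning setup such a policy would be trained with a supervised loss (e.g., mean squared error) between the predicted action and the teleoperated demonstration action; the attention weights inside the encoder are what allow the model to down-weight irrelevant state dimensions.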
Authors: Heecheol Kim, Yoshiyuki Ohmura, Yasuo Kuniyoshi