Imitation Learning Inputting Image Feature to Each Layer of Neural Network (2401.09691v2)

Published 18 Jan 2024 in cs.RO, cs.AI, and cs.LG

Abstract: Imitation learning enables robots to learn and replicate human behavior from training data. Recent advances in machine learning enable end-to-end learning approaches that directly process high-dimensional observation data, such as images. However, when processing data from multiple modalities, these approaches tend to ignore data that correlate only weakly with the desired output, especially when short sampling periods are used. This paper presents a method that addresses this challenge by amplifying the influence of weakly correlated data: the data are fed into every layer of the neural network rather than only the first, so that diverse data sources are effectively incorporated into the learning process. Experiments on a simple pick-and-place task with raw images and joint information as input demonstrate significant improvements in success rate, even for data with short sampling periods.
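
The layer-wise injection described above lends itself to a short illustration. The following is a minimal sketch, not the authors' implementation: it assumes a PyTorch feed-forward policy in which a CNN-extracted image feature vector is concatenated to the input of every layer (and of the output head) instead of only the first, so the image modality keeps influencing the predicted joint command even when the joint-state stream dominates. All names and dimensions here (LayerwiseImageFeaturePolicy, joint_dim=7, img_feat_dim=64, hidden_dim=128) are illustrative assumptions, not values from the paper.

    import torch
    import torch.nn as nn

    class LayerwiseImageFeaturePolicy(nn.Module):
        """Hypothetical policy network: the image feature is re-injected
        at every layer, per the idea in the abstract. Sizes are illustrative."""

        def __init__(self, joint_dim=7, img_feat_dim=64, hidden_dim=128,
                     out_dim=7, n_layers=3):
            super().__init__()
            layers = []
            in_dim = joint_dim
            for _ in range(n_layers):
                # Each layer sees its hidden input *plus* the image feature,
                # so the weakly correlated image data is not washed out.
                layers.append(nn.Linear(in_dim + img_feat_dim, hidden_dim))
                in_dim = hidden_dim
            self.layers = nn.ModuleList(layers)
            self.act = nn.ReLU()
            # The output head also receives the image feature directly.
            self.head = nn.Linear(hidden_dim + img_feat_dim, out_dim)

        def forward(self, joints, img_feat):
            h = joints
            for layer in self.layers:
                h = self.act(layer(torch.cat([h, img_feat], dim=-1)))
            return self.head(torch.cat([h, img_feat], dim=-1))

    # Usage: a batch of joint states and precomputed CNN image features.
    policy = LayerwiseImageFeaturePolicy()
    joints = torch.randn(8, 7)        # e.g. 7-DoF joint angles
    img_feat = torch.randn(8, 64)     # e.g. features from a CNN encoder
    action = policy(joints, img_feat)  # -> (8, 7) predicted joint command

In a conventional design the image feature would be concatenated once at the input layer; the sketch's only change is repeating that concatenation at every layer, which is the mechanism the abstract credits for the improved success rates.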
