Gaze-based dual resolution deep imitation learning for high-precision dexterous robot manipulation (2102.01295v3)
Abstract: High-precision manipulation tasks, such as needle threading, are challenging. Physiological studies suggest that humans use low-resolution peripheral vision together with fast movement to transport the hand into the vicinity of an object, and high-resolution foveated vision to accurately home the hand onto the object. This study demonstrates that a deep imitation learning based method, inspired by the gaze-based dual-resolution visuomotor control system in humans, can solve the needle threading task. First, we recorded the gaze movements of a human operator teleoperating a robot. Then, we used a low-resolution peripheral image to bring the hand into the vicinity of the target, and only a high-resolution image around the gaze to precisely control the thread position once it was close to the target. The experimental results demonstrate that the proposed method enables precise manipulation tasks with a general-purpose robot manipulator while improving computational efficiency.
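To make the dual-resolution idea concrete, below is a minimal Python sketch of how a single camera frame could be split into the two inputs the abstract describes: a high-resolution "foveated" crop centered on the gaze point and a low-resolution "peripheral" view of the whole scene. This is not the authors' implementation; the function name `dual_resolution_inputs`, the crop and output sizes, and the use of OpenCV for resizing are all illustrative assumptions.

```python
import numpy as np
import cv2  # OpenCV, used here only for downsampling


def dual_resolution_inputs(frame, gaze_xy, fovea_size=64, peripheral_size=64):
    """Split one camera frame into foveated and peripheral inputs.

    frame:    H x W x 3 uint8 image from the robot's camera
              (assumed at least fovea_size pixels on each side).
    gaze_xy:  (x, y) pixel coordinates of the recorded or predicted gaze.
    Returns a full-resolution crop around the gaze and a downsampled
    copy of the entire frame.
    """
    h, w = frame.shape[:2]
    half = fovea_size // 2
    # Clamp the crop center so the window stays inside the frame.
    cx = int(np.clip(gaze_xy[0], half, w - half))
    cy = int(np.clip(gaze_xy[1], half, h - half))
    # High-resolution foveated crop: full pixel density, narrow field of view.
    foveated = frame[cy - half:cy + half, cx - half:cx + half]
    # Low-resolution peripheral image: full field of view, few pixels.
    peripheral = cv2.resize(frame, (peripheral_size, peripheral_size),
                            interpolation=cv2.INTER_AREA)
    return foveated, peripheral
```

In a setup like the one the abstract outlines, the peripheral image would drive the coarse reaching policy while the foveated crop drives fine positioning near the target; the actual resolutions, gaze predictor, and network architecture are those described in the full paper, not this sketch.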