Robot Synesthesia: In-Hand Manipulation with Visuotactile Sensing (2312.01853v3)

Published 4 Dec 2023 in cs.RO, cs.CV, and cs.LG

Abstract: Executing contact-rich manipulation tasks necessitates the fusion of tactile and visual feedback. However, the distinct nature of these modalities poses significant challenges. In this paper, we introduce a system that leverages visual and tactile sensory inputs to enable dexterous in-hand manipulation. Specifically, we propose Robot Synesthesia, a novel point cloud-based tactile representation inspired by human tactile-visual synesthesia. This approach allows for the simultaneous and seamless integration of both sensory inputs, offering richer spatial information and facilitating better reasoning about robot actions. The method, trained in a simulated environment and then deployed to a real robot, is applicable to various in-hand object rotation tasks. Comprehensive ablations are performed on how the integration of vision and touch can improve reinforcement learning and Sim2Real performance. Our project page is available at https://yingyuan0414.github.io/visuotactile/ .

Authors (9)
  1. Ying Yuan (95 papers)
  2. Haichuan Che (3 papers)
  3. Yuzhe Qin (37 papers)
  4. Binghao Huang (10 papers)
  5. Zhao-Heng Yin (17 papers)
  6. Kang-Won Lee (2 papers)
  7. Yi Wu (171 papers)
  8. Soo-Chul Lim (2 papers)
  9. Xiaolong Wang (243 papers)
Citations (26)

Summary

Introduction

Executing contact-rich manipulation tasks with robotic systems requires a nuanced integration of sensory inputs, specifically vision and touch. Fusing these modalities is difficult because of their fundamentally different natures: tactile information tends to be sparse and low-dimensional, providing localized contact data, whereas visual feedback is dense and high-dimensional, offering a wide array of environmental cues. The challenge lies not only in processing these dissimilar data streams effectively but also in integrating them so that a robot can perform informed, dexterous manipulation.

Visuotactile Representation

To address this integration challenge, the paper introduces Robot Synesthesia, a novel approach inspired by human tactile-visual synesthesia, in which certain individuals perceive colors when they touch objects. Robot Synesthesia represents tactile data from Force-Sensing Resistor (FSR) sensors as a point cloud and combines it with a camera-generated point cloud in a single three-dimensional space. This representation preserves the spatial relationships between the robot's links, the sensors, and the objects, melding vision and touch into one cohesive sensory stream. Tactile point clouds are easy to generate in both simulated and real-world settings, which benefits in-hand manipulation by narrowing the Sim2Real gap and improving spatial reasoning.
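
To make the idea concrete, below is a minimal Python sketch of how such a fused visuotactile point cloud might be constructed. This is not the authors' implementation: the sensor count, the contact threshold, and the one-hot source tag appended to each point are illustrative assumptions, and the sensor positions are presumed to come from forward kinematics of the hand.

```python
# Minimal sketch of the tactile point cloud idea, not the authors' implementation.
# Assumes per-sensor 3D positions (e.g. from forward kinematics) and normalized
# FSR readings; the threshold and one-hot source tag are illustrative choices.
import numpy as np

def tactile_point_cloud(sensor_positions: np.ndarray,
                        fsr_readings: np.ndarray,
                        contact_threshold: float = 0.1) -> np.ndarray:
    """Place a 3D point at each sensor whose reading indicates contact.

    sensor_positions: (N, 3) sensor locations expressed in the camera/world frame.
    fsr_readings:     (N,) normalized FSR values in [0, 1].
    Returns an (M, 3) array with one point per sensor in contact (M <= N).
    """
    in_contact = fsr_readings > contact_threshold
    return sensor_positions[in_contact]

def fuse_point_clouds(camera_points: np.ndarray,
                      tactile_points: np.ndarray) -> np.ndarray:
    """Concatenate camera and tactile points into one cloud, tagging each point
    with a one-hot source feature so a downstream encoder can tell them apart."""
    cam = np.concatenate([camera_points,
                          np.tile([1.0, 0.0], (len(camera_points), 1))], axis=1)
    tac = np.concatenate([tactile_points,
                          np.tile([0.0, 1.0], (len(tactile_points), 1))], axis=1)
    return np.concatenate([cam, tac], axis=0)  # (P, 5): xyz + source one-hot

# Example usage with random stand-in data.
camera_points = np.random.rand(512, 3)       # depth-camera point cloud
sensor_positions = np.random.rand(16, 3)     # hypothetical 16 FSR sensors on the hand
fsr_readings = np.random.rand(16)            # normalized contact readings
fused = fuse_point_clouds(camera_points,
                          tactile_point_cloud(sensor_positions, fsr_readings))
print(fused.shape)
```

In practice the fused cloud would typically be downsampled to a fixed number of points before being passed to a point cloud encoder.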

Training Pipeline

The training process has two stages. First, a 'teacher' policy is trained in simulation with reinforcement learning (RL), with access to privileged state information such as the robot's joint positions and the object pose. This teacher then supervises a 'student' policy that consumes only the tactile and visual point cloud data. The student is initially trained with Behavior Cloning on trajectories from the teacher and is then refined with Dataset Aggregation. A PointNet encoder over the combined visual and tactile inputs underpins the student policy's architecture, allowing the system to process the integrated sensory data.
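
As a rough illustration of this teacher-student pipeline, the PyTorch sketch below shows a simplified PointNet-style student encoder and a single DAgger-style relabeling update. It is not the authors' code: the network sizes, the proprioception and action dimensions, and the interface for obtaining teacher actions are hypothetical placeholders.

```python
# Illustrative sketch of the teacher-student setup described above (PyTorch).
# Network sizes and dimensions are assumptions made for brevity.
import torch
import torch.nn as nn

class SimplePointNet(nn.Module):
    """Stripped-down PointNet-style encoder: a shared per-point MLP followed by
    a max-pool over points, giving a permutation-invariant global feature."""
    def __init__(self, in_dim: int = 5, feat_dim: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, feat_dim),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (batch, num_points, in_dim) -> (batch, feat_dim)
        return self.mlp(points).max(dim=1).values

class StudentPolicy(nn.Module):
    """Maps the fused visuotactile point cloud plus proprioception to actions."""
    def __init__(self, proprio_dim: int = 22, action_dim: int = 22):
        super().__init__()
        self.encoder = SimplePointNet(in_dim=5, feat_dim=256)
        self.head = nn.Sequential(
            nn.Linear(256 + proprio_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, points: torch.Tensor, proprio: torch.Tensor) -> torch.Tensor:
        feat = self.encoder(points)
        return self.head(torch.cat([feat, proprio], dim=-1))

def dagger_step(student, points, proprio, teacher_actions, optimizer):
    """One DAgger-style update: states visited by the student are relabeled with
    the teacher's actions (queried from privileged simulator state), and the
    student is regressed onto those actions."""
    pred = student(points, proprio)
    loss = nn.functional.mse_loss(pred, teacher_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```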

Experimentation and Outcomes

The system's capabilities are demonstrated on a series of benchmark in-hand object rotation tasks, ranging from single-object manipulation to more complex scenarios such as rotating two balls concurrently. Policies trained in simulation and then transferred to a real robot hand handle various in-hand rotation tasks without additional real-world data. Overall, the system achieves strong Sim2Real performance and generalizes from the training geometries to novel real-world objects.

The research shows that the Robot Synesthesia approach is a significant step toward more sophisticated robotic manipulation. The integrated visuotactile representation enables a higher level of dexterous in-hand manipulation that is robust to occlusions and to variations in object shape and size. The findings also suggest a promising path for robotic interaction with real-world environments, with potential applications in more complex domains where tactile and visual feedback are paramount.
