
Neural feels with neural fields: Visuo-tactile perception for in-hand manipulation (2312.13469v1)

Published 20 Dec 2023 in cs.RO, cs.CV, and cs.LG

Abstract: To achieve human-level dexterity, robots must infer spatial awareness from multimodal sensing to reason over contact interactions. During in-hand manipulation of novel objects, such spatial awareness involves estimating the object's pose and shape. The status quo for in-hand perception relies primarily on vision and is restricted to tracking a priori known objects. Moreover, visual occlusion of objects in-hand is inevitable during manipulation, preventing current systems from pushing beyond occlusion-free tasks. We combine vision and touch sensing on a multi-fingered hand to estimate an object's pose and shape during in-hand manipulation. Our method, NeuralFeels, encodes object geometry by learning a neural field online and jointly tracks it by optimizing a pose graph problem. We study multimodal in-hand perception in simulation and the real world, interacting with different objects via a proprioception-driven policy. Our experiments show final reconstruction F-scores of $81\%$ and average pose drifts of $4.7\,\text{mm}$, further reduced to $2.3\,\text{mm}$ with known CAD models. Additionally, under heavy visual occlusion we achieve up to $94\%$ improvement in tracking compared to vision-only methods. Our results demonstrate that touch, at the very least, refines and, at the very best, disambiguates visual estimates during in-hand manipulation. We release our evaluation dataset of 70 experiments, FeelSight, as a step towards benchmarking in this domain. Our neural representation, driven by multimodal sensing, can serve as a perception backbone for advancing robot dexterity. Videos can be found on our project website: https://suddhu.github.io/neural-feels/
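The reconstruction F-score quoted in the abstract is the standard point-cloud metric: the harmonic mean of precision (fraction of reconstructed points within a distance threshold of the ground truth) and recall (fraction of ground-truth points within the threshold of the reconstruction). The following is a minimal illustrative sketch, not the paper's implementation; the function name and the 1 cm threshold are our own assumptions.

```python
import numpy as np

def fscore(pred, gt, threshold=0.01):
    """Reconstruction F-score between two point clouds.

    pred: (N, 3) reconstructed points; gt: (M, 3) ground-truth points.
    threshold: distance (same units as the points) under which a point
    counts as correctly reconstructed. 0.01 (1 cm) is an assumed default.
    """
    # Full pairwise distance matrix; fine for small clouds,
    # use a KD-tree (e.g. scipy.spatial.cKDTree) at scale.
    d = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    precision = np.mean(d.min(axis=1) < threshold)  # pred -> gt
    recall = np.mean(d.min(axis=0) < threshold)     # gt -> pred
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

A perfect reconstruction scores 1.0; the paper's reported $81\%$ would correspond to an F-score of 0.81 at its chosen threshold.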

