Neural feels with neural fields: Visuo-tactile perception for in-hand manipulation (2312.13469v1)
Abstract: To achieve human-level dexterity, robots must infer spatial awareness from multimodal sensing to reason over contact interactions. During in-hand manipulation of novel objects, such spatial awareness involves estimating the object's pose and shape. The status quo for in-hand perception primarily employs vision, and restricts to tracking a priori known objects. Moreover, visual occlusion of objects in-hand is imminent during manipulation, preventing current systems from pushing beyond tasks without occlusion. We combine vision and touch sensing on a multi-fingered hand to estimate an object's pose and shape during in-hand manipulation. Our method, NeuralFeels, encodes object geometry by learning a neural field online and jointly tracks it by optimizing a pose graph problem. We study multimodal in-hand perception in simulation and the real world, interacting with different objects via a proprioception-driven policy. Our experiments show final reconstruction F-scores of $81\%$ and average pose drifts of $4.7\,\text{mm}$, further reduced to $2.3\,\text{mm}$ with known CAD models. Additionally, we observe that under heavy visual occlusion we can achieve up to $94\%$ improvements in tracking compared to vision-only methods. Our results demonstrate that touch, at the very least, refines and, at the very best, disambiguates visual estimates during in-hand manipulation. We release our evaluation dataset of 70 experiments, FeelSight, as a step towards benchmarking in this domain. Our neural representation driven by multimodal sensing can serve as a perception backbone towards advancing robot dexterity. Videos can be found on our project website https://suddhu.github.io/neural-feels/