Tactile-Augmented Radiance Fields (2405.04534v1)

Published 7 May 2024 in cs.CV

Abstract: We present a scene representation, which we call a tactile-augmented radiance field (TaRF), that brings vision and touch into a shared 3D space. This representation can be used to estimate the visual and tactile signals for a given 3D position within a scene. We capture a scene's TaRF from a collection of photos and sparsely sampled touch probes. Our approach makes use of two insights: (i) common vision-based touch sensors are built on ordinary cameras and thus can be registered to images using methods from multi-view geometry, and (ii) visually and structurally similar regions of a scene share the same tactile features. We use these insights to register touch signals to a captured visual scene, and to train a conditional diffusion model that, provided with an RGB-D image rendered from a neural radiance field, generates its corresponding tactile signal. To evaluate our approach, we collect a dataset of TaRFs. This dataset contains more touch samples than previous real-world datasets, and it provides spatially aligned visual signals for each captured touch signal. We demonstrate the accuracy of our cross-modal generative model and the utility of the captured visual-tactile data on several downstream tasks. Project page: https://dou-yiming.github.io/TaRF


Summary

  • The paper introduces TaRF, which fuses sparse touch data and visual inputs to predict tactile features across unseen 3D regions.
  • It uses multi-view geometry to register each touch measurement to the visual scene, and a conditional diffusion model to infer tactile signals from RGB-D images rendered from a neural radiance field.
  • The approach shows promise for robotics, VR/AR, and design, supported by a dataset of over 19,000 aligned visual-tactile pairs.

Tactile-Augmented Radiance Fields: Merging Touch and Vision in 3D Space

Introduction to TaRF

The concept of a tactile-augmented radiance field (TaRF) merges touch and vision within a unified 3D model. By combining sparsely sampled touch probes with dense visual input, a TaRF can predict tactile textures and features at unseen 3D points in a scene. This multisensory approach not only enriches scene understanding but also sidesteps some of the practical challenges of collecting tactile data at scale.
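To make the idea concrete, the sketch below frames a TaRF as a queryable object that returns co-registered visual and tactile estimates for a camera pose. The class, the method names, and the `render`/`sample` calls are hypothetical stand-ins for a NeRF renderer and a trained touch-prediction model, not the paper's released code.

```python
from dataclasses import dataclass
import numpy as np


@dataclass
class TaRF:
    """Hypothetical interface for a tactile-augmented radiance field."""
    nerf: object          # any renderer: 4x4 pose -> RGB-D image
    touch_model: object   # conditional model: RGB-D image -> tactile image

    def query(self, pose_c2w: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
        """Estimate co-located visual and tactile signals for a camera pose.

        pose_c2w: 4x4 camera-to-world matrix.
        Returns (rgbd, tactile) as numpy arrays.
        """
        rgbd = self.nerf.render(pose_c2w)         # H x W x 4 (RGB + depth)
        tactile = self.touch_model.sample(rgbd)   # estimated touch image
        return rgbd, tactile
```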

Key Insights and Methodology

Two main insights drive the effectiveness of TaRF:

  1. Visual-Tactile Sensor Registration: Because common vision-based touch sensors are built on ordinary cameras, their measurements can be registered to the scene's images using standard multi-view geometry, giving spatial consistency between seen and touched surfaces.
  2. Diffusion Models for Touch Prediction: A conditional diffusion model, given RGB-D images rendered from a neural radiance field, predicts the tactile signal at arbitrary points, effectively 'filling in' tactile data between sampled touch probes (a minimal sampling sketch follows this list).
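As a rough illustration of how such a model generates a touch image, here is a minimal DDPM-style reverse-diffusion loop conditioned on an RGB-D embedding. The `denoiser` network, the linear beta schedule, the latent shape, and the step count are all assumptions for the sketch; the paper's actual model is a latent diffusion model whose details may differ.

```python
import torch


@torch.no_grad()
def sample_tactile(denoiser, rgbd_cond, steps=50, shape=(1, 4, 32, 32)):
    """Sketch of conditional reverse diffusion: noise -> tactile latent.

    denoiser(x_t, t, cond) is assumed to predict the noise added at step t.
    The linear beta schedule and latent shape are illustrative choices.
    """
    betas = torch.linspace(1e-4, 2e-2, steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(shape)  # start from pure Gaussian noise
    for t in reversed(range(steps)):
        eps = denoiser(x, torch.tensor([t]), rgbd_cond)
        # DDPM posterior mean: remove the predicted noise, then rescale.
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x  # a decoder would map this latent back to a tactile image
```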

The capture process involves photographing a scene with an RGB-D camera while simultaneously recording touch data from a vision-based sensor rigidly mounted near the camera. Estimated camera poses place both modalities in a shared 3D coordinate frame, so each touch probe is spatially aligned with the visual reconstruction; the sketch below illustrates the pose composition involved.
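The registration step amounts to composing rigid transforms: the capture camera's estimated pose (from structure-from-motion or the radiance field's trajectory) and a fixed, calibrated offset between that camera and the touch sensor. The function and example values below are illustrative assumptions about the rig, not the paper's exact calibration procedure.

```python
import numpy as np


def touch_probe_world_pose(T_world_cam: np.ndarray,
                           T_cam_touch: np.ndarray) -> np.ndarray:
    """Place a touch probe in the scene frame by composing 4x4 rigid transforms.

    T_world_cam: camera-to-world pose of the capture camera at probe time.
    T_cam_touch: fixed camera-to-touch-sensor offset from a one-time calibration.
    """
    return T_world_cam @ T_cam_touch


# Illustrative example: a sensor mounted 5 cm along the camera's x-axis.
T_cam_touch = np.eye(4)
T_cam_touch[0, 3] = 0.05
T_world_touch = touch_probe_world_pose(np.eye(4), T_cam_touch)
```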

Dataset and Findings

The researchers introduced a new dataset featuring 19,300 aligned visual-tactile pairs, surpassing previous datasets in both size and the quality of alignment. Several key findings from the paper include:

  • 3D Touch Localization: Given a tactile signal, the model could accurately infer the location in 3D space where it was captured.
  • Material Property Analysis: It identified materials reliably from tactile data, suggesting utility for automated systems that need to recognize material types or conditions without direct contact (a minimal classifier sketch follows this list).
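A dataset of aligned visual-tactile pairs can be probed with a small supervised model, for instance recognizing materials from touch images alone. The sketch below only shows the model setup; the backbone choice, class count, and input size are assumptions, not the paper's evaluation protocol.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18


def build_material_classifier(num_materials: int) -> nn.Module:
    """ResNet-18 with a fresh classification head over tactile images."""
    model = resnet18(weights=None)  # train from scratch on touch images
    model.fc = nn.Linear(model.fc.in_features, num_materials)
    return model


# Forward pass on a dummy batch of 8 tactile images (assumed 3 x 224 x 224).
model = build_material_classifier(num_materials=20)
logits = model(torch.randn(8, 3, 224, 224))
```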

Practical Implications and Future Directions

Robotics and Physical Simulation

Robots that can 'see' and 'feel' their environments are pivotal for advanced automation. TaRF can potentially enhance robotic perception, allowing for more nuanced interactions with various surfaces and materials, from delicate medical uses to robust industrial applications.

Virtual Reality and Augmented Reality

In VR and AR, creating realistic simulations not just visually but also in terms of how objects feel can significantly enhance user experience. TaRF's ability to predict tactile characteristics in detailed 3D space holds promise for more immersive virtual environments.

Design and Testing

For product design and testing, being able to predict how different parts of an object will feel before physically creating it can save resources and allow for better optimization of materials and ergonomics.

The Road Ahead

While TaRF brings us closer to holistically understanding and interacting with our 3D world through combined senses, its current iteration assumes static scenes and may struggle with materials that change shape or properties upon interaction. Future work could extend the approach to dynamic environments, where real-time touch and visual feedback continually update the model, increasing both accuracy and application scope.

In conclusion, TaRF presents a compelling development in the fusion of touch and visual data in full 3D scenes, providing a richer, more detailed understanding of our physical environment. Its continued development will likely catalyze innovations across multiple fields, from robotics to interactive digital media.
