Touch-GS: Visual-Tactile Supervised 3D Gaussian Splatting (2403.09875v3)

Published 14 Mar 2024 in cs.RO and cs.CV

Abstract: In this work, we propose a novel method to supervise 3D Gaussian Splatting (3DGS) scenes using optical tactile sensors. Optical tactile sensors have become widespread in their use in robotics for manipulation and object representation; however, raw optical tactile sensor data is unsuitable to directly supervise a 3DGS scene. Our representation leverages a Gaussian Process Implicit Surface to implicitly represent the object, combining many touches into a unified representation with uncertainty. We merge this model with a monocular depth estimation network, which is aligned in a two stage process, coarsely aligning with a depth camera and then finely adjusting to match our touch data. For every training image, our method produces a corresponding fused depth and uncertainty map. Utilizing this additional information, we propose a new loss function, variance weighted depth supervised loss, for training the 3DGS scene model. We leverage the DenseTact optical tactile sensor and RealSense RGB-D camera to show that combining touch and vision in this manner leads to quantitatively and qualitatively better results than vision or touch alone in a few-view scene syntheses on opaque as well as on reflective and transparent objects. Please see our project page at http://armlabstanford.github.io/touch-gs


Summary

  • The paper introduces a novel integration of tactile and visual data to enhance the quality of 3D Gaussian Splatting scene representations.
  • It employs a Gaussian Process Implicit Surface to fuse touch inputs with monocular depth estimation, yielding accurate depth and uncertainty maps.
  • Empirical results show significant improvements over vision-only and touch-only baselines in few-view scene synthesis, quantified by higher PSNR and SSIM and lower LPIPS across diverse environments.

Touch-GS: Integrating Optical Tactile Sensing for Supervised 3DGS Scene Representation

Introduction to Touch-GS

The fusion of tactile and visual data is a promising way to enhance 3D Gaussian Splatting (3DGS) scene representations, which matters for robots that must perceive and interact with their environments. It is particularly useful where visual data alone is insufficient, such as in few-view settings or around reflective and transparent surfaces. The proposed method, Touch-GS, introduces a novel way to supervise 3DGS with optical tactile sensors, leveraging the strengths of both sensory modalities to produce high-quality scene reconstructions.

Optical Tactile Sensors and Gaussian Splatting

Optical tactile sensors have matured to the point of providing detailed, high-resolution touch data that complements visual observations. In parallel, 3DGS has advanced scene representation through efficient training and real-time rendering. Touch-GS combines these technologies, improving depth accuracy and rendering quality beyond what visual data alone can achieve.

Gaussian Process Implicit Surface (GPIS)

At the heart of Touch-GS lies a Gaussian Process Implicit Surface (GPIS) that interprets the tactile data. The GPIS fuses many individual touches into a single implicit model of the object while explicitly tracking uncertainty. This model is then rendered into per-view depth and uncertainty maps, which provide the supervision signal for the 3DGS scene.
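
To make the GPIS idea concrete, the following minimal NumPy sketch fits a Gaussian process to signed-distance observations gathered from touches (zero on contact points, small positive values just off the surface) and queries it for a posterior mean and variance, the quantities that become depth and uncertainty once rendered. The kernel, lengthscale, noise level, and toy sphere data are illustrative assumptions rather than the configuration used in the paper.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=0.05, variance=1.0):
    """Squared-exponential kernel between point sets A (N,3) and B (M,3)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

class GPIS:
    """Minimal Gaussian Process Implicit Surface fit to signed-distance samples."""

    def __init__(self, points, sdf_values, noise=1e-3):
        self.X = points                               # (N,3) touch-derived samples
        K = rbf_kernel(points, points) + noise * np.eye(len(points))
        self.L = np.linalg.cholesky(K)                # cached for repeated queries
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, sdf_values))

    def predict(self, Q):
        """Posterior mean SDF and variance at query points Q (M,3)."""
        Kq = rbf_kernel(Q, self.X)
        mean = Kq @ self.alpha
        v = np.linalg.solve(self.L, Kq.T)
        var = rbf_kernel(Q, Q).diagonal() - (v ** 2).sum(axis=0)
        return mean, np.maximum(var, 0.0)

# Toy usage: contacts on a sphere of radius 0.1 m plus samples just outside it.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 50)
phi = rng.uniform(0, np.pi, 50)
surface = 0.1 * np.stack([np.sin(phi) * np.cos(theta),
                          np.sin(phi) * np.sin(theta),
                          np.cos(phi)], axis=1)
X = np.vstack([surface, 1.2 * surface])               # on-surface and off-surface
y = np.concatenate([np.zeros(50), np.full(50, 0.02)]) # signed distances
gpis = GPIS(X, y)
mean, var = gpis.predict(np.array([[0.0, 0.0, 0.1], [0.0, 0.0, 0.3]]))
```

Rendering such a GPIS into depth and uncertainty maps then amounts to sampling the posterior mean along camera rays until a zero crossing is found and reading off the posterior variance at the resulting surface points.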

Monocular Depth Estimation and Alignment

To complement the tactile data, a monocular depth estimation network provides dense scene depth, which is aligned in two stages: a coarse alignment to the metric depth of an RGB-D camera, followed by a fine adjustment so that the touched regions agree with the GPIS-rendered depth. The result is dense, metrically consistent depth for every training view.
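
One plausible reading of this two-stage alignment is a pair of least-squares scale-and-shift fits: first against the valid pixels of the RGB-D camera, then against the GPIS-rendered depth in the touched region. The sketch below illustrates that interpretation; the function names and the affine (scale plus shift) alignment model are assumptions, and the paper's actual alignment may differ in detail.

```python
import numpy as np

def fit_scale_shift(pred, target, mask):
    """Least-squares scale a and shift b so that a * pred + b matches target on mask."""
    p, t = pred[mask].ravel(), target[mask].ravel()
    A = np.stack([p, np.ones_like(p)], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, t, rcond=None)
    return a, b

def align_mono_depth(mono_depth, rgbd_depth, touch_depth, touch_mask):
    """Two-stage alignment of monocular depth: coarse to RGB-D, fine to touch.

    mono_depth:  relative depth predicted by a monocular network
    rgbd_depth:  metric depth from the depth camera (0 where invalid)
    touch_depth: depth rendered from the touch-derived GPIS (0 where untouched)
    touch_mask:  boolean mask of pixels covered by touch observations
    """
    # Stage 1: coarse metric alignment against valid depth-camera pixels.
    a1, b1 = fit_scale_shift(mono_depth, rgbd_depth, rgbd_depth > 0)
    coarse = a1 * mono_depth + b1

    # Stage 2: fine correction so the touched region matches the GPIS depth.
    a2, b2 = fit_scale_shift(coarse, touch_depth, touch_mask & (touch_depth > 0))
    return a2 * coarse + b2
```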

Depth and Touch Fusion

A novel aspect of Touch-GS is the fusion of the aligned monocular depth with the touch-derived depth, treated as a per-pixel Bayesian update. For every training image this yields a fused depth map together with an uncertainty map, which in turn drive a variance-weighted depth supervised loss for training the 3DGS model.
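
If the aligned monocular depth and the touch-rendered depth are modelled as independent Gaussian estimates at each pixel, the Bayesian update reduces to precision-weighted (inverse-variance) fusion. The sketch below shows that fusion together with a variance-weighted depth loss of the kind described above; the specific weighting scheme and the toy numbers are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def fuse_depths(mu_touch, var_touch, mu_mono, var_mono):
    """Per-pixel Bayesian fusion of two independent Gaussian depth estimates.

    The posterior precision is the sum of the input precisions, and the
    posterior mean is the precision-weighted average of the input means.
    """
    precision = 1.0 / var_touch + 1.0 / var_mono
    mu = (mu_touch / var_touch + mu_mono / var_mono) / precision
    return mu, 1.0 / precision

def variance_weighted_depth_loss(rendered_depth, fused_depth, fused_var, eps=1e-6):
    """Penalise depth error less where the fused estimate is uncertain."""
    weight = 1.0 / (fused_var + eps)
    return np.mean(weight * (rendered_depth - fused_depth) ** 2)

# Toy usage on a 2x2 depth map: touch is confident, monocular depth is not.
mu_t = np.array([[0.50, 0.52], [0.49, 0.51]])
var_t = np.full((2, 2), 1e-4)
mu_m = np.array([[0.60, 0.55], [0.40, 0.50]])
var_m = np.full((2, 2), 1e-2)
fused_mu, fused_var = fuse_depths(mu_t, var_t, mu_m, var_m)
loss = variance_weighted_depth_loss(np.full((2, 2), 0.50), fused_mu, fused_var)
```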

Empirical Validation

The methodology has been validated in both simulated and real-world experiments. Touch-GS reconstructs more accurate and detailed scenes than baselines that rely on vision or touch alone, quantified by higher PSNR and SSIM and lower LPIPS across the test scenes, including those with reflective and transparent objects.

Implications and Future Directions

Touch-GS represents a significant advancement in the integration of tactile sensing in robotic vision, offering a methodological foundation for future research in multi-modal sensory input fusion for 3D scene reconstruction. The method's ability to deal with few-view problems and its adaptability for scenes with reflective and transparent objects open new possibilities for robotic interaction with complex environments. Future research might explore the dynamic representation of scenes, incorporating variables such as object deformability and surface friction, to move closer to realizing highly accurate digital twins for robotic systems.
