Touch-GS: Visual-Tactile Supervised 3D Gaussian Splatting (2403.09875v3)

Published 14 Mar 2024 in cs.RO and cs.CV

Abstract: In this work, we propose a novel method to supervise 3D Gaussian Splatting (3DGS) scenes using optical tactile sensors. Optical tactile sensors have become widespread in their use in robotics for manipulation and object representation; however, raw optical tactile sensor data is unsuitable to directly supervise a 3DGS scene. Our representation leverages a Gaussian Process Implicit Surface to implicitly represent the object, combining many touches into a unified representation with uncertainty. We merge this model with a monocular depth estimation network, which is aligned in a two stage process, coarsely aligning with a depth camera and then finely adjusting to match our touch data. For every training image, our method produces a corresponding fused depth and uncertainty map. Utilizing this additional information, we propose a new loss function, variance weighted depth supervised loss, for training the 3DGS scene model. We leverage the DenseTact optical tactile sensor and RealSense RGB-D camera to show that combining touch and vision in this manner leads to quantitatively and qualitatively better results than vision or touch alone in a few-view scene syntheses on opaque as well as on reflective and transparent objects. Please see our project page at http://armlabstanford.github.io/touch-gs


Summary

  • The paper introduces a novel integration of tactile and visual data to enhance the quality of 3D Gaussian Splatting scene representations.
  • It employs a Gaussian Process Implicit Surface to fuse touch inputs with monocular depth estimation, yielding accurate depth and uncertainty maps.
  • Empirical results show significant improvements over vision-only and touch-only baselines in few-view scene synthesis, quantified by higher PSNR and SSIM and lower LPIPS across diverse environments.

Touch-GS: Integrating Optical Tactile Sensing for Supervised 3DGS Scene Representation

Introduction to Touch-GS

The fusion of tactile and visual data is a promising way to enhance 3D Gaussian Splatting (3DGS) scene representations, which matters for robots that must perceive and interact with their environments. It is particularly useful where visual data alone is insufficient, such as in few-view settings or around reflective and transparent surfaces. The proposed method, Touch-GS, introduces a novel way to supervise 3DGS with optical tactile sensors, leveraging the strengths of both sensory modalities to produce high-quality scene reconstructions.

Optical Tactile Sensors and Gaussian Splatting

Optical tactile sensors have matured to the point of providing detailed, high-resolution touch data that complements visual observations. In parallel, 3DGS has advanced scene representation through efficient training and real-time rendering. Touch-GS combines these technologies, improving depth accuracy and rendering quality beyond what visual data alone can achieve.

Gaussian Process Implicit Surface (GPIS)

At the heart of Touch-GS lies a Gaussian Process Implicit Surface (GPIS) that interprets the tactile data. The GPIS fuses many individual touches into a single implicit model of the object while explicitly tracking uncertainty. This model is then rendered into per-view depth and uncertainty maps, which provide the supervision signal for the 3DGS scene.
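
To make the GPIS idea concrete, the following minimal NumPy sketch fits a Gaussian process to signed-distance observations gathered from touches (zero on contact points, small positive values just off the surface) and queries it for a posterior mean and variance, the quantities that become depth and uncertainty once rendered. The kernel, lengthscale, noise level, and toy sphere data are illustrative assumptions rather than the configuration used in the paper.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=0.05, variance=1.0):
    """Squared-exponential kernel between point sets A (N,3) and B (M,3)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

class GPIS:
    """Minimal Gaussian Process Implicit Surface fit to signed-distance samples."""

    def __init__(self, points, sdf_values, noise=1e-3):
        self.X = points                               # (N,3) touch-derived samples
        K = rbf_kernel(points, points) + noise * np.eye(len(points))
        self.L = np.linalg.cholesky(K)                # cached for repeated queries
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, sdf_values))

    def predict(self, Q):
        """Posterior mean SDF and variance at query points Q (M,3)."""
        Kq = rbf_kernel(Q, self.X)
        mean = Kq @ self.alpha
        v = np.linalg.solve(self.L, Kq.T)
        var = rbf_kernel(Q, Q).diagonal() - (v ** 2).sum(axis=0)
        return mean, np.maximum(var, 0.0)

# Toy usage: contacts on a sphere of radius 0.1 m plus samples just outside it.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 50)
phi = rng.uniform(0, np.pi, 50)
surface = 0.1 * np.stack([np.sin(phi) * np.cos(theta),
                          np.sin(phi) * np.sin(theta),
                          np.cos(phi)], axis=1)
X = np.vstack([surface, 1.2 * surface])               # on-surface and off-surface
y = np.concatenate([np.zeros(50), np.full(50, 0.02)]) # signed distances
gpis = GPIS(X, y)
mean, var = gpis.predict(np.array([[0.0, 0.0, 0.1], [0.0, 0.0, 0.3]]))
```

Rendering such a GPIS into depth and uncertainty maps then amounts to sampling the posterior mean along camera rays until a zero crossing is found and reading off the posterior variance at the resulting surface points.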

Monocular Depth Estimation and Alignment

To complement the tactile data, a monocular depth estimation network provides dense scene depth, which is aligned in two stages: a coarse alignment to the metric depth of an RGB-D camera, followed by a fine adjustment so that the touched regions agree with the GPIS-rendered depth. The result is dense, metrically consistent depth for every training view.
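
One plausible reading of this two-stage alignment is a pair of least-squares scale-and-shift fits: first against the valid pixels of the RGB-D camera, then against the GPIS-rendered depth in the touched region. The sketch below illustrates that interpretation; the function names and the affine (scale plus shift) alignment model are assumptions, and the paper's actual alignment may differ in detail.

```python
import numpy as np

def fit_scale_shift(pred, target, mask):
    """Least-squares scale a and shift b so that a * pred + b matches target on mask."""
    p, t = pred[mask].ravel(), target[mask].ravel()
    A = np.stack([p, np.ones_like(p)], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, t, rcond=None)
    return a, b

def align_mono_depth(mono_depth, rgbd_depth, touch_depth, touch_mask):
    """Two-stage alignment of monocular depth: coarse to RGB-D, fine to touch.

    mono_depth:  relative depth predicted by a monocular network
    rgbd_depth:  metric depth from the depth camera (0 where invalid)
    touch_depth: depth rendered from the touch-derived GPIS (0 where untouched)
    touch_mask:  boolean mask of pixels covered by touch observations
    """
    # Stage 1: coarse metric alignment against valid depth-camera pixels.
    a1, b1 = fit_scale_shift(mono_depth, rgbd_depth, rgbd_depth > 0)
    coarse = a1 * mono_depth + b1

    # Stage 2: fine correction so the touched region matches the GPIS depth.
    a2, b2 = fit_scale_shift(coarse, touch_depth, touch_mask & (touch_depth > 0))
    return a2 * coarse + b2
```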

Depth and Touch Fusion

A novel aspect of Touch-GS is the fusion of the aligned monocular depth with the touch-derived depth, treated as a per-pixel Bayesian update. For every training image this yields a fused depth map together with an uncertainty map, which in turn drive a variance-weighted depth supervised loss for training the 3DGS model.
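
If the aligned monocular depth and the touch-rendered depth are modelled as independent Gaussian estimates at each pixel, the Bayesian update reduces to precision-weighted (inverse-variance) fusion. The sketch below shows that fusion together with a variance-weighted depth loss of the kind described above; the specific weighting scheme and the toy numbers are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def fuse_depths(mu_touch, var_touch, mu_mono, var_mono):
    """Per-pixel Bayesian fusion of two independent Gaussian depth estimates.

    The posterior precision is the sum of the input precisions, and the
    posterior mean is the precision-weighted average of the input means.
    """
    precision = 1.0 / var_touch + 1.0 / var_mono
    mu = (mu_touch / var_touch + mu_mono / var_mono) / precision
    return mu, 1.0 / precision

def variance_weighted_depth_loss(rendered_depth, fused_depth, fused_var, eps=1e-6):
    """Penalise depth error less where the fused estimate is uncertain."""
    weight = 1.0 / (fused_var + eps)
    return np.mean(weight * (rendered_depth - fused_depth) ** 2)

# Toy usage on a 2x2 depth map: touch is confident, monocular depth is not.
mu_t = np.array([[0.50, 0.52], [0.49, 0.51]])
var_t = np.full((2, 2), 1e-4)
mu_m = np.array([[0.60, 0.55], [0.40, 0.50]])
var_m = np.full((2, 2), 1e-2)
fused_mu, fused_var = fuse_depths(mu_t, var_t, mu_m, var_m)
loss = variance_weighted_depth_loss(np.full((2, 2), 0.50), fused_mu, fused_var)
```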

Empirical Validation

The methodology has been validated in both simulated and real-world experiments. Touch-GS reconstructs more accurate and detailed scenes than baselines that rely on vision or touch alone, quantified by higher PSNR and SSIM and lower LPIPS across the test scenes, including those with reflective and transparent objects.

Implications and Future Directions

Touch-GS represents a significant advancement in the integration of tactile sensing in robotic vision, offering a methodological foundation for future research in multi-modal sensory input fusion for 3D scene reconstruction. The method's ability to deal with few-view problems and its adaptability for scenes with reflective and transparent objects open new possibilities for robotic interaction with complex environments. Future research might explore the dynamic representation of scenes, incorporating variables such as object deformability and surface friction, to move closer to realizing highly accurate digital twins for robotic systems.
