- The paper introduces an innovative framework that integrates tactile and visual inputs to efficiently reconstruct 3D object shapes.
- It combines a learned model that infers local surface geometry from tactile images with a Gaussian process factor graph for incremental fusion, with reconstructions typically converging after 35–40 contacts.
- The method shows robust real-world performance on household objects, enhancing scalability and robotic manipulation in unstructured environments.
Analysis of "ShapeMap 3-D: Efficient Shape Mapping through Dense Touch and Vision"
The paper under review, "ShapeMap 3-D: Efficient Shape Mapping through Dense Touch and Vision," addresses the challenge of reconstructing 3D object shapes by fusing tactile sensing and vision. The research proposes a novel approach that combines a GelSight tactile sensor with a depth camera to construct accurate 3D shape maps incrementally, drawing on the complementary strengths of the two modalities to offset the weaknesses of each when used in isolation.
Methodological Overview
The proposed framework integrates tactile and visual information, incorporating tactile measurements from the GelSight sensor with depth data captured by an overhead camera. The approach centers on a learned model that maps tactile images to estimates of local surface geometry. This model is trained in simulation and is shown to transfer to real-world data.
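To make the idea concrete, here is a minimal sketch of what such a tactile-to-geometry model could look like. The architecture, layer sizes, and the name `TactileToHeightmap` are illustrative assumptions, not the authors' network:

```python
# Hypothetical sketch (not the authors' architecture): a small CNN that
# regresses a local surface heightmap from a GelSight-style RGB tactile
# image. Layer sizes and names here are illustrative assumptions only.
import torch
import torch.nn as nn

class TactileToHeightmap(nn.Module):
    """Maps a 3x64x64 tactile image to a 64x64 local heightmap."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        # 1x1 convolution head: per-pixel height regression.
        self.head = nn.Conv2d(32, 1, 1)

    def forward(self, tactile_img):
        return self.head(self.encoder(tactile_img)).squeeze(1)

model = TactileToHeightmap()
fake_batch = torch.randn(4, 3, 64, 64)   # stand-in for tactile images
heightmaps = model(fake_batch)           # shape: (4, 64, 64)
print(heightmaps.shape)
```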
Key to the approach is a Gaussian process (GP) framed within a spatial graph, allowing efficient updates and queries during incremental surface reconstruction. Exact GP inference scales cubically with the number of measurements, so dense tactile data would quickly become a computational bottleneck; the authors maintain scalability with a factor graph that approximates the GP locally, offering significant savings in both memory and computation over a full GP implementation.
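As an illustration of the underlying GP implicit-surface idea (a generic technique, not the paper's factor-graph formulation), the following NumPy sketch fuses contact points into a signed-distance estimate via the standard GP posterior mean:

```python
# Minimal Gaussian-process implicit-surface sketch in NumPy, illustrating
# the general idea only. Contact points are observed at signed distance 0;
# one interior point anchors the sign of the field.
import numpy as np

def rbf_kernel(A, B, lengthscale=0.5):
    """Squared-exponential kernel between point sets A (n,3) and B (m,3)."""
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

# Observations: 20 points on a unit circle in 3-D (SDF = 0) plus the
# center (SDF = -1) to anchor the interior.
theta = np.linspace(0, 2 * np.pi, 20, endpoint=False)
X = np.vstack([
    np.stack([np.cos(theta), np.sin(theta), np.zeros_like(theta)], axis=1),
    np.zeros((1, 3)),
])
y = np.concatenate([np.zeros(20), [-1.0]])

# Standard GP posterior mean at query points: k(X*, X) (K + s^2 I)^-1 y.
noise = 1e-4
K = rbf_kernel(X, X) + noise * np.eye(len(X))
alpha = np.linalg.solve(K, y)

Xq = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
sdf = rbf_kernel(Xq, X) @ alpha
print(sdf)  # ~ -1 at the center, ~0 on the surface, reverting to the
            # prior mean of 0 far from all observations
```

Solving the full linear system above costs O(n^3) in the number of contacts; the paper's local factor-graph approximation is precisely what avoids this.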
Numerical Results and Experimental Evaluation
The authors present extensive evaluations in both simulation and real-world environments to validate the robustness and scalability of their method. In simulation, the approach consistently achieves accurate 3D reconstructions after a relatively small number of tactile interactions, with convergence typically observed after 35–40 contacts. The factor-graph approximation of the GP proves particularly effective: update times remain near constant even as the number of factors grows.
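One intuition for this near-constant update cost, sketched below under the assumption of a spatial-hash neighborhood structure (a generic locality technique, not necessarily the authors' exact mechanism): each new contact only needs to touch measurements in its local neighborhood, independent of the total map size.

```python
# Illustrative-only sketch: bucket measurements in a spatial hash so each
# new contact consults only a constant-size neighborhood, regardless of
# how many measurements the map holds overall.
from collections import defaultdict
import numpy as np

CELL = 0.05  # grid cell size in meters (an arbitrary choice here)
buckets = defaultdict(list)

def cell_of(p):
    return tuple(np.floor(np.asarray(p) / CELL).astype(int))

def add_measurement(p):
    buckets[cell_of(p)].append(np.asarray(p))

def local_neighbors(p):
    """Points in the 3x3x3 cell neighborhood of p: constant-time lookup."""
    cx, cy, cz = cell_of(p)
    out = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dz in (-1, 0, 1):
                out.extend(buckets.get((cx + dx, cy + dy, cz + dz), []))
    return out

add_measurement([0.01, 0.02, 0.00])
add_measurement([0.02, 0.01, 0.01])
add_measurement([0.90, 0.90, 0.90])  # far away; never touched below
print(len(local_neighbors([0.015, 0.015, 0.005])))  # -> 2
```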
In the real-world experiments, the authors demonstrate reconstructions of household objects from the YCB dataset. Even on specular surfaces, which degrade depth-camera performance, the method yields accurate shape reconstructions, with the GelSight sensor's precise tactile measurements compensating where depth sensing fails.
Implications and Future Work
The implications of this work are substantial for robotic manipulation and perception, particularly in unstructured environments where object models may be incomplete or unavailable. Incrementally building robust 3D shape maps from both touch and vision enables more effective robotic interaction with unknown objects, enhancing capabilities in tasks such as grasping and manipulation.
Looking ahead, the authors mention several avenues for future research, including the development of active exploration strategies to optimize object coverage and further refinement of the factor graph technique to minimize computational requirements. Additionally, there is an interest in extending the current methodology to scenarios involving deformable objects or those with dynamically changing poses, which presents further complexities for perception systems.
In summary, "ShapeMap 3-D" provides a compelling framework for real-time 3D shape estimation that effectively combines advanced tactile and visual sensing techniques. The use of Gaussian processes within a spatial graph framework marks a significant contribution to efficient, scalable shape reconstruction methods, offering promising extensions to diverse applications in robotics and automated systems.