Contact SLAM: Tactile-Based Blind Manipulation
- Contact SLAM is a tactile-based SLAM framework using high-resolution contact sensing and geometric priors to precisely estimate gripper, object, and obstacle poses in blind tasks.
- The framework employs a factor-graph MAP estimator combined with particle-filtered tactile exploration to reduce pose uncertainty from ~30 mm to below 5 mm within 6–8 steps.
- It integrates active exploration and detailed sensor models to achieve millimeter-level localization for contact-rich, visionless manipulation despite assumptions of static environments.
Contact SLAM is a physically-driven simultaneous localization and mapping (SLAM) framework tailored for robots performing contact-rich manipulation where vision is unavailable or occluded (“blind manipulation”). It utilizes high-resolution tactile sensing in conjunction with known object geometries to estimate the state of both the manipulated objects and the surrounding environment. The framework integrates a factor-graph-based maximum a posteriori (MAP) estimator for tactile-only scene and pose inference with an active exploration policy that maximizes information gain, enabling precise and efficient manipulation in fine, contact-dominated blind tasks (Wang et al., 11 Dec 2025).
1. Mathematical Foundations
1.1 State Representation
At each time step $t$, Contact SLAM maintains estimates for the following state variables:
- $T_t^{g} \in SE(3)$: Gripper pose in the world frame.
- $T_t^{o} \in SE(3)$: Pose of the grasped object.
- $T_t^{e} \in SE(3)$: Pose of static environment features, such as obstacles or receptacles.
- $c_t \in \{0, 1\}$: Binary indicator of task completion.
Contact regions in the scene are abstracted as piecewise-linear polygonal boundaries. For each object or obstacle $k$:
$$\partial \mathcal{B}_k = \big\{(v_i^k, n_i^k)\big\}_{i=1}^{N_k},$$
where $v_i^k$ are vertex coordinates in the object frame and $n_i^k$ the corresponding outward normals.
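This polygonal-boundary representation can be sketched in code. The following is a minimal illustration (function and variable names are ours, not the paper's), pairing each edge of a counter-clockwise 2D polygon with its outward normal:

```python
import numpy as np

def polygon_boundary(vertices):
    """Build a piecewise-linear boundary from CCW 2D vertices.

    Returns (edge_midpoints, outward_normals), one entry per edge.
    """
    v = np.asarray(vertices, dtype=float)
    nxt = np.roll(v, -1, axis=0)           # successor vertex of each edge
    edges = nxt - v                        # edge direction vectors
    mids = 0.5 * (v + nxt)                 # edge midpoints
    # Rotate each CCW edge direction by -90 degrees to point outward.
    normals = np.stack([edges[:, 1], -edges[:, 0]], axis=1)
    normals /= np.linalg.norm(normals, axis=1, keepdims=True)
    return mids, normals

# Unit square, counter-clockwise: bottom edge normal points down (outward).
mids, normals = polygon_boundary([(0, 0), (1, 0), (1, 1), (0, 1)])
```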
1.2 Tactile Sensor Model
Each gripper finger is instrumented with a Tac3D sensor, which outputs the net contact force and torque. The measured signal at time $t$ is modeled as
$$z_t = h\big(T_t^{g}, T_t^{o}\big) + w_t, \qquad w_t \sim \mathcal{N}(0, \Sigma_z),$$
where $h(\cdot)$ is the analytic mapping from the object and gripper poses to predicted force/torque, and $\Sigma_z$ the sensor noise covariance. In the two-finger setting, the net wrench is decomposed into per-finger contact forces expressed in the object frame:
$$f = f_{\mathrm{left}} + f_{\mathrm{right}}.$$
The contact point $r$ is computed from the measured force $f$ and torque $\tau$:
$$\tau = [r]_\times f \quad \Longrightarrow \quad r = \frac{f \times \tau}{\|f\|^2} + \lambda f, \quad \lambda \in \mathbb{R},$$
where $[\,\cdot\,]_\times$ is the cross-product (skew-symmetric) matrix; the component of $r$ along $f$ is unobservable from the wrench alone.
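The contact-point recovery can be checked numerically. Below is a minimal sketch assuming the torque is measured about the sensor origin; it returns the minimum-norm solution (the component along $f$ is left at zero):

```python
import numpy as np

def contact_point(f, tau):
    """Recover the contact point r from tau = r x f.

    The component of r along f is unobservable from the wrench alone;
    this returns the minimum-norm solution (lambda = 0).
    """
    f = np.asarray(f, dtype=float)
    tau = np.asarray(tau, dtype=float)
    return np.cross(f, tau) / np.dot(f, f)

# A 5 N force along +z applied at (0.1, 0.2, 0.0) m produces
# tau = r x f = (1.0, -0.5, 0.0) N*m.
r = contact_point([0.0, 0.0, 5.0], [1.0, -0.5, 0.0])
```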
1.3 Transition Models
- Gripper motion follows measured robot kinematics: $T_{t+1}^{g} = T_t^{g} \oplus u_t$, where $u_t$ is the commanded motion, with pose prior noise $\Sigma_g$.
- The grasped object moves approximately rigidly with the gripper, modulo small slip: $T_{t+1}^{o} = T_{t+1}^{g}\,(T_t^{g})^{-1}\,T_t^{o}$ plus process noise.
- The environment is assumed static except when contact constraints are triggered: $T_{t+1}^{e} = T_t^{e}$.
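The rigid co-movement assumption for the grasped object can be written as a homogeneous-transform update. The sketch below uses plain 4x4 matrices and is illustrative, not the paper's implementation:

```python
import numpy as np

def propagate_object(T_g_new, T_g_old, T_o_old):
    """Rigid co-movement: the object keeps its pose relative to the gripper.

    T_o_new = T_g_new @ inv(T_g_old) @ T_o_old
    (slip would enter as additional process noise on top of this).
    """
    return T_g_new @ np.linalg.inv(T_g_old) @ T_o_old

def translation(t):
    """Homogeneous 4x4 transform for a pure translation t."""
    T = np.eye(4)
    T[:3, 3] = t
    return T

# The gripper translates by 10 mm in x; the grasped object follows rigidly.
T_g_old = translation([0.00, 0.00, 0.5])
T_g_new = translation([0.01, 0.00, 0.5])
T_o_old = translation([0.00, 0.05, 0.5])
T_o_new = propagate_object(T_g_new, T_g_old, T_o_old)
```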
1.4 Factor Graph MAP Inference
The joint inference is cast as a factor-graph-based MAP optimization:
$$\hat{X} = \arg\max_{X}\; p(X \mid Z) \;=\; \arg\min_{X} \sum_i \big\| r_i(X) \big\|^2_{\Sigma_i},$$
where $X$ stacks the gripper, object, and environment poses and each residual $r_i$ derives from one of the following factors:
- $\phi_g\big(T_t^{g}\big)$ (gripper prior),
- $\phi_o\big(T_t^{g}, T_t^{o}\big)$ (object pose constraint),
- $\phi_e\big(T_t^{o}, T_t^{e}\big)$ (environment via contact region boundaries),
- $\phi_c\big(T_t^{o}, T_t^{e}\big)$ (task completion/alignment).
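The structure of this objective can be illustrated with a linear toy problem: fusing a Gaussian prior on the gripper position with a single contact measurement along a known direction. This is only a sketch of the least-squares form; the paper's estimator operates on SE(3) factors (e.g., via GTSAM):

```python
import numpy as np

def map_estimate(mu_prior, Sigma_prior, H, z, Sigma_z):
    """MAP estimate for x with prior N(mu_prior, Sigma_prior) and a
    linear measurement z = H x + noise, noise ~ N(0, Sigma_z).

    Minimizes ||x - mu||^2_{Sigma_prior} + ||z - H x||^2_{Sigma_z}.
    """
    P_inv = np.linalg.inv(Sigma_prior)
    R_inv = np.linalg.inv(Sigma_z)
    A = P_inv + H.T @ R_inv @ H                # information matrix
    b = P_inv @ mu_prior + H.T @ R_inv @ z     # information vector
    return np.linalg.solve(A, b)

# 2D gripper position: vague prior at the origin, plus one contact event
# that measures the x-coordinate of a wall at x = 0.03 m very accurately.
mu = np.zeros(2)
Sigma_p = np.eye(2) * 1e-2                     # ~10 cm std prior
H = np.array([[1.0, 0.0]])                     # contact observes x only
z = np.array([0.03])
Sigma_z = np.array([[1e-6]])                   # ~1 mm std tactile
x_map = map_estimate(mu, Sigma_p, H, z, Sigma_z)
```

The precise tactile factor dominates along the observed direction, pulling the estimate to the wall, while the unobserved coordinate stays at its prior mean.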
2. Active Tactile Exploration Policy (ATEP)
ATEP is designed to reduce uncertainty in object-environment pose alignment before commencing fine manipulation. It operates as a particle-filtered, information-driven localizer:
2.1 Particle Representation
A set of particles $\{x^{(i)}\}_{i=1}^{N}$ captures pose hypotheses, each weighted as $w^{(i)}$ with $\sum_i w^{(i)} = 1$. Initialization is uniform.
2.2 Local-Peak Detection
Particles whose weight exceeds a threshold, $w^{(i)} > w_{\min}$, and that are locally dominant within a neighborhood radius $r_{\mathrm{peak}}$ are defined as "peaks". Convergence is declared if the spatial extent of these peaks falls within a tolerance $\epsilon$.
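One way to realize such local-peak detection is sketched below; the threshold values and the convergence test are illustrative placeholders, not the paper's:

```python
import numpy as np

def find_peaks(particles, weights, w_min=0.01, r_peak=0.005):
    """Return indices of 'peak' particles: weight above w_min and
    locally maximal within a radius r_peak (meters)."""
    particles = np.asarray(particles, dtype=float)
    weights = np.asarray(weights, dtype=float)
    peaks = []
    for i, (p, w) in enumerate(zip(particles, weights)):
        if w <= w_min:
            continue
        d = np.linalg.norm(particles - p, axis=1)
        neighbors = d < r_peak
        if w >= weights[neighbors].max():      # locally maximal weight
            peaks.append(i)
    return peaks

def converged(particles, peaks, eps=0.005):
    """Converged when all peaks lie within a box of half-width eps."""
    pts = np.asarray(particles, dtype=float)[peaks]
    return len(peaks) > 0 and np.ptp(pts, axis=0).max() <= 2 * eps

# Two hypotheses 1 mm apart plus one far-away outlier: two peaks, no
# convergence yet because the peaks are 10 cm apart.
particles = [[0.000, 0.0], [0.001, 0.0], [0.100, 0.0]]
weights = [0.5, 0.3, 0.2]
peaks = find_peaks(particles, weights)
```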
2.3 Information-Gain Criterion
For each candidate action $a$ and particle $x^{(i)}$, the predicted contact outcome and travel distance are computed. The score for action $a$ is
$$S(a) = H(p_a) - \lambda\, \sigma_d^2(a),$$
where $H(p_a)$ is the entropy of the predicted contact-type distribution across particles and $\sigma_d^2(a)$ is the travel-distance variance. The optimal action $a^* = \arg\max_a S(a)$ maximizes expected information gain while penalizing motion cost.
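This scoring can be sketched with contact types as categorical labels predicted per particle; the weight $\lambda$ and the labels below are illustrative choices:

```python
import numpy as np
from collections import Counter

def entropy(labels):
    """Shannon entropy (nats) of a list of categorical outcomes."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())

def action_score(contact_types, distances, lam=0.1):
    """Score = entropy of predicted contact types across particles
    minus lam * variance of the predicted travel distance."""
    return entropy(contact_types) - lam * float(np.var(distances))

# Action A splits the hypotheses between two contact outcomes (informative);
# action B predicts the same outcome for every particle (uninformative).
score_a = action_score(["edge", "edge", "face", "face"], [0.02, 0.02, 0.03, 0.03])
score_b = action_score(["face", "face", "face", "face"], [0.02, 0.02, 0.02, 0.02])
```

An action whose predicted outcome differs across hypotheses discriminates between them, so it scores higher than one that would feel the same no matter which hypothesis is true.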
2.4 Particle Update and Control
The selected action is executed until a tactile change is detected. The contact direction determines which region boundaries are updated. The particle set is restricted accordingly and weights are updated by the tactile likelihood:
$$w_{t+1}^{(i)} \;\propto\; w_t^{(i)}\, p\big(z_t \mid x^{(i)}\big).$$
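The restriction-and-reweighting step can be sketched as a standard Bayesian particle update, with likelihood 1 for hypotheses consistent with the observed contact and a small floor otherwise (the floor value is an illustrative choice):

```python
import numpy as np

def update_particles(particles, weights, consistent, floor=1e-3):
    """Bayesian reweighting after a tactile event.

    consistent: boolean mask, True where a particle's predicted contact
    matches the observed one. Inconsistent particles are down-weighted
    to a small floor rather than deleted outright.
    """
    weights = np.asarray(weights, dtype=float).copy()
    likelihood = np.where(consistent, 1.0, floor)
    weights *= likelihood                  # w <- w * p(z | x)
    weights /= weights.sum()               # renormalize
    return particles, weights

particles = np.array([[0.00, 0.0], [0.01, 0.0], [0.10, 0.0]])
weights = np.full(3, 1.0 / 3.0)
# The observed contact is consistent with the first two hypotheses only.
particles, weights = update_particles(
    particles, weights, np.array([True, True, False])
)
```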
2.5 Termination
The process terminates when the peak count falls to a small number $K$ and the spatial spread drops below $\epsilon$, at which point a final alignment trajectory is executed to drive the task-completion indicator to $c_t = 1$.
3. Implementation and Experimental Evaluation
3.1 Robotic Setup
- 6-DOF robotic arm equipped with a two-finger parallel gripper.
- Tac3D tactile sensors on each fingertip, providing 3D force distribution and torque with millimeter-level contact-point localization error.
- Manipulated objects modeled from CAD meshes.
3.2 Demonstrated Tasks
- Blind Socket Assembly: Insertion of two- and three-pin plugs into matched sockets; no vision is used.
- Blind Block-Pushing: A T-shaped tool pushes a movable block past obstacles, relying solely on geometric priors and tactile cues.
3.3 Metric Results
Table: Summary of Empirical Results
| Task | Final Pose Error | Mean Exploration Steps | Localization Error |
|---|---|---|---|
| Socket (two-pin) | 3.775 mm | 7.13 | mm-level (contact point) |
| Socket (three-pin) | 1.815 mm | 7.67 | mm-level (contact point) |
| Block pushing (obstacles) | n/a | 3–5 tactile events | mm-level (obstacle) |
After $6$–$8$ exploration steps, the particle spread (pose uncertainty) was reduced from approximately $30$ mm to below $5$ mm.
3.4 Sensitivity and Ablation
- Task complexity (e.g., plug geometry) affects the number of required exploration steps.
- The peak-detection thresholds and the entropy weight $\lambda$ influence the speed and reliability of localization.
4. Analysis: Contributions, Strengths, and Limitations
4.1 Main Contributions
- A tactile-based SLAM framework leveraging prior object geometry and physical reasoning, achieving millimeter-level localization accuracy.
- Modular factor-graph estimation that supports plug-and-play integration with existing solvers (e.g., GTSAM).
- An exploration heuristic balancing entropy (informativeness) and trajectory length for efficient contact localization.
4.2 Advantages
- Operation is fully visionless, enabling manipulation in visually occluded scenarios.
- Generalizable across multiple task classes, including peg-in-hole and pushing.
- Provides real-time localization and force-based inference without reliance on external sensors.
4.3 Limitations
- Assumes quasi-static, rigid, geometric contact models; does not account for fast dynamics, compliance, or deformable components.
- The environment is static; moving obstacles are not treated.
- Particle filter resampling can lead to several millimeters of pose uncertainty, particularly in ambiguous or flat-likelihood regimes.
5. Perspectives and Future Research Directions
Planned research directions include integrating dynamic and force–mass models for rapid manipulation, extending methods to deformable and articulated objects, and the fusion of sparse vision measurements to augment tactile-only SLAM for hybrid active perception scenarios (Wang et al., 11 Dec 2025). This suggests potential applicability in unstructured or partially observable environments common to advanced manufacturing, service robotics, and field deployment, contingent on overcoming limitations related to environment assumptions and contact modeling.