Contact SLAM: Tactile-Based Blind Manipulation

Updated 18 December 2025
  • Contact SLAM is a tactile-based SLAM framework using high-resolution contact sensing and geometric priors to precisely estimate gripper, object, and obstacle poses in blind tasks.
  • The framework employs a factor-graph MAP estimator combined with particle-filtered tactile exploration to reduce pose uncertainty from ~30 mm to below 5 mm within 6–8 steps.
  • It integrates active exploration and detailed sensor models to achieve millimeter-level localization for contact-rich, visionless manipulation despite assumptions of static environments.

Contact SLAM is a physically-driven simultaneous localization and mapping (SLAM) framework tailored for robots performing contact-rich manipulation where vision is unavailable or occluded (“blind manipulation”). It utilizes high-resolution tactile sensing in conjunction with known object geometries to estimate the state of both the manipulated objects and the surrounding environment. The framework integrates a factor-graph-based maximum a posteriori (MAP) estimator for tactile-only scene and pose inference with an active exploration policy that maximizes information gain, enabling precise and efficient manipulation in fine, contact-dominated blind tasks (Wang et al., 11 Dec 2025).

1. Mathematical Foundations

1.1 State Representation

At each time step $t$, Contact SLAM maintains estimates for the following state variables:

  • $g_t \in SE(3)$: Gripper pose in the world frame.
  • $\ell_t \in SE(3)$: Pose of the grasped object.
  • $e_t$: Pose of static environment features, such as obstacles or receptacles.
  • $\theta_{\text{ali}} \in \{0,1\}$: Binary indicator of task completion.

Contact regions in the scene are abstracted as piecewise-linear polygonal boundaries. For each object or obstacle $l$:

$$S_l = \{ (n_j, v_j, v_{j+1}) \mid j = 1, \dots, M \}$$

where $v_j$ are vertex coordinates in the object frame and $n_j$ the corresponding outward normals.
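
For concreteness, a minimal sketch of how such a boundary set $S_l$ could be stored and constructed from a 2D polygon is shown below; the `BoundarySegment` container and the counter-clockwise vertex convention are illustrative assumptions, not specified by the paper.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class BoundarySegment:
    """One boundary element (n_j, v_j, v_{j+1}) of a polygonal contact region."""
    n: np.ndarray   # outward unit normal of the edge, in the object frame
    v0: np.ndarray  # vertex v_j
    v1: np.ndarray  # vertex v_{j+1}

def polygon_boundary(vertices: np.ndarray) -> list:
    """Build S_l from 2D vertices given in counter-clockwise order."""
    segments = []
    M = len(vertices)
    for j in range(M):
        v0, v1 = vertices[j], vertices[(j + 1) % M]
        edge = v1 - v0
        n = np.array([edge[1], -edge[0]])        # outward normal for CCW polygons
        segments.append(BoundarySegment(n / np.linalg.norm(n), v0, v1))
    return segments
```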

1.2 Tactile Sensor Model

Each gripper finger is instrumented with a Tac3D sensor, which outputs the net contact force and torque. The measured signal $z_t$ at time $t$ is modeled as

$$z_t = h(\ell_t, \ldots) + \eta_t, \qquad \eta_t \sim \mathcal{N}(0, \Sigma_z)$$

where $h(\cdot)$ is the analytic mapping from the object and gripper poses to the predicted force/torque, and $\Sigma_z$ is the sensor noise covariance. In the two-finger setting, the contact force $F$ is decomposed in the object frame:

$$F = \begin{bmatrix} F_x \\ F_y \\ F_z \end{bmatrix}, \qquad \begin{cases} F_x = F_y^R - F_y^L \\ F_y = F_z^L - F_z^R \\ F_z = F_x^L - F_x^R \end{cases}$$

The contact point $C$ is computed from:

$$(M_{O_e}^L + M_{O_e}^R) = [C]_\times F$$

where $[C]_\times$ is the skew-symmetric cross-product matrix of $C$.
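
A minimal numerical sketch of this decomposition and contact-point recovery is given below, assuming per-finger readings indexed as $(x, y, z)$ and noting that $[C]_\times F = C \times F$ constrains $C$ only up to a component along $F$, so a least-squares (minimum-norm) solve is used; function and variable names are illustrative.

```python
import numpy as np

def skew(v):
    """Skew-symmetric matrix [v]_x such that skew(v) @ u == np.cross(v, u)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def object_frame_force(f_left, f_right):
    """Combine left/right finger forces (indexed x, y, z) into the object-frame force F."""
    return np.array([f_right[1] - f_left[1],   # F_x = F_y^R - F_y^L
                     f_left[2] - f_right[2],   # F_y = F_z^L - F_z^R
                     f_left[0] - f_right[0]])  # F_z = F_x^L - F_x^R

def contact_point(m_left, m_right, F):
    """Minimum-norm solve of (M^L + M^R) = [C]_x F for the contact point C.

    Since [C]_x F = C x F = -[F]_x C, the system is A C = M with A = -[F]_x,
    which determines C only up to a component along F.
    """
    M_total = m_left + m_right
    A = -skew(F)
    C, *_ = np.linalg.lstsq(A, M_total, rcond=None)
    return C
```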

1.3 Transition Models

  • Gripper motion follows the measured robot kinematics: $g_t$ is updated with pose-prior noise $\Sigma_g$.
  • The grasped object moves approximately rigidly with the gripper, modulo small slip: $\ell_t \simeq T_g^w T_s^g T_l^s(\Delta_t)$, plus process noise.
  • The environment is assumed static except when contact constraints are triggered: $e_t = e_{t-1} + w_t$, $w_t \sim \mathcal{N}(0, \Sigma_e)$.
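
Under these transition models, one prediction step can be sketched roughly as follows, with poses as 4×4 homogeneous transforms and, for brevity, explicit noise applied only to the environment translation (the full model attaches Gaussian noise to every block through the factors of §1.4); all names are illustrative.

```python
import numpy as np

def propagate(g_prev, odom, T_s_g, T_l_s, e_prev, sigma_e):
    """One prediction step for the three state blocks (4x4 homogeneous transforms).

    g_prev, e_prev : previous gripper / environment poses
    odom           : measured gripper motion from robot kinematics
    T_s_g, T_l_s   : grasp-chain transforms, so that l = T_g^w T_s^g T_l^s
    """
    g = g_prev @ odom                    # gripper: measured kinematics (noise via Sigma_g factor)
    l = g @ T_s_g @ T_l_s                # grasped object: rigid with gripper, modulo small slip
    e = e_prev.copy()
    e[:3, 3] += np.random.normal(0.0, sigma_e, size=3)  # e_t = e_{t-1} + w_t (translation only)
    return g, l, e
```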

1.4 Factor Graph MAP Inference

The joint inference is cast as a factor-graph-based MAP optimization:

$$\hat{x} = \arg\min_{x} \; \frac{1}{2} \sum_k \| F_k(x) \|^2_{\Sigma_k}$$

with factors summarizing the above models:

  • $F_{\text{gri}}$ (gripper prior),
  • $F_{\text{obj}}$ (object pose constraint),
  • $F_{\text{env}}$ (environment via contact region boundaries),
  • $F_{\text{ali}}$ (task completion/alignment).
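
Since the estimation is modular and compatible with standard solvers such as GTSAM (see §4.1), the structure of the MAP problem can be sketched as below. The keys, noise magnitudes, grasp transform, and the use of simple prior/between factors in place of the paper's tactile and contact-boundary factors are assumptions for illustration only.

```python
import numpy as np
import gtsam

graph = gtsam.NonlinearFactorGraph()
initial = gtsam.Values()

# One key per state block: gripper g, grasped object l, environment feature e.
G, L, E = gtsam.symbol('g', 0), gtsam.symbol('l', 0), gtsam.symbol('e', 0)

# Diagonal noise models (3 rotation sigmas, 3 translation sigmas); magnitudes are illustrative.
sig_g = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.01] * 3 + [0.005] * 3))
sig_o = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.02] * 3 + [0.01] * 3))
sig_e = gtsam.noiseModel.Diagonal.Sigmas(np.array([0.05] * 3 + [0.03] * 3))

# F_gri: gripper pose prior from measured robot kinematics.
g_meas = gtsam.Pose3()
graph.add(gtsam.PriorFactorPose3(G, g_meas, sig_g))

# F_obj: the grasped object is held (approximately) rigidly in the gripper.
T_l_g = gtsam.Pose3(gtsam.Rot3(), np.array([0.0, 0.0, 0.10]))
graph.add(gtsam.BetweenFactorPose3(G, L, T_l_g, sig_o))

# F_env: reduced here to a weak prior; the paper instead constrains e through
# contact-region boundary factors built from tactile measurements.
graph.add(gtsam.PriorFactorPose3(E, gtsam.Pose3(), sig_e))

for key, pose in [(G, g_meas), (L, g_meas.compose(T_l_g)), (E, gtsam.Pose3())]:
    initial.insert(key, pose)

result = gtsam.LevenbergMarquardtOptimizer(graph, initial).optimize()
print(result.atPose3(L))
```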

2. Active Tactile Exploration Policy (ATEP)

ATEP is designed to reduce uncertainty in object-environment pose alignment before commencing fine manipulation. It operates as a particle-filtered, information-driven localizer:

2.1 Particle Representation

A set of $N$ particles $\{p_i^t\}$ captures pose hypotheses, each carrying a weight $w_i^t$. Initialization is uniform.

2.2 Local-Peak Detection

Particles with $w_i^t > \tau$, where $\tau = 0.5/N$, are designated as “peaks”. Convergence is declared when the spatial spread of these peaks falls below $\delta_{\text{thr}}$.

2.3 Information-Gain Criterion

For each action $a$ and particle $p_i^t$, the predicted contact and travel distance $(z_{\text{pred},i}^{(a)}, d_i^{(a)})$ are computed. The score for action $a$ is

$$J(a) = \alpha_1 H(Z_a) + \alpha_2 \operatorname{Var}(D_a)$$

where $H(\cdot)$ is the entropy of the predicted contact types and $\operatorname{Var}(\cdot)$ is the travel-distance variance. The optimal action maximizes $J(a)$.
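
A minimal sketch of this scoring rule over the particle set is given below; weighting the contact-type distribution by the particle weights, and the `predict` forward-model placeholder, are assumptions made for illustration.

```python
import numpy as np

def score_action(particles, weights, action, predict, a1=1.0, a2=1.0):
    """J(a) = a1 * H(Z_a) + a2 * Var(D_a) for one candidate action.

    `predict(particle, action)` stands in for the forward contact model and
    returns (predicted contact type, predicted travel distance)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    type_mass, distances = {}, []
    for p_i, w_i in zip(particles, w):
        z_pred, d = predict(p_i, action)
        type_mass[z_pred] = type_mass.get(z_pred, 0.0) + w_i
        distances.append(d)
    probs = np.array(list(type_mass.values()))
    H = float(-(probs * np.log(probs)).sum())        # entropy over contact types
    return a1 * H + a2 * float(np.var(distances))    # plus travel-distance variance

def best_action(particles, weights, actions, predict):
    """ATEP action selection: pick the candidate action that maximizes J(a)."""
    return max(actions, key=lambda a: score_action(particles, weights, a, predict))
```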

2.4 Particle Update and Control

The selected action is executed until a tactile change is detected. Contact direction determines which region boundaries are updated. The particle set is accordingly restricted and weights are updated:

$$w_i^{t+1} = w_i^t \cdot \mathcal{P}\!\left(z_{\text{obs}}^t \mid z_{\text{pred}}(\ldots)\right)$$

2.5 Termination

The process terminates when the peak count is below $N_{\text{thr}}$ and the spatial spread is below $\delta_{\text{thr}}$, at which point a final alignment trajectory $\pi$ aims to satisfy $\theta_{\text{ali}} = 1$.
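
A sketch of the weight update (§2.4) and termination test (§2.2, §2.5) follows, assuming a Gaussian measurement likelihood consistent with the sensor model of §1.2 and a maximum-deviation measure of particle spread; both choices are assumptions.

```python
import numpy as np

def update_weights(weights, z_obs, z_preds, sigma_z):
    """w_i <- w_i * P(z_obs | z_pred,i), then normalize (Gaussian likelihood assumed)."""
    err = np.linalg.norm(np.asarray(z_preds, dtype=float) - z_obs, axis=1)
    w = np.asarray(weights, dtype=float) * np.exp(-0.5 * (err / sigma_z) ** 2)
    s = w.sum()
    return w / s if s > 0 else np.full(len(w), 1.0 / len(w))

def converged(particles, weights, N_thr, delta_thr):
    """Terminate when few peaks (w_i > 0.5/N) remain and they are spatially tight."""
    weights = np.asarray(weights, dtype=float)
    peaks = np.asarray(particles, dtype=float)[weights > 0.5 / len(weights)]
    if len(peaks) == 0 or len(peaks) >= N_thr:
        return False
    spread = np.linalg.norm(peaks - peaks.mean(axis=0), axis=1).max()
    return spread < delta_thr
```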

3. Implementation and Experimental Evaluation

3.1 Robotic Setup

  • 6-DOF robotic arm equipped with a two-finger parallel gripper.
  • Tac3D tactile sensors on each fingertip, providing the 3D force distribution and torque with a contact-point localization error of $<0.5$ mm.
  • Manipulated objects modeled from CAD meshes.

3.2 Demonstrated Tasks

  • Blind Socket Assembly: Insertion of two- and three-pin plugs into matched sockets; no vision is used.
  • Blind Block-Pushing: T-shaped tool pushes a movable block through obstacles relying strictly on geometric priors and tactile cues.

3.3 Metric Results

Table: Summary of Empirical Results

| Task | Final Pose Error | Mean Exploration Steps | Localization Error |
|---|---|---|---|
| Socket (two-pin) | 3.775 mm | 7.13 | $<0.5$ mm (contact point) |
| Socket (three-pin) | 1.815 mm | 7.67 | $<0.5$ mm (contact point) |
| Block pushing (obstacles) | n/a | 3–5 tactile events | $<10$ mm (obstacle) |

After 6–8 exploration steps, the particle spread $\sigma$ is reduced from approximately 30 mm to below 5 mm.

3.4 Sensitivity and Ablation

  • Task complexity (e.g., plug geometry) affects the number of required exploration steps.
  • The threshold $N_{\text{thr}}$ and the weights $\alpha_i$ influence the speed and reliability of localization.

4. Analysis: Contributions, Strengths, and Limitations

4.1 Main Contributions

  • A tactile-based SLAM framework that leverages prior object geometry and physical reasoning to achieve millimeter-level localization accuracy.
  • Modular factor-graph estimation that supports plug-and-play integration with existing solvers (e.g., GTSAM).
  • An exploration heuristic balancing entropy (informativeness) and trajectory length for efficient contact localization.

4.2 Advantages

  • Operation is fully visionless, enabling manipulation in visually occluded scenarios.
  • Generalizable across multiple task classes, including peg-in-hole and pushing.
  • Provides real-time localization and force-based inference without reliance on external sensors.

4.3 Limitations

  • Assumes quasi-static, rigid, geometric contact models; does not account for fast dynamics, compliance, or deformable components.
  • The environment is static; moving obstacles are not treated.
  • Particle filter resampling can lead to several millimeters of pose uncertainty, particularly in ambiguous or flat-likelihood regimes.

5. Perspectives and Future Research Directions

Planned research directions include integrating dynamic and force–mass models for rapid manipulation, extending the method to deformable and articulated objects, and fusing sparse vision measurements to augment tactile-only SLAM in hybrid active-perception scenarios (Wang et al., 11 Dec 2025). This suggests potential applicability in unstructured or partially observable environments common to advanced manufacturing, service robotics, and field deployment, contingent on overcoming the current limitations in environment assumptions and contact modeling.
