Beyond Binary: Sim-to-Real Dexterous Manipulation with Physics-Grounded Contact Representation

Published 27 May 2026 in cs.RO, cs.AI, and cs.LG | (2605.28812v1)

Abstract: A primary bottleneck in contact-rich manipulation is the difficulty of collecting real-world data. Sim-to-real reinforcement learning offers a scalable alternative, but the simulation-reality gap prevents information-dense modalities like touch from being effectively used. Existing sim-to-real methods often mitigate this gap by simplifying tactile data into coarse low-dimensional features -- sacrificing the richness required for complex manipulation. In this work, we introduce Center-of-Pressure (CoP), an effective tactile representation grounded in physical principles that preserves dense contact information while maintaining robustness for sim-to-real transfer. To support this representation, we propose a sensor calibration scheme based on differentiable dynamics, enabling the estimation of taxel orientations without requiring ground-truth force measurements. We evaluate CoP on two blind, challenging contact-rich manipulation tasks: peg-in-hole insertion and ball balancing. Across both tasks, policies conditioned on CoP achieve zero-shot sim-to-real transfer on a multi-fingered hand, and outperform both coarse binary-contact and raw-taxel baselines. Analysis of learned policy states further suggests that CoP-conditioned policies encode task-relevant physical properties, such as object mass, as an emergent byproduct of control.

Abstract PDF Upgrade to Chat

Authors (4)

Summary

The paper introduces CoP, a physics-grounded tactile representation that improves sim-to-real transfer by encoding both force and contact location.
It leverages a differentiable sensor calibration method to align simulation signals with real-world hardware for robust policy learning.
Experiments demonstrate that CoP-based policies outperform raw tactile baselines in challenging tasks like peg-in-hole insertion and ball balancing.

Beyond Binary: Sim-to-Real Dexterous Manipulation with Physics-Grounded Contact Representation

Introduction and Motivation

Dexterous, contact-rich manipulation remains a foremost challenge in robotics, primarily due to the difficulties of gathering real-world tactile data at scale. Simulation-to-reality (sim-to-real) reinforcement learning (RL) is a promising paradigm, but the sim-real gap, especially for high-dimensional tactile sensing, impedes the transferability of rich touch-based policies. Previous approaches typically either simplify tactile signals into binary or coarse features, sacrificing representational expressivity, or attempt to use raw sensor outputs, incurring misalignment and lack of robustness across simulator and hardware.

This work introduces "Center-of-Pressure" (CoP), a physics-grounded tactile representation—a compact yet expressive fusion of force magnitude and contact location—that balances sim-to-real alignment with the necessary information density for precise manipulation. Coupled with a differentiable sensor calibration method to estimate taxel orientations absent ground-truth force measurements, CoP enables robust policy learning and zero-shot sim-to-real transfer in two blind manipulation tasks: peg-in-hole insertion and ball balancing.

Figure 1: CoP comprises a 3D contact force vector and contact point within the sensor frame; mappings align taxel array signals with compact contact descriptors.

Methodology

Physics-Aware Contact Representation

CoP summarizes tactile information as a 3D force vector and contact point—grounded in rigid-body physical principles—striking a balance between transferability and informativeness. The authors develop a differentiable mapping between the $N$ -taxel sensor array (force readings per taxel) and the CoP descriptor via a stress-distribution model incorporating normal and shear forces and accounting for silicone layer deformation. Mapping avoids naive aggregation, instead blending taxel normals with distance vectors via Gaussian weighting, and projects shear forces onto the tangent plane, maintaining differentiability and computational efficiency.

Figure 2: Two hardware examples visualizing CoP force vectors and contact positions computed from raw taxel readings.

Figure 3: Simulator-derived CoP contacts are mapped to taxel force vectors in the insertion and ball-balancing tasks, filtered by sensor regions.

Closed-form solutions for CoP inference use regularized least-squares minimization, allowing gradient-based alignment across simulation and hardware.

Differentiable Sensor Calibration

Taxel orientations are difficult to calibrate due to the complex geometry of fingertips. The proposed method leverages differentiable dynamics, matching externally inferred wrench signals from tactile readings with measured joint torques under static equilibrium, optimizing rotation parameters via SVD-projected matrices with backpropagation. This approach bypasses expensive ground-truth force sensors, relying instead on internal kinematics and recorded contact responses.

Figure 4: Actuator dynamics alignment via iterative probing with structured trajectories, bridging the sim-real gap in control responses.

Sim-Real Alignment and Policy Architecture

The CoP mapping, calibrated with paired rollout data, ensures consistent contact representations across simulation and real-world deployments. Actuator parameter identification employs Bayesian optimization for dynamic response tuning. Sensor delay is measured and injected during simulation training for robustness.

Policy architecture comparisons reveal that recurrent networks (GRU-based) outperform history-stacked MLPs, achieving superior sample efficiency and learning quality without increasing input dimensionality.

Figure 5: Illustration of temporal context strategies—MLP with stacked observations versus recurrent networks with hidden state fusion.

Experimental Evaluation

Tasks and Baselines

Experiments are conducted on two challenging manipulation tasks using a 16-DOF Allegro hand equipped with XELA uSkin sensors:

Peg-in-hole insertion with various shaped pegs requiring grasp and adaptive alignment (circle, diamond, ellipse, hexagon, square, triangle).
Ball balancing on a supported plate, maintaining centrality across diverse balls with varying mass, friction, and surface texture.

Baselines include proprioception-only, binary contact, force magnitude, force vector-only, contact position-only, raw taxel readings, and expert human performance.

Figure 6: Custom assets for the peg-in-hole and ball-balancing tasks, facilitating diverse contact-rich evaluation conditions.

Peg-in-Hole Insertion

CoP-conditioned policies achieve the highest overall success rate (0.78) across shapes, with superior robustness under out-of-distribution initializations and masked sensor signals. Force-only and position-only ablations demonstrate that both components are necessary for optimal performance. Raw taxel baselines performed worse, corroborating the view that sensor-specific raw signals are difficult to align and less effective for transfer unless paired with advanced calibration or encoding. Human performance remains a strong outlier, highlighting the gap attributable to additional cognitive reasoning.

Figure 7: Visual depiction of successful and failed peg-in-hole rollouts—CoP policy adapts under OOD conditions, binary-contact policy lacks recovery.

Ball Balancing

Only CoP, force vector, and taxel-based policies succeeded in learning the dynamic balancing task, with CoP and force vector policies yielding comparable time-to-fall metrics, and other baselines failing to maintain the ball. The robot policy exhibits distinct emergent movement patterns for ball centering, but human prediction and control agility still outperform policies, especially with objects exhibiting nonlinear rolling.

Figure 8: Emergent cop policy movement patterns in ball-centering—accelerate, decelerate, stabilize—reflecting fine control dynamics.

Latent State Analysis

Linear probing and PCA on recurrent policy latents reveal that CoP-based policies implicitly encode task-relevant physical properties, such as ball mass and position. Clustering analysis demonstrates that policies adapt latent embeddings to organize around dynamical parameters, despite these properties being unsupervised.

Figure 9: Policy latent predictions closely track ball position, but velocity prediction is degraded due to tactile-based estimation noise.

Figure 10: Latent embedding clusters correlate with object mass as trajectories evolve, evidencing emergent identification abilities.

Implications and Future Directions

The CoP representation substantively advances the sim-to-real paradigm for contact-rich manipulation, offering a compact, physically meaningful descriptor that retains sufficient fidelity for complex policy learning and transfer. Policy architectures leveraging CoP exhibit emergent inference of object dynamics without explicit supervision—a trait valuable for scalable, generalizable robotic manipulation. The methodology obviates the need for expensive force sensors in calibration and bypasses teacher-student distillation paradigms by aligning sensor contact representations directly.

Remaining limitations include loss of sensor-specific detail, restriction to normal force components, and limited sensor types. Extension to arm-hand systems, full-hand tactile coverage, and richer raw tactile mapping remains open. The approach has potential spillover into sample-efficient RL and vision-tactile fusion in imitation learning.

Conclusion

Center-of-Pressure (CoP) contact representation presents an effective, physics-grounded solution for sim-to-real transfer in dexterous manipulation, achieving robust learning and transfer across blind, contact-rich tasks. Its differentiable calibration and alignment with simulation dimensions empower emergent identification and control strategies. The results underscore the promise of physically grounded tactile abstraction in scalable robot learning and suggest fertile ground for future multimodal, sensor-general extensions.

Markdown Report Issue