- The paper introduces a two-stage simulation combining differentiable MPM-based deformation modeling with a residual neural rendering approach for high-fidelity calibration.
- It achieves significant improvements in physical and optical accuracy, reporting better Earth Mover’s Distance and PSNR metrics compared to existing baselines.
- The framework supports rapid reinforcement learning and precise tactile control, advancing capabilities in sim-to-real transfer for robotic manipulation.
Differentiable Simulation and Calibration of Optical Tactile Sensors: DOT-Sim
Motivation and Context
Optical tactile sensors, such as GelSight and DenseTact, have achieved widespread adoption owing to their high spatial resolution and compatibility with vision-based learning pipelines. However, their simulation is hampered by the highly deformable nature of their gel materials and complex optical responses. Accurate simulation is vital for data generation, algorithm development, and robust sim-to-real transfer, yet prevailing frameworks rely on imprecise approximations or limited modeling of physical and optical dynamics. DOT-Sim addresses these limitations by integrating differentiable physical modeling with data-driven optical rendering, enabling efficient and precise calibration with minimal real-world data.
Methodological Innovations
DOT-Sim introduces a two-stage simulation framework. First, sensor deformation is modeled using the Material Point Method (MPM), a particle-based continuum simulation well-suited for soft, elastic materials. This model is calibrated by aligning simulated deformations with a small set of real-world tactile data, using differentiable simulation and gradient-based optimization over Young’s modulus and Poisson’s ratio. Notably, the calibration completes in minutes with few demonstrations, a substantial operational improvement over prior art.
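The calibration idea can be illustrated with a minimal sketch. It is not the paper's MPM solver: the forward model below (`toy_forward`, a block under uniaxial stress) is a hypothetical stand-in chosen because its gradients are easy to write by hand, and the function names, loss, learning rate, and step count are all assumptions. The conversion to Lamé parameters is the standard one; whether DOT-Sim's constitutive model is parametrized this way is not stated here.

```python
import math

def lame_parameters(young, poisson):
    """Standard conversion from (E, nu) to the Lame parameters (mu, lambda)."""
    mu = young / (2.0 * (1.0 + poisson))
    lam = young * poisson / ((1.0 + poisson) * (1.0 - 2.0 * poisson))
    return mu, lam

def toy_forward(young, poisson, stress, height, width):
    """Hypothetical stand-in for the simulator: a block under uniaxial stress.
    Returns (axial shortening, lateral widening) from linear elasticity."""
    axial_strain = stress / young
    return axial_strain * height, poisson * axial_strain * width

def calibrate(observations, height, width, young0=1.0e5, poisson0=0.3,
              lr=0.02, steps=3000):
    """Fit (E, nu) by gradient descent on the mean relative squared error.
    observations: list of (stress, measured_axial, measured_lateral)."""
    log_e, nu = math.log(young0), poisson0  # optimize log E so E stays positive
    n = len(observations)
    for _ in range(steps):
        young = math.exp(log_e)
        grad_log_e = grad_nu = 0.0
        for stress, axial_obs, lateral_obs in observations:
            axial, lateral = toy_forward(young, nu, stress, height, width)
            ra = axial / axial_obs - 1.0      # relative axial residual
            rl = lateral / lateral_obs - 1.0  # relative lateral residual
            # Both predictions scale as 1/E, so d(pred)/d(log E) = -pred.
            grad_log_e += (-2.0 * ra * axial / axial_obs
                           - 2.0 * rl * lateral / lateral_obs) / n
            # Lateral widening is linear in nu: d(lateral)/d(nu) = stress*width/E.
            grad_nu += 2.0 * rl * (stress * width / young) / lateral_obs / n
        log_e -= lr * grad_log_e
        nu = min(0.499, max(1e-3, nu - lr * grad_nu))  # keep nu in a physical range
    return math.exp(log_e), nu
```

In a real differentiable simulator the hand-written gradients would be replaced by automatic differentiation through the simulation steps, and the observations would be measured sensor deformations rather than two scalar displacements.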
Second, DOT-Sim handles optical simulation by rendering depth and surface normals from the simulated mesh via a virtual camera, mirroring the real sensor’s image acquisition process. A neural network based on DeepLabV3-ResNet50 then predicts a residual image relative to the idle (no-contact) state rather than the full contact frame, exploiting the observation that deformation-induced signals are localized. Ablation studies confirm that this residual formulation improves both sample efficiency and image fidelity.
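The residual formulation amounts to learning the difference between a contact frame and the idle frame, then adding the prediction back to the idle frame at render time. The sketch below shows only that bookkeeping on small grayscale images stored as nested lists; the function names and the [0, 1] intensity range are assumptions, and the network itself is omitted.

```python
def residual_target(contact, idle):
    """Training target: per-pixel difference between contact and idle frames."""
    return [[c - i for c, i in zip(c_row, i_row)]
            for c_row, i_row in zip(contact, idle)]

def compose_contact_image(idle, residual, lo=0.0, hi=1.0):
    """Render-time output: idle frame plus predicted residual, clipped to [lo, hi]."""
    return [[min(hi, max(lo, i + r)) for i, r in zip(i_row, r_row)]
            for i_row, r_row in zip(idle, residual)]

def nonzero_fraction(residual, eps=1e-6):
    """Share of pixels whose residual magnitude exceeds eps."""
    values = [abs(r) for row in residual for r in row]
    return sum(1 for v in values if v > eps) / len(values)
```

Because contact changes only a small region of the image, most residual pixels are close to zero, so the network can concentrate on the deformed area instead of re-synthesizing the static background.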
Quantitative Results
DOT-Sim is evaluated on physical and optical accuracy using metrics adopted from prior work. For physical modeling, DOT-Sim achieves a lower Earth Mover’s Distance (1.29 mm vs. 1.31 mm; lower is better) and a higher F-Score at 1 mm (69.89 vs. 64.69; higher is better) relative to baselines such as Taxim and Tacto, indicating closer geometric alignment with real sensor surfaces. The largest error reductions occur in regions of large deformation, which are critical for realistic contact modeling.
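For readers unfamiliar with the F-Score at a distance threshold, the following sketch uses the common point-cloud definition: precision is the share of predicted points within the threshold of some ground-truth point, recall is the reverse, and the score is their harmonic mean expressed as a percentage. The paper's exact evaluation protocol may differ, and the brute-force nearest-neighbor search is only suitable for small point sets.

```python
import math

def f_score(pred, gt, tau=1.0):
    """F-Score (in percent) between two point sets at distance threshold tau.
    pred, gt: non-empty lists of equal-dimension coordinate tuples (same unit as tau)."""
    def fraction_within(src, dst):
        # Share of src points whose nearest dst point lies within tau.
        hits = sum(1 for p in src if min(math.dist(p, q) for q in dst) <= tau)
        return hits / len(src)

    precision = fraction_within(pred, gt)
    recall = fraction_within(gt, pred)
    if precision + recall == 0.0:
        return 0.0
    return 100.0 * 2.0 * precision * recall / (precision + recall)
```

With coordinates in millimeters, `f_score(pred, gt, tau=1.0)` corresponds to an F-Score at 1 mm under this definition.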
In optical simulation, DOT-Sim improves PSNR by up to 4 points and achieves a 17.34% gain over the strongest baseline. Ablation confirms the residual mapping's superiority: a direct regression produces noticeably blurrier outputs, while residual prediction yields sharper, artifact-free renderings.
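PSNR is defined from the mean squared error between a rendered image and its reference as 10·log10(MAX² / MSE), where MAX is the largest possible pixel value. A minimal sketch, assuming grayscale images stored as nested lists with intensities in [0, 1] (the function name and data layout are illustrative only):

```python
import math

def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two same-sized images."""
    squared_errors = [(p - t) ** 2
                      for p_row, t_row in zip(pred, target)
                      for p, t in zip(p_row, t_row)]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Because the scale is logarithmic, each additional 10 dB corresponds to a tenfold reduction in mean squared error.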
Sim-to-Real Transfer, Classification, and Control
DOT-Sim enables effective sim-to-real transfer in downstream tasks without the need for real-image annotation. For zero-shot indenter classification on real tactile images, classifiers trained exclusively on DOT-Sim outputs achieve 90.48% (in-domain) and 81.18% (out-of-domain) accuracy, outperforming baselines by 28.24% and 44.83%, respectively. For tactile-based tumor detection, DOT-Sim attains 80.56%–96.55% accuracy across varying skin stiffness, well above DiffTactile and Tacto. In trajectory-following tasks, control policies trained in simulation and transferred to a real xArm 7 robot show an average error below 0.9 mm, demonstrating DOT-Sim’s capacity for precision manipulation.
Reinforcement Learning Applications
DOT-Sim’s differentiable simulation enables rapid policy training for manipulation tasks. In the box-repositioning scenario using PPO, DOT-Sim’s calibrated physical model facilitates fast convergence and stable sim-to-real transfer with only tactile images as input, showcasing its utility for RL-driven tactile control.
Limitations and Implications
DOT-Sim’s primary limitations are reduced generalization to highly out-of-distribution geometries (sharp features, intricate patterns) and its computational cost: it currently runs at roughly 3 FPS on contemporary GPUs. Both are amenable to improvement: higher MPM voxel resolution and denser indenter datasets could enhance geometric accuracy, and streamlined simulation parameters may yield faster runtimes.
Practically, DOT-Sim’s modular architecture (MPM for deformation, a plugin for optical rendering) supports integration into diverse simulation engines and applications. Theoretically, its differentiable pipeline advances the state of the art in tactile sensor simulation, supporting analytic system identification and neural rendering. Future developments will likely focus on expanding indenter diversity, refining residual modeling for local deformation, and accelerating simulation to facilitate real-time control and more complex RL scenarios.
Conclusion
DOT-Sim (2604.27367) establishes a new paradigm for optical tactile sensor simulation via differentiable physical modeling and residual-based neural rendering. Its efficient calibration, high-fidelity physical and optical outputs, and robust sim-to-real transfer address major deficiencies in previous methods. The framework supports tactile perception, classification, and control tasks with strong quantitative outcomes and practical impact. Its limitations motivate continued research in real-time simulation, geometric generalization, and integration with broader robotic manipulation pipelines.