Enhancing Tactile-based Reinforcement Learning for Robotic Control (2510.21609v1)

Published 24 Oct 2025 in cs.RO and cs.LG

Abstract: Achieving safe, reliable real-world robotic manipulation requires agents to evolve beyond vision and incorporate tactile sensing to overcome sensory deficits and reliance on idealised state information. Despite its potential, the efficacy of tactile sensing in reinforcement learning (RL) remains inconsistent. We address this by developing self-supervised learning (SSL) methodologies to more effectively harness tactile observations, focusing on a scalable setup of proprioception and sparse binary contacts. We empirically demonstrate that sparse binary tactile signals are critical for dexterity, particularly for interactions that proprioceptive control errors do not register, such as decoupled robot-object motions. Our agents achieve superhuman dexterity in complex contact tasks (ball bouncing and Baoding ball rotation). Furthermore, we find that decoupling the SSL memory from the on-policy memory can improve performance. We release the Robot Tactile Olympiad (RoTO) benchmark to standardise and promote future research in tactile-based manipulation. Project page: https://elle-miller.github.io/tactile_rl

Summary

  • The paper demonstrates that integrating self-supervised tactile objectives into RL significantly improves dexterous manipulation, yielding up to 36% faster object finding and enhanced performance on the RoTO benchmark.
  • The paper introduces four self-supervised objectives, including forward dynamics and tactile reconstruction, to robustly encode sparse tactile signals and predict future states.
  • The paper shows that decoupling the auxiliary memory from on-policy learning improves long-horizon performance, particularly benefiting complex tasks like Baoding that require precise contact resolution.

Enhancing Tactile-based Reinforcement Learning for Robotic Control

Introduction and Motivation

The paper "Enhancing Tactile-based Reinforcement Learning for Robotic Control" (2510.21609) addresses the persistent challenge of leveraging tactile sensing in reinforcement learning (RL) for dexterous robotic manipulation. While vision-based RL has achieved notable successes, manipulation tasks remain bottlenecked by the reliance on idealized state information and the lack of robust, scalable tactile integration. The authors hypothesize that the inconsistent efficacy of tactile feedback in RL arises from the unique data characteristics of tactile signals—namely, their sparsity and non-smoothness—which can destabilize representation learning and policy optimization. To address this, the paper introduces a suite of self-supervised learning (SSL) objectives tailored to tactile data, and empirically demonstrates their impact on dexterous manipulation tasks using only proprioception and sparse binary contacts.

Figure 1: Tactile-based RL with self-supervision. Agents achieve superhuman dexterity in the RoTO benchmark using only proprioception and binary contacts, with self-supervised encoders capturing object positions and velocities.

Methodology

Problem Formulation

The authors formalize the problem as a POMDP, where the agent receives a $k$-step history of proprioceptive and tactile observations. The observation encoder, policy, and value function are implemented as MLPs, with the encoder trained jointly via PPO and an auxiliary SSL loss. The total loss is:

$$\mathcal{L} = \mathcal{L}_{\text{PPO}} + c_{\text{aux}} \mathcal{L}_{\text{aux}}$$

where $\mathcal{L}_{\text{aux}}$ is the self-supervised objective and $c_{\text{aux}}$ is a weighting coefficient.
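This joint update can be sketched as follows. The snippet is a minimal illustration, not the paper's implementation; the names `encode`, `total_loss`, and the value of `C_AUX` are hypothetical:

```python
import numpy as np

C_AUX = 0.1  # assumed auxiliary loss weight; tuned per task in practice

def encode(obs_history: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Shared MLP encoder: maps a flattened k-step observation history
    (proprioception + binary contacts) to a latent vector z."""
    return np.tanh(obs_history @ weights)

def total_loss(ppo_loss: float, aux_loss: float, c_aux: float = C_AUX) -> float:
    """L = L_PPO + c_aux * L_aux; both terms backpropagate into the shared encoder."""
    return ppo_loss + c_aux * aux_loss
```

Because the encoder is shared, gradients from the SSL objective shape the latent space that the PPO policy and value heads consume.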

Self-Supervised Objectives

Four SSL objectives are proposed (see Figure 2):

  • Tactile Reconstruction (TR): Binary cross-entropy loss to reconstruct the current tactile observation from the latent representation.
  • Full Reconstruction (FR): Joint reconstruction of both proprioceptive and tactile observations.
  • Forward Dynamics (FD): Multi-step prediction of future latent states, using a target encoder as an EMA for stability.
  • Tactile Forward Dynamics (TFD): Combines FD with future tactile reconstruction from predicted latent states.

Figure 2: Proposed self-supervised objectives. Reconstruction losses (TR, FR) preserve tactile information; forward dynamics losses (FD, TFD) enforce predictive representations.
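As a hedged illustration (not the authors' code), the TR objective reduces to a binary cross-entropy over contact bits, and the FD target encoder is an exponential moving average (EMA) of the online encoder's weights:

```python
import numpy as np

def tactile_bce(logits: np.ndarray, contacts: np.ndarray) -> float:
    """Binary cross-entropy between predicted contact probabilities and
    sparse binary tactile observations (TR objective)."""
    p = 1.0 / (1.0 + np.exp(-logits))  # sigmoid
    p = np.clip(p, 1e-7, 1.0 - 1e-7)   # avoid log(0)
    return float(-np.mean(contacts * np.log(p) + (1 - contacts) * np.log(1 - p)))

def ema_update(target_w: np.ndarray, online_w: np.ndarray, tau: float = 0.005) -> np.ndarray:
    """Target-encoder EMA used to stabilise the FD objective:
    theta_target <- (1 - tau) * theta_target + tau * theta_online."""
    return (1.0 - tau) * target_w + tau * online_w
```

The slow-moving target encoder prevents the latent prediction targets from collapsing as the online encoder is updated.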

Separated Auxiliary Memory

To stabilize SSL training, the authors decouple the auxiliary memory from the on-policy RL buffer, allowing the auxiliary loss to be computed over a larger, more diverse dataset. This is particularly beneficial for tasks with long temporal dependencies.
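One way to realise this separation is a FIFO buffer that outlives the on-policy rollout, so the SSL loss is computed over older, more diverse transitions. This is a sketch under that assumption, not the paper's exact data structure:

```python
import random
from collections import deque

class AuxMemory:
    """FIFO buffer for the auxiliary (SSL) loss, decoupled from the
    on-policy PPO buffer, which is cleared after every update."""

    def __init__(self, capacity: int = 100_000):
        self.buffer = deque(maxlen=capacity)

    def add_rollout(self, transitions):
        """Append the latest on-policy transitions; older transitions
        are retained until capacity is exceeded."""
        self.buffer.extend(transitions)

    def sample(self, batch_size: int):
        """Uniform sample over the retained history for the SSL update."""
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))
```

The larger the capacity, the longer the temporal dependencies the auxiliary objective can see, at the cost of extra memory.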

Experimental Setup

Experiments are conducted in Isaac Lab on three custom tasks (the RoTO benchmark):

  • Find: Locate a fixed sphere using a Franka arm with two finger contact sensors.
  • Bounce: Maximize ball bounces with a Shadow Hand instrumented with 17 binary contact sensors.
  • Baoding: Rotate two balls in-hand as many times as possible with the Shadow Hand.

All agents are trained with 4096 parallel environments and extensive hyperparameter sweeps per method and environment.

Results

RL-only Baselines

Proprioceptive-only agents can solve Find and Bounce via implicit contact inference from control errors, but fail on Baoding. Adding explicit tactile signals yields marginal gains in Find, moderate gains in Bounce, and is essential for Baoding.

Figure 3: RL-only. Mean evaluation returns for proprioceptive-tactile and proprioceptive-only agents.

RL + SSL

All four SSL objectives outperform RL-only baselines. TR and FD are consistently superior, with FD yielding the highest mean returns in Find and Bounce, and TR providing more reliable performance in Baoding. FR and TFD show environment-dependent performance and are less robust.

Figure 4: RL+SSL. Mean evaluation returns of self-supervised agents.

RL + SSL + Memory

Decoupling the auxiliary memory has minimal effect in Find and Bounce, but substantially improves Baoding performance, indicating the importance of long-horizon dynamics for complex manipulation.

Figure 5: RL+SSL+Memory. Mean evaluation returns of the FD agent with on-policy vs. extended auxiliary memory.

Representation Analysis

Mutual information estimation between the learned latent $\mathbf{z}_t$ and ground-truth state $\mathbf{s}_t$ reveals that FD agents encode significantly more task-relevant information, especially in Baoding. Marginal MI analysis shows that FD uniquely captures object positions and velocities, while TR and TFD primarily encode contact events.

Figure 6: Mutual information estimation. $I(\mathbf{z}_t; \mathbf{s}_t)$ between latent representations and ground-truth state.
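For intuition, mutual information between discretised latents and states can be estimated from empirical joint counts. This is a toy plug-in (binning) estimator for illustration; the paper's estimator may differ:

```python
import math
from collections import Counter

def mutual_information(xs, ys) -> float:
    """Plug-in MI estimate (in bits) between two discrete sequences:
    I(X; Y) = sum over (x, y) of p(x, y) * log2( p(x, y) / (p(x) * p(y)) )."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))          # empirical joint counts
    px, py = Counter(xs), Counter(ys)   # empirical marginal counts
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )
```

Perfectly correlated sequences recover the full entropy of the variable, while independent sequences give (approximately) zero; continuous latents would first need to be binned.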

Physical Metrics

Self-supervised agents achieve strong improvements in physical task metrics: 36% faster object finding, 10 more bounces in Bounce, and up to 25 Baoding rotations in 10 seconds—substantially exceeding human and prior robotic baselines.

Figure 7: Physical metrics. Maximum (shaded) and mean (bold) across seeds.

Discussion

Task-Dependent Utility of Tactile Sensing

The utility of explicit tactile signals is highly task-dependent. They are critical when:

  • Object-robot dynamics are decoupled (e.g., Baoding).
  • Objects are low-inertia and proprioceptive signals are insufficient (e.g., Bounce).
  • Contact localization or multi-contact resolution is required.

Efficacy of Self-Supervision

Self-supervision enforces compression of task-critical information into the latent space, improving policy learning and robustness. Forward dynamics objectives are particularly effective at encoding predictive state information, while reconstruction objectives can suffer from negative interference when combining modalities.

Off-Policy Auxiliary Training

Decoupling the auxiliary memory enables leveraging a broader data distribution, which is especially beneficial for tasks with long temporal dependencies. This suggests a promising direction for integrating off-policy data into on-policy RL pipelines.

Practical Recommendations

  • For tactile-based RL, joint training with tactile reconstruction or forward dynamics objectives and a separated auxiliary memory is recommended.
  • Simpler tactile information formats (binary contacts) are surprisingly effective and computationally efficient compared to high-bandwidth pixel-based tactile signals.

Limitations

The primary limitation is the lack of real-world hardware validation. However, the focus on binary contacts mitigates sim-to-real transfer issues. Training with SSL increases computational requirements, especially for long-horizon dynamics objectives and larger auxiliary memories.

Implications and Future Directions

This work demonstrates that self-supervised representation learning can unlock the potential of tactile sensing for dexterous manipulation, even with minimalistic binary contact signals. The findings challenge the prevailing assumption that vision or privileged state information is necessary for high-level dexterity, and suggest that scalable, robust tactile-based RL is feasible with appropriate representation learning strategies.

Future research should focus on:

  • Real-world deployment and sim-to-real transfer of tactile-based RL agents.
  • Extending SSL objectives to richer tactile modalities (e.g., continuous force, contact pose, or tactile images).
  • Integrating off-policy and on-policy learning for improved sample efficiency and generalization.
  • Investigating the interplay between tactile, proprioceptive, and visual modalities in multi-sensory RL.

Conclusion

The paper provides a comprehensive empirical and methodological contribution to tactile-based RL, demonstrating that self-supervised objectives—especially forward dynamics—enable superhuman dexterity in complex manipulation tasks using only proprioception and sparse binary contacts. The introduction of the RoTO benchmark and the analysis of representation learning set a new standard for future research in tactile manipulation. The results have significant implications for the design of scalable, robust, and efficient robotic systems capable of dexterous, sensory-driven control.
