Touch-based Curiosity (ToC) Framework
- Touch-based Curiosity (ToC) is a framework that rewards robots for novel tactile interactions, driving intrinsic exploration and learning.
- It employs methods such as count-based touch novelty and cross-modal surprise to transform high-dimensional tactile data into actionable insights.
- Applications span self-touch, object manipulation, and shape reconstruction, achieving robust performance even in sparse-reward settings.
Touch-based Curiosity (ToC) is a framework for intrinsically motivated robotic exploration leveraging tactile sensory input to drive learning, discovery, and manipulation behaviors. Unlike vision-centric curiosity approaches, ToC utilizes information-seeking behaviors induced by physical contact, rewarding novel or informative tactile experiences to facilitate both self-exploration and object understanding. The core paradigms include intrinsic reward design for touch novelty, neural representation of high-dimensional tactile data, task-agnostic and cross-modal surprise metrics, and specialized learning curricula. ToC has been instantiated both for robots learning self-touch and for exploration and manipulation of external objects.
1. Foundational Principles of Touch-based Curiosity
ToC operationalizes curiosity by rewarding interactions that yield novel or unpredicted tactile feedback. There are two principal classes of ToC mechanisms:
- Count-based touch novelty: Intrinsic rewards increase for physical contacts with previously untouched body regions or object areas, driving broad, balanced sensorimotor exploration. This is exemplified in the Baby Sophia developmental agent, which segments raw high-dimensional tactile activations into semantically grouped regions. Novel contacts and diversity across regions are strongly rewarded, while redundant contacts are discounted, encouraging symmetric exploration (Zarifis et al., 12 Nov 2025).
- Prediction-error and cross-modal surprise: Surprises emerge when an agent's internal model (often cross-modal, e.g., vision-to-touch) fails to anticipate current tactile feedback. Intrinsic rewards are computed as squared prediction errors between expected and actual touch signals or between predicted and realized future sensory representations. This approach motivates exploration of under-modeled and potentially informative regions in the sensorimotor state space (Rajeswar et al., 2021).
Both strategies support contact-rich task exploration in environments where extrinsic rewards are sparse or absent, providing effective scaffolding for subsequent supervised or downstream learning.
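To make the count-based class concrete, the following minimal sketch (not taken from any of the cited systems) pays a first-contact bonus per region and decays the bonus for repeated contacts; the bonus magnitudes and the decay constant are illustrative assumptions.

```python
# Hypothetical count-based touch-novelty reward. Assumes tactile input has already
# been grouped into named body/object regions; all scales here are placeholders.
from collections import defaultdict

class TouchNoveltyReward:
    def __init__(self, first_touch_bonus=1.0, repeat_scale=0.1, decay=0.5):
        self.first_touch_bonus = first_touch_bonus
        self.repeat_scale = repeat_scale
        self.decay = decay
        self.episode_counts = defaultdict(int)  # contacts per region this episode

    def reset_episode(self):
        self.episode_counts.clear()

    def __call__(self, touched_regions):
        """Intrinsic reward for the set of regions contacted at this step."""
        reward = 0.0
        for region in touched_regions:
            n = self.episode_counts[region]
            if n == 0:
                reward += self.first_touch_bonus               # novel contact
            else:
                reward += self.repeat_scale * self.decay ** n  # decayed repeat bonus
            self.episode_counts[region] += 1
        return reward
```

A prediction-error variant would instead maintain a learned (often cross-modal) model and pay out its squared prediction error as the reward, as sketched in Section 2.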
2. Methodological Architectures and Learning Algorithms
ToC frameworks are predominantly implemented within reinforcement learning (RL) paradigms, with architectures incorporating:
- Tactile data embedding: High-dimensional tactile sensor arrays (e.g., 17,175 taxels in Baby Sophia) are first grouped anatomically (e.g., into 68 regions), producing low-dimensional semantic summaries. These are encoded via multi-layer perceptrons (MLPs) into compact touch embeddings, further fused with proprioceptive or visual features, serving as input to policy and value networks (Zarifis et al., 12 Nov 2025); a minimal embedding sketch follows this list.
- Cross-modal models: ToC may use a visual encoder generating latent representations, with a tactile decoder mapping these to predicted haptic readings. A forward dynamics model predicts future latent states given the current state and action (Rajeswar et al., 2021). The intrinsic reward combines touch-prediction error and future-state prediction error; a sketch of such a module also follows this list.
- Policy learning: Exploration policies are commonly optimized via Proximal Policy Optimization (PPO) or Soft Actor-Critic (SAC), maximizing the expected cumulative intrinsic reward. In Baby Sophia, a curriculum learning setup transitions agents from easy (fixed initial pose) to more difficult (randomized pose) exploration phases, promoting robustness and generalization (Zarifis et al., 12 Nov 2025); a sketch of such a phase schedule is included after this list. On-object exploration, as in tSLAM, encodes workspace occupancy grids and controls anthropomorphic hands in continuous, high-dimensional action spaces (Lu et al., 2022).
- Algorithmic modules: Object exploration may be cast as a finite-horizon MDP, with actions controlling pose and joint trajectories, states comprising tactile occupancy and proprioception, and rewards driven by both discovery of new object parts and total workspace coverage (Lu et al., 2022). For shape exploration using sliding touch, Bayesian optimization guides fingertip motion along regions of maximal model uncertainty, efficiently refining object representations with minimal contacts (Chen et al., 2023).
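As a concrete illustration of the tactile-embedding stage, the sketch below pools taxel activations into per-region summaries and encodes them with a small MLP before fusing with proprioception. The layer sizes, mean pooling, and fusion layer are assumptions for illustration, not the published Baby Sophia architecture.

```python
# Hypothetical touch-embedding module: taxels -> region summaries -> MLP -> fusion.
import torch
import torch.nn as nn

class TouchEncoder(nn.Module):
    def __init__(self, num_regions=68, proprio_dim=32, embed_dim=64):
        super().__init__()
        self.num_regions = num_regions
        self.touch_mlp = nn.Sequential(
            nn.Linear(num_regions, 128), nn.ReLU(),
            nn.Linear(128, embed_dim), nn.ReLU(),
        )
        self.fuse = nn.Linear(embed_dim + proprio_dim, embed_dim)

    def forward(self, taxel_values, region_index, proprio):
        # Mean-pool raw taxel activations into one value per anatomical region.
        batch = taxel_values.shape[0]
        sums = torch.zeros(batch, self.num_regions, device=taxel_values.device)
        sums.index_add_(1, region_index, taxel_values)
        counts = torch.zeros(self.num_regions, device=taxel_values.device)
        counts.index_add_(0, region_index,
                          torch.ones_like(region_index, dtype=torch.float32))
        region_feats = sums / counts.clamp(min=1.0)
        touch_embed = self.touch_mlp(region_feats)
        return self.fuse(torch.cat([touch_embed, proprio], dim=-1))

# Usage with Baby Sophia-like dimensions (the taxel-to-region assignment is synthetic):
encoder = TouchEncoder()
taxels = torch.rand(4, 17175)                 # batch of raw taxel activations
regions = torch.randint(0, 68, (17175,))      # taxel -> region assignment
embedding = encoder(taxels, regions, torch.rand(4, 32))   # shape (4, 64)
```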
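The cross-modal surprise module can be sketched as a visual (state) encoder, a tactile decoder, and a forward dynamics model whose prediction errors are combined into the intrinsic reward of Section 3. Network sizes, the latent dimension, and the weighting $\beta$ are assumptions, not the exact architecture of Rajeswar et al. (2021).

```python
# Hypothetical cross-modal curiosity module: encode the observation, predict touch and
# the next latent state, and reward the (weighted) prediction errors.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalCuriosity(nn.Module):
    def __init__(self, obs_dim, action_dim, touch_dim, latent_dim=64, beta=0.5):
        super().__init__()
        self.beta = beta
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                     nn.Linear(128, latent_dim))
        self.touch_decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                           nn.Linear(128, touch_dim))
        self.forward_model = nn.Sequential(nn.Linear(latent_dim + action_dim, 128),
                                           nn.ReLU(), nn.Linear(128, latent_dim))

    def forward(self, obs, action, next_obs, touch):
        z = self.encoder(obs)
        z_next = self.encoder(next_obs).detach()          # target for the forward model
        touch_pred = self.touch_decoder(z)                # vision-to-touch prediction
        z_next_pred = self.forward_model(torch.cat([z, action], dim=-1))
        touch_err = F.mse_loss(touch_pred, touch, reduction="none").mean(-1)
        fwd_err = F.mse_loss(z_next_pred, z_next, reduction="none").mean(-1)
        intrinsic_reward = self.beta * touch_err + (1.0 - self.beta) * fwd_err
        model_loss = touch_err.mean() + fwd_err.mean()    # trained alongside the policy
        return intrinsic_reward.detach(), model_loss
```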
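The staged curriculum can likewise be expressed as a simple schedule over episode initialization: start from a fixed canonical pose early in training, then randomize. The phase boundary and pose-noise scale below are illustrative placeholders, not published settings.

```python
# Hypothetical two-phase curriculum for episode initialization.
import numpy as np

_rng = np.random.default_rng(0)

def sample_initial_pose(step, canonical_pose, switch_step=2_000_000, noise_scale=0.3):
    """Initial joint pose for the episode starting at a given training step."""
    if step < switch_step:                       # easy phase: fixed initial pose
        return canonical_pose.copy()
    perturbation = _rng.normal(0.0, noise_scale, size=canonical_pose.shape)
    return canonical_pose + perturbation         # hard phase: randomized pose
```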
3. Intrinsic Reward Formulation
Intrinsic reward signal design is central:
- Body self-exploration (Baby Sophia) (Zarifis et al., 12 Nov 2025):
- Touch novelty: The first contact with a body region (within the agent's lifetime or the current episode) earns a scaled reward; repeated contacts earn progressively decayed rewards, with the decay rate controlled by within-episode contact frequency.
- Contact novelty: Higher rewards for first contacts between a particular hand and body region.
- Diversity milestones: One-time elevations when the agent surpasses predefined counts of unique regions touched in an episode.
- Balance: End-of-episode bonus for nearly equal coverage between hands.
- Object-centric curiosity (tSLAM) (Lu et al., 2022):
- Discovery: Increment for each previously unvisited object voxel touched.
- Coverage: Increment for every new workspace voxel entered, regardless of object occupancy.
- The total intrinsic reward at time $t$ is the sum of these terms, $r_t = r_t^{\text{disc}} + r_t^{\text{cov}}$; a voxel-grid sketch of this reward appears after this list.
- Cross-modal surprise (MiniTouch tasks) (Rajeswar et al., 2021):
- Intrinsic reward $r_t = \beta\,\lVert \hat{\tau}_t - \tau_t \rVert^2 + (1-\beta)\,\lVert \hat{z}_{t+1} - z_{t+1} \rVert^2$, balancing the instant touch-prediction error (predicted touch $\hat{\tau}_t$ versus observed $\tau_t$) against the forward-dynamics error (predicted versus realized future latent state), with tradeoff weight $\beta \in [0, 1]$.
These formulations encourage agents to seek out both novel contacts (discovery) and comprehensive coverage (avoidance of local redundancy or stasis).
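A minimal sketch of the discovery-plus-coverage reward over voxel grids, in the spirit of tSLAM (Lu et al., 2022), is given below; the grid resolution and per-voxel increments are illustrative assumptions.

```python
# Hypothetical voxel-grid intrinsic reward: object-voxel discovery plus workspace coverage.
import numpy as np

class VoxelCuriosityReward:
    def __init__(self, grid_shape=(32, 32, 32), r_disc=1.0, r_cov=0.1):
        self.object_visited = np.zeros(grid_shape, dtype=bool)     # touched object voxels
        self.workspace_visited = np.zeros(grid_shape, dtype=bool)  # visited workspace voxels
        self.r_disc, self.r_cov = r_disc, r_cov

    def step(self, contact_voxels, fingertip_voxels):
        """Both arguments are iterables of integer (i, j, k) voxel indices."""
        reward = 0.0
        for i, j, k in contact_voxels:           # discovery: new object-surface voxels
            if not self.object_visited[i, j, k]:
                self.object_visited[i, j, k] = True
                reward += self.r_disc
        for i, j, k in fingertip_voxels:         # coverage: new workspace voxels entered
            if not self.workspace_visited[i, j, k]:
                self.workspace_visited[i, j, k] = True
                reward += self.r_cov
        return reward
```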
4. Applications: Self-touch, Object Exploration, and Shape Modeling
ToC has been applied to diverse robotic contexts:
- Self-exploration: The Baby Sophia agent uses ToC to achieve high-coverage, balanced self-touch behaviors without supervision, progressing through motor babbling to coordinated, purposeful action (Zarifis et al., 12 Nov 2025). Evaluation after 8 million steps yields unique-region coverage of 94.1% (see Section 5), with low left-right imbalance, evidencing the efficacy of multimodal curiosity-driven curricula.
- Contact-rich manipulation: ToC enables efficient learning in sparse-reward tasks, such as pushing, opening objects, and pick-and-place, by biasing exploration toward informative haptic interactions (Rajeswar et al., 2021). In the PyBullet MiniTouch benchmark, ToC outperforms vision-only and ensemble curiosity methods in both exploration success and downstream manipulation skill acquisition, with ablation studies demonstrating the necessity of both touch and forward surprise for maximal efficiency.
- Shape reconstruction: For unknown objects, tSLAM and Bayesian-optimization touch strategies integrate tactile discovery into 3D scene modeling pipelines. Occupancy grids or GPIS-based surface representations, updated with discovered surface contacts, facilitate highly data-efficient mesh reconstruction (Lu et al., 2022, Chen et al., 2023). Empirical metrics (IoU, Chamfer distance, normal consistency) confirm superior performance over random or heuristic exploration, with significantly fewer contacts required to achieve high-fidelity reconstructions.
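For the sliding-touch strategy, uncertainty-guided contact selection can be sketched with a Gaussian-process implicit surface: fit a GP to observed contact and free-space points, then choose the candidate point with the highest predictive uncertainty as the next touch target. The RBF kernel, the 0/1 surface labels, and the candidate sampling below are assumptions, not the exact GPIS/Bayesian-optimization formulation of Chen et al. (2023).

```python
# Hypothetical uncertainty-guided next-touch selection with a GP implicit surface.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def next_touch_target(contact_points, free_points, candidates):
    """Pick the candidate point where the implicit-surface model is least certain."""
    # Implicit-surface labels: 0 at observed contacts, +1 at known free-space points.
    X = np.vstack([contact_points, free_points])
    y = np.concatenate([np.zeros(len(contact_points)), np.ones(len(free_points))])
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.05), alpha=1e-4)
    gp.fit(X, y)
    _, std = gp.predict(candidates, return_std=True)
    return candidates[np.argmax(std)]            # most uncertain region: touch here next

# Example with synthetic contacts on a 5 cm sphere and random free-space samples.
rng = np.random.default_rng(0)
contacts = rng.normal(size=(10, 3))
contacts *= 0.05 / np.linalg.norm(contacts, axis=1, keepdims=True)
free_space = rng.uniform(-0.1, 0.1, size=(20, 3))
candidates = rng.uniform(-0.06, 0.06, size=(200, 3))
print(next_touch_target(contacts, free_space, candidates))
```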
5. Key Empirical Results and Comparative Analyses
Comprehensive quantitative results validate ToC approaches:
| Task | ToC coverage/success | Baseline coverage/success | Highlight metric |
|---|---|---|---|
| Body self-touch | 94.1% regions (Zarifis et al., 12 Nov 2025) | 76.5%–88.2% (ablation) | 2 region balance error |
| Pushing (MiniTouch) | 0.891 downstream success (Rajeswar et al., 2021) | 0.780 (ICM), 0.825 (SAC-only) | 30.5 mean episode steps |
| Object recon (tSLAM) | 0.4287 IoU, 0.0349 Chamfer (Lu et al., 2022) | 0.3167–0.3653 (random/heuristic) | 0.8352 normal consistency |
| Shape from sliding touch | 4.7 mm Chamfer, 4.8 contacts (Chen et al., 2023) | 7.2 mm, 6.7 contacts (random) | 30–40% fewer touches |
Ablation studies consistently show that omitting coverage or novelty severely degrades exploration quality and reconstruction fidelity. Cross-modal surprise yields higher-quality data than touch-only future prediction or vision-only curiosity. Representation learning driven by ToC generalizes well to novel object shapes and contexts.
6. Generalizations, Limitations, and Research Trajectories
ToC exhibits generality across different robot embodiments and tasks:
- Portability: The semantic grouping and novelty-driven architecture can be adapted by redefining spatial region sets for novel robot morphologies or tactile arrays (Zarifis et al., 12 Nov 2025).
- Combination with model-prediction and information gain: Count-based novelty rewards scale poorly with extremely high-dimensional or continuous contact manifolds; supplementing with prediction error or information gain terms addresses these challenges (Zarifis et al., 12 Nov 2025).
- Curriculum and self-modeling strategies: Stagewise curricula, transitioning from clean to randomized setups, drive robust generalization and are applicable to modalities beyond touch. Integrated forward models (predicting tactile outcomes of actions) promise further advances by quantifying and exploiting real-time learning progress (Zarifis et al., 12 Nov 2025).
Key limitations include the need for reliable tactile feedback and careful hyperparameter selection (e.g., the tradeoff weight $\beta$ in cross-modal surprise rewards; Rajeswar et al., 2021). Real-world deployment is sensitive to physical sensor characteristics.
A plausible implication is that future ToC work will incorporate self-supervised predictive modeling and hierarchical policies, supporting autonomous discovery and tool use in complex, dynamic environments.
7. Relationships to Broader Curiosity and Embodied AI Research
ToC extends the landscape of curiosity-driven exploration in embodied artificial agents, offering a complementary modality to vision and proprioception. By grounding information-seeking behavior in the physical substrate of touch, ToC enables agents to explore contact-rich, occluded, or visually ambiguous environments—domains where visual curiosity is insufficient. Furthermore, ToC-based pipelines, such as tSLAM and sliding-touch BO, establish connections to active learning, Bayesian optimization, and embodied scene understanding (Lu et al., 2022, Chen et al., 2023).
Results consistently demonstrate the superiority of ToC-guided exploration for efficient manipulation and object modeling, particularly in comparison to vision-only or heuristic methods. These findings underscore the importance of integrated, multimodal curiosity for the advancement of general-purpose, autonomous embodied systems.