SixthSense: Task-Agnostic Proprioception-Only Whole-Body Wrench Estimation for Humanoids

Published 2 May 2026 in cs.RO | (2605.01427v1)

Abstract: Humanoid robots are entering our physical world at scale, yet as oversized toys--good at singing and dancing, but short on force-interaction capabilities for practical tasks. Bridging this gap necessitates prioritizing reliable contact perception as a fundamental requirement. Estimating external wrenches in humanoids is complicated by floating-base dynamics and indeterminate contact locations. Existing analytical frameworks require idealistic assumptions and hard-to-obtain measurements, which are often unavailable in practice. To bridge this gap, we propose SixthSense, a task-agnostic approach that infers whole-body contact timing, location, and wrenches from proprioception and IMU data alone. To capture the multi-modal dynamics between unstructured contact inputs and the uncertain motion outputs, we employ conditional flow matching to tokenize proprioceptive histories and estimate a spatiotemporally sparse contact-event flow. SixthSense serves as a plug-and-play perception module for applications including collision detection, physical human-robot interaction, and force-feedback teleoperation. Experiments across standing, walking, and whole-body motion-tracking policies showcased unprecedented performance in diverse behaviors.

Summary

  • The paper introduces a novel task-agnostic, proprioception-only method for estimating whole-body contact wrenches in humanoid robots.
  • It leverages Conditional Flow Matching and a dual-head architecture to infer contact masks and wrench fields from joint and IMU data with high accuracy.
  • Experimental results demonstrate robust zero-shot generalization to multi-contact scenarios, outperforming traditional methods in noise resilience and timing precision.

SixthSense: Proprioception-Only Whole-Body Wrench Estimation for Humanoids

Motivation and Problem Formulation

The estimation of external wrenches is essential for contact-aware and force-interactive humanoid robotics, particularly given the complexity of floating-base dynamics, unstructured contact events, and the lack of dense whole-body sensing. Existing model-based wrench estimation methods rely on idealized assumptions or explicit contact measurements, rendering them impractical for general-purpose humanoid deployment. Analytical frameworks suffer from structural non-identifiability and error amplification when contact information is uncertain, especially in multi-contact scenarios.

SixthSense addresses this gap by recasting contact estimation as a probabilistic generative modeling problem: it infers whole-body contact event timing, location, and wrench magnitude directly from joint-level proprioception and IMU data, without explicit force or tactile sensors. This relaxes the rigid assumptions of analytical estimators and enables task-agnostic, plug-and-play integration across diverse behaviors and control policies.

Figure 1: SixthSense architecture enables robust, task-agnostic estimation of whole-body contact wrench fields via proprioception.

Conditional Flow Matching for Spatiotemporal Contact Estimation

The method discretizes the robot's surface into $N$ sub-regions and frames external wrench estimation as the joint inference, at each timestep, of a probabilistic contact mask $\mathbf{M} \in [0,1]^N$ and a per-sub-region wrench field $\mathbf{F} \in \mathbb{R}^{N \times 6}$. Contacts applied at the surface are mapped to equivalent wrenches at the sub-region center of mass, preserving both force and torque components.

Figure 2: Mapping of surface contact forces to equivalent wrenches at link centers of mass.
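
The mapping from a surface contact to an equivalent wrench at a reference point can be sketched with the standard rigid-body relation: the force is unchanged and the torque is the cross product of the lever arm with the force. This is a minimal illustration, assuming a pure point force with no contact torque; the function names and numbers below are hypothetical and not taken from the paper.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def equivalent_wrench(force, contact_point, reference_point):
    """Return (fx, fy, fz, tx, ty, tz) at reference_point that is statically
    equivalent to `force` applied at `contact_point` (same coordinate frame)."""
    # Lever arm from the reference point (e.g. a center of mass) to the contact.
    r = tuple(c - p for c, p in zip(contact_point, reference_point))
    torque = cross(r, force)
    return tuple(force) + torque


# Hypothetical example: a 10 N push along -z, 0.1 m along +x from the reference.
wrench = equivalent_wrench(force=(0.0, 0.0, -10.0),
                           contact_point=(0.1, 0.0, 0.0),
                           reference_point=(0.0, 0.0, 0.0))
print(wrench)
```

Stacking one such 6-vector per sub-region would give an array with the $N \times 6$ shape of the wrench field described above.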

SixthSense processes a windowed stream of proprioceptive observations (joint positions, velocities, and torques, plus base orientation and angular velocity) to encode temporal dependencies. To capture the inherent multi-modality and ambiguity of contact perception, it employs Conditional Flow Matching (CFM), which iteratively refines an initial noisy contact estimate toward the true distribution, conditioned on the aligned proprioceptive history. CFM suits the non-injective, distributional nature of the joint-torque-to-contact mapping in floating-base robots, where the same proprioceptive signals can be consistent with more than one contact configuration.

Figure 3: Training procedure using controller rollouts to conditionally learn the estimation flow of wrench fields over discretized surface regions.
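
As a rough illustration of what a flow-matching training example can look like, the sketch below uses the common linear interpolation path between a noise sample and a target, with the constant velocity between them as the regression target. This summary does not state which probability path or loss the authors use, so the path, the function names, and the flattened target vector are all assumptions.

```python
import random


def cfm_training_pair(x1, rng):
    """Build one regression example for a linear-path flow-matching objective.

    x1 is a flattened target (e.g. contact mask and wrench values).
    Returns the sampled time t, the interpolated point x_t, and the
    velocity target (x1 - x0) that a network would be trained to predict.
    """
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]        # noise sample
    t = rng.random()                               # time in [0, 1)
    xt = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    target_v = [b - a for a, b in zip(x0, x1)]
    return t, xt, target_v


def cfm_loss(pred_v, target_v):
    """Mean squared error between predicted and target velocities."""
    return sum((p - q) ** 2 for p, q in zip(pred_v, target_v)) / len(target_v)


rng = random.Random(0)
x1 = [1.0, 0.0, 0.0, 2.5]                          # hypothetical target vector
t, xt, target_v = cfm_training_pair(x1, rng)
print(t, xt, target_v)
```

In a conditional setting, the velocity network would additionally receive the encoded proprioceptive history as input; that part is omitted here.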

Figure 4: Information flow—proprioceptive tokenization, iterative CFM refinement, progressive mask and wrench inference.
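
The iterative refinement step can be pictured as numerically integrating a velocity field from noise at t = 0 to an estimate at t = 1. The sketch below uses fixed-step Euler integration, which is one common choice; the solver, the step count, and the toy velocity field (which simply steers toward a known target in place of a learned network) are assumptions made for illustration.

```python
import random


def euler_sample(velocity_fn, x0, condition, num_steps=10):
    """Integrate dx/dt = velocity_fn(x, t, condition) from t=0 to t=1."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity_fn(x, t, condition)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def toy_velocity(x, t, condition):
    """Stand-in for a learned network: points from x toward `condition`,
    scaled so that a linear path reaches it exactly at t=1."""
    return [(c - xi) / (1.0 - t) for xi, c in zip(x, condition)]


rng = random.Random(0)
target = [0.0, 1.0, 0.0]                  # hypothetical contact mask
x0 = [rng.gauss(0.0, 1.0) for _ in target]
estimate = euler_sample(toy_velocity, x0, target, num_steps=10)
print(estimate)
```

Here the condition is used directly as the target only to keep the example checkable; a trained model would instead condition on proprioceptive tokens.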

A shared-backbone dual-head architecture allows efficient and coherent predictions of contact probability and corresponding wrenches, facilitating generalization across tasks and controllers.
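
One way a downstream consumer might combine the two head outputs is to keep wrenches only for sub-regions whose contact probability exceeds a threshold, which yields a sparse set of active contacts. The threshold value and the function name below are hypothetical; the summary does not describe how the authors post-process the outputs.

```python
def decode_contacts(mask_probs, wrench_field, threshold=0.5):
    """Return {sub_region_index: wrench} for regions deemed in contact.

    mask_probs: N contact probabilities in [0, 1].
    wrench_field: N wrenches, each (fx, fy, fz, tx, ty, tz).
    """
    if len(mask_probs) != len(wrench_field):
        raise ValueError("mask and wrench field must cover the same regions")
    return {i: tuple(w)
            for i, (p, w) in enumerate(zip(mask_probs, wrench_field))
            if p >= threshold}


# Hypothetical outputs for N = 3 sub-regions.
mask_probs = [0.1, 0.9, 0.4]
wrench_field = [
    (0.0, 0.0, 0.0, 0.0, 0.0, 0.0),
    (0.0, 0.0, -8.0, 0.0, 0.6, 0.0),
    (0.5, 0.0, 0.0, 0.0, 0.0, 0.0),
]
contacts = decode_contacts(mask_probs, wrench_field)
print(contacts)
```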

Experimental Validation: Simulation and Real-Robot Deployment

Validation on the Unitree G1 platform demonstrates consistently high accuracy across standing, walking, and whole-body motion-tracking controllers. In simulation, SixthSense achieves tolerance-based success rates of up to 98.1% for contact timing (within ±0.1 s), mask detection rates above 84% in standing and walking, and mean wrench estimation errors comparable to low-cost force-torque sensors (force magnitude error < 2.1 N, force direction error < 29°, torque magnitude error < 0.85 N·m).

Figure 5: Standing scenario evaluated for contact estimation performance.

Figure 6: Mask detection rates and temporal precision in standing scenarios.
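
The error measures quoted above (force magnitude, force direction, and contact timing within a tolerance) can be computed along the following lines. The exact metric definitions used in the paper are not given in this summary, so these are plausible readings under stated assumptions rather than the authors' evaluation code.

```python
import math


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def force_magnitude_error(pred, true):
    """Absolute difference between force magnitudes, in newtons."""
    return abs(norm(pred) - norm(true))


def force_direction_error_deg(pred, true):
    """Angle between predicted and true force vectors, in degrees."""
    denom = norm(pred) * norm(true)
    if denom == 0.0:
        raise ValueError("direction is undefined for a zero force")
    cos_angle = sum(p * t for p, t in zip(pred, true)) / denom
    cos_angle = max(-1.0, min(1.0, cos_angle))   # guard against rounding
    return math.degrees(math.acos(cos_angle))


def timing_success(pred_onset_s, true_onset_s, tolerance_s=0.1):
    """True if the predicted contact onset is within the tolerance."""
    return abs(pred_onset_s - true_onset_s) <= tolerance_s


print(force_magnitude_error((3.0, 4.0, 0.0), (0.0, 0.0, 5.0)))
print(force_direction_error_deg((1.0, 0.0, 0.0), (0.0, 1.0, 0.0)))
print(timing_success(1.05, 1.0), timing_success(1.2, 1.0))
```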

SixthSense maintains performance under zero-shot generalization to multi-contact cases, despite being trained only on single-contact data. Because CFM represents a multi-modal posterior, it outperforms deterministic MLP baselines, which collapse under multi-contact ambiguity: the detection rate is 89.31% versus 17.24% (Top-1) for two simultaneous contacts.

Figure 7: Bilateral wrist contact scenario illustrating multi-contact inference capability.

Figure 8: Predicted mask and wrench fields for multi-contact zero-shot generalization.
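
A toy calculation helps show why a deterministic regressor can struggle when several answers are plausible. If two contact hypotheses are equally likely for the same input, the prediction that minimizes mean squared error is their average, which matches neither hypothesis. The two-region setup below is hypothetical and is not the paper's experiment.

```python
def expected_mse(pred, targets):
    """Average per-element squared error of `pred` over equally likely targets."""
    total = 0.0
    for target in targets:
        total += sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    return total / len(targets)


# Two equally plausible masks: contact on region 0 or on region 1.
modes = [[1.0, 0.0], [0.0, 1.0]]

# The average of the modes, which an MSE-trained regressor is pulled toward.
mean_pred = [sum(col) / len(modes) for col in zip(*modes)]

print(mean_pred)
print(expected_mse(mean_pred, modes))   # error of the blurred average
print(expected_mse(modes[0], modes))    # error of committing to one mode
```

The averaged prediction has the lower expected error even though it assigns an indecisive probability to both regions, whereas a generative sampler can return one sharp hypothesis per sample.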

Performance remains robust under increased sensor noise, ill-conditioned dynamics, and rapid motion: SixthSense significantly outperforms the GMO and CPF baselines, maintaining higher accuracy and lower false alarm rates in multi-contact and noisy regimes.

Figure 9: Quantified localization accuracy across noise levels for mask prediction.

Figure 10: Non-identifiability illustrated; ambiguous solutions in multi-contact scene.
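
The simplest form of this ambiguity already appears at a single joint: a joint torque measurement alone cannot separate where a force acts from how large it is. The sketch below, a deliberately reduced one-joint planar example with hypothetical numbers, lists several force and lever-arm pairs that all explain the same measured torque.

```python
def consistent_forces(measured_torque_nm, lever_arms_m):
    """For each candidate lever arm, return the perpendicular force (N)
    that would produce the measured joint torque."""
    return {arm: measured_torque_nm / arm for arm in lever_arms_m}


# A single torque reading of 1.0 N·m at one revolute joint.
candidates = consistent_forces(1.0, [0.1, 0.2, 0.4])
for arm, force in candidates.items():
    print(f"{force:.1f} N at {arm:.1f} m -> {arm * force:.1f} N·m")
```

Every candidate reproduces the measurement exactly, so additional information (more joints, motion history, or a learned prior) is needed to choose among them.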

Ablation studies support the hypothesis that controller robustness directly improves the observability and performance of contact inference from proprioceptive data. Controllers trained for disturbance resilience yield more informative signals, reducing both false alarms and wrench estimation error.

Sim-to-Real Transfer and Practical Deployment

For real-world operation, sim-to-real transfer is accomplished via extensive domain randomization and noise augmentation during both data collection and training. SixthSense enables real-time estimation of spatiotemporally sparse wrench fields with a large (~100M-parameter) model and efficient inference (0.5 s per forward pass).

Figure 11: Real-robot contact data collection setup with instrumented force sensors.
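
Domain randomization and noise augmentation are commonly implemented by perturbing simulator parameters per episode and adding random noise to each observation. The sketch below shows that general pattern only; the parameter names, ranges, and noise level are hypothetical, since the summary does not list what the authors randomize.

```python
import random


def randomize_parameters(nominal, relative_range, rng):
    """Scale each nominal parameter by a factor drawn uniformly from
    [1 - relative_range, 1 + relative_range]."""
    return {name: value * rng.uniform(1.0 - relative_range, 1.0 + relative_range)
            for name, value in nominal.items()}


def add_sensor_noise(observation, noise_std, rng):
    """Add zero-mean Gaussian noise to every element of an observation."""
    return [x + rng.gauss(0.0, noise_std) for x in observation]


rng = random.Random(0)
nominal = {"link_mass_kg": 2.0, "joint_friction": 0.05}   # hypothetical values
randomized = randomize_parameters(nominal, 0.2, rng)
noisy = add_sensor_noise([0.1, -0.3, 0.0], 0.01, rng)
print(randomized)
print(noisy)
```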

Figure 12: Visualization of predicted contact mask chunk on real robot deployment.

Contact estimation success on hardware validates the practicality and generality of the approach. Applied forces manifest as localized mask activations and wrench magnitudes; neighboring links show correlated responses reflecting mechanical connectivity.

Application: Physical Human–Robot Interaction

Whole-body contact awareness enables new modalities of safe and intuitive physical human–robot interaction (pHRI). SixthSense captures contact events anywhere on the torso or limbs, allowing robots to interpret interactions as commands, plan environment-aware responses, and improve safety in collaborative scenarios.

Figure 13: Use cases in pHRI—contact as command, feedback for planning, enhanced safety.

Limitations and Theoretical Implications

SixthSense pioneers the task-agnostic proprioception-driven estimation of whole-body external wrenches, but faces several practical and theoretical challenges:

  • The scarcity of dense real-world contact wrench ground-truth labels limits absolute benchmarking.
  • The exclusive reliance on proprioception restricts information flow; multimodal sensor fusion (vision, language, tactile) could enhance performance.
  • Current discretization granularity is sufficient for motion and simple manipulation, but finer spatial resolutions are required for advanced dexterity and nuanced pHRI.

Theoretically, this approach signals a shift toward distributional, generative perception pipelines in embodied intelligence. It exposes the latent structure of contact observability embedded in closed-loop controller dynamics, and hints at scalable, data-driven alternatives to brittle analytical estimators.

Conclusion

SixthSense formulates external wrench estimation for humanoids as a spatiotemporal probabilistic inference problem, utilizing windowed proprioceptive signals and conditional flow matching to predict whole-body contact masks and wrench fields. It achieves task-agnostic, zero-shot generalization across controllers, robust accuracy in simulation and real hardware, and enables advanced pHRI applications. This work opens avenues for further research in richer multimodal integration, dense real-world contact data acquisition, and finer spatial resolution in wrench estimation.
