- The paper introduces a novel task-agnostic, proprioception-only method for estimating whole-body contact wrenches in humanoid robots.
- It leverages Conditional Flow Matching and a dual-head architecture to infer contact masks and wrench fields from joint and IMU data with high accuracy.
- Experimental results demonstrate robust zero-shot generalization to multi-contact scenarios, outperforming traditional methods in noise resilience and timing precision.
SixthSense: Proprioception-Only Whole-Body Wrench Estimation for Humanoids
The estimation of external wrenches is essential for contact-aware and force-interactive humanoid robotics, particularly given the complexity of floating-base dynamics, unstructured contact events, and the lack of dense whole-body sensing. Existing model-based wrench estimation methods rely on idealized assumptions or explicit contact measurements, rendering them impractical for general-purpose humanoid deployment. Analytical frameworks suffer from structural non-identifiability and error amplification when contact information is uncertain, especially in multi-contact scenarios.
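The structural non-identifiability mentioned above already shows up in a toy case. The sketch below is an illustration, not code from the paper. It uses a fixed-base planar two-link arm, which is a deliberate simplification because a humanoid is floating-base and has many more joints. Joint torques satisfy $\tau = \sum_i J_i^\top f_i$, so whenever the stacked matrix $[J_1^\top \; J_2^\top]$ has a nontrivial null space, different contact force sets produce identical torques. The helper `planar_point_jacobian`, the link lengths, and the joint angles are all hypothetical.

```python
import numpy as np


def planar_point_jacobian(q, lengths, link, s):
    """2 x n translational Jacobian of a point at fraction s along `link` of a planar serial arm."""
    n = len(q)
    angles = np.cumsum(q)  # absolute link angles
    joints = np.zeros((n + 1, 2))  # joint positions, base at the origin
    for i in range(n):
        joints[i + 1] = joints[i] + lengths[i] * np.array([np.cos(angles[i]), np.sin(angles[i])])
    direction = np.array([np.cos(angles[link]), np.sin(angles[link])])
    point = joints[link] + s * lengths[link] * direction
    J = np.zeros((2, n))
    for j in range(link + 1):  # only joints proximal to the point can move it
        r = point - joints[j]
        J[:, j] = [-r[1], r[0]]  # z-axis cross r for a revolute joint
    return J


q = np.array([0.4, 0.9])          # hypothetical joint angles [rad]
lengths = np.array([0.30, 0.25])  # hypothetical link lengths [m]
J_mid = planar_point_jacobian(q, lengths, link=1, s=0.5)  # middle of the forearm
J_tip = planar_point_jacobian(q, lengths, link=1, s=1.0)  # forearm tip

# Stacked map from two 2-D contact forces to joint torques: tau = A @ [f_mid, f_tip].
A = np.hstack([J_mid.T, J_tip.T])  # shape (2, 4), so its null space is at least 2-D
f_true = np.array([1.0, -2.0, 0.0, 0.0])  # contact only at mid-forearm [N]
tau_true = A @ f_true

# Adding any null-space direction leaves the joint torques unchanged.
_, _, Vt = np.linalg.svd(A)
f_alt = f_true + 3.0 * Vt[2]  # rows 2 and 3 of Vt span null(A) when rank(A) == 2
tau_alt = A @ f_alt

# A single contact at the tip also explains the same torques exactly.
f_tip_only = np.linalg.solve(J_tip.T, tau_true)

print("tau from mid-forearm contact:", tau_true)
print("tau from perturbed force set:", tau_alt)
print("tip-only force that matches: ", f_tip_only)
```

From the torques alone, the three contact hypotheses cannot be told apart. This is the kind of ambiguity that a distributional estimator can represent explicitly instead of resolving arbitrarily.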
SixthSense addresses this gap by recasting contact estimation as a probabilistic generative modeling problem: it infers whole-body contact event timing, location, and wrench magnitude directly from joint-level proprioception and IMU data alone, without explicit force or tactile sensors. This approach relaxes rigid assumptions and enables task-agnostic plug-and-play integration across diverse behaviors and control policies.
Figure 1: SixthSense architecture enables robust, task-agnostic estimation of whole-body contact wrench fields via proprioception.
The method discretizes the robot’s surface into $N$ sub-regions and frames external wrench estimation as joint inference, at each timestep, of a probabilistic contact mask $M \in [0,1]^N$ and a per-subregion wrench field $F \in \mathbb{R}^{N \times 6}$. A contact applied at the surface is mapped to an equivalent wrench at the sub-region center of mass, preserving both force and torque components.
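The mapping from a surface contact to an equivalent wrench at a sub-region's center of mass follows standard rigid-body statics: the force is unchanged and the torque gains the moment $(p - c) \times f$. The sketch below applies this and fills one row of $M$ and $F$. The function name `equivalent_wrench`, the row ordering `[fx, fy, fz, tx, ty, tz]`, the world-frame convention, and the value of `N` are assumptions for illustration, not details taken from the paper.

```python
import numpy as np


def equivalent_wrench(contact_point, force, com, contact_torque=None):
    """Return [f; tau] at `com` for a force applied at `contact_point` (all in one frame).

    The force is unchanged; the torque gains the moment (p - c) x f.
    """
    r = np.asarray(contact_point, dtype=float) - np.asarray(com, dtype=float)
    f = np.asarray(force, dtype=float)
    tau = np.cross(r, f)
    if contact_torque is not None:  # optional pure torque applied at the contact
        tau = tau + np.asarray(contact_torque, dtype=float)
    return np.concatenate([f, tau])


N = 4                 # hypothetical number of surface sub-regions
M = np.zeros(N)       # contact mask, values in [0, 1]
F = np.zeros((N, 6))  # wrench field, rows ordered [fx, fy, fz, tx, ty, tz] (assumed)

# A 10 N downward push applied 0.1 m from the center of mass of sub-region 2.
region = 2
com_region = np.array([0.0, 0.0, 0.0])
w = equivalent_wrench([0.1, 0.0, 0.0], [0.0, 0.0, -10.0], com_region)
M[region] = 1.0
F[region] = w
print(F[region])
```

The push produces a 1 N·m moment about the y-axis at the center of mass, which is why the wrench field carries six components per sub-region rather than a force alone.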
Figure 2: Mapping of surface contact forces to equivalent wrenches at link centers of mass.
SixthSense processes a windowed proprioceptive observation stream (joint positions, velocities, and torques, plus base orientation and angular velocity) to encode temporal dependencies. To capture the inherent multi-modality and ambiguity of contact perception, it employs Conditional Flow Matching (CFM), which learns a velocity field that transports samples from a simple noise distribution toward the distribution of plausible contact states. At inference, an initial noisy contact estimate is refined iteratively, conditioned on the aligned proprioceptive history. CFM suits this problem because, in a floating-base robot, the forward map from contacts to joint torques is non-injective: one torque measurement can be explained by many contact configurations, so the inverse mapping is better described by a distribution than by a single value.
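The CFM mechanics described above can be sketched in a few lines. This is a minimal sketch, assuming the common straight-line conditional path $x_t = (1-t)\,x_0 + t\,x_1$ with target velocity $u_t = x_1 - x_0$ and explicit Euler integration at sampling time. The paper's exact path, network, conditioning, and step count are not stated here. The function names and `oracle_velocity` are hypothetical: `oracle_velocity` is the exact conditional field toward a known target and stands in for a trained network, so the example can be checked.

```python
import numpy as np


def cfm_training_pair(x1, rng):
    """Draw (x_t, t, u_t) on the straight path from a noise sample x0 to a data sample x1."""
    x0 = rng.standard_normal(x1.shape)
    t = rng.uniform()
    xt = (1.0 - t) * x0 + t * x1
    u = x1 - x0  # target velocity of the linear path
    return xt, t, u


def cfm_loss(velocity_fn, cond, x1, rng, n_samples=64):
    """Monte Carlo estimate of E || v(x_t, t, cond) - u_t ||^2 (the regression objective)."""
    errs = []
    for _ in range(n_samples):
        xt, t, u = cfm_training_pair(x1, rng)
        errs.append(np.mean((velocity_fn(xt, t, cond) - u) ** 2))
    return float(np.mean(errs))


def cfm_sample(velocity_fn, cond, shape, rng, n_steps=10):
    """Integrate dx/dt = v(x, t, cond) from t=0 (noise) to t=1 with explicit Euler steps."""
    x = rng.standard_normal(shape)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        x = x + dt * velocity_fn(x, k * dt, cond)
    return x


def oracle_velocity(x, t, target):
    # Exact conditional field toward `target`; a stand-in for a trained, conditioned network.
    return (target - x) / (1.0 - t)


rng = np.random.default_rng(0)
N = 4
x1 = np.concatenate([rng.uniform(size=N), rng.normal(size=N * 6)])  # hypothetical flattened [M, F]
print(cfm_loss(oracle_velocity, x1, x1, rng))  # close to 0: the oracle matches u_t
print(np.allclose(cfm_sample(oracle_velocity, x1, x1.shape, rng), x1))  # True
```

In a real model, `cond` would be the encoded proprioceptive window and the velocity field would be learned over many contact samples, so repeated calls to `cfm_sample` with fresh noise could land on different plausible contact states.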
Figure 3: Training procedure using controller rollouts to conditionally learn the estimation flow of wrench fields over discretized surface regions.
Figure 4: Information flow—proprioceptive tokenization, iterative CFM refinement, progressive mask and wrench inference.
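One plausible way to turn the windowed proprioceptive stream into tokens is to concatenate each timestep's signals into one vector, take the last `window` steps, and project each step linearly to the model width. The sketch below does exactly that. The one-token-per-timestep scheme, the linear projection, the padding rule, the `(w, x, y, z)` quaternion layout, and all sizes are assumptions for illustration, not the paper's tokenizer.

```python
import numpy as np


def stack_observation(q, dq, tau, base_quat, base_ang_vel):
    """One per-timestep observation vector (one token before projection)."""
    return np.concatenate([q, dq, tau, base_quat, base_ang_vel])


def make_window(history, t, window):
    """Tokens for steps t-window+1..t, repeating the first frame when t < window - 1."""
    idx = np.clip(np.arange(t - window + 1, t + 1), 0, None)
    return history[idx]


def project_tokens(tokens, W, b):
    """Linear embedding of each token into the model width."""
    return tokens @ W + b


rng = np.random.default_rng(0)
n_joints, T, window, d_model = 4, 50, 8, 16  # hypothetical sizes
frames = [
    stack_observation(
        rng.normal(size=n_joints), rng.normal(size=n_joints), rng.normal(size=n_joints),
        np.array([1.0, 0.0, 0.0, 0.0]), rng.normal(size=3),
    )
    for _ in range(T)
]
history = np.stack(frames)  # (T, 3 * n_joints + 7)
tokens = make_window(history, t=T - 1, window=window)
W = rng.normal(size=(history.shape[1], d_model)) * 0.1
emb = project_tokens(tokens, W, np.zeros(d_model))
print(history.shape, tokens.shape, emb.shape)  # (50, 19) (8, 19) (8, 16)
```

Keeping the window aligned to the current step means the contact estimate is conditioned on exactly the recent history the controller has also seen.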
A shared-backbone dual-head architecture allows efficient and coherent predictions of contact probability and corresponding wrenches, facilitating generalization across tasks and controllers.
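A shared-backbone, dual-head layout can be sketched as one trunk whose features feed two small heads: a sigmoid head producing the mask in $[0,1]^N$ and a linear head producing the $N \times 6$ wrench field. The class `DualHeadSketch`, its layer sizes, the tanh trunk, and the 0.5 reporting threshold are hypothetical. The weights are random and untrained, and the sketch shows only the trunk/head split, not how the heads are embedded in the flow model.

```python
import numpy as np


class DualHeadSketch:
    """Shared trunk feeding a contact-mask head and a wrench-field head (random, untrained)."""

    def __init__(self, in_dim, hidden, n_regions, rng):
        self.n_regions = n_regions
        s = 0.1
        self.W_trunk = rng.normal(size=(in_dim, hidden)) * s
        self.W_mask = rng.normal(size=(hidden, n_regions)) * s
        self.W_wrench = rng.normal(size=(hidden, n_regions * 6)) * s

    def __call__(self, features):
        h = np.tanh(features @ self.W_trunk)              # shared backbone features
        mask = 1.0 / (1.0 + np.exp(-(h @ self.W_mask)))  # sigmoid -> probabilities in (0, 1)
        wrench = (h @ self.W_wrench).reshape(features.shape[0], self.n_regions, 6)
        return mask, wrench


rng = np.random.default_rng(1)
model = DualHeadSketch(in_dim=16, hidden=32, n_regions=4, rng=rng)
features = rng.normal(size=(2, 16))  # e.g. pooled token embeddings for a batch of 2 windows
mask, wrench = model(features)
active = mask > 0.5  # report wrenches only where contact is likely (assumed threshold)
print(mask.shape, wrench.shape)  # (2, 4) (2, 4, 6)
```

Because both heads read the same features, the predicted contact locations and their wrenches are derived from one representation, which is one way to keep the two outputs consistent.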
Experimental Validation: Simulation and Real-Robot Deployment
Validation on the Unitree G1 platform demonstrates consistently high accuracy across standing, walking, and whole-body motion-tracking controllers. In simulation, SixthSense achieves strict and tolerant contact-timing success rates of up to 98.1% (within ±0.1 s), mask detection rates above 84% in standing and walking, and mean wrench estimation errors comparable to those of low-cost force-torque sensors (force magnitude error < 2.1 N, force direction error < 29°, torque magnitude error < 0.85 N·m).
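The metrics named above can be computed roughly as follows. This is a sketch under assumed definitions: timing success as an absolute onset error within a tolerance (0.1 s here), force and torque magnitude errors as absolute differences of vector norms, and force direction error as the angle between predicted and true force vectors. The paper's exact definitions may differ, and the function names are hypothetical.

```python
import numpy as np


def timing_success(t_pred, t_true, tol=0.1):
    """True where the predicted contact onset is within `tol` seconds of the true onset."""
    return np.abs(np.asarray(t_pred) - np.asarray(t_true)) <= tol


def wrench_errors(w_pred, w_true, eps=1e-12):
    """Force magnitude error [N], force direction error [deg], torque magnitude error [N·m]."""
    f_p, tau_p = w_pred[..., :3], w_pred[..., 3:]
    f_t, tau_t = w_true[..., :3], w_true[..., 3:]
    n_p, n_t = np.linalg.norm(f_p, axis=-1), np.linalg.norm(f_t, axis=-1)
    force_mag_err = np.abs(n_p - n_t)
    cos = np.sum(f_p * f_t, axis=-1) / (n_p * n_t + eps)
    force_dir_err_deg = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
    torque_mag_err = np.abs(np.linalg.norm(tau_p, axis=-1) - np.linalg.norm(tau_t, axis=-1))
    return force_mag_err, force_dir_err_deg, torque_mag_err


w_true = np.array([[0.0, 0.0, -10.0, 0.0, 1.0, 0.0]])
w_pred = np.array([[0.0, 0.0, -8.0, 0.0, 1.5, 0.0]])
fm, fd, tm = wrench_errors(w_pred, w_true)
print(fm, fd, tm)  # about 2.0 N, ~0 deg, 0.5 N·m
print(timing_success([1.05, 2.3], [1.0, 2.0]))  # [ True False]
```

Reporting magnitude and direction separately distinguishes an estimator that gets the push direction right but the strength wrong from one that misplaces the force entirely.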


Figure 5: Standing scenario evaluated for contact estimation performance.

Figure 6: Mask detection rates and temporal precision in standing scenarios.
SixthSense maintains its performance in zero-shot generalization to multi-contact cases, despite being trained only on single-contact data. Because CFM represents a multi-modal posterior distribution rather than a single point estimate, it outperforms deterministic MLP baselines, which collapse under multi-contact ambiguity: for two simultaneous contacts, the detection rate is 89.31% versus 17.24% (Top-1).
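Why a deterministic regressor collapses under ambiguity can be shown with a one-dimensional toy problem, which is an illustration and not the paper's experiment. Suppose the same observation is equally consistent with a contact at -1 (left) or +1 (right). The prediction that minimizes squared error is the mean of the two, which lies between the modes and matches neither. Resampling from the target distribution, used here as a stand-in for a generative model such as CFM, always lands on a valid mode.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy 1-D "contact location": for one fixed observation, the contact is at -1 or +1
# with equal probability (a hypothetical stand-in for multi-contact ambiguity).
targets = rng.choice([-1.0, 1.0], size=10_000)

# Best constant prediction under squared error, found by grid search.
candidates = np.linspace(-1.5, 1.5, 301)
mse = np.array([np.mean((targets - c) ** 2) for c in candidates])
best = candidates[np.argmin(mse)]

# Drawing from the target distribution (stand-in for a generative sampler) hits a mode every time.
samples = rng.choice(targets, size=5)

print(f"MSE-optimal prediction: {best:.2f}")
print("samples from the target distribution:", samples)
```

Since the mean-squared-error optimum is the conditional mean, any network trained only with that loss inherits the same averaging behavior when several contact configurations explain one observation.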

Figure 7: Bilateral wrist contact scenario illustrating multi-contact inference capability.
Figure 8: Predicted mask and wrench fields for multi-contact zero-shot generalization.
SixthSense's performance remains robust under increased sensor noise, ill-conditioned dynamics, and rapid motion. It significantly outperforms the GMO (generalized momentum observer) and CPF (contact particle filter) baselines, maintaining higher accuracy and lower false-alarm rates in multi-contact and noisy regimes.

Figure 9: Quantified localization accuracy across noise levels for mask prediction.

Figure 10: Non-identifiability illustrated: ambiguous solutions in a multi-contact scene.
Ablation studies confirm the hypothesis that controller robustness directly improves the observability of contacts in proprioceptive data, and with it the performance of contact inference. Controllers trained for disturbance resilience yield more informative signals, reducing both false alarms and wrench estimation error.
Sim-to-Real Transfer and Practical Deployment
For real-world operation, sim-to-real transfer is achieved through extensive domain randomization and noise augmentation during both data collection and training. SixthSense enables real-time estimation of spatiotemporally sparse wrench fields with a large (∼100M-parameter) model and efficient inference (0.5 s per forward pass).
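Domain randomization and noise augmentation of the kind described above are often implemented by drawing simulator parameters once per rollout and corrupting each proprioceptive vector with a per-episode bias plus fresh per-step Gaussian noise. The sketch below follows that pattern. The parameter names, ranges, noise levels, and observation size in `RANDOMIZATION_RANGES` and the surrounding code are hypothetical, not the paper's settings.

```python
import numpy as np

# Hypothetical randomization ranges; the paper's actual parameters and ranges are not given here.
RANDOMIZATION_RANGES = {
    "ground_friction": (0.4, 1.2),
    "link_mass_scale": (0.9, 1.1),
    "motor_strength_scale": (0.85, 1.15),
}


def sample_episode_params(rng):
    """Draw one set of simulator parameters, held fixed for a whole rollout."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in RANDOMIZATION_RANGES.items()}


def augment_observation(obs, rng, noise_std, bias):
    """Add a per-episode bias and fresh per-step Gaussian noise to a proprioceptive vector."""
    return obs + bias + rng.normal(0.0, noise_std, size=obs.shape)


rng = np.random.default_rng(0)
params = sample_episode_params(rng)
obs_dim = 19
noise_std = np.full(obs_dim, 0.01)           # hypothetical per-dimension noise levels
bias = rng.normal(0.0, 0.005, size=obs_dim)  # e.g. an encoder offset, fixed for the episode
noisy = augment_observation(np.zeros(obs_dim), rng, noise_std, bias)
print(params)
print(noisy.shape)
```

Separating a constant bias from per-step noise lets the estimator see both slowly drifting offsets and high-frequency jitter, two error types real joint sensors exhibit.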
Figure 11: Real-robot contact data collection setup with instrumented force sensors.

Figure 12: Visualization of predicted contact mask chunk on real robot deployment.
Contact estimation success on hardware validates the practicality and generality of the approach. Applied forces manifest as localized mask activations and wrench magnitudes; neighboring links show correlated responses reflecting mechanical connectivity.
Application: Physical Human–Robot Interaction
Whole-body contact awareness fundamentally enables new modalities of safe and intuitive physical human–robot interaction (pHRI). SixthSense captures contact events anywhere on the torso or limbs, allowing robots to interpret interactions as commands, plan environment-aware responses, and improve safety in collaborative scenarios.
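A downstream module could use the mask and wrench outputs to treat contacts as commands. The sketch below is a hypothetical consumer, not part of SixthSense. It keeps only sub-regions whose contact probability and force magnitude both exceed thresholds, picks the most probable one, and looks up an associated command. The region names, the command table, and both thresholds are invented for illustration.

```python
import numpy as np

REGION_NAMES = ["left_wrist", "right_wrist", "torso_back", "left_shoulder"]  # hypothetical
COMMANDS = {"torso_back": "walk_forward", "left_wrist": "turn_left", "right_wrist": "turn_right"}


def contact_to_command(mask, wrench, p_thresh=0.5, f_thresh=5.0):
    """Return the command bound to the most probable sufficiently strong contact, else None."""
    force_mag = np.linalg.norm(wrench[:, :3], axis=1)
    active = (mask >= p_thresh) & (force_mag >= f_thresh)
    if not active.any():
        return None
    scores = np.where(active, mask, -np.inf)  # ignore inactive regions when ranking
    region = REGION_NAMES[int(np.argmax(scores))]
    return COMMANDS.get(region)  # None if the region has no bound command


mask = np.array([0.1, 0.2, 0.9, 0.3])
wrench = np.zeros((4, 6))
wrench[2, :3] = [0.0, 0.0, -12.0]  # a firm push on the back
print(contact_to_command(mask, wrench))  # walk_forward
```

Requiring both a probability and a force threshold is one simple way to avoid turning light incidental touches into commands.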
Figure 13: Use cases in pHRI—contact as command, feedback for planning, enhanced safety.
Limitations and Theoretical Implications
SixthSense pioneers the task-agnostic proprioception-driven estimation of whole-body external wrenches, but faces several practical and theoretical challenges:
- The scarcity of dense real-world contact wrench ground-truth labels limits absolute benchmarking.
- The exclusive reliance on proprioception restricts information flow; multimodal sensor fusion (vision, language, tactile) could enhance performance.
- Current discretization granularity is sufficient for motion and simple manipulation, but finer spatial resolutions are required for advanced dexterity and nuanced pHRI.
Theoretically, this approach signals a shift toward distributional, generative perception pipelines in embodied intelligence. It exposes the latent structure of contact observability embedded in closed-loop controller dynamics, and hints at scalable, data-driven alternatives to brittle analytical estimators.
Conclusion
SixthSense formulates external wrench estimation for humanoids as a spatiotemporal probabilistic inference problem, utilizing windowed proprioceptive signals and conditional flow matching to predict whole-body contact masks and wrench fields. It achieves task-agnostic, zero-shot generalization across controllers, robust accuracy in simulation and real hardware, and enables advanced pHRI applications. This work opens avenues for further research in richer multimodal integration, dense real-world contact data acquisition, and finer spatial resolution in wrench estimation.