
Object-Aware Whole-Body Teleoperation

Updated 16 August 2025
  • The paper’s main contribution is integrating high-frequency bilateral control, haptic feedback, and online multi-stage inertial parameter estimation for dynamic loco-manipulation.
  • The framework employs vision-based object sizing, VLM-informed priors, and a decoupled hierarchical cross-entropy method to estimate mass, CoM, and inertia under physical constraints.
  • Experimental validation on the SATYRR platform demonstrates enhanced joint tracking, stability, and responsive haptic feedback, ensuring robust manipulation of unknown payloads.

An object-aware whole-body bilateral teleoperation framework lets a human operator intuitively and safely command a mobile or fixed-base robot (typically a wheeled humanoid) in manipulation and locomotion tasks involving unknown, dynamic objects. It does so by fusing high-frequency bilateral control, integrated haptic feedback, and real-time estimation of object inertial parameters. The core innovation is the combination of conventional teleoperator–robot mapping with an online multi-stage object parameter estimation pipeline, which updates the system's internal dynamic models as new objects are grasped, lifted, or manipulated, improving dynamic synchronization, transparency, and manipulation accuracy in complex loco-manipulation scenarios.

1. System Architecture and Teleoperation Overview

The framework unifies whole-body bilateral teleoperation and parallel real-time object parameter identification for wheeled humanoids, such as the SATYRR platform as described in (Baek et al., 13 Aug 2025). At the teleoperator side, a human–machine interface (HMI) acquires the operator's body configuration and motion intent, relaying these as commands to the robot through high-frequency control channels. Critically, the operator receives haptic feedback proportional to dynamic state discrepancies and external disturbances, closing the bilateral control loop for both motion and force.

At the robot side, when the system detects environmental interaction (specifically, contact with an unknown payload), the framework invokes a multi-stage inertial parameter estimation module. The outputs of this estimator, including the updated object mass, center of mass (CoM), and inertia tensor, are incorporated online into the robot's balance and control pipeline, modifying its dynamic equilibrium and force-rendering policies. This object-aware adaptation lets the teleoperator focus on high-level control while the system compensates for the physical impact of the manipulated load.
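As a rough illustration of this data flow, the sketch below uses hypothetical names (`InertialParams`, `estimator.stream_estimates`, `controller.set_payload`, `robot.haptics`) that are not from the paper; it only shows how refreshed estimates might be routed to the balance controller and the haptic channel while teleoperation continues.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class InertialParams:
    """Estimated payload parameters passed from the estimator to the controller."""
    mass: float            # kg
    com: np.ndarray        # 3-vector CoM in the object (or hand) frame, m
    inertia: np.ndarray    # 3x3 inertia tensor about the CoM, kg*m^2

def on_payload_contact(robot, estimator, controller):
    """Hypothetical handler invoked when contact with an unknown payload is
    detected: each refined estimate updates the balance controller and the
    haptic rendering without pausing bilateral teleoperation."""
    for params in estimator.stream_estimates(robot.logged_trajectories()):
        controller.set_payload(params)       # new equilibrium + feedforward terms
        robot.haptics.set_payload(params)    # force feedback reflects the true load
```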

2. Multi-Stage Inertial Parameter Estimation

The multi-stage object parameter estimation module is designed to deliver physically feasible, real-time updates of all critical object dynamic parameters during manipulation. The estimation proceeds through:

  1. Vision-Based Object Size Estimation: Using CenterSnap on a single RGB-D image, the system extracts the object's dimensions (length, width, height) as initial constraints.
  2. VLM-Informed Prior Generation: A vision-language model (VLM) processes both the visual appearance and textual context to generate priors for the object's mass and density and an initial CoM guess, using formulas such as:

$$
m = \rho V, \qquad h = \frac{1}{V}\int_V r\, dV, \qquad I = \int_V \rho\left(\|r - h\|^2 I_3 - (r - h)(r - h)^T\right) dV
$$

where $V = abc$ is the object volume, $\rho$ the estimated density, and $h$ the CoM; a code sketch of these closed forms, together with the DH-CEM step, follows the list.

  3. Decoupled Hierarchical Cross-Entropy Method (DH-CEM): Starting from VLM priors, the method samples mass and CoM within physical constraints (e.g., CoM within object volume) and then deterministically computes the inertia tensor using the sampled mass and object geometry. Simulation-based evaluation compares each parameter hypothesis to measured joint position and velocity trajectories during manipulation, employing a cost defined as:

$$
J(\theta) = \sum_{t=1}^{T}\left(\|q_i(t) - q^*(t)\|^2 + \|\dot{q}_i(t) - \dot{q}^*(t)\|^2\right)
$$

where $q_i$ and $\dot{q}_i$ are the joint positions and velocities simulated under the $i$-th parameter hypothesis, and $q^*$ and $\dot{q}^*$ are the measured trajectories.

Only “elite” samples update the parameter distribution, with a multi-hypothesis scheme amplifying robustness against erroneous priors. This entire loop completes in 0.5–1.0 s for real-time updates.
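The following sketch illustrates steps 2–3 under a cuboid assumption. All function names, hyperparameters (sample counts, iteration budget), and the abstract `cost_fn` (which would wrap a simulation rollout returning $J(\theta)$) are illustrative assumptions rather than the paper's implementation; the multi-hypothesis initialization is omitted for brevity.

```python
import numpy as np

def cuboid_prior_params(a, b, c, rho):
    """Closed-form prior for a homogeneous cuboid of dimensions a x b x c (m)
    and assumed density rho (kg/m^3): mass, CoM (at the geometric center),
    and inertia tensor about the CoM."""
    V = a * b * c
    m = rho * V
    h = np.zeros(3)
    I = (m / 12.0) * np.diag([b**2 + c**2, a**2 + c**2, a**2 + b**2])
    return m, h, I

def dh_cem(prior_mean, prior_std, dims, cost_fn,
           n_samples=64, n_elite=8, n_iters=5, rng=None):
    """Decoupled hierarchical CEM sketch: sample mass and CoM within physical
    bounds, derive the inertia tensor deterministically from the geometry,
    score each hypothesis with the simulation-based cost J(theta), and refit
    the sampling distribution to the elite set.

    prior_mean, prior_std : length-4 arrays for [mass, com_x, com_y, com_z]
    dims                  : (a, b, c) cuboid dimensions, used as constraints
    cost_fn               : callable(mass, com, inertia) -> tracking cost
    """
    rng = np.random.default_rng() if rng is None else rng
    mean = np.asarray(prior_mean, dtype=float)
    std = np.asarray(prior_std, dtype=float)
    a, b, c = dims
    lo = np.array([1e-3, -a / 2, -b / 2, -c / 2])   # CoM constrained to the box
    hi = np.array([np.inf,  a / 2,  b / 2,  c / 2])

    for _ in range(n_iters):
        samples = np.clip(rng.normal(mean, std, size=(n_samples, 4)), lo, hi)
        costs = []
        for mass, *com in samples:
            com = np.array(com)
            # Inertia about the CoM from the cuboid closed form, then shifted
            # to the object-frame origin via the parallel axis theorem.
            I_c = (mass / 12.0) * np.diag([b**2 + c**2, a**2 + c**2, a**2 + b**2])
            I = I_c + mass * (com @ com * np.eye(3) - np.outer(com, com))
            costs.append(cost_fn(mass, com, I))
        elite = samples[np.argsort(costs)[:n_elite]]             # keep best hypotheses
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean  # refined [mass, com_x, com_y, com_z]
```

Decoupling the sampled variables (mass, CoM) from the derived inertia keeps every candidate tensor consistent with a solid object of the observed size, which is the physical-feasibility property emphasized in Section 3.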

The pipeline is designed to run in parallel with the physical robot's control loop, so parameter estimation proceeds continuously without interrupting bilateral control.
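A minimal concurrency sketch of that arrangement, assuming hypothetical `estimator.refine` and downstream `set_payload` interfaces, could look like the following; the queue-based hand-off that keeps the control loop from blocking is the only point being illustrated.

```python
import queue
import threading

def estimation_worker(sensor_queue, param_queue, estimator):
    """Hypothetical background worker: consumes short windows of logged joint
    data and publishes refined inertial parameters, so the high-frequency
    bilateral control loop is never blocked by estimation."""
    while True:
        window = sensor_queue.get()
        if window is None:                     # shutdown sentinel
            break
        param_queue.put(estimator.refine(window))

def start_estimation_thread(estimator):
    """Spin up the worker; the control loop later drains param_queue with
    get_nowait() so it never waits on the estimator."""
    sensor_queue, param_queue = queue.Queue(), queue.Queue()
    threading.Thread(target=estimation_worker,
                     args=(sensor_queue, param_queue, estimator),
                     daemon=True).start()
    return sensor_queue, param_queue
```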

3. Hierarchical Sampling and Physically Constrained Optimization

The estimation employs a hierarchical decoupling: mass and CoM are estimated first, constrained within physically feasible domains defined by the prior and the robot–object contact configuration. Inertia is then recomputed deterministically via the parallel axis theorem and closed-form integration for the assumed geometric primitive (e.g., cuboid), guaranteeing that all candidate tensors are realizable by a solid object of the given size and mass distribution.
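For a body of mass $m$ whose reference point is displaced by $d$ from its CoM, the parallel axis theorem invoked here takes its standard form:

$$
I_{\text{ref}} = I_{\text{CoM}} + m\left(\|d\|^2 I_3 - d\,d^T\right)
$$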

This physical decoupling is enforced at every sampling step and ensures estimator convergence to a plausible dynamic model, even in the presence of high-dimensional, ambiguous, or inaccurate visual or language-based priors. The system’s multi-hypothesis initialization provides resilience to VLM prior errors.

4. Real-Time Dynamic Model Updating and Control Integration

As the robot manipulates the object, the latest estimated inertial parameters are immediately propagated to the whole-body dynamic controller. The controller:

  • Computes a new equilibrium point that balances the additional load, typically by shifting the robot's posture (e.g., pitching the base to rebalance under a heavy payload),
  • Updates feedforward and feedback control policies, including compensation via inverse dynamics, to improve both balance and tracking,
  • Adjusts haptic force rendering to the teleoperator, such that the operator senses disturbances accounting for the true object dynamics.

Integration with high-fidelity simulation and sim-to-real adaptation helps the robot track commanded trajectories with low error even as the object parameters are updated online.
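As a toy illustration of the equilibrium-shift step only (a planar approximation with made-up masses and geometry, not the paper's whole-body controller), the combined CoM and a compensating base pitch could be computed as:

```python
import numpy as np

def payload_aware_equilibrium(m_r, c_r, m_o, c_o):
    """Planar sketch: combine the robot CoM and the estimated payload CoM and
    return the base pitch that places the combined CoM above the wheel
    contact, assumed at the base-frame origin. Sign convention is arbitrary."""
    c_r, c_o = np.asarray(c_r, float), np.asarray(c_o, float)
    c_total = (m_r * c_r + m_o * c_o) / (m_r + m_o)
    pitch = -np.arctan2(c_total[0], c_total[2])   # rotate the CoM back over the contact
    return c_total, pitch

# Example with illustrative numbers: a 3 kg payload held 0.35 m in front of a
# 10 kg base shifts the equilibrium, so the controller commands a compensating pitch.
_, pitch = payload_aware_equilibrium(10.0, [0.0, 0.0, 0.6], 3.0, [0.35, 0.0, 0.9])
```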

5. Enhanced Haptic and Bilateral Feedback Mechanisms

In this framework, haptic feedback to the human operator is dynamically adapted to reflect both changes in the robot state and the estimated external load. With accurate inertial parameter estimation, force feedback is computed so that the operator perceives the true dynamic effect of the manipulated object—transparently conveying, for example, shifts in equilibrium or the resistance of a heavy object.

This feedback mechanism minimizes unnecessary counteractive forces by the human, reduces operator fatigue, and allows focus on high-level motion and manipulation decisions, with the system automatically maintaining compliance and safety constraints.
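A generic bilateral rendering law consistent with this description (the gains and disturbance-scaling factor are illustrative placeholders, not values from the paper) combines a state-discrepancy term with the estimated payload wrench:

```python
import numpy as np

def haptic_feedback(x_h, x_r, v_h, v_r, f_ext_est, Kp=50.0, Kd=5.0, k_f=0.3):
    """Sketch of bilateral force feedback: a spring-damper term on the
    human-robot state discrepancy plus a scaled rendering of the estimated
    external/payload wrench. All gains are illustrative."""
    x_h, x_r, v_h, v_r = map(np.asarray, (x_h, x_r, v_h, v_r))
    f_sync = Kp * (x_r - x_h) + Kd * (v_r - v_h)
    return f_sync + k_f * np.asarray(f_ext_est)
```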

6. Experimental Validation and Real-World Performance

The system was validated on the SATYRR wheeled humanoid for pick-and-place tasks involving heavy unknown objects. Key results include:

  • Reliable dynamic parameter convergence within 1 s after object interaction onset.
  • Stable execution of lifting, transporting, and releasing trajectories for a payload equal to approximately one-third of the robot’s weight.
  • Significant improvement in joint tracking (lower mean squared errors), due to online dynamic compensation.
  • Improved interpretability and responsiveness of haptic feedback for the teleoperator in the presence of dynamic disturbances.

Benchmarking against baselines lacking the multi-stage estimation (e.g., non-visual or non-VLM variants) showed that the full pipeline achieves lower normalized mean absolute error (NMAE) in mass, CoM, and inertia estimates, as well as more robust task completion under various loading conditions.
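For reference, one common way to compute NMAE normalizes the mean absolute error by the mean ground-truth magnitude; the paper's exact normalization may differ.

```python
import numpy as np

def nmae(estimate, ground_truth):
    """Normalized mean absolute error, normalized here by the mean magnitude
    of the ground truth (one common convention)."""
    estimate = np.asarray(estimate, dtype=float)
    ground_truth = np.asarray(ground_truth, dtype=float)
    return np.mean(np.abs(estimate - ground_truth)) / np.mean(np.abs(ground_truth))
```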

7. Significance and Implications for Whole-Body Object-Aware Teleoperation

By integrating real-time, vision- and language-informed inertial parameter estimation into bilateral whole-body teleoperation, this framework achieves several notable advances:

  • Enabling semi-autonomous adaptation of robot dynamic equilibrium during manipulation of unknown or changing payloads,
  • Allowing the operator to focus on high-level control as the system compensates for complex dynamic disturbances in real time,
  • Enhancing safety and transparency through improved haptic feedback and physically consistent controller adaptations.

This approach demonstrates the effectiveness of combining multi-modal perception, constrained sampling-based estimation, and tightly coupled haptic rendering for robust, compliant, and object-aware whole-body teleoperation in dynamic and diverse manipulation environments (Baek et al., 13 Aug 2025).
