
Learning Accurate Whole-body Throwing with High-frequency Residual Policy and Pullback Tube Acceleration (2506.16986v3)

Published 20 Jun 2025 in cs.RO

Abstract: Throwing is a fundamental skill that enables robots to manipulate objects in ways that extend beyond the reach of their arms. We present a control framework that combines learning and model-based control for prehensile whole-body throwing with legged mobile manipulators. Our framework consists of three components: a nominal tracking policy for the end-effector, a high-frequency residual policy to enhance tracking accuracy, and an optimization-based module to improve end-effector acceleration control. The proposed controller achieved an average landing error of 0.28 m when throwing at targets located 6 m away. Furthermore, in a comparative study with university students, the system achieved a velocity tracking error of 0.398 m/s and a success rate of 56.8%, hitting small targets randomly placed at distances of 3-5 m while throwing at a specified speed of 6 m/s. In contrast, the human participants achieved a success rate of only 15.2%. This work provides an early demonstration of prehensile throwing with quantified accuracy on hardware, contributing to progress in dynamic whole-body manipulation.

Summary

  • The paper introduces a framework for legged robot throwing, integrating RL policies with real-time optimization for accurate whole-body control.
  • The approach combines nominal and high-frequency residual RL policies with real-time pullback tube acceleration optimization for robust, accurate tracking.
  • Hardware validation demonstrates high throwing accuracy (0.276 m error), outperforms human baselines, and enables robust dynamic whole-body manipulation.

Learning Accurate Whole-body Throwing with High-frequency Residual Policy and Pullback Tube Acceleration

This paper presents a control framework for prehensile whole-body throwing with legged mobile manipulators, integrating model-based and learning-based components to address the challenges of dynamic, high-precision object throwing. The approach is validated on hardware, demonstrating significant improvements in accuracy and robustness compared to both baseline controllers and human performance.

Framework Overview

The proposed system comprises three main modules:

  1. Nominal Whole-body End-Effector (EE) Tracking Policy: A reinforcement learning (RL) policy trained to track desired EE trajectories for throwing, accounting for object mass variability and release uncertainties.
  2. High-frequency Residual Policy: A secondary RL policy operating at 400 Hz, refining the nominal policy's output to improve EE state and acceleration tracking, particularly under high-speed, dynamic conditions.
  3. Pullback Tube Acceleration Optimizer: A convex optimization-based module that computes real-time corrective EE accelerations, ensuring the EE state remains within the backward reachable tube (BRT) of valid throwing configurations, thus mitigating release timing uncertainties.

The integration of these modules enables robust, closed-loop control for dynamic throwing tasks, leveraging both the agility of legged bases and the precision of learned and optimized control.
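The multi-rate composition of the three modules can be sketched as a single control step; this is a minimal illustration assuming the deployment rates reported in the paper (a 400 Hz control loop with the nominal policy decimated to 100 Hz), with the class name and stand-in callables invented for this sketch rather than taken from the authors' code:

```python
import numpy as np

class WholeBodyThrowController:
    """Illustrative multi-rate composition of the three modules.

    The nominal policy is evaluated at 100 Hz (decimation 4), while the
    residual policy and the pullback acceleration correction run at the
    full 400 Hz rate. All module internals are stand-in callables here,
    not the trained networks or the paper's optimizer.
    """

    def __init__(self, nominal_policy, residual_policy, pullback_correction,
                 control_hz=400, nominal_decimation=4):
        self.nominal_policy = nominal_policy
        self.residual_policy = residual_policy
        self.pullback_correction = pullback_correction
        self.dt = 1.0 / control_hz
        self.decimation = nominal_decimation
        self._tick = 0
        self._nominal_action = None

    def step(self, obs):
        # Nominal whole-body EE tracking policy: every 4th tick (100 Hz).
        if self._tick % self.decimation == 0:
            self._nominal_action = self.nominal_policy(obs)
        # High-frequency residual policy: refines joint targets at 400 Hz.
        residual = self.residual_policy(obs, self._nominal_action)
        # Pullback tube acceleration: corrective EE acceleration toward the BRT.
        ee_acc_correction = self.pullback_correction(obs)
        self._tick += 1
        return self._nominal_action + residual, ee_acc_correction
```

The key design point the sketch captures is that only the slow nominal policy is cached between ticks; the fast corrections are recomputed at every control step.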

Methodological Contributions

  • Explicit EE State Tracking: Rather than end-to-end motion generation, the nominal policy is trained to explicitly track EE position, velocity, and orientation targets, with randomized object masses to enhance generalization.
  • Residual Policy for High-frequency Correction: The residual policy, trained after freezing the nominal policy, provides rapid, fine-grained corrections to arm joint positions, significantly improving tracking at high velocities and under model mismatch.
  • Closed-loop Pullback Tube Acceleration: Building on prior work in robust throwing, the optimizer computes accelerations that pull the EE state into the BRT, making the system robust to stochastic release events and tracking errors. The convex formulation enables real-time (sub-millisecond) optimization, supporting high-frequency feedback.
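The pullback idea can be made concrete with a toy projection. The paper solves a small convex program in real time over its BRT representation; the sketch below collapses the BRT to a single half-space {s : n·s ≤ b} under single-integrator dynamics s' = s + dt·a, for which the minimum-norm corrective acceleration has a closed form. The set parameters `n` and `b` are hypothetical stand-ins, not the paper's tube description:

```python
import numpy as np

def pullback_acceleration(s, a_nom, n, b, dt):
    """Minimum-norm correction of a nominal EE acceleration so that the
    predicted state lands inside a half-space proxy for the BRT.

    Solves  min ||a - a_nom||^2  s.t.  n @ (s + dt * a) <= b
    in closed form (a single half-space admits an analytic projection).
    """
    s_pred = s + dt * a_nom          # state predicted under the nominal accel
    violation = n @ s_pred - b       # signed distance past the set boundary
    if violation <= 0.0:
        return a_nom                 # already inside the tube: no correction
    # Active-constraint solution: shift a_nom along -n until the predicted
    # state sits exactly on the boundary of the half-space.
    return a_nom - (violation / (dt * (n @ n))) * n
```

The real formulation is richer (a tube of valid throwing states, not one half-space), but the mechanism is the same: whenever the predicted EE state would exit the valid set, the smallest acceleration change that pulls it back is applied.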

Implementation Details

  • Training: Both policies are trained in simulation using Proximal Policy Optimization (PPO) within the legged_gym environment, with extensive domain randomization, actuator modeling, and observation noise to facilitate sim-to-real transfer.
  • Deployment: The system is implemented on an ANYmal quadruped with a DynaArm and Robotiq gripper. State estimation and control run at 400 Hz, with the nominal policy decimated to 100 Hz and the residual policy and optimizer running at full rate.
  • Target Acquisition: AprilTag-based vision provides target localization, and parabolic flight models (neglecting air drag) are used for initial velocity computation.
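The drag-free velocity computation mentioned above is a standard projectile solve; a sketch under the paper's no-drag assumption follows, where the function name, the fixed-speed parameterization, and the choice of the flatter of the two launch angles are assumptions of this illustration:

```python
import math
import numpy as np

def release_velocity(p_release, p_target, speed, g=9.81):
    """Initial EE velocity for a drag-free parabolic flight at a fixed
    throw speed.

    Solves the projectile equation for the launch angle in the vertical
    plane through release point and target:
        tan(theta) = (v^2 - sqrt(v^4 - g (g d^2 + 2 h v^2))) / (g d)
    with d the horizontal distance and h the height difference.
    """
    delta = np.asarray(p_target, float) - np.asarray(p_release, float)
    d = math.hypot(delta[0], delta[1])      # horizontal distance to target
    h = delta[2]                            # height difference to target
    disc = speed**4 - g * (g * d**2 + 2.0 * h * speed**2)
    if disc < 0.0:
        raise ValueError("target unreachable at this throw speed")
    # Two launch angles hit the target; take the flatter (shorter-flight) one.
    theta = math.atan2(speed**2 - math.sqrt(disc), g * d)
    heading = delta[:2] / d                 # unit vector toward the target
    v_xy = speed * math.cos(theta) * heading
    return np.array([v_xy[0], v_xy[1], speed * math.sin(theta)])
```

For the paper's operating point (a 6 m/s throw at a 3-5 m target), the discriminant is comfortably positive, so a valid release direction always exists.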

Experimental Results

Quantitative Performance

  • Landing Accuracy: The integrated controller achieves a mean landing error of 0.276 m for 6 m throws, a 49.5% improvement over the nominal policy alone (0.685 m error).
  • Velocity Tracking: The system achieves a mean EE velocity tracking error of 0.398 m/s at 6 m/s throw speed.
  • Success Rate: In a comparative study, the robot achieved a 56.8% success rate in hitting small targets at 3–5 m, compared to 15.2% for human participants.

Ablation and Component Analysis

  • Residual Policy Frequency: Higher-frequency (400 Hz) residual policies consistently outperform lower-frequency (100 Hz) variants, especially at high throw velocities.
  • Pullback Tube Acceleration: The optimizer reduces maximum landing error from 96.8 cm (constant velocity) to 31.1 cm at 400 Hz, with error decreasing monotonically as control frequency increases.
  • Base Motion Contribution: The legged base provides up to 53.4% higher angular impulse than a fixed-base manipulator, demonstrating the advantage of whole-body coordination for dynamic tasks.

Robustness and Generalization

  • The system demonstrates robust performance across diverse object types (rigid, deformable, slippery), and in both indoor and outdoor environments.
  • The closed-loop optimizer effectively compensates for stochastic release timing and tracking errors, as validated in both simulation and hardware.

Implications and Future Directions

Practical Implications

  • Dynamic Manipulation: The framework enables legged robots to perform dynamic, long-range object delivery and manipulation tasks, relevant for logistics, disaster response, and service robotics.
  • Sim-to-Real Transfer: The combination of domain randomization, actuator modeling, and high-frequency feedback demonstrates effective sim-to-real transfer for complex, dynamic tasks.
  • Real-time Optimization: The use of embedded convex optimization (via CVXPYgen) at kilohertz rates sets a precedent for integrating real-time optimization in feedback loops for manipulation.

Theoretical Implications

  • BRT as an Invariant Set: The closed-loop pullback tube acceleration approach formalizes the BRT as an attracting invariant set, providing a principled method for robustifying against release uncertainty.
  • Hierarchical Policy Design: The separation of nominal and residual policies, with distinct frequencies and objectives, offers a template for hierarchical control in other dynamic manipulation domains.

Limitations and Future Work

  • Residual Policy Scope: The current residual policy only tracks vertical accelerations, limiting overall accuracy. Extending to full 3D acceleration tracking could further improve performance.
  • Object Inertia Assumptions: The framework assumes negligible object mass relative to the manipulator's reflected inertia. Future work could incorporate online estimation of object inertial properties.
  • Sim-to-Real Gap: While the residual policy improves tracking, its sim-to-real transfer is less effective than model-based reference trackers. Enhancing simulation fidelity or policy generalization remains an open challenge.

Conclusion

This work demonstrates a comprehensive, high-performance framework for whole-body prehensile throwing with legged robots, combining RL-based tracking, high-frequency residual correction, and real-time robust optimization. The approach achieves strong empirical results on hardware, outperforming human baselines and advancing the state of the art in dynamic whole-body manipulation. The modular design and real-time capabilities suggest broad applicability to other dynamic manipulation and loco-manipulation tasks, with future work poised to address remaining limitations in policy generalization and object modeling.
