Hierarchical Framework for Humanoid Table Tennis
- The paper introduces a hierarchical framework that decouples long-horizon ball trajectory prediction from rapid, RL-based whole-body control.
- It achieves precise ball prediction, with position errors below 7.5 cm and timing discrepancies under 20 ms, enabling agile, human-like rallies.
- The system leverages imitation learning from human motion references to orchestrate coordinated swings and maintain overall stability.
A hierarchical framework for humanoid table tennis provides a layered control and decision structure that addresses the extreme dynamic demands of the sport—integrating sensory perception, rapid motion planning, and whole-body actuation. This approach enables humanoid robots to execute agile, human-like rallies with sub-second response, leveraging both model-based trajectory planning and learned whole-body behaviors. It separates high-level predictive planning from low-level motion generation, thereby combining long-horizon reasoning and rapid adaptive control within a modular system.
1. Architectural Overview of the Hierarchical Framework
The core structure consists of two principal layers:
- Model-Based Planner (High Level): This module handles real-time prediction of the incoming ball trajectory and plans the striking strategy. It processes ball position data (using motion capture or vision), fits a local polynomial to infer current position and velocity, then forward-propagates the ball’s flight using a hybrid dynamics model accounting for both free flight and table bounce. A virtual hitting plane is defined at a fixed distance in front of the robot, on which the planner specifies the desired strike position, timing, and target racket velocity. Essential formulas include:
- Free-flight: $\ddot{\mathbf{p}} = -k_d\,\|\dot{\mathbf{p}}\|\,\dot{\mathbf{p}} + \mathbf{g}$, with $k_d$ the drag coefficient and $\mathbf{g}$ gravity.
- Table bounce: $v_z^{+} = -e_z\, v_z^{-}$ and $v_{xy}^{+} = e_{xy}\, v_{xy}^{-}$, where $e_z$/$e_{xy}$ are restitution coefficients.
- Racket-ball contact: the planned outgoing velocity $\mathbf{v}^{\text{out}}$ is chosen so that the ball reaches the landing target $\mathbf{p}_{\text{land}}$ from the racket position $\mathbf{p}_r$.
- Desired racket velocity: interpolated between $\mathbf{v}^{\text{in}}$ and $\mathbf{v}^{\text{out}}$ using the coefficient of restitution $e_r$ along the racket normal.
- Reinforcement Learning (RL)-Based Whole-Body Controller (Low Level): This component receives, at each timestep, the planned striking state and time until impact, along with proprioceptive and exteroceptive sensor input. It outputs coordinated joint-level torque or position commands for the arms, legs, and base, enabling agile movement and rapid arm swings. The controller is trained with Proximal Policy Optimization (PPO), using both task rewards (related to striking accuracy) and motion imitation losses against human-motion references extracted via the SMPL model. This results in human-like, functional table tennis swings while preserving whole-body stability and agility.
This architecture modularizes planning from real-time execution, ensuring low latency in high-level ball prediction, while the RL-based controller pre-learns agility and balance in fast, learned feedback loops (Su et al., 28 Aug 2025).
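To make the division of labor concrete, the following minimal Python sketch illustrates the planner–controller interface; all class, field, and function names are hypothetical, chosen only to mirror the quantities described above.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class StrikeCommand:
    """High-level planner output consumed by the low-level controller.

    Fields mirror the quantities described above; the names themselves
    are hypothetical.
    """
    racket_pos: np.ndarray   # desired racket position at impact
    racket_vel: np.ndarray   # desired racket velocity at impact
    base_target: np.ndarray  # desired base placement for the stance
    time_to_strike: float    # seconds remaining until planned contact


def control_tick(planner, controller, ball_obs, robot_state):
    """One tick of the hierarchical loop: replan, then act.

    The planner refits the ball trajectory and refreshes the strike
    command; the RL controller maps (robot state, command) to
    joint-level commands for arms, legs, and base.
    """
    command: StrikeCommand = planner.plan(ball_obs)
    return controller.act(robot_state, command)
```

Because the planner output is a compact command rather than a full trajectory, the low-level controller can run at a high control rate while the planner replans only as new ball observations arrive.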
2. Ball Trajectory Prediction and Striking Target Planning
The model-based planner is responsible for the following sequence:
- Ball trajectory smoothing and velocity estimation: A quadratic fit to the most recent batch of ball positions yields position, velocity, and acceleration estimates.
- Hybrid ball flight/bounce model: During flight, the ball’s acceleration accounts for both linear drag and gravity; at bounces, a restitution matrix flips and scales vertical and horizontal components, reflecting energy loss. The system iteratively integrates the flight phase, predicts table impact, applies bounce dynamics, and continues until the ball’s trajectory reaches the hitting plane.
- Striking target computation: The virtual hitting plane defines the strike position and the corresponding planned hitting pose. The planner also computes a target for the base so that the humanoid’s center of mass and stance can be adjusted accordingly.
- Racket-ball contact modeling: Given the predicted incoming ball velocity $\mathbf{v}^{\text{in}}$, the outgoing velocity $\mathbf{v}^{\text{out}}$ (planned to achieve a specified landing point), and the coefficient of restitution $e_r$ along the racket normal, the desired racket velocity is computed by solving for the velocity at impact that interpolates between $\mathbf{v}^{\text{in}}$ and $\mathbf{v}^{\text{out}}$ according to $e_r$, formulated as:

$$\mathbf{v}_r \cdot \hat{\mathbf{n}} = \frac{\mathbf{v}^{\text{out}} \cdot \hat{\mathbf{n}} + e_r\, \mathbf{v}^{\text{in}} \cdot \hat{\mathbf{n}}}{1 + e_r},$$

where $\hat{\mathbf{n}}$ is the normalized vector along $\mathbf{v}^{\text{out}} - \mathbf{v}^{\text{in}}$ (see the sketch after this list).
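The following Python sketch illustrates this planning pipeline end to end: quadratic smoothing of ball observations, hybrid flight/bounce propagation to the hitting plane, and the restitution-based racket-velocity solve. All numerical constants (drag, restitution, table height, time step) are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Illustrative constants -- placeholders, not the paper's values.
K_D = 0.12                        # aerodynamic drag coefficient [1/m]
G = np.array([0.0, 0.0, -9.81])   # gravity [m/s^2]
E_TABLE_Z, E_TABLE_XY = 0.9, 0.8  # table restitution (vertical/horizontal)
E_RACKET = 0.85                   # restitution along the racket normal
TABLE_HEIGHT = 0.76               # table surface height [m]


def fit_ball_state(times, positions):
    """Quadratic least-squares fit to recent ball positions.

    Returns position and velocity at the latest sample, mirroring the
    planner's smoothing/velocity-estimation step.
    """
    times, positions = np.asarray(times), np.asarray(positions)
    t0 = times[-1]
    coeffs = [np.polyfit(times - t0, positions[:, i], deg=2) for i in range(3)]
    pos = np.array([np.polyval(c, 0.0) for c in coeffs])
    vel = np.array([np.polyval(np.polyder(c), 0.0) for c in coeffs])
    return pos, vel


def propagate_to_plane(pos, vel, plane_x, dt=1e-3, t_max=2.0):
    """Hybrid flight/bounce rollout until the ball crosses the hitting plane.

    Flight: Euler integration of drag plus gravity. Bounce: flip and
    scale velocity components with the restitution coefficients.
    Assumes play progresses along +x toward the plane at plane_x.
    """
    t = 0.0
    while pos[0] < plane_x and t < t_max:
        acc = -K_D * np.linalg.norm(vel) * vel + G
        vel = vel + acc * dt
        pos = pos + vel * dt
        if pos[2] <= TABLE_HEIGHT and vel[2] < 0.0:  # table impact
            vel[2] = -E_TABLE_Z * vel[2]
            vel[:2] = E_TABLE_XY * vel[:2]
            pos[2] = TABLE_HEIGHT
        t += dt
    return pos, vel, t  # strike position, incoming velocity, time to strike


def racket_velocity(v_in, v_out, e_r=E_RACKET):
    """Racket velocity along the contact normal that maps v_in to v_out.

    Takes the normal along the change in ball velocity and applies the
    1-D restitution relation, matching the interpolation formula above.
    """
    n = (v_out - v_in) / np.linalg.norm(v_out - v_in)
    v_r = (np.dot(v_out, n) + e_r * np.dot(v_in, n)) / (1.0 + e_r)
    return v_r * n
```

The racket-velocity solve follows directly from the restitution relation $(\mathbf{v}^{\text{out}} - \mathbf{v}_r)\cdot\hat{\mathbf{n}} = -e_r(\mathbf{v}^{\text{in}} - \mathbf{v}_r)\cdot\hat{\mathbf{n}}$, rearranged for the racket speed along the normal.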
The separation of prediction/planning from execution ensures that planning errors can be held below the racket radius ($7.5$ cm) in position and $20$ ms in timing within $0.3$ s of impact.
3. Whole-Body RL-Based Control with Human Motion References
The RL-based whole-body controller (WBC) is trained to realize the high-level planner's commands using the robot’s full kinematic chain. The observation vector comprises the following (see the assembly sketch after this list):
- The robot's base pose and joint states (positions, velocities; 29-DOF in the described implementation).
- The predicted racket striking state (target position and velocity) from the planner.
- Time to strike (the remaining time until planned impact).
- Additional inertial and gravity cues.
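A minimal sketch of how such an observation vector might be assembled; the ordering, framing, and names are assumptions for illustration, not the paper's exact layout.

```python
import numpy as np

def build_observation(base_pose, joint_pos, joint_vel,
                      strike_pos, strike_vel, time_to_strike,
                      projected_gravity):
    """Concatenate the controller's observation vector.

    Mirrors the inputs listed above: base pose, 29-DoF joint states,
    the planned racket strike state, time to strike, and gravity cues.
    """
    return np.concatenate([
        base_pose,          # base position and orientation
        joint_pos,          # 29 joint positions
        joint_vel,          # 29 joint velocities
        strike_pos,         # planned racket position at impact
        strike_vel,         # planned racket velocity at impact
        [time_to_strike],   # scalar countdown to impact
        projected_gravity,  # gravity direction in the base frame
    ])
```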
The reward function is multi-component:
- Task reward: Maximizes hitting success (contact with ball at target velocity/position).
- Imitation loss: Encourages similarity with retargeted human motion primitives (swing trajectories reconstructed from video/SMPL), facilitating human-like style and coordination.
The WBC is pre-trained in simulation and can rapidly execute lateral stepping, coordinated rotations, and arm swings. During real execution, it achieves sub-second reaction, with typical base displacements of up to $0.75$ m completed in under $0.8$ s and full striking sequences executed within a second from perception to action.
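A hedged sketch of how the two reward components might be combined into a scalar PPO reward; the exponential kernels, weights, and impact gating are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def compute_reward(racket_pos, racket_vel, strike_pos, strike_vel,
                   time_to_strike, joint_pos, ref_joint_pos,
                   w_task=1.0, w_imitate=0.5, sigma=0.1):
    """Combine task and imitation terms for PPO training.

    Task term: rewards matching the planned racket state near impact.
    Imitation term: rewards tracking the retargeted human reference.
    """
    # Task reward: exponential kernel on racket position/velocity error,
    # gated so it only contributes close to the planned strike time.
    pos_err = np.linalg.norm(racket_pos - strike_pos)
    vel_err = np.linalg.norm(racket_vel - strike_vel)
    gate = float(abs(time_to_strike) < 0.05)
    r_task = gate * np.exp(-(pos_err**2 + 0.1 * vel_err**2) / sigma)

    # Imitation reward: track the retargeted human joint trajectory.
    imit_err = np.linalg.norm(joint_pos - ref_joint_pos)
    r_imitate = np.exp(-imit_err**2 / sigma)

    return w_task * r_task + w_imitate * r_imitate
```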
4. Integration, Stability, and System Performance
The hierarchical design allows the robot to react adaptively to fast, unpredictable rallies, sustaining up to 106 consecutive shots with a human opponent and similar results against other humanoid robots. System performance is robust, with:
- Ball prediction error below $7.5$ cm (the racket radius) at $0.5$ s before impact.
- Strike timing error under $20$ ms at $0.3$ s before impact, matching one control step.
- Whole-body stabilization mechanisms, leveraging rapid foot placement and full-body balance, enable repeated strikes even under adversarial ball placement.
The modular design allows independent advancement: planner improvements (e.g., richer striking strategies) and controller retraining (e.g., new motion references or hardware upgrades) do not require end-to-end re-engineering.
5. Use of Human Motion References and Naturalness
Two key human swings (forehand, backhand) are captured from video and reconstructed in SMPL parameterization. These are retargeted to the robot and used both to guide the RL policy via imitation reward and to provide reference tracking trajectories for the arms, waist, and base. This enables the robot to exhibit human-like swinging mechanics, coordinated waist rotation, and leg agility—key for realistic, strategically robust performance.
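As a sketch of how a retargeted reference might be consumed at runtime, the hypothetical container below time-indexes a stored swing and interpolates a reference pose; the class and the impact-alignment convention are assumptions, not the paper's implementation.

```python
import numpy as np

class SwingReference:
    """Retargeted human swing stored as a timed joint trajectory.

    A hypothetical container: SMPL-reconstructed forehand/backhand
    swings, retargeted to the robot, are indexed by reference time.
    """
    def __init__(self, times, joint_traj):
        self.times = np.asarray(times)            # (T,)
        self.joint_traj = np.asarray(joint_traj)  # (T, n_joints)

    def sample(self, t):
        """Linearly interpolate the reference pose at time t."""
        t = np.clip(t, self.times[0], self.times[-1])
        return np.array([
            np.interp(t, self.times, self.joint_traj[:, j])
            for j in range(self.joint_traj.shape[1])
        ])

# Usage: align the reference clock to the planned impact so the swing's
# contact frame coincides with time_to_strike == 0.
# ref_pose = reference.sample(t_impact_ref - time_to_strike)
```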
The imitation mechanism is not limited to these two swings: additional or more diverse motion references may be incorporated to expand the skillset beyond the baseline forehand and backhand toward more subtle or expert strategies.
6. Extensions, Limitations, and Future Directions
Future developments highlighted in the framework include:
- Extending striking strategies: Relaxing the constraint of a fixed virtual hitting plane and enabling richer strike selection may allow handling of more complex plays that a single fixed plane cannot accommodate.
- Sensing improvements: Transitioning from external motion capture to vision-based ball tracking would improve robustness and field deployment potential.
- Spin and advanced skill handling: The planner and WBC can be extended to generate or counter ball spin by augmenting reward structure and motion reference set.
- Multi-agent training: Introducing adversarial humanoid opponents or agent–agent self-play would foster the emergence of tactics and explicit high-level strategy, enabling further convergence toward expert-level play.
- Generalization: The architecture is not limited to table tennis; the hierarchical interplay of planning and agile RL-based control can be ported to other dynamic, interactive humanoid manipulation domains.
7. Significance and Impact
The HITTER framework demonstrates that general-purpose humanoid robots can interactively and reactively compete in high-speed, dynamic sports, where both trajectory planning and agile, whole-body execution are essential (Su et al., 28 Aug 2025). This marks a substantial advancement in closing the gap between human and robot performance in interactive manipulation, and suggests a pathway to more generalizable and robust humanoid autonomy for complex, fast-paced tasks. The modularity of the hierarchical approach—dividing reasoning and execution, exploiting human demonstration for stylistic naturalness, and supporting rapid online reaction—forms a clear template for applied humanoid learning systems in embodied AI and robotics.