- The paper presents a hierarchical system that combines a model-based planner and a reinforcement learning-based whole-body controller for human-like table tennis play.
- The approach uses polynomial fitting for ball state estimation and a hybrid dynamics model for trajectory prediction, achieving sub-centimeter position accuracy and rapid striking with precise timing.
- Empirical results demonstrate a high hit rate and up to 106 consecutive shots, validating the system’s robust performance in dynamic, real-world matches.
HITTER: Hierarchical Planning and Learning for Humanoid Table Tennis
Introduction
The paper presents HITTER, a hierarchical system for enabling general-purpose humanoid robots to play table tennis with high agility and human-like motion. The approach integrates a model-based planner for ball trajectory prediction and strike planning with a reinforcement learning (RL)–based whole-body controller (WBC) trained on human motion references. The system is validated on the Unitree G1 humanoid, achieving up to 106 consecutive shots against a human opponent and demonstrating sustained rallies in fully autonomous humanoid–humanoid matches. The work addresses the challenge of rapid perception-action loops, coordinated whole-body control, and naturalistic striking in highly dynamic environments.
Figure 1: System overview, showing the hardware setup, motion capture, model-based planning, and RL-based whole-body control pipeline.
Hierarchical System Architecture
The system is modularized into two principal components:
- Model-Based Planner: Operates at high frequency (360 Hz) using motion capture data to estimate ball position and velocity via polynomial fitting. It predicts the ball’s future trajectory using a hybrid dynamics model, accounting for aerodynamic drag and bounce restitution. The planner computes the desired racket striking position, velocity, and timing, as well as the robot base target position, which are passed to the WBC.
- Learning-Based Whole-Body Controller (WBC): Trained in Isaac Lab using PPO, the WBC receives planner outputs and proprioceptive observations, generating joint position commands for all 29 degrees of freedom at 50 Hz. The policy is trained with dense and sparse rewards for imitation and goal tracking, using human forehand and backhand reference motions retargeted to the robot via SMPL and GMR pipelines.
This separation of planning and control improves sample efficiency, robustness to perception errors, and adaptability to real-world conditions.
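The hand-off between the two modules can be pictured as a small command record that the 360 Hz planner keeps refreshing and the 50 Hz WBC reads once per control step. The following is a minimal sketch only: `StrikeCommand`, `CommandMailbox`, the field names, and the latest-value hand-off are hypothetical illustrations, not the paper's interface. Only the two rates and the four command quantities come from the description above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]

PLANNER_HZ = 360  # planner update rate stated in the text
WBC_HZ = 50       # policy control rate stated in the text


@dataclass(frozen=True)
class StrikeCommand:
    """Hypothetical planner output consumed by the WBC."""
    racket_pos: Vec3                  # desired racket position at impact [m]
    racket_vel: Vec3                  # desired racket velocity at impact [m/s]
    time_to_strike: float             # seconds remaining until impact
    base_target: Tuple[float, float]  # desired base position on the floor [m]


class CommandMailbox:
    """Keeps only the newest command so the slower loop never reads stale data."""

    def __init__(self) -> None:
        self._latest: Optional[StrikeCommand] = None

    def publish(self, cmd: StrikeCommand) -> None:
        # Called by the planner on every update; overwrites the previous command.
        self._latest = cmd

    def read(self) -> Optional[StrikeCommand]:
        # Called by the WBC once per control step; None until the first publish.
        return self._latest


def planner_ticks_per_wbc_step() -> float:
    """Number of planner updates that occur during one WBC control step."""
    return PLANNER_HZ / WBC_HZ
```

With these rates the planner refines its prediction about seven times between consecutive policy steps, which is why a latest-value mailbox (rather than a queue) is a natural choice in this sketch.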
Model-Based Planning and Ball Prediction
The planner uses a second-order polynomial fit for velocity estimation and a hybrid flight-bounce model for trajectory prediction. Parameters for drag and restitution are empirically identified from recorded trajectories. The planner achieves sub-centimeter position error and sub-20 ms timing error within 0.5 s of the strike, providing reliable commands for the WBC.
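The two steps described here, polynomial state estimation followed by hybrid flight-bounce prediction, can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the drag form (quadratic in speed), the constants `K_DRAG` and `E_BOUNCE`, the table-frame convention, the time step, and the choice of a hitting plane at `x = x_hit` are all placeholders, and table extents are ignored.

```python
import numpy as np

G = np.array([0.0, 0.0, -9.81])  # gravity [m/s^2]
K_DRAG = 0.1    # assumed quadratic drag coefficient [1/m]; placeholder value
E_BOUNCE = 0.9  # assumed vertical restitution at the table; placeholder value
TABLE_Z = 0.0   # table surface height in the table frame [m]


def estimate_state(t, p):
    """Fit a second-order polynomial per axis; return position and velocity at t[-1]."""
    t = np.asarray(t, dtype=float)
    p = np.asarray(p, dtype=float)      # shape (n_samples, 3)
    tau = t - t[-1]                     # shift time so the newest sample is at 0
    coeffs = np.polyfit(tau, p, deg=2)  # shape (3, 3), highest power first
    return coeffs[2], coeffs[1]         # constant term and slope at tau = 0


def predict_strike(pos, vel, x_hit, dt=1e-3, t_max=2.0):
    """Integrate flight with drag, reflect at the table, stop at the plane x = x_hit."""
    pos = np.array(pos, dtype=float)
    vel = np.array(vel, dtype=float)
    t = 0.0
    while t < t_max:
        acc = G - K_DRAG * np.linalg.norm(vel) * vel
        new_vel = vel + acc * dt
        new_pos = pos + new_vel * dt    # semi-implicit Euler step
        if new_pos[2] < TABLE_Z and new_vel[2] < 0.0:
            # Bounce: clamp to the surface and reverse the vertical velocity.
            new_pos[2] = TABLE_Z
            new_vel[2] = -E_BOUNCE * new_vel[2]
        dx = new_pos[0] - pos[0]
        if dx != 0.0 and (pos[0] - x_hit) * (new_pos[0] - x_hit) <= 0.0:
            # The step crossed the hitting plane: interpolate linearly to it.
            s = (x_hit - pos[0]) / dx
            return pos + s * (new_pos - pos), vel + s * (new_vel - vel), t + s * dt
        pos, vel, t = new_pos, new_vel, t + dt
    return None  # the ball never reached the plane within t_max
```

In use, `estimate_state` would be called on a short window of recent motion capture samples, and its output passed to `predict_strike` to obtain the predicted strike position, incoming velocity, and time to impact.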
Figure 2: Prediction errors of the model-based planner for striking position and time, demonstrating sub-racket-radius accuracy within 0.5 s of impact.
Racket-ball interaction is modeled with a simplified restitution-based approach, neglecting spin and tangential friction. The desired outgoing ball velocity is computed to target the center of the opponent’s side, and the required racket velocity is derived analytically.
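One way to make the analytic derivation concrete: with no spin and no tangential friction, only the ball's velocity component along the racket normal changes at impact, so the normal must point along the velocity change and the racket's normal speed follows from the restitution relation (v_out - v_r)·n = -e (v_in - v_r)·n. The sketch below rests on several assumptions that are not from the paper: the restitution value `E_RACKET`, a racket much heavier than the ball, a racket velocity chosen purely along the normal, and a drag-free ballistic arc with a chosen flight time for the outgoing ball.

```python
import numpy as np

G = np.array([0.0, 0.0, -9.81])  # gravity [m/s^2]
E_RACKET = 0.8                   # assumed racket-ball restitution; placeholder value


def outgoing_velocity(p_hit, p_target, flight_time):
    """Drag-free launch velocity that lands at p_target after flight_time seconds."""
    return (p_target - p_hit) / flight_time - 0.5 * G * flight_time


def racket_command(v_in, v_out, e=E_RACKET):
    """Return the racket normal and velocity that turn v_in into v_out."""
    dv = v_out - v_in
    n = dv / np.linalg.norm(dv)  # frictionless impact changes velocity only along n
    # Solve (v_out - v_r).n = -e (v_in - v_r).n for the racket's normal speed.
    speed_n = (np.dot(v_out, n) + e * np.dot(v_in, n)) / (1.0 + e)
    return n, speed_n * n        # the tangential racket velocity is set to zero here


def simulate_impact(v_in, v_racket, n, e=E_RACKET):
    """Forward model used as a consistency check for the inverse above."""
    rel_n = np.dot(v_in - v_racket, n)
    return v_in - (1.0 + e) * rel_n * n
```

Feeding the output of `racket_command` back through `simulate_impact` should reproduce the requested outgoing velocity, which is a convenient sanity check on the sign conventions.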
Whole-Body Controller and Human-Like Motion
The WBC is trained with an asymmetric actor-critic architecture, in which the critic receives privileged information (body poses, time left, reference motions) to improve return estimation. The policy tracks separate commands for base and racket, enabling rapid lateral movement and coordinated arm swings. Episodes are structured to allow consecutive strikes and randomization of swing type and target positions.
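The asymmetry amounts to giving the critic a superset of the actor's inputs. A minimal sketch of that split follows; the specific observation terms, their ordering, and all dimensions other than the 29 joints are hypothetical and chosen only to illustrate the idea.

```python
import numpy as np

NUM_JOINTS = 29  # degrees of freedom stated in the text


def actor_obs(joint_pos, joint_vel, base_ang_vel, gravity_dir, command):
    """Observations available on the real robot: proprioception plus planner command."""
    return np.concatenate([joint_pos, joint_vel, base_ang_vel, gravity_dir, command])


def critic_obs(actor_vec, body_poses, time_left, reference_pose):
    """Actor observations plus privileged terms that exist only in simulation."""
    return np.concatenate([actor_vec, body_poses.ravel(), [time_left], reference_pose])
```

Because the critic is discarded after training, the privileged terms never need to be measured on hardware; only `actor_obs` has to be reproducible at deployment.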
Human motion references are processed and retargeted to the robot, with interpolation and kinematic augmentation for accurate tracking. The reward function combines imitation, goal tracking, and regularization, with sparse activation for critical strike moments.
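The reward structure described here, dense imitation and regularization terms plus a goal-tracking term that activates only around the strike, might look roughly like the sketch below. The exponential kernels, weights, widths, and the activation window are all assumed values for illustration; the paper's actual terms and coefficients are not reproduced.

```python
import numpy as np


def imitation_reward(q, q_ref, sigma=0.5):
    """Dense term: close to 1 when joint positions match the reference motion."""
    return float(np.exp(-np.sum((q - q_ref) ** 2) / sigma ** 2))


def strike_reward(racket_pos, target_pos, time_to_strike, window=0.01, sigma=0.05):
    """Sparse term: nonzero only within `window` seconds of the planned strike."""
    if abs(time_to_strike) > window:
        return 0.0
    return float(np.exp(-np.sum((racket_pos - target_pos) ** 2) / sigma ** 2))


def action_rate_penalty(action, prev_action):
    """Regularization: penalize large changes between consecutive actions."""
    return float(np.sum((action - prev_action) ** 2))


def total_reward(q, q_ref, racket_pos, target_pos, time_to_strike,
                 action, prev_action, w_imit=1.0, w_strike=5.0, w_reg=0.01):
    """Weighted sum of the three terms; the weights are placeholder values."""
    return (w_imit * imitation_reward(q, q_ref)
            + w_strike * strike_reward(racket_pos, target_pos, time_to_strike)
            - w_reg * action_rate_penalty(action, prev_action))
```

Gating the goal-tracking term on time to strike leaves the policy free to follow the human reference during the backswing while still being held to the planner's target at the moment of impact.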
Figure 3: Agility evaluation of the WBC policy, showing sub-0.8 s reaching times for initial distances within 0.75 m and a 94.3% success rate in simulation.
Figure 4: Real-world rapid reaching motion, illustrating the robot’s ability to transition swiftly across the table and maintain balance during strikes.
Figure 5: Real-world human-like striking motion, with coordinated waist rotation and arm movement mimicking human table tennis play.
Empirical Results
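The integrated system demonstrates high performance in real-world experiments: a high hit rate, rallies of up to 106 consecutive shots against a human opponent, and sustained rallies in fully autonomous humanoid–humanoid matches.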
System Design Implications and Limitations
The hierarchical combination of model-based planning and learning-based control leverages the strengths of both paradigms. The planner provides reliable, interpretable commands, while the WBC enables agile, human-like motion. This modularity allows independent evaluation and improvement of each component.
Limitations include reliance on a fixed virtual hitting plane, external motion capture for ball and robot pose estimation, and neglect of spin and advanced stroke repertoire. These constraints limit table coverage, deployment flexibility, and performance against skilled opponents.
Future Directions
Potential avenues for advancement include:
- Vision-Based Sensing: Replacing motion capture with onboard vision for ball and robot pose estimation.
- Spin Perception and Stroke Diversity: Extending the system to handle spin and generate a wider range of strokes.
- Multi-Agent Training: Joint training of policies for competitive humanoid–humanoid matches.
- Autonomous Serving: Enabling robots to initiate rallies without human intervention.
- Opponent Modeling: Integrating strategic and tactical learning to adapt to skilled human opponents.
Conclusion
HITTER demonstrates that hierarchical planning and learning can enable general-purpose humanoid robots to play table tennis with high agility, precision, and human-like motion. The system achieves robust real-world performance, including long rallies and autonomous matches, marking a significant step toward interactive, agile humanoid behaviors. Future work will focus on expanding perceptual capabilities, stroke repertoire, and competitive adaptation to approach championship-level play.