- The paper presents a two-stage reinforcement learning approach that first predicts the ball's state at impact and then optimizes the racket stroke with a policy gradient method built on a TD3 backbone.
- It leverages realistic simulation tools, including Gazebo, ROS, OpenAI Gym, and Spinning Up, to help close the sim-to-real gap.
- Experiments report a 98% success rate with a 24.9 cm distance error and convergence in roughly 30 epochs, underscoring efficient learning performance.
Optimal Stroke Learning with Policy Gradient Approach for Robotic Table Tennis
This paper addresses the inherently complex problem of robotic table tennis, focusing on learning optimal strokes with reinforcement learning (RL). Despite recent advances in applying deep RL in simulated environments, transferring these models to real-world scenarios remains difficult, primarily because of the large amount of exploration they require. The authors tackle this challenge by developing a realistic simulation and using a policy gradient approach with a Twin Delayed DDPG (TD3) backbone to learn racket strokes from predictions of the ball state at impact time.
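To make the stroke-learning setup concrete, the sketch below shows a deterministic actor of the kind TD3 trains, mapping a predicted ball state at impact (position, velocity, spin) to stroke parameters such as racket orientation and velocity at hit time. The state and action dimensions, network sizes, and bounds are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch (PyTorch): a TD3-style deterministic actor mapping a
# predicted ball state at impact to a racket stroke. Dimensions and bounds
# are illustrative assumptions, not the paper's exact configuration.
import torch
import torch.nn as nn

STATE_DIM = 9    # assumed: ball position (3) + velocity (3) + spin (3) at impact
ACTION_DIM = 6   # assumed: racket orientation (3) + racket velocity (3) at hit time

class StrokeActor(nn.Module):
    def __init__(self, state_dim=STATE_DIM, action_dim=ACTION_DIM, max_action=1.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),  # bound raw outputs to [-1, 1]
        )
        self.max_action = max_action

    def forward(self, state):
        # Scale the bounded output to the allowed stroke range.
        return self.max_action * self.net(state)

# Usage: one predicted impact state in, one stroke command out.
actor = StrokeActor()
predicted_impact_state = torch.zeros(1, STATE_DIM)  # placeholder prediction
stroke = actor(predicted_impact_state)
```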
The paper introduces a two-stage RL approach, distinct from end-to-end models, to optimize the robot's strokes. First, the state of the ball at the hitting time, including position, velocity, and spin, is predicted using existing ball-flight prediction algorithms. A novel policy gradient method then determines the optimal stroke from this prediction. The proposed approach integrates realistic simulation frameworks, namely the Gazebo simulator, ROS, OpenAI Gym, and Spinning Up. This integration is central to closing the sim-to-real gap, a prominent obstacle in transferring policies from simulation to real-world applications.
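A minimal sketch of how the two stages could be wired together in a Gym-style loop is given below. The environment name, the ball-flight predictor interface, and the function signatures are hypothetical stand-ins for the paper's Gazebo/ROS setup, shown only to illustrate the prediction-then-stroke structure.

```python
# Sketch of the two-stage decision loop, assuming a Gym-style simulated
# environment (the paper uses Gazebo + ROS; the env object here is a
# hypothetical stand-in) and a pre-existing ball-flight predictor.
import numpy as np

def predict_impact_state(ball_observation):
    """Stage 1 (assumed interface): predict position, velocity, and spin of
    the ball at the planned hitting time from the observed trajectory."""
    # Placeholder: a real predictor would integrate the ball dynamics
    # (gravity, drag, Magnus force) forward to the hitting plane.
    return np.zeros(9)

def select_stroke(policy, impact_state):
    """Stage 2: the learned policy maps the predicted impact state to a stroke."""
    return policy(impact_state)

def play_one_ball(env, policy):
    ball_obs = env.reset()                           # observe the incoming ball
    impact_state = predict_impact_state(ball_obs)    # stage 1: prediction
    stroke = select_stroke(policy, impact_state)     # stage 2: stroke from policy
    _, reward, done, info = env.step(stroke)         # execute stroke in simulation
    return reward
```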
The 3D Q-value function and its corresponding reward vector stand out as substantial contributions: they allow multi-dimensional reward expressions that capture interactions within the robotic environment better than traditional 1D rewards. This design also yields efficient learning, with the model converging after approximately 30 epochs and requiring only 1.5 hours of training time.
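The multi-dimensional reward idea can be illustrated with a critic whose output has one entry per reward component, a per-component TD target, and a weighted sum that scalarizes the Q-vector for the actor update. The three components suggested in the comments (hit success, landing accuracy, effort) and the weights are assumptions for illustration, not the paper's exact reward definition.

```python
# Sketch of a vector-valued critic with a per-component Bellman backup,
# assuming a 3-dimensional reward. Component choice and weights are
# illustrative, not the paper's definition.
import torch
import torch.nn as nn

REWARD_DIM = 3  # assumed components: hit success, landing accuracy, effort

class VectorCritic(nn.Module):
    def __init__(self, state_dim=9, action_dim=6, reward_dim=REWARD_DIM):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, reward_dim),  # one Q estimate per reward component
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def td_target(reward_vec, next_q_vec, done, gamma=0.99):
    # Element-wise Bellman backup: each reward component gets its own target.
    return reward_vec + gamma * (1.0 - done) * next_q_vec

def scalarize(q_vec, weights=torch.tensor([1.0, 1.0, 0.1])):
    # A weighted sum turns the Q-vector into a scalar objective for the actor.
    return (q_vec * weights).sum(dim=-1)
```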
In the experimental evaluation, the proposed method significantly outperforms existing RL algorithms such as TRPO, PPO, and SAC in simulation, reaching a success rate of 98% with a distance error of approximately 24.9 cm. Notably, the retraining procedure used on the real robot also performs well, transferring the policy from simulation to reality within a short retraining period while maintaining a high level of performance across varied testing conditions.
The implications of this research are notable. Practically, it outlines an efficient framework for training robotic systems in tasks requiring rapid response and precision, such as sports or industrial applications involving robotic manipulation and interaction. Theoretically, the integration of realistic simulations into RL workflows offers a valuable paradigm for lowering the barriers to real-world deployment of complex RL models.
Future directions could focus on extending this framework to other robotic sports or enhancing the existing setup to accommodate more complex maneuvers or higher-velocity interactions. Additionally, optimizing the system for more aggressive play could further close the gap between robotic and human performance in table tennis, paving the way for more autonomous and adaptable robotic systems in diverse environments.