- The paper presents a table tennis stroke recognition method combining 2D pose estimation with a temporal convolutional network, built upon a large custom video dataset.
- The temporal convolutional network (TCN) achieved 99.37% validation accuracy and generalized well to data from a player excluded from training.
- This research has practical implications for improving sports analytics and player training through objective, data-driven insights into stroke mechanics.
An Evaluation of Two-Dimensional Human Pose Estimation for Table Tennis Stroke Recognition
This paper investigates two-dimensional human pose estimation (2D-HPE) for recognizing and classifying table tennis strokes. Noting that computer vision has seen limited application in table tennis, the authors focus on a comparatively underexplored area: stroke dynamics and their implications for performance enhancement.
The authors began by building a dataset of 22,111 videos covering 11 distinct stroke types performed by 14 professional players. Because no suitable public dataset existed, they developed a video collection setup designed to interfere minimally with gameplay: a front-view camera mounted in a custom-designed frame captures strokes unobtrusively. Labeling was automated, with dual-mounted vibration sensors providing the temporal segmentation needed to isolate each stroke in the video.
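To make the sensor-driven segmentation concrete, here is a minimal sketch of how threshold crossings in a vibration signal could be turned into video frame windows. It is not the authors' pipeline: the function names, the threshold, the refractory period, the sampling rates, and the pre/post offsets are all hypothetical values chosen for illustration.

```python
from typing import List, Tuple


def detect_impacts(signal: List[float], threshold: float, refractory: int) -> List[int]:
    """Return sample indices where |signal| reaches the threshold.

    Samples that stay above the threshold, or re-cross it within
    `refractory` samples of the previous crossing, count as the same event.
    All parameters are illustrative assumptions.
    """
    impacts: List[int] = []
    last_active = -refractory - 1  # index of the most recent above-threshold sample
    for i, value in enumerate(signal):
        if abs(value) >= threshold:
            if i - last_active > refractory:
                impacts.append(i)
            last_active = i
    return impacts


def clip_windows(
    impacts: List[int],
    sensor_rate_hz: float,
    fps: float,
    pre_s: float,
    post_s: float,
    n_frames: int,
) -> List[Tuple[int, int]]:
    """Map each impact to a [start, end) frame window around its timestamp."""
    windows = []
    for idx in impacts:
        t = idx / sensor_rate_hz  # impact time in seconds
        start = max(0, round((t - pre_s) * fps))
        end = min(n_frames, round((t + post_s) * fps))
        windows.append((start, end))
    return windows


# Toy signal sampled at a hypothetical 10 Hz with two bursts of vibration.
toy_signal = [0, 0, 0.9, 0.8, 0, 0, 0, 0, 0, 0, 1.2, 0]
toy_impacts = detect_impacts(toy_signal, threshold=0.5, refractory=3)
toy_windows = clip_windows(toy_impacts, sensor_rate_hz=10, fps=30,
                           pre_s=0.5, post_s=0.2, n_frames=40)
```

In a real setup the two sensors would also indicate which side of the table was involved, and the window offsets would need tuning against the actual stroke timing.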
After data collection, the authors apply 2D-HPE to extract motion features from the video frames, focusing on the player's wrist, elbow, and shoulders. Representing each stroke by these joint positions is intended to sidestep the challenges posed by diverse player attributes and environmental conditions.
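The summary does not say how the keypoints were preprocessed. As one plausible illustration of how joint coordinates can be made less dependent on where a player stands and how large they appear in the frame, the sketch below centers each frame on the shoulder midpoint and scales by shoulder width. The joint names and the normalization scheme are assumptions, not the authors' method.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Hypothetical joint names; a real pose estimator defines its own keypoint set.
JOINTS = ("left_shoulder", "right_shoulder", "elbow", "wrist")


def normalize_frame(keypoints: Dict[str, Point]) -> List[float]:
    """Center on the shoulder midpoint and scale by shoulder width."""
    ls, rs = keypoints["left_shoulder"], keypoints["right_shoulder"]
    cx, cy = (ls[0] + rs[0]) / 2, (ls[1] + rs[1]) / 2
    # Fall back to 1.0 if the shoulders coincide, to avoid division by zero.
    width = math.hypot(rs[0] - ls[0], rs[1] - ls[1]) or 1.0
    features: List[float] = []
    for name in JOINTS:
        x, y = keypoints[name]
        features.extend(((x - cx) / width, (y - cy) / width))
    return features


def stroke_features(frames: List[Dict[str, Point]]) -> List[List[float]]:
    """Turn a sequence of per-frame keypoints into a (time x features) matrix."""
    return [normalize_frame(frame) for frame in frames]


# The same pose seen at a different position and twice the scale
# should produce the same normalized features.
pose = {"left_shoulder": (100.0, 200.0), "right_shoulder": (200.0, 200.0),
        "elbow": (250.0, 250.0), "wrist": (300.0, 200.0)}
pose_scaled = {k: (2 * x + 7, 2 * y - 3) for k, (x, y) in pose.items()}
feats = normalize_frame(pose)
feats_scaled = normalize_frame(pose_scaled)
```

The resulting per-frame vectors can be stacked over time and passed to a sequence classifier.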
The core of the paper is a temporal convolutional network (TCN) for multiclass classification of the collected strokes, which achieved a validation accuracy of 99.37%. The TCN also generalized well, accurately classifying strokes from a player whose data was excluded from training. This suggests the model could be applied beyond the original group of players, which supports its use in real-world sports analytics.
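The summary does not specify the TCN architecture. For readers unfamiliar with the idea, the sketch below shows the operation many TCN variants are built on, a causal dilated 1D convolution, written in plain Python for a single channel. The function names are illustrative, and whether the authors' network is causal or uses these dilations is an assumption.

```python
from typing import List


def causal_dilated_conv(x: List[float], weights: List[float], dilation: int) -> List[float]:
    """Compute y[t] = sum_k weights[k] * x[t - k * dilation].

    Inputs before t = 0 are treated as zero, so y[t] never depends on
    future samples (the "causal" property).
    """
    out: List[float] = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, dilations: List[int]) -> int:
    """Time steps seen by one output of a stack with one conv layer per dilation."""
    return 1 + (kernel_size - 1) * sum(dilations)


# With weights [1, 1] and dilation 2, each output is x[t] + x[t - 2].
y = causal_dilated_conv([1, 2, 3, 4, 5], weights=[1, 1], dilation=2)

# Doubling the dilation at each layer grows the receptive field quickly.
rf = receptive_field(kernel_size=3, dilations=[1, 2, 4, 8])
```

Stacking such layers with increasing dilation lets a small network cover an entire stroke's worth of frames, which is one reason convolutional models can be efficient on short motion sequences.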
The paper also compares the performance of several machine learning (ML) and deep learning (DL) models. Among the ML approaches, a Support Vector Machine (SVM) with a radial basis function (RBF) kernel achieved the highest accuracy, 98.37%. Among the DL models, the TCN surpassed the ML models in accuracy and also offered reduced inference time, indicating that it handles temporal motion data effectively.
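For reference, the RBF kernel used by such an SVM measures similarity between two feature vectors as exp(-gamma * ||a - b||^2). The snippet below is a minimal sketch of that formula only; the `gamma` value is an arbitrary assumption, and the summary does not report the hyperparameters the authors used.

```python
import math
from typing import Sequence


def rbf_kernel(a: Sequence[float], b: Sequence[float], gamma: float) -> float:
    """RBF similarity: 1.0 for identical vectors, decaying toward 0 with distance."""
    squared_distance = sum((u - v) ** 2 for u, v in zip(a, b))
    return math.exp(-gamma * squared_distance)


same = rbf_kernel([0.0, 0.0], [0.0, 0.0], gamma=0.5)
apart = rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5)
```

Because the kernel compares fixed-length vectors, a stroke's frame sequence would have to be flattened or summarized before an SVM could use it, whereas a TCN consumes the sequence directly.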
The research has both practical and theoretical implications. Practically, the proposed system could improve stroke analysis and player training by offering objective, data-driven insights into stroke mechanics, which in turn could support tailored training interventions based on precise biomechanical feedback. Theoretically, the research shows the potential of combining human pose estimation with temporal convolutional architectures for action recognition, an approach that could extend to other sporting and non-sporting domains.
Future work might include fine-tuning the pose estimation to improve robustness to varied lighting and environmental conditions, and extending the model to real-time match play for in-game analytics. Similar methods could also be integrated into more complex action recognition systems that use three-dimensional pose estimation.
Overall, the paper contributes to sports analytics by showing how computer vision techniques can produce actionable data for understanding and improving athletic performance.