Table Tennis Stroke Recognition Using Two-Dimensional Human Pose Estimation (2104.09907v2)

Published 20 Apr 2021 in cs.CV and cs.LG

Abstract: We introduce a novel method for collecting table tennis video data and perform stroke detection and classification. A diverse dataset containing video data of 11 basic strokes obtained from 14 professional table tennis players, summing up to a total of 22111 videos has been collected using the proposed setup. The temporal convolutional neural network model developed using 2D pose estimation performs multiclass classification of these 11 table tennis strokes with a validation accuracy of 99.37%. Moreover, the neural network generalizes well over the data of a player excluded from the training and validation dataset, classifying the fresh strokes with an overall best accuracy of 98.72%. Various model architectures using machine learning and deep learning based approaches have been trained for stroke recognition and their performances have been compared and benchmarked. Inferences such as performance monitoring and stroke comparison of the players using the model have been discussed. Therefore, we are contributing to the development of a computer vision based sports analytics system for the sport of table tennis that focuses on the previously unexploited aspect of the sport i.e., a player's strokes, which is extremely insightful for performance improvement.

Summary

The paper presents a table tennis stroke recognition method combining 2D pose estimation with a temporal convolutional network, built upon a large custom video dataset.
The temporal convolutional network (TCN) achieved 99.37% validation accuracy and generalized well to a player held out of the training and validation data, with a best accuracy of 98.72%.
This research has practical implications for improving sports analytics and player training through objective, data-driven insights into stroke mechanics.

An Evaluation of Two-Dimensional Human Pose Estimation for Table Tennis Stroke Recognition

This paper investigates the use of two-dimensional human pose estimation (2D-HPE) for recognizing and classifying table tennis strokes. Noting that computer vision has seen limited application in table tennis, it targets a largely unexplored area: stroke dynamics and their implications for performance improvement.

The authors began by building a dataset of 22,111 videos covering 11 distinct stroke types performed by 14 professional players. Because no suitable public dataset existed, they designed a video collection setup that interferes minimally with gameplay: a front-view camera mounted in a custom-designed frame captures strokes unobtrusively. The setup also labels the data automatically, using dual-mounted vibration sensors to segment the video in time around each stroke.
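
The sensor-driven segmentation described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the threshold rule, the refractory period, the window lengths, and the function name `segment_strokes` are all assumptions chosen for illustration. The idea is only that a spike in the vibration signal marks a ball impact, and a fixed window of video frames around that moment is cut out as one stroke clip.

```python
from typing import List, Tuple


def segment_strokes(
    vibration: List[float],
    sensor_hz: float,
    video_fps: float,
    threshold: float = 0.5,      # assumed amplitude that counts as an impact
    refractory_s: float = 0.5,   # assumed minimum gap between two impacts
    pre_s: float = 0.6,          # assumed seconds of video kept before impact
    post_s: float = 0.4,         # assumed seconds of video kept after impact
) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame) windows around detected impacts."""
    windows: List[Tuple[int, int]] = []
    last_impact_t = None
    for i, sample in enumerate(vibration):
        t = i / sensor_hz  # timestamp of this sensor sample in seconds
        if abs(sample) < threshold:
            continue
        # Ignore ringing that follows shortly after an accepted impact.
        if last_impact_t is not None and t - last_impact_t < refractory_s:
            continue
        last_impact_t = t
        start = max(0, int(round((t - pre_s) * video_fps)))
        end = int(round((t + post_s) * video_fps))
        windows.append((start, end))
    return windows
```

In practice the sensor and camera clocks would also need to be synchronized, which this sketch assumes has already been done.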

After data collection, the authors apply 2D-HPE to extract motion features from the video frames, focusing on the player's wrist, elbow, and shoulders. Working from joint positions rather than raw pixels reduces sensitivity to differences in player attributes and environmental conditions.
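
As a rough illustration of how joint positions can be turned into player-independent features, the sketch below centers each frame's keypoints on the shoulder midpoint and scales them by shoulder width. This normalization is an assumption, not something the summary attributes to the authors, and the joint names and the helper `normalize_frame` are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Hypothetical joint names; a real pose estimator defines its own keypoint set.
JOINTS = ("l_shoulder", "r_shoulder", "r_elbow", "r_wrist")


def normalize_frame(keypoints: Dict[str, Point]) -> List[float]:
    """Flatten one frame's joints into position- and scale-invariant features."""
    lx, ly = keypoints["l_shoulder"]
    rx, ry = keypoints["r_shoulder"]
    # Origin: midpoint between the shoulders.
    cx, cy = (lx + rx) / 2, (ly + ry) / 2
    # Scale: shoulder width in pixels (fall back to 1.0 if degenerate).
    width = ((lx - rx) ** 2 + (ly - ry) ** 2) ** 0.5 or 1.0
    features: List[float] = []
    for name in JOINTS:
        x, y = keypoints[name]
        features.extend([(x - cx) / width, (y - cy) / width])
    return features
```

With this scheme, the same arm pose seen at a different distance from the camera or at a different place in the frame maps to the same feature vector.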

The core of the paper is a temporal convolutional neural network (TCN) that performs multiclass classification of the 11 strokes, reaching a validation accuracy of 99.37%. The model also generalized to a player excluded from the training and validation data, classifying that player's strokes with an overall best accuracy of 98.72%. This result suggests the model can be applied beyond the players it was trained on, which supports its use in real-world sports analytics.
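
The building block commonly associated with TCNs is a dilated one-dimensional convolution over the time axis. The sketch below shows a causal variant in plain Python to make the operation concrete. The summary does not describe the authors' actual architecture (layer count, kernel size, dilations, or whether the convolutions are causal), so every such choice here is an assumption, and the function names are hypothetical.

```python
from typing import List


def causal_dilated_conv1d(
    x: List[float], kernel: List[float], dilation: int
) -> List[float]:
    """Compute y[t] = sum_k kernel[k] * x[t - k * dilation].

    Samples before t = 0 are treated as zero, so each output depends
    only on the current and earlier time steps.
    """
    out: List[float] = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, dilations: List[int]) -> int:
    """Time steps seen by one output when stacking one conv per dilation."""
    return 1 + (kernel_size - 1) * sum(dilations)
```

Stacking layers with growing dilations widens the temporal context quickly: with a kernel of size 2 and dilations 1, 2, and 4, `receptive_field(2, [1, 2, 4])` gives 8 frames. In a stroke classifier, `x` would be one joint coordinate tracked across the frames of a clip.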

A notable part of the paper is its comparison of machine learning (ML) and deep learning (DL) models. Among the ML approaches, the Support Vector Machine (SVM) with a radial basis function (RBF) kernel achieved the highest accuracy, 98.37%. The TCN performed best overall, with higher accuracy than the ML models and a shorter inference time, indicating that it handles temporal motion data effectively.
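
For readers unfamiliar with the RBF kernel mentioned above, it measures the similarity of two feature vectors as exp(-gamma * ||a - b||^2). The sketch below computes that value directly. The `gamma` default is an arbitrary illustrative choice, and the summary does not report the hyperparameters the authors used.

```python
import math
from typing import Sequence


def rbf_kernel(a: Sequence[float], b: Sequence[float], gamma: float = 1.0) -> float:
    """Return exp(-gamma * squared Euclidean distance between a and b)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)
```

Identical vectors score 1.0 and the score decays toward 0 as the vectors move apart, which lets an SVM draw non-linear decision boundaries between stroke classes in the original feature space.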

The implications of this research are both practical and theoretical. Practically, the proposed system could support stroke analysis and player training by offering objective, data-driven insights into stroke mechanics, which could in turn inform tailored training interventions and precise biomechanical feedback. Theoretically, the research shows the potential of combining human pose estimation with temporal convolutional architectures for action recognition, an approach that extends to other sporting and non-sporting domains.

Future work might include fine-tuning pose estimation techniques to enhance robustness against various lighting and environmental conditions, and expanding the model's application to real-time match play scenarios for comprehensive in-game analytics. There is also the potential for integrating similar methodologies into more complex action recognition systems that include three-dimensional estimations, expanding the horizon for AI-driven sports technology.

Overall, this paper contributes to the growing field of sports analytics, showing how computer vision techniques can help in understanding and improving athletic performance through actionable data.
