
TTNet: Real-time temporal and spatial video analysis of table tennis (2004.09927v1)

Published 21 Apr 2020 in cs.CV

Abstract: We present a neural network TTNet aimed at real-time processing of high-resolution table tennis videos, providing both temporal (events spotting) and spatial (ball detection and semantic segmentation) data. This approach gives core information for reasoning score updates by an auto-referee system. We also publish a multi-task dataset OpenTTGames with videos of table tennis games in 120 fps labeled with events, semantic segmentation masks, and ball coordinates for evaluation of multi-task approaches, primarily oriented on spotting of quick events and small objects tracking. TTNet demonstrated 97.0% accuracy in game events spotting along with 2 pixels RMSE in ball detection with 97.5% accuracy on the test part of the presented dataset. The proposed network allows the processing of downscaled full HD videos with inference time below 6 ms per input tensor on a machine with a single consumer-grade GPU. Thus, we are contributing to the development of real-time multi-task deep learning applications and presenting approach, which is potentially capable of substituting manual data collection by sports scouts, providing support for referees' decision-making, and gathering extra information about the game process.

Authors (3)
  1. Roman Voeikov (1 paper)
  2. Nikolay Falaleev (3 papers)
  3. Ruslan Baikulov (3 papers)
Citations (67)

Summary

  • The paper presents TTNet, a novel neural network for real-time multi-task analysis of high-resolution table tennis videos, integrating event spotting, precise ball detection, and semantic segmentation.
  • TTNet reaches 97.5% accuracy with 2 pixel RMSE in multi-stage ball detection and 97.0% accuracy in spotting events such as bounces and net hits, with inference times below 6 ms on a single consumer-grade GPU.
  • The research includes the public release of OpenTTGames, a specialized dataset with annotated high frame rate table tennis videos to support community development in sports video analysis.

Real-time Video Analysis for Table Tennis with TTNet

The paper presents TTNet, a neural network architecture for analyzing high-resolution table tennis videos in real time. TTNet combines three tasks that are usually handled separately in sports video analysis: temporal event spotting, precise spatial object detection, and semantic segmentation, all within a real-time processing budget. The paper's main contributions are this multi-task architecture and the public release of a specialized dataset, OpenTTGames.

Methodological Overview

TTNet focuses on three primary tasks: event spotting, ball detection, and semantic segmentation. The architecture operates on downscaled full HD video input and is designed to run in real time on consumer-grade hardware (a single NVIDIA RTX 2080Ti).

The network employs a multi-stage approach to ball detection that combines global and local feature analysis. The global detector processes downscaled images to locate the ball coarsely, with enough resolution to identify a region of interest. Crops of that region taken from the full-resolution images are then fed into a second detection stage for precise localization. On the test set, ball detection reaches 97.5% accuracy with an RMSE of 2 pixels.
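The coordinate bookkeeping behind such a coarse-to-fine scheme can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the 320x128 coarse and crop sizes, and the clamping rule are all assumptions chosen for the example.

```python
def crop_window(coarse_xy, coarse_size, full_size, crop_size):
    """Map a coarse ball estimate to the top-left corner of a full-resolution crop.

    All sizes are (width, height). The names and sizes are hypothetical.
    """
    cx, cy = coarse_xy
    cw, ch = coarse_size
    fw, fh = full_size
    kw, kh = crop_size
    # Scale the coarse estimate up to full-resolution pixel coordinates.
    fx = cx * fw / cw
    fy = cy * fh / ch
    # Centre the crop on the estimate, then clamp it so it stays inside the frame.
    left = min(max(int(round(fx - kw / 2)), 0), fw - kw)
    top = min(max(int(round(fy - kh / 2)), 0), fh - kh)
    return left, top


def to_full_coords(local_xy, crop_origin):
    """Convert a position found inside the crop back to full-frame coordinates."""
    lx, ly = local_xy
    left, top = crop_origin
    return left + lx, top + ly


# Assumed example: a 320x128 downscaled frame and a 1920x1080 original.
origin = crop_window((160, 64), (320, 128), (1920, 1080), (320, 128))
print(origin)                                 # (800, 476)
print(to_full_coords((158.5, 66.0), origin))  # (958.5, 542.0)
```

The second stage only ever sees a small crop, so its cost does not grow with the size of the original frame. The final position is recovered by adding the crop origin back to the local estimate.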

TTNet also performs event spotting, identifying rapid game actions such as ball bounces and net hits with 97.0% accuracy. This capability is crucial for automated referee systems, which must track the score and game state accurately.

Semantic segmentation, handling multiple classes (humans, table, and scoreboard), underpins the spatial understanding required to discern critical interactions in the game environment. The segmentation approach, supported by convolutional encoder-decoder structures, achieves competitive intersection-over-union (IoU) results.
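For reference, IoU for a single class can be computed from a predicted and a ground-truth binary mask as in the sketch below. The nested-list mask representation and the choice to return 1.0 when both masks are empty are assumptions made for this illustration. The paper's own evaluation code is not reproduced here.

```python
def iou(pred, target):
    """Intersection-over-union of two binary masks (equally sized nested lists of 0/1)."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                inter += 1  # pixel is set in both masks
            if p or t:
                union += 1  # pixel is set in at least one mask
    # Convention assumed here: two empty masks count as a perfect match.
    return inter / union if union else 1.0


pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
print(iou(pred, target))  # 0.5
```

For a multi-class setting such as humans, table, and scoreboard, the same function would be applied to one mask per class and the results averaged.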

Dataset Description

Acknowledging the scarcity of public datasets suitable for multi-task sports video analysis, the authors introduce OpenTTGames, a dataset of table tennis game videos recorded at 120 fps and annotated with events, semantic segmentation masks, and ball coordinates. The dataset supports the development and evaluation of multi-task models, particularly for spotting quick events and tracking small objects.

Evaluation and Results

TTNet is evaluated with standard metrics: accuracy for ball presence detection, RMSE for ball position, and IoU for segmentation maps. The adaptive loss balancing used in training, a technique based on homoscedastic uncertainty, is reported to be effective in balancing learning across the tasks. Inference takes less than 6 ms per input tensor, which makes the architecture practical for real-time use.
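One common formulation of uncertainty-based loss balancing gives each task a learnable log-variance s_i and minimizes the sum of exp(-s_i) * L_i + s_i over tasks. The sketch below shows only that forward computation in plain Python. It assumes this simplified form, and the exact variant used in the paper is not stated in this summary. The function name and the example loss values are hypothetical.

```python
import math


def uncertainty_weighted_loss(task_losses, log_vars):
    """Combine per-task losses using per-task log-variances (assumed formulation).

    Each task contributes exp(-s) * loss + s, so a larger s down-weights the
    task's loss while the added s term penalizes inflating s without bound.
    """
    total = 0.0
    for loss, s in zip(task_losses, log_vars):
        total += math.exp(-s) * loss + s
    return total


# Hypothetical losses for ball detection, event spotting, and segmentation.
losses = [0.8, 0.2, 0.5]

# With all log-variances at zero, the result is the plain sum of the losses.
print(uncertainty_weighted_loss(losses, [0.0, 0.0, 0.0]))

# Raising the first task's log-variance reduces that task's weight.
print(uncertainty_weighted_loss(losses, [1.0, 0.0, 0.0]))
```

In training, the log-variances would be parameters updated by the optimizer together with the network weights, so the relative task weights adapt rather than being tuned by hand.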

The paper also identifies further applications in sports analytics, including replacing manual data collection by sports scouts and supporting referees' decision-making. TTNet's multi-task design lays the groundwork for broader game analysis frameworks that could extend to other sports requiring real-time analytics.

Future Prospects

The implications of TTNet extend beyond automation: the framework enables precise, scalable, and rapid analysis for increasingly data-driven sports environments. Future research may extend the approach to other sports, improve detection of even smaller objects, or expand real-time multi-task processing capabilities.

In conclusion, TTNet represents a significant advance in sports video analysis, offering a robust model for the automated processing of high-resolution, high-frame-rate table tennis video data. The release of the OpenTTGames dataset further supports the community's continual development in sports analytics.
