TTHQ Dataset for Table Tennis Analysis
- TTHQ Dataset is a curated benchmark comprising high-resolution, professionally annotated table tennis videos with precise 2D ball positions and table keypoints.
- The dataset supports robust 2D detection, 3D trajectory uplifting, and spin classification via a multi-stage pipeline using real match and highlight reel recordings.
- It presents real-world challenges like occlusion, motion blur, and lighting variations, driving innovative research in sports video analysis and computer vision.
TTHQ Dataset
The TTHQ dataset, introduced in "Uplifting Table Tennis: A Robust, Real-World Application for 3D Trajectory and Spin Estimation," is a curated benchmark tailored to monocular table tennis analysis. It features high-resolution, professionally annotated recordings from real matches and highlight reels. These annotations support supervised learning for 2D object localization, weakly supervised spin classification, and benchmarking of 3D trajectory and spin estimation via a two-stage uplifting pipeline. The dataset is constructed specifically to close the gap in real-world, physically consistent 2D and 3D ground truth for table tennis research in computer vision, perception, and real-time sports analytics (Kienzle et al., 25 Nov 2025).
1. Source Material and Dataset Scope
TTHQ comprises 19 source videos, encompassing 14 full matches (drawn from professional, semi-professional, and amateur games) and 5 highlight reels constructed by concatenating rallies from various matches. All videos are standard broadcast or semi-broadcast YouTube content, captured at 1920 × 1080 resolution with a static camera per video. Frame rates range from 25 to 60 FPS, with precise values recorded in the dataset manifest.
Across the corpus, 9,092 frames are manually annotated with 2D ball positions. A further 257 frames are annotated with table geometry, capturing 13 carefully defined keypoints: the four corners, the four side midpoints, and five net-related coordinates. Additionally, 57 distinct rallies (complete exchanges of serving and returning, bounded by the serve and the end of the point) are labeled with a binary spin type: topspin or backspin.
2. Annotation Protocol and Formats
Annotations in TTHQ are precise and reproducible, with the following conventions:
- For each annotated frame $t$, the table tennis ball center is recorded as $(x_t, y_t)$, in pixel coordinates on the canonical 1920 × 1080 canvas.
- For each table-annotated frame, the 13 keypoints (enumerated as $k = 1, \dots, 13$) are stored as $(x_k, y_k)$.
- Spin labels are assigned per rally, $s \in \{\text{topspin}, \text{backspin}\}$, based on manual inspection of the ball's rotation relative to its flight direction.
- Temporal data is provided as both integer frame index and continuous timestamp in seconds; metadata CSVs associate these indices, positions, and keypoint visibility for rapid lookup.
Annotation files are distributed in CSV and JSON: each video includes a manifest with indexing and timestamps, ball and keypoint coordinates per frame, and rally-level spin annotations.
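A minimal loading sketch is given below; the paths, JSON keys, and CSV columns are hypothetical stand-ins, as the released files define the actual layout:

```python
import json
import pandas as pd

# Hypothetical paths and column names; the released annotation files define
# the actual layout (one manifest plus per-frame CSVs per video).
manifest_path = "match_01/manifest.json"
ball_csv_path = "match_01/ball_annotations.csv"

# Manifest: frame indices, timestamps, and annotation-presence flags.
with open(manifest_path) as f:
    manifest = json.load(f)
frames = pd.DataFrame(manifest["frames"])  # assumed keys: frame, timestamp_s

# Ball annotations: pixel coordinates on the canonical 1920x1080 canvas.
balls = pd.read_csv(ball_csv_path)  # assumed columns: frame, x, y

# Attach timestamps for time-based lookup of each annotated ball position.
balls = balls.merge(frames, on="frame", how="left")
```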
3. Data Collection, Quality Assurance, and Inter-Annotator Agreement
All source videos are obtained from public YouTube channels under research-oriented fair use guidelines and are downloaded in their original resolution and frame rate. Manual annotation is performed with a custom heatmap-based tool by trained human annotators. Spin classification is executed via visual inspection of each rally's spatial and rotational dynamics.
Quality control implements a dual-annotator scheme on a 10% randomly selected subset (900 frames). For these, both annotators independently label all ball and keypoint locations, and the mean Euclidean error between the two annotation sets, $\bar{e} = \frac{1}{N} \sum_{i=1}^{N} \lVert \mathbf{p}_i^{(A)} - \mathbf{p}_i^{(B)} \rVert_2$, is computed separately for ball centers and table keypoints. Any instance with disagreement exceeding 5 px triggers arbitration and correction by a senior annotator.
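This agreement check is simple to reproduce. The sketch below assumes two aligned $(N, 2)$ coordinate arrays, one per annotator, and applies the 5 px arbitration threshold from the protocol; it is an illustration, not the authors' tooling:

```python
import numpy as np

def mean_euclidean_error(pts_a: np.ndarray, pts_b: np.ndarray) -> float:
    """Mean L2 distance between two annotators' point sets, each shaped (N, 2)."""
    return float(np.linalg.norm(pts_a - pts_b, axis=1).mean())

def needs_arbitration(pts_a: np.ndarray, pts_b: np.ndarray,
                      threshold_px: float = 5.0) -> np.ndarray:
    """Boolean mask of annotations whose disagreement exceeds the threshold."""
    return np.linalg.norm(pts_a - pts_b, axis=1) > threshold_px
```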
4. Dataset Splits and Real-World Challenges
Splitting is performed at the video level to minimize information leakage. The training set comprises 16 videos (7,600 ball-labeled frames, 214 keypoint frames, 47 spin-labeled rallies), while the validation/test set is composed of 3 videos (1,492 ball frames, 43 keypoint frames, 10 spin-labeled rallies).
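A leakage-free split of this kind can be reproduced by grouping frames by source video; the following scikit-learn sketch uses a toy table with hypothetical column names:

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Toy frame table; in TTHQ every annotated frame carries its source video ID.
frames = pd.DataFrame({
    "video_id": ["v01", "v01", "v02", "v02", "v03", "v03"],
    "frame":    [0, 1, 0, 1, 0, 1],
})

# Grouping by video_id guarantees that no video contributes frames to both
# splits, mirroring TTHQ's video-level separation.
splitter = GroupShuffleSplit(n_splits=1, test_size=1 / 3, random_state=0)
train_idx, test_idx = next(splitter.split(frames, groups=frames["video_id"]))
train, test = frames.iloc[train_idx], frames.iloc[test_idx]
```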
A salient attribute of TTHQ is the rigorous inclusion and documentation of challenging scenarios:
- 25% of annotated frames exhibit ball occlusion (e.g., by a player's hand or racquet),
- 15% suffer from severe motion blur,
- 10% are collected in suboptimal lighting (dark or over-exposed),
- Highlight reels are specifically selected to maximize the diversity of such artifact-laden conditions.
5. Benchmarked Tasks and Evaluation Metrics
TTHQ is designed to operationalize the canonical structure of monocular table tennis analysis pipelines:
- Front-End (2D Detection):
- Ball detection: predict the ball center $(x_t, y_t)$ per frame.
- Table keypoint detection: estimate all 13 keypoints $(x_k, y_k)$.
- Metric: accuracy at threshold $\tau$, the fraction of predictions within $\tau$ pixels of ground truth, reported at thresholds of 2 px, 5 px, and 10 px (see the metric sketch after this list).
- Back-End (2D-to-3D Uplifting):
- Inputs: filtered 2D ball trajectory and a single table-geometry keypoint set.
- Outputs: 3D trajectory $\{\mathbf{X}_t\}$ in real-world coordinates and the initial spin vector $\boldsymbol{\omega}_0$.
- Reprojection metric (mean 2D reprojection error): $e_{\text{rep}} = \frac{1}{T} \sum_{t=1}^{T} \lVert \pi(\mathbf{P}\,\tilde{\mathbf{X}}_t) - \mathbf{x}_t \rVert_2$, where $\mathbf{P}$ is the camera projection matrix, $\tilde{\mathbf{X}}_t$ the estimated 3D ball position in homogeneous coordinates, $\pi$ the perspective division, and $\mathbf{x}_t$ the observed 2D ball position. The final reported score is the mean of $e_{\text{rep}}$ over all test rallies.
- Spin Classification:
- Metrics: accuracy (ACC) and macro-averaged F1 score ($F_1^{\text{macro}}$).
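The sketch below renders all three metric families in plain NumPy/scikit-learn under the notation above; it is an illustrative reimplementation, not the authors' evaluation code:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def threshold_accuracy(pred: np.ndarray, gt: np.ndarray, tau: float) -> float:
    """Fraction of 2D predictions (N, 2) within tau pixels of ground truth."""
    return float((np.linalg.norm(pred - gt, axis=1) <= tau).mean())

def reprojection_error(P: np.ndarray, X_world: np.ndarray, x_obs: np.ndarray) -> float:
    """Mean 2D reprojection error of an estimated 3D trajectory.

    P: (3, 4) camera projection matrix.
    X_world: (T, 3) estimated 3D ball positions.
    x_obs: (T, 2) observed 2D ball positions.
    """
    X_h = np.hstack([X_world, np.ones((len(X_world), 1))])  # homogeneous coords
    proj = (P @ X_h.T).T
    proj = proj[:, :2] / proj[:, 2:3]                        # perspective division
    return float(np.linalg.norm(proj - x_obs, axis=1).mean())

# Spin classification metrics over rally-level labels (e.g., 0=backspin, 1=topspin).
def spin_metrics(y_true, y_pred):
    return accuracy_score(y_true, y_pred), f1_score(y_true, y_pred, average="macro")
```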
6. Access, Organization, and Camera Calibration
The dataset, including code and formatted annotations, is publicly available at https://kiedani.github.io/WACV2026/. Organization is as follows:
| Type | File Format | Contents |
|---|---|---|
| Frame/video manifest | JSON | Frame indices, timestamps, annotation presence |
| Ball/keypoints | CSV | Columns: frame index, $(x, y)$ positions |
| Spin labels | CSV | Rally ID, spin class (topspin/backspin) |
Camera calibration is reproducible for each test sequence: a homography or projection matrix can be constructed from the 13 annotated keypoints and the known metric dimensions of a standard table tennis table (2.74 m × 1.525 m playing surface, 15.25 cm net height).
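As an illustration, a projection matrix can be recovered with OpenCV from four table-corner correspondences; in the sketch below the corner ordering, pixel coordinates, and intrinsic guess are all hypothetical placeholders:

```python
import numpy as np
import cv2

# Metric model of the table surface (ITTF: 2.74 m x 1.525 m), z = 0 on the plane.
# The corner ordering here is hypothetical; the dataset defines the real one.
object_pts = np.array([
    [0.0,  0.0,   0.0],
    [2.74, 0.0,   0.0],
    [2.74, 1.525, 0.0],
    [0.0,  1.525, 0.0],
], dtype=np.float64)

# Placeholder pixel coordinates; substitute the annotated corner keypoints.
image_pts = np.array([
    [620.0,  700.0],
    [1310.0, 705.0],
    [1460.0, 540.0],
    [500.0,  545.0],
], dtype=np.float64)

# Rough pinhole intrinsics for the 1920x1080 canvas (focal length ~ image width).
h, w = 1080, 1920
K = np.array([[w, 0.0, w / 2], [0.0, w, h / 2], [0.0, 0.0, 1.0]])

# Recover the camera pose, then assemble the 3x4 projection matrix P = K [R | t],
# which can drive the reprojection metric from Section 5.
ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, K, None)
R, _ = cv2.Rodrigues(rvec)
P = K @ np.hstack([R, tvec])
```

For a pure plane-to-image mapping, `cv2.findHomography` on the same correspondences (with the $z = 0$ coordinate dropped) yields the table-plane homography instead.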
7. Applications and Research Use
TTHQ enables comprehensive benchmarking and research into:
- Robust 2D detection models trained on data with real-world artefacts,
- Physically consistent monocular 2D-to-3D ball trajectory uplifting,
- Automated spin estimation and classification,
- Comparative assessment of perception and reasoning modules under visibility, lighting, and motion degradation.
The dataset's diversity and precision position it as a standard for evaluating multi-stage perception systems in sports video analysis and for developing robust, real-world table tennis analytics (Kienzle et al., 25 Nov 2025).