THETIS Dataset: Multimodal Tennis Analysis

Updated 11 October 2025

The THETIS dataset is a multimodal resource offering controlled tennis stroke recordings in both RGB video and 3D skeleton formats for advanced deep learning and biomechanical analysis.
It employs rigorous preprocessing steps, including resizing, cropping, and data augmentation, to standardize 1,980 video sequences and 8,374 3D skeleton recordings across 12 shot classes.
Used for action classification with models such as SlowFast networks and CNN-LSTM hybrids, THETIS has supported reported accuracies of up to 79.17% and provides a basis for explainable AI insights in athletic coaching.

The THETIS dataset is a specialized multimodal resource that provides controlled, high-quality recordings of tennis stroke actions, supporting both deep learning-based classification and biomechanical analysis. It comprises thousands of RGB video sequences and corresponding 3D skeleton data, capturing 12 defined tennis shot classes from players of varying skill levels. As an academic benchmark, THETIS is instrumental in evaluating state-of-the-art action recognition models and in advancing explainable AI methodologies linking kinematic features to actionable coaching feedback.

1. Dataset Composition and Acquisition Protocols

THETIS includes two principal modalities:

RGB Video Sequences: The canonical set consists of 1,980 videos featuring 55 players (beginner to intermediate), with each of 12 shot classes represented by 165 samples. All actions were performed without a tennis ball and in two controlled environments (changing room, basketball court) to introduce moderate background variation. Every video is standardized by resizing the shortest side to 256 pixels, followed by a 224×224 crop and temporal normalization to 64 frames per clip.
3D Skeleton Data: In expanded configurations comprising 8,374 sequences, THETIS incorporates temporally resolved 3D joint coordinates from both novice and expert players, enabling extraction of biomechanical descriptors. These skeleton representations encode sufficient detail for joint angle calculation, kinematic chain analysis, and power estimation via motion-derived features.

Classes are equally distributed, and the controlled setting expedites model benchmarking and transfer learning, albeit at the cost of reduced ecological validity due to the absence of ball interactions and authentic court spatial constraints.

2. Preprocessing and Data Augmentation

Preprocessing is strictly defined: each video is resized as described above and cropped spatially, with frame sampling or truncation to ensure temporal consistency. Data augmentation is reported as being employed but not enumerated; typical augmentation procedures in this context include stochastic cropping, horizontal flipping, and intensity jittering. The data are split 70% train, 20% validation, and 10% test, and regularization strategies (dropout, early stopping, and weight decay) are applied to mitigate overfitting given the relatively small sample size.
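The temporal and spatial normalization described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the clip is already decoded into a NumPy array and already resized so its shortest side is 256 pixels, and it assumes evenly spaced frame sampling and a center crop (the source specifies neither choice). The function names are hypothetical.

```python
import numpy as np


def sample_frame_indices(num_frames: int, target: int = 64) -> np.ndarray:
    """Return `target` evenly spaced frame indices.

    Short clips repeat frames; long clips skip frames (assumed strategy).
    """
    if num_frames <= 0:
        raise ValueError("clip has no frames")
    return np.linspace(0, num_frames - 1, target).round().astype(int)


def center_crop(clip: np.ndarray, size: int = 224) -> np.ndarray:
    """Crop a (T, H, W, C) clip to size x size around the frame center."""
    _, height, width, _ = clip.shape
    top = (height - size) // 2
    left = (width - size) // 2
    return clip[:, top:top + size, left:left + size, :]


def normalize_clip(clip: np.ndarray, target_frames: int = 64, crop: int = 224) -> np.ndarray:
    """Fix the clip to `target_frames` frames and a crop x crop spatial size."""
    indices = sample_frame_indices(clip.shape[0], target_frames)
    return center_crop(clip[indices], crop)


# Example: a 90-frame clip whose shortest side is already 256 pixels.
example_clip = np.zeros((90, 256, 340, 3), dtype=np.uint8)
normalized = normalize_clip(example_clip)
```

A random crop at training time would be a natural substitute for `center_crop` if stochastic cropping is used as augmentation.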

3. Utilization in Action Classification

THETIS underpins various deep learning architectures, notably:

SlowFast Networks: As employed and documented, SlowFast is a dual-pathway convolutional model. The "Slow" stream samples frames with stride τ = 16 to capture static spatial semantics, while the "Fast" stream processes 8× denser temporal samples (α = 8) with a channel ratio β = 1/8, isolating rapid motion cues. Both pathways are built atop inflated ResNet blocks; feature maps from the Fast stream are transformed (3D convolution kernel 1×1×5) and fused via lateral connections into the Slow stream. Final classification is produced after global average pooling and feature concatenation.
CNN-LSTM Hybrid Models: The dataset’s 3D skeleton modality enables sequence modeling via CNN-LSTM architectures. EfficientNet-B0 is utilized for frame-wise spatial feature extraction (1280D), with temporal dependencies captured by a two-layer LSTM (512 units, dropout 0.4). Output hidden states feed a linear classifier mapping to shot labels.

These architectures, tested on THETIS, establish performance baselines: SlowFast 4×16 achieves 74% test accuracy using ensemble evaluation strategies; the CNN-LSTM approach reaches 79.17%, demonstrating improved fine-grained stroke discrimination when leveraging skeleton-based descriptors.
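To make the SlowFast sampling parameters concrete, the sketch below computes which frames of a 64-frame clip each pathway would see for τ = 16 and α = 8. It is an illustration under stated assumptions: sampling starts at frame 0, and the base channel width of 64 is a hypothetical value chosen only to show the effect of β.

```python
import numpy as np


def slowfast_indices(num_frames: int = 64, tau: int = 16, alpha: int = 8):
    """Return (slow, fast) frame indices for a clip of `num_frames` frames.

    The Slow pathway uses temporal stride tau; the Fast pathway samples
    alpha times more densely, i.e. with stride tau // alpha.
    """
    fast_stride = tau // alpha
    slow = np.arange(0, num_frames, tau)
    fast = np.arange(0, num_frames, fast_stride)
    return slow, fast


slow_idx, fast_idx = slowfast_indices()

# Channel ratio beta = 1/8: the Fast pathway is much thinner than the Slow one.
slow_channels = 64                      # hypothetical base width
fast_channels = int(slow_channels / 8)  # beta * slow_channels
```

With these settings the Slow pathway receives 4 frames at stride 16, which matches the "4×16" naming of the reported configuration, while the Fast pathway receives 32 frames.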

4. Biomechanical Feature Extraction and Feedback Generation

THETIS’s 3D data supports comprehensive kinematic and kinetic analyses:

Joint Angles: Computed via three-point methods, notably $\theta = \cos^{-1} \left( \frac{u \cdot v}{\|u\|\|v\|} \right)$ for angle estimation between bone vectors.
Limb Velocities: Approximated with finite differences, $v(t) = \frac{p(t+1) - p(t-1)}{2\Delta t}$, affording temporal profiling of stroke segments.
Kinetic Chain Patterns: Assessed by segmental sequencing (proximal-distal activation), peak angular velocities, and moment-specific metrics (e.g., trunk rotation via $\theta(t) = \arctan2(\text{shoulder\_vector}_y, \text{shoulder\_vector}_x)$).
Power Metrics: Racket velocities and inferred kinetic energy ($KE = \frac{1}{2} m v^2$) are extracted for impact and injury risk analysis.
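The four formulas above translate directly into a few NumPy functions. The sketch below is illustrative only: the function names are hypothetical, joints are assumed to be 3D points in consistent units, and the trunk rotation assumes that the x and y axes span the horizontal plane (the source does not state the axis convention).

```python
import numpy as np


def joint_angle(a, b, c) -> float:
    """Angle at joint b (radians) between bone vectors b->a and b->c."""
    a, b, c = (np.asarray(p, dtype=float) for p in (a, b, c))
    u, v = a - b, c - b
    cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clip to guard against rounding slightly outside [-1, 1].
    return float(np.arccos(np.clip(cos_theta, -1.0, 1.0)))


def central_velocity(positions, dt: float) -> np.ndarray:
    """Central-difference velocity for a (T, 3) trajectory; returns (T-2, 3)."""
    p = np.asarray(positions, dtype=float)
    return (p[2:] - p[:-2]) / (2.0 * dt)


def trunk_rotation(left_shoulder, right_shoulder):
    """Shoulder-line orientation via arctan2(y, x) of the shoulder vector."""
    vec = np.asarray(right_shoulder, dtype=float) - np.asarray(left_shoulder, dtype=float)
    return np.arctan2(vec[..., 1], vec[..., 0])


def kinetic_energy(mass: float, velocity):
    """KE = 0.5 * m * |v|^2 for one velocity vector or an array of them."""
    speed_sq = np.sum(np.asarray(velocity, dtype=float) ** 2, axis=-1)
    return 0.5 * mass * speed_sq
```

Note that the central difference yields no value for the first and last frame, so velocity series are two samples shorter than the position series.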

These biomechanical features are compiled into structured outputs enabling deterministic, rule-based comparison against expert-defined reference ranges. Downstream, LLMs (e.g., GPT-4) synthesize diagnostic and corrective feedback in a coach-mimetic format, such as overall scores, condensed diagnostic summaries, and actionable technique corrections. This process ensures that machine-derived insights are both interpretable and operationally relevant in training contexts.
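A deterministic, rule-based comparison of this kind might look like the sketch below. Everything specific here is an assumption made for illustration: the feature names, the numeric reference ranges, and the report format are hypothetical and are not taken from the cited work.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    low: float
    high: float


# HYPOTHETICAL reference ranges, for illustration only (not from the source).
REFERENCE = {
    "elbow_angle_deg": Range(140.0, 170.0),
    "peak_racket_speed_mps": Range(20.0, 35.0),
}


def compare_to_reference(features: dict, reference: dict = REFERENCE) -> dict:
    """Label each reference feature as below, within, above, or missing."""
    report = {}
    for name, rng in reference.items():
        if name not in features:
            report[name] = "missing"
            continue
        value = features[name]
        if value < rng.low:
            report[name] = "below"
        elif value > rng.high:
            report[name] = "above"
        else:
            report[name] = "within"
    return report


report = compare_to_reference({"elbow_angle_deg": 120.0, "peak_racket_speed_mps": 25.0})
# A structured payload like this could then be handed to a language model.
payload = json.dumps(report, sort_keys=True)
```

Keeping the comparison deterministic means the language model only has to verbalize the findings, not derive them.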

5. Experimental Results and Error Analysis

Classification metrics such as accuracy, precision, recall, and F1-score were calculated using the standard formulas (e.g., accuracy: $\frac{TP+TN}{TP+FP+FN+TN} \times 100\%$). The extended error analysis used confusion matrices to identify the most prevalent misclassification categories ("serve confusion," "slice/volley confusion," "errors due to beginner execution"). Serve misidentifications comprised 44.4% of errors. Insufficient contextual information (no ball trajectory, absent court lines) resulted in ambiguities for strokes with subtle kinematic differences.
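For a multi-class problem such as the 12 shot classes, these metrics are usually read off a confusion matrix. The sketch below is a minimal version, assuming rows are true classes and columns are predicted classes; precision, recall, and F1 are computed per class (one-vs-rest), and the function name is hypothetical.

```python
import numpy as np


def classification_metrics(confusion) -> dict:
    """Accuracy plus per-class precision, recall, and F1 from a confusion matrix.

    Assumes confusion[i, j] counts samples of true class i predicted as class j.
    """
    cm = np.asarray(confusion, dtype=float)
    tp = np.diag(cm)
    predicted = cm.sum(axis=0)  # column sums: TP + FP per class
    actual = cm.sum(axis=1)     # row sums: TP + FN per class

    # Classes with no predictions or no samples get a score of 0.
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)

    return {
        "accuracy": tp.sum() / cm.sum(),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


metrics = classification_metrics([[5, 1], [2, 4]])
```

The off-diagonal cells of the same matrix are what the error analysis groups into categories such as serve confusion.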

Statistical biomechanical analysis revealed significant, interpretable distinctions between expert and beginner strokes (e.g., higher peak racket velocities in expert executions). Qualitative evaluation of LLM-generated language feedback corroborated its acceptability and actionability according to experts and coaches.

6. Limitations and Recommended Advancements

The principal limitations of THETIS stem from the controlled recording environment: the absence of a tennis ball, the lack of realistic court geometry, and footage filmed predominantly in close-up all make context-dependent action classes harder to recognize. Without ball dynamics, shots that depend on trajectory cues cannot be reliably identified. A plausible implication is that future datasets should include in-game recordings that capture real court geometry, ball trajectory, and player positions, to enhance ecological validity and help disambiguate similar actions.

Augmentation with additional sensor modalities, such as depth imaging or full skeletal tracking, is advocated to strengthen the resolution of spatio-temporal features, particularly for subtle shot distinctions. Such developments would directly benefit both the statistical rigor of match analysis and future benchmarking of tracking research.

7. Impact and Research Applications

THETIS serves as a foundational benchmark for deep learning in tennis action recognition, facilitating evaluation of convolutional and sequential models in a controlled sports setting. It enables the linkage of low-level kinematic data to interpretable, expert-aligned feedback, bridging a gap between technical action classification and biomechanical sports analytics. The dataset’s dual modalities (RGB and skeleton) support advances in explainable AI for biomechanics, fostering the development of transparent, actionable feedback systems for athlete training and injury prevention. Publications such as "Classification of Tennis Actions Using Deep Learning" (Hovad et al., 4 Feb 2024) and "Talking Tennis: Language Feedback from 3D Biomechanical Action Recognition" (Dashore et al., 4 Oct 2025) have used THETIS both for benchmarking and as an enabling framework for integrated biomechanical and linguistic feedback in tennis stroke analysis.

References

Hovad et al., "Classification of Tennis Actions Using Deep Learning" (2024)

Dashore et al., "Talking Tennis: Language Feedback from 3D Biomechanical Action Recognition" (2025)
