
THETIS RGB: Tennis Action Classification Dataset

Updated 22 April 2026
  • THETIS RGB is a balanced video dataset with 1980 clips uniformly distributed over 12 tennis stroke classes, supporting fine-grained action recognition.
  • The dataset was recorded in controlled indoor environments with a fixed viewpoint and standardized preprocessing, ensuring consistency in video quality.
  • The controlled settings, absence of a ball, and limited environmental diversity restrict the dataset’s suitability for real-world applications and context-rich analyses.

The THETIS RGB dataset is a publicly available academic video dataset designed for research in tennis action classification using deep learning. Developed as part of the study "Classification of Tennis Actions Using Deep Learning" (Hovad et al., 2024), it addresses the need for structured and balanced datasets tailored to fine-grained recognition of tennis stroke sub-types. THETIS RGB comprises 1980 short video clips, each showing a single tennis shot performed by an amateur or intermediate-level player, and covers a broad taxonomy of tennis actions across 12 well-defined classes. The dataset provides a rigorously controlled environment for benchmarking models but has several inherent limitations stemming from its collection protocol and annotation scheme.

1. Dataset Construction and Scope

THETIS RGB consists of 1980 video clips, each capturing exactly one tennis shot performed in a controlled indoor setting. Clips last 2 to 5 seconds, and each contains a single, complete action from start to finish. The original documentation does not specify a per-clip frame count.

Filming took place in two artificial indoor environments: a changing room with a mirror and a basketball court. All videos were captured from a single fixed viewpoint, with the player facing the camera throughout the recording. Only intermediate or amateur players participated, and each player performed every shot type three times, yielding uniform representation across players and classes.

2. Action Classes and Taxonomy

THETIS RGB defines 12 action classes corresponding to tennis stroke sub-types, distributed as follows:

Class                | Videos per class
---------------------|-----------------
backhand             | 165
backhand 2 hands     | 165
backhand slice       | 165
backhand volley      | 165
flat service         | 165
forehand flat        | 165
forehand open-stands | 165
forehand slice       | 165
forehand volley      | 165
kick service         | 165
slice service        | 165
smash                | 165

Each class is populated with exactly 165 video clips, derived from 55 players each repeating every action three times. The taxonomy includes three types of serve (flat, kick, slice), four forehand variants (flat, open-stance, slice, volley), four backhand variants (single-handed, two-handed, slice, volley), and smash. Although no formal textual definitions are provided, the classes are operationalized based on distinctions in racquet grip, racquet-head angle, body posture, and typical swing path.
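The counts in this section can be cross-checked with a few lines of Python. This is a minimal sketch: the class names are copied from the table, the grouping into families follows the taxonomy described in the text, and the names `TAXONOMY`, `clips_per_class`, and `total_clips` are illustrative, not part of any official dataset tooling.

```python
# Class names as listed in the table; the family grouping follows the text.
# TAXONOMY and the helper names below are illustrative assumptions.
TAXONOMY = {
    "service": ["flat service", "kick service", "slice service"],
    "forehand": ["forehand flat", "forehand open-stands",
                 "forehand slice", "forehand volley"],
    "backhand": ["backhand", "backhand 2 hands",
                 "backhand slice", "backhand volley"],
    "smash": ["smash"],
}

PLAYERS = 55      # number of players, per the text
REPETITIONS = 3   # repetitions of each shot per player, per the text


def num_classes(taxonomy=TAXONOMY):
    """Count the stroke classes across all families."""
    return sum(len(classes) for classes in taxonomy.values())


def clips_per_class(players=PLAYERS, repetitions=REPETITIONS):
    """Each player repeats each shot a fixed number of times."""
    return players * repetitions


def total_clips(taxonomy=TAXONOMY):
    """Total clip count for a perfectly balanced dataset."""
    return num_classes(taxonomy) * clips_per_class()


if __name__ == "__main__":
    print(num_classes(), clips_per_class(), total_clips())
```

The three serve, four forehand, and four backhand variants plus the smash add up to 12 classes, and 55 players times 3 repetitions gives the 165 clips per class reported in the table.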

3. Data Preprocessing and Partitioning

For deep learning workflows, each video in THETIS RGB is spatially rescaled so that its shortest side is 256 pixels, followed by a center crop to $224 \times 224$ pixels. Temporally, each video is clipped or uniformly sampled to produce a 64-frame input per clip. The dataset is partitioned into training, validation, and test splits at a fixed ratio of 70 : 20 : 10, giving 1386, 396, and 198 clips, respectively.
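The geometry of the spatial and temporal preprocessing described above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, the sketch covers only the uniform-sampling option (not clipping), and any input resolution used with it is an assumption because the source does not state the native resolution.

```python
def resize_shortest_side(width, height, target=256):
    """Return (new_width, new_height) with the shorter side scaled to `target`,
    preserving the aspect ratio."""
    scale = target / min(width, height)
    return round(width * scale), round(height * scale)


def center_crop_box(width, height, size=224):
    """Return (left, top, right, bottom) of a centered size x size crop."""
    if width < size or height < size:
        raise ValueError("frame is smaller than the crop size")
    left = (width - size) // 2
    top = (height - size) // 2
    return left, top, left + size, top + size


def sample_frame_indices(num_frames, num_samples=64):
    """Pick `num_samples` frame indices spread uniformly over the clip.
    Clips shorter than `num_samples` frames repeat some indices."""
    if num_frames <= 0:
        raise ValueError("clip has no frames")
    if num_samples == 1:
        return [0]
    step = (num_frames - 1) / (num_samples - 1)
    return [round(i * step) for i in range(num_samples)]


if __name__ == "__main__":
    # 640x480 is a hypothetical input size chosen only for illustration.
    w, h = resize_shortest_side(640, 480)
    print((w, h), center_crop_box(w, h))
    print(sample_frame_indices(100)[:5])
```

The frame indices returned by `sample_frame_indices` always start at the first frame and end at the last, so the sampled input spans the whole stroke regardless of clip length.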

Specific details regarding native video resolution and capture frame rate are not specified in the source publication.
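The split sizes reported in this section follow directly from the 70 : 20 : 10 ratio. The sketch below reproduces them with a seeded random shuffle. Whether the original split was random, stratified by class, or grouped by player is not stated in this article, so the shuffling strategy, the seed, and the function names are all assumptions made for illustration.

```python
import random


def split_counts(n, ratios=(70, 20, 10)):
    """Return (train, val, test) sizes for n items; the test split
    absorbs any remainder left by integer division."""
    total = sum(ratios)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    return n_train, n_val, n - n_train - n_val


def random_split(items, ratios=(70, 20, 10), seed=0):
    """Shuffle items reproducibly and cut them into three disjoint parts.
    A plain random split is an assumption, not the documented protocol."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train, n_val, _ = split_counts(len(items), ratios)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])


if __name__ == "__main__":
    train, val, test = random_split(range(1980))
    print(len(train), len(val), len(test))
```

Note that 70% of 165 is 115.5, so a split that is exactly proportional within every class is not possible; a stratified split would have to round the per-class counts.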

4. Annotation Protocol and Metadata

Labeling for THETIS RGB is provided exclusively at the video level: every video is annotated with its shot-type class. No additional annotations are given, such as frame-by-frame temporal boundaries, bounding boxes, skeletal keypoints, or ball/player tracking. Each action was performed without a ball, so cues such as ball trajectory, bounce, or spin are absent. The annotation process did not use formal class definitions, relying instead on standard coaching distinctions and visual-action characteristics.

5. Statistical Characteristics

The dataset is strictly balanced, with the number of clips per class denoted $N_c = 165$ for all $c \in \{1, \dots, 12\}$, yielding a total $N = \sum_c N_c = 1980$. This uniform distribution avoids class imbalance, supporting robust model benchmarking across all action types. While overall video duration varies between 2 and 5 seconds, detailed per-class length distributions are illustrated in Figure 1 of the cited study.
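One practical consequence of the strict balance is that the class prior is uniform, which fixes the chance-level reference point for any classifier evaluated on the data. The following sketch computes that baseline and the label entropy from the per-class counts; the function names are illustrative, and the quantities are derived here from the counts rather than quoted from the cited study.

```python
import math


def class_priors(counts):
    """Empirical class probabilities from per-class clip counts."""
    total = sum(counts)
    return [c / total for c in counts]


def entropy_bits(priors):
    """Shannon entropy of the label distribution, in bits."""
    return -sum(p * math.log2(p) for p in priors if p > 0)


counts = [165] * 12            # N_c = 165 for each of the 12 classes
priors = class_priors(counts)

# Accuracy of always predicting the most frequent class; with uniform
# priors this equals 1/12, about 8.3%.
majority_baseline = max(priors)

if __name__ == "__main__":
    print(f"majority baseline: {majority_baseline:.4f}")
    print(f"label entropy: {entropy_bits(priors):.3f} bits")
```

Because the priors are uniform, the label entropy reaches its maximum of $\log_2 12$ bits, and plain accuracy is not inflated by any dominant class.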

6. Identified Limitations

Several limitations of THETIS RGB are explicitly acknowledged by its creators:

  • No ball is present in any video, resulting in the absence of key spatio-temporal cues such as ball trajectory, bounce, and spin.
  • Filming was conducted in only two artificial backgrounds, lacking representation of real-world tennis court environments, court lines, or variable lighting.
  • All actions are performed by intermediate or amateur players; professional-level match play is not represented.
  • The viewpoint is fixed with the player always facing the camera, restricting pose and perspective diversity.
  • The dataset lacks any form of fine-grained annotation, such as temporal action boundaries, player/ball bounding boxes, or skeletal keypoint tracks.

These limitations constrain the dataset’s suitability for tasks involving contextual understanding, multi-person interaction, or realistic match conditions.

7. Recommendations for Future Developments

The cited authors propose several concrete improvements for subsequent versions of tennis action datasets:

  • Recording in real tennis match environments (on-court), with full inclusion of ball, court lines, and multiple camera viewpoints.
  • Capturing at higher native resolutions and standard frame rates (e.g., 25–60 fps).
  • Providing precise time-stamped labels for shot boundaries, bounce events, and score changes.
  • Augmenting video with additional modalities, such as 2D/3D player and ball tracking, court calibration, and depth mapping.
  • Incorporating a broader range of player skill levels, including professionals, and varying environmental contexts (indoor/outdoor, different lighting).

A plausible implication is that the adoption of such improvements would increase the dataset’s ecological validity and granularity, thereby facilitating advances in both action understanding and higher-level strategic analysis.


THETIS RGB is significant for its rigorously balanced structure and detailed taxonomy of tennis strokes, enabling systematic benchmarking of deep learning models in action classification. However, researchers are cautioned regarding its environmental and contextual constraints, as highlighted in the foundational study (Hovad et al., 2024).
