
Azerbaijani Sign Language Dataset Overview

Updated 24 November 2025
  • Azerbaijani Sign Language Dataset (AzSLD) is a comprehensive public corpus featuring three modalities for isolated sign recognition, fingerspelling, and sentence-level translation.
  • The dataset provides detailed frame-level annotations and a pip-installable data loader; all recordings were collected under controlled studio conditions.
  • AzSLD supports low-resource sign language research, with benchmark results reported for Transformer and ConvLSTM models on the isolated word subset, facilitating advances in automated sign recognition.

The Azerbaijani Sign Language Dataset (AzSLD) is a public, large-scale multimodal corpus tailored for computer vision research in sign language recognition and translation, specifically targeting Azerbaijani Sign Language (AzSL). Published and curated by Alishzade and Hasanov, AzSLD supports research into isolated sign recognition, fingerspelling, and sentence-level translation. It is accompanied by comprehensive documentation, frame-level linguistic annotations, and baseline data processing software (Alishzade et al., 19 Nov 2024, Alishzade et al., 17 Nov 2025).

1. Composition and Structure

AzSLD is composed of three principal modalities:

  • Fingerspelling/alphabet-level: Images and short videos for each of the 32 letters in the Azerbaijani Latin script.
  • Word-level: Isolated signs for the 100 most frequent words in Azerbaijani, represented as RGB video clips.
  • Sentence-level: 500 sentences oriented toward social-service scenarios, with each sentence performed by 18–25 signers.

Key statistics:

| Subset | # Classes | # Samples | Format |
|---|---|---|---|
| Fingerspelling | 32 | 10,864 images + 3,587 videos | JPG / MP4 |
| Word-level | 100 | ≈19,800–21,300 | MP4 |
| Sentence-level | 500 | 9,000–12,500 | MP4 + JSON annotations |

The complete corpus comprises ≈65 hours of video and more than 30,000 samples in total. For isolated word recognition tasks, an internally curated subset (“AzSLD-isolated”) contains exactly 100 classes × 18 samples/class = 1,800 uniformly distributed videos (Alishzade et al., 17 Nov 2025).
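
As a quick sanity check, the uniform class distribution of the isolated subset can be verified by counting clips per class folder. The sketch below assumes a folder-per-class layout (as used for the word-level labels described in Section 3) and a hypothetical root path; adjust both to the actual release.

from collections import Counter
from pathlib import Path

ROOT = Path("/path/to/azsld/words")  # hypothetical path to the word-level subset

if ROOT.is_dir():
    counts = Counter()
    for class_dir in sorted(p for p in ROOT.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(1 for _ in class_dir.glob("*.mp4"))
    print("classes:", len(counts))                      # expected: 100
    print("samples:", sum(counts.values()))             # expected: 1,800 (AzSLD-isolated)
    print("uniform:", len(set(counts.values())) == 1)   # expected: True (18 per class)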

2. Data Collection and Recording Conditions

All recordings were made under controlled laboratory conditions:

  • Cameras: Two Full-HD RGB cameras (frontal and upper-left side views), 30 fps, 1920×1080 resolution.
  • Background/Illumination: White, uniform, non-textured backdrop and standardized lighting.
  • Signer Demographics: 43 unique signers (42 deaf or hard-of-hearing (DHH), 1 child of deaf adults (CODA), plus professional interpreters); most are adults of both sexes, but a finer demographic breakdown is not reported.
  • Protocol: All participants signed informed consent documents and the project adheres to Fairness, Accountability, Transparency, and Ethics (FATE) guidelines. No in-the-wild or spontaneous signing data is included (Alishzade et al., 19 Nov 2024).

Signers performed all classes in uniform studio conditions with fixed camera distance and position to maximize consistency and facilitate later processing.

3. Annotation and Data Splitting

Annotation protocols ensure utility for both isolated and continuous recognition:

  • Labeling:
    • Fingerspelling: Each image/video mapped to its corresponding letter.
    • Word-level: Each folder named for the ground-truth token (integer-encoded for model use).
    • Sentence-level: Gloss boundaries provided as JSON entries (label, start_frame, end_frame), together with sentence-level spoken-Azerbaijani translations (see the parsing sketch after this list).
  • Glosses: Only manual linguistic boundaries and translations are provided; non-manual features are not yet annotated.
  • Data Splits: Default split is 80 % training / 20 % test (via included data loader), but variations (e.g., five-fold signer-independent cross-validation for the 1,800-sample subset) are used in benchmarking (Alishzade et al., 17 Nov 2025).
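
A minimal sketch of reading these sentence-level gloss annotations follows; it assumes one JSON file per clip containing a list of entries with the label, start_frame, and end_frame fields described above. The file path and any schema details beyond those fields are illustrative assumptions.

import json
from pathlib import Path

def load_gloss_segments(json_path):
    """Return (label, start_frame, end_frame) tuples for one annotated clip.

    Assumes the file holds a list of entries with 'label', 'start_frame',
    and 'end_frame' keys, as described in the AzSLD annotation scheme.
    """
    with open(json_path, "r", encoding="utf-8") as f:
        entries = json.load(f)
    segments = [(e["label"], int(e["start_frame"]), int(e["end_frame"])) for e in entries]
    # Order segments by their start frame so they follow signing order.
    return sorted(segments, key=lambda seg: seg[1])

# Hypothetical path; the actual directory layout may differ.
example = Path("/path/to/azsld/sentences/clip_0001.json")
if example.exists():
    for label, start, end in load_gloss_segments(example):
        print(f"{label}: frames {start}-{end}")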

No inter-annotator agreement statistics have been published.

4. Preprocessing and Feature Representation

Raw video samples are distributed unprocessed; the reference software enables:

  • Frame Extraction: Uniform, optionally padding or interpolating to a fixed count (default 64).
  • Spatial Normalization: Central cropping/padding to 224×224 pixels; pixel normalization to [0, 1].
  • Manual Landmark Extraction: For research on the 1,800-sample subset, 63-dimensional hand-landmark vectors per frame are derived with MediaPipe Holistic (21 landmarks per hand × 3 coordinates).
  • Temporal Alignment: Dynamic Time Warping (DTW) with a Sakoe–Chiba band (width = 10 frames); sequences are resampled to 64 frames using cubic spline interpolation (Alishzade et al., 17 Nov 2025), as sketched after this list.
  • Label Encoding: Integer and one-hot vector formats for model consumption; mapping files provided as pickled Python objects.
  • No Precomputed Features: Skeleton/keypoints, optical flow, or appearance-based features are not bundled with the base release, except as derived in secondary research contexts.
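
The temporal resampling step above can be reproduced with standard tools. The following is a minimal sketch that resamples a variable-length landmark sequence (one 63-dimensional vector per frame) to the fixed 64-frame length using cubic interpolation; it illustrates the described step rather than the authors' exact implementation, and the array shapes are assumptions.

import numpy as np
from scipy.interpolate import interp1d

TARGET_FRAMES = 64  # fixed sequence length used throughout the pipeline

def resample_sequence(landmarks, n_frames=TARGET_FRAMES):
    """Resample a (T, 63) landmark sequence to (n_frames, 63) along the time axis.

    Mirrors the cubic-spline resampling described for the 1,800-sample subset.
    Very short clips (fewer than 4 frames) fall back to linear interpolation,
    since a cubic fit needs at least four support points.
    """
    landmarks = np.asarray(landmarks, dtype=np.float32)
    t_src = np.linspace(0.0, 1.0, num=len(landmarks))
    t_dst = np.linspace(0.0, 1.0, num=n_frames)
    kind = "cubic" if len(landmarks) >= 4 else "linear"
    return interp1d(t_src, landmarks, axis=0, kind=kind)(t_dst).astype(np.float32)

# Synthetic example: a 97-frame clip of 63-dimensional hand landmarks.
clip = np.random.rand(97, 63).astype(np.float32)
print(resample_sequence(clip).shape)  # (64, 63)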

The pipeline is accessible via a pip-installable Python data loader:

from azsl_dataloader import AzSLDataset
ds = AzSLDataset(root_dir="/path/to/azsld", split="train", n_frames=64)
(Alishzade et al., 19 Nov 2024)

5. Evaluation, Benchmarks, and Metrics

The AzSLD release does not include official model baselines; however, a subsequent benchmarking paper on the isolated word subset reports a comparative evaluation:

  • Recognition architectures: ConvLSTM (recurrent) vs. Vanilla Transformer (attention-based)
  • Top-1 Recognition Accuracy: the Transformer reaches up to 76.8 %; ConvLSTM performs lower, particularly when per-class sample counts are small.
  • Top-5 Accuracy and Signer Independence: Transformer is superior; ConvLSTM is preferred for computationally efficient deployment.
  • Validation Protocol: Five-fold signer-independent cross-validation (no overlap of signers between train/test splits).
  • Metrics:

    $$\mathrm{Top\text{-}1} = \frac{1}{N}\sum_{i=1}^{N}\mathbb{1}(\hat y_i = y_i)$$

    $$\mathrm{Top\text{-}5} = \frac{1}{N}\sum_{i=1}^{N}\mathbb{1}\left(y_i \in \{\hat y_i^{(1)},\ldots,\hat y_i^{(5)}\}\right)$$

    with $N$ test samples, $y_i$ the reference labels, and $\hat y_i^{(1)},\ldots,\hat y_i^{(5)}$ the model's top-ranked predictions, where $\hat y_i = \hat y_i^{(1)}$ (Alishzade et al., 17 Nov 2025).
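
These accuracies follow directly from a matrix of per-class scores. The sketch below is a minimal NumPy implementation of both definitions; the score matrix, label array, and sample sizes are placeholders, not AzSLD results.

import numpy as np

def top_k_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    scores: (N, C) array of per-class scores or probabilities.
    labels: (N,) array of integer ground-truth class indices.
    """
    scores = np.asarray(scores)
    labels = np.asarray(labels)
    # Indices of the k best classes per sample; order within the top-k is irrelevant.
    topk = np.argpartition(scores, -k, axis=1)[:, -k:]
    hits = (topk == labels[:, None]).any(axis=1)
    return float(hits.mean())

# Placeholder example: 1,800 samples over the 100 word-level classes.
rng = np.random.default_rng(0)
scores = rng.random((1800, 100))
labels = rng.integers(0, 100, size=1800)
print("Top-1:", top_k_accuracy(scores, labels, k=1))
print("Top-5:", top_k_accuracy(scores, labels, k=5))

For the five-fold signer-independent protocol, folds should be constructed so that no signer appears in both the training and test partitions; with signer IDs available as group labels, scikit-learn's GroupKFold is one standard way to build such splits.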

No translation or sentence-level metrics (e.g., BLEU/WER) have been published for AzSLD. Users are expected to train models and compute application-relevant metrics using the provided data loader (Alishzade et al., 19 Nov 2024).

6. Licensing, Access, and Limitations

  • Open Access: AzSLD is available on Zenodo (10.5281/zenodo.13627301), with source code and prepackaged data loaders on GitHub (azsl_dataloader).
  • License: Creative Commons Attribution 4.0 International (CC BY 4.0), facilitating reuse in research and application development.
  • Limitations:
    • Lab-controlled domain; limited environmental variability.
    • No in-the-wild or background-diverse material.
    • Vocabulary limited to 32 letters, 100 words, 500 sentences.
    • No fine-grained signer demographic metadata.
    • Absence of reported baseline performance for fingerspelling or sentence translation.
    • Non-manual features (e.g., eye gaze) not annotated.

7. Applications and Future Directions

AzSLD is intended as foundational infrastructure for low-resource sign language research, enabling:

  • Isolated sign and fingerspelling recognition.
  • Continuous sign-to-text translation.
  • Sign language synthesis (e.g., avatar animation or sign-to-speech).
  • Real-time communication systems in social-service contexts.
  • Low-resource adaptation benchmarks (for transfer learning and signer adaptation studies).

Recommended future enhancements include expanding the vocabulary, incorporating in-the-wild recordings, augmenting with depth and skeleton annotations (MediaPipe, OpenPose), integrating non-manual annotation, publishing quantitative recognition and translation baselines, and supporting richer data augmentation strategies. Community collaboration is encouraged to improve coverage and benchmark establishment (Alishzade et al., 19 Nov 2024).
