IsharaKotha: Bangla Sign Language Corpus
- IsharaKotha is an avatar-based Bangla Sign Language resource that uses the HamNoSys notation system to encode signs for detailed, language-agnostic transcription.
- It comprises a structured corpus of 3,823 annotated entries with integrated SiGML files, facilitating dynamic, real-time sign generation via a modular animation pipeline.
- The system employs a deep learning-based lemmatizer with 79.22% accuracy and renders animations at approximately 30 fps; output quality was assessed by three evaluators across 3,828 ratings.
IsharaKotha is an avatar-based Bangla Sign Language (BSL) resource designed for text-to-sign translation, integrating a structured linguistic corpus encoded in the Hamburg Notation System (HamNoSys) with a modular animation rendering pipeline. It is the first comprehensive, HamNoSys-based Bangla Sign Language corpus and supports both research and practical applications requiring dynamic sign generation and avatar animation from Bangla text input (Islam et al., 21 Nov 2025).
1. Corpus Architecture and Linguistic Representation
The IsharaKotha corpus is phonetically encoded using HamNoSys, a notation system developed for detailed, language-agnostic transcription of signed languages. Each sign, corresponding to a letter, digit, or word, is decomposed into five features:
- Handshape (~200 possible configurations)
- Orientation (six principal palm/finger directions)
- Location (over 30 body-relative positions)
- Movement (e.g., straight, circular, repeated)
- Non-manual features (facial expressions, head/lip movement)
For instance, the sign for the Bangla word for "book" combines a two-hand symmetry operator, flat handshapes with palmar orientation inward, finger contact at chest height, an opening motion, and no non-manual markers. Signs authored in HamNoSys are converted into SiGML (Signing Gesture Markup Language) XML files using the SiS-Builder toolkit, with separate <hamnosys_manual> and <hamnosys_nonmanual> tags representing manual and non-manual components, respectively, and a gloss attribute linking each sign to its Bangla term (Islam et al., 21 Nov 2025).
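As a rough illustration of the SiGML structure described above, the following Python sketch wraps a list of HamNoSys symbol names in the two component tags and attaches a gloss attribute. The `hns_sign` wrapper element and the individual symbol element names (`hamsymmlr`, `hamflathand`, and so on) are assumptions about the SiGML schema, and the symbol sequence is purely illustrative; it is not the transcription stored in the corpus, and `build_sigml` is a hypothetical helper rather than part of SiS-Builder.

```python
import xml.etree.ElementTree as ET


def build_sigml(gloss, manual_symbols, nonmanual_symbols=()):
    """Wrap HamNoSys symbol names in a minimal SiGML-style document.

    Each symbol name becomes an empty child element of the matching
    manual or non-manual container.
    """
    root = ET.Element("sigml")
    # Assumed wrapper element; the gloss attribute links the sign to its Bangla term.
    sign = ET.SubElement(root, "hns_sign", {"gloss": gloss})

    nonmanual = ET.SubElement(sign, "hamnosys_nonmanual")
    for name in nonmanual_symbols:
        ET.SubElement(nonmanual, name)

    manual = ET.SubElement(sign, "hamnosys_manual")
    for name in manual_symbols:
        ET.SubElement(manual, name)

    return ET.tostring(root, encoding="unicode")


# Illustrative symbol sequence only (symmetry operator, flat hand,
# finger direction, palm orientation, chest location).
book_sigml = build_sigml(
    gloss="বই",
    manual_symbols=["hamsymmlr", "hamflathand", "hamextfingero", "hampalmu", "hamchest"],
)
print(book_sigml)
```

Keeping the non-manual container present but empty mirrors a sign with no non-manual markers, as in the "book" example.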
2. Corpus Scope, Coverage, and Metadata
The corpus comprises 3,823 annotated sign entries, spanning alphabets, digits, and 34 semantic classes of vocabulary:
| Category | Entries |
|---|---|
| Alphabets | 49 |
| Digits | 10 |
| Word signs (34 classes) | 3,764 |
| Total | 3,823 |
Primary semantic domains include Crime & Law (38), Economics (35), Food & Drinks (234), Household Items (342), Human Characteristics (470), Sports (53), and Others (776). Each entry is annotated with:
- Bangla orthographic gloss
- HamNoSys transcription
- SiGML file (manual/non-manual)
- Semantic category tag
This structure enables downstream NLP tasks, comprehensive annotation, and consistent mapping between Bangla text and sign form (Islam et al., 21 Nov 2025).
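One way to picture the per-entry annotation is as a small record type with a gloss-keyed index. The sketch below is a minimal illustration under stated assumptions: the `CorpusEntry` class, its field names, the file paths, the placeholder transcriptions, and the category assignments of the sample glosses are all invented for the example and do not describe the corpus's actual storage format. Only the category totals are taken from the table above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CorpusEntry:
    gloss: str       # Bangla orthographic gloss
    hamnosys: str    # HamNoSys transcription
    sigml_path: str  # location of the SiGML file (manual/non-manual)
    category: str    # semantic category tag


def build_index(entries):
    """Map each gloss to its entry for constant-time lookup."""
    return {entry.gloss: entry for entry in entries}


def count_by_category(entries):
    """Count entries per semantic category tag."""
    return Counter(entry.category for entry in entries)


# Hypothetical sample entries; transcriptions are placeholders.
sample_entries = [
    CorpusEntry("বই", "<hamnosys-1>", "sigml/boi.sigml", "Household Items"),
    CorpusEntry("ভাত", "<hamnosys-2>", "sigml/bhat.sigml", "Food & Drinks"),
    CorpusEntry("৫", "<hamnosys-3>", "sigml/5.sigml", "Digits"),
]

index = build_index(sample_entries)
category_counts = count_by_category(sample_entries)

# Totals reported for the full corpus.
REPORTED_TOTALS = {"Alphabets": 49, "Digits": 10, "Word signs": 3764}
corpus_size = sum(REPORTED_TOTALS.values())
print(index["বই"].sigml_path, corpus_size)
```

The sum of the reported category totals reproduces the stated corpus size of 3,823 entries.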
3. Text-to-Sign Translation Pipeline and Lemmatization
The IsharaKotha workflow operates as follows:
- Input Processing: Raw Bangla sentences are segmented and tokenized.
- Lemmatization: Inflected tokens are mapped to lemmas using a deep learning–based sequence-to-sequence (Seq2Seq) model with global attention. The architecture consists of a Bi-LSTM encoder that processes input character sequences, a global attention function computed over the encoder hidden states, a unidirectional LSTM decoder, and a softmax output layer. Trained on a corpus of 94,781 word-form pairs, the lemmatizer achieves 79.22% accuracy.
- SiGML Retrieval: For each lemma, a pre-computed SiGML file is located.
- Animation Rendering: SiGML sequences are rendered via a 3D avatar engine (Islam et al., 21 Nov 2025).
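The four stages above can be sketched as a short Python function. This is a minimal sketch, not the system's implementation: the Seq2Seq lemmatizer is replaced by a toy lookup table, the tokenizer is plain whitespace splitting after punctuation removal, and all names (`tokenize`, `text_to_sigml`, `toy_lemmatize`, the file paths) are hypothetical. Rendering is left out; the function stops at the list of SiGML files that would be handed to the avatar engine.

```python
from typing import Callable, Dict, List, Tuple


def tokenize(sentence: str) -> List[str]:
    """Split on whitespace after replacing punctuation (incl. the Bangla danda)."""
    cleaned = sentence
    for mark in "।,?!;:":
        cleaned = cleaned.replace(mark, " ")
    return cleaned.split()


def text_to_sigml(
    sentence: str,
    lemmatize: Callable[[str], str],
    sigml_index: Dict[str, str],
) -> Tuple[List[str], List[str]]:
    """Return (SiGML paths in sentence order, lemmas with no dictionary entry)."""
    found: List[str] = []
    missing: List[str] = []
    for token in tokenize(sentence):
        lemma = lemmatize(token)
        path = sigml_index.get(lemma)
        if path is None:
            missing.append(lemma)  # static dictionary has no sign for this lemma
        else:
            found.append(path)
    return found, missing


# Toy stand-in for the trained lemmatizer (assumption, not the real model).
LEMMA_TABLE = {"বইগুলো": "বই", "দেখছি": "দেখা"}


def toy_lemmatize(token: str) -> str:
    return LEMMA_TABLE.get(token, token)


# Hypothetical lemma-to-file index; "দেখা" is deliberately absent.
SIGML_INDEX = {"আমি": "sigml/ami.sigml", "বই": "sigml/boi.sigml"}

paths, missing = text_to_sigml("আমি বইগুলো দেখছি।", toy_lemmatize, SIGML_INDEX)
print(paths, missing)
```

Returning the unmatched lemmas separately makes coverage gaps in the static dictionary visible instead of silently dropping words.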
4. Avatar-Based Animation Generation
The rendering engine uses the JASigning platform to translate SiGML into avatar motion, mapping HamNoSys features as follows:
- Handshapes: Symbol-to-joint angle presets
- Orientation/Location: Relative palm/finger orientation mapped to avatar body coordinates
- Movement: Bézier-style interpolation of hand trajectories
- Non-manual features: Head and facial animations parsed from <hamnosys_nonmanual> tags
This pipeline supports real-time signing at ≈30 fps, enabling dynamic generation without recourse to pre-recorded video data (Islam et al., 21 Nov 2025).
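To make the movement interpolation concrete, the sketch below samples a cubic Bézier curve at 30 frames per second to produce one hand position per frame. It is an illustration under assumptions: the source only says "Bézier-style", so the choice of a cubic curve, the control points, and the function names are hypothetical and do not describe JASigning's internals.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def cubic_bezier(p0: Point, p1: Point, p2: Point, p3: Point, t: float) -> Point:
    """Evaluate a cubic Bezier curve at parameter t in [0, 1]."""
    u = 1.0 - t
    return tuple(
        u ** 3 * a + 3 * u ** 2 * t * b + 3 * u * t ** 2 * c + t ** 3 * d
        for a, b, c, d in zip(p0, p1, p2, p3)
    )


def sample_trajectory(
    p0: Point, p1: Point, p2: Point, p3: Point, duration_s: float, fps: int = 30
) -> List[Point]:
    """Return one hand position per frame, including both endpoints."""
    n_steps = max(1, round(duration_s * fps))
    return [cubic_bezier(p0, p1, p2, p3, i / n_steps) for i in range(n_steps + 1)]


# Hypothetical control points in avatar body coordinates (metres).
start: Point = (0.0, 1.2, 0.3)
ctrl_a: Point = (0.1, 1.3, 0.35)
ctrl_b: Point = (0.2, 1.3, 0.35)
end: Point = (0.3, 1.2, 0.3)

frames = sample_trajectory(start, ctrl_a, ctrl_b, end, duration_s=0.5)
print(len(frames), frames[0], frames[-1])
```

A half-second movement at 30 fps yields 15 interpolation steps, so 16 positions including the start and end poses.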
5. Evaluation Protocol and Quantitative Results
Evaluation was performed via a publicly accessible web interface, partitioned into alphabets, digits, word categories, and sentences. Three evaluators (two professional sign interpreters and one hearing-impaired athlete) rated each animation on a scale (“Bad”=1, “Average”=2, “Good”=3, “Excellent”=4), yielding 3,828 ratings overall.
Distribution of ratings:
| Rating | Count | Percentage |
|---|---|---|
| Bad | 116 | 3.03% |
| Average | 294 | 7.68% |
| Good | 2,346 | 61.29% |
| Excellent | 1,072 | 28.00% |
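The aggregate mean score implied by this distribution is ≈3.14 on the four-point scale (12,030 rating points over 3,828 ratings), with a reported variance of ≈0.4774 and standard deviation of ≈0.691. With that standard deviation and N = 3,828, a normal-approximation 95% confidence interval spans roughly ±0.02 around the mean. Digits scored ≈3.6–4.0, while full-sentence signing scored ≈3.06–3.32, reflecting higher rates of lemmatizer errors (≈20% inflection mistransformations) and incomplete lemma coverage (Islam et al., 21 Nov 2025).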
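The summary statistics can be recomputed from the rating counts in the table. The sketch below is one straightforward way to do so, assuming a population variance and a normal-approximation confidence interval; `rating_summary` is a hypothetical helper. The variance obtained this way (about 0.458) is slightly below the reported ≈0.4774, so the source presumably aggregated the ratings differently; the code only illustrates the arithmetic.

```python
import math

# Rating counts from the distribution table (Bad=1 ... Excellent=4).
RATING_COUNTS = {1: 116, 2: 294, 3: 2346, 4: 1072}


def rating_summary(counts, z=1.96):
    """Mean, population variance, std, and normal-approximation CI from a histogram."""
    n = sum(counts.values())
    mean = sum(score * c for score, c in counts.items()) / n
    variance = sum(c * (score - mean) ** 2 for score, c in counts.items()) / n
    std = math.sqrt(variance)
    half_width = z * std / math.sqrt(n)
    return {
        "n": n,
        "mean": mean,
        "variance": variance,
        "std": std,
        "ci": (mean - half_width, mean + half_width),
    }


summary = rating_summary(RATING_COUNTS)
print({k: summary[k] for k in ("n", "mean", "variance", "std")})
print(summary["ci"])
```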
6. Applications, Limitations, and Future Directions
IsharaKotha supports applications including e-learning for the hearing-impaired, real-time smartphone/web text-to-sign translation, and sign language production or annotation for NLP research. Notable limitations include an estimated 1% omission rate for semantic units (chiefly directional/facial signs), reliance on a static dictionary with incomplete full-sentence coverage, and simplified avatar facial expression modeling. Extensions under development involve expanding the sign inventory (>10,000 entries), improving morphological analysis (target >90% lemmatizer accuracy), refining avatar blendshapes and eye gaze, grammar-based reordering for smoother multiword signing, and computer vision–assisted semi-automated HamNoSys transcription from usable video corpora (Islam et al., 21 Nov 2025).
By providing a rigorously annotated, extensible, and openly accessible Bangla Sign Language resource based on HamNoSys and SiGML standards, IsharaKotha establishes the foundation for scalable, dynamic text-to-sign translation systems and advances the state of computational sign linguistics for Bangla.