Canonical Semantic Form (CSF)
- CSF is a language-neutral semantic framework that employs a structured 9-tuple of discrete slots to decompose utterances for precise sign language generation.
- It features a detailed condition taxonomy with 35 classes across 8 categories to capture nuanced, context-dependent linguistic expressions.
- A lightweight Transformer-based slot extractor achieves over 99% accuracy, enabling parameter-efficient, real-time crosslingual translation.
Canonical Semantic Form (CSF) is a language-agnostic semantic representation designed to enable direct translation from any source language to sign languages without mediating through English. CSF represents utterances as a structured 9-tuple of discrete semantic slots, supporting precise, universal mapping suitable for multilingual sign language generation. It is constructed to facilitate parameter-efficient, real-time processing and provides explicit decomposition of utterances for nuanced linguistic phenomena, notably conditional expressions, across a broad typological spectrum (Bao, 5 Jan 2026).
1. Formal Specification and Semantic Slot Scheme
Let $u$ denote an input utterance in any natural language. The canonical representation is defined as

$$\mathrm{CSF}(u) = \big(e(u),\ i(u),\ t(u),\ c(u),\ a(u),\ o(u),\ l(u),\ p(u),\ m(u)\big),$$

where each function extracts the corresponding semantic slot (event, intent, time, condition, agent, object, location, purpose, modifier). More formally, there exist finite sets

$$E,\ I,\ T,\ C,\ A,\ O,\ L,\ P,\ M$$

and a mapping

$$\mathrm{CSF}\colon U \to E \times I \times T \times C \times A \times O \times L \times P \times M.$$
The slot inventory and their permitted values are summarized below:
| Slot | Value Set Size | Example Values |
|---|---|---|
| event | 7 | GO, STAY, BUY, WORK, MEET, EAT, LEARN |
| intent | 4 | NONE, PLAN, WANT, DECIDE |
| time | 5 | NONE, TODAY, TOMORROW, YESTERDAY, NOW |
| condition | 35 | See Section 2 |
| agent | 5 | ME, YOU, HE, SHE, THEY |
| object | 5 | NONE, FOOD, BOOK, MEDICINE, THING |
| location | 6 | NONE, HOME, SCHOOL, HOSPITAL, OFFICE, STORE |
| purpose | 2 | NONE, REST |
| modifier | 4 | NONE, FAST, SLOW, ALONE |
"NONE" denotes the absence of a value. Each slot captures a high-level primitive, enabling compositional meaning across languages.
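The slot inventory above can be encoded directly as a lookup table. The sketch below is illustrative, not the paper's implementation: the dict layout and the `validate_csf` helper are assumptions, and only a few of the 35 condition classes are shown for brevity.

```python
# Hypothetical encoding of the CSF slot inventory from the table above.
# Slot names and permitted values follow the paper; the layout is illustrative.
CSF_SLOTS = {
    "event":     {"GO", "STAY", "BUY", "WORK", "MEET", "EAT", "LEARN"},
    "intent":    {"NONE", "PLAN", "WANT", "DECIDE"},
    "time":      {"NONE", "TODAY", "TOMORROW", "YESTERDAY", "NOW"},
    # "condition" has 35 classes (Section 2); only a few are shown here.
    "condition": {"NONE", "IF_RAIN", "IF_BORED", "IF_HAVE_MONEY"},
    "agent":     {"ME", "YOU", "HE", "SHE", "THEY"},
    "object":    {"NONE", "FOOD", "BOOK", "MEDICINE", "THING"},
    "location":  {"NONE", "HOME", "SCHOOL", "HOSPITAL", "OFFICE", "STORE"},
    "purpose":   {"NONE", "REST"},
    "modifier":  {"NONE", "FAST", "SLOW", "ALONE"},
}

def validate_csf(slots: dict) -> bool:
    """Check that a 9-slot assignment uses exactly the CSF slots
    and only permitted values."""
    return (set(slots) == set(CSF_SLOTS)
            and all(v in CSF_SLOTS[k] for k, v in slots.items()))
```

For example, the assignment for "If it rains tomorrow, I stay home." (event=STAY, time=TOMORROW, condition=IF_RAIN, agent=ME, location=HOME, all other slots NONE) validates against this schema.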
2. Condition Taxonomy: Semantic Breadth and Systematicity
The condition slot ($c$) is distinguished by a fine-grained taxonomy of 35 classes: the 34 IF_* classes below, allocated to eight disjoint categories, plus NONE for unconditional utterances:
| Category | Condition Classes |
|---|---|
| Weather | IF_RAIN, IF_SUNNY, IF_COLD, IF_HOT, IF_WINDY |
| Time | IF_LATE, IF_EARLY, IF_WEEKEND, IF_NIGHT, IF_MORNING |
| Health | IF_SICK, IF_TIRED, IF_HUNGRY, IF_THIRSTY, IF_FULL |
| Schedule | IF_BUSY, IF_FREE, IF_HOLIDAY, IF_WORKING |
| Mood | IF_BORED, IF_HAPPY, IF_SAD, IF_STRESSED, IF_ANGRY |
| Social | IF_ALONE, IF_WITH_FRIENDS, IF_WITH_FAMILY |
| Activity | IF_FINISH_WORK, IF_FINISH_SCHOOL, IF_FINISH_EATING, IF_WATCH_MOVIE, IF_LISTEN_MUSIC |
| Financial | IF_HAVE_MONEY, IF_NO_MONEY |
This taxonomy enables explicit encoding of conditional expressions prevalent in natural language, such as weather contingencies, temporal and habitual conditions, agent-internal states, and socio-economic factors. Examples include:
- “If it rains, I stay home.” c = IF_RAIN
- “When I’m bored, I watch Netflix.” c = IF_BORED
- “If I have money, I go shopping.” c = IF_HAVE_MONEY
A plausible implication is that such detailed condition granularity supports nuanced, inferential translation for sign languages, which often encode conditional semantics syntactically rather than lexically.
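The eight-category structure lends itself to a reverse index from condition class to category, which downstream components (e.g., non-manual marker selection) might consult. This is an illustrative sketch; the `CONDITION_TAXONOMY` and `CATEGORY_OF` names are assumptions, with class names taken from the table above.

```python
# Illustrative reverse index from condition class to taxonomy category,
# built from the 8-category table above (class names as in the paper).
CONDITION_TAXONOMY = {
    "Weather":   ["IF_RAIN", "IF_SUNNY", "IF_COLD", "IF_HOT", "IF_WINDY"],
    "Time":      ["IF_LATE", "IF_EARLY", "IF_WEEKEND", "IF_NIGHT", "IF_MORNING"],
    "Health":    ["IF_SICK", "IF_TIRED", "IF_HUNGRY", "IF_THIRSTY", "IF_FULL"],
    "Schedule":  ["IF_BUSY", "IF_FREE", "IF_HOLIDAY", "IF_WORKING"],
    "Mood":      ["IF_BORED", "IF_HAPPY", "IF_SAD", "IF_STRESSED", "IF_ANGRY"],
    "Social":    ["IF_ALONE", "IF_WITH_FRIENDS", "IF_WITH_FAMILY"],
    "Activity":  ["IF_FINISH_WORK", "IF_FINISH_SCHOOL", "IF_FINISH_EATING",
                  "IF_WATCH_MOVIE", "IF_LISTEN_MUSIC"],
    "Financial": ["IF_HAVE_MONEY", "IF_NO_MONEY"],
}

# Map each class back to its category, e.g. CATEGORY_OF["IF_BORED"] == "Mood".
CATEGORY_OF = {cls: cat
               for cat, classes in CONDITION_TAXONOMY.items()
               for cls in classes}
```

The index contains the 34 IF_* classes; with NONE this completes the 35-class condition inventory.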
3. Transformer-Based Slot Extraction Architecture
CSF slot extraction is operationalized via a bespoke lightweight Transformer. The architecture includes:
- Subword tokenization with a custom BPE vocabulary
- An embedding layer mapping tokens to $d_{\text{model}}$-dimensional vectors, augmented with positional information
- Stacked Transformer encoder layers using Pre-LayerNorm, multi-head self-attention, and a two-layer FFN with GELU activation
- A global [CLS] token representation feeds nine slot-wise softmax classifiers
Per-slot prediction is given by

$$\hat{y}_s = \mathrm{softmax}\!\left(W_s\, h_{\text{[CLS]}} + b_s\right)$$

for slot $s \in \{1, \dots, 9\}$, with $W_s$ and $b_s$ as learned parameters. The total loss, summed over all slots, is

$$\mathcal{L} = \sum_{s=1}^{9} \mathrm{CE}\!\left(\hat{y}_s,\ y_s\right),$$

where $y_s$ is the one-hot ground truth for slot $s$.
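The nine-head readout can be sketched numerically. This is a minimal NumPy illustration of the per-slot softmax heads and summed cross-entropy, not the paper's code: the embedding width `d_model`, the random initialization, and the function names are assumptions; only the slot class counts come from Section 1.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 64                                 # assumed width; not given in the paper
slot_sizes = [7, 4, 5, 35, 5, 5, 6, 2, 4]    # class counts per slot (Section 1)

def softmax(z):
    z = z - z.max()                          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# One classification head (W_s, b_s) per slot, all reading the same [CLS] vector.
heads = [(rng.normal(size=(k, d_model)) * 0.02, np.zeros(k)) for k in slot_sizes]

def slot_loss(h_cls, targets):
    """Sum of per-slot cross-entropy losses for one utterance.

    h_cls:   the shared [CLS] representation, shape (d_model,)
    targets: gold class index for each of the nine slots
    """
    total = 0.0
    for (W, b), y in zip(heads, targets):
        p = softmax(W @ h_cls + b)           # per-slot class distribution
        total += -np.log(p[y])               # cross-entropy against gold index y
    return total
```

With a zero [CLS] vector each head outputs a uniform distribution, so the loss equals $\sum_s \log |S_s|$, the maximum-entropy baseline the model improves upon during training.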
The model is highly compact: the ONNX export is $433.7$ KB and the complete deployment footprint is $0.74$ MB. Inference achieves $3.02$ ms per utterance on CPU, supporting real-time applications.
4. Data Regime, Training Procedure, and Empirical Performance
Training utilizes a multilingual dataset of utterances distributed across English, Vietnamese, Japanese, and French:
- Train/validation split in which all $35$ condition classes are represented
- Optimization: AdamW (weight decay $0.01$), OneCycleLR with cosine decay and warm-up, $15$ epochs on a single A100 GPU ($20$ min total)
Performance is reported as slot-level accuracy and averaged over all slots:
| Slot | Classes | Accuracy (%) |
|---|---|---|
| event | 7 | 97.8 |
| intent | 4 | 99.2 |
| time | 5 | 99.6 |
| condition | 35 | 99.4 |
| agent | 5 | 99.0 |
| object | 5 | 99.2 |
| location | 6 | 97.9 |
| purpose | 2 | 99.7 |
| modifier | 4 | 99.5 |
| Average | — | 99.03 |
Condition classification reaches $99.4\%$ accuracy across $35$ classes, indicating robust, fine-grained extraction even under high class cardinality. This suggests strong crosslingual generalization from a unified, parameter-efficient model.
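The reported average is the unweighted (macro) mean of the nine slot accuracies from the table, which can be checked directly:

```python
# Slot accuracies from the table above, in row order
# (event, intent, time, condition, agent, object, location, purpose, modifier).
accs = [97.8, 99.2, 99.6, 99.4, 99.0, 99.2, 97.9, 99.7, 99.5]
avg = round(sum(accs) / len(accs), 2)
print(avg)  # 99.03
```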
5. Deterministic Mapping to Signed Gloss Representations
Once extracted, the nine semantic slots are mapped deterministically to a gloss sequence emulating American Sign Language (ASL) topic–comment structure. The output ordering is modifier → time → condition → agent → location → object → event → purpose. The gloss string is constructed by concatenating, in that order, the label $\ell(s)$ of every slot $s$ whose value is not NONE.
For example, the utterance “If it rains tomorrow, I stay home.” is mapped as follows:
- CSF: (STAY, NONE, TOMORROW, IF_RAIN, ME, NONE, HOME, NONE, NONE)
- GLOSS output: TOMORROW IF_RAIN HOME STAY
The conversion is algorithmically defined:
```python
def CSF_to_GLOSS(slots):
    # Fixed topic-comment ordering; "intent" is not emitted in the gloss.
    order = ["modifier", "time", "condition", "agent",
             "location", "object", "event", "purpose"]
    gloss_seq = []
    for s in order:
        if slots[s] != "NONE":
            gloss_seq.append(slots[s])
    return gloss_seq
```
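To make the mapping concrete, here is a self-contained version with a worked example. The slot values are chosen for illustration, and the output follows the order list in the algorithm above literally (every non-NONE slot, including the agent, is emitted).

```python
SLOT_ORDER = ["modifier", "time", "condition", "agent",
              "location", "object", "event", "purpose"]

def csf_to_gloss(slots: dict) -> list:
    """Emit non-NONE slot labels in ASL topic-comment order."""
    return [slots[s] for s in SLOT_ORDER if slots.get(s, "NONE") != "NONE"]

# Illustrative assignment, e.g. "If it's sunny today, I go to the store."
example = {
    "event": "GO", "intent": "NONE", "time": "TODAY", "condition": "IF_SUNNY",
    "agent": "ME", "object": "NONE", "location": "STORE",
    "purpose": "NONE", "modifier": "NONE",
}
print(" ".join(csf_to_gloss(example)))  # TODAY IF_SUNNY ME STORE GO
```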
6. Practicality, Scope, and Significance
By employing fixed, language-neutral semantic primitives, CSF enables direct translation—removing reliance on English as a pivot and bypassing inherent bottlenecks in resource-lean languages. The framework demonstrates:
- Extreme parameter and compute efficiency ($0.74$ MB deployment, $3.02$ ms inference)
- Exceptional crosslingual generalization ($99.03\%$ average slot extraction accuracy, $99.4\%$ condition classification)
- The most comprehensive condition taxonomy yet published for sign language translation (35-class, 8-category schema)
- Applicability in browser-based environments for real-time multimodal accessibility
This design bridges typologically diverse spoken languages and signed languages with a unified, interpretable intermediate representation, directly addressing barriers faced by global Deaf communities in current translation systems that require English mediation (Bao, 5 Jan 2026).