
Camlang: Constructed Language for Evaluation

Updated 27 January 2026
  • Camlang is a constructed language characterized by a rich, agglutinative grammar and a controlled, leakage-free evaluation environment.
  • It features an explicit grammar book and bilingual English–Camlang dictionary that enable systematic analysis of morphological, syntactic, and semantic processing.
  • Camlang serves as a diagnostic tool for comparing human and AI metalinguistic reasoning, highlighting limitations in current language model performance.

Camlang is a constructed, typologically grounded language devised to evaluate metalinguistic reasoning in humans and LLMs under tightly controlled, leakage-free conditions. Developed for the diagnostic study of explicit linguistic rule learning, Camlang provides a morphosyntactically rich, typologically plausible system combining features found across multiple real-world languages yet in an unattested configuration. Its core design consists of an explicit grammar book and a comprehensive bilingual English–Camlang dictionary, enabling controlled, resource-bound evaluation of parsing, lexical inference, and deductive language acquisition in both humans and computational models (Liu et al., 30 Aug 2025).

1. Syntactic Structure and Clause Formation

Camlang's syntax is defined by head-final, polysynthetic, pro-drop characteristics with a canonical SOV (subject-object-verb) word order. The clause structure is formally specified by a BNF-style grammar, capturing both hierarchical phrase structure and morphological processes:

$\mathsf{S} \;\to\;\bigl[\mathsf{TopP}\bigr]\,\bigl[\mathsf{FocP}\bigr]\,\mathsf{VP}$

Where the verbal phrase (VP) is further structured as:

$\mathsf{VP}\;\to\;\mathsf{SubjP}\;\mathsf{ObjP}\;\mathsf{VFin}$

$\mathsf{SubjP}\;\to\;\bigl[\mathsf{PrClitic}\bigr]\;\mathsf{(NP)} \qquad \mathsf{ObjP}\;\to\;\bigl[\mathsf{PrClitic}\bigr]\;\mathsf{(NP)}$

Topicalization and focus are realized by dedicated proclitics (e.g., =átA for topic, =gIpA for focus) and left-dislocated noun phrases. In main finite clauses, subject agreement is obligatorily encoded as a verbal suffix, while non-subject arguments surface through person–case proclitics (e.g., =li '2 sg.nom', =mI 'wh–acc').
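
The phrase-structure rules in this section can be written down as plain data. The following Python sketch encodes them as a dictionary and enumerates the category sequences they license. The rule encoding, the reading of both `[X]` and `(X)` as "optional", and the function name `expand` are illustrative assumptions, not part of the Camlang grammar book.

```python
from itertools import product

# Each rule maps a category to a sequence of (symbol, optional) pairs.
# Assumption: both [X] and (X) in the rules mean "X may be omitted".
RULES = {
    "S":     [("TopP", True), ("FocP", True), ("VP", False)],
    "VP":    [("SubjP", False), ("ObjP", False), ("VFin", False)],
    "SubjP": [("PrClitic", True), ("NP", True)],
    "ObjP":  [("PrClitic", True), ("NP", True)],
}

def expand(symbol):
    """Return every distinct category sequence derivable from `symbol`."""
    if symbol not in RULES:
        # No rewrite rule in this fragment: treat the symbol as a leaf.
        return [(symbol,)]
    # Collect the alternatives for each slot, adding the empty option
    # when the slot is optional.
    slot_options = []
    for child, optional in RULES[symbol]:
        options = expand(child)
        if optional:
            options = [()] + options
        slot_options.append(options)
    # Pick one alternative per slot and concatenate left to right.
    results = []
    for choice in product(*slot_options):
        seq = tuple(sym for part in choice for sym in part)
        if seq not in results:
            results.append(seq)
    return results

vp_sequences = expand("VP")
print(len(vp_sequences))                               # 12
print(all(seq[-1] == "VFin" for seq in expand("S")))   # True
```

Under these assumptions every licensed sequence ends in `VFin`, which is the head-final, verb-last order the rules describe, and the bare sequence `("VFin",)` is licensed because both argument phrases may be empty.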

2. Morphology, Inflection, and Derivation

Camlang exemplifies agglutinative morphology with both affixal and clitic exponence. The nominal and verbal domains exhibit regular, productive paradigms, interacting to express a range of grammatical features:

  • Nominal Inflection:

$\mathsf{NP}\;\to\;\mathsf{Stem}_\mathrm{N}\;(\mathsf{Num})\;(\mathsf{Def})\;(\mathsf{Case})$

$\mathsf{Num}\to\texttt{-lAr}\;(\text{plural})\qquad \mathsf{Def}\to\texttt{-ʔA}\;(\text{definite})\qquad \mathsf{Case}\to\texttt{-nI}\;(\text{acc})\mid\texttt{-s}\;(\text{gen})\mid\dots$

  • Verbal Inflection:

Finite verb forms include valency prefixes, verb stems, tense-aspect-mood (TAM) affixes, and agreement suffixes. For instance:

$\mathsf{VFin}\;\to\;\text{(valency-prefix)}\;\mathsf{V\_stem}\;\mathsf{(TAM)}\;\mathsf{AGR}$

$\mathsf{TAM}\;\to\;\texttt{-mA}^{4}\;(\text{prog})\;\mid\;\texttt{-DyI}\;(\text{past})\;\mid\dots$

$\mathsf{AGR}\;\to\;\texttt{-Ø}\;(\text{3sg})\;\mid\;\texttt{-kI}\;(\text{2sg})$

Examples include $\textit{müś}+\text{-m}\to\text{müśm}$ ("to live"), $\textit{e-m-mA}^{4}$ ("be eating"), and $\textit{e-m-Ø}$ ("he eats").

An irregular pattern is seen in the verb "get" ("play (sports)"), which forms its plural noun by full reduplication ($\textit{get} \rightarrow \textit{getget}$).
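
The nominal and verbal templates in this section are fixed-order slot systems, so they can be illustrated with simple concatenation. The Python sketch below fills the slots with the exponents listed in the rules and joins them with hyphens as an underlying segmentation. It applies no vowel harmony or other phonology, keeps the capital letters in the suffixes exactly as written, covers only the exponents named here, and uses hypothetical function names (`build_np`, `build_vfin`).

```python
# Exponents copied from the rules in this section; the alternatives
# elided with "..." in the source are not filled in.
NUM = {"pl": "lAr"}
DEF = {"def": "ʔA"}
CASE = {"acc": "nI", "gen": "s"}
TAM = {"prog": "mA4", "past": "DyI"}
AGR = {"3sg": "Ø", "2sg": "kI"}

def build_np(stem, num=None, definite=False, case=None):
    """Fill the template Stem (Num) (Def) (Case) in that fixed order."""
    morphs = [stem]
    if num is not None:
        morphs.append(NUM[num])
    if definite:
        morphs.append(DEF["def"])
    if case is not None:
        morphs.append(CASE[case])
    return "-".join(morphs)

def build_vfin(stem, agr, tam=None, valency_prefix=None):
    """Fill the template (valency-prefix) V_stem (TAM) AGR; AGR is required."""
    morphs = []
    if valency_prefix is not None:
        morphs.append(valency_prefix)
    morphs.append(stem)
    if tam is not None:
        morphs.append(TAM[tam])
    morphs.append(AGR[agr])
    return "-".join(morphs)

print(build_np("nos", num="pl", definite=True, case="acc"))  # nos-lAr-ʔA-nI
print(build_vfin("jer", agr="3sg"))                          # jer-Ø
```

The outputs are template instantiations for illustration only. Surface forms would additionally depend on the phonological rules of the grammar book, and irregular items such as the reduplicating "get" would need their own entries.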

3. Lexicon and Semantic Features

The lexicon is engineered with explicit one-to-one mappings for most entries, excepting select calques, loans, and compounds. The semantic distinction between polysemous entries is rigorously maintained, as in the contrast of "ąt" ("play-with-toys") versus "get" ("play sports/instrument").

| headword | PoS | gloss |
|---|---|---|
| nos | n. | child; kid |
| müś | v. | live; survive |
| jer | v. | want |
| me | pro. | what (wh-word) |
| fa | v. | do; make |
| xöt | v. | must; need |
| ąt | n. | play (with toys) |
| get | v. | play (sports/instr.) |
| wec | v. | ask (someone GEN something ACC) |
| cewmyl | n. | answer |

All stems retain transparent morphotactic boundaries, supporting compositional derivation and inflection.
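
A dictionary of this kind maps naturally onto a small lookup structure. The Python sketch below loads the sample entries from this section and shows both directions of lookup: from a Camlang headword to its entry, and from an English word to every headword whose gloss mentions it. The `Entry` class and the function names are hypothetical, and the actual format of the Camlang dictionary may differ.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    headword: str
    pos: str
    gloss: str

# Sample entries as listed in this section.
LEXICON = [
    Entry("nos", "n.", "child; kid"),
    Entry("müś", "v.", "live; survive"),
    Entry("jer", "v.", "want"),
    Entry("me", "pro.", "what (wh-word)"),
    Entry("fa", "v.", "do; make"),
    Entry("xöt", "v.", "must; need"),
    Entry("ąt", "n.", "play (with toys)"),
    Entry("get", "v.", "play (sports/instr.)"),
    Entry("wec", "v.", "ask (someone GEN something ACC)"),
    Entry("cewmyl", "n.", "answer"),
]

BY_HEADWORD = {entry.headword: entry for entry in LEXICON}

def lookup(headword):
    """Return the entry for a Camlang headword, or None if it is unlisted."""
    return BY_HEADWORD.get(headword)

def headwords_for(english_word):
    """Return headwords whose gloss contains `english_word` as a whole word."""
    target = english_word.lower()
    return [
        entry.headword
        for entry in LEXICON
        if target in re.findall(r"[a-z]+", entry.gloss.lower())
    ]

print(lookup("jer").gloss)      # want
print(headwords_for("play"))    # ['ąt', 'get']
```

The reverse lookup for "play" returns two headwords, which is the sense split described above: an English word alone does not pick out a Camlang entry, so the parenthesized sense in the gloss has to be consulted.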

4. Phonological and Orthographic Structure

Camlang implements a strict (C)V(C) syllable template, enforced by epenthesis to break vowel clusters. The vowel inventory consists of eight vowels with [±back] Turkic-style harmony; unstressed vowels reduce to a central mid position. The consonantal inventory includes 50 phonemes—markedly symmetric across place and manner classes (four places, stops/fricatives/nasals). Consonantal aspiration is morphophonologically triggered at clitic junctures. Orthography maps directly to phonemic representations, with tone and syllabification left unmarked.

5. Illustrative Sentences and Grammatical Analysis

Sample sentences provide insight into morphosyntactic and semantic compositionality:

  • “lichéwcymyşür”:

li=x=cew-RED-mA4-s==jUr
2sg.nom=EZ=answer-RED-PROG-NMLS-GEN=at
"when you are answering"

  • “nosáttä mëśjer. meni myvá xöt?”:

nos=átA müś-m=jer-Ø meni mI=fa-m=xöt-Ø?
child=TOP live-INF=want-3sg what WH.ACC=do-INF=need-3sg
"A child wants to live. What does he need to do?"

  • “wecmylírsi irwéč xöt.”:

wecmyl-ir-si irwéč-Ø xöt-Ø
question-GEN-PL ask-3pl=ACC need-3sg
"They need to ask questions."

The segmentation and glossing annotate cliticization, derivational morphology, and core arguments.
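
Reading a glossed example amounts to pairing each segmented morph with its label. The Python sketch below does this for the first clause of the second example, assuming the common interlinear convention that "=" marks a clitic juncture, "-" an affix boundary, and spaces separate words. The function name `align` is hypothetical, and the count checks are a design choice for catching misaligned glosses rather than anything prescribed by the source.

```python
import re

# Assumed convention: "=" for clitic junctures, "-" for affix boundaries.
BOUNDARY = re.compile(r"[=-]")

def align(segmentation, gloss):
    """Pair each morph with its gloss label, word by word.

    Raises ValueError when the two lines differ in word count or when a
    word has a different number of morphs and labels.
    """
    seg_words = segmentation.split()
    gloss_words = gloss.split()
    if len(seg_words) != len(gloss_words):
        raise ValueError("segmentation and gloss differ in word count")
    aligned = []
    for seg_word, gloss_word in zip(seg_words, gloss_words):
        morphs = BOUNDARY.split(seg_word)
        labels = BOUNDARY.split(gloss_word)
        if len(morphs) != len(labels):
            raise ValueError(f"morph/label count mismatch in {seg_word!r}")
        aligned.append(list(zip(morphs, labels)))
    return aligned

pairs = align("nos=átA müś-m=jer-Ø", "child=TOP live-INF=want-3sg")
for word in pairs:
    print(word)
# [('nos', 'child'), ('átA', 'TOP')]
# [('müś', 'live'), ('m', 'INF'), ('jer', 'want'), ('Ø', '3sg')]
```

Labels that fuse several features with a period, such as `WH.ACC`, stay intact because only "=" and "-" are treated as boundaries.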

6. Applications: Diagnostic Evaluation for Metalinguistic Reasoning

Camlang is operationalized in controlled evaluation protocols, as exemplified by Camlang-CSQA-v0, a direct translation of 47 CommonsenseQA items into Camlang, with each question accompanied by six Camlang answer choices. Participants and systems are given only the grammar book and dictionary; data leakage and prior exposure are explicitly excluded. Completing the task involves:

  • Morphosyntactic parsing of the Camlang input (with segmentation, affix parsing, head-final dependency analysis)
  • Morpheme-level lexical look-up to abstract meanings
  • Projection into semantic/pragmatic inference to resolve the underlying world-knowledge query

Human subjects attain ~87% exact match (EM) accuracy using only the designated resources. State-of-the-art LLMs (e.g., GPT-5) achieve 47% EM (in contrast to 98% EM for the English CommonsenseQA baseline), with most system successes traceable to partial lexical matching rather than full-rule compositional parsing or semantic generalization (Liu et al., 30 Aug 2025). This suggests Camlang exposes limitations in current LLM architectures concerning explicit, deductive, metalinguistic reasoning.
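
Exact match here is the fraction of items for which the selected answer equals the gold answer. The Python sketch below computes it over parallel lists of answer labels. The labels and predictions are made-up placeholders, not data from the benchmark, and the function name `exact_match` is an assumption.

```python
def exact_match(predictions, gold):
    """Return the fraction of positions where prediction equals gold."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot score an empty item set")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)

# Hypothetical four-item run: three of four answers match.
gold_labels = ["A", "C", "B", "E"]
predicted = ["A", "C", "D", "E"]
print(exact_match(predicted, gold_labels))  # 0.75

# With 47 items, one item moves the score by about 2.13 percentage points.
points_per_item = round(100 / 47, 2)
print(points_per_item)  # 2.13
```

Because the item set is small, each answer carries roughly two percentage points, which is worth keeping in mind when comparing scores on a 47-item set.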

7. Significance and Research Impact

Camlang provides a leakage-free, cognitively realistic paradigm for probing the capacity of AI and human subjects to learn unseen grammars via explicit rule induction and resource-bounded reasoning. By decoupling morphosyntax, lexicon, and domain inference, the language design enables fine-grained error analysis distinguishing morphosyntactic, lexical, and reasoning failures. The Camlang-CSQA benchmark establishes a new frontier for evaluating progress in metalinguistic competence, distinct from memorized pattern matching on natural languages (Liu et al., 30 Aug 2025).

A plausible implication is that research on language learning, artificial grammar induction, and human–machine comparison will benefit from the continued expansion and refinement of Camlang-style evaluation suites, enabling rigorous assessment of the mechanisms underpinning metalinguistic generalization and reasoning.

