Evaluating Automatically Generated Orthographies

Develop formal evaluation metrics and procedures for assessing the orthographies produced by the IASC Orthography module, which maps IPA phonemic transcriptions to graphemes in a chosen script. The goal is to determine whether the resulting phoneme–grapheme correspondences are reasonable and typologically plausible, not merely internally consistent one-to-one mappings.

Background

The IASC pipeline generates a segmental writing system by asking an LLM to produce a program that maps IPA phonemes to graphemes in an existing script (e.g., Latin, Cyrillic, Arabic, Greek). The resulting systems are typically shallow, one-to-one phoneme–grapheme mappings, though models sometimes make creative choices that align with particular orthographic traditions.
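To make "shallow, one-to-one mapping" concrete, here is a minimal Python sketch of what such a generated program might look like, together with a check for internal consistency. The phoneme inventory, the grapheme choices, and the function names are illustrative assumptions, not output from IASC or part of its code.

```python
# Hypothetical toy mapping from IPA phonemes to Latin-script graphemes.
# The inventory and grapheme choices are assumptions for illustration only.
PHONEME_TO_GRAPHEME = {
    "p": "p", "t": "t", "k": "k",
    "ʃ": "sh", "ŋ": "ng",
    "a": "a", "i": "i", "u": "u",
}


def transliterate(phonemes, mapping):
    """Spell a sequence of IPA phonemes by looking each one up in the mapping."""
    return "".join(mapping[p] for p in phonemes)


def is_one_to_one(mapping):
    """Return True if no two phonemes share a grapheme (the mapping is injective)."""
    graphemes = list(mapping.values())
    return len(graphemes) == len(set(graphemes))


word = ["ʃ", "a", "ŋ", "i"]
spelling = transliterate(word, PHONEME_TO_GRAPHEME)
consistent = is_one_to_one(PHONEME_TO_GRAPHEME)
```

A check like `is_one_to_one` only confirms that the mapping is consistent. It says nothing about whether spelling /ʃ/ as "sh" is a plausible choice for the script, which is the evaluation question this problem raises.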

The authors note that while such orthographies are consistent, real-world writing systems often reflect historical change, conventional idiosyncrasies, and cross-linguistic variation. Consequently, the central unresolved issue is how to objectively evaluate the quality and plausibility of the generated orthographies beyond internal consistency.

References

One outstanding question is how one evaluates the orthography. The rules generated by the system are consistent in that it produces one-for-one mappings between phonemes and graphemes, leading to a very shallow orthography. But are the mappings reasonable?

— IASC: Interactive Agentic System for ConLangs (2510.07591 - Taguchi et al., 8 Oct 2025) in Subsection "Orthography" (within Section "Stages of ConLang Construction")
