2000 character limit reached
Orthographic Syllable as basic unit for SMT between Related Languages
Published 3 Oct 2016 in cs.CL | (1610.00634v1)
Abstract: We explore the use of the orthographic syllable, a variable-length consonant-vowel sequence, as a basic unit of translation between related languages which use abugida or alphabetic scripts. We show that orthographic syllable level translation significantly outperforms models trained over other basic units (word, morpheme and character) when training over small parallel corpora.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.
Top Community Prompts
Collections
Sign up for free to add this paper to one or more collections.