Challenges in Kurdish Text Processing (1212.0074v1)
Published 1 Dec 2012 in cs.IR and cs.CL
Abstract: Despite having a large number of speakers, the Kurdish language is among the less-resourced languages. In this work we highlight the challenges and problems in providing the required tools and techniques for processing texts written in Kurdish. From a high-level perspective, the main challenges are: the inherent diversity of the language, standardization and segmentation issues, and the lack of language resources.