Feature-Rich Part-of-speech Tagging for Morphologically Complex Languages: Application to Bulgarian (1911.11503v1)
Published 26 Nov 2019 in cs.CL and cs.AI
Abstract: We present experiments with part-of-speech tagging for Bulgarian, a Slavic language with rich inflectional and derivational morphology. Unlike most previous work, which has used a small number of grammatical categories, we work with 680 morpho-syntactic tags. We combine a large morphological lexicon with prior linguistic knowledge and guided learning from a POS-annotated corpus, achieving an accuracy of 97.98%, which is a significant improvement over the state of the art for Bulgarian.
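One way to picture how a morphological lexicon can help with a tagset this large is to let it restrict the candidate tags for each word before a learned scorer picks among them. The sketch below is only an illustration of that general idea, not the paper's implementation: the lexicon entries, the tag names, and the functions `candidate_tags` and `tag_sentence` are all hypothetical, and a plain greedy left-to-right loop stands in for guided learning.

```python
from typing import Callable, Dict, List, Set

# Hypothetical toy lexicon: word form -> tags the lexicon allows for it.
# The forms and tag names are placeholders, not the paper's tagset.
LEXICON: Dict[str, Set[str]] = {
    "form_a": {"TAG_X", "TAG_Y"},
    "form_b": {"TAG_Z"},
}

# Hypothetical full tagset (the paper works with 680 tags; three are used here).
ALL_TAGS: Set[str] = {"TAG_X", "TAG_Y", "TAG_Z"}

# A scorer takes (words, position, tag) and returns a number; higher is better.
Scorer = Callable[[List[str], int, str], float]


def candidate_tags(word: str) -> Set[str]:
    """Return the tags the lexicon allows, or the full tagset for unknown words."""
    return LEXICON.get(word.lower(), ALL_TAGS)


def tag_sentence(words: List[str], score: Scorer) -> List[str]:
    """Greedy tagging: at each position, pick the best-scoring allowed tag.

    This is a simplification; it is not guided learning.
    """
    tags: List[str] = []
    for i, word in enumerate(words):
        # Sort candidates so that ties are broken deterministically.
        candidates = sorted(candidate_tags(word))
        tags.append(max(candidates, key=lambda tag: score(words, i, tag)))
    return tags
```

In this sketch the lexicon acts purely as a filter: a known form can only receive a tag the lexicon lists for it, while an unknown form falls back to the full tagset and relies entirely on the scorer.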