Creating a morphological and syntactic tagged corpus for the Uzbek language (2210.15234v1)
Published 27 Oct 2022 in cs.CL
Abstract: The creation of tagged corpora is becoming one of the most important tasks in NLP. There are not enough tagged corpora to build machine learning models for the low-resource Uzbek language. In this paper, we try to fill that gap by developing a novel Part Of Speech (POS) and syntactic tagset for creating a syntactically and morphologically tagged corpus of the Uzbek language. This work also includes a detailed description and presentation of a web-based application for tagging. Based on the developed annotation tool and software, we share our experience and results from the first stage of the tagged corpus creation.
- Maksud Sharipov (5 papers)
- Jamolbek Mattiev (1 paper)
- Jasur Sobirov (1 paper)
- Rustam Baltayev (1 paper)