Contemporary Amharic Corpus: Automatically Morpho-Syntactically Tagged Amharic Corpus
Published 14 Jun 2021 in cs.CL | (2106.07241v1)
Abstract: We introduce the Contemporary Amharic Corpus, which is automatically tagged for morpho-syntactic information. The texts come from 25,199 documents across different domains, and about 24 million orthographic words are tokenized. Because it is partly a web corpus, we applied some automatic spelling error correction. We also modified the existing morphological analyzer, HornMorpho, so that it can be used for the automatic tagging.
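
To make the notion of an "orthographic word" concrete, here is a minimal Python sketch of one way Amharic text could be split into such tokens. It is not the authors' tokenizer: the function name `tokenize` and the choice of separators (whitespace, the Ethiopic punctuation marks U+1361 to U+1368, and a few ASCII punctuation marks) are assumptions made for illustration only.

```python
import re

# Assumption: Ethiopic punctuation U+1361..U+1368 (wordspace, full stop,
# comma, semicolon, colon, preface colon, question mark, paragraph separator)
# never belongs to a word.
ETHIOPIC_PUNCT = "".join(chr(c) for c in range(0x1361, 0x1369))

# Assumption: these ASCII marks also act as separators in web text.
ASCII_PUNCT = ".,;:!?()[]\"'"

# Escape the separators so they are safe inside a regex character class.
SEPARATORS = re.escape(ETHIOPIC_PUNCT + ASCII_PUNCT)

# A token is a maximal run of characters that are neither whitespace
# nor one of the separators above.
TOKEN_RE = re.compile(rf"[^\s{SEPARATORS}]+")


def tokenize(text: str) -> list:
    """Return the orthographic words in `text` (hypothetical helper)."""
    return TOKEN_RE.findall(text)


def count_words(documents) -> int:
    """Count orthographic words over an iterable of document strings."""
    return sum(len(tokenize(doc)) for doc in documents)
```

For example, `tokenize("ሰላም፡ዓለም።")` yields the two words `ሰላም` and `ዓለም`, because the Ethiopic wordspace and full stop are treated as separators. A real pipeline would also need the spelling normalization and morphological tagging steps the abstract mentions, which this sketch does not attempt.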