Papers
Topics
Authors
Recent
2000 character limit reached

Urdu Morphology, Orthography and Lexicon Extraction

Published 6 Apr 2022 in cs.CL | (2204.03071v1)

Abstract: Urdu is a challenging language because of, first, its Perso-Arabic script and second, its morphological system having inherent grammatical forms and vocabulary of Arabic, Persian and the native languages of South Asia. This paper describes an implementation of the Urdu language as a software API, and we deal with orthography, morphology and the extraction of the lexicon. The morphology is implemented in a toolkit called Functional Morphology (Forsberg & Ranta, 2004), which is based on the idea of dealing grammars as software libraries. Therefore this implementation could be reused in applications such as intelligent search of keywords, language training and infrastructure for syntax. We also present an implementation of a small part of Urdu syntax to demonstrate this reusability.

Citations (25)

Summary

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.