Automatic Identification of Closely-related Indian Languages: Resources and Experiments (1803.09405v1)

Published 26 Mar 2018 in cs.CL

Abstract: In this paper, we discuss an attempt to develop an automatic language identification system for 5 closely-related Indo-Aryan languages of India: Awadhi, Bhojpuri, Braj, Hindi and Magahi. We have compiled comparable corpora of varying length for these languages from various resources. We discuss the method of creation of these corpora in detail. Using these corpora, a language identification system was developed, which currently gives state-of-the-art accuracy of 96.48%. We also used these corpora to study the similarity between the 5 languages at the lexical level, which is the first data-based study of the extent of closeness of these languages.

Authors (7)
  1. Ritesh Kumar (42 papers)
  2. Bornini Lahiri (5 papers)
  3. Deepak Alok (3 papers)
  4. Atul Kr. Ojha (19 papers)
  5. Mayank Jain (14 papers)
  6. Abdul Basit (31 papers)
  7. Yogesh Dawer (3 papers)
Citations (31)
