
Discriminating between Indo-Aryan Languages Using SVM Ensembles (1807.03108v1)

Published 9 Jul 2018 in cs.CL

Abstract: In this paper we present a system based on SVM ensembles trained on characters and words to discriminate between five similar languages of the Indo-Aryan family: Hindi, Braj Bhasha, Awadhi, Bhojpuri, and Magahi. We investigate the performance of individual features and combine the output of single classifiers to maximize performance. The system competed in the Indo-Aryan Language Identification (ILI) shared task organized within the VarDial Evaluation Campaign 2018. Our best entry in the competition, named ILIdentification, achieved an F1 score of 88.95% and was ranked 3rd out of 8 teams.
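
The two ingredients named in the abstract, character- and word-level features and the combination of single-classifier outputs, can be illustrated with a minimal sketch. The abstract does not state the n-gram orders or the fusion rule, so the choices below (overlapping character n-grams, whitespace-separated word unigrams, plurality voting) are assumptions, and the per-classifier predictions are stand-in labels rather than the output of trained SVMs.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count overlapping character n-grams in text (n is an assumed setting)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_unigrams(text):
    """Count whitespace-separated word tokens in text."""
    return Counter(text.split())


def plurality_vote(predictions):
    """Fuse one predicted label per classifier into a single label.

    Assumed fusion rule: the most frequent label wins; ties go to the
    label that appears first in the predictions list.
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# Hypothetical outputs of three single-feature classifiers for one sentence.
single_outputs = ["Hindi", "Bhojpuri", "Hindi"]
fused_label = plurality_vote(single_outputs)
```

In a full system each feature type would feed its own SVM, and the fusion step would operate on those classifiers' real predictions or decision scores.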

Authors (5)
  1. Alina Maria Ciobanu (5 papers)
  2. Marcos Zampieri (94 papers)
  3. Shervin Malmasi (40 papers)
  4. Santanu Pal (21 papers)
  5. Liviu P. Dinu (23 papers)
Citations (9)
