
IndicXNLI: Evaluating Multilingual Inference for Indian Languages (2204.08776v1)

Published 19 Apr 2022 in cs.CL and cs.AI

Abstract: While Indic NLP has made rapid advances recently in terms of the availability of corpora and pre-trained models, benchmark datasets on standard NLU tasks are limited. To this end, we introduce IndicXNLI, an NLI dataset for 11 Indic languages. It was created by high-quality machine translation of the original English XNLI dataset, and our analysis attests to the quality of IndicXNLI. By finetuning different pre-trained LMs on IndicXNLI, we analyze various cross-lingual transfer techniques with respect to the impact of the choice of LMs, languages, multi-linguality, mixed-language input, etc. These experiments provide useful insights into the behaviour of pre-trained models for a diverse set of languages.

Authors (3)
  1. Divyanshu Aggarwal (9 papers)
  2. Vivek Gupta (75 papers)
  3. Anoop Kunchukuttan (45 papers)
Citations (24)
