
ParsiNLU: A Suite of Language Understanding Challenges for Persian (2012.06154v2)

Published 11 Dec 2020 in cs.CL and cs.AI

Abstract: Despite the progress made in recent years in addressing natural language understanding (NLU) challenges, most of this progress remains concentrated on resource-rich languages like English. This work focuses on Persian, a widely spoken language for which few NLU datasets are available. High-quality evaluation datasets are necessary for reliably assessing progress on different NLU tasks and domains. We introduce ParsiNLU, the first benchmark in the Persian language that includes a range of high-level tasks, including Reading Comprehension and Textual Entailment, among others. These datasets are collected in a multitude of ways, often involving manual annotations by native speakers. This results in over 14.5k new instances across 6 distinct NLU tasks. We also present the first results of state-of-the-art monolingual and multilingual pre-trained language models on this benchmark and compare them with human performance, which provides valuable insights into our ability to tackle natural language understanding challenges in Persian. We hope ParsiNLU fosters further research and advances in Persian language understanding.

Authors (25)
  1. Daniel Khashabi (83 papers)
  2. Arman Cohan (121 papers)
  3. Siamak Shakeri (29 papers)
  4. Pedram Hosseini (12 papers)
  5. Pouya Pezeshkpour (25 papers)
  6. Malihe Alikhani (50 papers)
  7. Moin Aminnaseri (3 papers)
  8. Marzieh Bitaab (2 papers)
  9. Faeze Brahman (47 papers)
  10. Sarik Ghazarian (13 papers)
  11. Mozhdeh Gheini (8 papers)
  12. Arman Kabiri (3 papers)
  13. Rabeeh Karimi Mahabadi (9 papers)
  14. Omid Memarrast (5 papers)
  15. Ahmadreza Mosallanezhad (10 papers)
  16. Erfan Noury (3 papers)
  17. Shahab Raji (4 papers)
  18. Mohammad Sadegh Rasooli (15 papers)
  19. Sepideh Sadeghi (2 papers)
  20. Erfan Sadeqi Azer (11 papers)
Citations (38)