Improving Natural Language Inference in Arabic using Transformer Models and Linguistically Informed Pre-Training (2307.14666v1)

Published 27 Jul 2023 in cs.CL

Abstract: This paper addresses the classification of Arabic text data in the field of NLP, with a particular focus on Natural Language Inference (NLI) and Contradiction Detection (CD). Arabic is a resource-poor language: few data sets are available, which limits the availability of NLP methods for it. To overcome this limitation, we create a dedicated data set from publicly available resources. We then train and evaluate transformer-based machine learning models. We find that a language-specific model (AraBERT) performs competitively with state-of-the-art multilingual approaches when we apply linguistically informed pre-training methods such as Named Entity Recognition (NER). To our knowledge, this is the first large-scale evaluation of this task in Arabic, as well as the first application of multi-task pre-training in this context.
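To make the multi-task idea concrete, below is a minimal sketch of a shared AraBERT encoder with two heads: a sequence-level NLI classifier and a token-level NER tagger. The checkpoint name, label counts, head design, and example sentences are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch: shared encoder with NLI and NER heads, assuming a
# HuggingFace AraBERT checkpoint. Hyperparameters and label counts are
# assumptions for illustration, not the paper's reported setup.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "aubmindlab/bert-base-arabertv2"  # assumed AraBERT checkpoint


class MultiTaskAraBERT(nn.Module):
    def __init__(self, num_nli_labels=3, num_ner_tags=9):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(MODEL_NAME)
        hidden = self.encoder.config.hidden_size
        # Sequence-level head: entailment / neutral / contradiction.
        self.nli_head = nn.Linear(hidden, num_nli_labels)
        # Token-level head: NER tag set size is an assumption.
        self.ner_head = nn.Linear(hidden, num_ner_tags)

    def forward(self, task, **inputs):
        out = self.encoder(**inputs)
        if task == "nli":
            # Classify the [CLS] representation of the sentence pair.
            return self.nli_head(out.last_hidden_state[:, 0])
        # Tag every token for the NER auxiliary task.
        return self.ner_head(out.last_hidden_state)


# Toy usage: a contradicting Arabic premise/hypothesis pair for the NLI head.
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = MultiTaskAraBERT()
enc = tokenizer("الجو جميل اليوم", "الطقس سيء اليوم",
                return_tensors="pt", truncation=True)
logits = model("nli", **enc)  # shape: (1, 3)
```

In a multi-task pre-training loop one would alternate batches between the NER and NLI objectives so the shared encoder absorbs the linguistic signal from both; the scheduling scheme here is likewise an assumption.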

Authors (5)
  1. Mohammad Majd Saad Al Deen
  2. Maren Pielka
  3. Jörn Hees
  4. Bouthaina Soulef Abdou
  5. Rafet Sifa