
Arabic Offensive Language on Twitter: Analysis and Experiments (2004.02192v3)

Published 5 Apr 2020 in cs.CL

Abstract: Detecting offensive language on Twitter has many applications ranging from detecting/predicting bullying to measuring polarization. In this paper, we focus on building a large Arabic offensive tweet dataset. We introduce a method for building a dataset that is not biased by topic, dialect, or target. We produce the largest Arabic dataset to date with special tags for vulgarity and hate speech. We thoroughly analyze the dataset to determine which topics, dialects, and genders are most associated with offensive tweets and how Arabic speakers use offensive language. Lastly, we conduct many experiments to produce strong results (F1 = 83.2) on the dataset using state-of-the-art (SOTA) techniques.

Authors (5)
  1. Hamdy Mubarak (34 papers)
  2. Ammar Rashed (3 papers)
  3. Kareem Darwish (35 papers)
  4. Younes Samih (11 papers)
  5. Ahmed Abdelali (21 papers)
Citations (141)