SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020) (2006.07235v2)

Published 12 Jun 2020 in cs.CL

Abstract: We present the results and main findings of SemEval-2020 Task 12 on Multilingual Offensive Language Identification in Social Media (OffensEval 2020). The task involves three subtasks corresponding to the hierarchical taxonomy of the OLID schema (Zampieri et al., 2019a) from OffensEval 2019. The task featured five languages: English, Arabic, Danish, Greek, and Turkish for Subtask A. In addition, English also featured Subtasks B and C. OffensEval 2020 was one of the most popular tasks at SemEval-2020 attracting a large number of participants across all subtasks and also across all languages. A total of 528 teams signed up to participate in the task, 145 teams submitted systems during the evaluation period, and 70 submitted system description papers.


Overview of SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval-2020)

The paper presents the results and methodology of SemEval-2020 Task 12, known as OffensEval-2020, which focused on identifying offensive language in social media across multiple languages. Building on OffensEval-2019, the task added four new languages (Arabic, Danish, Greek, and Turkish) alongside English. Annotation followed the hierarchical OLID schema (Zampieri et al., 2019a).

Task Formulation

OffensEval-2020 comprised three subtasks that follow the three levels of OLID's taxonomy. All three were offered for English; Arabic, Danish, Greek, and Turkish featured Subtask A only:

Subtask A: Offensive language identification.
Subtask B: Categorization of offensive posts as targeted or untargeted.
Subtask C: Identification of the offense target: an individual, a group, or another entity.
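The hierarchy implied by the three subtasks can be sketched as a small validity check. This is an illustrative sketch, not code from the task organizers: the label strings (OFF/NOT, TIN/UNT, IND/GRP/OTH) are the ones used by the OLID schema, while the function name `is_valid` and the rule that each level is defined only when the level above allows it are assumptions made for this example.

```python
# Label sets as used in the OLID annotation scheme.
SUBTASK_A = {"OFF", "NOT"}         # offensive / not offensive
SUBTASK_B = {"TIN", "UNT"}         # targeted / untargeted
SUBTASK_C = {"IND", "GRP", "OTH"}  # individual / group / other


def is_valid(a, b=None, c=None):
    """Check that a label triple respects the three-level hierarchy.

    Assumption for this sketch: level B exists only for offensive posts,
    and level C exists only for targeted offensive posts.
    """
    if a not in SUBTASK_A:
        return False
    if a == "NOT":
        # Non-offensive posts carry no lower-level labels.
        return b is None and c is None
    if b not in SUBTASK_B:
        return False
    if b == "UNT":
        # Untargeted offense has no target to identify.
        return c is None
    return c in SUBTASK_C


print(is_valid("OFF", "TIN", "GRP"))  # a fully labeled targeted post
print(is_valid("OFF", "UNT", "GRP"))  # inconsistent: untargeted with a target
```

Encoding the constraint this way makes explicit why Subtasks B and C operate on progressively smaller subsets of the data than Subtask A.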

The task attracted substantial interest, with 528 registered teams, 145 of which submitted results. Moreover, 70 teams contributed system description papers.

Methodological Framework

The datasets included SOLID, a new semi-supervised dataset of over nine million English tweets, and multilingual datasets annotated according to the OLID schema. Participants employed a range of machine learning models, with pre-trained Transformer models such as BERT and its variants playing a central role. Numerous systems also explored cross-lingual approaches that took advantage of the multilingual setup.
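One way a participant might turn semi-supervised labels into training data is to keep only instances that the labeling models agree on with high confidence. The sketch below illustrates that idea under stated assumptions: the field names `avg_conf` and `conf_std`, the thresholds, and the toy examples are all hypothetical and are not taken from the paper or from the SOLID release.

```python
from typing import List, NamedTuple, Tuple


class WeakExample(NamedTuple):
    text: str
    avg_conf: float  # hypothetical: mean confidence that the post is offensive
    conf_std: float  # hypothetical: disagreement between the labeling models


def select_confident(
    examples: List[WeakExample],
    low: float = 0.2,
    high: float = 0.8,
    max_std: float = 0.15,
) -> List[Tuple[str, str]]:
    """Keep examples with a confident, low-variance weak label."""
    selected = []
    for ex in examples:
        if ex.conf_std > max_std:
            continue  # the labeling models disagree too much
        if ex.avg_conf >= high:
            selected.append((ex.text, "OFF"))
        elif ex.avg_conf <= low:
            selected.append((ex.text, "NOT"))
        # Scores between the two thresholds are treated as ambiguous and dropped.
    return selected


# Toy inputs, invented for illustration only.
toy = [
    WeakExample("post one", 0.93, 0.05),
    WeakExample("post two", 0.10, 0.04),
    WeakExample("post three", 0.55, 0.05),
    WeakExample("post four", 0.90, 0.30),
]
print(select_confident(toy))
```

The thresholds trade dataset size against label noise; any real values would need to be tuned on held-out gold-labeled data.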

Results and Performance

Subtask A (English): The best performing model achieved an F1 score of 0.9204, primarily through Transformer ensembles. Notably, the competition saw a tight clustering of high F1 scores, indicating strong overall performance across submissions.
Subtask B (English): The top system achieved an F1 score of 0.7462 using a teacher-student architecture.
Subtask C (English): The leading system achieved an F1 score of 0.7145, also relying on knowledge distillation.
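To make the reported scores concrete, the following is a minimal sketch of macro-averaged F1, assuming the F1 figures above are macro-averaged over classes, as was the official measure in the OffensEval series. The function name `macro_f1` and the toy labels are illustrative and do not come from the shared task's scoring scripts.

```python
def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    per_class = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Toy labels, invented for illustration only.
gold = ["OFF", "NOT", "NOT", "OFF", "NOT"]
pred = ["OFF", "NOT", "OFF", "OFF", "NOT"]
print(round(macro_f1(gold, pred), 4))  # 0.8
```

Because every class contributes equally regardless of its frequency, macro-averaging rewards systems that handle the minority class well, which matters when offensive posts are much rarer than non-offensive ones.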

For the multilingual subtasks:

Arabic, Danish, Greek, Turkish (Subtask A): The best F1 scores were 0.9017 for Arabic, 0.8119 for Danish, 0.8522 for Greek, and 0.8258 for Turkish. Most high-performing teams used Transformer-based models, and the results highlight the potential of multilingual datasets and architectures.

Implications and Future Directions

The strong performance across languages demonstrates the viability of multilingual offensive language identification and the efficacy of using large pre-trained models coupled with domain-specific fine-tuning. The successful integration of multiple languages offers promising avenues for further research into cross-lingual learning and domain transferability.

Future iterations of the task could explore underrepresented languages, the challenge of code-switching, and the dynamics across various social media platforms. Expanding the subtasks for non-English languages and improving datasets' richness and diversity could significantly contribute to advancing multilingual natural language processing.

In conclusion, OffensEval-2020 effectively advanced the field of abusive language detection by leveraging multilingual resources and complex model architectures, setting a foundation for future developments in addressing online offensive content across languages.


Authors (9)

Marcos Zampieri
Preslav Nakov
Sara Rosenthal
Pepa Atanasova
Georgi Karadzhov
Hamdy Mubarak
Leon Derczynski
Zeses Pitenis
Çağrı Çöltekin

