Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
119 tokens/sec
GPT-4o
56 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Galileo at SemEval-2020 Task 12: Multi-lingual Learning for Offensive Language Identification using Pre-trained Language Models (2010.03542v1)

Published 7 Oct 2020 in cs.CL

Abstract: This paper describes Galileo's performance in SemEval-2020 Task 12 on detecting and categorizing offensive language in social media. For Offensive Language Identification, we proposed a multi-lingual method using Pre-trained LLMs, ERNIE and XLM-R. For offensive language categorization, we proposed a knowledge distillation method trained on soft labels generated by several supervised models. Our team participated in all three sub-tasks. In Sub-task A - Offensive Language Identification, we ranked first in terms of average F1 scores in all languages. We are also the only team which ranked among the top three across all languages. We also took the first place in Sub-task B - Automatic Categorization of Offense Types and Sub-task C - Offence Target Identification.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (4)
  1. Shuohuan Wang (30 papers)
  2. Jiaxiang Liu (39 papers)
  3. Xuan Ouyang (9 papers)
  4. Yu Sun (226 papers)
Citations (33)

Summary

We haven't generated a summary for this paper yet.