
Hitachi at SemEval-2020 Task 12: Offensive Language Identification with Noisy Labels using Statistical Sampling and Post-Processing (2005.00295v1)

Published 1 May 2020 in cs.CL and cs.LG

Abstract: In this paper, we present our participation in SemEval-2020 Task 12, Subtask A (English), which focuses on offensive language identification from noisy labels. To this end, we developed a hybrid system: a BERT classifier trained on tweets selected with a Statistical Sampling Algorithm (SA), combined with Post-Processing (PP) based on an offensive wordlist. Our system achieved 34th position with a macro-averaged F1-score (Macro-F1) of 0.90913 over the offensive and non-offensive classes. We further present comprehensive results and an error analysis to assist future research on offensive language identification with noisy labels.
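
For readers unfamiliar with the metric named in the abstract, the following is a minimal sketch of how a macro-averaged F1-score over two classes can be computed. It is an illustration only, not the authors' evaluation code: the function name `macro_f1`, the label strings `"OFF"` and `"NOT"`, and the toy data are assumptions made for this example.

```python
def macro_f1(y_true, y_pred, labels=("OFF", "NOT")):
    """Unweighted mean of the per-class F1 scores.

    The label names are assumed for illustration; any hashable
    class identifiers work.
    """
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Guard against division by zero when a class is never
        # predicted or never present.
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        f1_scores.append(f1)
    # Each class contributes equally, regardless of its frequency.
    return sum(f1_scores) / len(f1_scores)


# Toy data (hypothetical, not taken from the paper).
gold = ["OFF", "OFF", "NOT", "NOT"]
pred = ["OFF", "NOT", "NOT", "NOT"]
score = macro_f1(gold, pred)
```

Because every class carries the same weight in the average, a system cannot score well on Macro-F1 by favoring the majority class alone, which is presumably why the metric is reported over both the offensive and non-offensive classes.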

Authors (6)
  1. Manikandan Ravikiran (9 papers)
  2. Amin Ekant Muljibhai (1 paper)
  3. Toshinori Miyoshi (1 paper)
  4. Hiroaki Ozaki (8 papers)
  5. Yuta Koreeda (9 papers)
  6. Sakata Masayuki (1 paper)
Citations (7)