
K-MHaS: A Multi-label Hate Speech Detection Dataset in Korean Online News Comment (2208.10684v3)

Published 23 Aug 2022 in cs.CL and cs.AI

Abstract: Online hate speech detection has become an important issue due to the growth of online content, but resources in languages other than English are extremely limited. We introduce K-MHaS, a new multi-label dataset for hate speech detection that effectively handles Korean language patterns. The dataset consists of 109k utterances from news comments, provides multi-label classification with 1 to 4 labels per utterance, and handles subjectivity and intersectionality. We evaluate strong baselines on K-MHaS using Korean-BERT-based language models with six different metrics. KR-BERT with a sub-character tokenizer outperforms the others, recognizing decomposed characters in each hate speech class.
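
To illustrate what "decomposed characters" can mean for Korean text, the sketch below splits precomposed Hangul syllables into their constituent jamo using standard Unicode arithmetic. It is only an illustration of sub-character decomposition in general; it is not the KR-BERT tokenizer, and the function name `decompose_hangul` is a hypothetical helper introduced here.

```python
# Minimal sketch of Hangul sub-character decomposition (assumption: this
# illustrates the general idea only; it is NOT the KR-BERT tokenizer).

HANGUL_BASE = 0xAC00       # first precomposed syllable, U+AC00
SYLLABLE_COUNT = 11172     # 19 leads * 21 vowels * 28 tails
VOWEL_COUNT = 21
TAIL_COUNT = 28            # 27 final consonants plus "no final consonant"

LEADS = [chr(0x1100 + i) for i in range(19)]          # initial consonants
VOWELS = [chr(0x1161 + i) for i in range(21)]         # medial vowels
TAILS = [""] + [chr(0x11A8 + i) for i in range(27)]   # optional finals


def decompose_hangul(text: str) -> str:
    """Replace each precomposed Hangul syllable with its jamo sequence.

    Characters outside the Hangul syllable block are passed through unchanged.
    """
    pieces = []
    for ch in text:
        offset = ord(ch) - HANGUL_BASE
        if 0 <= offset < SYLLABLE_COUNT:
            lead, rest = divmod(offset, VOWEL_COUNT * TAIL_COUNT)
            vowel, tail = divmod(rest, TAIL_COUNT)
            pieces.append(LEADS[lead] + VOWELS[vowel] + TAILS[tail])
        else:
            pieces.append(ch)
    return "".join(pieces)


if __name__ == "__main__":
    # One syllable becomes three code points: lead, vowel, tail.
    print([hex(ord(c)) for c in decompose_hangul("한")])
```

For precomposed syllables this produces the same result as `unicodedata.normalize("NFD", text)` from the Python standard library; the explicit arithmetic is shown only to make the lead, vowel, and tail structure visible.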

Authors (7)
  1. Jean Lee (10 papers)
  2. Taejun Lim (3 papers)
  3. Heejun Lee (7 papers)
  4. Bogeun Jo (1 paper)
  5. Yangsok Kim (1 paper)
  6. Heegeun Yoon (1 paper)
  7. Soyeon Caren Han (48 papers)
Citations (17)