Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
119 tokens/sec
GPT-4o
56 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Abusive Language Detection in Heterogeneous Contexts: Dataset Collection and the Role of Supervised Attention (2105.11119v1)

Published 24 May 2021 in cs.CL and cs.AI

Abstract: Abusive language is a massive problem in online social platforms. Existing abusive language detection techniques are particularly ill-suited to comments containing heterogeneous abusive language patterns, i.e., both abusive and non-abusive parts. This is due in part to the lack of datasets that explicitly annotate heterogeneity in abusive language. We tackle this challenge by providing an annotated dataset of abusive language in over 11,000 comments from YouTube. We account for heterogeneity in this dataset by separately annotating both the comment as a whole and the individual sentences that comprise each comment. We then propose an algorithm that uses a supervised attention mechanism to detect and categorize abusive content using multi-task learning. We empirically demonstrate the challenges of using traditional techniques on heterogeneous content and the comparative gains in performance of the proposed approach over state-of-the-art methods.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (6)
  1. Hongyu Gong (44 papers)
  2. Alberto Valido (1 paper)
  3. Katherine M. Ingram (1 paper)
  4. Giulia Fanti (55 papers)
  5. Suma Bhat (28 papers)
  6. Dorothy L. Espelage (1 paper)
Citations (13)

Summary

We haven't generated a summary for this paper yet.