
Trawling for Trolling: A Dataset (2008.00525v1)

Published 2 Aug 2020 in cs.CY and cs.SI

Abstract: The ability to accurately detect and filter offensive content automatically is important to ensure a rich and diverse digital discourse. Trolling is a type of hurtful or offensive content that is prevalent in social media but underrepresented in datasets for offensive content detection. In this work, we present a dataset that models trolling as a subcategory of offensive content. The dataset was created by collecting samples from well-known datasets and reannotating them according to precise definitions of different categories of offensive content. The dataset has 12,490 samples, split across 5 classes: Normal, Profanity, Trolling, Derogatory, and Hate Speech. It encompasses content from Twitter, Reddit, and Wikipedia Talk Pages. Models trained on our dataset show appreciable performance without any significant hyperparameter tuning and can potentially learn meaningful linguistic information effectively. We find that these models are sensitive to data ablation, which suggests that the dataset is largely devoid of spurious statistical artefacts that could otherwise distract and confuse classification models.

Authors (6)
  1. Hitkul (4 papers)
  2. Karmanya Aggarwal (4 papers)
  3. Pakhi Bamdev (4 papers)
  4. Debanjan Mahata (25 papers)
  5. Rajiv Ratn Shah (108 papers)
  6. Ponnurangam Kumaraguru (129 papers)
Citations (7)
