
Detecting Inappropriate Messages on Sensitive Topics that Could Harm a Company's Reputation (2103.05345v1)

Published 9 Mar 2021 in cs.CL

Abstract: Not all topics are equally "flammable" in terms of toxicity: a calm discussion of turtles or fishing less often fuels inappropriate toxic dialogues than a discussion of politics or sexual minorities. We define a set of sensitive topics that can yield inappropriate and toxic messages and describe the methodology of collecting and labeling a dataset for appropriateness. While toxicity in user-generated data is well-studied, we aim at defining a more fine-grained notion of inappropriateness. The core of inappropriateness is that it can harm the reputation of a speaker. This is different from toxicity in two respects: (i) inappropriateness is topic-related, and (ii) an inappropriate message is not toxic but is still unacceptable. We collect and release two datasets for Russian: a topic-labeled dataset and an appropriateness-labeled dataset. We also release pre-trained classification models trained on this data.

Authors (5)
  1. Nikolay Babakov (13 papers)
  2. Varvara Logacheva (11 papers)
  3. Olga Kozlova (7 papers)
  4. Nikita Semenov (17 papers)
  5. Alexander Panchenko (92 papers)
Citations (10)
