Methods for Detoxification of Texts for the Russian Language (2105.09052v1)

Published 19 May 2021 in cs.CL and cs.LG

Abstract: We introduce the first study of automatic detoxification of Russian texts to combat offensive language. This kind of textual style transfer can be used, for instance, for processing toxic content on social media. While much work has been done in this field for the English language, the task has not yet been addressed for Russian. We test two types of models - an unsupervised approach based on the BERT architecture that performs local corrections, and a supervised approach based on the pretrained GPT-2 language model - and compare them with several baselines. In addition, we describe the evaluation setup, providing training datasets and metrics for automatic evaluation. The results show that the tested approaches can be successfully used for detoxification, although there is room for improvement.
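The BERT-based unsupervised approach described above performs local corrections: toxic words are identified and replaced with neutral substitutes proposed by a masked language model, while the rest of the sentence is left intact. The sketch below illustrates that pipeline shape only; the toxic lexicon, the substitute table, and the `detoxify` function are toy placeholders standing in for the paper's actual resources and the masked-LM fill-in step.

```python
# Illustrative sketch of "local corrections" detoxification:
# (1) flag toxic tokens via a lexicon, (2) replace each flagged token.
# In the real approach, step (2) masks the token and asks a BERT-style
# model to fill the mask; here a canned substitute table stands in.

TOXIC_LEXICON = {"idiot", "stupid"}     # placeholder toxic vocabulary
NEUTRAL_SUBSTITUTES = {                 # stands in for MLM top-k picks
    "idiot": "person",
    "stupid": "misguided",
}

def detoxify(sentence: str) -> str:
    """Replace lexicon-flagged tokens, keeping all other tokens intact."""
    out = []
    for token in sentence.split():
        bare = token.strip(".,!?").lower()
        if bare in TOXIC_LEXICON:
            # Local correction: only the flagged token changes.
            out.append(NEUTRAL_SUBSTITUTES.get(bare, ""))
        else:
            out.append(token)
    return " ".join(t for t in out if t)

print(detoxify("That idiot wrote stupid code!"))
# → "That person wrote misguided code!"
```

The supervised GPT-2-based approach differs in that it rewrites the whole sentence conditioned on the toxic input rather than editing individual tokens.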

Authors (7)
  1. Daryna Dementieva (20 papers)
  2. Daniil Moskovskiy (9 papers)
  3. Varvara Logacheva (11 papers)
  4. David Dale (18 papers)
  5. Olga Kozlova (7 papers)
  6. Nikita Semenov (17 papers)
  7. Alexander Panchenko (92 papers)
Citations (19)
