Text Detoxification using Large Pre-trained Neural Models (2109.08914v2)

Published 18 Sep 2021 in cs.CL and cs.LG

Abstract: We present two novel unsupervised methods for eliminating toxicity in text. Our first method combines two recent ideas: (1) guidance of the generation process with small style-conditional language models and (2) use of paraphrasing models to perform style transfer. We use a well-performing paraphraser guided by style-trained language models to keep the text content and remove toxicity. Our second method uses BERT to replace toxic words with their non-offensive synonyms. We make the method more flexible by enabling BERT to replace mask tokens with a variable number of words. Finally, we present the first large-scale comparative study of style transfer models on the task of toxicity removal. We compare our models with a number of methods for style transfer. The models are evaluated in a reference-free way using a combination of unsupervised style transfer metrics. Both methods we suggest yield new SOTA results.
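
The first method reweights the paraphraser's next-token distribution with the ratio of two small class-conditional language models (non-toxic vs. toxic). Below is a minimal sketch of that GeDi-style logit combination; the function name, tensor shapes, and the weight `w` are illustrative assumptions, not the authors' exact implementation.

```python
# Sketch of style-guided decoding: combine paraphraser logits with a
# non-toxic/toxic class-conditional LM ratio (assumed interface).
import torch

def guided_next_token_logits(
    paraphraser_logits: torch.Tensor,  # [vocab_size], from the paraphraser
    nontoxic_lm_logits: torch.Tensor,  # [vocab_size], class-conditional LM (non-toxic)
    toxic_lm_logits: torch.Tensor,     # [vocab_size], class-conditional LM (toxic)
    w: float = 5.0,                    # guidance strength (hypothetical value)
) -> torch.Tensor:
    # log p_para(x) + w * [log p(x | non-toxic) - log p(x | toxic)]
    log_p = torch.log_softmax(paraphraser_logits, dim=-1)
    log_ratio = (
        torch.log_softmax(nontoxic_lm_logits, dim=-1)
        - torch.log_softmax(toxic_lm_logits, dim=-1)
    )
    return log_p + w * log_ratio
```

Sampling or beam search then proceeds from these adjusted logits, so the paraphraser preserves content while the class-conditional ratio steers generation away from toxic tokens.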
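The second method masks toxic words and lets BERT fill the masks with inoffensive substitutes. The sketch below, built on the Hugging Face `fill-mask` pipeline, illustrates the idea only; the hard-coded toxic-word list and the simple re-ranking are assumptions (the paper uses learned toxicity weights and allows a variable number of replacement tokens).

```python
# Minimal sketch of BERT-based toxic-word replacement (assumed lexicon and ranking).
from transformers import pipeline

TOXIC_WORDS = {"stupid", "idiot", "moron"}  # hypothetical toy lexicon

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def detoxify(sentence: str, top_k: int = 5) -> str:
    tokens = sentence.split()
    for i, tok in enumerate(tokens):
        if tok.lower().strip(".,!?") in TOXIC_WORDS:
            masked = tokens.copy()
            masked[i] = fill_mask.tokenizer.mask_token
            candidates = fill_mask(" ".join(masked), top_k=top_k)
            # Keep the highest-scoring candidate that is not itself toxic.
            for cand in candidates:
                if cand["token_str"].lower() not in TOXIC_WORDS:
                    tokens[i] = cand["token_str"]
                    break
    return " ".join(tokens)

print(detoxify("you are a stupid person"))
```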

Authors (7)
  1. David Dale (18 papers)
  2. Anton Voronov (7 papers)
  3. Daryna Dementieva (20 papers)
  4. Varvara Logacheva (11 papers)
  5. Olga Kozlova (7 papers)
  6. Nikita Semenov (17 papers)
  7. Alexander Panchenko (92 papers)
Citations (70)