MultiParaDetox: Extending Text Detoxification with Parallel Data to New Languages (2404.02037v1)
Abstract: Text detoxification is a textual style transfer (TST) task where a text is paraphrased from a toxic surface form, e.g. featuring rude words, to the neutral register. Recently, text detoxification methods have found applications in various tasks such as detoxification of LLMs (Leong et al., 2023; He et al., 2024; Tang et al., 2023) and combating toxic speech in social networks (Deng et al., 2023; Mun et al., 2023; Agarwal et al., 2023). All these applications are extremely important for ensuring safe communication in the modern digital world. However, the previous approaches to parallel text detoxification corpus collection -- ParaDetox (Logacheva et al., 2022) and APPDIA (Atwell et al., 2022) -- were explored only in a monolingual setup. In this work, we extend the ParaDetox pipeline to multiple languages, presenting MultiParaDetox, which automates parallel detoxification corpus collection for potentially any language. We then experiment with different text detoxification models -- from unsupervised baselines to LLMs and models fine-tuned on the presented parallel corpora -- showing the clear benefit of a parallel corpus for obtaining state-of-the-art text detoxification models for any language.
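The unsupervised baselines mentioned in the abstract typically include a simple lexicon-based "delete" approach that removes words matching a profanity list. A minimal sketch, assuming a toy placeholder lexicon (real word lists such as those cited below are far larger):

```python
import re

# Toy placeholder lexicon; a real system would load a curated
# profanity list for the target language (e.g. obscene-ukr, cuss).
TOXIC_LEXICON = {"idiot", "stupid"}

def delete_baseline(text: str, lexicon=TOXIC_LEXICON) -> str:
    """Lexicon-based 'delete' detoxification: drop any token
    found in the toxic lexicon and naively rejoin the rest."""
    tokens = re.findall(r"\w+", text)
    kept = [t for t in tokens if t.lower() not in lexicon]
    return " ".join(kept)
```

Such a baseline preserves content poorly when toxicity is not lexical, which is exactly the gap that models fine-tuned on parallel detoxification pairs are meant to close.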
- Haterephrase: Zero- and few-shot reduction of hate intensity in online posts using large language models. CoRR, abs/2310.13985.
- Bigscience: A case study in the social construction of a multilingual large language model. CoRR, abs/2212.04960.
- Deep learning models for multilingual hate speech detection. CoRR, abs/2004.06465.
- APPDIA: A discourse-aware transformer-based style transfer model for offensive social media conversations. In Proceedings of the 29th International Conference on Computational Linguistics, COLING 2022, Gyeongju, Republic of Korea, October 12-17, 2022, pages 6063–6074. International Committee on Computational Linguistics.
- Anatoly Belchikov. 2019. Russian language toxic comments. https://www.kaggle.com/blackmoon/russian-language-toxic-comments. Accessed: 2023-12-14.
- Kateryna Bobrovnyk. 2019a. Automated building and analysis of Ukrainian Twitter corpus for toxic text detection. In COLINS 2019. Volume II: Workshop.
- Kateryna Bobrovnyk. 2019b. Ukrainian obscene lexicon. https://github.com/saganoren/obscene-ukr. Accessed: 2023-12-14.
- Olá, bonjour, salve! XFORMAL: A benchmark for multilingual formality style transfer. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3199–3216, Online. Association for Computational Linguistics.
- Evaluating prose style transfer with the bible. Royal Society open science, 5(10):171920.
- Text detoxification using large pre-trained neural models. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 7-11 November, 2021, pages 7979–7996. Association for Computational Linguistics.
- Exploring methods for cross-lingual text style transfer: The case of text detoxification. In Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1083–1101, Nusa Dua, Bali. Association for Computational Linguistics.
- Methods for detoxification of texts for the Russian language. Multimodal Technol. Interact., 5(9):54.
- Recent advances towards safe, responsible, and moral dialogue systems: A survey. CoRR, abs/2302.09270.
- BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.
- Language-agnostic BERT sentence embedding. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2022, Dublin, Ireland, May 22-27, 2022, pages 878–891. Association for Computational Linguistics.
- DiffuDetox: A mixed diffusion model for text detoxification. In Findings of the Association for Computational Linguistics: ACL 2023, pages 7566–7574, Toronto, Canada. Association for Computational Linguistics.
- Detoxifying text with marco: Controllable revision with experts and anti-experts. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), ACL 2023, Toronto, Canada, July 9-14, 2023, pages 228–242. Association for Computational Linguistics.
- You only prompt once: On the capabilities of prompt learning on large language models to tackle toxic content. In 2024 IEEE Symposium on Security and Privacy (SP), pages 60–60, Los Alamitos, CA, USA. IEEE Computer Society.
- Self-detoxifying language models via toxification reversal. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 4433–4449, Singapore. Association for Computational Linguistics.
- BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7871–7880, Online. Association for Computational Linguistics.
- Multilingual denoising pre-training for neural machine translation. Trans. Assoc. Comput. Linguistics, 8:726–742.
- ParaDetox: Detoxification with parallel data. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2022, Dublin, Ireland, May 22-27, 2022, pages 6804–6818. Association for Computational Linguistics.
- Crosslingual generalization through multitask finetuning. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2023, Toronto, Canada, July 9-14, 2023, pages 15991–16111. Association for Computational Linguistics.
- Low-resource text style transfer for Bangla: Data & models. In Proceedings of the First Workshop on Bangla Language Processing (BLP-2023), pages 34–47, Singapore. Association for Computational Linguistics.
- Beyond denouncing hate: Strategies for countering implied biases and stereotypes in language. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 9759–9777, Singapore. Association for Computational Linguistics.
- Fighting offensive language on social media with unsupervised text style transfer. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 189–194, Melbourne, Australia. Association for Computational Linguistics.
- Detecting and monitoring hate speech in Twitter. Sensors, 19(21):4654.
- RoBERTuito: a pre-trained language model for social media text in Spanish. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 7235–7243, Marseille, France. European Language Resources Association.
- Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res., 21:140:1–140:67.
- Sudha Rao and Joel Tetreault. 2018. Dear sir or madam, may I introduce the GYAFC dataset: Corpus, benchmarks and metrics for formality style transfer. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 129–140, New Orleans, Louisiana. Association for Computational Linguistics.
- Aleksandr Semiletov. 2020. Toxic Russian Comments: Labelled comments from the popular Russian social network. https://www.kaggle.com/alexandersemiletov/toxic-russian-comments. Accessed: 2023-12-14.
- Subtle misogyny detection and mitigation: An expert-annotated dataset. CoRR, abs/2311.09443.
- mGPT: Few-Shot Learners Go Multilingual. Transactions of the Association for Computational Linguistics, 12:58–79.
- Detoxify language model step-by-step. CoRR, abs/2308.08295.
- Llama 2: Open foundation and fine-tuned chat models. CoRR, abs/2307.09288.
- Challenges for toxic comment classification: An in-depth error analysis. In Proceedings of the 2nd Workshop on Abusive Language Online (ALW2), pages 33–42, Brussels, Belgium. Association for Computational Linguistics.
- Euphemistic abuse – a new dataset and classification experiments for implicitly abusive language. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 16280–16297, Singapore. Association for Computational Linguistics.
- Titus Wormer. 2022. Cuss: Map of profanities, slurs, and obscenities to a sureness rating. https://github.com/words/cuss. Accessed: 2023-12-14.
- Daryna Dementieva
- Nikolay Babakov
- Alexander Panchenko