From One to Many: Expanding the Scope of Toxicity Mitigation in Language Models (2403.03893v3)

Published 6 Mar 2024 in cs.CL and cs.AI

Abstract: To date, toxicity mitigation in LLMs has almost entirely been focused on single-language settings. As LLMs embrace multilingual capabilities, it's crucial our safety measures keep pace. Recognizing this research gap, our approach expands the scope of conventional toxicity mitigation to address the complexities presented by multiple languages. In the absence of sufficient annotated datasets across languages, we employ translated data to evaluate and enhance our mitigation techniques. We also compare finetuning mitigation approaches against retrieval-augmented techniques under both static and continual toxicity mitigation scenarios. This allows us to examine the effects of translation quality and the cross-lingual transfer on toxicity mitigation. We also explore how model size and data quantity affect the success of these mitigation efforts. Covering nine languages, our study represents a broad array of linguistic families and levels of resource availability, ranging from high to mid-resource languages. Through comprehensive experiments, we provide insights into the complexities of multilingual toxicity mitigation, offering valuable insights and paving the way for future research in this increasingly important field. Code and data are available at https://github.com/for-ai/goodtriever.

Exploring Multilingual Toxicity Mitigation in LLMs

Introduction

The rapid adoption of LLMs across diverse applications has highlighted the importance of their multilingual capabilities. However, serving linguistically diverse users amplifies the need for robust toxicity mitigation techniques that extend beyond English to ensure global usability and safety. The paper examines the complexities of implementing such multilingual toxicity mitigation: it evaluates the effectiveness of translated data versus in-language data, compares retrieval-augmented techniques against finetuning approaches, and investigates how these mitigation strategies scale across multiple languages.

Mitigation Techniques

The two primary toxicity mitigation techniques examined are DExperts, a finetuning-based method that steers generation with finetuned expert and anti-expert models, and Goodtriever, a retrieval-augmented approach. Both techniques build on a base mGPT model, with sizes ranging from 1.3B to 13B parameters, and are evaluated across nine languages. This linguistic range spans five distinct scripts and includes high-resource languages (English, Russian, Italian, French, Portuguese, and Spanish) as well as mid-resource languages (Arabic, Hindi, and Korean).
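To make the finetuning-based baseline concrete, the following is a minimal sketch of a DExperts-style decoding-time ensemble, in which a non-toxic expert and a toxic anti-expert steer the base LM's next-token logits; the toy logit values and the alpha setting are illustrative placeholders, not values from the paper.

```python
import torch
import torch.nn.functional as F

def dexperts_next_token_logits(base_logits, expert_logits, antiexpert_logits, alpha=2.0):
    """DExperts-style decoding-time ensemble: push the base LM's distribution
    toward the non-toxic expert and away from the toxic anti-expert.
    alpha controls the strength of the steering."""
    return base_logits + alpha * (expert_logits - antiexpert_logits)

# Toy 5-token vocabulary; the logit values below are made up for illustration.
base = torch.tensor([2.0, 1.0, 0.5, -1.0, 0.0])         # base mGPT logits
expert = torch.tensor([2.2, 0.2, 0.6, -1.5, 0.1])       # expert finetuned on non-toxic text
antiexpert = torch.tensor([1.0, 2.5, 0.4, -0.5, 0.0])   # anti-expert finetuned on toxic text

probs = F.softmax(dexperts_next_token_logits(base, expert, antiexpert), dim=-1)
print(probs)  # token 1, favored by the anti-expert, is strongly down-weighted
```

Goodtriever applies a similar ensembling idea at decoding time, but derives its steering distributions from retrieval over datastores of toxic and non-toxic text rather than from finetuned expert models, which makes it straightforward to update with new data.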

Datasets and Evaluation

The paper extends established datasets with translated variants to address the scarcity of in-language toxicity annotations for many languages. For evaluation, it employs a set of standardized prompts derived from the HolisticBias dataset, translated into the languages of interest. This supports a consistent assessment of toxicity across languages despite inherent challenges such as cultural nuances and translation inaccuracies.
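As a minimal sketch of how such an evaluation is typically aggregated, the snippet below computes Expected Maximum Toxicity and Toxicity Probability, the two metrics commonly reported in this line of work, from per-continuation toxicity scores obtained from an external classifier such as Perspective API; the paper's exact scoring pipeline and number of sampled continuations may differ.

```python
import numpy as np

def toxicity_metrics(scores, threshold=0.5):
    """scores: (num_prompts, k) array of per-continuation toxicity scores in
    [0, 1], e.g. from an external classifier. Returns Expected Maximum
    Toxicity and Toxicity Probability over the prompt set."""
    scores = np.asarray(scores, dtype=float)
    max_per_prompt = scores.max(axis=1)                   # worst continuation per prompt
    expected_max_toxicity = max_per_prompt.mean()
    toxicity_probability = (max_per_prompt >= threshold).mean()
    return expected_max_toxicity, toxicity_probability

# Toy example: 3 translated prompts, 4 sampled continuations each (made-up scores).
scores = [[0.10, 0.20, 0.70, 0.30],
          [0.05, 0.10, 0.20, 0.15],
          [0.40, 0.60, 0.55, 0.20]]
emt, tox_prob = toxicity_metrics(scores)
print(f"Expected Max Toxicity: {emt:.2f}, Toxicity Probability: {tox_prob:.2f}")
```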

Findings

A key finding is the surprising efficacy of translated data in reducing toxicity, often surpassing results obtained with in-language datasets. This holds across both high- and mid-resource languages, suggesting that despite potential losses in translation, the core toxicity cues are preserved and can be mitigated effectively. Further, the retrieval-based Goodtriever method consistently outperforms the finetuning-based DExperts, especially for mid-resource languages and in more complex multilingual settings.
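One plausible reason the retrieval-based approach fares better in continual and multilingual settings is that extending it amounts to appending examples to a datastore rather than finetuning new experts. The toy sketch below illustrates that property only; the class, embeddings, and tokens are hypothetical stand-ins, not Goodtriever's actual interface, which builds datastores from a trained LM's hidden states.

```python
import numpy as np

class ToyDatastore:
    """Toy nearest-neighbour datastore holding (context embedding, next token)
    pairs. Illustrative only: real retrieval-augmented LMs key on a trained
    model's hidden states and use an ANN index such as FAISS."""

    def __init__(self, dim):
        self.keys = np.empty((0, dim), dtype=np.float32)
        self.values = []

    def add(self, embeddings, next_tokens):
        # Supporting a new language (or newly surfaced toxic content) is just
        # an append; no gradient updates to the underlying model are needed.
        self.keys = np.vstack([self.keys, np.asarray(embeddings, dtype=np.float32)])
        self.values.extend(next_tokens)

    def retrieve(self, query, k=2):
        dists = np.linalg.norm(self.keys - np.asarray(query, dtype=np.float32), axis=1)
        return [self.values[i] for i in np.argsort(dists)[:k]]

store = ToyDatastore(dim=3)
store.add([[0.1, 0.2, 0.3]], ["tok_en"])   # initial (toy) English entries
store.add([[0.2, 0.1, 0.4]], ["tok_pt"])   # Portuguese added later, incrementally
print(store.retrieve([0.12, 0.18, 0.32], k=1))
```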

Future Directions

The paper sheds light on the importance of continually evolving toxicity mitigation techniques to accommodate the dynamic nature of language and the diversifying spectrum of users engaging with LLMs. It underscores the need for further research into developing more nuanced and culturally sensitive evaluation frameworks that better reflect the multilingual and multicultural reality of global LLM deployment.

Implications

This research marks a pivotal step towards understanding and implementing multilingual toxicity mitigation in LLMs. It paves the way for future explorations into scalable, effective methods that ensure safer, more inclusive language technologies. By demonstrating the potential of both translated data and retrieval-augmented techniques, the paper offers valuable insights for developers and researchers aiming to enhance the global usability and safety of LLMs.

Authors (4)
  1. Luiza Pozzobon
  2. Patrick Lewis
  3. Sara Hooker
  4. Beyza Ermis