
Goodtriever: Adaptive Toxicity Mitigation with Retrieval-augmented Models (2310.07589v1)

Published 11 Oct 2023 in cs.AI

Abstract: Considerable effort has been dedicated to mitigating toxicity, but existing methods often require drastic modifications to model parameters or the use of computationally intensive auxiliary models. Furthermore, previous approaches have often neglected the crucial factor of language's evolving nature over time. In this work, we present a comprehensive perspective on toxicity mitigation that takes into account its changing nature. We introduce Goodtriever, a flexible methodology that matches the current state-of-the-art toxicity mitigation while achieving 43% relative latency reduction during inference and being more computationally efficient. By incorporating a retrieval-based approach at decoding time, Goodtriever enables toxicity-controlled text generation. Our research advocates for an increased focus on adaptable mitigation techniques, which better reflect the data drift models face when deployed in the wild. Code and data are available at https://github.com/for-ai/goodtriever.

Overview of "Goodtriever: Adaptive Toxicity Mitigation with Retrieval-augmented Models"

The paper "Goodtriever: Adaptive Toxicity Mitigation with Retrieval-augmented Models" by Luiza Pozzobon et al. presents a novel approach to addressing toxicity in large-scale LLMs (LMs). The researchers introduce "Goodtriever," a flexible and efficient methodology that combines state-of-the-art toxicity mitigation with significantly reduced latency and computational resources compared to existing methods.

Key Contributions

  1. Adaptive Mitigation with Retrieval-Augmented Models: Goodtriever introduces a retrieval-based mechanism at decoding time to control the generation of toxic content. The method integrates two external datastores, one of toxic and one of non-toxic examples, to steer the LM toward controlled text generation.
  2. Performance Efficiency: The approach achieves a 43% relative reduction in inference latency compared to existing state-of-the-art methods, without sacrificing toxicity mitigation performance. This is particularly beneficial for real-time applications where response time is critical.
  3. Model Flexibility Across Size and Family: Goodtriever is demonstrated to be effective across multiple LM architectures, including GPT2, Pythia, and OPT, highlighting its versatility. Notably, it preserves mitigation effectiveness as the base model scales from 124 million to 6.9 billion parameters.
  4. Continual Learning and Domain Adaptivity: The methodology is tested for continual toxicity mitigation, showing competitive performance in adapting to new toxicity sources over time without retraining on all historical data (a minimal datastore-update sketch follows this list). This property aligns with the evolving nature of language and toxic expressions.

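Because the datastores sit outside the model, adapting to a newly observed source of toxicity amounts to encoding the new examples and appending them to the appropriate index; no gradient updates are needed. Below is a minimal sketch of that update using a FAISS flat index. The dimensionality, the `encode` helper, and the example strings are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch (assumptions noted inline): continual datastore updates
# for retrieval-based toxicity mitigation, with no model retraining.
import faiss
import numpy as np

d = 768  # assumed hidden-state dimensionality of the base LM

# One index per datastore: toxic and non-toxic examples.
toxic_index = faiss.IndexFlatL2(d)
nontoxic_index = faiss.IndexFlatL2(d)

def encode(texts):
    """Hypothetical stand-in for the LM's context encoder.

    In a kNN-LM-style setup, keys are LM hidden states at each token
    position; random vectors keep this sketch self-contained.
    """
    return np.random.rand(len(texts), d).astype("float32")

# Continual adaptation: append newly flagged examples to the toxic store.
toxic_index.add(encode(["<newly flagged text>", "<another example>"]))

# At decoding time, retrieve nearest neighbors for the current context.
distances, ids = toxic_index.search(encode(["<current context>"]), 4)
```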
Methodological Implementation

Goodtriever's innovation lies in the combination of retrieval mechanisms with a Product of Experts (PoE) framework to adjust the probabilistic predictions of an LM. At inference time, the LM consults two datastores—one toxic and one non-toxic—for contextually similar examples, effectively guiding the model's generative process towards less toxic outputs. This allows for immediate incorporation of new knowledge and dynamic responses to data drift encountered in real-world scenarios.
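In a kNN-LM-style formulation, each datastore induces a next-token distribution from its retrieved neighbors, and the final distribution multiplies the base LM with the non-toxic expert while dividing by the toxic anti-expert. The sketch below is a minimal rendering of that product-of-experts combination; the DExperts-style form and the value of `alpha` are assumptions for illustration, not the paper's exact equations or settings.

```python
import numpy as np

def product_of_experts(logp_lm, logp_nontoxic, logp_toxic, alpha=2.0):
    """Combine base-LM log-probs with expert/anti-expert log-probs.

    The non-toxic expert pushes probability mass toward safe tokens;
    the toxic anti-expert pushes it away. `alpha` (assumed value)
    scales the strength of the intervention.
    """
    combined = logp_lm + alpha * (logp_nontoxic - logp_toxic)
    return combined - np.log(np.exp(combined).sum())  # log-softmax

# Toy vocabulary of 5 tokens: uniform base LM; the experts disagree
# most strongly on token 0, which the toxic datastore favors.
logp_lm = np.log(np.full(5, 0.2))
logp_nontoxic = np.log(np.array([0.05, 0.25, 0.25, 0.25, 0.20]))
logp_toxic = np.log(np.array([0.60, 0.10, 0.10, 0.10, 0.10]))

probs = np.exp(product_of_experts(logp_lm, logp_nontoxic, logp_toxic))
print(probs.round(3))  # token 0 is strongly suppressed
```

Because the intervention happens purely at decoding time, the base model's weights are untouched, which is what makes the latency and adaptability gains possible.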

Results and Insights

The paper presents extensive evaluations across several datasets and model configurations:

  • Toxicity Mitigation: Goodtriever achieves results comparable to previous methods on Expected Maximum Toxicity (EMT) and Toxicity Probability while maintaining coherence and diversity in outputs (both metrics are sketched after this list).
  • Inference Efficiency: Goodtriever demonstrates a significant reduction in computational cost and inference time, corroborated by experiments indicating lower memory and processing demands.
  • Robust Performance Across Domains: In continual learning tests, Goodtriever adapts to new domains of toxicity while preserving prior knowledge, matching multitask finetuning baselines in mitigating newly encountered toxic content.
  • Varying Model Parameters: The framework's effectiveness remains stable across model sizes and families, suggesting robust generalization and broad applicability.
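For reference, both headline metrics follow the RealToxicityPrompts protocol: per prompt, sample several continuations, score each for toxicity (e.g., with the Perspective API), then take the per-prompt maximum. The sketch below computes both; the 0.5 threshold and the toy scores are conventional assumptions, not the paper's reported numbers.

```python
import numpy as np

def emt_and_toxicity_probability(scores, threshold=0.5):
    """Expected Maximum Toxicity and Toxicity Probability.

    `scores` has shape (num_prompts, num_continuations): one toxicity
    score per sampled continuation (commonly 25 per prompt).
    """
    max_per_prompt = scores.max(axis=1)
    emt = max_per_prompt.mean()                     # mean of per-prompt maxima
    tox_prob = (max_per_prompt > threshold).mean()  # P(>=1 toxic continuation)
    return emt, tox_prob

# Toy example: 3 prompts x 4 continuations with made-up scores.
scores = np.array([
    [0.10, 0.20, 0.70, 0.05],
    [0.30, 0.40, 0.45, 0.20],
    [0.05, 0.60, 0.10, 0.15],
])
emt, tox_prob = emt_and_toxicity_probability(scores)
print(f"EMT={emt:.3f}  Toxicity Probability={tox_prob:.3f}")
```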

Implications and Future Directions

The research underscores the practicality of integrating retrieval-augmented mechanisms in toxicity mitigation strategies. By leveraging datastores that reflect the fluidity of human language and societal norms, Goodtriever provides a framework for adaptable, low-latency interventions in deployed LMs.

Future work could explore multilingual and cross-cultural applications, as well as extending these adaptive methodologies to broader ethical and bias mitigation scenarios. Moreover, investigating the dynamic management of datastore content could further enhance the adaptability and effectiveness of such systems.

In summary, Goodtriever presents a compelling balance between performance efficiency and mitigation of harmful outputs, offering a promising direction for deploying socially responsible LLMs in diverse real-world environments.

Authors (4)
  1. Luiza Pozzobon
  2. Beyza Ermis
  3. Patrick Lewis
  4. Sara Hooker