Goodtriever: Adaptive Toxicity Mitigation with Retrieval-augmented Models
Abstract: Considerable effort has been dedicated to mitigating toxicity, but existing methods often require drastic modifications to model parameters or the use of computationally intensive auxiliary models. Furthermore, previous approaches have often neglected the crucial factor of language's evolving nature over time. In this work, we present a comprehensive perspective on toxicity mitigation that takes into account its changing nature. We introduce Goodtriever, a flexible methodology that matches the current state-of-the-art toxicity mitigation while achieving 43% relative latency reduction during inference and being more computationally efficient. By incorporating a retrieval-based approach at decoding time, Goodtriever enables toxicity-controlled text generation. Our research advocates for an increased focus on adaptable mitigation techniques, which better reflect the data drift models face when deployed in the wild. Code and data are available at https://github.com/for-ai/goodtriever.
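To make "a retrieval-based approach at decoding time" concrete, below is a minimal sketch of how a kNN-LM-style datastore could steer next-token probabilities toward non-toxic continuations. The datastore layout, the product-of-experts-style combination of a non-toxic and a toxic retrieval signal, and all names and hyperparameters (`knn_distribution`, `alpha`, `k`) are illustrative assumptions for exposition, not the paper's actual implementation.

```python
# Minimal sketch (PyTorch) of retrieval-augmented, toxicity-controlled decoding.
# Assumptions: kNN-LM-style datastores of (hidden-state, next-token) pairs built
# from non-toxic and toxic text, combined with the base LM distribution in a
# product-of-experts-style ensemble. Hyperparameters and names are illustrative.
import torch
import torch.nn.functional as F

def knn_distribution(hidden, keys, values, vocab_size, k=8, temperature=1.0):
    """Turn the k nearest datastore neighbors of `hidden` into a next-token distribution."""
    dists = torch.cdist(hidden.unsqueeze(0), keys).squeeze(0)  # L2 distance to every stored key
    knn_dists, idx = dists.topk(k, largest=False)              # k closest neighbors
    weights = F.softmax(-knn_dists / temperature, dim=-1)      # closer neighbors get more weight
    probs = torch.zeros(vocab_size)
    probs.scatter_add_(0, values[idx], weights)                # aggregate weight per token id
    return probs

def retrieval_controlled_step(lm_logits, hidden, pos_store, neg_store, vocab_size, alpha=2.0):
    """Combine base-LM logits with non-toxic (pos) and toxic (neg) retrieval signals."""
    p_lm = F.softmax(lm_logits, dim=-1)
    p_pos = knn_distribution(hidden, *pos_store, vocab_size)   # retrieval over non-toxic data
    p_neg = knn_distribution(hidden, *neg_store, vocab_size)   # retrieval over toxic data
    eps = 1e-8
    # Boost tokens the non-toxic store prefers and suppress tokens the toxic store
    # prefers; alpha controls the strength of the mitigation.
    logits = torch.log(p_lm + eps) + alpha * (torch.log(p_pos + eps) - torch.log(p_neg + eps))
    return F.softmax(logits, dim=-1)
```

Because the control signal lives in external datastores rather than in model weights, adapting to newly observed toxic language would amount to appending new (hidden-state, token) pairs to the datastore, with no retraining of the base model.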