PREADD: Prefix-Adaptive Decoding for Controlled Text Generation (2307.03214v1)
Abstract: We propose Prefix-Adaptive Decoding (PREADD), a flexible method for controlled text generation. Unlike existing methods that use auxiliary expert models to control for attributes, PREADD does not require an external model, instead relying on linearly combining output logits from multiple prompts. Specifically, PREADD contrasts the output logits generated using a raw prompt against those generated using a prefix-prepended prompt, enabling both positive and negative control with respect to any attribute encapsulated by the prefix. We evaluate PREADD on three tasks -- toxic output mitigation, gender bias reduction, and sentiment control -- and find that PREADD outperforms not only prompting baselines, but also an auxiliary-expert control method, by 12% or more in relative gain on our main metrics for each task.
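The abstract specifies only that PREADD linearly combines the output logits from a raw prompt and a prefix-prepended prompt, so the sketch below illustrates one plausible instantiation using Hugging Face Transformers. The model name, the combination formula `raw + alpha * (prefixed - raw)`, the constant `ALPHA`, and the helper `preadd_generate` are illustrative assumptions, not the paper's reference implementation.

```python
# Minimal sketch of prefix-adaptive decoding: at each step, logits from the
# raw prompt are linearly combined with logits from a prefix-prepended prompt.
# Exact weights and helper names are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any causal LM works; the paper evaluates larger models
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

ALPHA = -1.0  # assumed control strength; negative values steer away from the prefix attribute


def preadd_generate(prompt: str, control_prefix: str, max_new_tokens: int = 30) -> str:
    """Greedy decoding with a prefix-adaptive logit combination (illustrative)."""
    raw_ids = tokenizer(prompt, return_tensors="pt").input_ids
    pre_ids = tokenizer(control_prefix + prompt, return_tensors="pt").input_ids

    generated = []
    for _ in range(max_new_tokens):
        with torch.no_grad():
            raw_logits = model(raw_ids).logits[:, -1, :]
            pre_logits = model(pre_ids).logits[:, -1, :]

        # Linear combination: push the raw distribution toward (alpha > 0)
        # or away from (alpha < 0) the attribute encoded by the prefix.
        combined = raw_logits + ALPHA * (pre_logits - raw_logits)

        next_id = combined.argmax(dim=-1, keepdim=True)
        generated.append(next_id.item())

        # Append the chosen token to both contexts so they stay in sync.
        raw_ids = torch.cat([raw_ids, next_id], dim=-1)
        pre_ids = torch.cat([pre_ids, next_id], dim=-1)

    return tokenizer.decode(generated, skip_special_tokens=True)


print(preadd_generate(
    "The following is a reply to the comment:",
    "The following text is rude and disrespectful: ",
))
```

Under this assumed formulation, a positive `ALPHA` amplifies the attribute captured by the control prefix while a negative `ALPHA` suppresses it, matching the abstract's claim of both positive and negative control without any auxiliary expert model.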