Language Detoxification with Attribute-Discriminative Latent Space (2210.10329v2)
Abstract: Transformer-based language models (LMs) have achieved impressive results on natural language understanding tasks, but they can also generate toxic text such as insults, threats, and profanity, which limits their real-world applications. To overcome this issue, a few text generation approaches aim to detoxify toxic texts using additional LMs or perturbations. However, previous methods require excessive memory, computation, and time, which are serious bottlenecks for real-world deployment. To address these limitations, we propose an effective yet efficient method for language detoxification using an attribute-discriminative latent space. Specifically, we project the latent space of an original Transformer LM onto a discriminative latent space that well separates texts by their attributes, using a projection block and an attribute discriminator. This allows the LM to control text generation to be non-toxic with minimal memory and computation overhead. We validate our model, the Attribute-Discriminative Language Model (ADLM), on detoxified language and dialogue generation tasks, on which it significantly outperforms baselines in both performance and efficiency.
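To make the described architecture concrete, below is a minimal PyTorch sketch of the idea: a pre-trained GPT-2 backbone whose hidden states are mapped by a projection block into an attribute-discriminative latent space, with an attribute discriminator applied to the projected latents and an LM head decoding from them. The class names (`ProjectionBlock`, `AttributeDiscriminativeLM`), the layer shapes, the mean-pooled discriminator input, and the use of Hugging Face `transformers` are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only (not the paper's code): a projection block maps
# GPT-2 hidden states into an attribute-discriminative latent space; an
# attribute discriminator separates toxic / non-toxic latents, and an LM head
# decodes next tokens from the same projected space.
import torch
import torch.nn as nn
from transformers import GPT2Model, GPT2Tokenizer


class ProjectionBlock(nn.Module):  # hypothetical name
    def __init__(self, hidden_dim: int, latent_dim: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(hidden_dim, latent_dim),
            nn.GELU(),
            nn.Linear(latent_dim, latent_dim),
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq, hidden_dim) -> (batch, seq, latent_dim)
        return self.proj(h)


class AttributeDiscriminativeLM(nn.Module):  # hypothetical name
    def __init__(self, latent_dim: int = 768, num_attributes: int = 2):
        super().__init__()
        self.lm = GPT2Model.from_pretrained("gpt2")  # original Transformer LM
        hidden_dim = self.lm.config.hidden_size
        self.projection = ProjectionBlock(hidden_dim, latent_dim)
        # Attribute discriminator over the projected latent (e.g. toxic vs. non-toxic).
        self.discriminator = nn.Linear(latent_dim, num_attributes)
        # LM head decoding from the attribute-discriminative latent space.
        self.lm_head = nn.Linear(latent_dim, self.lm.config.vocab_size, bias=False)

    def forward(self, input_ids, attention_mask=None):
        h = self.lm(input_ids, attention_mask=attention_mask).last_hidden_state
        z = self.projection(h)                            # attribute-discriminative latents
        token_logits = self.lm_head(z)                    # next-token logits for generation
        attr_logits = self.discriminator(z.mean(dim=1))   # sequence-level attribute prediction
        return token_logits, attr_logits


tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = AttributeDiscriminativeLM()
batch = tokenizer(["You are a wonderful person."], return_tensors="pt")
token_logits, attr_logits = model(batch["input_ids"], batch["attention_mask"])
```

Under these assumptions, training would plausibly combine the standard next-token cross-entropy on `token_logits` with an attribute cross-entropy on `attr_logits`, so the projected space both supports generation and separates texts by attribute; at decoding time, conditioning on the non-toxic attribute then steers generation without an extra LM or per-step perturbation, which is where the memory and compute savings claimed in the abstract would come from.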
Authors: Jin Myung Kwak, Minseon Kim, Sung Ju Hwang