Dynamic Multi-Reward Weighting for Multi-Style Controllable Generation (2402.14146v3)
Abstract: Textual style expresses a diverse set of information, including interpersonal dynamics (e.g., formality) and the author's emotions or attitudes (e.g., disgust). An open question is how LLMs can be explicitly controlled so that they weave together target styles when generating text: for example, to produce text that is both negative and non-toxic. One approach to such controlled generation is multi-objective reinforcement learning (RL), but how best to combine multiple objectives in a reward function remains an open problem. In this paper, we investigate various formulations of multi-style rewards, including calibrated outputs from discriminators and dynamic weighting by discriminator gradient magnitudes. We find that our proposed dynamic weighting outperforms static weighting approaches with respect to style control while maintaining linguistic quality, and we explore its effectiveness in 2- and 3-style control.
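To make the dynamic-weighting idea concrete, below is a minimal PyTorch sketch of one way gradient-magnitude reward weighting could look. It assumes hypothetical style discriminators that map sentence embeddings to class logits; the function name, discriminator interface, and normalization scheme are illustrative assumptions rather than the paper's exact formulation.

```python
import torch

def dynamic_multi_reward(embeddings, discriminators, target_labels):
    """Sketch: combine per-style rewards, weighting each style by the
    gradient magnitude of its discriminator w.r.t. the generation's
    embeddings (hypothetical interface, not the paper's exact method)."""
    rewards, grad_norms = [], []
    for disc, target in zip(discriminators, target_labels):
        emb = embeddings.clone().detach().requires_grad_(True)
        logits = disc(emb)                       # (batch, num_classes) style logits
        prob = torch.softmax(logits, dim=-1)[..., target].mean()
        prob.backward()                          # populates emb.grad
        rewards.append(prob.detach())
        grad_norms.append(emb.grad.norm().detach())

    # Assumed heuristic: a larger gradient magnitude suggests the style is
    # further from satisfied, so it receives a proportionally larger weight.
    weights = torch.stack(grad_norms)
    weights = weights / weights.sum().clamp_min(1e-8)
    return (weights * torch.stack(rewards)).sum()
```

In an RL setup such as PPO, the returned scalar could serve as the per-sample reward, with the weights recomputed at each training step as the policy's outputs change.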
Authors: Ryan Koo, Dongyeop Kang, Karin De Langis