Building Persona Consistent Dialogue Agents with Offline Reinforcement Learning (2310.10735v1)
Abstract: Maintaining a consistent persona is a key quality for any open-domain dialogue system. Current state-of-the-art systems do this by training agents with supervised learning or online reinforcement learning (RL). However, systems trained with supervised learning often lack consistency, as they are never punished for uttering contradictions. Additional training with RL can alleviate some of these issues; however, the training process is expensive. Instead, we propose an offline RL framework to improve the persona consistency of dialogue systems. Our framework lets us combine the advantages of previous methods: we can inexpensively train our model on existing data, as in supervised learning, while punishing and rewarding specific utterances, as in RL. We also introduce a simple importance sampling method, which we call Variance-Reducing MLE-Initialized (VaRMI) importance sampling, to reduce the variance of importance weights in offline RL training. Our automatic and human evaluations show that our framework improves both the persona consistency and dialogue quality of a state-of-the-art social chatbot.
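To make the importance-weighting idea concrete, the following is a minimal, generic sketch of an importance-weighted offline policy-gradient loss for a single utterance, not the paper's exact VaRMI method. The function name, the clipping threshold, and the use of sequence-level log-probabilities are illustrative assumptions: the weight is the ratio of the current policy's probability to the behavior (data-collecting) policy's probability, and clipping the ratio is one standard way to limit the variance that VaRMI also targets.

```python
import math

def offline_pg_loss(logp_theta, logp_behavior, reward, clip=5.0):
    """Generic importance-weighted offline policy-gradient loss for one
    utterance, given sequence log-probabilities under the current policy
    (logp_theta) and the behavior policy (logp_behavior).

    Hypothetical illustration: weight = pi_theta / pi_b, truncated at
    `clip` to reduce variance; loss = -weight * reward * logp_theta.
    """
    weight = math.exp(logp_theta - logp_behavior)  # importance ratio
    weight = min(weight, clip)                     # truncate large ratios
    return -weight * reward * logp_theta

# Example: when the policies agree (ratio = 1), the loss reduces to
# ordinary reward-weighted negative log-likelihood.
loss = offline_pg_loss(logp_theta=-1.0, logp_behavior=-1.0, reward=1.0)
```

In a full training loop this per-utterance loss would be averaged over a batch and differentiated with respect to the policy parameters; the clipped ratio here stands in for whatever variance-reduction scheme the framework uses.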