Inconsistent dialogue responses and how to recover from them (2401.10353v1)
Abstract: One critical issue for chat systems is to stay consistent about their own preferences, opinions, beliefs, and facts, which has been shown to be a difficult problem. In this work, we study methods to assess and bolster the utterance consistency of chat systems. We first develop a dataset for studying inconsistencies, in which inconsistent dialogue responses, explanations of the inconsistencies, and recovery utterances are authored by annotators. This covers the life span of inconsistencies, namely introduction, understanding, and resolution. Building on this dataset, we introduce a set of tasks centered on dialogue consistency, specifically focused on its detection and resolution. Our experimental findings indicate that the dataset significantly helps progress in identifying and resolving conversational inconsistencies, and that popular LLMs such as ChatGPT, while good at resolving inconsistencies, still struggle with detecting them.
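To make the detection task concrete, here is a minimal sketch that frames dialogue-inconsistency detection as NLI-style contradiction classification between an earlier bot utterance and a candidate response. The off-the-shelf model name (`roberta-large-mnli`), the `is_inconsistent` helper, and the probability threshold are illustrative assumptions; they do not reproduce the paper's own detectors or its dataset.

```python
# Sketch: flag a candidate response as inconsistent with an earlier bot
# utterance by running an NLI model and checking the contradiction score.
# Assumption: any off-the-shelf MNLI classifier serves as a stand-in here.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "roberta-large-mnli"  # assumed stand-in model, not the paper's
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)


def is_inconsistent(prior_utterance: str, response: str, threshold: float = 0.5) -> bool:
    """Return True if the NLI model predicts that `response` contradicts
    an earlier utterance made by the same chat system."""
    inputs = tokenizer(prior_utterance, response, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = logits.softmax(dim=-1)[0]
    contradiction_id = model.config.label2id["CONTRADICTION"]
    return probs[contradiction_id].item() > threshold


# Usage example: the bot previously claimed it has no pets.
print(is_inconsistent("I don't have any pets.", "My dog loves going to the park."))
```

A pairwise NLI formulation like this only scores one history utterance at a time; detecting inconsistencies over a full dialogue would additionally require selecting or aggregating over all prior system turns.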