Red Teaming Language Models for Processing Contradictory Dialogues (2405.10128v3)
Abstract: Most currently available LLMs are prone to self-contradiction during dialogues. To mitigate this issue, this study explores a novel contradictory dialogue processing task that aims to detect and modify contradictory statements in a conversation. The task is inspired by research on context faithfulness and dialogue comprehension, which has demonstrated that detecting and understanding contradictions often requires detailed explanations. We develop a dataset of contradictory dialogues in which one side of the conversation contradicts itself. Each dialogue is accompanied by an explanatory label that pinpoints the location and details of the contradiction. With this dataset, we present a Red Teaming framework for contradictory dialogue processing. The framework detects contradictions, attempts to explain them, and then revises the contradictory content using the explanation. Our experiments demonstrate that the framework improves the detection of contradictory dialogues and provides valid explanations; it also shows distinct capabilities for modifying such dialogues. Our study highlights the importance of the logical inconsistency problem in conversational AI.
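To make the detect → explain → modify loop described above concrete, here is a minimal Python sketch of such a pipeline. This is an illustration under stated assumptions, not the paper's actual implementation: `query_llm` is a hypothetical stand-in for any instruction-tuned LLM call, and the prompts and the `RedTeamResult` fields are invented for this example.

```python
from dataclasses import dataclass

@dataclass
class RedTeamResult:
    is_contradictory: bool
    explanation: str
    revised_dialogue: list[str]

def query_llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to any instruction-tuned LLM."""
    raise NotImplementedError("wire this to an LLM of your choice")

def process_dialogue(turns: list[str]) -> RedTeamResult:
    """Detect a self-contradiction, explain it, then revise using the explanation."""
    dialogue = "\n".join(f"Turn {i}: {t}" for i, t in enumerate(turns, 1))

    # Step 1: detect whether one speaker contradicts an earlier statement of their own.
    verdict = query_llm(
        "Does either speaker contradict an earlier statement of their own "
        f"in this dialogue? Answer yes or no.\n\n{dialogue}"
    )
    if verdict.strip().lower().startswith("no"):
        return RedTeamResult(False, "", turns)

    # Step 2: generate an explanation giving the location and details of the contradiction.
    explanation = query_llm(
        "Identify which turns contradict each other and explain why.\n\n"
        f"{dialogue}"
    )

    # Step 3: revise the contradictory content, conditioned on the explanation.
    revised = query_llm(
        "Rewrite the contradictory turn so the dialogue is consistent, "
        f"guided by this explanation:\n{explanation}\n\n{dialogue}"
    )
    return RedTeamResult(True, explanation, revised.splitlines())
```

The design point this sketch mirrors from the abstract is that the revision step is conditioned on the generated explanation rather than on the raw dialogue alone, so the modification targets the specific turns the explanation flags.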