
$\textit{Dial BeInfo for Faithfulness}$: Improving Factuality of Information-Seeking Dialogue via Behavioural Fine-Tuning (2311.09800v2)

Published 16 Nov 2023 in cs.CL

Abstract: Factuality is a crucial requirement in information-seeking dialogue: the system should respond to the user's queries so that the responses are meaningful and aligned with the knowledge provided to the system. However, most modern LLMs suffer from hallucinations, that is, they generate responses not supported by or contradicting the knowledge source. To mitigate the issue and increase the faithfulness of information-seeking dialogue systems, we introduce BeInfo, a simple yet effective method that applies behavioural tuning to aid information-seeking dialogue. Relying on three standard datasets, we show that models tuned with BeInfo become considerably more faithful to the knowledge source, both on datasets and domains seen during BeInfo-tuning and on unseen domains when applied in a zero-shot manner. In addition, we show that models with 3B parameters (e.g., Flan-T5) tuned with BeInfo demonstrate strong performance on data from real 'production' conversations and outperform GPT-4 when tuned on a limited amount of such realistic in-domain dialogues.
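The abstract does not spell out the tuning recipe, but a minimal sketch of what behavioural fine-tuning for faithfulness could look like with a Flan-T5 model is shown below. The prompt format, the toy dataset fields, and the "unanswerable" fallback target are illustrative assumptions, not the paper's published setup.

```python
# Illustrative sketch (not the paper's exact recipe): fine-tune Flan-T5 on
# knowledge-grounded dialogue turns so that responses stay faithful to the
# provided knowledge snippet.
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)
from datasets import Dataset

# The abstract mentions ~3B-parameter Flan-T5 models; "base" keeps this sketch
# lightweight (swap in "google/flan-t5-xl" for roughly 3B parameters).
model_name = "google/flan-t5-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Hypothetical grounded-dialogue examples: each pairs a knowledge snippet and a
# dialogue history with a target response that is supported by that knowledge.
examples = [
    {"knowledge": "The museum opens at 9am and closes at 5pm.",
     "history": "User: What time does the museum open?",
     "response": "It opens at 9am."},
    # Assumed behavioural signal: when the knowledge does not contain the
    # answer, the target is an explicit fallback instead of a guessed answer.
    {"knowledge": "The museum opens at 9am and closes at 5pm.",
     "history": "User: How much is a ticket?",
     "response": "Sorry, the provided information does not say."},
]

def preprocess(ex):
    # Concatenate knowledge and dialogue history into one input sequence.
    prompt = (f"Answer using only the knowledge.\n"
              f"knowledge: {ex['knowledge']}\ndialogue: {ex['history']}\nresponse:")
    enc = tokenizer(prompt, truncation=True, max_length=512)
    enc["labels"] = tokenizer(text_target=ex["response"],
                              truncation=True, max_length=128)["input_ids"]
    return enc

train_ds = Dataset.from_list(examples).map(
    preprocess, remove_columns=["knowledge", "history", "response"])

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="beinfo_sketch",
                                  per_device_train_batch_size=2,
                                  num_train_epochs=1,
                                  learning_rate=1e-4),
    train_dataset=train_ds,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```

In practice the training data would come from knowledge-grounded dialogue benchmarks such as those named in the abstract rather than from hand-written examples as above.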

Authors (7)
  1. Evgeniia Razumovskaia (10 papers)
  2. Ivan Vulić (130 papers)
  3. Pavle Marković (1 paper)
  4. Tomasz Cichy (2 papers)
  5. Qian Zheng (58 papers)
  6. Tsung-Hsien Wen (27 papers)
  7. Paweł Budzianowski (27 papers)
Citations (9)