A Cause-Effect Look at Alleviating Hallucination of Knowledge-grounded Dialogue Generation (2404.03491v1)

Published 4 Apr 2024 in cs.CL and cs.AI

Abstract: Empowered by large-scale pretrained LLMs, existing dialogue systems have demonstrated impressive performance in conducting fluent, natural-sounding conversations. However, they are still plagued by hallucination, which causes unpredictable factual errors in the generated responses. Knowledge-grounded dialogue (KGD) generation models, which intentionally invoke external knowledge resources to produce more informative responses, have recently also proven effective in reducing hallucination. Following the idea of obtaining high-quality knowledge, a few efforts have achieved strong performance on this issue. Since some knowledge noise is inevitable and can itself lead to hallucination, it is urgent to investigate its causes and the directions for building noise-tolerant methods for KGD tasks. In this paper, we analyze the causal story behind this problem with counterfactual reasoning methods. Based on the causal-effect analysis, we propose a possible solution for alleviating hallucination in KGD that exploits the dialogue-knowledge interaction. Experimental results of our example implementation show that this method can reduce hallucination without disrupting other dialogue performance, while remaining adaptable to different generation models. We hope our efforts support, and call for more attention to, developing lightweight techniques towards robust and trustworthy dialogue systems.
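As a rough illustration of the causal-effect analysis the abstract alludes to, the standard counterfactual decomposition from causal mediation analysis (Pearl, 2001) separates a total effect into direct and indirect parts. The sketch below uses notation of our own choosing, not necessarily the paper's exact formulation: write $Y_{k,\,M_k}(d)$ for the response outcome given dialogue history $d$, knowledge input $k$, and mediator $M$ (here, the dialogue-knowledge interaction), with $k^{*}$ a counterfactual (e.g., blanked-out) knowledge input:

$$
\begin{aligned}
\mathrm{TE}  &= Y_{k,\,M_{k}}(d) - Y_{k^{*},\,M_{k^{*}}}(d) \\
\mathrm{NDE} &= Y_{k,\,M_{k^{*}}}(d) - Y_{k^{*},\,M_{k^{*}}}(d) \\
\mathrm{TIE} &= \mathrm{TE} - \mathrm{NDE} = Y_{k,\,M_{k}}(d) - Y_{k,\,M_{k^{*}}}(d)
\end{aligned}
$$

Subtracting the natural direct effect (NDE) from the total effect (TE) isolates the total indirect effect (TIE) that flows through the dialogue-knowledge interaction; generating from the TIE rather than the raw prediction is the usual debiasing move in this line of work (cf. Counterfactual VQA).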

Authors (8)
  1. Jifan Yu
  2. Xiaohan Zhang
  3. Yifan Xu
  4. Xuanyu Lei
  5. Zijun Yao
  6. Jing Zhang
  7. Lei Hou
  8. Juanzi Li
Citations (1)