Untangle the KNOT: Interweaving Conflicting Knowledge and Reasoning Skills in Large Language Models (2404.03577v1)

Published 4 Apr 2024 in cs.CL

Abstract: Providing knowledge documents for LLMs has emerged as a promising solution to update the static knowledge inherent in their parameters. However, knowledge in the document may conflict with the memory of LLMs due to outdated or incorrect knowledge in the LLMs' parameters. This leads to the necessity of examining the capability of LLMs to assimilate supplemental external knowledge that conflicts with their memory. While previous studies have examined to what extent LLMs extract conflicting knowledge from the provided text, they neglect the need to reason with conflicting knowledge. Furthermore, a detailed analysis of strategies to enable LLMs to resolve conflicting knowledge via prompting, decoding strategies, and supervised fine-tuning is still lacking. To address these limitations, we construct a new dataset, dubbed KNOT, to examine knowledge conflict resolution in the form of question answering. KNOT facilitates in-depth analysis by dividing reasoning with conflicting knowledge into three levels: (1) Direct Extraction, which directly extracts conflicting knowledge to answer questions; (2) Explicit Reasoning, which reasons with conflicting knowledge when the reasoning path is explicitly provided in the question; and (3) Implicit Reasoning, where reasoning with conflicting knowledge requires LLMs to infer the reasoning path independently to answer questions. We also conduct extensive experiments on KNOT to establish empirical guidelines for LLMs to utilize conflicting knowledge in complex circumstances. The dataset and associated code are available at https://github.com/THU-KEG/KNOT .
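
To make the three levels concrete, here is a minimal sketch of what question–document pairs at each level might look like, together with a context-faithful prompt builder. The field names, example facts, and prompt wording are illustrative assumptions, not the actual KNOT schema; the real format is defined in the official repository linked above.

```python
# Hypothetical KNOT-style examples for the three reasoning levels described in the abstract.
# Field names and contents are assumptions for illustration only.

knot_style_examples = [
    {
        "level": "direct_extraction",
        # Document fact that may conflict with the model's parametric memory.
        "document": "As of 2024, the CEO of ExampleCorp is Alice Zhang.",
        "question": "Who is the CEO of ExampleCorp?",
        "answer": "Alice Zhang",
    },
    {
        "level": "explicit_reasoning",
        "document": "ExampleCorp's headquarters moved to Berlin in 2023.",
        # The reasoning path (headquarters city -> country) is spelled out in the question.
        "question": "Given the city where ExampleCorp is headquartered, in which country is that city located?",
        "answer": "Germany",
    },
    {
        "level": "implicit_reasoning",
        "document": "ExampleCorp's headquarters moved to Berlin in 2023.",
        # The model must infer the multi-hop path (headquarters city -> country) on its own.
        "question": "In which country is ExampleCorp headquartered?",
        "answer": "Germany",
    },
]


def build_context_faithful_prompt(document: str, question: str) -> str:
    """Compose a prompt asking the model to answer strictly from the provided
    document, even when it conflicts with the model's parametric memory."""
    return (
        "Answer the question using only the information in the document below, "
        "even if it contradicts what you already believe.\n\n"
        f"Document: {document}\n"
        f"Question: {question}\n"
        "Answer:"
    )


if __name__ == "__main__":
    for ex in knot_style_examples:
        print(f"--- {ex['level']} ---")
        print(build_context_faithful_prompt(ex["document"], ex["question"]))
        print(f"(expected answer: {ex['answer']})\n")
```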

Authors (8)
  1. Yantao Liu (13 papers)
  2. Zijun Yao (50 papers)
  3. Xin Lv (38 papers)
  4. Yuchen Fan (44 papers)
  5. Shulin Cao (23 papers)
  6. Jifan Yu (49 papers)
  7. Lei Hou (127 papers)
  8. Juanzi Li (144 papers)