
Retrieval-enhanced Knowledge Editing in Language Models for Multi-Hop Question Answering (2403.19631v2)

Published 28 Mar 2024 in cs.CL, cs.AI, and cs.LG

Abstract: LLMs have shown proficiency in question-answering tasks but often struggle to integrate real-time knowledge, leading to potentially outdated or inaccurate responses. This problem becomes even more challenging when dealing with multi-hop questions, since they require LLMs to update and integrate multiple knowledge pieces relevant to the questions. To tackle the problem, we propose the Retrieval-Augmented model Editing (RAE) framework for multi-hop question answering. RAE first retrieves edited facts and then refines the LLM through in-context learning. Specifically, our retrieval approach, based on mutual information maximization, leverages the reasoning abilities of LLMs to identify chain facts that traditional similarity-based searches might miss. In addition, our framework includes a pruning strategy to eliminate redundant information from the retrieved facts, which enhances the editing accuracy and mitigates the hallucination problem. Our framework is supported by theoretical justification for its fact retrieval efficacy. Finally, comprehensive evaluation across various LLMs validates RAE's ability in providing accurate answers with updated knowledge. Our code is available at: https://github.com/sycny/RAE.

Enhancing Multi-Hop Question Answering in LLMs with Retrieval-Augmented Model Editing

Introduction to Retrieval-Augmented Model Editing (RAE)

The paper introduces a Retrieval-Augmented model Editing (RAE) framework designed specifically for multi-hop question answering with LLMs. Recognizing the difficulty of incorporating real-time knowledge updates, especially in a multi-hop setting, RAE first retrieves edited facts and then refines the model's response through in-context learning. The framework is distinctive in its use of mutual information maximization for retrieval, which strengthens the model's ability to identify and integrate the relevant knowledge pieces.
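To make the two-stage flow concrete, here is a minimal sketch of the pipeline in Python. The helper names (retrieve_facts, prune_facts) and the prompt format are illustrative assumptions, not the authors' API; the actual implementation lives in the linked repository.

```python
# Minimal sketch of the RAE pipeline: retrieve edited facts, prune them,
# then answer via in-context learning. retrieve_facts / prune_facts are
# hypothetical placeholders for the components described below.

def rae_answer(question: str, edited_kg, llm) -> str:
    candidates = retrieve_facts(question, edited_kg, llm)  # stage 1: MI-guided retrieval
    facts = prune_facts(question, candidates, llm)         # stage 2: entropy-based pruning
    # In-context editing: prepend the retained facts to the question.
    context = "\n".join(f"Fact: {f}" for f in facts)
    return llm.generate(f"{context}\nQuestion: {question}\nAnswer:")
```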

Key Contributions

  • Novel Retrieval Approach: Utilization of mutual information maximization to retrieve the most relevant multi-hop edited facts, effectively using the reasoning capabilities of LLMs.
  • Pruning Strategy: Introduction of a knowledge pruning method to eliminate redundant information post-retrieval, ensuring that only pertinent information influences the model’s output.
  • Theoretical Justification: Provision of a theoretical foundation validating the approach for fact retrieval efficacy.
  • Extensive Validation: Empirical demonstrations of RAE's effectiveness across multiple LLMs, showing that it outperforms several state-of-the-art methods on multi-hop questions.

Methodological Framework

The RAE framework consists of two main components:

  1. Edited Facts Retrieval: By maximizing mutual information, the method retrieves the knowledge-graph subset that is most informative about the query. This depends on estimating conditional probabilities with the next-word prediction capability of LLMs.
  2. Knowledge Pruning: Once a broad set of potentially relevant facts is retrieved, RAE prunes it based on editing uncertainty, quantified as the output entropy of the LLM when presented with each candidate subset of facts (a sketch of both components follows this list).
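Below is a hedged sketch of both components, approximating the paper's quantities with next-token log-probabilities from a small causal LM. The model choice (gpt2), the prompt templates, and the greedy hop-by-hop selection are assumptions made for illustration, not the authors' exact implementation (see https://github.com/sycny/RAE).

```python
from itertools import combinations

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")   # small LM chosen for illustration
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

@torch.no_grad()
def log_prob(prefix: str, continuation: str) -> float:
    """Sum of log p(continuation | prefix) under the LM."""
    ctx = tok(prefix, return_tensors="pt").input_ids
    cont = tok(continuation, return_tensors="pt").input_ids
    ids = torch.cat([ctx, cont], dim=-1)
    logp = lm(ids).logits[0, :-1].log_softmax(-1)  # row k predicts token k+1
    return sum(
        logp[ctx.shape[-1] - 1 + i, cont[0, i]].item()
        for i in range(cont.shape[-1])
    )

def select_chain(question: str, pool: list[str], hops: int = 2) -> list[str]:
    """Greedy chain-fact retrieval: at each hop keep the fact with the
    highest conditional probability given the question and the facts
    selected so far (a proxy for maximizing mutual information)."""
    chain, pool = [], list(pool)
    for _ in range(hops):
        prefix = " ".join([question] + chain) + " "
        best = max(pool, key=lambda f: log_prob(prefix, f))
        chain.append(best)
        pool.remove(best)
    return chain

@torch.no_grad()
def answer_entropy(question: str, facts: list[str]) -> float:
    """Entropy of the LM's next-token distribution after the edited prompt;
    lower entropy is read here as lower editing uncertainty."""
    prompt = " ".join(f"Fact: {f}." for f in facts) + f" Question: {question} Answer:"
    logits = lm(tok(prompt, return_tensors="pt").input_ids).logits[0, -1]
    p = logits.softmax(-1)
    return -(p * p.clamp_min(1e-12).log()).sum().item()

def prune(question: str, chain: list[str]) -> list[str]:
    """Keep the non-empty fact subset whose prompt minimizes output entropy."""
    subsets = [list(c) for r in range(1, len(chain) + 1)
               for c in combinations(chain, r)]
    return min(subsets, key=lambda s: answer_entropy(question, s))
```

The exhaustive subset enumeration in prune is exponential; it is written this way only to mirror the description above and is workable only for the handful of facts a retrieved chain contains.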

Theoretical Underpinnings

RAE's effectiveness stems from its grounding in information theory, particularly mutual information. By maximizing the mutual information between the question and the retrieved facts, RAE ensures that the edits align closely with the knowledge needed to answer the question accurately. The paper offers a theoretical justification that this retrieval objective serves the end goal of accurate and relevant model editing.
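One plausible way to write this objective down, with notation assumed here for illustration rather than copied from the paper: let q be the question, G_e the edited knowledge graph, and G a candidate fact subset.

```latex
% Since H(q) does not depend on the retrieved facts, maximizing the
% mutual information I(q; G) over candidate subsets reduces to
% maximizing the conditional likelihood of the question, which an LLM
% can score via next-word prediction.
\begin{aligned}
\mathcal{G}^{*} &= \arg\max_{\mathcal{G} \subseteq \mathcal{G}_{e}} I(q;\mathcal{G})
                 = \arg\max_{\mathcal{G} \subseteq \mathcal{G}_{e}} \bigl[ H(q) - H(q \mid \mathcal{G}) \bigr] \\
                &= \arg\max_{\mathcal{G} \subseteq \mathcal{G}_{e}} \log p_{\mathrm{LM}}(q \mid \mathcal{G}).
\end{aligned}
```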

Empirical Evaluation

The RAE framework was subjected to rigorous testing across various datasets and models, demonstrating its adaptability and effectiveness. It consistently outperformed other model editing techniques, particularly in scenarios involving complex multi-hop question answering. These results highlight its practical utility and the potential for broader applications in real-world scenarios where LLMs need to dynamically integrate updated knowledge.

Future Directions

  • Scalability and Efficiency: Improving the efficiency of the retrieval process, possibly by enhancing the mutual information estimation techniques or integrating more computationally efficient models.
  • Broader Applicability: Extending the framework to other forms of dynamic knowledge integration such as real-time information updates from continuous data streams.
  • Domain-Specific Adaptations: Customizing the RAE framework for specific domains like medical or legal question answering, where accuracy and up-to-date information are critical.

The RAE framework marks a significant step forward in the field of knowledge-intensive applications for LLMs, particularly enhancing their capability to handle multi-hop question answering through effective retrieval and editing of relevant facts. Its success opens avenues for more sophisticated and context-aware AI systems, capable of adapting to evolving information landscapes.

Authors (6)
  1. Yucheng Shi (30 papers)
  2. Qiaoyu Tan (36 papers)
  3. Xuansheng Wu (21 papers)
  4. Shaochen Zhong (15 papers)
  5. Kaixiong Zhou (52 papers)
  6. Ninghao Liu (98 papers)
Citations (6)