Evaluating the External and Parametric Knowledge Fusion of Large Language Models (2405.19010v1)

Published 29 May 2024 in cs.CL, cs.AI, and cs.IR

Abstract: Integrating external knowledge into LLMs presents a promising solution to overcome the limitations imposed by their antiquated and static parametric memory. Prior studies, however, have tended to over-rely on external knowledge, underestimating the valuable contributions of an LLM's intrinsic parametric knowledge. The efficacy of LLMs in blending external and parametric knowledge remains largely unexplored, especially in cases where external knowledge is incomplete and must be supplemented by parametric knowledge. We propose to deconstruct knowledge fusion into four distinct scenarios, offering the first thorough investigation of LLM behavior across each. We develop a systematic pipeline for data construction and knowledge infusion to simulate these fusion scenarios, facilitating a series of controlled experiments. Our investigation reveals that enhancing parametric knowledge within LLMs can significantly bolster their capability for knowledge integration. Nonetheless, we identify persistent challenges in memorizing and eliciting parametric knowledge and in determining parametric knowledge boundaries. Our findings aim to steer future explorations on harmonizing external and parametric knowledge within LLMs.
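
The abstract's four fusion scenarios can be read as a labeling rule over two signals: how much of the knowledge a question requires is supplied by the external (retrieved) context, and whether the model's parametric memory holds the remainder. The sketch below is a minimal illustration under that reading; the scenario names, the coverage fraction, and the closed-book probing signal are assumptions made for this example, not the paper's actual data-construction pipeline.

```python
# Hypothetical labels and thresholds for the four fusion scenarios; the paper's
# exact definitions and construction pipeline are not reproduced here.
from enum import Enum


class FusionScenario(Enum):
    EXTERNAL_SUFFICIENT = "external knowledge alone answers the question"
    FUSION_REQUIRED = "external knowledge is partial; parametric knowledge fills the gap"
    PARAMETRIC_ONLY = "external knowledge is irrelevant; only parametric knowledge answers"
    UNANSWERABLE = "neither source supplies the answer"


def label_scenario(external_coverage: float, parametric_knows_rest: bool) -> FusionScenario:
    """Assign a scenario from two assumed signals:

    external_coverage      -- fraction of the required facts present in the retrieved context
    parametric_knows_rest  -- whether closed-book probing suggests the model already
                              stores the facts the context is missing
    """
    if external_coverage >= 1.0:
        return FusionScenario.EXTERNAL_SUFFICIENT
    if external_coverage > 0.0 and parametric_knows_rest:
        return FusionScenario.FUSION_REQUIRED
    if external_coverage == 0.0 and parametric_knows_rest:
        return FusionScenario.PARAMETRIC_ONLY
    return FusionScenario.UNANSWERABLE


if __name__ == "__main__":
    # Retrieved passages cover half of the needed facts, and the model
    # already knows the other half in its parameters.
    print(label_scenario(external_coverage=0.5, parametric_knows_rest=True))
    # -> FusionScenario.FUSION_REQUIRED
```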

Authors (11)
  1. Hao Zhang (947 papers)
  2. Yuyang Zhang (28 papers)
  3. Xiaoguang Li (71 papers)
  4. Wenxuan Shi (7 papers)
  5. Haonan Xu (11 papers)
  6. Huanshuo Liu (3 papers)
  7. Yasheng Wang (91 papers)
  8. Lifeng Shang (90 papers)
  9. Qun Liu (230 papers)
  10. Yong Liu (721 papers)
  11. Ruiming Tang (171 papers)