Physics of Language Models: Part 3.2, Knowledge Manipulation (2309.14402v2)

Published 25 Sep 2023 in cs.CL, cs.AI, and cs.LG

Abstract: LLMs can store vast factual knowledge, yet their ability to flexibly use this knowledge for downstream tasks (e.g., via instruction finetuning) remains questionable. This paper investigates four fundamental knowledge manipulation tasks: retrieval (e.g., "What is person A's attribute X?"), classification (e.g., "Is A's attribute X even or odd?"), comparison (e.g., "Is A greater than B in attribute X?"), and inverse search (e.g., "Which person's attribute X equals T?"). We show that LLMs excel in knowledge retrieval but struggle even in the simplest classification or comparison tasks unless Chain of Thoughts (CoTs) are employed during both training and inference. Moreover, their performance in inverse knowledge search is virtually 0%, regardless of the prompts. Our primary contribution is a controlled, synthetic experiment that confirms these weaknesses are inherent to LLMs: they cannot efficiently manipulate knowledge from pre-training data, even when such knowledge is perfectly stored in the models, despite adequate training and sufficient model size. Our findings also apply to modern pretrained LLMs such as GPT-4, thus giving rise to many Turing tests to distinguish Humans from contemporary AIs.
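To make the four probe types concrete, here is a minimal sketch in Python of how such queries can be posed over a toy fact base. The names, attribute, and helper functions are hypothetical illustrations for exposition, not the paper's actual synthetic dataset or code.

```python
# A minimal sketch (not the paper's code) of the four knowledge-manipulation
# probes named in the abstract, posed over a toy fact base. All names and
# attribute values below are hypothetical illustrations.

facts = {
    "Anya Forger": {"birth_year": 1996},
    "Brad Smith": {"birth_year": 1988},
}

def retrieval(person, attr):
    """Retrieval: 'What is person A's attribute X?'"""
    return f"What is {person}'s {attr}?"

def classification(person, attr):
    """Classification: 'Is A's attribute X even or odd?'"""
    return f"Is {person}'s {attr} even or odd?"

def comparison(a, b, attr):
    """Comparison: 'Is A greater than B in attribute X?'"""
    return f"Is {a} greater than {b} in {attr}?"

def inverse_search(attr, value):
    """Inverse search: 'Which person's attribute X equals T?'"""
    return f"Which person's {attr} equals {value}?"

# Ground truth is trivial with an explicit lookup; the paper's finding is that
# LLMs answer the retrieval form well, need CoT (in training and inference)
# for classification and comparison, and fail inverse search entirely.
def true_inverse(attr, value):
    return [p for p, attrs in facts.items() if attrs.get(attr) == value]

print(retrieval("Anya Forger", "birth_year"))
print(classification("Anya Forger", "birth_year"))
print(comparison("Anya Forger", "Brad Smith", "birth_year"))
print(inverse_search("birth_year", 1988), "->", true_inverse("birth_year", 1988))
```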

