
Dissecting Recall of Factual Associations in Auto-Regressive Language Models (2304.14767v3)

Published 28 Apr 2023 in cs.CL

Abstract: Transformer-based language models (LMs) are known to capture factual knowledge in their parameters. While previous work looked into where factual associations are stored, only little is known about how they are retrieved internally during inference. We investigate this question through the lens of information flow. Given a subject-relation query, we study how the model aggregates information about the subject and relation to predict the correct attribute. With interventions on attention edges, we first identify two critical points where information propagates to the prediction: one from the relation positions followed by another from the subject positions. Next, by analyzing the information at these points, we unveil a three-step internal mechanism for attribute extraction. First, the representation at the last-subject position goes through an enrichment process, driven by the early MLP sublayers, to encode many subject-related attributes. Second, information from the relation propagates to the prediction. Third, the prediction representation "queries" the enriched subject to extract the attribute. Perhaps surprisingly, this extraction is typically done via attention heads, which often encode subject-attribute mappings in their parameters. Overall, our findings introduce a comprehensive view of how factual associations are stored and extracted internally in LMs, facilitating future research on knowledge localization and editing.

Authors (4)
  1. Mor Geva (58 papers)
  2. Jasmijn Bastings (19 papers)
  3. Katja Filippova (13 papers)
  4. Amir Globerson (87 papers)
Citations (203)

Summary

Dissecting Recall of Factual Associations in Auto-Regressive Language Models

The paper "Dissecting Recall of Factual Associations in Auto-Regressive Language Models" presents a detailed investigation into the mechanisms by which transformer-based language models (LMs) recall factual information during inference. The focus is on auto-regressive, decoder-only models, which are known to store and retrieve large amounts of factual knowledge encoded in their parameters.

Key Findings

The paper outlines a three-step internal mechanism by which LMs extract an attribute for a given subject-relation query:

  1. Subject Representation Enrichment: The representation at the last subject-token position is enriched by the early MLP sublayers so that it encodes many attributes associated with the subject (see the sketch after this list).
  2. Information Propagation from Relation Positions: At critical points in the network, information from the relation tokens flows, via attention, to the final position where the prediction is formed. This propagation precedes the extraction of the attribute from the subject representation, indicating that relational and subject information are processed at different stages.
  3. Attribute Extraction via Attention: Finally, the representation at the last position "queries" the enriched subject representation to extract the attribute. This extraction is typically carried out by attention heads in the multi-head self-attention (MHSA) sublayers, which often encode subject-attribute mappings in their parameters.
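
To make the enrichment step concrete, here is a minimal, illustrative sketch (not the authors' code) of a logit-lens-style reading: it projects the residual stream at the last subject-token position through the unembedding after each layer and prints the top tokens, the kind of per-layer inspection that underlies the paper's attributes-rate analysis. It assumes GPT-2 small loaded via the TransformerLens library; the prompt and the subject position are illustrative.

```python
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
tokens = model.to_tokens("Beats Music is owned by")
last_subject_pos = 2  # position of " Music", the last subject token (assumed; BOS sits at position 0)

_, cache = model.run_with_cache(tokens)
for layer in range(model.cfg.n_layers):
    # Residual stream at the last subject position after this layer.
    resid = cache["resid_post", layer][0, last_subject_pos]
    # Project to vocabulary space through the final layer norm and the unembedding ("logit lens").
    logits = model.unembed(model.ln_final(resid[None, None]))
    top_tokens = torch.topk(logits[0, 0], k=5).indices
    print(layer, model.to_str_tokens(top_tokens))
```

If enrichment behaves as the paper describes, subject-related tokens should become more prominent in this projection as the early MLP sublayers are traversed.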

Methodology

The authors intervene directly in the model's computation, blocking individual attention edges in the multi-head self-attention sublayers to identify the edges and layers that are critical for successful attribute recall (a minimal sketch of this kind of intervention follows below). In addition, they quantify subject enrichment by measuring, at each layer, an attributes rate: how many of the tokens encoded in the subject representation correspond to attributes of the subject, with the attributes drawn from the subject's Wikipedia context.
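
As a rough illustration of such an attention-edge intervention, the sketch below blocks the edges from the last position to the subject positions in a window of layers and compares the probability of the correct attribute with and without the block. This is a minimal sketch assuming GPT-2 small via TransformerLens, not the paper's own setup (which used GPT-2 xl and GPT-J); the prompt, token positions, answer token, and layer window are illustrative.

```python
import torch
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
tokens = model.to_tokens("Beats Music is owned by")

subject_positions = [1, 2]           # positions of "Beats Music" (assumed; BOS sits at position 0)
last_position = tokens.shape[1] - 1  # the position that produces the next-token prediction
layers_to_block = range(7, 11)       # illustrative window of upper layers

def block_subject_edges(attn_scores, hook):
    # attn_scores: [batch, head, query_pos, key_pos] pre-softmax attention scores.
    # Setting the scores to -inf removes the edges from the last position to the subject positions.
    attn_scores[:, :, last_position, subject_positions] = float("-inf")
    return attn_scores

hooks = [(f"blocks.{l}.attn.hook_attn_scores", block_subject_edges) for l in layers_to_block]

answer = model.to_single_token(" Apple")  # correct attribute for this example query
p_clean = model(tokens)[0, -1].softmax(-1)[answer].item()
p_blocked = model.run_with_hooks(tokens, fwd_hooks=hooks)[0, -1].softmax(-1)[answer].item()
print(f"p(answer): clean={p_clean:.3f}, subject->last edges blocked={p_blocked:.3f}")
```

In the paper, a large drop in the prediction probability when edges from the subject positions are blocked in the upper layers is what reveals the critical subject-to-prediction information flow.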

Implications

The findings refine the picture of where factual knowledge resides and how it is used, emphasizing the complementary roles of the MLP and MHSA sublayers. Whereas previous work concentrated on mid-layer MLPs, this research highlights the contributions of the early MLP sublayers, which enrich the subject representation, and of the MHSA parameters, which carry out the extraction.

Future Directions

The insights provided by this paper open avenues for improving model transparency and interpretability, and suggest new directions for knowledge localization and model-editing techniques. Future work might explore more extensive interventions and alternative architectures beyond decoder-only models. A better understanding of factual recall could also inform efforts to improve factual accuracy in AI systems and broaden the range of applications that require explicit factual grounding.

Overall, this paper deepens our understanding of the internal workings of LMs, clarifying how factual associations are stored and retrieved within such models.
