Dissecting Recall of Factual Associations in Auto-Regressive Language Models (2304.14767v3)
Abstract: Transformer-based language models (LMs) are known to capture factual knowledge in their parameters. While previous work has looked into where factual associations are stored, little is known about how they are retrieved internally during inference. We investigate this question through the lens of information flow. Given a subject-relation query, we study how the model aggregates information about the subject and relation to predict the correct attribute. With interventions on attention edges, we first identify two critical points where information propagates to the prediction: one from the relation positions followed by another from the subject positions. Next, by analyzing the information at these points, we unveil a three-step internal mechanism for attribute extraction. First, the representation at the last-subject position goes through an enrichment process, driven by the early MLP sublayers, to encode many subject-related attributes. Second, information from the relation propagates to the prediction. Third, the prediction representation "queries" the enriched subject to extract the attribute. Perhaps surprisingly, this extraction is typically done via attention heads, which often encode subject-attribute mappings in their parameters. Overall, our findings introduce a comprehensive view of how factual associations are stored and extracted internally in LMs, facilitating future research on knowledge localization and editing.
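To make the attention-edge interventions concrete, here is a minimal, self-contained sketch (not the authors' code) of the "attention knockout" idea: within a single self-attention layer, the score of each blocked (query, key) edge is set to negative infinity before the softmax, so no information can flow along that edge. All tensors, positions, and the function name are illustrative.

```python
# Minimal sketch of "attention knockout" (illustrative, not the paper's code).
import torch

def attention_with_knockout(q, k, v, blocked_edges):
    """Single-head causal attention; blocked_edges is a set of
    (query_pos, key_pos) pairs whose edges are cut via -inf scores."""
    seq_len, d = q.shape
    scores = q @ k.T / d ** 0.5                            # [seq, seq]
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    scores = scores.masked_fill(~causal, float("-inf"))    # causal mask
    for qi, ki in blocked_edges:                           # knock out edges
        scores[qi, ki] = float("-inf")
    return torch.softmax(scores, dim=-1) @ v

# Block information flow from the subject positions (say, tokens 1-2)
# to the last position, where the next-token prediction is formed.
torch.manual_seed(0)
q = k = v = torch.randn(6, 16)
last, subject_positions = 5, [1, 2]
out = attention_with_knockout(q, k, v, {(last, p) for p in subject_positions})
```

Comparing the model's prediction with and without such knockouts, layer by layer, is what reveals the two critical points of information flow described above.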
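The claim that attention heads "encode subject-attribute mappings in their parameters" can be probed by projecting a head's value-output (OV) circuit onto the vocabulary, in the spirit of logit-lens-style parameter inspection. The sketch below assumes GPT-2 small via Hugging Face transformers; the layer/head indices and the subject token are hypothetical placeholders, not findings from the paper.

```python
# Sketch: project one attention head's value-output (OV) circuit onto the
# vocabulary (assumes GPT-2 small via Hugging Face transformers).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2")
tok = GPT2Tokenizer.from_pretrained("gpt2")
n_embd, head_dim = 768, 64          # GPT-2 small: 12 heads of size 64
layer, head = 9, 8                  # hypothetical head to inspect

attn = model.transformer.h[layer].attn
# HF's Conv1D stores weights as [in_features, out_features]; the value
# projection occupies the last third of c_attn's output columns.
W_V = attn.c_attn.weight[:, 2 * n_embd + head * head_dim:
                            2 * n_embd + (head + 1) * head_dim]
W_O = attn.c_proj.weight[head * head_dim:(head + 1) * head_dim, :]

with torch.no_grad():
    OV = W_V @ W_O                               # [n_embd, n_embd]
    E = model.transformer.wte.weight             # token embeddings [vocab, n_embd]
    subj_id = tok(" Apple")["input_ids"][0]      # placeholder subject token
    scores = E[subj_id] @ OV @ E.T               # push the subject through the head
    print(tok.convert_ids_to_tokens(scores.topk(10).indices.tolist()))
```

If a head indeed stores a subject-attribute mapping, the top-scoring tokens for a subject embedding should include attributes associated with that subject.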
Authors: Mor Geva, Jasmijn Bastings, Katja Filippova, Amir Globerson