Unveiling the Dynamics of Factual Knowledge in LLMs through Latent Representations
Introduction to the Study
The paper explores the factual knowledge encoded in the latent space of LLMs when they are challenged with the task of claim verification. It introduces an end-to-end framework that decodes the latent representations of LLMs into structured factual knowledge and traces its evolution across the model's layers using a temporal knowledge graph. Notably, the framework employs activation patching as a novel means of dynamic intervention in model inference, removing the need for external models or additional training.
Understanding the Framework and Methodology
The proposed framework interfaces with the model's hidden layers during inference to extract the semantics of factual claims. The process involves several key steps:
- Preliminary Prompt Construction: The model receives semantically structured prompts that guide it to process factual claims and to express its outputs as ground predicates, either asserted or negated (see the prompt sketch after this list).
- Latent Representation Patching: This critical phase manipulates the model's latent representations by replacing the embedding of a designated token with a weighted summary of the source prompt's latent representations, making it possible to probe how the encoded knowledge evolves and is manipulated across layers (a patching sketch follows the list).
- Temporal Knowledge Graph Construction: The output predictions, structured as ground predicates, are then translated into a knowledge graph in which the model's layers serve as the temporal dimension. This representation enables a granular analysis of how factual knowledge transforms throughout the inference process (a graph-construction sketch appears below).
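To make the first step concrete, here is a minimal sketch of how such a prompt might be assembled. The template wording, the example claim, and the predicate syntax are illustrative assumptions, not the paper's actual prompts.

```python
# Hypothetical prompt template steering the model toward ground predicates.
# The wording, claim, and predicate syntax are illustrative assumptions.
PROMPT_TEMPLATE = (
    "Verify the following claim and state the relevant facts as ground "
    "predicates, asserted or negated.\n"
    "Claim: {claim}\n"
    "Predicates:"
)

def build_prompt(claim: str) -> str:
    """Wrap a factual claim in the verification prompt."""
    return PROMPT_TEMPLATE.format(claim=claim)

prompt = build_prompt("Rome is the capital of France.")
# An idealized completion would read something like:
#   capital_of(Rome, Italy). NOT capital_of(Rome, France).
```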
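The patching step can be sketched with PyTorch forward hooks on a Hugging Face model. Everything below is a simplified illustration under stated assumptions: GPT-2 stands in for the actual model, a uniform mean over source tokens stands in for the paper's weighted summary, and the patched layer and placeholder position are arbitrary choices.

```python
# Sketch of latent representation patching via a forward hook on a
# GPT-2-style model. Model choice, layer index, placeholder position, and
# the uniform-mean summary are all simplifying assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

source = tok("Rome is the capital of France.", return_tensors="pt")
with torch.no_grad():
    # hidden_states[0] is the embedding output; hidden_states[i + 1] is the
    # output of transformer block i.
    hs = model(**source, output_hidden_states=True).hidden_states

layer = 6        # block whose output gets patched (assumption)
patch_pos = -1   # position of the designated placeholder token (assumption)
summary = hs[layer + 1].mean(dim=1)  # [batch, dim] summary of the source

def patch_hook(module, inputs, output):
    hidden = output[0]
    if hidden.size(1) > 1:  # patch the initial pass only, not cached steps
        hidden[:, patch_pos, :] = summary
    return (hidden,) + output[1:]

target = tok("Fact: x", return_tensors="pt")  # 'x' marks the patch slot
handle = model.transformer.h[layer].register_forward_hook(patch_hook)
with torch.no_grad():
    out = model.generate(**target, max_new_tokens=20,
                         pad_token_id=tok.eos_token_id)
handle.remove()
print(tok.decode(out[0], skip_special_tokens=True))
```

Repeating this intervention layer by layer, and decoding the patched run each time, yields a per-layer snapshot of what the model has resolved so far.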
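Once the per-layer outputs are parsed into ground predicates, building the temporal knowledge graph is straightforward. The sketch below assumes hypothetical decoded predicates and uses networkx purely as one convenient graph representation; neither is prescribed by the paper.

```python
# Sketch of a temporal knowledge graph in which the layer index acts as the
# temporal dimension. The decoded predicates are hypothetical examples.
import networkx as nx

# Hypothetical per-layer decodings: (subject, relation, object, asserted?).
decoded = {
    6:  [("Rome", "capital_of", "Italy", True)],
    12: [("Rome", "capital_of", "Italy", True),
         ("Rome", "capital_of", "France", False)],
}

tkg = nx.MultiDiGraph()
for layer, predicates in decoded.items():
    for subj, rel, obj, asserted in predicates:
        tkg.add_edge(subj, obj, relation=rel, layer=layer, asserted=asserted)

# Trace how a fact evolves across layers.
for u, v, data in sorted(tkg.edges(data=True), key=lambda e: e[2]["layer"]):
    sign = "" if data["asserted"] else "NOT "
    print(f"layer {data['layer']:>2}: {sign}{data['relation']}({u}, {v})")
```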
Results and Implications
This paper presents several key findings regarding the latent dynamics of factual knowledge within LLMs. The local interpretability analysis exposes latent errors, ranging from entity-resolution mistakes to multi-hop reasoning faults. Globally, it reveals distinct patterns of knowledge evolution: early layers focus on entity resolution, middle layers encode the bulk of factual knowledge about subject entities, and factual expressiveness declines in the final layers.
Concluding Remarks
This work represents a significant step toward understanding the internal mechanisms of LLMs, particularly how they encode, manipulate, and apply factual knowledge. By leveraging a patching-based approach, the framework opens new avenues for probing the under-explored depths of LLMs’ latent spaces, offering insights into their operational dynamics without relying on external models or additional training. Future research could extend this framework to explore how larger context sizes interact with the resolution of factual knowledge within LLMs.