
Unveiling LLMs: The Evolution of Latent Representations in a Dynamic Knowledge Graph (2404.03623v2)

Published 4 Apr 2024 in cs.CL, cs.AI, and cs.CY

Abstract: LLMs demonstrate an impressive capacity to recall a vast range of factual knowledge. However, understanding their underlying reasoning and internal mechanisms in exploiting this knowledge remains a key research area. This work unveils the factual information an LLM represents internally for sentence-level claim verification. We propose an end-to-end framework to decode factual knowledge embedded in token representations from a vector space to a set of ground predicates, showing its layer-wise evolution using a dynamic knowledge graph. Our framework employs activation patching, a vector-level technique that alters a token representation during inference, to extract encoded knowledge. Accordingly, we neither rely on training nor external models. Using factual and common-sense claims from two claim verification datasets, we showcase interpretability analyses at local and global levels. The local analysis highlights entity centrality in LLM reasoning, from claim-related information and multi-hop reasoning to representation errors causing erroneous evaluation. On the other hand, the global analysis reveals trends in the underlying evolution, such as word-based knowledge evolving into claim-related facts. By interpreting semantics from LLM latent representations and enabling graph-related analyses, this work enhances the understanding of the factual knowledge resolution process.

Authors (5)
  1. Marco Bronzini
  2. Carlo Nicolini
  3. Bruno Lepri
  4. Jacopo Staiano
  5. Andrea Passerini
Citations (2)

Summary

Unveiling the Dynamics of Factual Knowledge in LLMs through Latent Representations

Introduction to the Study

The paper investigates the factual knowledge encoded in the latent space of LLMs when they are tasked with claim verification. It introduces an end-to-end framework that decodes LLM latent representations into structured factual knowledge and traces its evolution across the model layers using a temporal knowledge graph. Notably, the framework uses activation patching, a vector-level intervention applied during inference, and therefore requires neither external models nor additional training.

Understanding the Framework and Methodology

The proposed framework operates by interfacing with a model across its hidden layers during inference, extracting the semantics of factual claims. The process involves several key steps:

  • Preliminary Prompt Construction: The model receives semantically structured prompts that guide it to process factual claims, aiming to generate outputs as ground predicates (asserted or negated).
  • Latent Representation Patching: This critical phase manipulates the model's latent representations by replacing the embedding of a designated placeholder token with a weighted summary of the source prompt's latent representations. This makes it possible to probe how the encoded knowledge evolves and is manipulated across layers (see the first sketch after this list).
  • Temporal Knowledge Graph Construction: The output predictions, structured as ground predicates, are then translated into a knowledge graph representation, with the model's layers serving as a temporal dimension. This enables a granular analysis of how factual knowledge transforms throughout the inference process (the second sketch below illustrates this step).
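
To make the patching step more concrete, the following is a minimal sketch of vector-level activation patching with PyTorch forward hooks on a HuggingFace decoder-only model. The model name, layer index, placeholder token, and uniform weighting are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of vector-level activation patching (illustrative only).
# Assumes a Llama-style HuggingFace model whose decoder layers live in
# `model.model.layers`; layer index, placeholder token, and uniform
# weighting are assumptions, not the paper's exact choices.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"   # assumption: any Llama-style LM works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

# 1) Encode the source claim and build a weighted summary of its token
#    representations at a chosen decoder layer (uniform weights here).
claim = "The Eiffel Tower is located in Berlin."
src = tok(claim, return_tensors="pt")
with torch.no_grad():
    src_out = model(**src)
layer_idx = 15                                        # illustrative layer choice
# hidden_states[0] is the embedding output, so layer_idx + 1 is this layer's output
src_hidden = src_out.hidden_states[layer_idx + 1][0]  # (seq_len, d_model)
weights = torch.full((src_hidden.shape[0], 1), 1.0 / src_hidden.shape[0])
summary_vec = (weights * src_hidden).sum(dim=0)       # (d_model,)

# 2) Run a target prompt whose final token acts as a placeholder, overwriting
#    that token's representation at the same layer during generation.
prompt = "List the facts stated in the claim: x"      # 'x' = placeholder (assumption)
tgt = tok(prompt, return_tensors="pt")
patch_pos = tgt.input_ids.shape[1] - 1

def patch_hook(module, args, output):
    hidden = output[0] if isinstance(output, tuple) else output
    if hidden.shape[1] > patch_pos:                   # patch only on the prefill pass
        hidden[:, patch_pos, :] = summary_vec.to(hidden.dtype)
    # modified in place; returning nothing keeps the patched output

handle = model.model.layers[layer_idx].register_forward_hook(patch_hook)
with torch.no_grad():
    out_ids = model.generate(**tgt, max_new_tokens=32)
handle.remove()

print(tok.decode(out_ids[0], skip_special_tokens=True))
```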

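The graph-construction step can likewise be pictured as accumulating layer-stamped edges from the decoded predicates. The sketch below is purely illustrative: the predicate tuples are invented examples, and networkx is an assumed backend rather than the paper's tooling.

```python
# Minimal sketch of a layer-indexed ("temporal") knowledge graph built from
# decoded ground predicates. Predicate tuples are invented examples and
# networkx is an assumed backend, not necessarily the paper's tooling.
import networkx as nx

# Suppose decoding the patched representations at each layer yields
# (subject, relation, object, asserted) tuples.
layer_predicates = {
    10: [("Eiffel Tower", "located_in", "Paris", True)],
    20: [("Eiffel Tower", "located_in", "Paris", True),
         ("Paris", "capital_of", "France", True)],
    30: [("Eiffel Tower", "located_in", "Berlin", False)],
}

graph = nx.MultiDiGraph()
for layer, triples in layer_predicates.items():
    for subj, rel, obj, asserted in triples:
        # Each edge records the layer it appeared at, giving the graph a temporal dimension.
        graph.add_edge(subj, obj, relation=rel, asserted=asserted, layer=layer)

# Example graph-level analysis: which entities are most central across layers?
centrality = nx.degree_centrality(graph)
print(sorted(centrality.items(), key=lambda kv: -kv[1]))
```
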
Results and Implications

This paper reports several key findings on the latent dynamics of factual knowledge within LLMs. The local interpretability analysis exposes latent errors, ranging from entity-resolution mistakes to multi-hop reasoning faults. Globally, it reveals distinct patterns of knowledge evolution: a focus on entity resolution in the early layers, comprehensive encoding of factual knowledge about subject entities in the middle layers, and a decline in factual expressiveness in the final layers.

Concluding Remarks

This work represents a significant step toward understanding the internal mechanisms of LLMs, particularly how they encode, manipulate, and apply factual knowledge. By leveraging a patching-based approach, the framework opens new avenues for probing the under-explored latent spaces of LLMs, offering insights into their operational dynamics without relying on external models or additional training. Future research could extend this framework to explore how larger context sizes interact with the factual knowledge resolution process within LLMs.