
VISIT: Visualizing and Interpreting the Semantic Information Flow of Transformers (2305.13417v2)

Published 22 May 2023 in cs.CL

Abstract: Recent advances in interpretability suggest we can project weights and hidden states of transformer-based language models (LMs) to their vocabulary, a transformation that makes them more human interpretable. In this paper, we investigate LM attention heads and memory values, the vectors the models dynamically create and recall while processing a given input. By analyzing the tokens they represent through this projection, we identify patterns in the information flow inside the attention mechanism. Based on our discoveries, we create a tool to visualize a forward pass of Generative Pre-trained Transformers (GPTs) as an interactive flow graph, with nodes representing neurons or hidden states and edges representing the interactions between them. Our visualization simplifies huge amounts of data into easy-to-read plots that can reflect the models' internal processing, uncovering the contribution of each component to the models' final prediction. Our visualization also unveils new insights about the role of layer norms as semantic filters that influence the models' output, and about neurons that are always activated during forward passes and act as regularization vectors.
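
The core operation the abstract relies on, projecting an intermediate hidden state through the model's unembedding matrix to read it as a distribution over tokens, can be sketched in a few lines. The snippet below is a minimal illustration of that projection idea (in the spirit of the "logit lens"), not the paper's VISIT tool; it assumes GPT-2 via the Hugging Face transformers library, and the prompt and layer index are arbitrary choices for demonstration.

```python
# Minimal sketch: project an intermediate hidden state to the vocabulary.
# Assumes GPT-2 via Hugging Face `transformers`; layer choice is illustrative.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# hidden_states[k] is the residual stream after block k (index 0 = embeddings).
layer = 6  # an intermediate layer, chosen arbitrarily
hidden = outputs.hidden_states[layer][0, -1]  # vector at the last position

# Apply the final layer norm, then the (tied) unembedding matrix,
# turning the hidden state into logits over the vocabulary.
logits = model.lm_head(model.transformer.ln_f(hidden))
top = torch.topk(logits, 5)
print([tokenizer.decode([i]) for i in top.indices.tolist()])
```

Reading off the top-ranked tokens at each layer and position in this way is what lets the paper's flow graph label attention heads, memory values, and hidden states with human-interpretable token sets.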

Authors (2)
  1. Shahar Katz (5 papers)
  2. Yonatan Belinkov (111 papers)
Citations (18)

