Identifying Semantic Induction Heads to Understand In-Context Learning (2402.13055v2)

Published 20 Feb 2024 in cs.CL and cs.AI

Abstract: Although LLMs have demonstrated remarkable performance, the lack of transparency in their inference logic raises concerns about their trustworthiness. To better understand LLMs, we conduct a detailed analysis of the operations of attention heads, with the aim of illuminating the in-context learning of LLMs. Specifically, we investigate whether attention heads encode two types of relationship between tokens in natural language: syntactic dependencies parsed from sentences and relations within knowledge graphs. We find that certain attention heads exhibit a pattern in which, when attending to head tokens, they recall tail tokens and increase the output logits of those tail tokens. More crucially, the formation of such semantic induction heads correlates closely with the emergence of the in-context learning ability of LLMs. This study of semantic attention heads advances our understanding of the intricate operations of attention heads in transformers and provides new insights into the in-context learning of LLMs.
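The mechanism described above can be probed directly: for a relation pair such as (France, capital → Paris), one can check, head by head, whether attention from the prediction position lands on the head token while that head's output raises the logit of the tail token. Below is a minimal sketch of such a probe using the TransformerLens library on GPT-2; the example sentence, entity pair, and the 0.3 attention threshold are illustrative assumptions, not the paper's actual models, data, or criteria.

```python
# Minimal sketch: score attention heads for "semantic induction" behaviour
# on one illustrative (head-token, tail-token) pair. Not the paper's code.
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")  # stand-in; the paper studies other LLMs

text = "Paris is the capital of France. The capital of France is"
tokens = model.to_tokens(text)
logits, cache = model.run_with_cache(tokens)

head_id = model.to_single_token(" France")  # head entity of the (France, capital, Paris) triple
tail_id = model.to_single_token(" Paris")   # tail entity whose logit should be boosted
head_pos = (tokens[0] == head_id).nonzero()[-1].item()  # last mention of the head entity
query_pos = tokens.shape[1] - 1                         # position where the tail is predicted

for layer in range(model.cfg.n_layers):
    pattern = cache["pattern", layer][0]  # [head, query, key] attention weights
    z = cache["z", layer][0]              # [pos, head, d_head] per-head value outputs
    for h in range(model.cfg.n_heads):
        attn_to_head = pattern[h, query_pos, head_pos].item()
        # Direct contribution of this head's output to the tail token's logit,
        # obtained by projecting through W_O and the unembedding W_U:
        head_out = z[query_pos, h] @ model.W_O[layer, h]       # [d_model]
        logit_boost = (head_out @ model.W_U[:, tail_id]).item()
        if attn_to_head > 0.3 and logit_boost > 0:             # illustrative threshold
            print(f"L{layer}H{h}: attn={attn_to_head:.2f}, tail-logit boost={logit_boost:+.2f}")
```

Heads that both attend strongly to the head entity and push the tail entity's logit up are candidates for the semantic-induction behaviour the abstract describes; a faithful reproduction would sweep many relation pairs (syntactic dependencies as well as knowledge-graph triples) and track how such heads emerge over training.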

Authors (7)
  1. Jie Ren (329 papers)
  2. Qipeng Guo (72 papers)
  3. Hang Yan (86 papers)
  4. Dongrui Liu (34 papers)
  5. Xipeng Qiu (257 papers)
  6. Dahua Lin (336 papers)
  7. Quanshi Zhang (81 papers)
Citations (11)