Spectral Filters, Dark Signals, and Attention Sinks (2402.09221v1)
Abstract: Projecting intermediate representations onto the vocabulary is an increasingly popular interpretation tool for transformer-based LLMs, also known as the logit lens. We propose a quantitative extension to this approach and define spectral filters on intermediate representations based on partitioning the singular vectors of the vocabulary embedding and unembedding matrices into bands. We find that the signals exchanged in the tail end of the spectrum are responsible for attention sinking (Xiao et al. 2023), for which we provide an explanation. We find that the loss of pretrained models can be kept low despite suppressing sizable parts of the embedding spectrum in a layer-dependent way, as long as attention sinking is preserved. Finally, we discover that the representations of tokens that draw attention from many tokens have large projections on the tail end of the spectrum.
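As a rough illustration of the spectral-filter idea described in the abstract, the sketch below decomposes a toy unembedding matrix with an SVD, keeps only a band of singular vectors, and applies the logit lens to the filtered representation. All names (`W_U`, `h`), shapes, and band boundaries are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

# Minimal sketch of a spectral filter over the unembedding spectrum.
# Shapes and band boundaries are illustrative, not the paper's exact setup.
d_model, vocab = 64, 1000
rng = np.random.default_rng(0)

W_U = rng.normal(size=(vocab, d_model))   # unembedding matrix (vocab x d_model)
h = rng.normal(size=d_model)              # an intermediate residual-stream state

# SVD of the unembedding matrix; rows of Vt span model-space directions,
# ordered from the head (largest singular values) to the tail (smallest).
U, S, Vt = np.linalg.svd(W_U, full_matrices=False)

def band_filter(h, Vt, lo, hi):
    """Project h onto the subspace spanned by singular vectors lo..hi-1."""
    V_band = Vt[lo:hi]                    # (band_size, d_model)
    return V_band.T @ (V_band @ h)        # band-pass filtered representation

# Logit lens on the tail band only (here, the last quarter of the spectrum).
tail = band_filter(h, Vt, 3 * d_model // 4, d_model)
tail_logits = W_U @ tail

# For comparison: logit lens on the head band.
head = band_filter(h, Vt, 0, d_model // 4)
head_logits = W_U @ head
```

Suppressing parts of the spectrum in a layer-dependent way, as the abstract describes, amounts to zeroing the corresponding band components of the residual stream before later layers read it.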
- Eliciting latent predictions from transformers with the tuned lens. arXiv preprint arXiv:2303.08112, 2023.
- Sparse autoencoders find highly interpretable features in language models. arXiv preprint arXiv:2309.08600, 2023.
- Analyzing transformers in embedding space. arXiv preprint arXiv:2209.02535, 2022.
- Jump to conclusions: Short-cutting transformers with linear transformations. arXiv preprint arXiv:2303.09435, 2023.
- A mathematical framework for transformer circuits. Transformer Circuits Thread, 2021. https://transformer-circuits.pub/2021/framework/index.html.
- Toy models of superposition. Transformer Circuits Thread, 2022. https://transformer-circuits.pub/2022/toy_model/index.html.
- Transformer feed-forward layers are key-value memories. arXiv preprint arXiv:2012.14913, 2020.
- Transformer feed-forward layers build predictions by promoting concepts in the vocabulary space. arXiv preprint arXiv:2203.14680, 2022.
- Interpreting transformer’s attention dynamic memory and visualizing the semantic information flow of GPT. arXiv preprint arXiv:2305.13417, 2023.
- Competition-level code generation with AlphaCode. Science, 378(6624):1092–1097, 2022. doi:10.1126/science.abq1158.
- Locating and editing factual associations in GPT. Advances in Neural Information Processing Systems, 35:17359–17372, 2022.
- Evan Miller. Attention is off by one. https://www.evanmiller.org/attention-is-off-by-one.html.
- Nostalgebraist. Interpreting GPT: the logit lens. https://www.alignmentforum.org/posts/AcKRB8wDpdaN6v6ru/interpreting-gpt-the-logit-lens.
- In-context learning and induction heads. Transformer Circuits Thread, 2022. https://transformer-circuits.pub/2022/in-context-learning-and-induction-heads/index.html.
- Information-theoretic probing for linguistic structure. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4609–4622. Association for Computational Linguistics, 2020.
- Taking features out of superposition with sparse autoencoders. In AI Alignment Forum, 2022.
- The truth is in there: Improving reasoning in language models with layer-selective rank reduction. arXiv preprint arXiv:2312.13558, 2023.
- Noam Shazeer. GLU variants improve transformer. arXiv preprint arXiv:2002.05202, 2020.
- RoFormer: Enhanced transformer with rotary position embedding. Neurocomputing, page 127063, 2023.
- Function vectors in large language models. arXiv preprint arXiv:2310.15213, 2023.
- Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023.
- Attention is all you need. Advances in Neural Information Processing Systems, 30, 2017.
- Information-theoretic probing with minimum description length. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 183–196, 2020.
- Analyzing multi-head self-attention: Specialized heads do the heavy lifting, the rest can be pruned. arXiv preprint arXiv:1905.09418, 2019.
- Neurons in large language models: Dead, n-gram, positional. arXiv preprint arXiv:2309.04827, 2023.
- GPT-J-6B: A 6 Billion Parameter Autoregressive Language Model. https://github.com/kingoflolz/mesh-transformer-jax, May 2021.
- CCNet: Extracting high quality monolingual datasets from web crawl data. arXiv preprint arXiv:1911.00359, 2019.
- Efficient streaming language models with attention sinks. arXiv preprint arXiv:2309.17453, 2023.
- Root mean square layer normalization. Advances in Neural Information Processing Systems, 32, 2019.