Spectral Filters, Dark Signals, and Attention Sinks (2402.09221v1)

Published 14 Feb 2024 in cs.AI and cs.CL

Abstract: Projecting intermediate representations onto the vocabulary is an increasingly popular interpretation tool for transformer-based LLMs, also known as the logit lens. We propose a quantitative extension to this approach and define spectral filters on intermediate representations based on partitioning the singular vectors of the vocabulary embedding and unembedding matrices into bands. We find that the signals exchanged in the tail end of the spectrum are responsible for attention sinking (Xiao et al. 2023), of which we provide an explanation. We find that the loss of pretrained models can be kept low despite suppressing sizable parts of the embedding spectrum in a layer-dependent way, as long as attention sinking is preserved. Finally, we discover that the representation of tokens that draw attention from many tokens have large projections on the tail end of the spectrum.

References (29)
  1. Eliciting latent predictions from transformers with the tuned lens. arXiv preprint arXiv:2303.08112, 2023.
  2. Sparse autoencoders find highly interpretable features in language models. arXiv preprint arXiv:2309.08600, 2023.
  3. Analyzing transformers in embedding space. arXiv preprint arXiv:2209.02535, 2022.
  4. Jump to conclusions: Short-cutting transformers with linear transformations. arXiv preprint arXiv:2303.09435, 2023.
  5. A mathematical framework for transformer circuits. Transformer Circuits Thread, 2021. https://transformer-circuits.pub/2021/framework/index.html.
  6. Toy models of superposition. Transformer Circuits Thread, 2022. https://transformer-circuits.pub/2022/toy_model/index.html.
  7. Transformer feed-forward layers are key-value memories. arXiv preprint arXiv:2012.14913, 2020.
  8. Transformer feed-forward layers build predictions by promoting concepts in the vocabulary space. arXiv preprint arXiv:2203.14680, 2022.
  9. Interpreting transformer’s attention dynamic memory and visualizing the semantic information flow of GPT. arXiv preprint arXiv:2305.13417, 2023.
  10. Competition-level code generation with AlphaCode. Science, 378(6624):1092–1097, 2022. doi:10.1126/science.abq1158. https://www.science.org/doi/abs/10.1126/science.abq1158.
  11. Locating and editing factual associations in GPT. Advances in Neural Information Processing Systems, 35:17359–17372, 2022.
  12. Evan Miller. Attention is off by one. https://www.evanmiller.org/attention-is-off-by-one.html.
  13. Nostalgebraist. Interpreting GPT: the logit lens. https://www.alignmentforum.org/posts/AcKRB8wDpdaN6v6ru/interpreting-gpt-the-logit-lens.
  14. In-context learning and induction heads. Transformer Circuits Thread, 2022. https://transformer-circuits.pub/2022/in-context-learning-and-induction-heads/index.html.
  15. Information-theoretic probing for linguistic structure. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4609–4622. Association for Computational Linguistics, 2020.
  16. Taking features out of superposition with sparse autoencoders. In AI Alignment Forum, 2022.
  17. The truth is in there: Improving reasoning in language models with layer-selective rank reduction. arXiv preprint arXiv:2312.13558, 2023.
  18. Noam Shazeer. GLU variants improve transformer. arXiv preprint arXiv:2002.05202, 2020.
  19. RoFormer: Enhanced transformer with rotary position embedding. Neurocomputing, page 127063, 2023.
  20. Function vectors in large language models. arXiv preprint arXiv:2310.15213, 2023.
  21. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023.
  22. Attention is all you need. Advances in neural information processing systems, 30, 2017.
  23. Information-theoretic probing with minimum description length. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 183–196, 2020.
  24. Analyzing multi-head self-attention: Specialized heads do the heavy lifting, the rest can be pruned. arXiv preprint arXiv:1905.09418, 2019.
  25. Neurons in large language models: Dead, n-gram, positional. arXiv preprint arXiv:2309.04827, 2023.
  26. GPT-J-6B: A 6 Billion Parameter Autoregressive Language Model. https://github.com/kingoflolz/mesh-transformer-jax, May 2021.
  27. CCNet: Extracting high quality monolingual datasets from web crawl data. arXiv preprint arXiv:1911.00359, 2019.
  28. Efficient streaming language models with attention sinks. arXiv preprint arXiv:2309.17453, 2023.
  29. Root mean square layer normalization. Advances in Neural Information Processing Systems, 32, 2019.

Summary

  • The paper presents spectral filters and logit spectroscopy to uncover the role of dark signals in enabling attention sinking in LLaMa2 models.
  • It demonstrates that partitioning the embedding spectrum into spectral bands reveals a strong link between tail signals and layer-specific attention dynamics.
  • Experiments on 7B to 70B LLaMa2 models show that preserving attention sinks minimizes performance loss even with significant spectral compression.

An Examination of Spectral Filters and Attention Sinking in LLMs

In "Spectral Filters, Dark Signals, and Attention Sinks," Nicola Cancedda presents an enhanced interpretative approach for transformer-based LLMs, focusing on the projection of intermediate representations onto the vocabulary – commonly referred to as the "logit lens." The paper extends this method to define "spectral filters" on intermediate model representations. These filters involve partitioning the singular vectors of vocabulary embedding and unembedding matrices into specific spectral bands, offering new insights into the properties of these representations within the model's processing layers.

Cancedda finds that signals occupying the tail end of this spectrum are responsible for a phenomenon termed "attention sinking" (Xiao et al., 2023) and provides an explanation of how it arises. Notably, suppressing sizable portions of the embedding spectrum in a layer-dependent way does not significantly increase model loss, provided attention sinking is preserved. Finally, tokens that draw attention from many other tokens turn out to have representations with large projections onto this tail end of the spectrum.

The research uses the LLaMa2 family of models, ranging from 7 billion to 70 billion parameters. Their open-access weights make in-depth analysis possible, in particular of the residual stream, the shared channel through which the model's components communicate. Cancedda shows that dark signals are instrumental for maintaining low perplexity because they act as attention sinks: they give attention heads somewhere to place excess attention mass when no other token in the context is relevant.
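The sinking behaviour itself is straightforward to observe: in a sinking head, most query positions place the bulk of their attention on token 0. A rough diagnostic along these lines, written against the Hugging Face transformers interface, might look as follows (the specific statistic, mean attention mass on position 0 per layer, is an assumption of this illustration):

```python
import torch

def bos_attention_mass(model, input_ids: torch.Tensor) -> torch.Tensor:
    """Per-layer average of the attention weight that all heads and query
    positions place on position 0 (the BoS token). Values close to 1 across
    many layers are the signature of attention sinking.

    Note: some attention implementations require loading the model with
    attn_implementation="eager" for attention weights to be returned.
    """
    with torch.no_grad():
        out = model(input_ids, output_attentions=True)
    # out.attentions: one (batch, heads, seq, seq) tensor per layer.
    return torch.stack([a[..., 0].mean() for a in out.attentions])
```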

A central contribution of the paper is the introduction of "spectral filters" and "logit spectroscopy." Analyzing the spectral content of the residual stream, together with how the parameter matrices interact with it, gives a detailed view of the communication taking place inside the model. This analysis quantifies how strongly MLP layers and attention heads read from and write to what the paper terms the "dark subspace." The most revealing finding is that, far from being mere noise, these dark signals are critical in facilitating attention sinking, a recently documented phenomenon in which a disproportionate share of attention is directed at the beginning-of-sequence (BoS) token at position 0.
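A simple way to picture this "spectroscopy" is to ask, for a given residual-stream vector or component output, how its squared norm is distributed over the bands. A minimal sketch reusing spectral_bands from above (reporting per-band norm fractions is an assumption of this illustration, not necessarily the statistic used in the paper):

```python
def band_energy(h: torch.Tensor, bands: list[torch.Tensor]) -> torch.Tensor:
    """Fraction of the squared norm of a vector h (shape: d_model) that
    falls in each spectral band. A vector whose mass concentrates in the
    last band lives mostly in the 'dark' tail of the embedding spectrum."""
    energies = torch.stack([(h @ band).pow(2).sum() for band in bands])
    return energies / h.pow(2).sum()
```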

The paper applies spectral filters to individual layers and measures the resulting change in negative log-likelihood (NLL) over sampled tokens, which makes the relevance of the dark signals concrete. Notably, this analysis suggests that layer-specific spectral compression is possible without substantial performance trade-offs, pointing toward more efficient model variants.
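In code, such a layer-specific intervention can be approximated with a forward hook that removes one band from a layer's output before the rest of the network runs, followed by the usual next-token NLL. The sketch below assumes a Hugging Face LLaMa-style model exposing model.model.layers; hooking a decoder layer's output is an assumption of this illustration rather than the paper's exact procedure:

```python
import torch
import torch.nn.functional as F

def nll_with_band_suppressed(model, input_ids, layer_idx: int, band: torch.Tensor):
    """Next-token NLL with one spectral band removed from the residual
    stream at the output of a single decoder layer.

    `band` (d_model x band_width, orthonormal columns) must be on the
    model's device and dtype."""

    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        filtered = hidden - (hidden @ band) @ band.T   # subtract the band component
        if isinstance(output, tuple):
            return (filtered,) + output[1:]
        return filtered

    handle = model.model.layers[layer_idx].register_forward_hook(hook)
    try:
        with torch.no_grad():
            logits = model(input_ids).logits
    finally:
        handle.remove()

    # Average next-token negative log-likelihood over the sequence.
    return F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),
        input_ids[:, 1:].reshape(-1),
    )
```

Sweeping layer_idx and the choice of band, and comparing the result to the unfiltered baseline, gives the kind of layer-by-layer suppression experiment described above.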

The discussion of attention bars and their relation to dark signals further underscores the complexity and adaptability of LLaMa2's architecture. While the paper stops short of a definitive conclusion about whether attention bars act as auxiliary attention sinks, it proposes that different attention patterns may serve a similar sink-like function through alternative spectral subspaces.

Overall, Cancedda's work highlights an underexplored dimension of transformer representations and offers speculative but practical pathways toward model simplification and efficiency gains. Future investigations might explore finer-grained spectral partitions, further clarifying how LLMs allocate representational capacity and adapt to context. The potential benefits extend to improved interpretability and to more efficient model architectures and training methodologies.

Authors (1)

  • Nicola Cancedda
