Spectral Filters, Dark Signals, and Attention Sinks (2402.09221v1)
Abstract: Projecting intermediate representations onto the vocabulary is an increasingly popular interpretation tool for transformer-based LLMs, also known as the logit lens. We propose a quantitative extension to this approach and define spectral filters on intermediate representations based on partitioning the singular vectors of the vocabulary embedding and unembedding matrices into bands. We find that the signals exchanged in the tail end of the spectrum are responsible for attention sinking (Xiao et al. 2023), for which we provide an explanation. We find that the loss of pretrained models can be kept low despite suppressing sizable parts of the embedding spectrum in a layer-dependent way, as long as attention sinking is preserved. Finally, we discover that the representations of tokens that draw attention from many tokens have large projections on the tail end of the spectrum.
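As a rough illustration of the spectral-filter idea described in the abstract, the sketch below decomposes a toy unembedding matrix with an SVD, keeps only a band of singular vectors, and applies the logit lens to the filtered representation. All names (`W_U`, `h`), shapes, and band boundaries are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

# Minimal sketch of a spectral filter over the unembedding spectrum.
# Shapes and band boundaries are illustrative, not the paper's exact setup.
d_model, vocab = 64, 1000
rng = np.random.default_rng(0)

W_U = rng.normal(size=(vocab, d_model))   # unembedding matrix (vocab x d_model)
h = rng.normal(size=d_model)              # an intermediate residual-stream state

# SVD of the unembedding matrix; rows of Vt span model-space directions,
# ordered from the head (largest singular values) to the tail (smallest).
U, S, Vt = np.linalg.svd(W_U, full_matrices=False)

def band_filter(h, Vt, lo, hi):
    """Project h onto the subspace spanned by singular vectors lo..hi-1."""
    V_band = Vt[lo:hi]                    # (band_size, d_model)
    return V_band.T @ (V_band @ h)        # band-pass filtered representation

# Logit lens on the tail band only (here, the last quarter of the spectrum).
tail = band_filter(h, Vt, 3 * d_model // 4, d_model)
tail_logits = W_U @ tail

# For comparison: logit lens on the head band.
head = band_filter(h, Vt, 0, d_model // 4)
head_logits = W_U @ head
```

Suppressing parts of the spectrum in a layer-dependent way, as the abstract describes, amounts to zeroing the corresponding band components of the residual stream before later layers read it.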
- Eliciting latent predictions from transformers with the tuned lens. arXiv preprint arXiv:2303.08112, 2023.
- Sparse autoencoders find highly interpretable features in language models. arXiv preprint arXiv:2309.08600, 2023.
- Analyzing transformers in embedding space. arXiv preprint arXiv:2209.02535, 2022.
- Jump to conclusions: Short-cutting transformers with linear transformations. arXiv preprint arXiv:2303.09435, 2023.
- A mathematical framework for transformer circuits. Transformer Circuits Thread, 2021. https://transformer-circuits.pub/2021/framework/index.html.
- Toy models of superposition. Transformer Circuits Thread, 2022. https://transformer-circuits.pub/2022/toy_model/index.html.
- Transformer feed-forward layers are key-value memories. arXiv preprint arXiv:2012.14913, 2020.
- Transformer feed-forward layers build predictions by promoting concepts in the vocabulary space. arXiv preprint arXiv:2203.14680, 2022.
- Interpreting transformer’s attention dynamic memory and visualizing the semantic information flow of GPT. arXiv preprint arXiv:2305.13417, 2023.
- Competition-level code generation with AlphaCode. Science, 378(6624):1092–1097, 2022. doi:10.1126/science.abq1158.
- Locating and editing factual associations in GPT. Advances in Neural Information Processing Systems, 35:17359–17372, 2022.
- Evan Miller. Attention is off by one. https://www.evanmiller.org/attention-is-off-by-one.html.
- Nostalgebraist. Interpreting GPT: the logit lens. https://www.alignmentforum.org/posts/AcKRB8wDpdaN6v6ru/interpreting-gpt-the-logit-lens.
- In-context learning and induction heads. Transformer Circuits Thread, 2022. https://transformer-circuits.pub/2022/in-context-learning-and-induction-heads/index.html.
- Information-theoretic probing for linguistic structure. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4609–4622. Association for Computational Linguistics, 2020.
- Taking features out of superposition with sparse autoencoders. In AI Alignment Forum, 2022.
- The truth is in there: Improving reasoning in language models with layer-selective rank reduction. arXiv preprint arXiv:2312.13558, 2023.
- Noam Shazeer. GLU variants improve transformer. arXiv preprint arXiv:2002.05202, 2020.
- RoFormer: Enhanced transformer with rotary position embedding. Neurocomputing, page 127063, 2023.
- Function vectors in large language models. arXiv preprint arXiv:2310.15213, 2023.
- Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023.
- Attention is all you need. Advances in Neural Information Processing Systems, 30, 2017.
- Information-theoretic probing with minimum description length. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 183–196, 2020.
- Analyzing multi-head self-attention: Specialized heads do the heavy lifting, the rest can be pruned. arXiv preprint arXiv:1905.09418, 2019.
- Neurons in large language models: Dead, n-gram, positional. arXiv preprint arXiv:2309.04827, 2023.
- GPT-J-6B: A 6 Billion Parameter Autoregressive Language Model. https://github.com/kingoflolz/mesh-transformer-jax, May 2021.
- CCNet: Extracting high quality monolingual datasets from web crawl data. arXiv preprint arXiv:1911.00359, 2019.
- Efficient streaming language models with attention sinks. arXiv preprint arXiv:2309.17453, 2023.
- Root mean square layer normalization. Advances in Neural Information Processing Systems, 32, 2019.