Head Relevance Vectors (HRVs)
- HRVs are per-head vectors or scalars that measure the contribution of individual attention heads in transformer models, providing clear quantification of task-specific relevance.
- They use techniques such as token attribution, contrastive retrieval, and latent disentanglement to identify and manipulate the most discriminative heads without altering model structure.
- HRVs enhance model efficiency and control by enabling selective resource allocation and improved interpretability across applications like multimodal LLMs, diffusion models, retrieval, and audio processing.
Head Relevance Vectors (HRVs) quantify the task-specific or concept-level contribution of individual heads within attention-based neural architectures, enabling the principled identification, selection, and manipulation of the most discriminative or semantically aligned heads without intrusive model modifications. HRVs have been formalized, utilized, and experimentally validated across domains including multimodal LLMs, text-to-image generative models, attention-based retrieval/reranking, audio representation learning, and causal LLM steering.
1. Mathematical Formulations and Notational Scope
HRVs are typically constructed as per-head vectors or scalars reflecting each head's relevance to a downstream objective or human-interpretable visual concept. For a transformer with $L$ layers and $H$ heads per layer, the model-wise HRV is the concatenation of per-layer vectors $r^{(\ell)} \in \mathbb{R}^{H}$ for $\ell = 1, \dots, L$, resulting in a vector in $\mathbb{R}^{LH}$ or, for cross-attention, one such vector per concept $c$.
For attention-based visual relevance in multimodal LLMs (Wang et al., 5 Jun 2025), a scalar score $s_h$ for head $h$ is accumulated over output tokens $t$: head $h$ receives credit for $t$ if and only if its maximal-attention key for $t$ falls within the set $\mathcal{I}_t$ of image tokens associated with $t$.
In concept-aligned diffusion models (Park et al., 2024), an HRV for concept $c$ is a vector $v_c$ with one element per cross-attention head: element $h$ counts the number of generations in which head $h$ responds most strongly to $c$, and the vector is later L1-normalized across heads.
For contrastive retrieval, a relevance score per head is given by an InfoNCE-like contrast between positive and negative document attention (Tran et al., 2 Oct 2025), computed from $a_h(q, d)$, the mean attention from query $q$ to document $d$ under head $h$.
In causal LLM steering (Zhan et al., 10 Jun 2025), per-head HRVs are constructed as the concatenation of the (discrete or latent) units within a VQ-AE representation identified as behavior-discriminative via supervised contrast.
2. Algorithms for HRV Computation
Training-Free Response Analysis (SparseMM)
- Extract all attention matrices for a set of annotated image-text pairs.
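- For each output token $t$, determine the set $\mathcal{I}_t$ (image tokens corresponding to $t$).
- For each head $h$, increment its score $s_h$ if the head's maximal-attention key for $t$ lies in $\mathcal{I}_t$.
- Normalize all $s_h$ over heads to obtain the final relevance scores.
- HRVs are then the per-layer vectors of these normalized scores.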
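The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the SparseMM implementation: the function names, the nested-list attention layout, and the unit credit per hit are all assumptions made for the example.

```python
from typing import Dict, List, Set, Tuple

Head = Tuple[int, int]  # (layer index, head index)


def visual_head_scores(
    attn: Dict[Head, List[List[float]]],
    image_tokens: List[Set[int]],
) -> Dict[Head, float]:
    """Score heads by how often their top-attended key is a matching image token.

    attn[head][t][j] is the attention from output token t to key j.
    image_tokens[t] holds the key indices of image tokens annotated for token t.
    """
    raw: Dict[Head, float] = {}
    for head, rows in attn.items():
        hits = 0.0
        for t, row in enumerate(rows):
            top_key = max(range(len(row)), key=row.__getitem__)
            if top_key in image_tokens[t]:
                hits += 1.0  # assumption: unit credit per hit
        raw[head] = hits
    total = sum(raw.values())
    # Normalize over all heads so the scores sum to 1 (when any head scored).
    return {h: (s / total if total > 0 else 0.0) for h, s in raw.items()}


def per_layer_vectors(scores: Dict[Head, float]) -> Dict[int, List[float]]:
    """Group normalized scores into one vector per layer, ordered by head index."""
    layers: Dict[int, List[float]] = {}
    for (layer, _head), score in sorted(scores.items()):
        layers.setdefault(layer, []).append(score)
    return layers


# Toy input: two heads in layer 0, two output tokens, three keys.
attn = {
    (0, 0): [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]],
    (0, 1): [[0.5, 0.2, 0.3], [0.2, 0.2, 0.6]],
}
image_tokens = [{1}, {0}]
scores = visual_head_scores(attn, image_tokens)
layer_vectors = per_layer_vectors(scores)
```

In the toy input, head (0, 0) puts its maximum attention on the annotated image token for both output tokens, so it receives all of the normalized score.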
Mechanistic Interpretability in Diffusion Models
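- Given $C$ concepts and $H$ heads, for each prompt, timestep, and head, find the concept with the top average spatial activation.
- Increment the count of that concept for that head.
- After all data have been processed, normalize each concept's count vector so that its entries sum to 1 over heads (L1 normalization).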
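A minimal sketch of this counting procedure follows. The input layout (one entry per prompt/timestep sample, holding each head's average spatial activation per concept) and the function name `concept_hrvs` are assumptions for illustration; extracting the activations from a real diffusion model is out of scope here.

```python
from typing import List


def concept_hrvs(
    activations: List[List[List[float]]],
    num_concepts: int,
) -> List[List[float]]:
    """Build one L1-normalized vector per concept from argmax counts.

    activations[s][h][c] is the average spatial cross-attention activation of
    head h for concept c in sample s (a hypothetical prompt/timestep pair).
    Returns hrv[c][h].
    """
    num_heads = len(activations[0])
    counts = [[0.0] * num_heads for _ in range(num_concepts)]
    for sample in activations:
        for h, per_concept in enumerate(sample):
            # Concept with the top average activation for this head.
            top_c = max(range(num_concepts), key=per_concept.__getitem__)
            counts[top_c][h] += 1.0
    hrvs = []
    for row in counts:
        total = sum(row)
        hrvs.append([x / total if total > 0 else 0.0 for x in row])
    return hrvs


# Toy input: 2 samples, 2 heads, 2 concepts.
activations = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.6, 0.4], [0.7, 0.3]],
]
hrvs = concept_hrvs(activations, num_concepts=2)
```

Here head 0 picks concept 0 in both samples and head 1 picks it once, so the vector for concept 0 is (2/3, 1/3) and the vector for concept 1 is (0, 1).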
Contrastive Retrieval Head Scoring
- Aggregate per-head query-to-document attention for both gold and negative documents.
- Apply a softmax-based contrastive metric to obtain a per-head score for each sample.
- Select the top-$k$ heads by average score over samples.
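The following sketch shows one plausible reading of these steps. The exact metric used by Tran et al. is not reproduced here; the softmax-over-positive-and-negatives form, the `temperature` parameter, and the head labels are assumptions chosen to illustrate an InfoNCE-style score.

```python
import math
from typing import Dict, List, Tuple

# Per sample: head label -> (attention to gold doc, attentions to negative docs)
Sample = Dict[str, Tuple[float, List[float]]]


def contrastive_head_score(
    pos_attn: float, neg_attns: List[float], temperature: float = 1.0
) -> float:
    """Softmax probability assigned to the gold document (assumed form)."""
    logits = [pos_attn / temperature] + [a / temperature for a in neg_attns]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return exps[0] / sum(exps)


def select_top_heads(samples: List[Sample], k: int) -> List[str]:
    """Rank heads by mean contrastive score; assumes every head is in every sample."""
    totals: Dict[str, float] = {}
    for sample in samples:
        for head, (pos, negs) in sample.items():
            totals[head] = totals.get(head, 0.0) + contrastive_head_score(pos, negs)
    mean = {h: s / len(samples) for h, s in totals.items()}
    return sorted(mean, key=mean.get, reverse=True)[:k]


samples = [
    {"L0H0": (0.8, [0.1, 0.1]), "L0H1": (0.3, [0.3, 0.4])},
    {"L0H0": (0.6, [0.2, 0.2]), "L0H1": (0.2, [0.5, 0.3])},
]
top = select_top_heads(samples, k=1)
```

Head `L0H0` attends more to the gold document than to either negative in both samples, so it is the one selected.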
Latent Disentanglement for Behavioral Relevance
- Train, per head, a VQ-AE on last-token activations, partitioning the code into semantic units.
- Add a supervised contrastive loss forcing separation of encodings from aligned vs violating behaviors.
- Designate as HRV the units with high class-separability; final score is given by a binary classification (AUC) of generated codes.
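The final selection step can be illustrated without the VQ-AE itself. The sketch below assumes each head has already been reduced to a scalar classifier score per example (a hypothetical simplification of the generated codes) and ranks heads by pairwise AUC; the threshold and head labels are illustrative.

```python
from typing import Dict, List, Tuple


def pairwise_auc(pos: List[float], neg: List[float]) -> float:
    """AUC as the fraction of (pos, neg) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def discriminative_heads(
    per_head: Dict[str, Tuple[List[float], List[float]]],
    threshold: float,
) -> Dict[str, float]:
    """Keep heads whose scores separate aligned from violating examples."""
    result = {}
    for head, (aligned, violating) in per_head.items():
        auc = pairwise_auc(aligned, violating)
        if auc >= threshold:  # threshold is an assumed hyperparameter
            result[head] = auc
    return result


# Hypothetical per-head classifier scores: (aligned examples, violating examples)
head_scores = {
    "L3H1": ([0.9, 0.8, 0.7], [0.1, 0.2, 0.3]),
    "L5H0": ([0.5, 0.4], [0.5, 0.6]),
}
selected = discriminative_heads(head_scores, threshold=0.8)
```

Only `L3H1` separates the two classes perfectly here; `L5H0` ranks most pairs the wrong way round and is dropped.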
Audio Relevance Heads
- Decompose time-frequency filterbank output into sub-bands, each processed by a two-layer FC network to generate a soft mask over sub-band bins.
- The relevance mask produced for each head serves as the HRV for that head.
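A forward pass for this masking scheme might look as follows. The ReLU and sigmoid activations, the equal-width sub-bands, and the weight layout are assumptions for the sketch; the source only states that a two-layer FC network produces a soft mask per sub-band.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Params = Tuple[Matrix, List[float], Matrix, List[float]]  # (w1, b1, w2, b2)


def fc(x: List[float], weights: Matrix, bias: List[float]) -> List[float]:
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def soft_mask(band: List[float], w1: Matrix, b1: List[float],
              w2: Matrix, b2: List[float]) -> List[float]:
    hidden = [max(0.0, v) for v in fc(band, w1, b1)]  # ReLU (assumed)
    return [1.0 / (1.0 + math.exp(-v)) for v in fc(hidden, w2, b2)]  # sigmoid (assumed)


def apply_masks(spectrum: List[float], band_size: int,
                params: List[Params]) -> List[float]:
    """Split one frame into equal sub-bands and mask each with its own network."""
    out: List[float] = []
    for i, p in enumerate(params):
        band = spectrum[i * band_size:(i + 1) * band_size]
        mask = soft_mask(band, *p)
        out.extend(m * x for m, x in zip(mask, band))
    return out


# Zero weights give a constant mask of 0.5, which makes the output easy to check.
zeros = [[0.0, 0.0], [0.0, 0.0]]
untrained: Params = (zeros, [0.0, 0.0], zeros, [0.0, 0.0])
masked = apply_masks([2.0, 4.0, 6.0, 8.0], band_size=2, params=[untrained, untrained])
```

In a trained model each sub-band network would have its own learned weights, so different heads could emphasize different frequency regions.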
3. Applications and Empirical Insights
HRVs have driven advances in model efficiency, interpretability, retrieval, and controlled generation:
| Application | Head Selection Criterion | Empirical Highlights |
|---|---|---|
| SparseMM MLLMs | Visual alignment via token attribution | <5% heads suffice for visual tasks, 1.38× speedup, 52% memory reduction (Wang et al., 5 Jun 2025) |
| Cross-attn Diffusion | Human concept-alignment in CA heads | HRVs enable concept-strengthening, reducing polysemy errors from 63%→15.9% (Park et al., 2024) |
| Retrieval Reranking | InfoNCE-style contrast of gold vs negatives | <1% heads optimal, +1–4 nDCG points, 20% latency/40% memory savings after layer pruning (Tran et al., 2 Oct 2025) |
| Audio Classification | Mask generation over local TF sub-bands | 10–23% accuracy gains at <0.1% param increase (Dutta et al., 2021) |
| Causal LLM Steering | VQ-AE/contrastive latent separation | 20% accuracy boost for truthfulness interventions (Zhan et al., 10 Jun 2025) |
Preserving only the top-$k$ heads by HRV scores often matches or outperforms full-head baselines, with heads concentrated in mid-layers and task-relevant heads forming a small, robust subset.
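As a small illustration of top-$k$ selection, the sketch below keeps the highest-scoring heads and counts how many fall in each layer. The score values and the (layer, head) keying are made up for the example.

```python
from collections import Counter
from typing import Dict, Set, Tuple

Head = Tuple[int, int]  # (layer index, head index)


def top_k_heads(scores: Dict[Head, float], k: int) -> Set[Head]:
    """Return the k heads with the largest relevance scores."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def heads_per_layer(kept: Set[Head]) -> Dict[int, int]:
    """Count retained heads in each layer to see where they concentrate."""
    return dict(Counter(layer for layer, _ in kept))


# Hypothetical scores for a 3-layer toy model.
scores = {(0, 0): 0.05, (0, 1): 0.10, (1, 0): 0.50, (1, 1): 0.30, (2, 0): 0.05}
kept = top_k_heads(scores, k=2)
layer_counts = heads_per_layer(kept)
```

With these toy scores both retained heads sit in the middle layer, mirroring the mid-layer concentration described above.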
4. Inference Manipulation and Resource Allocation
SparseMM operationalizes HRVs for memory and compute savings by asymmetric KV-cache allocation (Wang et al., 5 Jun 2025):
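- Each head $h$ receives a combined KV budget:
$$b_h = w + u + \frac{s_h}{\sum_{h'} s_{h'}}\, B_{\mathrm{rem}},$$
with local window $w$, uniform baseline $u$, and the remaining cache $B_{\mathrm{rem}}$ allocated in proportion to the head's relevance score $s_h$.
- During decoding, heads retain only their most-attended keys up to their respective budgets $b_h$.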
- Ablating low-relevance heads (95%+) yields negligible accuracy drop; on DocVQA, 5.3% of full cache suffices for Qwen2-VL-7B.
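The allocation and retention steps can be sketched as below. This is an illustrative reading rather than SparseMM's code: the function names, the integer truncation of each head's proportional share, and the choice to always keep the most recent `local_window` keys are assumptions.

```python
from typing import List


def kv_budgets(scores: List[float], total_budget: int,
               local_window: int, uniform_base: int) -> List[int]:
    """Per-head budget = local window + uniform baseline + proportional share."""
    n = len(scores)
    remaining = total_budget - n * (local_window + uniform_base)
    if remaining < 0:
        raise ValueError("total_budget is too small for the fixed allocations")
    score_sum = sum(scores)
    budgets = []
    for s in scores:
        share = remaining * s / score_sum if score_sum > 0 else remaining / n
        budgets.append(local_window + uniform_base + int(share))  # truncation assumed
    return budgets


def retain_keys(attn_to_keys: List[float], budget: int, local_window: int) -> List[int]:
    """Keep the most recent keys, then fill the budget with most-attended older keys."""
    n = len(attn_to_keys)
    recent = set(range(max(0, n - local_window), n))
    older = sorted((i for i in range(n) if i not in recent),
                   key=lambda i: attn_to_keys[i], reverse=True)
    keep = recent | set(older[:max(0, budget - len(recent))])
    return sorted(keep)


# Four heads with raw relevance scores; the last head has no visual relevance.
budgets = kv_budgets([5.0, 3.0, 2.0, 0.0], total_budget=100,
                     local_window=4, uniform_base=6)
kept = retain_keys([0.05, 0.4, 0.1, 0.3, 0.05, 0.1], budget=4, local_window=2)
```

The zero-score head still receives the local window and uniform baseline, so it is never starved completely, while the highest-scoring head gets half of the remaining cache.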
In generative vision models (Park et al., 2024), HRVs enable direct rescaling of per-head cross-attention weights for concept strengthening and adjusting:
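- For a desired concept $c$, rescale each head's CA map by a factor given by that head's entry in the HRV for $c$.
- When both a desired and an undesired concept are specified, the rescaling is interpolated head-wise from the HRVs of the two concepts.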
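The first bullet above (rescaling CA maps with a concept's HRV) can be illustrated as follows. The scaling factor used here, the number of heads times the head's L1-normalized HRV entry so that the average factor is 1, is an assumption for the sketch and may differ from the rule in Park et al.; the interpolation for undesired concepts is not shown.

```python
from typing import List

Map2D = List[List[float]]


def rescale_ca_maps(ca_maps: List[Map2D], hrv: List[float]) -> List[Map2D]:
    """Scale each head's cross-attention map by num_heads * hrv[h] (assumed rule).

    ca_maps[h] is the 2-D cross-attention map of head h for the target token.
    hrv is an L1-normalized relevance vector for the desired concept.
    """
    num_heads = len(hrv)
    rescaled = []
    for h in range(num_heads):
        factor = num_heads * hrv[h]  # mean factor is 1 when hrv sums to 1
        rescaled.append([[factor * v for v in row] for row in ca_maps[h]])
    return rescaled


# Two heads with identical 1x2 maps; head 0 is three times as relevant as head 1.
ca_maps = [[[1.0, 2.0]], [[1.0, 2.0]]]
strengthened = rescale_ca_maps(ca_maps, hrv=[0.75, 0.25])
```

Heads that are more relevant to the concept are amplified and less relevant heads are attenuated, without touching the model weights.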
For causal behavioral steering, HRVs identify which heads to intervene on and provide per-head importance weights for steering vectors (Zhan et al., 10 Jun 2025).
5. Interpretability, Clustering, and Specialization
HRVs empirically align with human-specified or downstream concepts:
- Ordered weakening: systematically ablating heads in order of decreasing HRV for a concept causes earlier and steeper loss of that concept in generative output (Park et al., 2024).
- In clustering analyses, HRVs for semantically similar concepts cluster distinctly in the HRV space, reinforcing interpretability claims.
- In audio, visualizing the learned per-head relevance masks demonstrates functional specialization, e.g., one head accentuating high-frequency transients, another heightening low-frequency backgrounds (Dutta et al., 2021).
- In retrieval, aggregation over a single high-relevance head can outperform full-head schemes (Tran et al., 2 Oct 2025).
6. Limitations and Prospective Developments
Limitations include:
- Degraded ranking/weak interpretability for diffuse or ambiguous concepts (e.g., numeracy, facial expressions) (Park et al., 2024).
- Simple HRV normalizations may be inadequate for very large head counts; alternative scaling or clamping may be needed (Park et al., 2024).
- For causal interventions, incomplete disentanglement or over-pruning can limit transferability (Zhan et al., 10 Jun 2025).
Prospective directions span:
- Fully automated pipelines for target token/concept selection.
- Improved HRV normalization for large-head models.
- Extension to other architectures (e.g., non-diffusion multimodal models).
- Deeper investigation of HRVs in self-attention vs. cross-attention, and the effects of architecture or fine-tuning.
7. Summary Table: HRV Methodologies in Recent Literature
| Domain/Model | HRV Definition | Selection/Analysis Method | Notable Results | Reference |
|---|---|---|---|---|
| MLLMs (SparseMM) | Visual token attribution | Training-free response analysis | <5% heads needed for accuracy, 1.38× speed, 52% KV reduction | (Wang et al., 5 Jun 2025) |
| Text-to-Image Diffusion | Concept activation counts | CA map activation + clustering | 4–12% metric gains, drastic polysemy error drop | (Park et al., 2024) |
| Retrieval/Reranking | InfoNCE contrast | Contrastive gold-vs-neg analysis | <1% heads optimal; layer pruning yields efficiency | (Tran et al., 2 Oct 2025) |
| Audio Representation | Sub-band context masking | Per-head 2-layer nets, end-to-end | +10–23% accuracy improvements over baseline | (Dutta et al., 2021) |
| Causal LLM Steering | VQ-AE latent partitioning | Behavior discriminative contrast | 20–81.5% boost in target steering, zero-shot transfer | (Zhan et al., 10 Jun 2025) |
Across all observed settings, HRVs offer a principled, interpretable, and efficient mechanism for fine-grained network analysis, head selection, memory/computation savings, and targeted model control.