Head Relevance Vectors (HRVs)

Updated 15 March 2026
  • HRVs are per-head vectors or scalars that measure the contribution of individual attention heads in transformer models, providing clear quantification of task-specific relevance.
  • They use techniques such as token attribution, contrastive retrieval, and latent disentanglement to identify and manipulate the most discriminative heads without altering model structure.
  • HRVs enhance model efficiency and control by enabling selective resource allocation and improved interpretability across applications like multimodal LLMs, diffusion models, retrieval, and audio processing.

Head Relevance Vectors (HRVs) quantify the task-specific or concept-level contribution of individual heads within attention-based neural architectures, enabling the principled identification, selection, and manipulation of the most discriminative or semantically aligned heads without intrusive model modifications. HRVs have been formalized, utilized, and experimentally validated across domains including multimodal LLMs, text-to-image generative models, attention-based retrieval/reranking, audio representation learning, and causal LLM steering.

1. Mathematical Formulations and Notational Scope

HRVs are typically constructed as per-head vectors or scalars reflecting each head's relevance to a downstream objective or human-interpretable visual concept. Across transformers with $L$ layers and $H$ heads per layer, the model-wise HRV is a concatenation of per-layer vectors $\mathrm{HRV}_\ell \in \mathbb{R}^H$ for $\ell = 1,\ldots,L$, resulting in $\mathrm{HRV} \in \mathbb{R}^{L \cdot H}$ or, for cross-attention, $\mathrm{HRV}_n \in \mathbb{R}^H$ per concept $C_n$.

For attention-based visual relevance in multimodal LLMs (Wang et al., 5 Jun 2025), scalar scores $v_{\ell,h}$ for head $(\ell,h)$ are defined as

$$v_{\ell,h} = \frac{S_{\ell,h}}{\sum_{\ell',h'} S_{\ell',h'}}, \qquad S_{\ell,h} = \sum_{i=1}^{N} \frac{\text{hit}_{\ell,h}(y_i)}{|I_{y_i}|},$$

where $\text{hit}_{\ell,h}(y_i) = 1$ iff the maximal-attention key for output token $y_i$ falls within the set $I_{y_i}$ of image tokens associated with $y_i$.

In concept-aligned diffusion models (Park et al., 2024), the HRV for concept $C_n$ is a vector $\mathrm{HRV}_n \in \mathbb{R}^H$ whose $h$-th element counts the number of generations in which head $h$ responds most strongly to $C_n$; the counts are later L1-normalized across heads.

For contrastive retrieval, a relevance score per head is given by an InfoNCE-like contrast between positive and negative document attention (Tran et al., 2 Oct 2025):

$$S_{\mathrm{CoRe}}(h) = \frac{\exp(s_{\text{pos}}^h / t)}{\exp(s_{\text{pos}}^h / t) + \sum_i \exp(s_{\text{neg},i}^h / t)},$$

where $s_d^h$ is the mean attention from query to document $d$ under head $h$.

In causal LLM steering (Zhan et al., 10 Jun 2025), per-head HRVs are constructed as the concatenation of the (discrete or latent) units within a VQ-AE representation identified as behavior-discriminative via supervised contrast.

2. Algorithms for HRV Computation

Training-Free Response Analysis (SparseMM)

  • Extract all attention matrices $A_{\ell,h}$ for a set of $N$ annotated image-text pairs.
  • For each output token $y_i$, determine the set $I_{y_i}$ (image tokens corresponding to $y_i$).
  • For each head, increment $S_{\ell,h}$ by $1/|I_{y_i}|$ if $\operatorname{argmax}_t [A_{\ell,h}]_{i,t} \in I_{y_i}$.
  • Normalize all $S_{\ell,h}$ over heads to get $v_{\ell,h}$.
  • HRVs are then per-layer vectors $[v_{\ell,1},\ldots,v_{\ell,H}]^\top$.
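
The steps above can be sketched in plain Python. This is a minimal illustration rather than the SparseMM implementation: the function name `visual_head_scores`, the nested-list attention format, and the way samples are passed in are all assumptions made for the example.

```python
from typing import Dict, Iterable, Sequence, Set, Tuple

Head = Tuple[int, int]  # (layer index, head index)
AttentionMap = Sequence[Sequence[float]]  # rows: output tokens, columns: keys


def visual_head_scores(
    samples: Iterable[Tuple[Dict[Head, AttentionMap], Sequence[Set[int]]]],
) -> Dict[Head, float]:
    """Return normalized scores v[(layer, head)] over all heads.

    Each sample is a pair (attn, image_token_sets) where
    attn[(l, h)][i][t] is the attention of output token i to key t and
    image_token_sets[i] is I_{y_i}, the image-token key indices for token i.
    (This input layout is hypothetical, chosen for the sketch.)
    """
    raw: Dict[Head, float] = {}
    for attn, image_token_sets in samples:
        for head, matrix in attn.items():
            raw.setdefault(head, 0.0)  # heads with no hits still get a score
            for i, region in enumerate(image_token_sets):
                if not region:
                    continue  # token has no associated image tokens
                row = matrix[i]
                top_key = max(range(len(row)), key=lambda t: row[t])
                if top_key in region:
                    raw[head] += 1.0 / len(region)  # hit, weighted by 1/|I|
    total = sum(raw.values())
    if total == 0.0:
        return dict(raw)  # no hits anywhere: leave all scores at zero
    return {head: score / total for head, score in raw.items()}
```

The per-layer HRV for layer `l` is then the list of `v[(l, h)]` over `h`. Raw scores are accumulated over all samples before the single normalization step, matching the order of the bullets.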

Mechanistic Interpretability in Diffusion Models

  • Given $N$ concepts and $H$ heads, for each prompt, timestep, and head, find the concept $n^*$ with top average spatial activation.
  • Increment $\mathrm{HRV}_{n^*}[h]$.
  • After processing all data, normalize each $\mathrm{HRV}_n$ so that $\sum_{h=1}^H \mathrm{HRV}_n[h] = H$.
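
A rough sketch of this counting procedure follows. It assumes the average spatial activations have already been computed and stored as `activations[s][h][n]` for sample `s` (one prompt and timestep), head `h`, and concept `n`; that layout and the name `concept_hrvs` are illustrative choices, not taken from Park et al.

```python
from typing import List, Sequence


def concept_hrvs(
    activations: Sequence[Sequence[Sequence[float]]],
    num_concepts: int,
) -> List[List[float]]:
    """Return hrv[n][h], with each hrv[n] rescaled to sum to the head count H.

    activations[s][h][n] is the mean spatial cross-attention activation of
    head h for concept n in sample s (hypothetical precomputed input).
    """
    num_heads = len(activations[0])
    counts = [[0.0] * num_heads for _ in range(num_concepts)]
    for sample in activations:
        for h, per_concept in enumerate(sample):
            # Concept n* with the highest average activation for this head.
            best = max(range(num_concepts), key=lambda n: per_concept[n])
            counts[best][h] += 1.0
    hrvs = []
    for row in counts:
        total = sum(row)
        # L1-normalize, then scale so the entries sum to H.
        scale = num_heads / total if total > 0 else 0.0
        hrvs.append([c * scale for c in row])
    return hrvs
```

With this normalization a head of average relevance has value 1, so the vector can be read directly as a per-head multiplier.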

Contrastive Retrieval Head Scoring

  • Aggregate per-head query-to-document attention for both gold and negative documents.
  • Apply a softmax-based contrastive metric $S_{\mathrm{CoRe}}(h)$.
  • Select top heads by average $S_{\mathrm{CoRe}}(h)$ over samples.
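
As an illustration, the contrastive score and the head selection can be written as below. The helper names `core_score` and `top_heads` are hypothetical, the per-document mean attentions are assumed to be precomputed, and the default temperature of 1.0 is a placeholder rather than a value from the source.

```python
import math
from typing import List, Sequence


def core_score(s_pos: float, s_negs: Sequence[float], temperature: float = 1.0) -> float:
    """InfoNCE-style score for one head on one query.

    s_pos is the mean query-to-document attention for the gold document,
    s_negs the same quantity for each negative document.
    """
    logits = [s_pos / temperature] + [s / temperature for s in s_negs]
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    return exps[0] / sum(exps)


def top_heads(per_sample_scores: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return indices of the k heads with the highest mean score.

    per_sample_scores[j][h] is the score of head h on sample j.
    """
    num_samples = len(per_sample_scores)
    num_heads = len(per_sample_scores[0])
    means = [
        sum(sample[h] for sample in per_sample_scores) / num_samples
        for h in range(num_heads)
    ]
    return sorted(range(num_heads), key=lambda h: means[h], reverse=True)[:k]
```

A head that attends equally to the gold document and a single negative scores 0.5, so values well above the uniform level indicate heads that discriminate relevant from irrelevant documents.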

Latent Disentanglement for Behavioral Relevance

  • Train, per head, a VQ-AE on last-token activations and partition each code into semantic units.
  • Add a supervised contrastive loss forcing separation of encodings from aligned vs violating behaviors.
  • Designate as HRV the units with high class-separability; final score is given by a binary classification (AUC) of generated codes.
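
The selection step can be illustrated with a rank-based AUC. The sketch below assumes each latent unit has already been reduced to one scalar per example, and it uses a hypothetical threshold of 0.8; neither detail comes from Zhan et al., and the VQ-AE training itself is not shown.

```python
from typing import List, Sequence, Tuple


def auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Probability that a positive example scores above a negative one.

    Ties count as half a win (the Mann-Whitney formulation of AUC).
    """
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def select_units(
    unit_values: Sequence[Tuple[Sequence[float], Sequence[float]]],
    threshold: float = 0.8,  # hypothetical cut-off, not from the source
) -> List[int]:
    """Return indices of units whose values separate the two behaviors.

    unit_values[u] = (values on aligned examples, values on violating examples).
    """
    selected = []
    for u, (aligned, violating) in enumerate(unit_values):
        a = auc(aligned, violating)
        separability = max(a, 1.0 - a)  # direction of separation is irrelevant
        if separability >= threshold:
            selected.append(u)
    return selected
```

Taking `max(a, 1 - a)` treats a unit that is consistently lower on aligned behavior as just as discriminative as one that is consistently higher.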

Audio Relevance Heads

  • Decompose time-frequency filterbank output into $H$ sub-bands, each processed by a two-layer FC network to generate a soft mask $R_h$ over sub-band bins.
  • The relevance mask $R_h$ serves as the HRV for head $h$.
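
A toy forward pass for one such head is sketched below. The ReLU hidden activation, the sigmoid output, the equal-width split, and the treatment of a single flattened frame are assumptions made for illustration; the source only states that a two-layer FC network produces a soft mask per sub-band.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def split_sub_bands(frame: Sequence[float], num_heads: int) -> List[List[float]]:
    """Split one frame of filterbank bins into num_heads equal-width sub-bands."""
    size = len(frame) // num_heads
    return [list(frame[h * size:(h + 1) * size]) for h in range(num_heads)]


def dense(x: Sequence[float], weights: Matrix, bias: Sequence[float]) -> List[float]:
    """Fully connected layer: weights has one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relevance_mask(
    sub_band: Sequence[float],
    w1: Matrix, b1: Sequence[float],
    w2: Matrix, b2: Sequence[float],
) -> List[float]:
    """Two-layer network mapping sub-band bins to a soft mask in (0, 1)."""
    hidden = [max(0.0, v) for v in dense(sub_band, w1, b1)]  # ReLU (assumed)
    logits = dense(hidden, w2, b2)
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]  # sigmoid (assumed)


def apply_mask(sub_band: Sequence[float], mask: Sequence[float]) -> List[float]:
    """Element-wise reweighting of the sub-band by its relevance mask."""
    return [x * m for x, m in zip(sub_band, mask)]
```

In a trained model the weights would be learned end-to-end with the classifier; here they are plain arguments so the data flow stays visible.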

3. Applications and Empirical Insights

HRVs have driven advances in model efficiency, interpretability, retrieval, and controlled generation:

| Application | Head Selection Criterion | Empirical Highlights |
|---|---|---|
| SparseMM MLLMs | Visual alignment via token attribution | <5% heads suffice for visual tasks, 1.38× speedup, 52% memory reduction (Wang et al., 5 Jun 2025) |
| Cross-attn Diffusion | Human concept-alignment in CA heads | HRVs enable concept-strengthening, reducing polysemy errors from 63%→15.9% (Park et al., 2024) |
| Retrieval Reranking | InfoNCE-style contrast of gold vs negatives | <1% heads optimal, +1–4 nDCG points, 20% latency/40% memory savings after layer pruning (Tran et al., 2 Oct 2025) |
| Audio Classification | Mask generation over local TF sub-bands | 10–23% accuracy gains at <0.1% param increase (Dutta et al., 2021) |
| Causal LLM Steering | VQ-AE/contrastive latent separation | 20% accuracy boost for truthfulness interventions (Zhan et al., 10 Jun 2025) |

Preserving only the top-$K$ heads by HRV score often matches or outperforms full-head baselines. The task-relevant heads form a small, robust subset concentrated in the mid-layers.

4. Inference Manipulation and Resource Allocation

SparseMM operationalizes HRVs for memory and compute savings by asymmetric KV-cache allocation (Wang et al., 5 Jun 2025):

  • Each head $(\ell,h)$ receives a combined KV budget $b_{\ell,h} = w + r + b_{\ell,h}^{\text{score}}$, with local window $w$, uniform baseline $r$, and the remaining cache allocated in proportion to $v_{\ell,h}$.
  • During decoding, heads retain only their most-attended keys up to their respective $b_{\ell,h}$.
  • Ablating low-relevance heads (95%+) yields negligible accuracy drop; on DocVQA, 5.3% of full cache suffices for Qwen2-VL-7B.
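
The budget rule can be sketched as follows. The function names, the integer flooring of the score-proportional share, and the simple top-attention eviction are assumptions for illustration, not details confirmed by the SparseMM paper.

```python
from typing import Dict, List, Sequence, Tuple

Head = Tuple[int, int]  # (layer index, head index)


def kv_budgets(
    scores: Dict[Head, float],
    total_budget: int,
    window: int,
    uniform: int,
) -> Dict[Head, int]:
    """Compute b = w + r + b_score for every head.

    The cache left after giving each head (window + uniform) slots is
    split in proportion to the head scores v.
    """
    num_heads = len(scores)
    remaining = total_budget - num_heads * (window + uniform)
    if remaining < 0:
        raise ValueError("total_budget is too small for the fixed per-head part")
    total_score = sum(scores.values())
    budgets = {}
    for head, v in scores.items():
        share = v / total_score if total_score > 0 else 1.0 / num_heads
        # Flooring may leave a few slots unused; that is accepted here.
        budgets[head] = window + uniform + int(remaining * share)
    return budgets


def keep_top_keys(attention: Sequence[float], budget: int) -> List[int]:
    """Return the indices of the `budget` most-attended keys, in position order."""
    ranked = sorted(range(len(attention)), key=lambda t: attention[t], reverse=True)
    return sorted(ranked[:budget])
```

For example, with two heads scored 0.75 and 0.25, a total budget of 110, `window=10`, and `uniform=5`, the heads receive 75 and 35 slots respectively. The sketch does not force the local-window positions to be among the kept keys, which a real cache policy would likely do.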

In generative vision models (Park et al., 2024), HRVs enable direct rescaling of per-head cross-attention weights for concept strengthening and adjusting:

  • For desired concept $C_d$, rescale CA maps as $A^{(t,h)}_{i,j^*} \leftarrow \mathrm{HRV}_d[h] \cdot A^{(t,h)}_{i,j^*}$.
  • For both desired and undesired concepts, interpolate head-wise as $r = 2\,\mathrm{HRV}_d - \mathrm{HRV}_u$.
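
Both operations reduce to simple per-head arithmetic, sketched below for a single timestep. The nested-list layout `cross_attn[h][i][j]` (head, spatial position, text token) and the function names are hypothetical. Passing the combined vector $r$ as the scale in place of $\mathrm{HRV}_d$ is an assumption about how the second bullet is applied.

```python
from typing import List, Sequence


def adjust_vector(hrv_desired: Sequence[float], hrv_undesired: Sequence[float]) -> List[float]:
    """Head-wise combination r = 2 * HRV_d - HRV_u."""
    return [2.0 * d - u for d, u in zip(hrv_desired, hrv_undesired)]


def rescale_cross_attention(
    cross_attn: Sequence[Sequence[Sequence[float]]],
    scale: Sequence[float],
    token_index: int,
) -> List[List[List[float]]]:
    """Scale the attention paid to one text token, separately for each head.

    cross_attn[h][i][j] is the weight of head h from spatial position i to
    text token j; token_index plays the role of j* in the formula.
    Returns a new structure and leaves the input untouched.
    """
    rescaled = []
    for h, head_map in enumerate(cross_attn):
        new_map = []
        for row in head_map:
            new_row = list(row)
            new_row[token_index] *= scale[h]
            new_map.append(new_row)
        rescaled.append(new_map)
    return rescaled
```

Heads with a scale above 1 amplify the target token's influence, while heads with a scale below 1 suppress it.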

For causal behavioral steering, HRVs identify which heads to intervene on and provide per-head importance weights for steering vectors (Zhan et al., 10 Jun 2025).

5. Interpretability, Clustering, and Specialization

HRVs empirically align with human-specified or downstream concepts:

  • Ordered weakening: systematically ablating heads in order of decreasing HRV for a concept causes earlier and steeper loss of that concept in generative output (Park et al., 2024).
  • In clustering analyses, HRVs for semantically similar concepts cluster distinctly in the HRV space, reinforcing interpretability claims.
  • In audio, visualizing $R_h$ demonstrates functional specialization, e.g., one head accentuating high-frequency transients and another heightening low-frequency backgrounds (Dutta et al., 2021).
  • In retrieval, aggregation over a single high-relevance head can outperform full-head schemes (Tran et al., 2 Oct 2025).

6. Limitations and Prospective Developments

Limitations include:

  • Degraded ranking/weak interpretability for diffuse or ambiguous concepts (e.g., numeracy, facial expressions) (Park et al., 2024).
  • Simple HRV normalizations may be inadequate for very large head counts ($H > 1000$); alternative scaling or clamping may be needed (Park et al., 2024).
  • For causal interventions, incomplete disentanglement or over-pruning can limit transferability (Zhan et al., 10 Jun 2025).

Prospective directions span:

  • Fully automated pipelines for target token/concept selection.
  • Improved HRV normalization for large-head models.
  • Extension to other architectures (e.g., non-diffusion multimodal models).
  • Deeper investigation of HRVs in self-attention vs. cross-attention, and the effects of architecture or fine-tuning.

7. Summary Table: HRV Methodologies in Recent Literature

| Domain/Model | HRV Definition | Selection/Analysis Method | Notable Results | Reference |
|---|---|---|---|---|
| MLLMs (SparseMM) | Visual token attribution | Training-free response analysis | <5% heads needed for accuracy, 1.38× speed, 52% KV reduction | (Wang et al., 5 Jun 2025) |
| Text-to-Image Diffusion | Concept activation counts | CA map activation + clustering | 4–12% metric gains, drastic polysemy error drop | (Park et al., 2024) |
| Retrieval/Reranking | InfoNCE contrast | Contrastive gold-vs-neg analysis | 1% heads optimal; layer pruning yields efficiency | (Tran et al., 2 Oct 2025) |
| Audio Representation | Sub-band context masking | Per-head 2-layer nets, end-to-end | +10–23% accuracy improvements over baseline | (Dutta et al., 2021) |
| Causal LLM Steering | VQ-AE latent partitioning | Behavior discriminative contrast | 20–81.5% boost in target steering, zero-shot transfer | (Zhan et al., 10 Jun 2025) |

Across all observed settings, HRVs offer a principled, interpretable, and efficient mechanism for fine-grained network analysis, head selection, memory/computation savings, and targeted model control.
