Head Relevance Vectors (HRVs)

Updated 15 March 2026

HRVs are per-head vectors or scalars that measure the contribution of individual attention heads in transformer models, providing clear quantification of task-specific relevance.
They use techniques such as token attribution, contrastive retrieval, and latent disentanglement to identify and manipulate the most discriminative heads without altering model structure.
HRVs enhance model efficiency and control by enabling selective resource allocation and improved interpretability across applications like multimodal LLMs, diffusion models, retrieval, and audio processing.

Head Relevance Vectors (HRVs) quantify the task-specific or concept-level contribution of individual heads within attention-based neural architectures, enabling the principled identification, selection, and manipulation of the most discriminative or semantically aligned heads without intrusive model modifications. HRVs have been formalized, utilized, and experimentally validated across domains including multimodal LLMs, text-to-image generative models, attention-based retrieval/reranking, audio representation learning, and causal LLM steering.

1. Mathematical Formulations and Notational Scope

HRVs are typically constructed as per-head vectors or scalars reflecting each head's relevance to a downstream objective or human-interpretable visual concept. For a transformer with $L$ layers and $H$ heads per layer, the model-wise HRV is the concatenation of per-layer vectors $\mathrm{HRV}_\ell \in \mathbb R^H$ for $\ell = 1,\ldots,L$, giving $\mathrm{HRV} \in \mathbb R^{L \cdot H}$; for cross-attention, there is one vector $\mathrm{HRV}_n \in \mathbb R^H$ per concept $C_n$.

For attention-based visual relevance in multimodal LLMs (Wang et al., 5 Jun 2025), scalar scores $v_{\ell,h}$ for head $(\ell,h)$ are defined as $v_{\ell,h} = S_{\ell,h} \big/ \sum_{\ell',h'} S_{\ell',h'}$, where the raw score $S_{\ell,h}$ accumulates a contribution for each output token $y_i$ whose maximal-attention key under head $(\ell,h)$ falls within the set $\mathcal{I}(y_i)$ of image tokens associated with $y_i$.

In concept-aligned diffusion models (Park et al., 2024), the HRV for concept $C_n$ is a vector $\mathrm{HRV}_n \in \mathbb R^H$ whose $h$-th element counts the number of generations in which head $h$ responds most strongly to $C_n$; the counts are later L1-normalized across heads.

For contrastive retrieval, a relevance score $r_h$ per head is given by an InfoNCE-like contrast between the attention paid to the positive document and to the negative documents (Tran et al., 2 Oct 2025), computed from $a_h(q,d)$, the mean attention from query $q$ to document $d$ under head $h$.

In causal LLM steering (Zhan et al., 10 Jun 2025), per-head HRVs are constructed as the concatenation of the (discrete or latent) units within a VQ-AE representation identified as behavior-discriminative via supervised contrast.

2. Algorithms for HRV Computation

Training-Free Response Analysis (SparseMM)

Extract the attention matrices of all $L \cdot H$ heads for a set of annotated image-text pairs.
For each output token $y_i$, determine the set $\mathcal{I}(y_i)$ (image tokens corresponding to $y_i$).
For each head $(\ell,h)$, increment $S_{\ell,h}$ if the head's maximal-attention key for $y_i$ lies in $\mathcal{I}(y_i)$.
Normalize all $S_{\ell,h}$ over heads to get $v_{\ell,h}$.
HRVs are then per-layer vectors $\mathrm{HRV}_\ell = (v_{\ell,1},\ldots,v_{\ell,H})$.
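
The steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the SparseMM implementation: the function name `visual_head_scores`, the nested-list attention layout, and the choice to add exactly 1 per hit are all hypothetical choices made for the example.

```python
from typing import Dict, List, Sequence, Set, Tuple

Head = Tuple[int, int]  # (layer index, head index)


def visual_head_scores(
    attn: Dict[Head, List[Sequence[float]]],
    image_token_sets: List[Set[int]],
) -> Dict[Head, float]:
    """Count per-head hits, then normalize so that all scores sum to 1.

    attn[head] holds one attention row per output token (weights over keys).
    image_token_sets[i] holds the key positions of the image tokens that
    correspond to output token i.
    Assumption: every hit adds 1 to the head's raw score.
    """
    raw: Dict[Head, float] = {}
    for head, rows in attn.items():
        hits = 0
        for row, image_keys in zip(rows, image_token_sets):
            top_key = max(range(len(row)), key=lambda j: row[j])
            if top_key in image_keys:
                hits += 1
        raw[head] = float(hits)

    total = sum(raw.values())
    if total == 0:
        return {head: 0.0 for head in raw}
    return {head: score / total for head, score in raw.items()}


# Toy input: one layer, two heads, two output tokens, three key positions.
toy_attn = {
    (0, 0): [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]],
    (0, 1): [[0.8, 0.1, 0.1], [0.2, 0.2, 0.6]],
}
toy_image_sets = [{1}, {0, 1}]
toy_scores = visual_head_scores(toy_attn, toy_image_sets)
```

In the toy input, head `(0, 0)` places its top attention inside the annotated image tokens for both output tokens, while head `(0, 1)` never does, so the first head receives the whole normalized score.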

Mechanistic Interpretability in Diffusion Models

Given $N$ concepts and $H$ heads, for each prompt, timestep, and head $h$, find the concept $C_{n^*}$ with the highest average spatial activation.
Increment the $h$-th element of $\mathrm{HRV}_{n^*}$.
After all data have been processed, L1-normalize each $\mathrm{HRV}_n$ across heads.
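
A minimal sketch of this counting procedure follows. The function name `concept_hrvs` and the input layout are hypothetical, and normalizing each vector to unit L1 norm is an assumption: the target value of the norm is not recoverable from the text above.

```python
from typing import List


def concept_hrvs(
    activations: List[List[List[float]]],
    num_concepts: int,
    num_heads: int,
) -> List[List[float]]:
    """Build one count vector per concept, then L1-normalize it across heads.

    activations[s][h][n] is the mean spatial activation of head h for
    concept n in sample s (one sample per prompt and timestep).
    Assumption: each vector is scaled to unit L1 norm.
    """
    counts = [[0.0] * num_heads for _ in range(num_concepts)]
    for sample in activations:
        for h in range(num_heads):
            per_concept = sample[h]
            top = max(range(num_concepts), key=lambda n: per_concept[n])
            counts[top][h] += 1.0

    hrvs = []
    for row in counts:
        norm = sum(row)
        hrvs.append([c / norm for c in row] if norm > 0 else list(row))
    return hrvs


# Two samples, two heads, two concepts.
toy_activations = [
    [[0.9, 0.1], [0.2, 0.8]],  # head 0 -> concept 0, head 1 -> concept 1
    [[0.6, 0.4], [0.7, 0.3]],  # head 0 -> concept 0, head 1 -> concept 0
]
toy_hrvs = concept_hrvs(toy_activations, num_concepts=2, num_heads=2)
```

Here concept 0 collects counts `[2, 1]` over the two heads and concept 1 collects `[0, 1]` before normalization.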

Contrastive Retrieval Head Scoring

Aggregate per-head query-to-document attention for both gold and negative documents.
Apply a softmax-based contrastive metric $r_h$ to each head.
Select the top heads by average $r_h$ over samples.
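
One way to realize such a metric is the softmax probability that a head assigns to the gold document among the gold and negative documents. The sketch below assumes that form; the `temperature` parameter, the function names, and the data layout are illustrative and are not taken from Tran et al.

```python
import math
from typing import Dict, List, Sequence, Tuple


def contrastive_head_score(
    pos_attn: float, neg_attns: Sequence[float], temperature: float = 1.0
) -> float:
    """Softmax probability of the gold document among gold + negatives."""
    logits = [pos_attn / temperature] + [a / temperature for a in neg_attns]
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    return exps[0] / sum(exps)


def top_heads(
    samples: List[Dict[str, Tuple[float, Sequence[float]]]], k: int
) -> List[str]:
    """Rank heads by their mean contrastive score over all samples.

    Each sample maps a head id to (gold attention, negative attentions).
    Assumption: every head appears in every sample.
    """
    totals: Dict[str, float] = {}
    for sample in samples:
        for head, (pos, negs) in sample.items():
            totals[head] = totals.get(head, 0.0) + contrastive_head_score(pos, negs)
    means = {head: total / len(samples) for head, total in totals.items()}
    return sorted(means, key=means.get, reverse=True)[:k]


toy_samples = [
    {"h0": (0.9, [0.1, 0.2]), "h1": (0.3, [0.3, 0.3])},
    {"h0": (0.8, [0.2, 0.1]), "h1": (0.2, [0.4, 0.3])},
]
best = top_heads(toy_samples, k=1)
```

A head that attends equally to gold and negative documents scores exactly 1/(number of documents), so scores above that level indicate that the head separates relevant from irrelevant documents.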

Latent Disentanglement for Behavioral Relevance

Train a VQ-AE per head on last-token activations, partitioning the code into semantic units.
Add a supervised contrastive loss that forces the encodings of aligned and violating behaviors apart.
Designate as the HRV the units with high class separability; the final per-head score is the AUC of a binary classification on the generated codes.
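
The AUC-based scoring step can be illustrated in isolation. The sketch below assumes that each head has already been reduced to one scalar score per example; how those scalars are derived from the VQ-AE codes is not specified here, and `separability_auc` is a hypothetical helper name.

```python
from typing import Sequence


def separability_auc(aligned: Sequence[float], violating: Sequence[float]) -> float:
    """AUC as the probability that an aligned example outscores a violating one.

    Ties count as one half. This pairwise form is O(n * m), which is fine
    for a sketch but slow for large evaluation sets.
    """
    if not aligned or not violating:
        raise ValueError("both classes need at least one example")
    wins = 0.0
    for a in aligned:
        for v in violating:
            if a > v:
                wins += 1.0
            elif a == v:
                wins += 0.5
    return wins / (len(aligned) * len(violating))


# Hypothetical per-head scalar scores for aligned vs. violating examples.
head_auc = {
    "head_a": separability_auc([0.9, 0.8, 0.4], [0.5, 0.3]),
    "head_b": separability_auc([0.5, 0.2], [0.6, 0.1]),
}
ranked_heads = sorted(head_auc, key=head_auc.get, reverse=True)
```

An AUC near 0.5 means the head's codes do not separate the two behaviors, so `head_b` in the toy data would be a poor intervention target.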

Audio Relevance Heads

Decompose the time-frequency filterbank output into $K$ sub-bands, each processed by a two-layer FC network that generates a soft mask $m_k$ over the bins of sub-band $k$.
The relevance mask $m_k$ serves as the HRV for head $k$.
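
The mask generation can be sketched with untrained weights. The ReLU hidden activation, the sigmoid output, the hidden width, and all function names below are assumptions; the text above specifies only a two-layer FC network producing a soft mask per sub-band.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dense(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Fully connected layer; weights holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relevance_mask(sub_band: Vector, w1: Matrix, b1: Vector,
                   w2: Matrix, b2: Vector) -> Vector:
    """Two-layer network: ReLU hidden layer, sigmoid output in (0, 1)."""
    hidden = [max(0.0, v) for v in dense(sub_band, w1, b1)]
    logits = dense(hidden, w2, b2)
    return [1.0 / (1.0 + math.exp(-v)) for v in logits]


def split_sub_bands(frame: Vector, num_sub_bands: int) -> List[Vector]:
    """Split one frame into equal sub-bands; leftover bins are dropped."""
    size = len(frame) // num_sub_bands
    return [frame[i * size:(i + 1) * size] for i in range(num_sub_bands)]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
frame = [rng.random() for _ in range(12)]  # one frame with 12 frequency bins
bands = split_sub_bands(frame, 3)          # 3 sub-bands of 4 bins each
hidden_size = 8

masks = []
for band in bands:
    bins = len(band)
    w1, b1 = random_matrix(hidden_size, bins, rng), [0.0] * hidden_size
    w2, b2 = random_matrix(bins, hidden_size, rng), [0.0] * bins
    masks.append(relevance_mask(band, w1, b1, w2, b2))
```

Each sub-band gets its own small network, so the number of added parameters depends on the sub-band width and the hidden size rather than on the full spectrogram dimension.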

3. Applications and Empirical Insights

HRVs have driven advances in model efficiency, interpretability, retrieval, and controlled generation:

Application	Head Selection Criterion	Empirical Highlights
SparseMM MLLMs	Visual alignment via token attribution	<5% heads suffice for visual tasks, 1.38× speedup, 52% memory reduction (Wang et al., 5 Jun 2025)
Cross-attn Diffusion	Human concept-alignment in CA heads	HRVs enable concept-strengthening, reducing polysemy errors from 63%→15.9% (Park et al., 2024)
Retrieval Reranking	InfoNCE-style contrast of gold vs negatives	<1% heads optimal, +1–4 nDCG points, 20% latency/40% memory savings after layer pruning (Tran et al., 2 Oct 2025)
Audio Classification	Mask generation over local TF sub-bands	10–23% accuracy gains at <0.1% param increase (Dutta et al., 2021)
Causal LLM Steering	VQ-AE/contrastive latent separation	20% accuracy boost for truthfulness interventions (Zhan et al., 10 Jun 2025)

Preserving only the top-$k$ heads by HRV score often matches or outperforms full-head baselines. The task-relevant heads form a small, robust subset concentrated in the middle layers.

4. Inference Manipulation and Resource Allocation

SparseMM operationalizes HRVs for memory and compute savings by asymmetric KV-cache allocation (Wang et al., 5 Jun 2025):

Each head $(\ell,h)$ receives a combined KV budget:

$b_{\ell,h} = w + u + v_{\ell,h} \cdot B_{\mathrm{rem}}$

with local window $w$, uniform baseline $u$, and the remaining cache $B_{\mathrm{rem}}$ allocated in proportion to $v_{\ell,h}$.

During decoding, heads retain only their most-attended keys up to their respective budgets $b_{\ell,h}$.
Ablating low-relevance heads (95%+) yields negligible accuracy drop; on DocVQA, 5.3% of full cache suffices for Qwen2-VL-7B.

In generative vision models (Park et al., 2024), HRVs enable direct rescaling of per-head cross-attention weights for concept strengthening and adjusting:

For a desired concept $C_n$, rescale the CA map of each head $h$ by the $h$-th element of $\mathrm{HRV}_n$.
For a desired concept paired with an undesired one, form the per-head rescaling weights by head-wise interpolation between the HRVs of the two concepts.
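
The per-head rescaling for concept strengthening amounts to one multiplication per head. The sketch below assumes flattened CA maps stored as nested lists and an HRV whose entries are already on a suitable scale; the function name `rescale_ca_maps` is hypothetical, and the interpolation used for concept adjusting is not shown because its exact form is not given here.

```python
from typing import List


def rescale_ca_maps(ca_maps: List[List[float]], hrv: List[float]) -> List[List[float]]:
    """Multiply every cross-attention map by its head's HRV entry.

    ca_maps[h] is the flattened spatial map of head h for the concept token.
    """
    if len(ca_maps) != len(hrv):
        raise ValueError("need exactly one HRV entry per head")
    return [[weight * a for a in head_map] for head_map, weight in zip(ca_maps, hrv)]


toy_maps = [[0.2, 0.4], [0.1, 0.3]]  # two heads, two spatial positions
toy_hrv = [2.0, 0.5]                 # amplify head 0, damp head 1
scaled = rescale_ca_maps(toy_maps, toy_hrv)
```

Heads with a large HRV entry for the concept are amplified and the others are damped, without changing any model weights.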

For causal behavioral steering, HRVs identify which heads to intervene on and provide per-head importance weights for steering vectors (Zhan et al., 10 Jun 2025).

5. Interpretability, Clustering, and Specialization

HRVs empirically align with human-specified or downstream concepts:

Ordered weakening: systematically ablating heads in order of decreasing HRV for a concept causes earlier and steeper loss of that concept in generative output (Park et al., 2024).
In clustering analyses, HRVs for semantically similar concepts cluster distinctly in the HRV space, reinforcing interpretability claims.
In audio, visualizing the per-head relevance masks demonstrates functional specialization: for example, one head accentuates high-frequency transients while another heightens low-frequency backgrounds (Dutta et al., 2021).
In retrieval, aggregation over a single high-relevance head can outperform full-head schemes (Tran et al., 2 Oct 2025).

6. Limitations and Prospective Developments

Limitations include:

Degraded ranking/weak interpretability for diffuse or ambiguous concepts (e.g., numeracy, facial expressions) (Park et al., 2024).
Simple HRV normalizations may be inadequate for very large head counts; alternative scaling or clamping may be needed (Park et al., 2024).
For causal interventions, incomplete disentanglement or over-pruning can limit transferability (Zhan et al., 10 Jun 2025).

Prospective directions span:

Fully automated pipelines for target token/concept selection.
Improved HRV normalization for large-head models.
Extension to other architectures (e.g., non-diffusion multimodal models).
Deeper investigation of HRVs in self-attention vs. cross-attention, and the effects of architecture or fine-tuning.

7. Summary Table: HRV Methodologies in Recent Literature

Domain/Model	HRV Definition	Selection/Analysis Method	Notable Results	Reference
MLLMs (SparseMM)	Visual token attribution	Training-free response analysis	<5% heads needed for accuracy, 1.38× speedup, 52% memory reduction	(Wang et al., 5 Jun 2025)
Text-to-Image Diffusion	Concept activation counts	CA map activation + clustering	4–12% metric gains, drastic polysemy error drop	(Park et al., 2024)
Retrieval/Reranking	InfoNCE contrast	Contrastive gold-vs-neg analysis	<1% heads optimal; layer pruning yields 20% latency/40% memory savings	(Tran et al., 2 Oct 2025)
Audio Representation	Sub-band context masking	Per-head 2-layer nets, end-to-end	+10–23% accuracy improvements over baseline	(Dutta et al., 2021)
Causal LLM Steering	VQ-AE latent partitioning	Behavior discriminative contrast	20–81.5% boost in target steering, zero-shot transfer	(Zhan et al., 10 Jun 2025)

Across all observed settings, HRVs offer a principled, interpretable, and efficient mechanism for fine-grained network analysis, head selection, memory/computation savings, and targeted model control.