Generality of the KV-retrieval probe for head selection

Investigate whether head selection based on the synthetic key–value retrieval ablation probe identifies all components required for other retrieval settings, such as multi-hop retrieval, and determine which additional components, if any, are needed for these scenarios.

Background

The method ranks and retains attention heads using an ablation-based synthetic KV-retrieval probe whose results correlate with performance on retrieval-heavy tasks. The probe is effective on the benchmarks considered, but how well it covers other retrieval phenomena is uncertain.
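As a rough illustration of ablation-based head ranking, the sketch below scores each head by how much a probe's accuracy drops when that head alone is ablated, then keeps the heads with the largest drops. It is a minimal sketch under stated assumptions, not the authors' implementation: the names rank_heads_by_ablation and toy_probe_accuracy, the (layer, head) indexing, the per-head contribution values, and the cutoff of two retained heads are all hypothetical. A real probe would run the model on synthetic key-value retrieval prompts instead of the toy function used here.

```python
from typing import Callable, FrozenSet, List, Tuple

Head = Tuple[int, int]  # (layer index, head index); hypothetical indexing scheme


def rank_heads_by_ablation(
    heads: List[Head],
    probe_accuracy: Callable[[FrozenSet[Head]], float],
) -> List[Tuple[Head, float]]:
    """Rank heads by the probe-accuracy drop caused by ablating each one alone."""
    baseline = probe_accuracy(frozenset())  # accuracy with no head ablated
    drops = [(h, baseline - probe_accuracy(frozenset({h}))) for h in heads]
    # Largest drop first: these are the heads the probe depends on most.
    return sorted(drops, key=lambda item: item[1], reverse=True)


def toy_probe_accuracy(ablated: FrozenSet[Head]) -> float:
    # Hypothetical stand-in for evaluating a model on synthetic KV-retrieval
    # prompts with the given heads ablated. The contributions are made up.
    contributions = {(0, 1): 0.5, (2, 3): 0.25}
    return 1.0 - sum(contributions.get(h, 0.0) for h in ablated)


heads = [(0, 0), (0, 1), (2, 3)]
ranking = rank_heads_by_ablation(heads, toy_probe_accuracy)
retained = [h for h, _ in ranking[:2]]  # keep the two highest-ranked heads
print(retained)
```

Passing the probe as a callable keeps the ranking logic independent of the task, so the same function could in principle be reused with a different retrieval probe.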

The authors explicitly flag the possibility that tasks requiring different retrieval types (e.g., multi-hop retrieval) may depend on components not captured by the KV-retrieval probe. This makes it important to assess whether the probe is adequate for such tasks and, if it is not, to extend it.
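One way to frame the adequacy check is as a set comparison: take the heads the KV-retrieval probe would retain, and ask which heads matter under a second probe but fall outside that set. The sketch below is only an illustration of this framing. The function heads_missed_by_kv_probe, the top-k retention rule, the drop threshold, and every number in the toy dictionaries are assumptions introduced here, not results or procedures from the paper.

```python
from typing import Dict, Set, Tuple

Head = Tuple[int, int]  # (layer index, head index); hypothetical indexing scheme


def heads_missed_by_kv_probe(
    kv_drops: Dict[Head, float],
    other_drops: Dict[Head, float],
    k: int,
    threshold: float,
) -> Set[Head]:
    """Heads that matter for another retrieval task but are not in the KV top-k."""
    # Heads the KV-retrieval probe would retain (assumed rule: top-k by drop).
    kv_retained = set(sorted(kv_drops, key=kv_drops.get, reverse=True)[:k])
    # Heads whose ablation hurts the other task by at least `threshold`.
    needed = {h for h, drop in other_drops.items() if drop >= threshold}
    return needed - kv_retained


# Made-up ablation drops for illustration only.
kv_drops = {(0, 1): 0.5, (2, 3): 0.25, (4, 0): 0.0}
multihop_drops = {(0, 1): 0.4, (2, 3): 0.05, (4, 0): 0.3}

missed = heads_missed_by_kv_probe(kv_drops, multihop_drops, k=2, threshold=0.1)
print(missed)
```

A non-empty result in a real experiment would point to components the KV-retrieval probe does not identify; an empty result would suggest the probe's selection already covers the second task at the chosen threshold.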

References

Our study leaves several open questions about when retrieval-aware distillation transfers cleanly. Second, head selection relies on the synthetic KV-retrieval probe of \citet{gather_and_aggregate}. While it captures many benchmarks, other retrieval settings (e.g., multi-hop retrieval) may rely on components not identified by this probe.

Retrieval-Aware Distillation for Transformer-SSM Hybrids (2602.11374 - Bick et al., 11 Feb 2026) in Section: Conclusion and Future Work