Generality of the KV-retrieval probe for head selection
Investigate whether head selection based on the synthetic key–value (KV) retrieval ablation probe identifies all components required by other retrieval settings, such as multi-hop retrieval, and, if it does not, determine which additional components those settings need.
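To make the contrast between the two settings concrete, here is a minimal Python sketch of how single-hop and multi-hop synthetic retrieval prompts might be generated. The prompt format (space-separated `key:value` pairs followed by ` | query:`) and the names `make_chain_probe` and `solve` are hypothetical assumptions for illustration, not the format of the probe used in the cited work. With `hops=1` the generator is ordinary single-hop KV retrieval; with `hops>1` each retrieved value is the key for the next lookup.

```python
import random
from typing import Tuple


def make_chain_probe(n_pairs: int, hops: int, rng: random.Random) -> Tuple[str, str]:
    """Build a synthetic retrieval prompt whose answer takes `hops` chained lookups.

    hops == 1 is single-hop key-value retrieval; hops > 1 makes each retrieved
    value the key for the next lookup (multi-hop retrieval).
    Hypothetical prompt format: "k1:v1 k2:v2 ... | query:".
    """
    if not 1 <= hops <= n_pairs:
        raise ValueError("need 1 <= hops <= n_pairs")
    # Distinct 4-digit tokens, so no key or value is ever ambiguous.
    tokens = rng.sample(range(1000, 10000), 2 * n_pairs)
    chain = tokens[: hops + 1]                # query -> ... -> answer
    pairs = list(zip(chain[:-1], chain[1:]))  # the `hops` links of the chain
    rest = tokens[hops + 1:]
    n_distract = n_pairs - hops               # filler pairs disjoint from the chain
    pairs += list(zip(rest[:n_distract], rest[n_distract:2 * n_distract]))
    rng.shuffle(pairs)                        # chain links appear in random order
    context = " ".join(f"{k}:{v}" for k, v in pairs)
    return f"{context} | {chain[0]}:", str(chain[-1])


def solve(prompt: str, hops: int) -> str:
    """Reference solver: follow `hops` lookups through the in-context table."""
    context, query = prompt.split(" | ")
    table = dict(pair.split(":") for pair in context.split())
    key = query.rstrip(":")
    for _ in range(hops):
        key = table[key]
    return key


rng = random.Random(0)
kv_prompt, kv_answer = make_chain_probe(n_pairs=8, hops=1, rng=rng)
mh_prompt, mh_answer = make_chain_probe(n_pairs=8, hops=3, rng=rng)
```

A single lookup on the multi-hop prompt returns an intermediate token rather than the answer, so a model (or a retained head subset) that scores well on `hops=1` prompts but degrades as `hops` grows would point to the kind of gap this question asks about.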
References
Our study leaves several open questions about when retrieval-aware distillation transfers cleanly. [...] Second, head selection relies on the synthetic KV-retrieval probe of \citet{gather_and_aggregate}. While it captures many benchmarks, other retrieval settings (e.g., multi-hop retrieval) may rely on components not identified by this probe.
— Retrieval-Aware Distillation for Transformer-SSM Hybrids
(2602.11374 - Bick et al., 11 Feb 2026) in Section: Conclusion and Future Work
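One way to pose this operationally is to run the same ablation-based head selection under both probes and take the set difference. The Python sketch below assumes single-head ablation with an accuracy-drop threshold; the cited probe may rank or threshold heads differently. `EvalFn` is a hypothetical hook standing in for "run the model on a probe with these heads masked and return accuracy", and `toy_eval` is a synthetic stand-in so the sketch runs: its penalties are invented for illustration, not measurements.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Set, Tuple

Head = Tuple[int, int]                       # (layer, head) index, hypothetical addressing
EvalFn = Callable[[FrozenSet[Head]], float]  # probe accuracy with the given heads ablated


def select_heads(heads: Iterable[Head], evaluate: EvalFn, threshold: float) -> Set[Head]:
    """Keep every head whose single-head ablation lowers probe accuracy by >= threshold."""
    baseline = evaluate(frozenset())
    return {h for h in heads if baseline - evaluate(frozenset({h})) >= threshold}


def missing_components(heads: Iterable[Head], kv_eval: EvalFn,
                       multihop_eval: EvalFn, threshold: float) -> Set[Head]:
    """Heads the multi-hop probe flags as necessary that the KV probe does not."""
    heads = list(heads)  # iterated twice below
    return (select_heads(heads, multihop_eval, threshold)
            - select_heads(heads, kv_eval, threshold))


def toy_eval(penalty: Dict[Head, float]) -> EvalFn:
    """Synthetic stand-in for a real model: each ablated head subtracts a fixed penalty."""
    def evaluate(ablated: FrozenSet[Head]) -> float:
        return max(0.0, 1.0 - sum(penalty.get(h, 0.0) for h in ablated))
    return evaluate


heads = [(layer, h) for layer in range(4) for h in range(4)]
kv_eval = toy_eval({(2, 1): 0.6, (3, 0): 0.3})
multihop_eval = toy_eval({(2, 1): 0.6, (3, 0): 0.3, (1, 2): 0.4})
extra = missing_components(heads, kv_eval, multihop_eval, threshold=0.1)  # {(1, 2)}
```

Single-head ablation misses heads that are individually redundant (for example, two heads that back each other up); group or greedy ablation would catch those at a higher evaluation cost.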