Generalization of NanoVDR beyond visual document retrieval with text-only queries
Determine whether the NanoVDR asymmetric cross-modal distillation framework generalizes to retrieval settings other than the one it was evaluated on: first-stage ranking for visual document retrieval with text-only queries.
References
We evaluate exclusively on visual document retrieval (first-stage ranking) with text-only queries; whether the framework generalizes to other retrieval settings is left to future investigation.
— NanoVDR: Distilling a 2B Vision-Language Retriever into a 70M Text-Only Encoder for Visual Document Retrieval
(2603.12824 - Liu et al., 13 Mar 2026) in Limitations