Contextual reasoning over visual histories for image retrieval
Develop robust algorithms for corpus-level contextual reasoning over a user's visual history that accurately retrieve the set of target images specified by a natural-language query, as posed by the DeepImageSearch task and its DISBench benchmark. Answering such queries requires multi-step exploration of the history and the discovery of associations across events.
References
Extensive experiments demonstrate that this task poses significant challenges for state-of-the-art models, confirming that contextual reasoning over visual histories remains an open problem.
— DeepImageSearch: Benchmarking Multimodal Agents for Context-Aware Image Retrieval in Visual Histories
(2602.10809 - Deng et al., 11 Feb 2026) in Section 6. Conclusion