Temporal Encoding for VLM-Based View Selection
Develop an effective representation and encoding strategy for the continuously expanding stream of egocentric visual observations, so that pre-trained vision-language models can perform robust temporal reasoning when selecting future viewpoints for mapless, open-vocabulary visual navigation.
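To make the design space concrete, one common baseline is to compress the growing observation stream into a bounded set of timestamped keyframes and serialize them, in temporal order, into a multimodal prompt. The sketch below is a minimal Python illustration under that assumption; the FrameMemory class, the embed/cosine placeholders, and the chat-style prompt layout are hypothetical and are not the encoding proposed by ImagineNav++.

```python
# Hypothetical sketch: a bounded, timestamped keyframe memory that turns an
# ever-growing egocentric observation stream into a fixed-size VLM prompt.
# All names and the prompt format below are illustrative assumptions.

from dataclasses import dataclass, field
from typing import List
import numpy as np


@dataclass
class Keyframe:
    t: float                 # timestamp (seconds since episode start)
    image: np.ndarray        # H x W x 3 RGB observation
    feat: np.ndarray         # compact feature used only for redundancy checks


def embed(image: np.ndarray) -> np.ndarray:
    """Placeholder feature: downsampled, flattened pixels (stand-in for a CLIP-style encoder)."""
    small = image[::32, ::32].astype(np.float32).ravel()
    return small / (np.linalg.norm(small) + 1e-8)


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))


@dataclass
class FrameMemory:
    """Keeps at most `capacity` keyframes, skipping near-duplicate observations."""
    capacity: int = 16
    sim_threshold: float = 0.95
    frames: List[Keyframe] = field(default_factory=list)

    def add(self, t: float, image: np.ndarray) -> None:
        feat = embed(image)
        # Skip frames nearly identical to the most recent keyframe.
        if self.frames and cosine(feat, self.frames[-1].feat) > self.sim_threshold:
            return
        self.frames.append(Keyframe(t, image, feat))
        if len(self.frames) > self.capacity:
            # Thin older frames uniformly, always keeping the first and last,
            # so the memory remains a coarse summary of the whole trajectory.
            keep = np.linspace(0, len(self.frames) - 1, self.capacity).round().astype(int)
            self.frames = [self.frames[i] for i in sorted(set(keep))]

    def to_prompt(self, goal: str) -> list:
        """Interleave timestamp tags and images into a chat-style multimodal message."""
        content = [{"type": "text",
                    "text": f"Goal: {goal}. Observations in temporal order:"}]
        for kf in self.frames:
            content.append({"type": "text", "text": f"[t={kf.t:.1f}s]"})
            content.append({"type": "image", "image": kf.image})
        content.append({"type": "text",
                        "text": "Which candidate viewpoint should the robot move to next, and why?"})
        return content
```

A scheme like this keeps the prompt length constant regardless of episode duration, at the cost of discarding fine-grained temporal detail; how to choose which observations to keep, and how to tag them so the VLM can actually reason over time, is exactly the open question stated above.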
References
A major open challenge that arises in the context of VLM-based view selection is how to effectively encode the continuously expanding stream of observations to endow pre-trained VLMs with temporal reasoning capabilities.
— ImagineNav++: Prompting Vision-Language Models as Embodied Navigator through Scene Imagination
(Wang et al., arXiv:2512.17435, 19 Dec 2025), Section 1, Introduction