Spatiotemporally Grounded Multimodal History for Long-Horizon UAV VLN
Develop a unified method that elevates multimodal historical information (specifically, historical visual observations and flight trajectory history) from static memory into a spatiotemporally grounded context that is tightly coupled with natural language instructions and navigation decisions in long-horizon unmanned aerial vehicle (UAV) vision-and-language navigation (VLN).
References
Overall, how to elevate multimodal historical information from static memory to a spatiotemporally grounded context that is tightly coupled with language and navigation remains a key open problem in long-horizon UAV VLN.
— LongFly: Long-Horizon UAV Vision-and-Language Navigation with Spatiotemporal Context Integration
(2512.22010 - Jiang et al., 26 Dec 2025) in Section 2.2 (Long-horizon vision-and-language navigation for UAVs)