Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
131 tokens/sec
GPT-4o
10 tokens/sec
Gemini 2.5 Pro Pro
47 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Robust Visual Localization via Semantic-Guided Multi-Scale Transformer (2506.08526v1)

Published 10 Jun 2025 in cs.CV

Abstract: Visual localization remains challenging in dynamic environments where fluctuating lighting, adverse weather, and moving objects disrupt appearance cues. Despite advances in feature representation, current absolute pose regression methods struggle to maintain consistency under varying conditions. To address this challenge, we propose a framework that synergistically combines multi-scale feature learning with semantic scene understanding. Our approach employs a hierarchical Transformer with cross-scale attention to fuse geometric details and contextual cues, preserving spatial precision while adapting to environmental changes. We improve the performance of this architecture with semantic supervision via neural scene representation during training, guiding the network to learn view-invariant features that encode persistent structural information while suppressing complex environmental interference. Experiments on TartanAir demonstrate that our approach outperforms existing pose regression methods in challenging scenarios with dynamic objects, illumination changes, and occlusions. Our findings show that integrating multi-scale processing with semantic guidance offers a promising strategy for robust visual localization in real-world dynamic environments.

Summary

We haven't generated a summary for this paper yet.