- The paper presents ToDayGAN, a tailored GAN that translates nighttime images into daytime-like representations to mitigate domain shifts in visual localization.
- The method adapts ComboGAN with multiple discriminators that separately judge color, luminance, and gradient information, and pairs the translated images with DenseVLAD descriptors for retrieval.
- Quantitative results show a 250% performance improvement on the RobotCar Seasons dataset, highlighting practical benefits for autonomous driving.
Overview of "Night-to-Day Image Translation for Retrieval-based Localization"
The paper "Night-to-Day Image Translation for Retrieval-based Localization" presents a novel methodology aimed at improving visual localization systems, particularly for scenarios involving drastic differences in illumination conditions, such as from nighttime to daytime images. Localization within robotics relies heavily on determining a robot's position and orientation through visual cues, often employing image retrieval techniques that compare a query image with a database of geotagged reference images. Despite advancements in neural models, such tasks present challenges, especially when query and reference images differ significantly in lighting conditions.
Core Contribution: ToDayGAN
The authors introduce ToDayGAN, a modified image-translation model that converts nighttime images into daytime-like representations, enabling more reliable localization by exploiting their similarity to daytime reference images. The primary goal of ToDayGAN is to bridge the domain shift between lighting conditions: nighttime driving images are translated into synthetic daytime views so they can be matched directly against a daytime reference database for retrieval-based localization.
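At a high level, the retrieval pipeline is: translate the nighttime query into a day-like image, compute a global descriptor for it, and return the pose of the most similar daytime reference image. The following Python sketch illustrates this flow under stated assumptions; `generator` and `describe` are hypothetical placeholders for the trained translator and a DenseVLAD-style descriptor, and the nearest-neighbor step is deliberately simplified rather than taken from the authors' code.

```python
import numpy as np

def localize_night_query(night_img, generator, describe, ref_descriptors, ref_poses):
    """Return the pose of the daytime reference most similar to the translated
    nighttime query. Illustrative sketch only; `generator` and `describe` are
    placeholders, not the paper's implementation."""
    fake_day = generator(night_img)        # night -> day-like translation (ToDayGAN generator)
    q = describe(fake_day)                 # global descriptor of the translated query
    q = q / np.linalg.norm(q)              # L2-normalize, as is standard for VLAD-style descriptors
    sims = ref_descriptors @ q             # cosine similarity to L2-normalized daytime references
    best = int(np.argmax(sims))            # nearest neighbor in descriptor space
    return ref_poses[best]                 # the query inherits the retrieved image's pose
```

In practice the reference descriptors would be precomputed offline for the geotagged daytime database, so localization at query time reduces to one translation, one descriptor computation, and one nearest-neighbor search.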
Key Methodological Insights
- Image-Translation Model: ToDayGAN builds on the ComboGAN architecture, modified for translation between the night and day domains. The model employs multiple discriminators, each judging a distinct image representation (color, luminance, and gradients), which makes the translated images better suited to comparison with the DenseVLAD descriptors used for image matching (see the discriminator-view sketch after this list).
- DenseVLAD Integration: The approach uses the DenseVLAD descriptor, known for its robustness, to represent both translated queries and daytime reference images, which is particularly valuable under strongly contrasting illumination.
- Quantitative Results: The approach yields a roughly 250% improvement in localization performance over prior state-of-the-art methods on the RobotCar Seasons dataset, reflected in markedly more accurate pose estimates for nighttime queries once they are translated into daytime representations.
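To make the multi-discriminator idea concrete, the sketch below shows one way to derive three image views, blurred color, luminance, and Sobel gradients, that separate discriminators could score. This is an assumed illustration, not the paper's code; kernel sizes and weights are arbitrary illustrative choices.

```python
import torch
import torch.nn.functional as F

def discriminator_views(img):
    """Split an RGB batch (B, 3, H, W) into three views, each intended for its
    own discriminator: blurred color, luminance, and image gradients.
    Illustrative only; parameters do not come from the paper."""
    # Color view: lightly blurred RGB, de-emphasizing fine texture.
    blur_k = torch.full((3, 1, 5, 5), 1.0 / 25, device=img.device)
    color = F.conv2d(img, blur_k, padding=2, groups=3)

    # Luminance view: standard RGB-to-gray weighting.
    w = torch.tensor([0.299, 0.587, 0.114], device=img.device).view(1, 3, 1, 1)
    gray = (img * w).sum(dim=1, keepdim=True)

    # Gradient view: horizontal and vertical Sobel responses on the luminance.
    sx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=img.device).view(1, 1, 3, 3)
    sy = sx.transpose(2, 3)
    grads = torch.cat([F.conv2d(gray, sx, padding=1),
                       F.conv2d(gray, sy, padding=1)], dim=1)

    return color, gray, grads  # each view is scored by a separate discriminator
```

Splitting the adversarial signal this way lets each discriminator specialize in one property of realistic daytime images, which is the intuition behind the color/luminance/gradient decomposition described above.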
Implications and Future Directions
The paper highlights the potential of generative adversarial networks for practical applications such as autonomous driving, where localizing a vehicle against previously recorded visual data is crucial. Integrating image-to-image translation into the localization pipeline addresses the domain-adaptation challenge directly and suggests similar strategies for other domain shifts, such as weather variation.
The use of multiple discriminators, each focused on an independent image characteristic, suggests that future work could tune discriminators to domain-specific features or extend the approach to other environmental conditions. The effectiveness of combining translation with a robust descriptor such as DenseVLAD demonstrates the value of hybrid pipelines for computer vision tasks involving significant environmental variation.
In summary, the paper delivers a methodologically sound and empirically validated contribution to the field of visual localization. By addressing the illumination variance challenge inherent in image matching tasks through ToDayGAN, the work sets a precedent for further exploration of image translation models in practical, real-world applications across robotics and autonomous systems.