Two-stage Discriminative Re-ranking for Large-scale Landmark Retrieval

Published 25 Mar 2020 in cs.CV | (2003.11211v1)

Abstract: We propose an efficient pipeline for large-scale landmark image retrieval that addresses the diversity of the dataset through two-stage discriminative re-ranking. Our approach is based on embedding the images in a feature space using a convolutional neural network trained with a cosine softmax loss. Due to the variance of the images, which include extreme viewpoint changes such as having to retrieve images of the exterior of a landmark from images of the interior, this is very challenging for approaches based exclusively on visual similarity. Our proposed re-ranking approach improves the results in two steps: in the sort-step, we use $k$-nearest neighbor search with soft-voting to sort the retrieved results based on their label similarity to the query images, and in the insert-step, we add additional samples from the dataset that were not retrieved by image similarity. This approach allows us to overcome the low visual diversity in retrieved images. In-depth experimental results show that the proposed approach significantly outperforms existing approaches on the challenging Google Landmarks Datasets. Using our methods, we achieved 1st place in the Google Landmark Retrieval 2019 challenge and 3rd place in the Google Landmark Recognition 2019 challenge on Kaggle. Our code is publicly available here: https://github.com/lyakaap/Landmark2019-1st-and-3rd-Place-Solution

Summary

The paper presents a novel two-stage re-ranking method that integrates label similarity with CNN embeddings to enhance landmark retrieval accuracy.
It uses a k-nearest neighbor search with soft voting in the sort-step, followed by an insert-step that adds relevant samples missed by visual similarity.
The approach outperforms existing methods on the Google Landmarks Datasets, taking 1st place in the Google Landmark Retrieval 2019 challenge and 3rd place in the Google Landmark Recognition 2019 challenge, and counteracts the low visual diversity of purely similarity-based results.

In image retrieval, and particularly for landmark images, the challenge is to retrieve all relevant images of a landmark from a large dataset even when they look very different from the query. The paper "Two-stage Discriminative Re-ranking for Large-scale Landmark Retrieval" by Yokoo et al. addresses this with a two-stage discriminative re-ranking approach. Images are first embedded into a feature space by a convolutional neural network (CNN) trained with a cosine softmax loss; the re-ranking stage then uses landmark label information to overcome the limitations of relying solely on visual similarity.
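The paper describes the embedding network only as trained with a cosine softmax loss. As a minimal NumPy sketch of that family of losses, assuming a CosFace-style additive margin (the function name and the `scale` and `margin` values are hypothetical placeholders, not the paper's settings), the features and per-class weight vectors are L2-normalized so that the logits become scaled cosine similarities:

```python
import numpy as np


def cosine_softmax_loss(embeddings, weights, labels, scale=30.0, margin=0.0):
    """Cross-entropy over scaled cosine similarities (CosFace-style sketch).

    embeddings: (N, D) image features from the CNN backbone
    weights:    (C, D) one learnable prototype vector per landmark class
    labels:     (N,) integer class ids
    scale, margin: hypothetical hyperparameters; margin=0 gives a plain
                   normalized (cosine) softmax.
    """
    # L2-normalize features and class prototypes so dot products are cosines.
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = x @ w.T  # (N, C) cosine similarities

    # Subtract the additive margin from the target-class cosine only.
    rows = np.arange(len(labels))
    cos_m = cos.copy()
    cos_m[rows, labels] -= margin
    logits = scale * cos_m

    # Numerically stable log-softmax, then mean negative log-likelihood.
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[rows, labels].mean()
```

Because only the angle between a feature and a class prototype matters, nearest-neighbor search with cosine similarity on the learned embeddings is consistent with the training objective.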

Methodology

The proposed re-ranking comprises two steps: the sort-step and the insert-step. In the sort-step, a k-nearest neighbor (k-NN) search with soft voting predicts a landmark label for the query and for each retrieved image, and the retrieved results are re-ordered so that those whose label matches the query's come first. In the insert-step, additional samples from the dataset that share the query's label but were not retrieved by image similarity are added to the results. Together, the two steps counteract the low visual diversity of results obtained from visual similarity alone.
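The two steps can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `knn_soft_vote` and `two_stage_rerank`, the neighbor count `k`, the use of summed cosine similarity as the voting weight, and the placement of inserted samples (after the positives, ordered by voting confidence) are all hypothetical choices.

```python
from collections import defaultdict

import numpy as np


def l2_normalize(x):
    """Row-wise L2 normalization so that dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def knn_soft_vote(feats, train_feats, train_labels, k=5):
    """Predict a landmark label for each row of `feats` by k-NN soft voting.

    Each of the k most similar training images votes for its own label with
    a weight equal to its cosine similarity; the label with the largest
    summed weight wins, and that sum is returned as a confidence score.
    """
    sims = l2_normalize(feats) @ l2_normalize(train_feats).T
    labels, confs = [], []
    for row in sims:
        votes = defaultdict(float)
        for j in np.argsort(-row)[:k]:
            votes[train_labels[j]] += row[j]
        best = max(votes, key=votes.get)
        labels.append(best)
        confs.append(votes[best])
    return np.array(labels), np.array(confs)


def two_stage_rerank(retrieved, query_label, index_labels, index_conf):
    """Re-rank one query's results using predicted labels (hypothetical helper).

    retrieved:    index ids ranked by visual (embedding) similarity
    query_label:  label predicted for the query by knn_soft_vote
    index_labels: label predicted for every index image
    index_conf:   soft-voting confidence for every index image
    """
    # Sort-step: stable partition, so the visual-similarity order is kept
    # within the positive (same label) and negative groups.
    positives = [i for i in retrieved if index_labels[i] == query_label]
    negatives = [i for i in retrieved if index_labels[i] != query_label]

    # Insert-step: add index images that share the query's predicted label
    # but were not retrieved, most confident first (assumed ordering).
    seen = set(retrieved)
    inserted = [int(i) for i in np.argsort(-index_conf)
                if index_labels[i] == query_label and int(i) not in seen]

    return positives + inserted + negatives
```

For example, with predicted index labels `[7, 3, 7, 7, 3]`, confidences `[0.9, 0.8, 0.5, 0.95, 0.7]`, a query labeled `7` and visual-similarity results `[1, 0, 4]`, `two_stage_rerank` returns `[0, 3, 2, 1, 4]`: image 0 moves up, images 3 and 2 are inserted, and images 1 and 4 are pushed down. In practice the merged list would then be truncated to the number of results required per query.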

Experimental Results

The paper reports significant improvements over existing methods on the Google Landmarks Dataset (GLD). The method placed 1st in the Google Landmark Retrieval 2019 challenge and 3rd in the Google Landmark Recognition 2019 challenge. In the experiments, the re-ranking strategy outperforms the baselines and other re-ranking methods such as spatial verification and diffusion.

Discussion

The main strength of this approach is its handling of diverse, visually dissimilar images of the same landmark, such as matching interior shots of a landmark to exterior shots. By incorporating label information into the re-ranking process, the method avoids the pitfalls of purely visual retrieval. The paper also proposes an automated data cleaning strategy that uses spatial verification to improve the quality of the training set, at a notable computational cost.
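A cleaning rule based on spatial verification might look like the sketch below. It assumes that pairwise RANSAC inlier counts between training images have already been computed by a local-feature matching step (the expensive part, not shown); the function name and the thresholds `min_inliers` and `min_verified` are hypothetical. An image is kept only if it is geometrically verified against enough other images labeled with the same landmark.

```python
import numpy as np


def clean_by_spatial_verification(labels, inliers, min_inliers=30, min_verified=1):
    """Return a boolean keep-mask over training images (hypothetical helper).

    labels:  (N,) landmark id of each training image
    inliers: (N, N) symmetric matrix of RANSAC inlier counts between image
             pairs, assumed to come from a separate local-feature matcher
    """
    labels = np.asarray(labels)
    # Only pairs with the same landmark label can vouch for each other.
    same = labels[:, None] == labels[None, :]
    np.fill_diagonal(same, False)  # an image cannot verify itself
    verified = (np.asarray(inliers) >= min_inliers) & same
    # Keep images with enough geometrically verified same-label partners.
    return verified.sum(axis=1) >= min_verified
```

Since only same-label pairs are counted, a real pipeline would only need to run the costly matching within each landmark class rather than over all N² pairs.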

Implications and Future Directions

The implications of this research extend to other large-scale image retrieval systems in computer vision. As datasets become more diverse and more representative of real-world scenarios, label-aware re-ranking of this kind could improve retrieval accuracy in practical systems. Future work could integrate additional contextual information or investigate other re-ranking methods to further improve retrieval precision.

In conclusion, this paper presents a robust method for improving landmark image retrieval, built on a two-stage re-ranking mechanism that advances the state of the art in this domain. Its integration of label information into a discriminative re-ranking approach outlines a promising direction for future research in efficient image retrieval.
