Google Landmarks Dataset v2 -- A Large-Scale Benchmark for Instance-Level Recognition and Retrieval (2004.01804v2)

Published 3 Apr 2020 in cs.CV

Abstract: While image retrieval and instance recognition techniques are progressing rapidly, there is a need for challenging datasets to accurately measure their performance -- while posing novel challenges that are relevant for practical applications. We introduce the Google Landmarks Dataset v2 (GLDv2), a new benchmark for large-scale, fine-grained instance recognition and image retrieval in the domain of human-made and natural landmarks. GLDv2 is the largest such dataset to date by a large margin, including over 5M images and 200k distinct instance labels. Its test set consists of 118k images with ground truth annotations for both the retrieval and recognition tasks. The ground truth construction involved over 800 hours of human annotator work. Our new dataset has several challenging properties inspired by real world applications that previous datasets did not consider: An extremely long-tailed class distribution, a large fraction of out-of-domain test photos and large intra-class variability. The dataset is sourced from Wikimedia Commons, the world's largest crowdsourced collection of landmark photos. We provide baseline results for both recognition and retrieval tasks based on state-of-the-art methods as well as competitive results from a public challenge. We further demonstrate the suitability of the dataset for transfer learning by showing that image embeddings trained on it achieve competitive retrieval performance on independent datasets. The dataset images, ground-truth and metric scoring code are available at https://github.com/cvdfoundation/google-landmark.

Summary

The paper introduces GLDv2 as the largest instance-level benchmark, featuring over 5 million images and more than 200,000 landmark labels.
It describes how the dataset was built: images come from Wikimedia Commons, and constructing the test-set ground truth took over 800 hours of human annotation.
It reports baseline results for recognition and retrieval, results from a public challenge, and evidence that embeddings trained on GLDv2 transfer to independent datasets.

Analysis of the Google Landmarks Dataset v2

The paper "Google Landmarks Dataset v2: A Large-Scale Benchmark for Instance-Level Recognition and Retrieval" presents the Google Landmarks Dataset v2 (GLDv2), a benchmark for image retrieval and instance recognition on human-made and natural landmarks. Its main contribution is scale and difficulty: it exceeds earlier landmark datasets, such as the original Oxford and Paris datasets, in both the number of images and the number of landmark classes.

Dataset Composition

GLDv2 comprises over 5 million images and more than 200,000 distinct instance labels, which the authors describe as the largest dataset of its kind by a large margin. It is partitioned into three subsets: a training set of 4.1 million images, an index set of 762,000 images, and a test (query) set of 118,000 images. The splits are designed to mimic real-world applications. The class distribution is extremely long-tailed, so methods must cope with many classes that have only a few training images.
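To make "long-tailed" concrete, the following is a minimal sketch of how one might summarize the class distribution of a labeled training set. The function name `long_tail_summary`, its parameters, and the toy labels are illustrative assumptions, not part of the paper or its released code.

```python
from collections import Counter


def long_tail_summary(labels, max_images=10, head_fraction=0.1):
    """Summarize how skewed a class distribution is.

    labels: one class label per training image (hypothetical input).
    max_images: a class counts as "small" if it has at most this many images.
    head_fraction: share of the most frequent classes treated as the "head".
    """
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must not be empty")

    # Class sizes, largest first.
    sizes = sorted(counts.values(), reverse=True)
    num_classes = len(sizes)
    num_small = sum(1 for size in sizes if size <= max_images)
    head_size = max(1, int(num_classes * head_fraction))

    return {
        "num_classes": num_classes,
        # Fraction of classes with very few images (the tail).
        "fraction_small_classes": num_small / num_classes,
        # Fraction of all images that belong to the head classes.
        "head_image_share": sum(sizes[:head_size]) / sum(sizes),
    }


if __name__ == "__main__":
    toy_labels = ["a"] * 8 + ["b"] + ["c"]
    print(long_tail_summary(toy_labels, max_images=1, head_fraction=0.34))
```

A high `fraction_small_classes` together with a high `head_image_share` is the signature of a long tail: most classes are tiny while a few classes hold most of the images.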

Methodological Framework

The paper describes how the dataset was constructed. Images were sourced from Wikimedia Commons, which the authors call the world's largest crowdsourced collection of landmark photos. Building the ground truth for the test set took over 800 hours of human annotator work. Beyond serving as a benchmark, the dataset is also useful for transfer learning: image embeddings trained on GLDv2 achieve competitive retrieval performance on independent datasets.

Challenges and Evaluation

GLDv2 poses three challenges that reflect real-world use: an extremely long-tailed class distribution, a large fraction of out-of-domain test photos, and large intra-class variability, with images depicting landmarks from different viewpoints, under various lighting conditions, and across different seasons. Retrieval and recognition are evaluated separately, and the recognition evaluation penalizes confident predictions on out-of-domain queries, so a system must learn when not to answer.
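The two evaluation settings can be sketched in a few lines. The code below assumes the commonly cited forms of the metrics: mean average precision over the top 100 results (mAP@100) for retrieval, and micro average precision (µAP) over all confidence-ranked predictions for recognition. The function names and input formats are illustrative assumptions, and details may differ from the official scoring code in the repository linked from the abstract.

```python
def average_precision_at_k(ranked_ids, relevant_ids, k=100):
    """AP@k for one query; ranked_ids is assumed to contain no duplicates."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked_ids[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / min(len(relevant), k)


def mean_average_precision_at_k(rankings, ground_truth, k=100):
    """Mean of AP@k over all queries that appear in ground_truth."""
    if not ground_truth:
        return 0.0
    total = sum(
        average_precision_at_k(rankings.get(query, []), relevant, k)
        for query, relevant in ground_truth.items()
    )
    return total / len(ground_truth)


def micro_average_precision(predictions, ground_truth):
    """Micro AP for recognition.

    predictions: query_id -> (label, confidence), at most one per query.
    ground_truth: query_id -> set of correct labels (empty = out-of-domain).
    """
    num_in_domain = sum(1 for labels in ground_truth.values() if labels)
    if num_in_domain == 0:
        return 0.0
    # Rank every prediction across all queries by descending confidence.
    ranked = sorted(predictions.items(), key=lambda kv: kv[1][1], reverse=True)
    correct = 0
    total = 0.0
    for rank, (query, (label, _conf)) in enumerate(ranked, start=1):
        if label in ground_truth.get(query, set()):
            correct += 1
            total += correct / rank
    return total / num_in_domain


if __name__ == "__main__":
    truth = {"q1": {"L1"}, "q2": set(), "q3": {"L3"}}  # q2 is out-of-domain
    preds = {"q1": ("L1", 0.9), "q2": ("L2", 0.8), "q3": ("L3", 0.7)}
    print(micro_average_precision(preds, truth))
```

In the toy example, the confident prediction on the out-of-domain query `q2` lowers the precision of every prediction ranked after it. Dropping that prediction raises the score, which is how a metric of this shape rewards abstaining on out-of-domain queries.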

Baseline Results and Public Challenge

The paper provides baseline results for both the recognition and retrieval tasks using state-of-the-art methods, giving future work a reference point. GLDv2 was also used in public challenges on Kaggle, and the paper reports competitive results from a challenge alongside its own baselines.

Implications and Future Directions

The implications of GLDv2 are substantial for both the practical deployment of image recognition systems and theoretical advancements in the field. By offering a dataset with notable scale and complexity, GLDv2 incentivizes the development of new algorithms to handle real-world data more robustly. Future directions may involve enhancing class representation further, improving transfer learning capabilities, and adapting the dataset for other instance-level recognition tasks.

In sum, the Google Landmarks Dataset v2 is an essential milestone in the landscape of visual recognition research, providing a rich resource for addressing the contemporary challenges of instance-level recognition and retrieval.

GitHub

GitHub - cvdfoundation/google-landmark: Dataset with 5 million images depicting human-made and natural landmarks spanning 200 thousand classes. (793 stars)

YouTube

Show All Videos