University-1652: A Multi-view Multi-source Benchmark for Drone-based Geo-localization (2002.12186v2)

Published 27 Feb 2020 in cs.CV

Abstract: We consider the problem of cross-view geo-localization. The primary challenge of this task is to learn the robust feature against large viewpoint changes. Existing benchmarks can help, but are limited in the number of viewpoints. Image pairs, containing two viewpoints, e.g., satellite and ground, are usually provided, which may compromise the feature learning. Besides phone cameras and satellites, in this paper, we argue that drones could serve as the third platform to deal with the geo-localization problem. In contrast to the traditional ground-view images, drone-view images meet fewer obstacles, e.g., trees, and could provide a comprehensive view when flying around the target place. To verify the effectiveness of the drone platform, we introduce a new multi-view multi-source benchmark for drone-based geo-localization, named University-1652. University-1652 contains data from three platforms, i.e., synthetic drones, satellites and ground cameras of 1,652 university buildings around the world. To our knowledge, University-1652 is the first drone-based geo-localization dataset and enables two new tasks, i.e., drone-view target localization and drone navigation. As the name implies, drone-view target localization intends to predict the location of the target place via drone-view images. On the other hand, given a satellite-view query image, drone navigation is to drive the drone to the area of interest in the query. We use this dataset to analyze a variety of off-the-shelf CNN features and propose a strong CNN baseline on this challenging dataset. The experiments show that University-1652 helps the model to learn the viewpoint-invariant features and also has good generalization ability in the real-world scenario.

Summary

The paper introduces a novel benchmark dataset combining drone, satellite, and ground imagery to overcome limitations of traditional two-view benchmarks.
It employs multi-branch CNNs with instance loss to extract invariant features, significantly improving drone-to-satellite image retrieval accuracy.
Experimental results demonstrate that multi-view inputs enhance geo-localization performance, offering practical benefits for navigation and other real-world applications.

The paper presents University-1652, a multi-view multi-source benchmark dataset for cross-view geo-localization. The dataset combines imagery from three platforms: drones, ground cameras, and satellites. Earlier benchmarks in this area typically provide image pairs from only two viewpoints, usually ground and satellite, which can limit a model's ability to learn features that hold up under large viewpoint changes. By adding drone imagery as a third platform, University-1652 addresses this limitation of two-viewpoint datasets.

Dataset Overview

University-1652 contains data for 1,652 university buildings around the world, captured from three distinct viewpoints: synthetic drone images, satellite images, and ground images. The drone views were generated from Google Earth 3D models by simulating drone flight paths around each building, yielding 54 synthetic images per location. These images cover variations in scale and viewing angle, which gives models a rich corpus for learning robust, viewpoint-invariant features.

Geo-localization Tasks

This dataset facilitates two primary geo-localization tasks:

Drone-view Target Localization (Drone → Satellite): This involves using drone-view images to predict and match target locations with corresponding satellite images.
Drone Navigation (Satellite → Drone): Given a satellite-view query image, this task retrieves the most relevant drone-view images of places the drone has passed over, so that the drone can be driven to the area of interest shown in the query.
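
Both tasks can be framed as image retrieval: a query feature from one platform is compared against a gallery of features from the other platform, and the gallery is ranked by similarity. The following is a minimal sketch of that ranking step and of a Recall@K score, assuming features have already been extracted and that cosine similarity is the comparison measure. The function names and the toy feature vectors are hypothetical and are not taken from the paper.

```python
import math

def l2_normalize(vec):
    """Scale a feature vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def rank_gallery(query, gallery):
    """Return gallery indices sorted by descending cosine similarity to the query."""
    q = l2_normalize(query)
    scores = []
    for idx, feat in enumerate(gallery):
        g = l2_normalize(feat)
        scores.append((sum(a * b for a, b in zip(q, g)), idx))
    # Highest similarity first; ties are broken by the lower gallery index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [idx for _, idx in scores]

def recall_at_k(queries, query_labels, gallery, gallery_labels, k):
    """Fraction of queries with at least one true match among the top-k results."""
    hits = 0
    for feat, label in zip(queries, query_labels):
        top = rank_gallery(feat, gallery)[:k]
        if any(gallery_labels[i] == label for i in top):
            hits += 1
    return hits / len(queries)

# Toy features (made up for illustration): one vector per image, labels are building IDs.
satellite_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
satellite_ids = [0, 1, 2]
drone_feats = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.2], [0.6, 0.0, 0.5]]
drone_ids = [0, 1, 2]

# Drone -> Satellite: drone images are the queries, satellite images the gallery.
r1 = recall_at_k(drone_feats, drone_ids, satellite_feats, satellite_ids, k=1)
r2 = recall_at_k(drone_feats, drone_ids, satellite_feats, satellite_ids, k=2)

# Satellite -> Drone: the roles are swapped.
nav_top = rank_gallery(satellite_feats[1], drone_feats)[0]
```

The same ranking function serves both directions; only the roles of query and gallery change.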

Methodology and Baseline Models

To analyze the utility of the University-1652 dataset, the authors propose several models employing convolutional neural networks (CNNs) to extract visual features, applying different loss functions like instance loss, contrastive loss, and triplet loss to optimize these models. Notably, the paper shows that instance loss, coupled with a multi-branch CNN architecture, is particularly effective for training models on this dataset. The CNNs, by sharing weights across certain branches, leverage inter-platform relationships to learn a shared feature space, enhancing the model's capability to generalize across different viewpoints and platforms.
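
To make the idea of a shared feature space concrete, here is a minimal sketch of how such a loss could be computed. It assumes that instance loss is a cross-entropy classification loss over building identities and that features from every platform are scored by the same classifier weights. The CNN branches are omitted, and the feature vectors, weights, and function names are hypothetical illustrations, not the authors' implementation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(feature, weights):
    """Apply a linear classifier with one weight row per building ID."""
    return [sum(w * x for w, x in zip(row, feature)) for row in weights]

def instance_loss(features_by_view, label, shared_weights):
    """Sum of cross-entropy losses over views, all scored by the SAME classifier."""
    loss = 0.0
    for feature in features_by_view.values():
        probs = softmax(classify(feature, shared_weights))
        loss += -math.log(probs[label])
    return loss

# Toy setup: 2 building IDs, 2-dimensional features (stand-ins for CNN outputs).
shared_weights = [[1.0, 0.0], [0.0, 1.0]]
features = {
    "satellite": [2.0, 0.0],  # would come from the satellite branch
    "drone": [1.5, 0.5],      # would come from the drone branch
}

loss_correct = instance_loss(features, label=0, shared_weights=shared_weights)
loss_wrong = instance_loss(features, label=1, shared_weights=shared_weights)
```

Because one classifier must separate buildings for every platform, features of the same building from different views are pushed toward the same region of the feature space under this assumption.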

Experimental Findings

Experiments on the dataset revealed several insights:

Drone-view queries significantly outperform ground-view queries in retrieving satellite images, establishing the drone’s vantage point as more conducive to bridging the visual gap between ground and aerial views.
Utilizing multiple drone-view queries leads to enhanced retrieval accuracy, suggesting the merit of multi-view input in geo-localization systems.
Comparison with off-the-shelf models pretrained on large generic datasets such as ImageNet showed that models trained directly on University-1652 perform better, which supports the dataset's value for learning viewpoint-invariant features.
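
The benefit of multiple drone-view queries can be illustrated with a small sketch. It assumes the views of one building are combined by averaging their feature vectors before retrieval, which is one simple way to fuse queries. The vectors below are toy values constructed for illustration, not experimental results, and the function names are hypothetical.

```python
import math

def mean_feature(features):
    """Element-wise mean of several feature vectors of equal length."""
    dim = len(features[0])
    return [sum(f[d] for f in features) / len(features) for d in range(dim)]

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def best_match(query, gallery):
    """Index of the gallery feature most similar to the query."""
    return max(range(len(gallery)), key=lambda i: cosine(query, gallery[i]))

# Toy satellite gallery: index 0 is the true building for the drone views below.
satellite_gallery = [[1.0, 0.0], [0.0, 1.0]]

# Three drone views of building 0; the first one is an ambiguous viewpoint.
drone_views = [[0.4, 0.6], [0.9, 0.1], [0.8, 0.2]]

single_view_match = best_match(drone_views[0], satellite_gallery)
fused_query = mean_feature(drone_views)
multi_view_match = best_match(fused_query, satellite_gallery)
```

In this toy case the ambiguous single view retrieves the wrong building, while the averaged query retrieves the correct one, because the less ambiguous views outweigh the outlier.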

Practical and Theoretical Implications

The introduction of drone imagery into geo-localization benchmarks opens new avenues for practical applications in agriculture, navigation, and event detection, where additional viewpoints provide context that helps precise localization. Theoretically, the dataset and the accompanying methodological insights advance the understanding of feature learning under large viewpoint changes. The dataset also broadens the training and evaluation resources available to geo-localization models, which may improve their adaptability and precision in real-world scenarios.

Future Directions

The research identifies areas for future exploration, particularly in leveraging the multi-view nature of the dataset for more advanced geo-localization tasks. Integrating real drone flights and refining synthetic data generation could further optimize model performance. Additionally, exploring the metadata associated with the dataset could yield new methodologies in enhancing geo-localization efficiency and accuracy.

In summary, University-1652 sets a new benchmark in the domain of cross-view geo-localization by providing a comprehensive dataset and robust tasks that collectively drive the development and evaluation of more adaptive and effective localization models.
