Object Detection in Aerial Images: A Large-Scale Benchmark and Challenges (2102.12219v2)

Published 24 Feb 2021 in cs.CV

Abstract: In the past decade, object detection has achieved significant progress in natural images but not in aerial images, due to the massive variations in the scale and orientation of objects caused by the bird's-eye view of aerial images. More importantly, the lack of large-scale benchmarks has become a major obstacle to the development of object detection in aerial images (ODAI). In this paper, we present a large-scale Dataset of Object deTection in Aerial images (DOTA) and comprehensive baselines for ODAI. The proposed DOTA dataset contains 1,793,658 object instances of 18 categories of oriented-bounding-box annotations collected from 11,268 aerial images. Based on this large-scale and well-annotated dataset, we build baselines covering 10 state-of-the-art algorithms with over 70 configurations, where the speed and accuracy performances of each model have been evaluated. Furthermore, we provide a code library for ODAI and build a website for evaluating different algorithms. Previous challenges run on DOTA have attracted more than 1300 teams worldwide. We believe that the expanded large-scale DOTA dataset, the extensive baselines, the code library and the challenges can facilitate the designs of robust algorithms and reproducible research on the problem of object detection in aerial images.

Citations (317)

Summary

The paper introduces DOTA, a massive dataset with 1,793,658 aerial object instances across 18 categories.
It employs oriented bounding boxes to precisely address challenges of variable scales and arbitrary object orientations.
The study establishes comprehensive baselines by evaluating 10 algorithms over 70 configurations to guide future research.

Overview of the "Object Detection in Aerial Images: A Large-Scale Benchmark and Challenges" Paper

This paper provides a comprehensive study of object detection in aerial images, a domain that has lagged behind object detection in natural images. It highlights the twin challenges of variable object scale and arbitrary orientation that arise from the bird's-eye view of aerial imagery. Crucially, it addresses one major hurdle, the lack of a large-scale benchmark, by introducing an expanded version of the Dataset of Object deTection in Aerial images (DOTA) to facilitate research in this area.

DOTA Dataset and Baselines

The DOTA dataset is curated to overcome the limitations of earlier aerial benchmarks. It contains 1,793,658 object instances across 18 categories, each annotated with an oriented bounding box (OBB), collected from 11,268 aerial images. This dataset is considerably larger than previous efforts, providing a rich foundation for training and evaluating object detection models.
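To put these figures in perspective, the short sketch below works out the average object density they imply. Only the two totals come from the paper; the variable names are illustrative, and the result is a plain mean that says nothing about how instances are actually distributed across images.

```python
# Totals quoted in the paper's abstract.
num_instances = 1_793_658
num_images = 11_268

# Plain arithmetic mean; individual images may contain far more or far fewer objects.
instances_per_image = num_instances / num_images

print(f"mean instances per image: {instances_per_image:.1f}")
```

The mean comes out at roughly 159 instances per image, which gives a sense of how densely annotated aerial scenes can be.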

The significance of the DOTA dataset is further reinforced by the establishment of comprehensive baselines. The authors evaluated 10 state-of-the-art algorithms in more than 70 configurations, measuring both the speed and the accuracy of each model to create a unified benchmark for future studies. They also released a code library and a dedicated evaluation website, encouraging consistent and reproducible evaluation.

Key Contributions and Dataset Characteristics

Dataset Scale and Scope: DOTA is presented as a large-scale public dataset for object detection in aerial imagery, covering a wide array of real-world scenarios with varying object densities, orientations, and scales. This makes it well suited to developing robust detection algorithms that can generalize across diverse conditions.
Challenges for Algorithm Development: The inclusion of OBBs distinguishes DOTA by allowing precise localization, which is crucial given the arbitrary orientations typical in aerial images. The authors also emphasize that conventional horizontal bounding box representations (as used in datasets like COCO) would be inadequate for capturing the nuances of aerial object detection.
Baseline Frameworks and Evaluations: The paper presents baseline algorithms that leverage this dataset. Notably, algorithms such as Faster R-CNN and RetinaNet were modified to handle the specific challenges of aerial imagery. The importance of geometric transformations, rotational augmentations, and multi-scale approaches was validated across these baselines.
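The difference between oriented and horizontal boxes, and the idea behind rotational augmentation, can be illustrated with a minimal sketch. It assumes an OBB is stored as four (x, y) corner points; the function names and the example object are hypothetical and are not taken from the paper's code library.

```python
import math


def rotate_corners(corners, angle_deg, center):
    """Rotate corner points about `center` by `angle_deg`.

    This is the geometric core of a rotation augmentation: the same
    transform applied to the image would be applied to each box.
    """
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    cx, cy = center
    rotated = []
    for x, y in corners:
        dx, dy = x - cx, y - cy
        rotated.append((cx + dx * cos_t - dy * sin_t,
                        cy + dx * sin_t + dy * cos_t))
    return rotated


def obb_to_hbb(corners):
    """Smallest axis-aligned box (x_min, y_min, x_max, y_max) enclosing the OBB."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), min(ys), max(xs), max(ys)


def polygon_area(corners):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(corners)
    for i in range(n):
        x1, y1 = corners[i]
        x2, y2 = corners[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Hypothetical elongated object (100 x 20 pixels), rotated by 45 degrees.
axis_aligned = [(0, 0), (100, 0), (100, 20), (0, 20)]
obb = rotate_corners(axis_aligned, 45, center=(50, 10))

x_min, y_min, x_max, y_max = obb_to_hbb(obb)
obb_area = polygon_area(obb)
hbb_area = (x_max - x_min) * (y_max - y_min)

print(f"OBB area: {obb_area:.0f}")
print(f"HBB area: {hbb_area:.0f}")
print(f"fraction of HBB covered by the object: {obb_area / hbb_area:.2f}")
```

For this example the oriented box covers 2,000 square pixels while the enclosing horizontal box covers 7,200, so only about 28% of the horizontal box belongs to the object. The rest is background or, in dense scenes, neighbouring objects, which is why horizontal boxes localize elongated, rotated objects poorly.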

Implications and Future Perspectives

The introduction of DOTA and its extensive evaluation protocols have significant implications:

Algorithm Robustness: Models benchmarked on DOTA are expected to handle variability in orientation and scale with greater efficacy, potentially benefiting other vision tasks requiring similar properties.
Advancing Dataset Usage: The high number of instances and diverse scenarios present in DOTA position it as a vital resource for pushing the boundaries of object detection.
Promoting Collaborative Research: By openly providing resources and running challenges that have attracted more than 1,300 teams worldwide, the authors establish a shared platform for future research.

The research extends beyond creating a dataset: together, the baselines, code library, evaluation website, and challenges form a shared framework for work on aerial and remote sensing imagery. This groundwork can catalyze further research into more general and adaptive detection methods, potentially influencing related areas of computer vision.
