iSAID: A Large-scale Dataset for Instance Segmentation in Aerial Images (1905.12886v2)

Published 30 May 2019 in cs.CV and cs.LG

Abstract: Existing Earth Vision datasets are suitable for either semantic segmentation or object detection. In this work, we introduce the first benchmark dataset for instance segmentation in aerial imagery that combines instance-level object detection and pixel-level segmentation tasks. In comparison to instance segmentation in natural scenes, aerial images present unique challenges, e.g., a huge number of instances per image, large object-scale variations and abundant tiny objects. Our large-scale and densely annotated Instance Segmentation in Aerial Images Dataset (iSAID) comes with 655,451 object instances for 15 categories across 2,806 high-resolution images. Such precise per-pixel annotations for each instance ensure accurate localization that is essential for detailed scene analysis. Compared to existing small-scale aerial image based instance segmentation datasets, iSAID contains 15$\times$ the number of object categories and 5$\times$ the number of instances. We benchmark our dataset using two popular instance segmentation approaches for natural images, namely Mask R-CNN and PANet. In our experiments we show that direct application of off-the-shelf Mask R-CNN and PANet on aerial images provides suboptimal instance segmentation results, thus requiring specialized solutions from the research community. The dataset is publicly available at: https://captain-whu.github.io/iSAID/index.html

Summary

The paper introduces a dataset with 655,451 annotated instances across 15 categories, surpassing prior aerial segmentation benchmarks.
It benchmarks established models like Mask R-CNN and PANet, revealing their limitations when applied to aerial images.
The dataset is intended to drive research into specialized algorithms that handle high instance density, large scale variation, and precise localization in aerial imagery.

An Overview of the iSAID Dataset for Instance Segmentation in Aerial Images

The paper "iSAID: A Large-scale Dataset for Instance Segmentation in Aerial Images" presents a comprehensive dataset specifically designed to address the unique challenges of instance segmentation in aerial imagery. Historically, datasets for Earth Vision, the domain encompassing satellite and aerial images, have primarily focused on either semantic segmentation or object detection. The iSAID dataset marks a significant step forward as it integrates instance-level object detection with pixel-level segmentation tasks, uniquely tailored for aerial images.

Overview and Dataset Composition

iSAID stands out for its scale: 655,451 object instances across 15 categories in 2,806 high-resolution images. Compared with existing small-scale aerial-image instance segmentation datasets, it offers 15 times the number of object categories and 5 times the number of instances. Each instance carries a precise per-pixel annotation, which supports the accurate localization needed for detailed scene analysis.
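To give a feel for the density these figures imply, the short sketch below derives average counts from the three totals quoted above. The variable and function names are illustrative, and the per-category mean is only a rough average: it says nothing about how instances are actually distributed across categories or images.

```python
# Totals quoted in the abstract; everything else here is derived arithmetic.
NUM_INSTANCES = 655_451
NUM_IMAGES = 2_806
NUM_CATEGORIES = 15


def mean_per(total: int, groups: int) -> float:
    """Return the arithmetic mean of `total` items spread over `groups`."""
    return total / groups


instances_per_image = mean_per(NUM_INSTANCES, NUM_IMAGES)
instances_per_category = mean_per(NUM_INSTANCES, NUM_CATEGORIES)

print(f"mean instances per image:    {instances_per_image:.1f}")
print(f"mean instances per category: {instances_per_category:.1f}")
```

On average, then, an image holds well over two hundred annotated instances, which is consistent with the "huge number of instances per image" the abstract describes.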

Methodology and Benchmarking

The authors benchmark iSAID using two established instance segmentation approaches originally designed for natural images: Mask R-CNN and PANet. Applied off the shelf to aerial images, both models yield suboptimal instance segmentation results, which highlights the need for specialized solutions tailored to the characteristics of aerial imagery.

The dataset exhibits several distinctive features that contribute to these challenges, including a large number of instances per image, significant object scale variation, and a high frequency of tiny objects. Such characteristics demand methodological advancements within the research community to adapt existing algorithms or develop novel techniques that cater specifically to the nuances of aerial images.
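These characteristics can be measured directly from the annotations. The sketch below assumes COCO-style annotation records, each a dict with `image_id` and `area` keys; that layout is an assumption here, not something this summary states. The size thresholds are hypothetical placeholders and should be replaced with whatever definition of small, medium, and large a given study adopts.

```python
from collections import Counter


def instances_per_image(annotations):
    """Count how many annotated instances each image contains."""
    return Counter(ann["image_id"] for ann in annotations)


def size_bucket(area, small_max, medium_max):
    """Assign an instance to a size bucket by its pixel area."""
    if area < small_max:
        return "small"
    if area < medium_max:
        return "medium"
    return "large"


def size_histogram(annotations, small_max=144, medium_max=1024):
    """Histogram of size buckets; the default thresholds are placeholders."""
    return Counter(
        size_bucket(ann["area"], small_max, medium_max) for ann in annotations
    )


# Toy records (hypothetical values, not taken from iSAID).
toy_annotations = [
    {"image_id": 1, "area": 50},
    {"image_id": 1, "area": 200},
    {"image_id": 1, "area": 5000},
    {"image_id": 2, "area": 90},
]

per_image = instances_per_image(toy_annotations)
histogram = size_histogram(toy_annotations)
print(dict(per_image))
print(dict(histogram))
```

Run over a real annotation file, the two counters would show how crowded individual images are and how heavily the size distribution leans toward tiny objects.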

Implications and Future Directions

The introduction of iSAID provides a significant impetus for advancements in the high-level interpretation of aerial imagery. Practically, this can aid in various application domains like surveillance, urban planning, and environmental monitoring by enhancing our ability to accurately identify and localize objects in complex aerial scenarios. Theoretically, the dataset offers a fertile ground for exploring novel algorithmic methodologies that extend beyond the capabilities of current deep learning models.

Future research directions might include developing models that can effectively handle the imbalanced distribution of classes, the high density of objects per image, and the large scale variations that are characteristic of the iSAID dataset. Moreover, integrating this dataset with hybrid data sources or other urban datasets could provide comprehensive insights into multimodal scene understanding.
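As one illustration of how class imbalance might be addressed, the sketch below computes inverse-frequency class weights, a common heuristic for reweighting a training loss. It is not a method proposed in the paper, and the class names `class_a` and `class_b` are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by total / (num_classes * class_count).

    Rare classes receive weights above 1, frequent classes below 1,
    and a perfectly balanced label set gives every class a weight of 1.
    """
    counts = Counter(labels)
    total = len(labels)
    num_classes = len(counts)
    return {cls: total / (num_classes * n) for cls, n in counts.items()}


# Toy label list with an 8:2 imbalance (hypothetical class names).
toy_labels = ["class_a"] * 8 + ["class_b"] * 2
weights = inverse_frequency_weights(toy_labels)
print(weights)
```

Such weights could be passed to a weighted classification loss. Whether this helps on a dataset like iSAID is an empirical question that the summary does not address.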

Conclusion

The iSAID dataset fills a gap in data availability for instance segmentation in aerial images. By providing a large-scale, densely annotated resource, it lets the community study and address the challenges specific to aerial imagery. The benchmark results with Mask R-CNN and PANet show that methods built for natural images do not transfer directly, which leaves clear room for methods designed for aerial image analysis.