
You Only Look Twice: Rapid Multi-Scale Object Detection In Satellite Imagery (1805.09512v1)

Published 24 May 2018 in cs.CV

Abstract: Detection of small objects in large swaths of imagery is one of the primary problems in satellite imagery analytics. While object detection in ground-based imagery has benefited from research into new deep learning approaches, transitioning such technology to overhead imagery is nontrivial. Among the challenges is the sheer number of pixels and geographic extent per image: a single DigitalGlobe satellite image encompasses >64 km² and over 250 million pixels. Another challenge is that objects of interest are minuscule (often only ~10 pixels in extent), which complicates traditional computer vision techniques. To address these issues, we propose a pipeline (You Only Look Twice, or YOLT) that evaluates satellite images of arbitrary size at a rate of >0.5 km²/s. The proposed approach can rapidly detect objects of vastly different scales with relatively little training data over multiple sensors. We evaluate large test images at native resolution, and yield scores of F1 > 0.8 for vehicle localization. We further explore resolution and object size requirements by systematically testing the pipeline at decreasing resolution, and conclude that objects only ~5 pixels in size can still be localized with high confidence. Code is available at https://github.com/CosmiQ/yolt.

Authors (1)
  1. Adam Van Etten (17 papers)
Citations (291)

Summary

Rapid Multi-Scale Object Detection in Satellite Imagery

The paper "You Only Look Twice: Rapid Multi-Scale Object Detection In Satellite Imagery" presents a deep learning pipeline for detecting small objects in very large satellite images. The proposed system, YOLT (You Only Look Twice), targets the two main obstacles to doing this efficiently: the sheer size of each image and the small pixel footprint of the objects of interest.

Key Challenges and Solutions

Satellite images pose challenges that ground-based datasets do not, chiefly their pixel count and geographic extent. A single DigitalGlobe satellite image covers more than 64 km² and contains over 250 million pixels. Objects of interest are also minuscule, often only about 10 pixels in extent, which complicates traditional computer vision techniques.
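The figures above imply a rough ground resolution, which helps put "10 pixels" in physical terms. The following is a back-of-the-envelope sketch only: it assumes square pixels spread evenly over the image footprint, and the function name is illustrative rather than anything from the paper or its code.

```python
import math


def ground_sample_distance_m(area_km2: float, n_pixels: float) -> float:
    """Approximate metres per pixel.

    Assumption: square pixels covering the stated area evenly.
    """
    area_m2 = area_km2 * 1_000_000.0  # 1 km^2 = 10^6 m^2
    return math.sqrt(area_m2 / n_pixels)


# Figures quoted in the abstract: >64 km^2 and over 250 million pixels.
gsd_m = ground_sample_distance_m(64.0, 250e6)

# Physical extent of an object spanning about 10 pixels.
object_extent_m = 10 * gsd_m

print(f"approx. {gsd_m:.2f} m per pixel")
print(f"a 10-pixel object spans approx. {object_extent_m:.1f} m")
```

Under these assumptions the resolution comes out at roughly half a metre per pixel, so a 10-pixel object is about 5 m across, which is on the order of a car.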

YOLT tackles these challenges with a pipeline that evaluates images of arbitrary size at a rate of more than 0.5 km²/s. The method uses a convolutional neural network (CNN) architecture inspired by YOLO (You Only Look Once) but adapted to satellite imagery. In particular, the network produces a denser final prediction grid, which improves detection of small, closely spaced objects, and the approach is designed to cope with the arbitrary object orientations seen in overhead views.
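One common way to evaluate an image of arbitrary size with a fixed-input CNN is to slide an overlapping window across it and run the detector on each cutout. The sketch below illustrates that general idea only; it is not taken from the YOLT code, and the window size, overlap fraction, and function names are assumptions chosen for illustration.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


def window_starts(length: int, window: int, stride: int) -> List[int]:
    """Start offsets along one axis so that windows cover [0, length)."""
    if length <= window:
        return [0]
    starts = list(range(0, length - window + 1, stride))
    # Add a final window flush with the image edge if the stride left a gap.
    if starts[-1] != length - window:
        starts.append(length - window)
    return starts


def tile_boxes(width: int, height: int,
               window: int = 416, overlap: float = 0.15) -> List[Box]:
    """Overlapping cutouts covering a width x height image.

    `window` and `overlap` are illustrative defaults, not values
    stated in the summary above.
    """
    stride = max(1, int(window * (1.0 - overlap)))
    return [
        (x, y, min(x + window, width), min(y + window, height))
        for y in window_starts(height, window, stride)
        for x in window_starts(width, window, stride)
    ]


boxes = tile_boxes(1000, 416)
for box in boxes:
    print(box)
```

Detections from each cutout would then be shifted back into global image coordinates, and duplicates in the overlap regions merged, for example with non-maximum suppression.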

Numerical Results

YOLT achieves F1 scores greater than 0.8 for vehicle localization on large test images evaluated at native resolution. A systematic study at progressively decreasing resolution found that objects only about 5 pixels in size can still be localized with high confidence.
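For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The sketch below uses the standard definition; the detection counts are made up for illustration and are not results from the paper.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Harmonic mean of precision and recall from detection counts."""
    if true_pos == 0:
        return 0.0  # no correct detections, so precision and recall are 0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 80 correct detections, 20 false alarms, 20 misses.
example_f1 = f1_score(true_pos=80, false_pos=20, false_neg=20)
print(f"F1 = {example_f1:.2f}")
```

Because the harmonic mean is pulled toward the smaller of the two values, an F1 above 0.8 requires both precision and recall to be reasonably high; a detector cannot reach it by trading one away for the other.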

Practical and Theoretical Implications

The practical implications are significant. Because YOLT can rapidly process large-scale satellite imagery, it is a candidate tool for applications such as urban planning, disaster response, and environmental monitoring. Its reported throughput (about 30 km² per minute for vehicles and buildings, and about 6000 km² per minute for airports) suggests it could support near-real-time analysis of satellite data, especially when deployed on GPU clusters.
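To make these rates concrete, the arithmetic below converts them into the time needed for one image of the size quoted in the abstract (64 km²). It is a simple unit conversion under the assumption of a constant processing rate; the function name is illustrative.

```python
def seconds_to_process(area_km2: float, rate_km2_per_min: float) -> float:
    """Time in seconds to cover an area at a constant rate."""
    return area_km2 / rate_km2_per_min * 60.0


IMAGE_AREA_KM2 = 64.0  # single-image extent quoted in the abstract

# Rates quoted in the summary, in km^2 per minute.
vehicle_seconds = seconds_to_process(IMAGE_AREA_KM2, 30.0)
airport_seconds = seconds_to_process(IMAGE_AREA_KM2, 6000.0)

# Cross-check: the abstract's 0.5 km^2/s expressed per minute.
rate_per_min_from_abstract = 0.5 * 60.0

print(f"vehicles/buildings: approx. {vehicle_seconds:.0f} s per image")
print(f"airports: approx. {airport_seconds:.2f} s per image")
print(f"0.5 km^2/s = {rate_per_min_from_abstract:.0f} km^2/min")
```

At these rates a full image takes about two minutes for vehicle-scale objects and well under a second for airport-scale objects, and the 30 km² per minute figure matches the 0.5 km²/s rate given in the abstract.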

Theoretically, the paper offers insight into adapting deep learning models developed for ground-based imagery to overhead imagery, a transition that requires handling differences in spatial scale and object orientation. The work also highlights the value of running models at multiple scales to distinguish objects of very different sizes.

Future Developments in AI

Advances in AI could further improve the applicability and accuracy of systems like YOLT. Self-supervised learning on unlabeled satellite data, or hybrid models that combine CNNs with transformers, could ease the dependence on labeled dataset size and improve detection performance.

In summary, the paper examines the central challenges of satellite imagery analytics and offers a practical pipeline to address them. Its empirical results mark a significant step forward in rapid multi-scale object detection, and they invite further work on how detection models can adapt to varying scales and environments in overhead imagery.
