
Open High-Resolution Satellite Imagery: The WorldStrat Dataset -- With Application to Super-Resolution (2207.06418v2)

Published 13 Jul 2022 in eess.IV, cs.CV, cs.LG, and stat.AP

Abstract: Analyzing the planet at scale with satellite imagery and machine learning is a dream that has been constantly hindered by the cost of difficult-to-access highly-representative high-resolution imagery. To remediate this, we introduce here the WorldStrat dataset. The largest and most varied such publicly available dataset, at Airbus SPOT 6/7 satellites' high resolution of up to 1.5 m/pixel, empowered by European Space Agency's Phi-Lab as part of the ESA-funded QueryPlanet project, we curate nearly 10,000 sqkm of unique locations to ensure stratified representation of all types of land-use across the world: from agriculture to ice caps, from forests to multiple urbanization densities. We also enrich those with locations typically under-represented in ML datasets: sites of humanitarian interest, illegal mining sites, and settlements of persons at risk. We temporally-match each high-resolution image with multiple low-resolution images from the freely accessible lower-resolution Sentinel-2 satellites at 10 m/pixel. We accompany this dataset with an open-source Python package to: rebuild or extend the WorldStrat dataset, train and infer baseline algorithms, and learn with abundant tutorials, all compatible with the popular EO-learn toolbox. We hereby hope to foster broad-spectrum applications of ML to satellite imagery, and possibly develop from free public low-resolution Sentinel2 imagery the same power of analysis allowed by costly private high-resolution imagery. We illustrate this specific point by training and releasing several highly compute-efficient baselines on the task of Multi-Frame Super-Resolution. High-resolution Airbus imagery is CC BY-NC, while the labels and Sentinel2 imagery are CC BY, and the source code and pre-trained models under BSD. The dataset is available at https://zenodo.org/record/6810791 and the software package at https://github.com/worldstrat/worldstrat .

Citations (45)

Summary

  • The paper presents a stratified high-resolution satellite imagery dataset (WorldStrat) covering nearly 10,000 km² and diverse land-use types.
  • It benchmarks single-image and multi-frame super-resolution architectures, evaluated with PSNR and SSIM.
  • The accompanying open-source Python package integrates with EO-learn and PyTorch Lightning, enabling accessible research and model training.

Overview of the WorldStrat Dataset

The paper "Open High-Resolution Satellite Imagery: The WorldStrat Dataset -- With Application to Super-Resolution" presents a large, stratified dataset intended to support machine learning on satellite imagery. The dataset, known as WorldStrat, comprises nearly 10,000 km² of high-resolution imagery from Airbus SPOT 6/7 satellites, paired with temporally matched lower-resolution imagery from Sentinel-2 satellites. It is a resource for a range of applications, including multi-frame super-resolution, climate change monitoring, urban development analysis, agriculture, and humanitarian activities.

Key Contributions

The WorldStrat dataset is meticulously curated to offer a representative cross-section of global land-use types. The stratification spans various environments, including urban areas, forests, ice caps, and agricultural land. Notably, the dataset also includes locations generally under-represented in machine learning datasets, such as humanitarian sites, illegal mining areas, and settlements of vulnerable populations. The specific aims of the dataset include:

  1. Broad-spectrum representativity of land-use types.
  2. Integration of high-resolution imagery (1.5 m/pixel) from the Airbus SPOT 6/7 satellites.
  3. Temporal matching with lower-resolution (10 m/pixel) imagery from Sentinel-2 satellites.
  4. Inclusion of non-mainstream areas of interest, enhancing the dataset's utility for social impact applications.
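
The gap between the two resolutions listed above determines how much detail a super-resolution model has to recover. The following is a minimal arithmetic sketch, using only the 1.5 m/pixel and 10 m/pixel figures from the text; the function and variable names are illustrative and do not come from the WorldStrat package.

```python
# Ground sampling distances quoted in the text, in metres per pixel.
HR_GSD_M = 1.5   # Airbus SPOT 6/7
LR_GSD_M = 10.0  # Sentinel-2


def linear_scale_factor(lr_gsd_m, hr_gsd_m):
    """Return how many high-res pixels span one low-res pixel along one axis."""
    return lr_gsd_m / hr_gsd_m


scale = linear_scale_factor(LR_GSD_M, HR_GSD_M)
# Each low-res pixel covers roughly scale**2 high-res pixels in area.
hr_pixels_per_lr_pixel = scale ** 2

print(f"linear upscaling factor: {scale:.2f}")
print(f"high-res pixels per low-res pixel: {hr_pixels_per_lr_pixel:.1f}")
```

Under these figures the linear factor is about 6.67, so each Sentinel-2 pixel corresponds to roughly 44 SPOT pixels.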

Data Composition and Structuring

The WorldStrat dataset is divided into approximately 3,450 instances, each representing a 2.5 km² patch of land. For some specific Points of Interest (POIs), larger areas of 22.5 km² are provided. Stratification was executed using data from the European Space Agency (ESA) Climate Change Initiative (CCI) Land Cover dataset, which employs classifications from the Food and Agriculture Organization (FAO) Land Cover Classification System (LCCS) and the Intergovernmental Panel on Climate Change (IPCC). Additionally, urban density classes were derived from the Global Human Settlement Layer Settlement Model (GHSL-SMOD).
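
To give a sense of image sizes, the sketch below converts the 2.5 km² patch area into approximate pixel dimensions at each resolution. It assumes square patches, which the summary does not state explicitly, and the helper name is hypothetical.

```python
import math


def patch_side_pixels(area_km2, gsd_m):
    """Approximate side length, in pixels, of a square patch.

    Assumption: the patch is square. area_km2 is the ground area in km^2
    and gsd_m is the ground sampling distance in metres per pixel.
    """
    side_m = math.sqrt(area_km2 * 1_000_000)  # km^2 -> m^2, then side in metres
    return side_m / gsd_m


hr_side = patch_side_pixels(2.5, 1.5)   # SPOT 6/7 at 1.5 m/pixel
lr_side = patch_side_pixels(2.5, 10.0)  # Sentinel-2 at 10 m/pixel

print(f"high-res patch side: about {hr_side:.0f} pixels")
print(f"low-res patch side: about {lr_side:.0f} pixels")
```

Under the square-patch assumption, a 2.5 km² location is roughly 1054 pixels on a side at 1.5 m/pixel and roughly 158 pixels on a side at 10 m/pixel.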

Super-Resolution Benchmark

To illustrate the dataset's potential utility, the authors establish benchmarks for multi-frame super-resolution tasks using three different architectures:

  1. A single-image super-resolution architecture (SRCNN).
  2. A multi-frame extension of SRCNN that stacks the low-resolution revisits along the channel dimension.
  3. A multi-spectral modification of the original HighResNet, optimized for computational efficiency.
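
The idea of feeding several revisits to a single-image network by treating them as extra channels can be illustrated with a toy example. This is a sketch using plain nested lists, not the authors' implementation; in practice the same step would likely be a tensor reshape from (T, C, H, W) to (T*C, H, W). The function name and the tiny 2x2 frames are made up for illustration.

```python
def stack_revisits_as_channels(revisits):
    """Flatten T revisits of C bands each into a single list of T*C channels.

    revisits: list of T frames; each frame is a list of C bands;
              each band is an H x W nested list of pixel values.
    """
    if not revisits:
        raise ValueError("at least one revisit is required")
    n_bands = len(revisits[0])
    for frame in revisits:
        if len(frame) != n_bands:
            raise ValueError("all revisits must have the same number of bands")
    # Keep revisit order, then band order within each revisit.
    return [band for frame in revisits for band in frame]


# Toy input: 3 revisits, 2 bands each, 2x2 pixels per band.
revisits = [
    [[[t, t], [t, t]], [[t + 10, t + 10], [t + 10, t + 10]]]
    for t in range(3)
]
stacked = stack_revisits_as_channels(revisits)
print(len(stacked))  # number of channels seen by the network
```

With 3 revisits of 2 bands each, the network input has 6 channels, so only the first convolution layer needs a different input width.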

The benchmark is evaluated with Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM). Scores vary widely across the validation set, which suggests substantial room for algorithmic improvement.
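
For reference, PSNR is defined from the mean squared error between the reference and the reconstruction as 10 * log10(MAX² / MSE), where MAX is the largest possible pixel value. The sketch below implements that standard definition on flat lists of pixel values scaled to [0, 1]; it is not taken from the WorldStrat package, and the sample values are invented for illustration.

```python
import math


def psnr(reference, estimate, max_value=1.0):
    """Peak Signal-to-Noise Ratio in decibels for two equal-length pixel lists."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Invented example: two of four pixels are off by 0.1.
reference = [0.0, 0.5, 1.0, 0.5]
estimate = [0.1, 0.5, 0.9, 0.5]
score = psnr(reference, estimate)
print(f"PSNR: {score:.2f} dB")
```

Here the mean squared error is 0.005, which gives a PSNR of about 23.01 dB. Higher values indicate a reconstruction closer to the reference.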

Open-Source Python Package

Accompanying the dataset is an open-source Python package for rebuilding or extending the dataset, training baseline models, and running inference. The package is compatible with the popular EO-learn toolbox, and the baselines are designed to be compute-efficient, which keeps them within reach of researchers with modest computational resources. Tutorials and standardized PyTorch Lightning interfaces round out the package.

Implications and Future Directions

The WorldStrat dataset holds significant implications for the field of machine learning applied to satellite imagery. By addressing the bottleneck of inaccessible high-resolution imagery and providing a diverse, stratified dataset, the authors aim to democratize the analytic capabilities previously restricted to costly proprietary data. One immediate consequence is the enhancement of multi-frame super-resolution methods, which can derive high-resolution insights from freely available low-resolution Sentinel-2 imagery.

Future developments could include expanding the dataset to cover rivers, harbors, and coastal areas, which are currently under-represented. Additionally, further stratification based on Local Climate Zones (LCZ) could be explored to refine the dataset's utility for urban studies.

In summary, the WorldStrat dataset is a significant step toward making high-quality satellite imagery accessible for machine learning applications. Its broad representativity and accompanying open-source Python package make it a practical starting point for further research.
