LandCoverNet: A global benchmark land cover classification training dataset (2012.03111v1)

Published 5 Dec 2020 in cs.CV and cs.LG

Abstract: Regularly updated and accurate land cover maps are essential for monitoring 14 of the 17 Sustainable Development Goals. Multispectral satellite imagery provides high-quality and valuable information at global scale that can be used to develop land cover classification models. However, such a global application requires a geographically diverse training dataset. Here, we present LandCoverNet, a global training dataset for land cover classification based on Sentinel-2 observations at 10m spatial resolution. Land cover class labels are defined based on annual time-series of Sentinel-2, and verified by consensus among three human annotators.

Summary

The paper introduces LandCoverNet, a global training dataset for land cover classification built from Sentinel-2 imagery at 10m resolution.
Labels come from a consensus-based methodology: a Random Forest model pre-labels the data, and human annotators then validate and correct the result.
A sampling scheme guided by existing global land cover maps, together with Bayesian averaging of annotator labels, supports geographic diversity and label quality.

LandCoverNet: A Comprehensive Benchmark for Land Cover Classification

This paper introduces LandCoverNet, an open-access global training dataset for land cover (LC) classification from multispectral satellite data, specifically from the Sentinel-2 satellites. The dataset addresses a practical need: accurate and regularly updated LC maps are essential for monitoring 14 of the 17 Sustainable Development Goals, and they support applications such as precision agriculture, urban planning, and environmental conservation.

Overview of LandCoverNet Dataset

LandCoverNet is designed to be more geographically diverse than earlier LC training datasets. It provides training data at 10m resolution based on Sentinel-2 imagery, with LC class labels defined from an annual time-series of observations rather than a single scene. The taxonomy was designed by domain experts to be exhaustive and to suit the spatial resolution of Sentinel-2 imagery.

A central contribution of the paper is its consensus-based labeling methodology. Annotators did not label from scratch: they reviewed data that had been pre-labeled by a Random Forest model, which made human validation feasible across large data volumes. Because several annotators reviewed the same data, their labels could be aggregated into a consensus, reducing individual bias and error despite the complexity of time-series satellite data.
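The review step described above can be sketched as applying an annotator's corrections on top of model pre-labels. This is only an illustration under stated assumptions: the function `apply_corrections`, the grid representation, and the class names are hypothetical and are not taken from the paper or its labeling tool.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]  # (row, col) position inside a chip


def apply_corrections(
    pre_labels: List[List[str]],
    corrections: Dict[Pixel, str],
) -> Tuple[List[List[str]], float]:
    """Overlay annotator corrections on model pre-labels.

    Returns the reviewed grid and the fraction of pixels the
    annotator actually changed. The input grid is not modified.
    """
    reviewed = [row[:] for row in pre_labels]  # copy each row
    changed = 0
    for (r, c), new_label in corrections.items():
        if reviewed[r][c] != new_label:
            reviewed[r][c] = new_label
            changed += 1
    total = sum(len(row) for row in pre_labels)
    return reviewed, changed / total


# Hypothetical 2x2 chip with placeholder class names.
pre = [["water", "water"], ["cropland", "bare"]]
reviewed, changed_fraction = apply_corrections(pre, {(1, 1): "cropland"})
```

The changed fraction gives a rough sense of how much the annotator disagreed with the model on a given chip.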

Methodology and Data Handling

The dataset is built from sample regions chosen to represent global land surfaces. Within each region, the authors used MODIS-derived global LC maps to guide the selection of Sentinel-2 tiles, so that each geographical segment contributes a diverse range of land cover classes.
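One way to picture map-guided tile selection is to rank candidate tiles by how mixed their classes are in a coarse reference map. The sketch below uses Shannon entropy as the diversity measure; that choice, the function names, and the tile data are assumptions made for illustration, since this summary does not state the authors' exact selection criterion.

```python
import math
from typing import Dict, List


def shannon_entropy(fractions: Dict[str, float]) -> float:
    """Entropy of a tile's class-area fractions (higher means more mixed)."""
    return -sum(p * math.log(p) for p in fractions.values() if p > 0)


def pick_diverse_tiles(tiles: Dict[str, Dict[str, float]], k: int) -> List[str]:
    """Return the k tile ids with the highest class diversity."""
    ranked = sorted(tiles, key=lambda t: shannon_entropy(tiles[t]), reverse=True)
    return ranked[:k]


# Hypothetical class fractions per tile, as if read from a coarse LC map.
tiles = {
    "tile_a": {"forest": 1.0},
    "tile_b": {"forest": 0.5, "cropland": 0.5},
    "tile_c": {"forest": 0.4, "cropland": 0.3, "water": 0.3},
}
selected = pick_diverse_tiles(tiles, 2)
```

Here the single-class tile is ranked last, and the three-class tile ranks ahead of the two-class tile.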

Human validation is central to the annotation strategy. Annotators used a labeling dashboard to inspect and refine the model predictions. Each chip was labeled independently by multiple annotators (three, according to the abstract), and a consensus label was then derived with a Bayesian averaging approach. This produced high consensus scores and a quality-controlled training dataset across the LC classes.
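The consensus step can be illustrated with a simple Bayesian vote combination for a single pixel. This summary does not give the paper's exact formulation, so everything here is an assumption: a uniform prior over classes, a per-annotator accuracy value, errors spread evenly over the remaining classes, and the hypothetical function name `consensus`.

```python
from typing import Dict, List, Tuple


def consensus(
    votes: List[str],
    accuracies: List[float],
    classes: List[str],
) -> Tuple[str, float]:
    """Combine annotator votes for one pixel into a label and a score.

    Assumed model: an annotator with accuracy a reports the true class
    with probability a, and each other class with (1 - a) / (K - 1).
    """
    k = len(classes)
    posterior: Dict[str, float] = {}
    for c in classes:
        p = 1.0 / k  # uniform prior (assumption)
        for vote, acc in zip(votes, accuracies):
            p *= acc if vote == c else (1.0 - acc) / (k - 1)
        posterior[c] = p
    total = sum(posterior.values())
    label = max(posterior, key=posterior.get)
    return label, posterior[label] / total  # normalized posterior as score


classes = ["water", "cropland", "forest"]  # placeholder class names
split_label, split_score = consensus(
    ["water", "water", "cropland"], [0.9, 0.9, 0.9], classes
)
full_label, full_score = consensus(
    ["water", "water", "water"], [0.9, 0.9, 0.9], classes
)
```

Under this model a unanimous vote yields a higher score than a two-to-one split, which matches the intuition behind a consensus score.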

Numerical Results and Dataset Accessibility

LandCoverNet v1.0 comprises 1980 labeled chips across the African continent and covers all major classes except perpetual snow and ice. Quantitative analysis showed that a significant portion of the annotated data achieved a perfect consensus score, which indicates strong agreement among annotators.

The dataset is openly available under a Creative Commons license, promoting community engagement and advancing research in LC classification tasks.

Implications and Future Directions

LandCoverNet matters for both research and practice in land cover classification because it supplies geographically diverse, quality-controlled training data. On the research side, it provides a benchmark for developing machine learning models that generalize across regions rather than being tuned to a single area.

On the practical side, LandCoverNet makes it easier to use Sentinel-2's multispectral data for LC modeling, which could improve the global monitoring systems that support sustainable development efforts. More accurate LC models can in turn inform policy-making and global change assessments.

Looking forward, the dataset is set to expand to additional global regions, which would broaden its usefulness. The authors’ methodology provides a scalable framework that can be reused to improve LC classification models as new satellite data sources become available.

In conclusion, LandCoverNet is a significant resource for the remote sensing and ML communities, supporting the use of high-resolution imagery to address global environmental and developmental challenges.
