ForestEyes: Hybrid Deforestation Monitoring

Updated 12 October 2025

The ForestEyes Project is a modular initiative that combines citizen science with machine learning to improve deforestation monitoring from remote sensing imagery.
Volunteers classify segmented imagery on the Zooniverse platform, and metrics such as the Homogeneity Ratio and entropy are used for quality control.
By integrating high-confidence volunteer labels with SVM classifiers, the project achieves scalable, efficient, and accurate deforestation mapping.

The ForestEyes Project is a modular citizen science and machine learning initiative focused on leveraging human volunteer classifications and advanced computational methods to improve the monitoring and detection of deforestation, primarily in tropical rainforest regions such as the Brazilian Amazon. Designed to complement official government monitoring programs (e.g., PRODES), ForestEyes provides additional, high-quality labeled data derived from the analysis of remote sensing images, thereby enhancing the timeliness and reliability of deforestation assessments. Its approach centers on distributing remote sensing image segmentation tasks to volunteers, aggregating and quality-controlling their contributions, and integrating these results as training data for machine learning classifiers that automate large-scale deforestation monitoring.

1. Project Design and Citizen Science Workflow

The ForestEyes framework is structured into several interlinked modules:

Pre-processing: Remote sensing imagery, such as Landsat-8 scenes, undergoes dimensionality reduction (often via Principal Component Analysis) to condense multispectral (7-band) data into simulated RGB images. These are segmented using superpixel algorithms such as SLIC, IFT-SLIC, or MaskSLIC to generate interpretable image segments.
Crowdsourcing and Volunteer Involvement: Tasks are disseminated via the Zooniverse platform, where volunteers assess each segment, classifying it as forest, non-forest, or undefined (if ambiguous). Multiple color compositions (e.g., true-color RGB and "753" false color) are provided for improved interpretability.
Aggregation and Quality Control: Each segment is classified by at least 15 independent contributors, and the consensus class is determined by majority vote. Two metrics support quality control: the Shannon entropy of the responses to a task gauges volunteer agreement, and the Homogeneity Ratio (HoR) gauges how uniform a segment's pixels are. HoR is defined as

$\mathrm{HoR} = \frac{\max(\mathrm{NFP},\,\mathrm{NNP})}{\mathrm{NP}}$

where NFP and NNP are counts of forest and non-forest pixels, and NP is the segment's total pixel count.

Data Curation for Machine Learning: Segments meeting quality thresholds (high HoR, low entropy) are used as training data in the project's machine learning module, which is critical for scalable automated deforestation detection.
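The aggregation and quality-control steps above can be sketched in a few lines of Python. This is a minimal illustration and not the project's code: the function names, the tie-breaking rule, and the default thresholds `min_hor` and `max_entropy` are assumptions made for the example, since the source does not state specific cut-off values.

```python
import math
from collections import Counter


def majority_vote(votes):
    """Return the most frequent label among the volunteer votes.

    Ties are broken by first-seen order (an assumption for this sketch).
    """
    return Counter(votes).most_common(1)[0][0]


def shannon_entropy(votes):
    """Shannon entropy, in bits, of the distribution of volunteer votes."""
    n = len(votes)
    return sum((c / n) * math.log2(n / c) for c in Counter(votes).values())


def homogeneity_ratio(nfp, nnp, np_total):
    """HoR = max(NFP, NNP) / NP for one segment."""
    return max(nfp, nnp) / np_total


def is_training_candidate(hor, entropy, min_hor=0.7, max_entropy=1.0):
    """Keep segments with high HoR and low entropy.

    Both thresholds are hypothetical placeholders.
    """
    return hor >= min_hor and entropy <= max_entropy


# Example task: 15 volunteer answers for one segment.
votes = ["forest"] * 12 + ["non-forest"] * 2 + ["undefined"]
print(majority_vote(votes), round(shannon_entropy(votes), 3))
```

A unanimous task has entropy 0, and an even split between two classes has entropy 1 bit, so lower values indicate stronger agreement.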

Volunteer contributions are quantified by per-user hit rates (agreement with consensus) and overall scores that factor in accuracy and participation volume. The workflow is iterative: ambiguous or unresolved "undefined" cases can be refined with increased segmentation granularity and re-tasked to volunteers, ensuring data quality.
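As a rough sketch of the per-user hit rate described above, the snippet below counts how often each volunteer agrees with the consensus label. The input layout (tuples of user, segment, label) and the function name are assumptions for illustration. The source does not give the formula for the overall score, so the sketch only reports the two ingredients it mentions: accuracy and participation volume.

```python
def hit_rates(responses, consensus):
    """Compute per-user agreement with the consensus.

    responses: iterable of (user, segment_id, label) tuples.
    consensus: dict mapping segment_id to the consensus label.
    Returns a dict mapping user to (hit_rate, number_of_tasks).
    """
    hits, totals = {}, {}
    for user, segment_id, label in responses:
        if segment_id not in consensus:
            continue  # skip tasks without a consensus label
        totals[user] = totals.get(user, 0) + 1
        hits[user] = hits.get(user, 0) + (label == consensus[segment_id])
    return {user: (hits[user] / totals[user], totals[user]) for user in totals}


# Hypothetical example data.
consensus = {"s1": "forest", "s2": "non-forest"}
responses = [
    ("ana", "s1", "forest"),
    ("ana", "s2", "non-forest"),
    ("bo", "s1", "non-forest"),
    ("bo", "s2", "non-forest"),
]
print(hit_rates(responses, consensus))
```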

2. Evolution of Segmentation and Sampling Approaches

Early ForestEyes deployments used SLIC for superpixel segmentation, which produced segments of variable homogeneity. Subsequent studies systematically compared 22 segmentation techniques and found that alternative methods (e.g., RSS, ERGC, ETPS, CRS, LSC, SH, GMMSP) outperform SLIC in both classical metrics (Boundary Recall, Undersegmentation Error) and citizen science-specific criteria (higher HoR, more "useful" segments). In particular, these methods yield segments that are both more homogeneous and better aligned with deforestation boundaries, which makes labeling easier and more accurate for volunteers and improves downstream classifier performance (Resende et al., 26 Nov 2024).

Sampling strategies for selecting machine learning training data have also evolved. Instead of random sampling, ForestEyes now uses the Shannon entropy of volunteer responses to implement curriculum-like incremental sampling. The "increasing" strategy exposes the classifier first to segments with the lowest entropy (highest consensus) and gradually introduces more challenging cases. This approach accelerates SVM training convergence and reduces the sample size needed for high accuracy: training on only the 10% lowest-entropy samples reached performance that random sampling typically needed about 70% of the data to match (Resende et al., 22 Aug 2024).
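The "increasing" strategy can be illustrated with a small sketch that orders segments by response entropy and yields progressively larger training subsets. The function name, the input layout, and the choice of fractions are assumptions for this example and are not taken from the cited paper.

```python
def increasing_entropy_batches(samples, fractions):
    """Yield cumulative training subsets ordered from low to high entropy.

    samples: list of (segment_id, entropy) pairs.
    fractions: increasing fractions of the data to use, e.g. [0.1, 0.5, 1.0].
    Yields (fraction, list_of_segment_ids); each subset contains the
    previous one plus the next-lowest-entropy segments.
    """
    ordered = sorted(samples, key=lambda item: item[1])
    for fraction in fractions:
        k = max(1, round(fraction * len(ordered)))
        yield fraction, [segment_id for segment_id, _ in ordered[:k]]


# Hypothetical entropies for ten segments.
entropies = [0.9, 0.0, 1.5, 0.2, 0.7, 0.1, 1.2, 0.4, 0.3, 1.0]
samples = [(f"s{i}", e) for i, e in enumerate(entropies)]
for fraction, ids in increasing_entropy_batches(samples, [0.1, 0.5, 1.0]):
    print(fraction, ids)
```

A classifier would be retrained (or updated) on each subset in turn, so the earliest models see only the segments on which volunteers agreed most strongly.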

3. Machine Learning Integration for Deforestation Detection

The machine learning module predominantly employs Support Vector Machines (SVMs) trained on features extracted from high-confidence, crowd-labeled segments. The SVM optimization problem is formulated as:

$\min_{w,\,b}~\frac{1}{2}\|w\|^2 \quad \text{subject to}~y_i(w^\top x_i + b) \geq 1~\forall~i$

where $x_i$ are the feature vectors (e.g., Haralick texture features), $y_i \in \{-1, +1\}$ the class labels, $w$ the weight vector, and $b$ the bias. This is the hard-margin form, which assumes the two classes are linearly separable. While linear kernels are typically used to ensure interpretability and computational tractability, the framework is extensible.
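To make the constraint and objective concrete, the sketch below checks whether a candidate $(w, b)$ satisfies $y_i(w^\top x_i + b) \geq 1$ for every sample and evaluates $\frac{1}{2}\|w\|^2$. It does not train an SVM; the toy 2-D data and the helper names are assumptions chosen for illustration, with labels encoded as +1 and -1.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def functional_margins(w, b, X, y):
    """Return y_i * (w . x_i + b) for every sample."""
    return [yi * (dot(w, xi) + b) for xi, yi in zip(X, y)]


def is_feasible(w, b, X, y, tol=1e-9):
    """True if every sample satisfies the hard-margin constraint."""
    return all(m >= 1 - tol for m in functional_margins(w, b, X, y))


def objective(w):
    """The quantity being minimized: 0.5 * ||w||^2."""
    return 0.5 * dot(w, w)


def margin_width(w):
    """Geometric width of the margin, 2 / ||w||."""
    return 2 / math.sqrt(dot(w, w))


# Toy data: two points per class, separable along the first axis.
X = [(2.0, 0.0), (3.0, 1.0), (0.0, 0.0), (-1.0, 1.0)]
y = [1, 1, -1, -1]

for w, b in [((1.0, 0.0), -1.0), ((2.0, 0.0), -2.0), ((0.5, 0.0), -0.5)]:
    print(w, b, is_feasible(w, b, X, y), objective(w))
```

Among the feasible candidates, the one with the smaller objective has the wider margin, which is why minimizing $\frac{1}{2}\|w\|^2$ under these constraints yields the maximum-margin separator.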

Using entropy-ordered sample selection injects high-confidence labels into the early learning process, enabling rapid convergence on a robust decision boundary and greater resilience to label noise. SVMs trained under this strategy demonstrate superior balanced accuracy in segment-level deforestation classification compared to those trained on randomly selected samples (Resende et al., 22 Aug 2024).
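Balanced accuracy, the metric mentioned above, is the mean of the per-class recalls, which makes it less sensitive to class imbalance than plain accuracy. The following is a minimal sketch with hypothetical labels and function names:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for cls in sorted(set(y_true)):
        indices = [i for i, t in enumerate(y_true) if t == cls]
        correct = sum(y_pred[i] == cls for i in indices)
        recalls.append(correct / len(indices))
    return sum(recalls) / len(recalls)


# Imbalanced example: a classifier that always predicts "forest".
y_true = ["forest"] * 8 + ["non-forest"] * 2
y_pred = ["forest"] * 10
print(accuracy(y_true, y_pred), balanced_accuracy(y_true, y_pred))
```

In this example the always-forest classifier looks strong on plain accuracy but scores no better than chance on balanced accuracy, because it never detects the minority class.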

4. Campaigns, Evaluation, and Ground Truth Comparison

ForestEyes has operated multiple campaign "workflows" across different years and image modalities. Initial tasks employed MODIS imagery (250 m resolution), while subsequent campaigns used Landsat-8 at 60 m and later native 30 m resolutions. Segmentation methodologies evolved (e.g., integration of MaskSLIC and IFT-SLIC), and user interface enhancements were deployed to aid volunteer interpretation.

Key campaign results include:

Early workflows (60 m Landsat-8) achieved overall accuracy in the 83–86% range against PRODES ground truth.
Refined workflows using 30 m imagery and advanced segmentation further improved both accuracy and deforestation pixel differentiation (from ~48% to ~65% in MaskSLIC-based campaigns) (Dallaqua et al., 2022).
Ground truth assessment relied on three strategies: pixel-based (GT-PRODES) and two segment-based (GT-U, which incorporates undefined, and GT-M, majority class), enabling nuanced evaluation of both classifier and volunteer performance.
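The two segment-based ground truth strategies can be sketched as follows. The source only says that GT-M uses the majority class and that GT-U incorporates an undefined class. The rule used here for GT-U (label a segment undefined when its HoR falls below a threshold), the threshold value `min_hor=0.7`, the tie-breaking toward forest, and the function names are all assumptions made for illustration.

```python
def gt_m(nfp, nnp):
    """GT-M sketch: label a segment by its majority pixel class.

    Ties go to "forest" (an arbitrary choice for this sketch).
    """
    return "forest" if nfp >= nnp else "non-forest"


def gt_u(nfp, nnp, min_hor=0.7):
    """GT-U sketch: like GT-M, but mixed segments become "undefined".

    A segment counts as mixed when its HoR is below the hypothetical
    threshold min_hor; NP is taken as nfp + nnp here.
    """
    hor = max(nfp, nnp) / (nfp + nnp)
    return gt_m(nfp, nnp) if hor >= min_hor else "undefined"


def overall_accuracy(predicted, truth):
    """Fraction of segments whose predicted label matches the ground truth."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


# Hypothetical segments given as (forest pixels, non-forest pixels).
segments = [(95, 5), (60, 40), (10, 90)]
print([gt_m(f, n) for f, n in segments])
print([gt_u(f, n) for f, n in segments])
```

Under these assumptions, a border segment such as the 60/40 case is labeled forest by GT-M but undefined by GT-U, which is the kind of difference that lets the evaluation separate clear segments from ambiguous ones.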

5. Technical and Human Factors Challenges

Key challenges have arisen throughout ForestEyes development:

Segmentation and Image Resolution: Early campaigns using low-resolution or poorly segmented images resulted in a high proportion of ambiguous or high entropy tasks. This led to frequent "undefined" volunteer responses and reduced labeling effectiveness.
User Interface Usability: Volunteers reported difficulties with navigation (e.g., image flipping, zooming) and interpreting tasks with low homogeneity.
Ambiguous Class Compositions: Mixed segments (e.g., border regions) had lower consensus and accuracy rates, motivating both segmentation algorithm refinement and more sophisticated selection criteria for machine learning input.

Enhancements included adopting superior superpixel segmentation methods, revising presentation modes, adding color composites, and developing new metrics for continuous data quality monitoring.

6. Current Advancements and Future Directions

Recent research has extended ForestEyes capabilities in several directions:

Segmentation Pipeline Upgrade: Adoption of higher-performing superpixel algorithms is now recommended, directly raising the quality of citizen science–labeled data and improving classifier inputs (Resende et al., 26 Nov 2024).
Entropy-Based Sampling: The transition to curriculum-style entropy-increasing sampling is empirically supported to deliver more efficient SVM learning, particularly in imbalanced or noisy labeling regimes (Resende et al., 22 Aug 2024).
Automated Ambiguity Detection: Planned pipeline upgrades include integrating deep learning models for semantic segmentation and automatic ambiguous-segment flagging (i.e., low HoR detection) prior to volunteer task distribution.
Expanded Application Scope: While the primary focus remains deforestation monitoring, the conceptual framework is poised for adaptation to other remote sensing applications, including urban land use change.

The feedback loop between volunteers and machine learning, with iterative refinement via active learning and focused cognitive studies of user performance, underpins ForestEyes' goal of a robust, hybrid monitoring system supporting official assessments while expanding monitoring coverage.

7. Significance for Scalable Deforestation Monitoring

ForestEyes demonstrates that modular integration of citizen science with carefully engineered machine learning, anchored in advanced segmentation, entropy-driven sampling, and rigorous quality control, can generate high-coverage, reliable deforestation maps. This hybrid pipeline offers resilience to label noise, makes efficient use of human cognitive resources, and is adaptable to diverse monitoring needs. Its published outcomes show accuracy and detail on par with national-level monitoring, while its internal advances in segmentation, sampling, and participation design provide a template for similar large-scale environmental monitoring efforts.

References

Resende et al. (26 Nov 2024). Exploring Superpixel Segmentation Methods in the Context of Citizen Science and Deforestation Detection.

Resende et al. (22 Aug 2024). Sampling Strategies based on Wisdom of Crowds for Amazon Deforestation Detection.

Dallaqua et al. (2022). ForestEyes Project: Conception, Enhancements, and Challenges.
