AttentionUNet-OBIA: Hybrid Forest Mapping
- The paper presents a hybrid method combining AttentionUNet with OBIA to deliver both pixel-level discrimination and object-level interpretability for forest/non-forest classification.
- The methodology employs a UNet-style encoder-decoder with attention gates alongside mean-shift segmentation, enhancing feature focus and spatial coherence in high-resolution remote sensing data.
- The approach achieves state-of-the-art performance (OA 95.64%, IoU 0.9064) and outperforms traditional OBIA and other deep learning variants in identifying forest cover.
AttentionUNet-OBIA is a hybrid forest cover mapping methodology that integrates a deep learning model—AttentionUNet—with Object-Based Image Analysis (OBIA) for high-resolution multispectral remote sensing image analysis. Developed within the "ForCM" pipeline for Sentinel-2 imagery, it achieves state-of-the-art accuracy for forest/non-forest classification in the Amazon Rainforest, providing both pixel-wise discrimination and object-level interpretability with open-source tools (Haque et al., 29 Dec 2025).
1. Architecture and Attention Mechanism
The core of AttentionUNet-OBIA is a UNet-style encoder–decoder architecture augmented with attention gates (AG). The input consists of multispectral image patches with 3 or 4 bands. The model comprises four encoding stages, a bottleneck, and four decoding stages that symmetrically mirror the encoder. Each encoder level applies two consecutive convolutions with ReLU activations (optionally batch-normalized), doubling the feature channels at each downsampling (64 → 128 → 256 → 512). Spatial resolution is reduced by max-pooling (stride 2). The bottleneck contains two convolutions at 1024 channels.
Decoding consists of transposed convolutions (up-convolutions) that increase spatial resolution and halve the channel count. At each decoder level $\ell$, an attention gate receives the upsampled decoder (gating) signal $g_\ell$ and the encoder skip feature $x_\ell$. The AG formula follows Oktay et al. (2018):
\begin{align*}
f_x &= W_x\,x_\ell, \quad f_g = W_g\,g_\ell \\
\Psi_\text{int} &= \operatorname{ReLU}(f_x + f_g + b) \\
\alpha_\ell &= \sigma(\psi^{T}\,\Psi_\text{int} + b_\psi) \\
x_\ell' &= \alpha_\ell \odot x_\ell
\end{align*}
where $W_x$, $W_g$, and $\psi$ are learned $1 \times 1$ convolutions, $\sigma$ is the sigmoid, and $\odot$ denotes element-wise multiplication. The AG output $x_\ell'$ emphasizes salient features and suppresses irrelevant regions before skip-connection concatenation. The final feature map passes through a $1 \times 1$ convolution, followed by sigmoid activation, yielding a pixel-wise forest-probability map.
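The gating computation above can be sketched numerically. The following is a minimal NumPy illustration of the additive attention gate on flattened features; the function name, tensor layout (channels × pixels), and scalar biases are assumptions for demonstration, not the paper's implementation.

```python
import numpy as np

def attention_gate(x, g, Wx, Wg, psi, b=0.0, b_psi=0.0):
    """Additive attention gate (Oktay et al., 2018) on flattened features.
    x: encoder skip feature, shape (C, N pixels); g: gating signal, (C, N).
    Wx, Wg: (C_int, C) projection matrices; psi: (C_int,) vector.
    Returns the gated skip feature alpha * x."""
    f_x = Wx @ x                                   # project skip feature
    f_g = Wg @ g                                   # project gating signal
    psi_int = np.maximum(f_x + f_g + b, 0.0)       # ReLU
    # per-pixel attention coefficient alpha in (0, 1) via sigmoid
    alpha = 1.0 / (1.0 + np.exp(-(psi @ psi_int + b_psi)))
    return alpha * x                               # broadcast over channels
```

The coefficient `alpha` rescales each pixel of the skip feature before concatenation, which is how the gate suppresses regions irrelevant to the forest class.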
2. Data Preprocessing and Input Modalities
Input data are Sentinel-2 Level-2A images, pre-corrected for atmospheric effects using ESA Sen2Cor. Band selection includes both three-band (RGB) and four-band (RGB plus NIR, all at 10 m) sets. Input normalization policies are:
- Three-band images: divide by 255 and cast to float32, yielding values in [0, 1].
- Four-band images: cast to float32 and divide each band by its maximum reflectance, rescaling values to [0, 1].
No additional spectral indices such as NDVI are computed; only raw bands are provided to the model.
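The two normalization policies can be sketched as follows; the function name and the per-band maximum fallback for empty bands are illustrative assumptions.

```python
import numpy as np

def normalize_bands(img, mode="rgb"):
    """Normalize an (H, W, B) image to float32 values in [0, 1].
    'rgb': divide by 255 (3-band policy);
    'rgbnir': divide each band by its maximum reflectance (4-band policy)."""
    img = img.astype(np.float32)
    if mode == "rgb":
        return img / 255.0
    # per-band maximum; guard against division by zero for empty bands
    band_max = img.max(axis=(0, 1), keepdims=True)
    return img / np.where(band_max > 0, band_max, 1.0)
```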
3. OBIA: Segmentation and Feature Extraction
Post-prediction, OBIA is performed in QGIS (v3.34.5, Orfeo Toolbox v8.1.2) using mean-shift segmentation, chosen for robust unsupervised object delineation. The mean-shift parameters (spatial radius, range radius, and minimum object size in pixels) were selected by trial and error. Each resulting image object yields a feature vector comprising the mean reflectance of each band together with the mean AttentionUNet pixel-wise probability within the object. Optional features (not used in this work) include area, perimeter, compactness, and texture.
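Given a mean-shift label map, the per-object feature extraction reduces to masked means. A minimal sketch, assuming the segmentation is available as an integer label raster (function and variable names are illustrative):

```python
import numpy as np

def object_features(bands, prob, segments):
    """For each segment id, compute mean band reflectances plus the mean
    predicted forest probability. bands: (H, W, B); prob: (H, W);
    segments: (H, W) integer label map from mean-shift.
    Returns {segment_id: feature vector of length B + 1}."""
    feats = {}
    for sid in np.unique(segments):
        mask = segments == sid
        mean_bands = bands[mask].mean(axis=0)   # (B,) mean reflectances
        mean_prob = prob[mask].mean()           # mean DL probability
        feats[sid] = np.concatenate([mean_bands, [mean_prob]])
    return feats
```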
4. Fusion, Classification, and Post-processing
The classification stage fuses AttentionUNet-derived and OBIA-obtained features at the object level. From the segmented object set, a subset of objects is randomly sampled (with visually checked stratification) for manual ground-truth labeling. A linear-kernel Support Vector Machine (SVM) is trained to map object feature vectors to forest/non-forest labels. Labels are assigned by thresholding the SVM decision score at 0 (or the calibrated probability at 0.5). Post-processing removes small objects by morphological opening, followed by boundary smoothing via a majority filter to reduce salt-and-pepper noise.
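The decision rule and label rasterization can be sketched in a few lines. Here the SVM weights `w`, `b` are assumed already fitted (e.g. with scikit-learn's linear-kernel SVC); the function names are illustrative.

```python
import numpy as np

def classify_objects(feats, w, b):
    """Assign forest (1) / non-forest (0) labels by thresholding the
    linear SVM decision score w·f + b at 0, as in the fusion stage.
    feats: {segment_id: feature vector}; w, b: learned weights/bias."""
    return {sid: int(np.dot(w, f) + b > 0) for sid, f in feats.items()}

def labels_to_raster(labels, segments):
    """Paint each object's label onto all of its pixels for the final map."""
    out = np.zeros_like(segments)
    for sid, lab in labels.items():
        out[segments == sid] = lab
    return out
```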
5. Training Protocol and Hyperparameters
Each dataset (three 3-band variants V1, V2, V3 and one 4-band set) is partitioned into train, validation, and test subsets. Training applies binary cross-entropy loss with the Adam optimizer, ReduceLROnPlateau learning-rate scheduling, and no class weighting (class balance assumed). Data augmentation is limited to random horizontal/vertical flips performed on the fly. Training runs for 20 epochs on V1 and 10 epochs on V2, V3, and the 4-band set. Hardware employed: Intel i7-class CPU, 32 GB RAM, NVIDIA GeForce GTX TITAN X 12 GB GPU.
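The ReduceLROnPlateau behavior referenced above can be expressed in plain Python. This is an illustrative re-implementation of the scheduling logic; the `factor` and `patience` defaults here are assumptions, as the paper's exact values are not preserved in this summary.

```python
def reduce_lr_on_plateau(val_losses, lr0, factor=0.5, patience=2):
    """Multiply the learning rate by `factor` whenever validation loss
    fails to improve for `patience` consecutive epochs.
    Returns the learning rate used after each epoch."""
    lr, best, wait = lr0, float("inf"), 0
    history = []
    for loss in val_losses:
        if loss < best:          # improvement: reset the patience counter
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:  # plateau detected: decay and reset
                lr *= factor
                wait = 0
        history.append(lr)
    return history
```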
6. Evaluation Metrics and Comparative Results
Performance is assessed using mean Intersection over Union (IoU), overall accuracy (OA), precision, recall, and F1-score, computed on randomly selected test images. AttentionUNet-OBIA achieves:
| Metric | Value |
|---|---|
| OA | 95.64 % |
| IoU | 0.9064 |
| Precision | 93.32 % |
| Recall | 96.84 % |
| F1-score | 0.9504 |
Comparative results show that AttentionUNet-OBIA surpasses traditional OBIA (OA 92.91 %, IoU 0.8992, F1 0.9365) and exceeds other DL-OBIA variants such as ResUNet-OBIA (OA 94.54 %, IoU 0.9101, F1 0.9525) in overall accuracy, though ResUNet-OBIA posts slightly higher IoU and F1. Standalone AttentionUNet (no OBIA) attains OA 95.93 % and IoU 0.9168 on the 4-band test set. An example confusion matrix for 1000 test pixels:
| | Pred Forest | Pred Non-Forest |
|---|---|---|
| True Forest | 581 | 19 |
| True Non-Forest | 27 | 373 |
This evaluates to an overall accuracy of (581 + 373)/1000 = 95.4 %.
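The metric definitions can be checked directly against the confusion-matrix counts above (forest taken as the positive class); the function name is illustrative.

```python
def binary_metrics(tp, fn, fp, tn):
    """Compute OA, precision, recall, F1, and IoU from binary
    confusion-matrix counts (positive class = forest)."""
    oa = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou = tp / (tp + fp + fn)      # intersection over union for the positive class
    return oa, precision, recall, f1, iou
```

Applied to the example matrix (tp = 581, fn = 19, fp = 27, tn = 373), this yields OA = 0.954.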
7. Workflow Schematic
The pipeline is summarized as follows:
- Load images and ground-truth masks.
- Normalize bands to [0, 1] (float32).
- Partition datasets into train/val/test.
- Construct AttentionUNet:
- Encoder (four levels): two convolutions + ReLU per level, max-pooling downsampling.
- Bottleneck: two convolutions at 1024 channels.
- Decoder (four levels): up-convolution, attention-gated skip concatenation, two convolutions.
- Output: 1 × 1 convolution with sigmoid.
- Train with Adam optimizer and BCE loss, 10–20 epochs.
- Inference: output pixel-wise forest-probability map.
- Segment objects with mean-shift (QGIS/OTB).
- For each object, compute mean band reflectances and mean predicted probability; assemble the feature vector.
- Train a linear SVM on the labeled objects.
- Classify all objects as forest/non-forest.
- Assign each object's SVM label to all of its pixels for the final raster.
- Post-process with small-object removal and majority filter.
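The final majority-filter step can be sketched as a simple 3 × 3 vote over the binary raster. The window size and edge-replication padding here are assumptions for illustration; the paper's filter parameters are not preserved in this summary.

```python
import numpy as np

def majority_filter(mask):
    """3x3 majority vote over a binary mask to suppress salt-and-pepper
    noise; edges are padded by replication."""
    padded = np.pad(mask, 1, mode="edge")
    H, W = mask.shape
    out = np.zeros_like(mask)
    for i in range(H):
        for j in range(W):
            window = padded[i:i + 3, j:j + 3]
            out[i, j] = 1 if window.sum() >= 5 else 0  # majority of 9 cells
    return out
```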
A plausible implication is that the approach owes its accuracy to combining spatial coherence and spectral consistency at the object level with pixel-level DL inference, yielding interpretable maps built entirely with accessible open-source software (Haque et al., 29 Dec 2025).