SMAGNet: Adaptive Flood Mapping Network
- The paper introduces SMAGNet, a multimodal model that adaptively fuses SAR and incomplete MSI data for robust post-flood water extent mapping.
- It employs dual-stream ResNet50 encoders, a spatially masked adaptive gated fusion module, and a weight-shared U-Net decoder to handle missing MSI pixels.
- SMAGNet achieves superior segmentation accuracy and resilience, outperforming prior multimodal baselines under varying MSI completeness.
The Spatially Masked Adaptive Gated Network (SMAGNet) is a multimodal deep learning model for post-flood water extent mapping that adaptively integrates synthetic aperture radar (SAR) and incomplete multispectral (MSI) data. SMAGNet is designed to enhance mapping robustness and accuracy in scenarios where MSI coverage is spatially incomplete, a common occurrence in remote sensing due to clouds or acquisition constraints. The network achieves this via a dual-stream encoder architecture, a spatially masked adaptive gated fusion module, and a weight-shared decoder, yielding strong performance across varying MSI data availability conditions (Lee et al., 31 Dec 2025).
1. Architectural Overview
SMAGNet is structured around three principal components: dual-stream encoders, spatially masked adaptive gated feature fusion modules (SMAG-FFM), and a U-Net-style weight-shared decoder. The architecture enables scale-wise multimodal feature fusion, robust handling of missing MSI pixels, and flexible inference in the presence or absence of MSI data.
- Dual-stream encoders: Two separate ResNet50 backbones independently extract hierarchical features from SAR (VV, VH polarizations) and MSI (R, G, B, NIR bands) inputs. At each scale $\ell$, the encoders output feature maps $F_\ell^{\mathrm{SAR}}$ and $F_\ell^{\mathrm{MSI}} \in \mathbb{R}^{C_\ell \times H_\ell \times W_\ell}$, where $C_\ell$ is the channel count and $H_\ell \times W_\ell$ represents the downsampled spatial dimensions.
- Spatially Masked Adaptive Gated Feature Fusion: At each encoder scale, the two feature streams are fused via a gating mechanism modulated by a binary spatial mask identifying valid MSI pixels. This module enables dynamic, spatially-adaptive weighting of MSI and SAR representations.
- Weight-shared decoder: A five-stage upsampling block reconstructs segmentation masks from the fused features. Both SAR-only and fused (SAR+MSI) representations are decoded using shared parameters, supporting seamless fallback to SAR-only inference.
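As a minimal sketch of this dataflow, the following toy implementation shows how the three components connect. All names here (`toy_encoder`, `downsample_mask`, `smag_ffm`) are illustrative NumPy stand-ins for the ResNet50 streams, the learned gating convolution, and the U-Net decoder, not the paper's implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def toy_encoder(x, n_scales=3):
    """Stand-in for one ResNet50 stream: collapse bands with a mean
    (in place of the conv stem), then 2x2 average-pool per scale."""
    x = x.mean(axis=0, keepdims=True)
    feats = []
    for _ in range(n_scales):
        c, h, w = x.shape
        x = x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))
        feats.append(x)
    return feats

def downsample_mask(mask, hw):
    """Nearest-neighbour downsampling of the binary MSI validity mask."""
    h, w = hw
    ys = np.arange(h) * mask.shape[0] // h
    xs = np.arange(w) * mask.shape[1] // w
    return mask[np.ix_(ys, xs)]

def smag_ffm(f_sar, f_msi, mask_l):
    """Spatially masked adaptive gated fusion; the learned gating
    convolution is approximated here by a channel mean."""
    gate = sigmoid(np.concatenate([f_sar, f_msi]).mean(axis=0))
    gate = gate * mask_l                    # hard-exclude invalid MSI pixels
    return gate * f_msi + (1.0 - gate) * f_sar

def forward(sar, msi, msi_mask, n_scales=3):
    sar_feats = toy_encoder(sar, n_scales)
    msi_feats = toy_encoder(np.where(msi_mask > 0, msi, 0.0), n_scales)
    fused = [smag_ffm(s, m, downsample_mask(msi_mask, s.shape[1:]))
             for s, m in zip(sar_feats, msi_feats)]
    # Weight-shared decoder stub: the same reduction serves both heads.
    decode = lambda feats: feats[-1].mean(axis=0)
    return decode(sar_feats), decode(fused)
```

With an all-zero mask the fused head reduces exactly to the SAR-only head, which mirrors the fallback behavior described above.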
2. Spatial Masking for Handling Missing MSI Data
Handling incompleteness in MSI data is central to SMAGNet. The spatial mask $M$ is derived from the raw MSI imagery, with $M(p) = 1$ if pixel $p$ is valid in the MSI bands, and $0$ otherwise (e.g., NaN for missing data). This binary mask is downsampled to each encoder scale via nearest-neighbor or average pooling, yielding $M_\ell$.
The spatial mask directly gates the MSI feature stream during fusion. By multiplying the adaptive gate map $g_\ell$ with the spatial mask $M_\ell$, the network enforces that at locations with missing MSI, the downstream feature fusion is determined solely by the SAR stream; all MSI features at those positions are zeroed out. This masking operation preserves the integrity of the feature fusion pipeline under arbitrary MSI missingness patterns.
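A small sketch of this mask derivation, assuming missing MSI pixels are encoded as NaN and implementing both pooling variants mentioned above (function names are illustrative):

```python
import numpy as np

def validity_mask(msi):
    """M(p) = 1 where the pixel is valid in every MSI band, else 0.
    Missing data is assumed to be encoded as NaN, as in the text."""
    return (~np.isnan(msi).any(axis=0)).astype(np.float32)

def pool_mask(mask, factor, mode="nearest"):
    """Downsample the mask to an encoder scale. 'nearest' keeps a hard
    0/1 mask; 'average' yields fractional validity that could be
    re-binarised with a threshold."""
    h, w = mask.shape
    if mode == "nearest":
        return mask[::factor, ::factor]
    blocks = mask.reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

msi = np.ones((4, 8, 8), dtype=np.float32)
msi[:, :4, :4] = np.nan           # a missing quadrant (e.g., cloud masked)
m = validity_mask(msi)            # 8x8 binary mask
m2 = pool_mask(m, 2)              # 4x4 mask at the next encoder scale
```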
3. Adaptive Gated Fusion Mechanism
SMAGNet's feature fusion is governed by a gated mechanism at each scale. SAR and MSI encoder features are concatenated along the channel axis and passed through a convolution followed by a sigmoid nonlinearity, yielding a spatial gate $g_\ell$:

$$g_\ell = \sigma\!\left(\mathrm{Conv}\!\left(\left[F_\ell^{\mathrm{SAR}};\, F_\ell^{\mathrm{MSI}}\right]\right)\right)$$

This gate is spatially masked:

$$\tilde{g}_\ell = g_\ell \odot M_\ell,$$

where $\odot$ denotes element-wise multiplication. The masked gate modulates the MSI features, while its complement modulates the SAR features:

$$F_\ell^{\mathrm{fused}} = \tilde{g}_\ell \odot F_\ell^{\mathrm{MSI}} + \left(1 - \tilde{g}_\ell\right) \odot F_\ell^{\mathrm{SAR}}$$

At positions where $\tilde{g}_\ell = 0$ (i.e., MSI missing or gate suppressed), the output reverts to $F_\ell^{\mathrm{SAR}}$ exclusively. Elsewhere, it computes a convex combination, adaptively leveraging both modalities.
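A per-pixel numerical check of this fusion rule; the gate values below are arbitrary stand-ins for the output of the learned gating convolution:

```python
import numpy as np

# Toy per-scale tensors: 1 channel, 2x2 spatial grid.
f_sar = np.array([[[1.0, 2.0], [3.0, 4.0]]])
f_msi = np.array([[[10.0, 20.0], [30.0, 40.0]]])
g     = np.array([[0.8, 0.5], [0.5, 0.9]])   # adaptive gate (sigmoid output)
m     = np.array([[1.0, 1.0], [0.0, 0.0]])   # bottom row: MSI missing

g_masked = g * m                              # spatial masking of the gate
f_fused = g_masked * f_msi + (1.0 - g_masked) * f_sar

# Where the mask is 0 the output is exactly the SAR feature...
assert np.array_equal(f_fused[0, 1], f_sar[0, 1])
# ...and where it is 1, a convex combination: 0.8*10 + 0.2*1 = 8.2
assert np.isclose(f_fused[0, 0, 0], 8.2)
```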
4. Training Workflow and Loss Function
SMAGNet was developed and evaluated on the C2S-MS Floods dataset (900 paired images from 18 flood events; train/val/test split of 60/20/20% by event). The SAR inputs undergo orbit correction, calibration, speckle filtering, terrain flattening, and decibel scaling; MSI bands are pre-masked for clouds and missing data, with per-channel normalization.
Training employs both SAR-only and SAR+MSI fusion heads, with a unified loss function. For ground-truth labels $y$, modality-specific predictions $\hat{y}^{\mathrm{SAR}}$ and $\hat{y}^{\mathrm{fused}}$, and a balancing weight $\lambda$, the objective is:

$$\mathcal{L} = \lambda\, \ell\!\left(\hat{y}^{\mathrm{SAR}}, y\right) + \left(1 - \lambda\right) \ell\!\left(\hat{y}^{\mathrm{fused}}, y\right),$$

where $\ell$ denotes the per-pixel segmentation loss applied to each decoder head.
No auxiliary IoU, Dice, or regularization losses are incorporated. Data augmentation consists of randomized flips and crops; model selection leverages validation IoU.
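A sketch of the two-head objective, assuming per-pixel binary cross-entropy as the segmentation loss and an equal balancing weight `lam`; both are assumptions for illustration, as the summary does not fix them:

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    """Per-pixel binary cross-entropy, averaged over the image."""
    p = np.clip(pred, eps, 1.0 - eps)
    return float(-(target * np.log(p) + (1 - target) * np.log(1 - p)).mean())

def smagnet_loss(pred_sar, pred_fused, target, lam=0.5):
    """Unified two-head objective: one term per decoder head,
    balanced by lam (value assumed here)."""
    return lam * bce(pred_sar, target) + (1 - lam) * bce(pred_fused, target)

y = np.array([[1.0, 0.0], [1.0, 1.0]])
loss = smagnet_loss(np.full_like(y, 0.7), np.full_like(y, 0.9), y)
```

Because the decoder is weight-shared, both terms backpropagate through the same parameters, which is what keeps the SAR-only path trained alongside the fused path.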
5. Quantitative Performance and Robustness Analysis
SMAGNet exhibits superior segmentation accuracy compared to both unimodal and prior multimodal baselines on C2S-MS Floods. Its metrics under default (full-MSI) conditions are:
| Model | IoU (%) | Precision (%) | Recall (%) | OA (%) |
|---|---|---|---|---|
| U-Net (SAR only) | 79.65 ± 0.96 | 90.81 ± 0.83 | 86.64 ± 1.03 | 96.52 ± 0.18 |
| MFGFUnet (prior) | 85.96 ± 0.57 | — | — | — |
| SMAGNet | 86.47 ± 0.61 | 93.05 ± 0.76 | 92.45 ± 0.83 | 97.73 ± 0.11 |
Under controlled missingness of MSI data (0% to 100% of pixels missing), the IoU of SMAGNet monotonically decreases from 86.47% to 79.53%, a 6.94-point absolute drop; competing multimodal models degrade by 13–17 points. When MSI is completely unavailable, SMAGNet's IoU is statistically indistinguishable from the SAR-only U-Net (Mann–Whitney U test), while competing multimodal models fall significantly below it.
SMAGNet's resilience arises from the spatial mask, which enforces hard exclusion of invalid MSI features, and a weight-shared decoder, which is consistently exposed to both SAR-only and fused modalities during training. This ensures seamless adaptation to pure SAR inputs as needed.
6. Implementation Specifics
SMAGNet's dual encoders use standard ResNet50 blocks through conv5_x. At each scale, the gating convolution operates on the concatenated encoder features. Decoder stages progressively upsample via transposed convolutions; skip connections concatenate features from the corresponding encoder layers. Each decoder block comprises an up-convolution, skip concatenation, and two convolution + ReLU layers, outputting 256, 128, 64, 32, and 16 feature channels at successive stages.
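The decoder's shape bookkeeping can be traced with a toy stand-in; nearest-neighbour upsampling and channel-group averaging replace the learned transposed convolutions, and all names and input sizes are illustrative:

```python
import numpy as np

def upsample2x(f):
    """Nearest-neighbour 2x upsampling, standing in for the
    transposed convolution of each decoder stage."""
    return f.repeat(2, axis=1).repeat(2, axis=2)

def decoder_stage(f, skip, out_ch):
    """Up-sample, concatenate the encoder skip, then reduce channels
    (channel reduction mimicked by averaging out_ch groups)."""
    f = upsample2x(f)
    f = np.concatenate([f, skip], axis=0)
    groups = np.array_split(f, out_ch, axis=0)
    return np.stack([g.mean(axis=0) for g in groups])

# Channel plan from the text: 256, 128, 64, 32, 16.
channels = [256, 128, 64, 32, 16]
f = np.zeros((512, 4, 4))                      # bottleneck stand-in
skips = [np.zeros((c, 8 * 2**i, 8 * 2**i)) for i, c in enumerate(channels)]
for skip, c in zip(skips, channels):
    f = decoder_stage(f, skip, c)              # five stages of 2x upsampling
```

After five stages the spatial resolution grows by 2^5 while channels taper to 16, matching the progression listed above.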
Predictions are generated by two parallel, weight-shared decoder heads for SAR-only and SAR+MSI fused features. All modules are implemented in PyTorch and run on NVIDIA RTX A5000 GPUs with 256 GB RAM. Source code is available at github.com/ASUcicilab/SMAGNet (Lee et al., 31 Dec 2025).
7. Significance and Applicability
SMAGNet addresses the practical challenge of spatially incomplete MSI in multimodal remote sensing by enforcing spatially explicit feature selection during fusion. The approach is theoretically extensible to other multimodal remote sensing tasks with intermittent data coverage. Its robust performance under varying MSI availability, with minimal performance regression compared to SAR-only baselines, underscores its suitability for real-world flood extent mapping where data completeness cannot be ensured.