SMAGNet: Adaptive Flood Mapping Network
- The paper introduces SMAGNet, a multimodal model that adaptively fuses SAR and incomplete MSI data for robust post-flood water extent mapping.
- It employs dual-stream ResNet50 encoders, a spatially masked adaptive gated fusion module, and a weight-shared U-Net decoder to handle missing MSI pixels.
- SMAGNet achieves superior segmentation accuracy and resilience, outperforming prior multimodal baselines under varying MSI completeness.
The Spatially Masked Adaptive Gated Network (SMAGNet) is a multimodal deep learning model for post-flood water extent mapping that adaptively integrates synthetic aperture radar (SAR) and incomplete multispectral (MSI) data. SMAGNet is designed to enhance mapping robustness and accuracy in scenarios where MSI coverage is spatially incomplete, a common occurrence in remote sensing due to clouds or acquisition constraints. The network achieves this via a dual-stream encoder architecture, a spatially masked adaptive gated fusion module, and a weight-shared decoder, yielding strong performance across varying MSI data availability conditions (Lee et al., 31 Dec 2025).
1. Architectural Overview
SMAGNet is structured around three principal components: dual-stream encoders, spatially masked adaptive gated feature fusion modules (SMAG-FFM), and a U-Net-style weight-shared decoder. The architecture enables scale-wise multimodal feature fusion, robust handling of missing MSI pixels, and flexible inference in the presence or absence of MSI data.
- Dual-stream encoders: Two separate ResNet50 backbones independently extract hierarchical features from SAR (VV, VH polarizations) and MSI (R, G, B, NIR bands) inputs. At each scale $\ell$, the encoders output feature maps $F_\ell^{\mathrm{SAR}}$ and $F_\ell^{\mathrm{MSI}} \in \mathbb{R}^{C_\ell \times H_\ell \times W_\ell}$, where $C_\ell$ is the channel count and $H_\ell \times W_\ell$ represents the downsampled spatial dimensions.
- Spatially Masked Adaptive Gated Feature Fusion: At each encoder scale, the two feature streams are fused via a gating mechanism modulated by a binary spatial mask identifying valid MSI pixels. This module enables dynamic, spatially-adaptive weighting of MSI and SAR representations.
- Weight-shared decoder: A five-stage upsampling block reconstructs segmentation masks from the fused features. Both SAR-only and fused (SAR+MSI) representations are decoded using shared parameters, supporting seamless fallback to SAR-only inference.
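As a minimal sketch of this dataflow, the following toy implementation shows how the three components connect. All names here (`toy_encoder`, `downsample_mask`, `smag_ffm`) are illustrative NumPy stand-ins for the ResNet50 streams, the learned gating convolution, and the U-Net decoder, not the paper's implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def toy_encoder(x, n_scales=3):
    """Stand-in for one ResNet50 stream: collapse bands with a mean
    (in place of the conv stem), then 2x2 average-pool per scale."""
    x = x.mean(axis=0, keepdims=True)
    feats = []
    for _ in range(n_scales):
        c, h, w = x.shape
        x = x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))
        feats.append(x)
    return feats

def downsample_mask(mask, hw):
    """Nearest-neighbour downsampling of the binary MSI validity mask."""
    h, w = hw
    ys = np.arange(h) * mask.shape[0] // h
    xs = np.arange(w) * mask.shape[1] // w
    return mask[np.ix_(ys, xs)]

def smag_ffm(f_sar, f_msi, mask_l):
    """Spatially masked adaptive gated fusion; the learned gating
    convolution is approximated here by a channel mean."""
    gate = sigmoid(np.concatenate([f_sar, f_msi]).mean(axis=0))
    gate = gate * mask_l                    # hard-exclude invalid MSI pixels
    return gate * f_msi + (1.0 - gate) * f_sar

def forward(sar, msi, msi_mask, n_scales=3):
    sar_feats = toy_encoder(sar, n_scales)
    msi_feats = toy_encoder(np.where(msi_mask > 0, msi, 0.0), n_scales)
    fused = [smag_ffm(s, m, downsample_mask(msi_mask, s.shape[1:]))
             for s, m in zip(sar_feats, msi_feats)]
    # Weight-shared decoder stub: the same reduction serves both heads.
    decode = lambda feats: feats[-1].mean(axis=0)
    return decode(sar_feats), decode(fused)
```

With an all-zero mask the fused head reduces exactly to the SAR-only head, which mirrors the fallback behavior described above.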
2. Spatial Masking for Handling Missing MSI Data
Handling incompleteness in MSI data is central to SMAGNet. The spatial mask $M$ is derived from the raw MSI imagery, with $M(p) = 1$ if pixel $p$ is valid in the MSI bands, and $0$ otherwise (e.g., NaN for missing data). This binary mask is downsampled to each encoder scale via nearest-neighbor or average pooling, yielding $M_\ell$.
The spatial mask directly gates the MSI feature stream during fusion. By multiplying the adaptive gate map $g_\ell$ with the spatial mask $M_\ell$, the network enforces that at locations with missing MSI, the downstream feature fusion is determined solely by the SAR stream; all MSI features at those positions are zeroed out. This masking operation preserves the integrity of the feature fusion pipeline under arbitrary MSI missingness patterns.
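A small sketch of this mask derivation, assuming missing MSI pixels are encoded as NaN and implementing both pooling variants mentioned above (function names are illustrative):

```python
import numpy as np

def validity_mask(msi):
    """M(p) = 1 where the pixel is valid in every MSI band, else 0.
    Missing data is assumed to be encoded as NaN, as in the text."""
    return (~np.isnan(msi).any(axis=0)).astype(np.float32)

def pool_mask(mask, factor, mode="nearest"):
    """Downsample the mask to an encoder scale. 'nearest' keeps a hard
    0/1 mask; 'average' yields fractional validity that could be
    re-binarised with a threshold."""
    h, w = mask.shape
    if mode == "nearest":
        return mask[::factor, ::factor]
    blocks = mask.reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

msi = np.ones((4, 8, 8), dtype=np.float32)
msi[:, :4, :4] = np.nan           # a missing quadrant (e.g., cloud masked)
m = validity_mask(msi)            # 8x8 binary mask
m2 = pool_mask(m, 2)              # 4x4 mask at the next encoder scale
```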
3. Adaptive Gated Fusion Mechanism
SMAGNet's feature fusion is governed by a gated mechanism at each scale. SAR and MSI encoder features are concatenated along the channel axis and passed through a convolution followed by a sigmoid nonlinearity, yielding a spatial gate $g_\ell$:

$$g_\ell = \sigma\!\left(\mathrm{Conv}\!\left(\left[F_\ell^{\mathrm{SAR}};\, F_\ell^{\mathrm{MSI}}\right]\right)\right)$$

This gate is spatially masked:

$$\tilde{g}_\ell = g_\ell \odot M_\ell,$$

where $\odot$ denotes element-wise multiplication. The masked gate modulates the MSI features, while its complement modulates the SAR features:

$$F_\ell^{\mathrm{fused}} = \tilde{g}_\ell \odot F_\ell^{\mathrm{MSI}} + \left(1 - \tilde{g}_\ell\right) \odot F_\ell^{\mathrm{SAR}}$$

At positions where $\tilde{g}_\ell = 0$ (i.e., MSI missing or gate suppressed), the output reverts to $F_\ell^{\mathrm{SAR}}$ exclusively. Elsewhere, it computes a convex combination, adaptively leveraging both modalities.
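A per-pixel numerical check of this fusion rule; the gate values below are arbitrary stand-ins for the output of the learned gating convolution:

```python
import numpy as np

# Toy per-scale tensors: 1 channel, 2x2 spatial grid.
f_sar = np.array([[[1.0, 2.0], [3.0, 4.0]]])
f_msi = np.array([[[10.0, 20.0], [30.0, 40.0]]])
g     = np.array([[0.8, 0.5], [0.5, 0.9]])   # adaptive gate (sigmoid output)
m     = np.array([[1.0, 1.0], [0.0, 0.0]])   # bottom row: MSI missing

g_masked = g * m                              # spatial masking of the gate
f_fused = g_masked * f_msi + (1.0 - g_masked) * f_sar

# Where the mask is 0 the output is exactly the SAR feature...
assert np.array_equal(f_fused[0, 1], f_sar[0, 1])
# ...and where it is 1, a convex combination: 0.8*10 + 0.2*1 = 8.2
assert np.isclose(f_fused[0, 0, 0], 8.2)
```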
4. Training Workflow and Loss Function
SMAGNet was developed and evaluated on the C2S-MS Floods dataset (900 paired images from 18 flood events; train/val/test split of 60/20/20% by event). The SAR inputs undergo orbit correction, calibration, speckle filtering, terrain flattening, and decibel scaling; MSI bands are pre-masked for clouds and missing data, with per-channel normalization.
Training employs both SAR-only and SAR+MSI fusion heads, with a unified loss function. For ground-truth labels $y$, modality-specific predictions $\hat{y}^{\mathrm{SAR}}$ and $\hat{y}^{\mathrm{fused}}$, and a balancing weight $\lambda$, the objective is:

$$\mathcal{L} = \lambda\, \ell\!\left(\hat{y}^{\mathrm{SAR}}, y\right) + \left(1 - \lambda\right) \ell\!\left(\hat{y}^{\mathrm{fused}}, y\right),$$

where $\ell$ denotes the per-pixel segmentation loss applied to each decoder head.
No auxiliary IoU, Dice, or regularization losses are incorporated. Data augmentation consists of randomized flips and crops; model selection leverages validation IoU.
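A sketch of the two-head objective, assuming per-pixel binary cross-entropy as the segmentation loss and an equal balancing weight `lam`; both are assumptions for illustration, as the summary does not fix them:

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    """Per-pixel binary cross-entropy, averaged over the image."""
    p = np.clip(pred, eps, 1.0 - eps)
    return float(-(target * np.log(p) + (1 - target) * np.log(1 - p)).mean())

def smagnet_loss(pred_sar, pred_fused, target, lam=0.5):
    """Unified two-head objective: one term per decoder head,
    balanced by lam (value assumed here)."""
    return lam * bce(pred_sar, target) + (1 - lam) * bce(pred_fused, target)

y = np.array([[1.0, 0.0], [1.0, 1.0]])
loss = smagnet_loss(np.full_like(y, 0.7), np.full_like(y, 0.9), y)
```

Because the decoder is weight-shared, both terms backpropagate through the same parameters, which is what keeps the SAR-only path trained alongside the fused path.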
5. Quantitative Performance and Robustness Analysis
SMAGNet exhibits superior segmentation accuracy compared to both unimodal and prior multimodal baselines on C2S-MS Floods. Its metrics under default (full-MSI) conditions are:
| Model | IoU (%) | Precision (%) | Recall (%) | OA (%) |
|---|---|---|---|---|
| U-Net (SAR only) | 79.65 ± 0.96 | 90.81 ± 0.83 | 86.64 ± 1.03 | 96.52 ± 0.18 |
| MFGFUnet (prior) | 85.96 ± 0.57 | — | — | — |
| SMAGNet | 86.47 ± 0.61 | 93.05 ± 0.76 | 92.45 ± 0.83 | 97.73 ± 0.11 |
Under controlled missingness of MSI data (0% to 100% of pixels missing), the IoU of SMAGNet monotonically decreases from 86.47% to 79.53%, a 6.94-point absolute drop; competing multimodal models degrade by 13–17 points. When MSI is completely unavailable, SMAGNet's IoU is statistically indistinguishable from the SAR-only U-Net (Mann–Whitney U test), while competing multimodal models fall significantly below it.
SMAGNet's resilience arises from the spatial mask, which enforces hard exclusion of invalid MSI features, and a weight-shared decoder, which is consistently exposed to both SAR-only and fused modalities during training. This ensures seamless adaptation to pure SAR inputs as needed.
6. Implementation Specifics
SMAGNet's dual encoders use standard ResNet50 blocks through conv5_x. At each scale, the gating convolution operates on the concatenated encoder features. Decoder stages progressively upsample via transposed convolutions; skip connections concatenate features from the corresponding encoder layers. Each decoder block comprises an up-convolution, skip concatenation, and two convolution + ReLU layers, outputting 256, 128, 64, 32, and 16 feature channels at successive stages.
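The decoder's shape bookkeeping can be traced with a toy stand-in; nearest-neighbour upsampling and channel-group averaging replace the learned transposed convolutions, and all names and input sizes are illustrative:

```python
import numpy as np

def upsample2x(f):
    """Nearest-neighbour 2x upsampling, standing in for the
    transposed convolution of each decoder stage."""
    return f.repeat(2, axis=1).repeat(2, axis=2)

def decoder_stage(f, skip, out_ch):
    """Up-sample, concatenate the encoder skip, then reduce channels
    (channel reduction mimicked by averaging out_ch groups)."""
    f = upsample2x(f)
    f = np.concatenate([f, skip], axis=0)
    groups = np.array_split(f, out_ch, axis=0)
    return np.stack([g.mean(axis=0) for g in groups])

# Channel plan from the text: 256, 128, 64, 32, 16.
channels = [256, 128, 64, 32, 16]
f = np.zeros((512, 4, 4))                      # bottleneck stand-in
skips = [np.zeros((c, 8 * 2**i, 8 * 2**i)) for i, c in enumerate(channels)]
for skip, c in zip(skips, channels):
    f = decoder_stage(f, skip, c)              # five stages of 2x upsampling
```

After five stages the spatial resolution grows by 2^5 while channels taper to 16, matching the progression listed above.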
Predictions are generated by two parallel, weight-shared decoder heads for SAR-only and SAR+MSI fused features. All modules are implemented in PyTorch and run on NVIDIA RTX A5000 GPUs with 256 GB RAM. Source code is available at github.com/ASUcicilab/SMAGNet (Lee et al., 31 Dec 2025).
7. Significance and Applicability
SMAGNet addresses the practical challenge of spatially incomplete MSI in multimodal remote sensing by enforcing spatially explicit feature selection during fusion. The approach is theoretically extensible to other multimodal remote sensing tasks with intermittent data coverage. Its robust performance under varying MSI availability, with minimal performance regression compared to SAR-only baselines, underscores its suitability for real-world flood extent mapping where data completeness cannot be ensured.