MCD-Net: Efficient Moraine Segmentation
- MCD-Net is a lightweight deep learning framework designed for optical-only moraine segmentation using high-resolution satellite imagery.
- It integrates MobileNetV2, CBAM, and DeepLabV3+ to achieve competitive metrics (62.3% mIoU, 72.8% Dice) at a reduced computational cost.
- The framework is validated on a large-scale, manually annotated dataset, enabling automated mapping crucial for palaeoglaciology and climate change studies.
MCD-Net is a lightweight deep learning framework specifically designed for optical-only moraine segmentation in high-resolution satellite imagery. It enables automated mapping of glacial landforms, which is central to palaeoglaciological reconstruction and to assessing climate-driven geomorphic change, particularly where high-quality digital elevation models (DEMs) or multi-sensor data are lacking. MCD-Net integrates a MobileNetV2 encoder, a Convolutional Block Attention Module (CBAM), and a DeepLabV3+ decoder head. This combination yields competitive segmentation accuracy (62.3% mean Intersection over Union; 72.8% Dice coefficient) at lower computational cost than heavier backbones such as ResNet152 and Xception, establishing MCD-Net as a reproducible, deployable baseline for moraine-body segmentation from optical imagery (Cao et al., 5 Jan 2026).
1. Optical-Only Moraine Segmentation Dataset
MCD-Net was developed and validated on a large-scale, manually annotated remote sensing dataset. The dataset comprises 3,340 orthorectified Google Earth image tiles (1024×1024 px; 0.5–2 m/pixel) acquired between 2020 and 2025 over glaciated regions of Sichuan and Yunnan, China (26°–32° N, 98°–104° E; elevations 2,800–5,200 m). The images include cirque, valley, and piedmont moraines captured under challenging conditions such as shadows, low contrast, and variable vegetation. Three geomorphologists produced binary segmentation masks distinguishing “background” (0) from “moraine body” (1). Ridges, initially labeled as a third class, were merged into the moraine class because of sub-pixel-scale ambiguity and high inter-annotator variance (~±2 px). The train/test split (approximately 9:1; 2,630 training and 293 test tiles) is stratified geographically so that both subsets represent diverse valley geomorphologies.
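A geographically stratified split can be sketched by grouping tiles by region and holding out roughly 10% of each group for testing. This is a minimal sketch under assumptions: the region labels, tile IDs, and the `stratified_split` helper are hypothetical, and the source does not state which grouping variable was used for stratification.

```python
import random
from collections import defaultdict


def stratified_split(tiles, test_fraction=0.1, seed=0):
    """Split (tile_id, region) pairs so each region contributes ~test_fraction to the test set.

    `region` is a hypothetical grouping label (e.g. a valley or massif),
    not a field documented for the MCD-Net dataset.
    """
    by_region = defaultdict(list)
    for tile_id, region in tiles:
        by_region[region].append(tile_id)

    rng = random.Random(seed)
    train, test = [], []
    for region in sorted(by_region):  # sorted so the split is reproducible
        ids = by_region[region][:]
        rng.shuffle(ids)
        n_test = max(1, round(len(ids) * test_fraction))  # every region appears in the test set
        test.extend(ids[:n_test])
        train.extend(ids[n_test:])
    return train, test


# Hypothetical example: 100 tiles spread over three regions.
region_names = ["cirque_A", "valley_B", "piedmont_C"]
tiles = [(f"tile_{i:03d}", region_names[i % 3]) for i in range(100)]
train, test = stratified_split(tiles)
print(len(train), len(test))
```

Splitting within each region, rather than over the pooled tile list, prevents a small region from being absent from the test set by chance.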
2. Segmentation Metrics
Model performance is evaluated using confusion-matrix-based statistics:
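- Intersection over Union (IoU) for a single class (e.g., moraine body):
  $$\mathrm{IoU} = \frac{TP}{TP + FP + FN}$$
  where TP = true positives, FP = false positives, FN = false negatives.
- Mean IoU (mIoU) over the $C$ classes:
  $$\mathrm{mIoU} = \frac{1}{C}\sum_{c=1}^{C}\mathrm{IoU}_c$$
- Dice coefficient (equivalent to the F1 score of the overlap):
  $$\mathrm{Dice} = \frac{2\,TP}{2\,TP + FP + FN}$$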
Auxiliary measures include Precision, Recall, and Pixel Accuracy. These metrics facilitate rigorous comparison with other segmentation frameworks.
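The metric definitions translate directly into confusion-matrix counts. The sketch below computes them on flattened label lists; whether counts are pooled over the whole test set or averaged per tile is an assumption, since the source does not say, and the toy masks are illustrative only.

```python
def confusion_counts(pred, target, cls):
    """Count TP, FP, FN for one class over flattened label lists."""
    tp = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
    fp = sum(1 for p, t in zip(pred, target) if p == cls and t != cls)
    fn = sum(1 for p, t in zip(pred, target) if p != cls and t == cls)
    return tp, fp, fn


def segmentation_metrics(pred, target, num_classes=2, positive=1):
    """IoU/Dice/precision/recall for the `positive` class, plus mIoU and pixel accuracy."""
    nan = float("nan")
    ious = []
    for c in range(num_classes):
        tp, fp, fn = confusion_counts(pred, target, c)
        denom = tp + fp + fn
        ious.append(tp / denom if denom else nan)  # NaN if the class is absent everywhere
    valid = [v for v in ious if v == v]  # drop NaN entries before averaging
    tp, fp, fn = confusion_counts(pred, target, positive)
    return {
        "iou": ious[positive],
        "miou": sum(valid) / len(valid) if valid else nan,
        "dice": 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else nan,
        "precision": tp / (tp + fp) if (tp + fp) else nan,
        "recall": tp / (tp + fn) if (tp + fn) else nan,
        "pixel_accuracy": sum(p == t for p, t in zip(pred, target)) / len(target),
    }


target = [0, 0, 1, 1, 1, 0, 0, 1]  # toy ground truth (1 = moraine body)
pred = [0, 1, 1, 1, 0, 0, 0, 1]  # toy prediction
print(segmentation_metrics(pred, target))
```

For a single class the two overlap scores are linked by $\mathrm{Dice} = 2\,\mathrm{IoU}/(1 + \mathrm{IoU})$, so Dice is always at least as large as IoU; this is why the reported Dice (72.8%) exceeds the mIoU (62.3%).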
3. MCD-Net Architectural Design
MCD-Net adopts an encoder–attention–decoder paradigm that balances efficiency and representational power.
Encoder (MobileNetV2):
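- Processes the input with stacked inverted residual blocks built from depthwise separable convolutions, yielding a compact feature map $F \in \mathbb{R}^{C \times H' \times W'}$, spatially downsampled relative to the $H \times W$ input.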
Attention (CBAM):
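- Applies channel and spatial attention via CBAM. For an encoder feature map $F$, channel attention is computed as:
  $$M_c(F) = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big)$$
  where $\sigma$ is the sigmoid function and the MLP is shared between the two spatially pooled descriptors.
- Spatial attention is calculated as:
  $$M_s(F) = \sigma\big(f^{7\times7}([\mathrm{AvgPool}(F);\ \mathrm{MaxPool}(F)])\big)$$
  where pooling is taken along the channel axis, $[\,\cdot\,;\,\cdot\,]$ denotes concatenation, and $f^{7\times7}$ is a $7\times7$ convolution.
- CBAM-refined features are then:
  $$F' = M_c(F) \otimes F, \qquad F'' = M_s(F') \otimes F'$$
  where $\otimes$ denotes element-wise multiplication with broadcasting.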
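The CBAM equations can be traced on a toy feature map in plain Python. This is a minimal sketch, not MCD-Net's implementation: it follows the bias-free form of the CBAM equations, and the reduction ratio $r = 2$, the $4 \times 5 \times 5$ map, and the random weights are hypothetical placeholders. A production version would use framework tensors and convolution layers.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(F, W1, W2):
    """M_c(F) = sigmoid(MLP(AvgPool(F)) + MLP(MaxPool(F))); F is C channels of H x W lists."""
    avg = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in F]
    mx = [max(max(row) for row in ch) for ch in F]

    def mlp(v):  # shared two-layer MLP: C -> C/r (ReLU) -> C
        hidden = [max(0.0, sum(w * x for w, x in zip(row, v))) for row in W1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in W2]

    return [sigmoid(a + b) for a, b in zip(mlp(avg), mlp(mx))]


def spatial_attention(F, K):
    """M_s(F) = sigmoid(conv7x7([mean over channels; max over channels])), zero padding; K is 2 x 7 x 7."""
    C, H, W = len(F), len(F[0]), len(F[0][0])
    avg = [[sum(F[c][i][j] for c in range(C)) / C for j in range(W)] for i in range(H)]
    mx = [[max(F[c][i][j] for c in range(C)) for j in range(W)] for i in range(H)]
    pad = len(K[0]) // 2
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            s = 0.0
            for m, kernel in zip((avg, mx), K):  # one kernel slice per pooled map
                for di, krow in enumerate(kernel):
                    for dj, w in enumerate(krow):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < H and 0 <= x < W:  # positions outside the map count as zero
                            s += w * m[y][x]
            out[i][j] = sigmoid(s)
    return out


def cbam(F, W1, W2, K):
    mc = channel_attention(F, W1, W2)
    F1 = [[[mc[c] * v for v in row] for row in F[c]] for c in range(len(F))]  # F' = M_c(F) * F
    ms = spatial_attention(F1, K)
    # F'' = M_s(F') * F', the spatial map broadcast over channels
    return [[[ms[i][j] * row[j] for j in range(len(row))] for i, row in enumerate(ch)] for ch in F1]


# Hypothetical toy sizes: C = 4 channels, reduction r = 2, 5 x 5 map, random weights.
rng = random.Random(0)
C, r, H, W = 4, 2, 5, 5
F = [[[rng.uniform(-1, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]
W1 = [[rng.gauss(0, 0.5) for _ in range(C)] for _ in range(C // r)]
W2 = [[rng.gauss(0, 0.5) for _ in range(C // r)] for _ in range(C)]
K = [[[rng.gauss(0, 0.1) for _ in range(7)] for _ in range(7)] for _ in range(2)]
out = cbam(F, W1, W2, K)
```

Because both attention maps lie in $(0, 1)$, CBAM can only rescale features, never amplify them: the output keeps the input shape and every element shrinks in magnitude.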
ASPP and Decoder (DeepLabV3+):
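- ASPP applies atrous convolutions at rates {1, 6, 12, 18}, together with global average pooling, to the CBAM-refined features $F''$, aggregating multi-scale context and projecting it to 256-D features:
  $$F_{\text{ASPP}} = \mathrm{Conv}_{1\times1}\Big(\big[\,A_{1}(F'');\ A_{6}(F'');\ A_{12}(F'');\ A_{18}(F'');\ \mathrm{Up}(\mathrm{GAP}(F''))\,\big]\Big) \in \mathbb{R}^{256 \times H' \times W'}$$
  where $A_r$ is an atrous convolution with rate $r$, $[\,\cdot\,;\,\cdot\,]$ denotes channel concatenation, and $\mathrm{Up}(\mathrm{GAP}(\cdot))$ broadcasts the globally pooled vector back to the feature-map size.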
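- The decoder upsamples the ASPP output to 1/4 of the input resolution, fuses it with low-level encoder features, and produces per-pixel class logits, which are upsampled to the input size; a softmax over these logits gives the moraine segmentation probabilities.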
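| Component | Core Methodology | Output Shape |
|---|---|---|
| Encoder | MobileNetV2 with inverted residuals and depthwise separable convolutions | $C \times H' \times W'$ (downsampled feature map) |
| Attention | CBAM: channel + spatial attention | $C \times H' \times W'$ (unchanged) |
| ASPP | Multi-rate atrous convolutions + global pooling | $256 \times H' \times W'$ |
| Decoder | DeepLabV3+ head and upsampling | $K \times H \times W$ (per-pixel logits for $K$ classes) |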
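The shape flow through the four components can be made concrete for a 1024×1024 tile. Every number below except the 256-D ASPP width is an assumption: output stride 16 and low-level features at stride 4 are common DeepLabV3+ settings, 48 low-level projection channels follow the original DeepLabV3+ design, 320 encoder channels is the last MobileNetV2 bottleneck width, and 2 output classes match the binary masks. The helper `deeplabv3plus_shapes` is hypothetical.

```python
def deeplabv3plus_shapes(h, w, enc_channels, num_classes=2,
                         output_stride=16, low_level_stride=4,
                         aspp_channels=256, low_level_proj=48):
    """Trace (channels, height, width) through encoder -> CBAM -> ASPP -> decoder (assumed settings)."""
    h_enc, w_enc = h // output_stride, w // output_stride
    h_low, w_low = h // low_level_stride, w // low_level_stride
    return [
        ("encoder", (enc_channels, h_enc, w_enc)),
        ("cbam", (enc_channels, h_enc, w_enc)),  # attention rescales but keeps the shape
        ("aspp", (aspp_channels, h_enc, w_enc)),  # multi-rate branches projected to 256-D
        ("decoder_fusion", (aspp_channels + low_level_proj, h_low, w_low)),  # upsampled x4, concat low-level
        ("logits", (num_classes, h, w)),  # final upsampling to input size
    ]


for name, shape in deeplabv3plus_shapes(1024, 1024, enc_channels=320):
    print(f"{name:15s} {shape}")
```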
4. Mathematical Formulation and Losses
The core mathematical building blocks of MCD-Net use the following formalisms:
Depthwise Separable Convolutions and Inverted Residuals (MobileNetV2):
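- Expansion: a $1\times1$ convolution widens the input $X$ from $C_{\text{in}}$ to $tC_{\text{in}}$ channels (expansion factor $t$):
  $$X_e = \mathrm{ReLU6}\big(\mathrm{BN}(\mathrm{Conv}_{1\times1}(X))\big)$$
- Depthwise convolution: one $3\times3$ filter per channel:
  $$X_d = \mathrm{ReLU6}\big(\mathrm{BN}(\mathrm{DWConv}_{3\times3}(X_e))\big)$$
- Projection: a linear $1\times1$ bottleneck (no activation) back to $C_{\text{out}}$ channels:
  $$Y = \mathrm{BN}\big(\mathrm{Conv}_{1\times1}(X_d)\big)$$
- Residual connection: $Y \leftarrow X + Y$ (if stride $= 1$ and $C_{\text{in}} = C_{\text{out}}$).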
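The saving that motivates depthwise separable convolutions can be checked with a multiply-accumulate (MAC) count. A standard $k\times k$ convolution costs $H W k^2 C_{\text{in}} C_{\text{out}}$, whereas a depthwise $k\times k$ convolution followed by a $1\times1$ pointwise convolution costs $H W (k^2 C_{\text{in}} + C_{\text{in}} C_{\text{out}})$, a ratio of $1/C_{\text{out}} + 1/k^2$. The layer sizes below are hypothetical, and biases and batch normalization are ignored.

```python
def standard_conv_macs(h, w, c_in, c_out, k=3):
    """MACs of a standard k x k convolution (stride 1, 'same' padding, no bias)."""
    return h * w * k * k * c_in * c_out


def depthwise_separable_macs(h, w, c_in, c_out, k=3):
    """Depthwise k x k conv (one filter per channel) followed by a 1 x 1 pointwise conv."""
    depthwise = h * w * k * k * c_in
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise


# Hypothetical layer: 64 x 64 feature map, 64 -> 64 channels, 3 x 3 kernel.
std = standard_conv_macs(64, 64, 64, 64)
sep = depthwise_separable_macs(64, 64, 64, 64)
print(std, sep, round(sep / std, 4))  # the ratio equals 1/c_out + 1/k**2
```

With $k = 3$ the ratio is dominated by the $1/9$ term, so the factorized layer costs roughly 8 to 9 times less than the standard one for any reasonably wide layer.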
ASPP (Atrous Spatial Pyramid Pooling):
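- Atrous (dilated) convolution with rate $r$ is defined as:
  $$y[i] = \sum_{k} x[i + r \cdot k]\, w[k]$$
  so $r = 1$ recovers standard convolution, and larger $r$ enlarges the receptive field without adding parameters.
- Responses at the different rates are aggregated and concatenated with a global-pooling branch.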
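The atrous definition can be checked in one dimension. This sketch evaluates $y[i] = \sum_k x[i + r k]\, w[k]$ at 'valid' positions only (DeepLab-style layers pad the input so the output keeps its size, which is omitted here), on a toy ramp signal with a hypothetical 3-tap kernel.

```python
def atrous_conv1d(x, w, rate):
    """y[i] = sum_k x[i + rate * k] * w[k], evaluated at 'valid' positions only (no padding)."""
    span = rate * (len(w) - 1)  # receptive field minus one
    return [sum(x[i + rate * k] * w[k] for k in range(len(w)))
            for i in range(len(x) - span)]


x = list(range(20))  # toy ramp signal 0, 1, ..., 19
w = [1, 0, -1]  # 3-tap difference kernel
for r in (1, 6):
    y = atrous_conv1d(x, w, r)
    print(r, 1 + r * (len(w) - 1), y[:3])  # rate, receptive field, first outputs
```

The same three weights cover 3 samples at $r = 1$ and 13 samples at $r = 6$, which is how ASPP gathers context at several scales without extra parameters.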
Segmentation Loss:
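- Weighted cross-entropy (equal weights, $w_c = 1$ for all classes):
  $$\mathcal{L}_{\text{CE}} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} w_c\, y_{i,c} \log p_{i,c}$$
  where $N$ is the number of pixels, $y_{i,c}$ the one-hot label, and $p_{i,c}$ the softmax probability.
- Optionally, Dice loss may be combined with it:
  $$\mathcal{L} = \mathcal{L}_{\text{CE}} + \lambda\,\mathcal{L}_{\text{Dice}}, \qquad \mathcal{L}_{\text{Dice}} = 1 - \frac{2\sum_i p_i g_i}{\sum_i p_i + \sum_i g_i}$$
  where $p_i$ is the predicted moraine probability, $g_i \in \{0, 1\}$ the ground truth, and $\lambda$ a weighting factor.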
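Both loss terms can be computed on a handful of pixels to see how they interact. This is a minimal sketch on plain lists: the small `eps` guard and the `smooth` constant are assumed stabilizers, not values from the source, and `lam = 1.0` is a hypothetical choice for $\lambda$.

```python
import math


def weighted_cross_entropy(probs, labels, weights=None):
    """Mean over pixels of -w_y * log p_y; probs[i] holds per-class probabilities, labels[i] the true class."""
    weights = weights or [1.0] * len(probs[0])  # equal weights, as in the text
    eps = 1e-7  # assumed guard against log(0)
    return -sum(weights[y] * math.log(max(p[y], eps)) for p, y in zip(probs, labels)) / len(labels)


def dice_loss(p_fg, g, smooth=1e-6):
    """1 - 2*sum(p*g) / (sum(p) + sum(g)); `smooth` is an assumed stabilizer."""
    inter = sum(p * t for p, t in zip(p_fg, g))
    return 1.0 - (2.0 * inter + smooth) / (sum(p_fg) + sum(g) + smooth)


def combined_loss(probs, labels, lam=1.0, positive=1):
    """CE + lam * Dice, with Dice taken on the moraine (positive) class probabilities."""
    p_fg = [p[positive] for p in probs]
    g = [1 if y == positive else 0 for y in labels]
    return weighted_cross_entropy(probs, labels) + lam * dice_loss(p_fg, g)


# Toy softmax outputs for four pixels: [P(background), P(moraine)].
probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]
labels = [0, 1, 1, 1]
print(round(combined_loss(probs, labels), 4))
```

Cross-entropy is averaged per pixel, so it is dominated by the abundant background, while the Dice term is a ratio over the moraine class as a whole; adding it is one common way to counter class imbalance.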
5. Training Regimen and Implementation Protocol
MCD-Net is implemented in PyTorch and trained on an NVIDIA RTX 5060 Ti GPU (16 GB). The training regimen is as follows:
- Optimizer: AdamW (learning rate , weight decay ) with cosine-annealing learning rate schedule.
- Batch size: 16.
- Data augmentation: random scaling (0.5–2.0×), random flips, random rotations (±30°), and Gaussian blur.
- Input preprocessing: normalization to [0,1], conversion to channel-first format.
- Training runs for up to 200 epochs, with early stopping (patience = 15 epochs) based on validation mIoU.
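The cosine-annealing schedule and patience-based early stopping can be sketched without a framework. Assumptions: `BASE_LR` is a placeholder because the source's learning rate is not given here, the annealing period is taken to be the 200-epoch budget, and the validation curve is synthetic. In PyTorch, the same schedule is provided by `torch.optim.lr_scheduler.CosineAnnealingLR`.

```python
import math


def cosine_annealing_lr(epoch, base_lr, total_epochs, min_lr=0.0):
    """lr_t = min_lr + 0.5 * (base_lr - min_lr) * (1 + cos(pi * t / T))."""
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * epoch / total_epochs))


class EarlyStopping:
    """Signal a stop when validation mIoU has not improved for `patience` consecutive epochs."""

    def __init__(self, patience=15):
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, val_miou):
        if val_miou > self.best:
            self.best, self.bad_epochs = val_miou, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience  # True means stop training


# Hypothetical run: placeholder learning rate and a synthetic validation curve
# that improves for 40 epochs and then plateaus.
BASE_LR, MAX_EPOCHS = 1e-3, 200
stopper = EarlyStopping(patience=15)
for epoch in range(MAX_EPOCHS):
    lr = cosine_annealing_lr(epoch, BASE_LR, MAX_EPOCHS)  # would be written into optimizer.param_groups
    val_miou = min(epoch, 40) / 40 * 0.6  # synthetic validation mIoU
    if stopper.step(val_miou):
        break
print(epoch, round(stopper.best, 3))
```

With a plateau after epoch 40, training stops 15 epochs later, well before the 200-epoch cap; the checkpoint worth keeping is the one that produced `stopper.best`.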
6. Computational Efficiency and Benchmarking
MCD-Net achieves competitive accuracy while maintaining low parameter and computation budgets:
| Model | Params (M) | GFLOPs (1024×1024 input) | mIoU (%) | Dice (%) |
|---|---|---|---|---|
| MCD-Net (MobileNetV2+CBAM) | 5.83 | 105.7 | 62.3 | 72.8 |
| ResNet152+CBAM | 76.0 | 433.9 | 59.8 | 70.7 |
| Xception+CBAM | 55.3 | 333.7 | 56.7 | 66.4 |
- MCD-Net reduces FLOPs by over 75% compared to ResNet152+CBAM and over 68% versus Xception+CBAM.
- Inference runs at ~15 fps on 1024×1024 tiles, versus <5 fps for ResNet152+CBAM, both on the RTX 5060 Ti.
This suggests that MCD-Net is particularly suitable for large-scale regional moraine mapping and real-time applications on moderate hardware.
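The percentage claims can be rechecked directly from the benchmark table in this section. The throughput part uses a hypothetical tile count and treats the quoted fps as sustained, ignoring I/O and tiling overhead.

```python
# Figures copied from the benchmark table in this section.
models = {
    "MCD-Net (MobileNetV2+CBAM)": {"params_m": 5.83, "gflops": 105.7},
    "ResNet152+CBAM": {"params_m": 76.0, "gflops": 433.9},
    "Xception+CBAM": {"params_m": 55.3, "gflops": 333.7},
}


def reduction(ours, theirs):
    """Relative reduction of `ours` versus `theirs`, in percent."""
    return 100.0 * (1.0 - ours / theirs)


def processing_time_s(n_tiles, fps):
    """Idealized wall-clock time to segment n_tiles at a sustained frame rate."""
    return n_tiles / fps


base = models["MCD-Net (MobileNetV2+CBAM)"]
for name, m in models.items():
    if m is base:
        continue
    print(f"{name}: FLOPs -{reduction(base['gflops'], m['gflops']):.1f}%, "
          f"params -{reduction(base['params_m'], m['params_m']):.1f}%")

# Hypothetical regional survey of 10,000 tiles at ~15 fps versus 5 fps.
print(processing_time_s(10_000, 15), processing_time_s(10_000, 5))
```

The FLOP reductions come out at about 75.6% and 68.3%, consistent with the "over 75%" and "over 68%" figures, and the parameter count falls by roughly 92% and 89%.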
7. Limitations and Outlook
Several practical limitations are identified:
- Ridge delineation is less accurate for features narrower than 3 px and in areas affected by severe weathering or vegetation encroachment.
- Spectral ambiguity and mosaicking artifacts introduce misclassifications, especially under extreme illumination.
- Small, isolated moraines (<0.2% of area) remain under-detected due to class imbalance and spatial resolution.
Recommended future research directions include:
- Incorporation of domain adaptation or self-supervised pre-training to improve generalization under variable sensor or environmental conditions.
- Sensor fusion approaches (e.g., combining DEM or SAR data) for enhanced ridge detection.
- Use of higher-resolution UAV imagery to resolve features that are sub-pixel at satellite resolution.
- Development of specialized small-object attention mechanisms or multi-scale feature pyramids for closely spaced or diminutive moraine bodies.
For implementation specifics (e.g., layer activations, normalization strategies), see CBAM [Woo et al., 2018], MobileNetV2 [Sandler et al., 2018], and DeepLabV3+ [Chen et al., 2018]. The dataset and source code are publicly available to support reproducibility and further extension (Cao et al., 5 Jan 2026).