
MCD-Net: Efficient Moraine Segmentation

Updated 12 January 2026
  • MCD-Net is a lightweight deep learning framework designed for optical-only moraine segmentation using high-resolution satellite imagery.
  • It integrates MobileNetV2, CBAM, and DeepLabV3+ to achieve competitive metrics (62.3% mIoU, 72.8% Dice) at a reduced computational cost.
  • The framework is validated on a large-scale, manually annotated dataset, enabling automated mapping crucial for palaeoglaciology and climate change studies.

MCD-Net is a lightweight deep learning framework designed for optical-only moraine segmentation in high-resolution satellite imagery. It enables automated mapping of glacial landforms, a task central to palaeoglaciological reconstruction and to assessing climate-driven geomorphic change, particularly where high-quality digital elevation models (DEMs) or multi-sensor data are lacking. MCD-Net integrates a MobileNetV2 encoder, a Convolutional Block Attention Module (CBAM), and a DeepLabV3+ decoder head. This combination yields competitive segmentation accuracy (62.3% mean Intersection over Union; 72.8% Dice coefficient) at reduced computational cost compared with heavyweight alternatives, establishing MCD-Net as a reproducible, deployable baseline for moraine-body segmentation from optical imagery (Cao et al., 5 Jan 2026).

1. Optical-Only Moraine Segmentation Dataset

MCD-Net was developed and validated using a large-scale, manually annotated remote sensing dataset. The dataset comprises 3,340 orthorectified Google Earth image tiles (1024×1024 px; 0.5–2 m/pixel) collected from glaciated regions of Sichuan and Yunnan, China (26°–32° N, 98°–104° E, elevations 2,800–5,200 m) spanning 2020–2025. These images include cirque, valley, and piedmont moraines captured under challenging conditions—shadows, low contrast, and variable vegetation. Labeling was performed by three geomorphologists, yielding binary segmentation masks distinguishing “background” (0) from “moraine body” (1). Ridges, initially labeled as a third class, were merged with the moraine class due to sub-pixel-scale ambiguity and high inter-annotator variance (~±2 px). The dataset is stratified geographically (9:1 train:test split; 2,630 train / 293 test) to ensure representation of diverse valley geomorphologies.

2. Segmentation Metrics

Model performance is evaluated using confusion-matrix-based statistics:

  • Intersection over Union (IoU) for a single class (e.g., moraine body):

\mathrm{IoU} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP} + \mathrm{FN}}

where TP = true positives, FP = false positives, FN = false negatives.

  • Mean IoU (mIoU) over C = 2 classes:

\mathrm{mIoU} = \frac{1}{C} \sum_{c=1}^{C} \frac{\mathrm{TP}_c}{\mathrm{TP}_c + \mathrm{FP}_c + \mathrm{FN}_c}

  • Dice coefficient (F1 score overlap):

\mathrm{Dice} = \frac{2\,\mathrm{TP}}{2\,\mathrm{TP} + \mathrm{FP} + \mathrm{FN}}

Auxiliary measures include Precision, Recall, and Pixel Accuracy. These metrics facilitate rigorous comparison with other segmentation frameworks.
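These definitions map directly onto code. The following is a minimal NumPy sketch of the three metrics; the helper names are illustrative and not taken from the paper's released code:

```python
import numpy as np

def confusion_counts(pred, target, cls):
    """TP/FP/FN counts for one class from integer label masks."""
    p = (pred == cls)
    t = (target == cls)
    tp = np.logical_and(p, t).sum()
    fp = np.logical_and(p, ~t).sum()
    fn = np.logical_and(~p, t).sum()
    return tp, fp, fn

def iou(pred, target, cls):
    """IoU = TP / (TP + FP + FN) for a single class."""
    tp, fp, fn = confusion_counts(pred, target, cls)
    return tp / (tp + fp + fn)

def mean_iou(pred, target, num_classes=2):
    """Unweighted mean of per-class IoU over all classes."""
    return sum(iou(pred, target, c) for c in range(num_classes)) / num_classes

def dice(pred, target, cls=1):
    """Dice = 2 TP / (2 TP + FP + FN), the F1 overlap score."""
    tp, fp, fn = confusion_counts(pred, target, cls)
    return 2 * tp / (2 * tp + fp + fn)
```

For a 2×2 toy mask with one false-positive moraine pixel, `iou(..., 1)` gives 0.5 while `dice(..., 1)` gives 2/3, illustrating that Dice is always at least as large as IoU for the same prediction.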

3. MCD-Net Architectural Design

MCD-Net adopts an encoder–attention–decoder paradigm that balances efficiency and representational power.

Encoder (MobileNetV2):

  • Processes input I \in \mathbb{R}^{H\times W\times 3} with stacked inverted residual blocks and depthwise separable convolutions, yielding a compact feature map F_{base}.

Attention (CBAM):

  • Applies channel and spatial attention via CBAM. Channel attention is computed as:

M_c(F) = \sigma\bigl(\mathrm{MLP}(\mathrm{GAP}(F)) + \mathrm{MLP}(\mathrm{GMP}(F))\bigr)

  • Spatial attention is calculated as:

M_s(F) = \sigma\bigl(f^{7\times7}([\mathrm{AvgPool}_c(F);\,\mathrm{MaxPool}_c(F)])\bigr)

  • CBAM-refined features are then:

F_{att} = M_s(F) \odot M_c(F) \odot F
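A minimal PyTorch sketch of this attention stage is shown below. It follows the sequential channel-then-spatial ordering of the original CBAM paper; the `reduction=16` bottleneck and 7×7 spatial kernel are the conventional defaults, not values stated for MCD-Net:

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Channel then spatial attention (Woo et al., 2018)."""
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        # Shared MLP applied to both GAP and GMP channel descriptors.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # 7x7 conv over stacked channel-wise avg/max maps.
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, f):
        b, c, _, _ = f.shape
        # Channel attention: M_c = sigmoid(MLP(GAP(F)) + MLP(GMP(F)))
        gap = f.mean(dim=(2, 3))
        gmp = f.amax(dim=(2, 3))
        m_c = torch.sigmoid(self.mlp(gap) + self.mlp(gmp)).view(b, c, 1, 1)
        f = f * m_c
        # Spatial attention: M_s = sigmoid(conv7x7([AvgPool_c; MaxPool_c]))
        avg_c = f.mean(dim=1, keepdim=True)
        max_c = f.amax(dim=1, keepdim=True)
        m_s = torch.sigmoid(self.spatial(torch.cat([avg_c, max_c], dim=1)))
        return f * m_s
```

Note that the standard implementation computes M_s on the channel-refined features rather than on the raw F, which is a common reading of the combined formula above.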

ASPP and Decoder (DeepLabV3+):

  • ASPP applies atrous convolutions at rates {1, 6, 12, 18} and global average pooling, aggregating multi-scale context and projecting to 256-D features:

F_{aspp} = \mathrm{Conv}_{1\times1}\bigl[\mathrm{Conv}_{1\times1}(F_{att}),\ \{\mathrm{DilConv}_{3\times3,\,r_k}(F_{att})\}_{k=1}^{3},\ \mathrm{GAP}(F_{att})\bigr]

  • The decoder upsamples F_{aspp} to 1/4 input resolution, fuses it with low-level encoder features, and produces per-pixel class probabilities from which the binary moraine mask \hat{Y} \in \{0,1\}^{H\times W} is obtained by argmax.
| Component | Core Methodology | Output |
|---|---|---|
| Encoder | MobileNetV2 with inverted residual / depthwise separable convolutions | F_{base} |
| Attention | CBAM: channel + spatial attention | F_{att} |
| ASPP | Multi-rate atrous convolutions + global pooling | F_{aspp} |
| Decoder | DeepLabV3+ head and upsampling | \hat{Y} |
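The ASPP stage described above can be sketched in PyTorch as follows. This is a minimal sketch: `out_ch=256` matches the 256-D projection in the formula, the rates follow the stated {1, 6, 12, 18} (rate 1 realized as the 1×1 branch), and BatchNorm/activation layers are omitted for brevity:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Atrous Spatial Pyramid Pooling: rates {1, 6, 12, 18} + image pooling."""
    def __init__(self, in_ch, out_ch=256, rates=(6, 12, 18)):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, out_ch, 1)  # 1x1 branch (rate 1)
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates
        )
        self.pool_proj = nn.Conv2d(in_ch, out_ch, 1)  # after global pooling
        # Final 1x1 projection over the concatenated branches.
        self.project = nn.Conv2d(out_ch * (len(rates) + 2), out_ch, 1)

    def forward(self, f):
        h, w = f.shape[-2:]
        feats = [self.branch1(f)] + [b(f) for b in self.branches]
        # Image-level context: GAP -> 1x1 conv -> upsample back to (h, w).
        pooled = self.pool_proj(f.mean(dim=(2, 3), keepdim=True))
        feats.append(F.interpolate(pooled, size=(h, w),
                                   mode="bilinear", align_corners=False))
        return self.project(torch.cat(feats, dim=1))
```

The dilation rate directly sets the padding so that every branch preserves spatial resolution before concatenation.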

4. Mathematical Formulation and Losses

The core mathematical building blocks of MCD-Net use the following formalisms:

Depthwise Separable Convolutions and Inverted Residuals (MobileNetV2):

  • Expansion:

F_{exp} = \mathrm{Conv}_{1\times1}\bigl(F_{in};\, W_1\bigr), \quad C_{exp} = t \cdot C

  • Depthwise convolution:

\mathrm{DW}(F_{exp})_{i,j,c} = \sum_{(u,v)\in\mathcal{N}} F_{exp}(i + u,\,j + v,\,c)\,K_c(u,v)

  • Projection:

F_{out} = \mathrm{Conv}_{1\times1}\bigl(\mathrm{DW}(F_{exp});\,W_2\bigr)

  • A residual connection is added when the stride is 1 and C_{in} = C_{out}.
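The expansion, depthwise, and projection steps compose into the MobileNetV2 inverted residual block; a minimal PyTorch sketch (default expansion factor t = 6 as in MobileNetV2):

```python
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    """MobileNetV2 block: 1x1 expand -> 3x3 depthwise -> 1x1 linear project."""
    def __init__(self, c_in, c_out, t=6, stride=1):
        super().__init__()
        c_exp = t * c_in
        self.use_residual = (stride == 1 and c_in == c_out)
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_exp, 1, bias=False),              # expansion
            nn.BatchNorm2d(c_exp), nn.ReLU6(inplace=True),
            nn.Conv2d(c_exp, c_exp, 3, stride=stride, padding=1,
                      groups=c_exp, bias=False),                # depthwise
            nn.BatchNorm2d(c_exp), nn.ReLU6(inplace=True),
            nn.Conv2d(c_exp, c_out, 1, bias=False),             # projection
            nn.BatchNorm2d(c_out),                              # no activation
        )

    def forward(self, x):
        y = self.block(x)
        return x + y if self.use_residual else y
```

Setting `groups=c_exp` makes the 3×3 convolution depthwise (one filter per channel), which is where the parameter and FLOP savings over a standard convolution come from.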

ASPP (Atrous Spatial Pyramid Pooling):

  • Atrous convolution is defined as:

[F *_r k](i,j) = \sum_{(u,v)\in\mathcal{N}} F(i + r\,u,\, j + r\,v)\,k(u,v)

where r is the dilation rate and \mathcal{N} the kernel support.

  • Aggregates multiscale responses and concatenates with global pooling.

Segmentation Loss:

  • Weighted cross-entropy (equal weights w_0 = w_1 = 0.5):

\mathcal{L}_{CE} = -\sum_{i=1}^{N} \sum_{c\in\{0,1\}} w_c\,y_{i,c} \log p_{i,c}

  • Optionally, Dice loss may be combined:

\mathcal{L}_{Dice} = 1 - \frac{2\sum_i p_i y_i + \epsilon}{\sum_i p_i + \sum_i y_i + \epsilon}
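Under the same notation, both losses can be sketched in NumPy (the helper names are illustrative; `probs` are per-pixel softmax outputs, `target` integer labels, and `p`/`y` foreground probabilities and binary labels):

```python
import numpy as np

def weighted_ce(probs, target, w=(0.5, 0.5), eps=1e-7):
    """Weighted cross-entropy: -sum_i sum_c w_c * y_ic * log p_ic.
    probs: (N, 2) softmax outputs; target: (N,) labels in {0, 1}."""
    onehot = np.eye(2)[target]          # y_ic as one-hot rows
    weights = np.array(w)
    return -np.sum(weights * onehot * np.log(probs + eps))

def dice_loss(p, y, eps=1.0):
    """Soft Dice loss with smoothing term eps in numerator and denominator."""
    inter = 2.0 * np.sum(p * y) + eps
    union = np.sum(p) + np.sum(y) + eps
    return 1.0 - inter / union
```

The smoothing term ε keeps the Dice loss defined when both prediction and label are empty, which matters for tiles that contain no moraine pixels at all.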

5. Training Regimen and Implementation Protocol

MCD-Net is implemented in PyTorch, trained on an NVIDIA RTX 5060 Ti (16 GB). The training regimen features:

  • Optimizer: AdamW (learning rate 1\times10^{-4}, weight decay 1\times10^{-4}) with a cosine-annealing learning-rate schedule.
  • Batch size: 16.
  • Data augmentation: random scaling (0.5–2.0×), random flips, random rotations (±30°), and Gaussian blur.
  • Input preprocessing: normalization to [0,1], conversion to channel-first format.
  • Training duration is up to 200 epochs with early stopping (patience = 15 epochs) based on validation mIoU.
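The regimen above can be sketched as follows. This is a skeleton only: the `Conv2d` model is a stand-in for the real network, and data loading, augmentation, and the actual validation pass are omitted (`val_miou` is a placeholder):

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

model = torch.nn.Conv2d(3, 2, 1)  # stand-in for the real MCD-Net model
optimizer = AdamW(model.parameters(), lr=1e-4, weight_decay=1e-4)
scheduler = CosineAnnealingLR(optimizer, T_max=200)  # anneal over 200 epochs

best_miou, patience, bad_epochs = 0.0, 15, 0
for epoch in range(200):
    # ... one pass over the training tiles, then a validation pass ...
    val_miou = 0.0  # placeholder for the real validation mIoU
    scheduler.step()
    if val_miou > best_miou:
        best_miou, bad_epochs = val_miou, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break  # early stopping on validation mIoU
```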

6. Computational Efficiency and Benchmarking

MCD-Net achieves competitive accuracy while maintaining low parameter and computation budgets:

| Model | Params (M) | GFLOPs (1024²) | mIoU (%) | Dice (%) |
|---|---|---|---|---|
| MCD-Net (MobileNetV2 + CBAM) | 5.83 | 105.7 | 62.3 | 72.8 |
| ResNet152 + CBAM | 76.0 | 433.9 | 59.8 | 70.7 |
| Xception + CBAM | 55.3 | 333.7 | 56.7 | 66.4 |
  • MCD-Net reduces FLOPs by over 75% compared to ResNet152+CBAM and over 68% versus Xception+CBAM.
  • Achieves faster inference (~15 fps on 1024×1024 tiles) compared to <5 fps for ResNet152+CBAM on RTX 5060 Ti.

This suggests that MCD-Net is particularly suitable for large-scale regional moraine mapping and real-time applications on moderate hardware.

7. Limitations and Outlook

Several practical limitations are identified:

  • Delineation accuracy for ridges is constrained for features narrower than 3 px and in settings of severe weathering or vegetation encroachment.
  • Spectral ambiguity and mosaicking artifacts introduce misclassifications, especially under extreme illumination.
  • Small, isolated moraines (<0.2% of area) remain under-detected due to class imbalance and spatial resolution.

Recommended future research directions include:

  • Incorporation of domain adaptation or self-supervised pre-training to improve generalization under variable sensor or environmental conditions.
  • Sensor fusion approaches (e.g., combining DEM or SAR data) for enhanced ridge detection.
  • Deployment of higher-resolution UAV or drone imagery for resolving sub-pixel features.
  • Development of specialized small-object attention mechanisms or multi-scale feature pyramids for closely spaced or diminutive moraine bodies.

For implementation specifics (e.g., layer activations, normalization strategies), see CBAM [Woo et al., 2018], MobileNetV2 [Sandler et al., 2018], and DeepLabV3+ [Chen et al., 2018]. The dataset and source code are publicly available to support reproducibility and further extension (Cao et al., 5 Jan 2026).
