MGF-Skip: Mamba-Guided Fusion Skip Connection
- The paper demonstrates that MGF-Skip enhances encoder-decoder segmentation by using decoder-based gating to suppress noise and improve feature integration.
- MGF-Skip combines convolution-derived spatial gating, residual reinforcement, and concatenation, enabling precise boundary localization in medical image segmentation.
- Empirical results reveal that MGF-Skip boosts performance metrics like IoU and DSC compared to traditional concatenation and attention modules.
The Mamba-Guided Fusion Skip Connection (MGF-Skip) is a skip connection module designed to enhance semantic and spatial feature integration in encoder–decoder architectures for medical image segmentation. Introduced in HyM-UNet, MGF-Skip leverages semantically rich decoder features as gating signals to suppress noise in encoder features while reinforcing fine structural detail through a residual pathway. This configuration enables improved boundary localization and noise robustness compared to conventional concatenation or attention-based skips, directly addressing the semantic gap and feature misalignment common in deep convolutional architectures (Chen et al., 22 Nov 2025).
1. Architectural Placement and Input–Output Relations
MGF-Skip replaces the standard feature concatenation employed at each encoder–decoder interface of U-Net-derived segmentation models. At a given decoder stage $i$, the module receives:
- $E_i$: the spatially high-resolution, texture-rich feature map from encoder stage $i$.
- $D_{i+1}$: a lower-resolution, semantically strong decoder feature from the previous decoding stage.
$D_{i+1}$ undergoes upsampling, $\tilde{D}_{i+1} = \mathrm{Up}(D_{i+1})$, to align its spatial dimensions with $E_i$. The MGF-Skip module fuses $E_i$ and $\tilde{D}_{i+1}$ to produce the fused feature $F_i$, which is then consumed by the subsequent decoder block. Fig. 1 of (Chen et al., 22 Nov 2025) visually delineates this interface in the overall architecture.
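A minimal sketch of this interface, assuming PyTorch-style tensors; the channel widths and the bilinear 2× upsampling operator are illustrative assumptions, not details taken from the paper:

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: encoder stage i has twice the spatial resolution of the
# incoming decoder feature; channel widths here are assumptions.
E_i = torch.randn(1, 64, 128, 128)    # encoder feature: high resolution, texture-rich
D_next = torch.randn(1, 64, 64, 64)   # decoder feature: low resolution, semantically strong

# Spatial alignment before fusion (bilinear 2x upsampling is an assumption).
D_up = F.interpolate(D_next, scale_factor=2, mode="bilinear", align_corners=False)
assert D_up.shape[-2:] == E_i.shape[-2:]

# The MGF-Skip module (Sections 2-3) fuses E_i and D_up into the feature F_i that
# replaces the plain concatenation skip consumed by the next decoder block.
```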
2. Mathematical Formulation
The fusion process in MGF-Skip involves spatial gating, feature filtering, residual reinforcement, and concatenation:
- Gate Computation: The upsampled decoder feature $\tilde{D}_{i+1}$ undergoes a sequence of convolutions and nonlinearities to derive a spatially-aware gate:
  $$G_i = \sigma\big(\mathrm{Conv}_2(\mathrm{ReLU}(\mathrm{Conv}_1(\tilde{D}_{i+1})))\big)$$
  - $\mathrm{Conv}_1$ maps the decoder channels (stride 1, padding 1).
  - $\mathrm{Conv}_2$ maps to the gate channel dimension.
  - $\sigma$ denotes the sigmoid activation.
- Feature Filtering: The encoder features are modulated spatially:
  $$\hat{E}_i = G_i \odot E_i,$$
  where $\odot$ denotes element-wise multiplication.
- Residual Reinforcement: The gated and original encoder features are summed:
  $$E_i^{\mathrm{res}} = \hat{E}_i + E_i$$
- Final Fusion: The residual-augmented encoder feature is concatenated with the upsampled decoder feature:
  $$F_i = \mathrm{Concat}\big(E_i^{\mathrm{res}},\, \tilde{D}_{i+1}\big)$$
This fused feature $F_i$ serves as input to the next decoder stage.
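The following functional sketch traces these four steps on dummy tensors; the 3×3 kernels (implied by stride 1 and padding 1), channel widths, and bilinear upsampling are illustrative assumptions rather than details confirmed by the paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

C = 64                                     # assumed channel width at this stage
E_i = torch.randn(1, C, 128, 128)          # encoder feature (high resolution)
D_next = torch.randn(1, C, 64, 64)         # decoder feature from the previous stage

# Spatial alignment: upsample the decoder feature to the encoder resolution.
D_up = F.interpolate(D_next, scale_factor=2, mode="bilinear", align_corners=False)

# Gate computation: Conv -> ReLU -> Conv -> Sigmoid (3x3 kernels assumed).
conv1 = nn.Conv2d(C, C, kernel_size=3, stride=1, padding=1)
conv2 = nn.Conv2d(C, C, kernel_size=3, stride=1, padding=1)
G_i = torch.sigmoid(conv2(F.relu(conv1(D_up))))

E_hat = G_i * E_i                          # feature filtering: element-wise spatial modulation
E_res = E_hat + E_i                        # residual reinforcement: preserve low-level detail
F_i = torch.cat([E_res, D_up], dim=1)      # final fusion, shape (1, 2*C, 128, 128)
```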
3. Implementation Specifics
MGF-Skip’s gating branch comprises:
- a convolution (stride 1, padding 1)
- ReLU activation
- a second convolution
- Sigmoid activation
Batch normalization is intentionally omitted from the gating branch to retain spatial sensitivity. Channel dimensions and kernel sizes are configured to ensure alignment between the gating mask $G_i$ and the encoder feature $E_i$. The residual addition constitutes an internal skip within the module, ensuring information preservation.
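A module-level sketch consistent with this description, assuming 3×3, channel-preserving convolutions and a PyTorch framing; it illustrates the described structure under those assumptions and is not the authors' implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MGFSkip(nn.Module):
    """Sketch of Mamba-Guided Fusion Skip: decoder-guided gating of encoder features."""

    def __init__(self, channels: int):
        super().__init__()
        # Gating branch: Conv -> ReLU -> Conv -> Sigmoid; batch normalization is
        # deliberately omitted, as in the paper, to retain spatial sensitivity.
        self.gate = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, enc: torch.Tensor, dec: torch.Tensor) -> torch.Tensor:
        # Align the decoder feature to the encoder's spatial resolution.
        dec_up = F.interpolate(dec, size=enc.shape[-2:], mode="bilinear", align_corners=False)
        g = self.gate(dec_up)                # spatially-aware gate G_i
        enc_res = g * enc + enc              # gated filtering + residual reinforcement
        return torch.cat([enc_res, dec_up], dim=1)  # fused feature for the next decoder block

# Quick shape check with illustrative sizes.
m = MGFSkip(64)
out = m(torch.randn(1, 64, 128, 128), torch.randn(1, 64, 64, 64))
print(out.shape)  # torch.Size([1, 128, 128, 128])
```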
4. Fusion Strategy and Functional Rationale
MGF-Skip is architected to achieve two critical objectives:
- Suppression of Background Noise: The sigmoid-activated gating mask $G_i$, derived from deep decoder features, adaptively down-weights locations in $E_i$ associated with image noise or irrelevant structures (e.g., artifacts from hair occlusion or specular highlights). This mechanism is fully differentiable and trained end-to-end, with no fixed thresholds or hard masking.
- Preservation and Enhancement of Boundaries: Given that aggressive gating may suppress true boundary information, the residual connection ensures that low-level spatial details, essential for precise contour delineation, are continuously reinforced.
All convolutional parameters within the gating branch are learnable and jointly optimized with the rest of HyM-UNet.
5. Integration within HyM-UNet and Training Configuration
MGF-Skip is instantiated at each of the four encoder–decoder transition stages. Inputs are drawn from a hybrid encoder: early stages utilize CNN blocks for local texture modeling, while deeper stages employ Visual Mamba modules for long-range context. The training protocol for HyM-UNet with MGF-Skip includes:
- Optimizer: AdamW
- Learning rate schedule: cosine annealing over 200 epochs
- Batch size: 24
- Input resolution:
- Loss: a composite objective combining Dice, binary cross-entropy, and boundary-aware edge losses
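A hedged sketch of this training configuration follows; the learning rate, weight decay, loss weights, spatial size, and the stand-in model, data, and boundary term are placeholder assumptions, while the optimizer choice, cosine schedule, 200-epoch horizon, batch size of 24, and loss composition follow the protocol above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Dummy stand-ins so the sketch runs end-to-end; in practice these would be
# HyM-UNet with MGF-Skip and the ISIC 2018 data loader.
model = nn.Conv2d(3, 1, kernel_size=3, padding=1)        # placeholder for HyM-UNet
images = torch.randn(24, 3, 128, 128)                    # batch size 24; spatial size reduced for the sketch
masks = torch.randint(0, 2, (24, 1, 128, 128)).float()

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-2)   # lr/decay values assumed
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)    # cosine anneal over 200 epochs
bce = nn.BCEWithLogitsLoss()

def dice_loss(logits, target, eps=1e-6):
    """Soft Dice loss on sigmoid probabilities."""
    prob = torch.sigmoid(logits)
    inter = (prob * target).sum(dim=(2, 3))
    union = prob.sum(dim=(2, 3)) + target.sum(dim=(2, 3))
    return 1.0 - ((2 * inter + eps) / (union + eps)).mean()

def edge_loss(logits, target):
    """Illustrative boundary-aware term: BCE restricted to a one-pixel mask boundary
    band (an assumption, not the paper's exact edge loss)."""
    kernel = torch.ones(1, 1, 3, 3)
    dilated = (F.conv2d(target, kernel, padding=1) > 0).float()
    eroded = (F.conv2d(target, kernel, padding=1) == 9).float()
    edge = dilated - eroded
    per_pixel = F.binary_cross_entropy_with_logits(logits, target, reduction="none")
    return (per_pixel * edge).mean()

for epoch in range(2):                                   # 200 epochs in the paper; shortened here
    logits = model(images)
    loss = dice_loss(logits, masks) + bce(logits, masks) + 0.5 * edge_loss(logits, masks)  # weights assumed
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```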
6. Empirical Performance and Comparative Analysis
Table 1 of (Chen et al., 22 Nov 2025) demonstrates that HyM-UNet incorporating MGF-Skip achieves superior results on the ISIC 2018 test set across the reported metrics:
- Intersection over Union (IoU)
- Dice Similarity Coefficient (DSC)
- 95th percentile Hausdorff Distance (HD95): $4.03$ mm
- Precision
These scores surpass those of U-Net, CE-Net, and Attention U-Net. Table 2 presents an ablation: integrating MGF-Skip into a U-Net baseline increases both IoU and DSC over the plain-skip baseline, outperforming SE and CBAM attention modules placed in the same positions.
MGF-Skip's design incurs minimal additional parameter overhead relative to standard skip connections or spatial/channel attention modules, while maintaining per-image inference latency on an RTX 3090 that is competitive with or lower than ViT-based architectures.
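For reference, the overlap metrics reported above can be computed from binary masks as follows; this is a generic sketch of the IoU and DSC definitions, not the paper's evaluation code:

```python
import torch

def iou_and_dsc(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6):
    """Compute IoU and Dice on binary {0,1} masks of matching shape."""
    pred, target = pred.float(), target.float()
    inter = (pred * target).sum()
    union = pred.sum() + target.sum() - inter
    iou = (inter + eps) / (union + eps)
    dsc = (2 * inter + eps) / (pred.sum() + target.sum() + eps)
    return iou.item(), dsc.item()

# Example with random masks of illustrative size.
p = (torch.rand(2, 1, 128, 128) > 0.5).int()
t = (torch.rand(2, 1, 128, 128) > 0.5).int()
print(iou_and_dsc(p, t))
```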
7. Distinctions from Prior Skip Connection Mechanisms
MGF-Skip contrasts with alternative strategies as follows:
| Skip Module | Gating Source | Residual Path | Attention Dimension |
|---|---|---|---|
| Standard concat | None | None | None |
| SE | Encoder/Decoder | None | Channel |
| CBAM | Encoder/Decoder | None | Channel + Spatial |
| MGF-Skip | Decoder (deep) | Encoder | Spatial (decoder-guided) |
Standard concatenation indiscriminately propagates all encoder features, including noise. SE and CBAM introduce channel/spatial attention but lack explicit residual reinforcement and do not leverage decoder semantics for gating. MGF-Skip's use of the decoder as a gating source, with end-to-end-learned dynamic suppression and explicit residual addition, distinguishes it in enhancing ambiguous boundary regions and suppressing artifacts, as shown by empirical evaluation (Chen et al., 22 Nov 2025).
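To make the contrast concrete, the sketch below compares a standard concatenation skip with the decoder-guided MGF-Skip path, under the same illustrative assumptions (shapes, 3×3 kernels, bilinear upsampling) as the earlier sketches:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

enc = torch.randn(1, 64, 128, 128)   # encoder feature (may carry noise/artifacts)
dec = torch.randn(1, 64, 64, 64)     # semantically strong decoder feature
dec_up = F.interpolate(dec, scale_factor=2, mode="bilinear", align_corners=False)

# Standard skip: propagate every encoder activation, noise included.
standard = torch.cat([enc, dec_up], dim=1)

# MGF-Skip path (as in Sections 2-3): decoder-derived spatial gate plus residual.
gate_net = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.Sigmoid(),
)
g = gate_net(dec_up)                                # decoder decides, per location, what passes
fused = torch.cat([g * enc + enc, dec_up], dim=1)   # gated + residually reinforced encoder features
```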