
MGF-Skip: Mamba-Guided Fusion Skip Connection

Updated 29 November 2025
  • The paper demonstrates that MGF-Skip enhances encoder-decoder segmentation by using decoder-based gating to suppress noise and improve feature integration.
  • MGF-Skip employs gated convolutions, residual reinforcement, and concatenation, enabling efficient boundary localization in medical image segmentation.
  • Empirical results reveal that MGF-Skip boosts performance metrics like IoU and DSC compared to traditional concatenation and attention modules.

The Mamba-Guided Fusion Skip Connection (MGF-Skip) is a skip connection module designed to enhance semantic and spatial feature integration in encoder–decoder architectures for medical image segmentation. Introduced in HyM-UNet, MGF-Skip leverages semantically rich decoder features as gating signals to suppress noise in encoder features while enforcing fine structural detail through a residual pathway. This configuration enables improved boundary localization and noise robustness compared to conventional concatenation or attention-based skips, directly addressing the semantic gap and feature misalignment common in deep convolutional architectures (Chen et al., 22 Nov 2025).

1. Architectural Placement and Input–Output Relations

MGF-Skip replaces the standard feature concatenation employed at each encoder–decoder interface of U-Net-derived segmentation models. At a given decoder stage $i$, the module receives:

  • $E_i \in \mathbb{R}^{H_i \times W_i \times C_e}$: the spatially high-resolution, texture-rich feature map from encoder stage $i$.
  • $D_i \in \mathbb{R}^{H_{i+1} \times W_{i+1} \times C_d}$: a lower-resolution, semantically strong decoder feature from the previous decoding stage.

$D_i$ undergoes upsampling, $D_i^{up} = \mathrm{Upsample}(D_i) \in \mathbb{R}^{H_i \times W_i \times C_d}$, aligning spatial dimensions. The MGF-Skip module fuses $E_i$ and $D_i^{up}$ to produce $F_{skip} \in \mathbb{R}^{H_i \times W_i \times (C_e + C_d)}$, which is then consumed by the subsequent decoder block. Fig. 1 of (Chen et al., 22 Nov 2025) visually delineates this interface in the overall architecture.
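This input–output contract can be summarized in a short PyTorch sketch. It is a minimal illustration only: the tensor sizes, NCHW layout, and bilinear upsampling mode are assumptions, not details taken from the paper's code.

```python
# Shape contract of MGF-Skip, illustrated with dummy NCHW tensors.
import torch
import torch.nn.functional as F

B, C_e, C_d, H, W = 2, 64, 128, 64, 64     # illustrative sizes, not from the paper

E_i = torch.randn(B, C_e, H, W)            # encoder feature at stage i
D_i = torch.randn(B, C_d, H // 2, W // 2)  # decoder feature, one stage deeper

# D_i^{up} = Upsample(D_i): align spatial dimensions with E_i.
D_up = F.interpolate(D_i, size=(H, W), mode="bilinear", align_corners=False)

# The fused output F_skip (Section 2) concatenates channels: C_e + C_d.
print(D_up.shape)   # torch.Size([2, 128, 64, 64])
print(C_e + C_d)    # 192 output channels for F_skip
```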

2. Mathematical Formulation

The fusion process in MGF-Skip involves spatial gating, feature filtering, residual reinforcement, and concatenation:

  1. Gate Computation: The upsampled decoder feature $D_i^{up}$ undergoes a sequence of convolutions and nonlinearities to derive a spatially-aware gate:

$$G_i = \sigma\left(\mathrm{Conv}_{1\times1}\left(\mathrm{ReLU}\left(\mathrm{Conv}_{3\times3}(D_i^{up})\right)\right)\right) \in [0,1]^{H_i \times W_i \times C_e}$$

  • $\mathrm{Conv}_{3\times3}$ maps $C_d \rightarrow C_e$ channels, stride 1, padding 1.
  • $\mathrm{Conv}_{1\times1}$ maps $C_e \rightarrow C_e$.
  • $\sigma(\cdot)$ denotes the sigmoid activation.
  2. Feature Filtering: The encoder features are modulated spatially:

$$E_{filt} = E_i \odot G_i$$

where $\odot$ denotes element-wise multiplication.

  3. Residual Reinforcement: The gated and original encoder features are summed:

$$E_{res} = E_i + E_{filt}$$

  4. Final Fusion: The residual-augmented encoder feature is concatenated with the upsampled decoder feature:

$$F_{skip} = \mathrm{Concat}[E_{res},\; D_i^{up}]$$

This fused feature serves as input to the next decoder stage.
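The four steps map directly onto a compact module. The following PyTorch sketch is one plausible realization under the conventions stated above; the class name, argument names, and upsampling choice are hypothetical, and the authors' released implementation may differ in detail.

```python
# Minimal sketch of the MGF-Skip fusion steps described above (illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MGFSkip(nn.Module):
    def __init__(self, c_enc: int, c_dec: int):
        super().__init__()
        # Gating branch: 3x3 conv (C_d -> C_e) -> ReLU -> 1x1 conv (C_e -> C_e)
        # -> sigmoid; batch normalization is deliberately omitted (Section 3).
        self.conv3 = nn.Conv2d(c_dec, c_enc, kernel_size=3, stride=1, padding=1)
        self.conv1 = nn.Conv2d(c_enc, c_enc, kernel_size=1)

    def forward(self, e_i: torch.Tensor, d_i: torch.Tensor) -> torch.Tensor:
        # Align the decoder feature to the encoder's spatial resolution.
        d_up = F.interpolate(d_i, size=e_i.shape[-2:], mode="bilinear",
                             align_corners=False)
        # 1. Gate computation: G_i = sigmoid(Conv1x1(ReLU(Conv3x3(D_up)))).
        g_i = torch.sigmoid(self.conv1(F.relu(self.conv3(d_up))))
        # 2. Feature filtering: E_filt = E_i ⊙ G_i.
        e_filt = e_i * g_i
        # 3. Residual reinforcement: E_res = E_i + E_filt.
        e_res = e_i + e_filt
        # 4. Final fusion: concatenate along channels -> C_e + C_d.
        return torch.cat([e_res, d_up], dim=1)
```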

3. Implementation Specifics

MGF-Skip’s gating branch comprises:

  • $3\times3$ convolution ($C_d \rightarrow C_e$, stride 1, padding 1)
  • ReLU activation
  • $1\times1$ convolution ($C_e \rightarrow C_e$)
  • Sigmoid activation

Batch normalization is intentionally omitted in the gating branch to retain spatial sensitivity. Channel dimensions and kernel sizes are configured to ensure alignment between the gating mask ($G_i$) and the encoder feature ($E_i$). The residual addition $E_i + E_{filt}$ constitutes an internal skip within the module, ensuring information preservation.
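Continuing the sketch from Section 2, a quick forward pass confirms that the gate is produced at the encoder's resolution and channel count, and that the fused output carries $C_e + C_d$ channels (sizes again illustrative):

```python
# Shape check for the hypothetical MGFSkip sketch above.
module = MGFSkip(c_enc=64, c_dec=128)
e_i = torch.randn(1, 64, 64, 64)    # E_i
d_i = torch.randn(1, 128, 32, 32)   # D_i
out = module(e_i, d_i)
print(out.shape)  # torch.Size([1, 192, 64, 64]) -> C_e + C_d channels
```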

4. Fusion Strategy and Functional Rationale

MGF-Skip is architected to achieve two critical objectives:

  • Suppression of Background Noise: The sigmoid-activated gating mask $G_i$, derived from deep decoder features, adaptively down-weights locations in $E_i$ associated with image noise or irrelevant structures (e.g., artifacts from hair occlusion or specular highlights). This mechanism is fully differentiable and trained end-to-end, with no fixed thresholds or hard masking.
  • Preservation and Enhancement of Boundaries: Given that aggressive gating may suppress true boundary information, the residual connection $E_{res} = E_i + E_{filt}$ ensures that low-level spatial details, essential for precise contour delineation, are continuously reinforced (see the identity below).
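Folding the filtering and residual steps together makes this preservation guarantee explicit:

$$E_{res} = E_i + E_i \odot G_i = E_i \odot (1 + G_i)$$

Since $G_i \in [0,1]$, every encoder activation is scaled by a factor in $[1, 2]$: locations the decoder deems salient are amplified up to twofold, while no location is attenuated below its original value, so the suppression of noisy regions is relative rather than absolute.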

All convolutional parameters within the gating branch are learnable and jointly optimized with the rest of HyM-UNet.

5. Integration within HyM-UNet and Training Configuration

MGF-Skip is instantiated at each of the four encoder–decoder transition stages. Inputs are drawn from a hybrid encoder: early stages utilize CNN blocks for local texture modeling, while deeper stages employ Visual Mamba modules for long-range context. The training protocol for HyM-UNet with MGF-Skip includes:

  • Optimizer: AdamW (weight decay $10^{-4}$, $\beta_1 = 0.9$, $\beta_2 = 0.999$)
  • Initial learning rate: $10^{-4}$, cosine-annealed to $10^{-6}$ over 200 epochs
  • Batch size: 24
  • Input resolution: $256 \times 256$
  • Loss:

$$\mathcal{L}_{total} = \mathcal{L}_{Dice} + 0.5\,\mathcal{L}_{BCE} + 0.5\,\mathcal{L}_{Edge}$$

combining Dice, binary cross-entropy, and boundary-aware edge loss.
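The reported configuration translates directly into a PyTorch training setup. In the sketch below, the Dice and edge losses are common formulations chosen for illustration; in particular, the paper specifies only a "boundary-aware edge loss", so the gradient-based term here is an assumption.

```python
# Sketch of the stated training configuration (illustrative, not the
# authors' code). Assumes binary segmentation logits of shape (N, 1, H, W).
import torch
import torch.nn.functional as F

def dice_loss(logits, target, eps=1e-6):
    prob = torch.sigmoid(logits)
    inter = (prob * target).sum(dim=(2, 3))
    union = prob.sum(dim=(2, 3)) + target.sum(dim=(2, 3))
    return (1 - (2 * inter + eps) / (union + eps)).mean()

def edge_loss(logits, target):
    # Assumed boundary term: L1 distance between spatial gradients of the
    # predicted mask and the ground truth.
    prob = torch.sigmoid(logits)
    dx_p = prob[..., :, 1:] - prob[..., :, :-1]
    dy_p = prob[..., 1:, :] - prob[..., :-1, :]
    dx_t = target[..., :, 1:] - target[..., :, :-1]
    dy_t = target[..., 1:, :] - target[..., :-1, :]
    return (dx_p - dx_t).abs().mean() + (dy_p - dy_t).abs().mean()

def total_loss(logits, target):
    # L_total = L_Dice + 0.5 * L_BCE + 0.5 * L_Edge
    return (dice_loss(logits, target)
            + 0.5 * F.binary_cross_entropy_with_logits(logits, target)
            + 0.5 * edge_loss(logits, target))

# Optimizer and schedule as reported: AdamW, lr 1e-4 -> 1e-6 over 200 epochs.
model = MGFSkip(c_enc=64, c_dec=128)  # stand-in; the paper trains full HyM-UNet
opt = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-4,
                        betas=(0.9, 0.999))
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=200, eta_min=1e-6)
```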

6. Empirical Performance and Comparative Analysis

Table 1 of (Chen et al., 22 Nov 2025) demonstrates that HyM-UNet incorporating MGF-Skip achieves superior results on the ISIC 2018 test set:

  • Intersection over Union (IoU): 81.82%
  • Dice Similarity Coefficient (DSC): 88.97%
  • 95th percentile Hausdorff Distance (HD95): 4.03 mm
  • Precision: 90.91%

These scores surpass those of U-Net, CE-Net, and Attention U-Net. Table 2 presents an ablation: integrating MGF-Skip into a U-Net baseline increases IoU by +1.02% (from 80.13% to 81.15%) and DSC by +0.31% (from 87.81% to 88.12%), outperforming SE and CBAM attention modules in the same positions.

MGF-Skip's design results in minimal additional parameter overhead relative to standard skip connections or spatial/channel attention modules, while maintaining inference latency ($\approx 12$ ms per $256 \times 256$ image on an RTX 3090) competitive with or lower than ViT-based architectures.

7. Distinctions from Prior Skip Connection Mechanisms

MGF-Skip contrasts with alternative strategies as follows:

| Skip Module     | Gating Source   | Residual Path | Attention Dimension      |
|-----------------|-----------------|---------------|--------------------------|
| Standard concat | None            | None          | None                     |
| SE              | Encoder/Decoder | None          | Channel                  |
| CBAM            | Encoder/Decoder | None          | Channel + Spatial        |
| MGF-Skip        | Decoder (deep)  | Encoder       | Spatial (decoder-guided) |

Standard concatenation indiscriminately propagates all encoder features, including noise. SE and CBAM introduce channel/spatial attention but lack explicit residual renormalization and do not leverage decoder semantics for gating. MGF-Skip’s use of the decoder as a gating source, with end-to-end-learned dynamic suppression and explicit residual addition, is unique in enhancing ambiguous boundary regions and suppressing artifacts, as shown by empirical evaluation (Chen et al., 22 Nov 2025).
