
MGF-Skip: Mamba-Guided Fusion Skip Connection

Updated 29 November 2025
  • The paper demonstrates that MGF-Skip enhances encoder-decoder segmentation by using decoder-based gating to suppress noise and improve feature integration.
  • MGF-Skip employs gated convolutions, residual reinforcement, and concatenation, enabling efficient boundary localization in medical image segmentation.
  • Empirical results reveal that MGF-Skip boosts performance metrics like IoU and DSC compared to traditional concatenation and attention modules.

The Mamba-Guided Fusion Skip Connection (MGF-Skip) is a skip connection module designed to enhance semantic and spatial feature integration in encoder–decoder architectures for medical image segmentation. Introduced in HyM-UNet, MGF-Skip leverages semantically rich decoder features as gating signals to suppress noise in encoder features while enforcing fine structural detail through a residual pathway. This configuration enables improved boundary localization and noise robustness compared to conventional concatenation or attention-based skips, directly addressing the semantic gap and feature misalignment common in deep convolutional architectures (Chen et al., 22 Nov 2025).

1. Architectural Placement and Input–Output Relations

MGF-Skip replaces the standard feature concatenation employed at each encoder–decoder interface of U-Net-derived segmentation models. At a given decoder stage $i$, the module receives:

  • $E_i \in \mathbb{R}^{H_i \times W_i \times C_e}$: the spatially high-resolution, texture-rich feature map from encoder stage $i$.
  • $D_i \in \mathbb{R}^{H_{i+1} \times W_{i+1} \times C_d}$: a lower-resolution, semantically strong decoder feature from the previous decoding stage.

$D_i$ undergoes upsampling, $D_i^{up} = \mathrm{Upsample}(D_i) \in \mathbb{R}^{H_i \times W_i \times C_d}$, aligning spatial dimensions. The MGF-Skip module fuses $E_i$ and $D_i^{up}$ to produce $F_{skip} \in \mathbb{R}^{H_i \times W_i \times (C_e + C_d)}$, which is then consumed by the subsequent decoder block. Fig. 1 of (Chen et al., 22 Nov 2025) visually delineates this interface in the overall architecture.
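This input–output contract can be summarized in a short PyTorch sketch. It is a minimal illustration only: the tensor sizes, NCHW layout, and bilinear upsampling mode are assumptions, not details taken from the paper's code.

```python
# Shape contract of MGF-Skip, illustrated with dummy NCHW tensors.
import torch
import torch.nn.functional as F

B, C_e, C_d, H, W = 2, 64, 128, 64, 64     # illustrative sizes, not from the paper

E_i = torch.randn(B, C_e, H, W)            # encoder feature at stage i
D_i = torch.randn(B, C_d, H // 2, W // 2)  # decoder feature, one stage deeper

# D_i^{up} = Upsample(D_i): align spatial dimensions with E_i.
D_up = F.interpolate(D_i, size=(H, W), mode="bilinear", align_corners=False)

# The fused output F_skip (Section 2) concatenates channels: C_e + C_d.
print(D_up.shape)   # torch.Size([2, 128, 64, 64])
print(C_e + C_d)    # 192 output channels for F_skip
```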

2. Mathematical Formulation

The fusion process in MGF-Skip involves spatial gating, feature filtering, residual reinforcement, and concatenation:

  1. Gate Computation: The upsampled decoder feature $D_i^{up}$ undergoes a sequence of convolutions and nonlinearities to derive a spatially-aware gate:

$$G_i = \sigma\left(\mathrm{Conv}_{1\times1}\left(\mathrm{ReLU}\left(\mathrm{Conv}_{3\times3}(D_i^{up})\right)\right)\right) \in [0,1]^{H_i \times W_i \times C_e}$$

  • $\mathrm{Conv}_{3\times3}$ maps $C_d \rightarrow C_e$ channels, stride 1, padding 1.
  • $\mathrm{Conv}_{1\times1}$ maps $C_e \rightarrow C_e$.
  • $\sigma(\cdot)$ denotes the sigmoid activation.
  2. Feature Filtering: The encoder features are modulated spatially:

$$E_{filt} = E_i \odot G_i$$

where $\odot$ denotes element-wise multiplication.

  3. Residual Reinforcement: The gated and original encoder features are summed:

$$E_{res} = E_i + E_{filt}$$

  4. Final Fusion: The residual-augmented encoder feature is concatenated with the upsampled decoder feature:

$$F_{skip} = \mathrm{Concat}[E_{res},\; D_i^{up}]$$

This fused feature serves as input to the next decoder stage.
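The four steps map directly onto a compact module. The following PyTorch sketch is one plausible realization under the conventions stated above; the class name, argument names, and upsampling choice are hypothetical, and the authors' released implementation may differ in detail.

```python
# Minimal sketch of the MGF-Skip fusion steps described above (illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MGFSkip(nn.Module):
    def __init__(self, c_enc: int, c_dec: int):
        super().__init__()
        # Gating branch: 3x3 conv (C_d -> C_e) -> ReLU -> 1x1 conv (C_e -> C_e)
        # -> sigmoid; batch normalization is deliberately omitted (Section 3).
        self.conv3 = nn.Conv2d(c_dec, c_enc, kernel_size=3, stride=1, padding=1)
        self.conv1 = nn.Conv2d(c_enc, c_enc, kernel_size=1)

    def forward(self, e_i: torch.Tensor, d_i: torch.Tensor) -> torch.Tensor:
        # Align the decoder feature to the encoder's spatial resolution.
        d_up = F.interpolate(d_i, size=e_i.shape[-2:], mode="bilinear",
                             align_corners=False)
        # 1. Gate computation: G_i = sigmoid(Conv1x1(ReLU(Conv3x3(D_up)))).
        g_i = torch.sigmoid(self.conv1(F.relu(self.conv3(d_up))))
        # 2. Feature filtering: E_filt = E_i ⊙ G_i.
        e_filt = e_i * g_i
        # 3. Residual reinforcement: E_res = E_i + E_filt.
        e_res = e_i + e_filt
        # 4. Final fusion: concatenate along channels -> C_e + C_d.
        return torch.cat([e_res, d_up], dim=1)
```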

3. Implementation Specifics

MGF-Skip’s gating branch comprises:

  • $3\times3$ convolution ($C_d \rightarrow C_e$, stride 1, padding 1)
  • ReLU activation
  • $1\times1$ convolution ($C_e \rightarrow C_e$)
  • Sigmoid activation

Batch normalization is intentionally omitted in the gating branch to retain spatial sensitivity. Channel dimensions and kernel sizes are configured to ensure alignment between the gating mask ($G_i$) and the encoder feature ($E_i$). The residual addition $E_i + E_{filt}$ constitutes an internal skip within the module, ensuring information preservation.
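Continuing the sketch from Section 2, a quick forward pass confirms that the gate is produced at the encoder's resolution and channel count, and that the fused output carries $C_e + C_d$ channels (sizes again illustrative):

```python
# Shape check for the hypothetical MGFSkip sketch above.
module = MGFSkip(c_enc=64, c_dec=128)
e_i = torch.randn(1, 64, 64, 64)    # E_i
d_i = torch.randn(1, 128, 32, 32)   # D_i
out = module(e_i, d_i)
print(out.shape)  # torch.Size([1, 192, 64, 64]) -> C_e + C_d channels
```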

4. Fusion Strategy and Functional Rationale

MGF-Skip is architected to achieve two critical objectives:

  • Suppression of Background Noise: The sigmoid-activated gating mask $G_i$, derived from deep decoder features, adaptively down-weights locations in $E_i$ associated with image noise or irrelevant structures (e.g., artifacts from hair occlusion or specular highlights). This mechanism is fully differentiable and trained end-to-end, with no fixed thresholds or hard masking.
  • Preservation and Enhancement of Boundaries: Given that aggressive gating may suppress true boundary information, the residual connection $E_{res} = E_i + E_{filt}$ ensures that low-level spatial details, essential for precise contour delineation, are continuously reinforced (see the identity below).
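Folding the filtering and residual steps together makes this preservation guarantee explicit:

$$E_{res} = E_i + E_i \odot G_i = E_i \odot (1 + G_i)$$

Since $G_i \in [0,1]$, every encoder activation is scaled by a factor in $[1, 2]$: locations the decoder deems salient are amplified up to twofold, while no location is attenuated below its original value, so the suppression of noisy regions is relative rather than absolute.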

All convolutional parameters within the gating branch are learnable and jointly optimized with the rest of HyM-UNet.

5. Integration within HyM-UNet and Training Configuration

MGF-Skip is instantiated at each of the four encoder–decoder transition stages. Inputs are drawn from a hybrid encoder: early stages utilize CNN blocks for local texture modeling, while deeper stages employ Visual Mamba modules for long-range context. The training protocol for HyM-UNet with MGF-Skip includes:

  • Optimizer: AdamW (weight decay $10^{-4}$, $\beta_1 = 0.9$, $\beta_2 = 0.999$)
  • Initial learning rate: $10^{-4}$, cosine-annealed to $10^{-6}$ over 200 epochs
  • Batch size: 24
  • Input resolution: $256 \times 256$
  • Loss:

$$\mathcal{L}_{total} = \mathcal{L}_{Dice} + 0.5\,\mathcal{L}_{BCE} + 0.5\,\mathcal{L}_{Edge}$$

combining Dice, binary cross-entropy, and boundary-aware edge loss.
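The reported configuration translates directly into a PyTorch training setup. In the sketch below, the Dice and edge losses are common formulations chosen for illustration; in particular, the paper specifies only a "boundary-aware edge loss", so the gradient-based term here is an assumption.

```python
# Sketch of the stated training configuration (illustrative, not the
# authors' code). Assumes binary segmentation logits of shape (N, 1, H, W).
import torch
import torch.nn.functional as F

def dice_loss(logits, target, eps=1e-6):
    prob = torch.sigmoid(logits)
    inter = (prob * target).sum(dim=(2, 3))
    union = prob.sum(dim=(2, 3)) + target.sum(dim=(2, 3))
    return (1 - (2 * inter + eps) / (union + eps)).mean()

def edge_loss(logits, target):
    # Assumed boundary term: L1 distance between spatial gradients of the
    # predicted mask and the ground truth.
    prob = torch.sigmoid(logits)
    dx_p = prob[..., :, 1:] - prob[..., :, :-1]
    dy_p = prob[..., 1:, :] - prob[..., :-1, :]
    dx_t = target[..., :, 1:] - target[..., :, :-1]
    dy_t = target[..., 1:, :] - target[..., :-1, :]
    return (dx_p - dx_t).abs().mean() + (dy_p - dy_t).abs().mean()

def total_loss(logits, target):
    # L_total = L_Dice + 0.5 * L_BCE + 0.5 * L_Edge
    return (dice_loss(logits, target)
            + 0.5 * F.binary_cross_entropy_with_logits(logits, target)
            + 0.5 * edge_loss(logits, target))

# Optimizer and schedule as reported: AdamW, lr 1e-4 -> 1e-6 over 200 epochs.
model = MGFSkip(c_enc=64, c_dec=128)  # stand-in; the paper trains full HyM-UNet
opt = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-4,
                        betas=(0.9, 0.999))
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=200, eta_min=1e-6)
```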

6. Empirical Performance and Comparative Analysis

Table 1 of (Chen et al., 22 Nov 2025) demonstrates that HyM-UNet incorporating MGF-Skip achieves superior results on the ISIC 2018 test set:

  • Intersection over Union (IoU): 81.82%
  • Dice Similarity Coefficient (DSC): 88.97%
  • 95th percentile Hausdorff Distance (HD95): 4.03 mm
  • Precision: 90.91%

These scores surpass those of U-Net, CE-Net, and Attention U-Net. Table 2 presents an ablation: integrating MGF-Skip into a U-Net baseline increases IoU by +1.02% (from 80.13% to 81.15%) and DSC by +0.31% (from 87.81% to 88.12%), outperforming SE and CBAM attention modules in the same positions.

MGF-Skip's design results in minimal additional parameter overhead relative to standard skip connections or spatial/channel attention modules, while maintaining inference latency ($\approx 12$ ms per $256 \times 256$ image on an RTX 3090) competitive with or lower than ViT-based architectures.

7. Distinctions from Prior Skip Connection Mechanisms

MGF-Skip contrasts with alternative strategies as follows:

| Skip Module     | Gating Source   | Residual Path | Attention Dimension      |
|-----------------|-----------------|---------------|--------------------------|
| Standard concat | None            | None          | None                     |
| SE              | Encoder/Decoder | None          | Channel                  |
| CBAM            | Encoder/Decoder | None          | Channel + Spatial        |
| MGF-Skip        | Decoder (deep)  | Encoder       | Spatial (decoder-guided) |

Standard concatenation indiscriminately propagates all encoder features, including noise. SE and CBAM introduce channel/spatial attention but lack explicit residual renormalization and do not leverage decoder semantics for gating. MGF-Skip’s use of the decoder as a gating source, with end-to-end-learned dynamic suppression and explicit residual addition, is unique in enhancing ambiguous boundary regions and suppressing artifacts, as shown by empirical evaluation (Chen et al., 22 Nov 2025).
