
Mamba-UIE: Hybrid Physics-Based Underwater Image Enhancement

Updated 2 March 2026
  • Mamba-UIE is a computational framework that integrates state-space modeling with physically constrained decomposition to enhance underwater images.
  • It adopts a revised image formation model to decouple direct transmission and backscatter, ensuring reconstruction consistency and improved PSNR/SSIM metrics.
  • The architecture employs parallel modules and a novel Mamba-in-Convolution block to efficiently capture both global and local dependencies.

Mamba-UIE is a computational framework for underwater image enhancement (UIE) that fuses state-space modeling (Mamba) with a physically constrained, decompositional approach to the underwater image formation process. It addresses the deficiencies of CNNs (limited long-range modeling) and Transformers (quadratic complexity) by leveraging Mamba, a state space model with linear sequence complexity, and by incorporating a physical consistency constraint derived from a revised underwater image formation model. Mamba-UIE emphasizes both global and local information integration, achieving state-of-the-art performance on several UIE benchmarks (Zhang et al., 2024).

1. Revised Underwater Image Formation Model

At the core of Mamba-UIE is an energy-conserving underwater image formation model that replaces the classic Koschmieder formulation, in which a single transmission map governs both terms. The Akkaynak & Treibitz model distinguishes direct transmission from backscatter, each governed by its own attenuation coefficient ($\beta_d$ and $\beta_b$):

$$I(x) = J(x) \odot t_d(x) + B \odot (1 - t_b(x))$$

where:

  • $I(x)$: observed underwater image
  • $J(x)$: latent scene radiance (“clean” image)
  • $t_d(x) = \exp(-\beta_d d(x))$: direct transmission
  • $t_b(x) = \exp(-\beta_b d(x))$: backscatter transmission
  • $B$: global background light
  • $d(x)$: scene depth at pixel $x$

The network explicitly predicts four latent components: $J(x)$, $t_d(x)$, $t_b(x)$, and $B$. A reconstruction step synthesizes the image $I'(x)$ from these, and a joint loss enforces its consistency with the input.
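The reconstruction step can be illustrated with a minimal NumPy sketch. It is not the authors' code: the function name `synthesize_underwater` is hypothetical, and the sketch derives both transmission maps from a depth map and per-channel coefficients following the definitions above, whereas Mamba-UIE regresses the two maps directly with TD-Net and TB-Net.

```python
import numpy as np

def synthesize_underwater(J, depth, B, beta_d, beta_b):
    """Return I' = J * t_d + B * (1 - t_b), plus the two transmission maps.

    J:      (H, W, 3) scene radiance
    depth:  (H, W) scene depth d(x)
    B:      (3,) global background light
    beta_d, beta_b: (3,) per-channel attenuation coefficients (illustrative values)
    """
    J = np.asarray(J, dtype=np.float64)
    depth = np.asarray(depth, dtype=np.float64)[..., None]  # (H, W, 1) for broadcasting
    B = np.asarray(B, dtype=np.float64)
    t_d = np.exp(-np.asarray(beta_d, dtype=np.float64) * depth)  # direct transmission
    t_b = np.exp(-np.asarray(beta_b, dtype=np.float64) * depth)  # backscatter transmission
    I_rec = J * t_d + B * (1.0 - t_b)  # elementwise products, as in the formation model
    return I_rec, t_d, t_b
```

Two limiting cases make the model easy to sanity-check: at zero depth both transmissions equal 1 and the output is the clean image, while at very large depth both vanish and the output approaches the background light.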

2. Physical-Consistency and Total Loss Formulation

The network is trained with a physically motivated reconstruction consistency loss:

$$L_\mathrm{rec} = \|I - I'(J, t_d, t_b, B)\|^2_2 + [1 - \mathrm{SSIM}(I, I')]$$
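A minimal sketch of such a reconstruction loss follows, under two simplifying assumptions that are not from the source: SSIM is computed once over the whole image (standard SSIM averages over local windows), and the squared-error term is a mean rather than a sum. The names `global_ssim` and `reconstruction_loss` are hypothetical.

```python
import numpy as np

def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM over the whole array (simplification of windowed SSIM)."""
    c1 = (0.01 * data_range) ** 2  # usual stabilizing constants
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2))

def reconstruction_loss(I, I_rec, data_range=1.0):
    """Squared-error term plus (1 - SSIM) between input I and reconstruction I'."""
    I = np.asarray(I, dtype=np.float64)
    I_rec = np.asarray(I_rec, dtype=np.float64)
    l2 = np.mean((I - I_rec) ** 2)  # mean instead of sum: an assumption of this sketch
    return l2 + (1.0 - global_ssim(I, I_rec, data_range))
```

The loss is zero only when the reconstruction matches the input exactly, which is what ties the four predicted components to the formation model.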

The total loss combines this reconstruction term with direct prediction losses on scene radiance:

$$L_\mathrm{total} = L_2(J, J_\mathrm{gt}) + L_\mathrm{SSIM}(J, J_\mathrm{gt}) + \alpha L_\mathrm{edge}(J, J_\mathrm{gt}) + L_\mathrm{UIQM}(J) + L_2(I, I') + L_\mathrm{SSIM}(I, I')$$

with $\alpha = 0.05$. Ground-truth supervision ensures the enhanced image $J$ approximates the ideal “clear” scene while maintaining image formation fidelity via the reconstructed image.
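The weighting can be made concrete with a small sketch that combines six already-computed term values. The `LossTerms` container and `total_loss` function are hypothetical names, and how each individual term is computed is left out; only the edge term carries the weight of 0.05 stated above.

```python
from dataclasses import dataclass

@dataclass
class LossTerms:
    """Scalar values of the six terms (hypothetical container, not from the paper)."""
    l2_J: float    # L2 between J and ground truth
    ssim_J: float  # SSIM loss between J and ground truth
    edge_J: float  # edge loss between J and ground truth
    uiqm_J: float  # no-reference UIQM-based loss on J
    l2_I: float    # L2 between input I and reconstruction I'
    ssim_I: float  # SSIM loss between input I and reconstruction I'

def total_loss(t: LossTerms, alpha: float = 0.05) -> float:
    """Sum all terms with unit weight except the edge term, which is scaled by alpha."""
    return t.l2_J + t.ssim_J + alpha * t.edge_J + t.uiqm_J + t.l2_I + t.ssim_I
```

With every term equal to 1.0, the total is 5.05, so the edge term acts as a light regularizer rather than a primary objective.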

An ablation shows that removing this physics-based reconstruction penalty lowers PSNR from $27.13\,\mathrm{dB}$ to $26.15\,\mathrm{dB}$, a drop comparable to the gain obtained by including the Mamba branch (Zhang et al., 2024).

3. Mamba-UIE Network Architecture

Mamba-UIE factorizes estimation into four parallel modules:

  • J-Net: estimates the scene radiance $J(x)$, structured as a hybrid CNN+Mamba SSM with no spatial downsampling.
  • TD-Net: regresses direct transmission $t_d(x)$ (6-layer CNN).
  • TB-Net: regresses backscatter transmission $t_b(x)$ (6-layer CNN).
  • GBL module: regresses global background light $B$ using per-channel statistics and a small regression head.
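To make the GBL idea concrete, here is a heavily simplified sketch of "per-channel statistics plus a small regression head". Everything specific in it is an assumption: the choice of statistics (mean, standard deviation, maximum), the single linear layer with a sigmoid, and the names `channel_statistics` and `estimate_background_light`. The source does not describe the module at this level of detail.

```python
import numpy as np

def channel_statistics(img):
    """Stack per-channel mean, std, and max into one feature vector of length 3*C."""
    img = np.asarray(img, dtype=np.float64)
    flat = img.reshape(-1, img.shape[-1])  # (H*W, C)
    return np.concatenate([flat.mean(axis=0), flat.std(axis=0), flat.max(axis=0)])

def estimate_background_light(img, W, b):
    """Hypothetical regression head: one linear layer followed by a sigmoid.

    W: (C, 3*C) weight matrix, b: (C,) bias. Output lies in (0, 1) per channel.
    """
    logits = W @ channel_statistics(img) + b
    return 1.0 / (1.0 + np.exp(-logits))
```

Because the statistics discard spatial layout, the output is a single color vector per image, which matches the role of $B$ as a global quantity.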

a) SSM Integration via Mamba-in-Convolution

J-Net integrates Mamba blocks (linear-complexity state-space layers) directly with CNN modules. Each Mamba-in-Convolution (MIC) block:

  • Applies $1\times1$ Conv $\to$ InstanceNorm $\to$ Mish activation to adjust the channel count.
  • Uses a Channel-Spatial Siamese (CSS) structure: the feature map is reshaped both along spatial (for channel-level modeling) and channel (for spatial modeling) axes and processed by stacked Mamba layers.
  • Aggregates global (via CSS/Mamba) and local (CNN path) representations through residual summation.
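The two reshapes in the CSS structure can be sketched as follows. This is one reading of the description, not the authors' implementation: `linear_scan` is a toy fixed-coefficient recurrence standing in for a Mamba layer (real Mamba uses input-dependent, selective parameters), and `local_fn` is a hypothetical placeholder for the convolutional path.

```python
import numpy as np

def linear_scan(seq, a=0.9, b=0.1):
    """Toy state-space stand-in: h_t = a*h_{t-1} + b*x_t, one pass over the sequence.

    seq: (T, D) array. Cost grows linearly with the sequence length T.
    """
    seq = np.asarray(seq, dtype=np.float64)
    h = np.zeros(seq.shape[1])
    out = np.empty_like(seq)
    for t in range(seq.shape[0]):
        h = a * h + b * seq[t]
        out[t] = h
    return out

def css_branches(feat):
    """Process a (C, H, W) feature map as a spatial sequence and as a channel sequence."""
    feat = np.asarray(feat, dtype=np.float64)
    C, H, W = feat.shape
    spatial_seq = feat.reshape(C, H * W).T  # (H*W, C): one token per pixel
    channel_seq = feat.reshape(C, H * W)    # (C, H*W): one token per channel
    spatial_out = linear_scan(spatial_seq).T.reshape(C, H, W)
    channel_out = linear_scan(channel_seq).reshape(C, H, W)
    return spatial_out, channel_out

def mic_block(feat, local_fn):
    """Residual sum of the input, both global branches, and a local path."""
    spatial_out, channel_out = css_branches(feat)
    return np.asarray(feat, dtype=np.float64) + spatial_out + channel_out + local_fn(feat)
```

Both branches return tensors of the input shape, which is what allows the plain residual summation in the last step.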

By design, this hybrid modeling enables Mamba-UIE to capture both long-range and local dependencies, overcoming the locality of CNNs and the prohibitive $O(N^2)$ complexity of Transformers on high-resolution images.
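As a rough worked example of the gap, assume one token per pixel and ignore constant factors and channel width (both are simplifying assumptions of this illustration). For a $256\times256$ input:

$$N = 256 \times 256 = 65{,}536, \qquad N^2 = 4{,}294{,}967{,}296 \approx 4.3\times10^{9}, \qquad \frac{N^2}{N} = 65{,}536$$

Under these assumptions, full self-attention evaluates on the order of $4.3\times10^{9}$ pairwise interactions, while a linear scan visits each of the $65{,}536$ tokens once, and the ratio between the two grows in proportion to the pixel count.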

b) Parallel Estimation, Supervision, and Output

TD-Net and TB-Net use shallow CNNs to estimate $t_d(x)$ and $t_b(x)$, respectively; the GBL module estimates $B$ via statistics and regression. The four components are recombined (see Section 1), and both $J$ and $I'$ are directly supervised.

4. Training Procedure and Quantitative Results

The training and evaluation setup is:

  • UIEB: $890$ paired images (underwater/clear); the model is also tested on Challenging60 (no-reference), EUVP ($1,200$ pairs), and U45 ($45$ unpaired challenging images).
  • All images are resized/cropped to $256\times256$.
  • Adam optimizer, learning rate $2\times10^{-4}$, batch size $1$.
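The stated hyperparameters can be collected in a small configuration object, together with one possible preprocessing step. The `TrainConfig` and `center_crop` names are hypothetical, and center cropping is only one way to reach $256\times256$; the source does not say whether images are resized, cropped, or both.

```python
from dataclasses import dataclass
import numpy as np

@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters as reported; field names are illustrative."""
    image_size: int = 256
    learning_rate: float = 2e-4
    batch_size: int = 1
    optimizer: str = "adam"

def center_crop(img, size):
    """Crop the central size x size region from an (H, W, ...) array."""
    h, w = img.shape[:2]
    if h < size or w < size:
        raise ValueError("image is smaller than the requested crop")
    top = (h - size) // 2
    left = (w - size) // 2
    return img[top:top + size, left:left + size]
```

For example, `center_crop(np.zeros((300, 400, 3)), TrainConfig().image_size)` yields an array of shape `(256, 256, 3)`.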

Performance on UIEB:

  • MSE: $0.26 \times 10^3$
  • PSNR: $27.13\,\mathrm{dB}$
  • SSIM: $0.93$

EUVP: MSE $0.20 \times 10^3$, PSNR $26.19\,\mathrm{dB}$, SSIM $0.83$.
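For readers relating the MSE and PSNR columns, the standard definition is PSNR $= 10\log_{10}(\mathrm{MAX}^2/\mathrm{MSE})$. The sketch below assumes an 8-bit intensity scale (MAX = 255), which is an inference from the magnitude of the reported MSE values rather than something the source states; `psnr_from_mse` is a hypothetical helper name.

```python
import math

def psnr_from_mse(mse, max_value=255.0):
    """PSNR in dB for a given mean squared error on a [0, max_value] scale."""
    if mse <= 0:
        raise ValueError("mse must be positive")
    return 10.0 * math.log10(max_value ** 2 / mse)

uieb = psnr_from_mse(0.26e3)  # about 23.98 dB
euvp = psnr_from_mse(0.20e3)  # about 25.12 dB
```

These values are lower than the reported 27.13 dB and 26.19 dB. One explanation consistent with that gap, offered here as an assumption, is that PSNR is averaged per image: because the logarithm is concave, the mean of per-image PSNR values is never below the PSNR of the mean MSE.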

No-reference metrics:

  • Challenging60: UIQM $2.72$, UCIQE $0.59$
  • U45: UIQM $3.32$, UCIQE $0.61$

Comparative ranking: On UIEB, Mamba-UIE ranks first in MSE/PSNR and second in SSIM; on EUVP it is top-3 for all scores (Zhang et al., 2024).

Ablation tests confirm the necessity of both the physical reconstruction penalty and the Mamba SSM branch; removing either reduces PSNR by approximately $1\,\mathrm{dB}$ and SSIM by approximately $0.01$.

A comparison of formation models shows the revised Akkaynak model achieves superior PSNR/SSIM/UIQM over Koschmieder, Retinex, and Jaffe–McGlamery alternatives.

5. Context and Contributions in the UIE Landscape

Mamba-UIE is positioned within a class of recent SSM-based UIE models that address the dual need for efficiency and global context modeling. While methods like O-Mamba (Dong et al., 2024), RD-UIE (Jiang et al., 2 May 2025), and MambaUIE (Chen et al., 2024) emphasize novel state-space scanning or dynamic fusion mechanisms, Mamba-UIE is unique in its explicit, physics-driven image formation decomposition, parallel estimation, and reconstruction-consistency penalty.

The MIC block architecture, which exposes feature maps to both channel-wise and spatial-wise Mamba modeling and includes a residual CNN path, is notable for efficient sequence modeling at linear cost in large images. This design allows Mamba-UIE to outperform competing Transformer/CNN hybrids on both accuracy and computational complexity, especially at high input resolutions.

6. Implications, Limitations, and Future Work

Mamba-UIE demonstrates that linear-complexity SSMs combined with physically grounded constraints yield robust UIE across diverse datasets, even in the absence of explicit adversarial loss or perceptual penalties. Both reconstruction physics and advanced global dependency modeling provide substantial and separable performance gains.

A plausible implication is that future UIE models may further leverage hybrid SSM physics-constrained designs, including end-to-end learned depth/attenuation estimation for more challenging, uncalibrated scenarios, or extend to multi-frame (video) settings with temporal SSMs. The current design, however, relies on sufficient paired data and explicit supervision for each latent component. Exploring self-supervised estimates of the scene radiance and attenuation maps is a potential future extension (Zhang et al., 2024).
