AQUA-Net: Adaptive Fusion for Underwater Imaging

Updated 12 December 2025
  • The paper introduces AQUA-Net, which fuses frequency and illumination cues to correct color casts, haze, and low contrast in underwater images.
  • Its methodology employs a three-level residual encoder–decoder with auxiliary FFT and Retinex-based branches to achieve high-frequency texture recovery and adaptive exposure control.
  • Experimental evaluations demonstrate significant boosts in metrics such as PSNR, SSIM, UIQM, and UCIQE, all while maintaining real-time efficiency and low computational complexity.

Adaptive Frequency Fusion and Illumination Aware Network (AQUA-Net) is a deep learning architecture for underwater image enhancement that integrates frequency- and illumination-domain processing with a lightweight residual encoder–decoder backbone. AQUA-Net is designed to address core challenges in underwater sensing: severe color distortion, low contrast, and haze induced by wavelength-dependent light absorption and scattering. By fusing spatial, frequency, and illumination cues, it restores perceptual realism and color balance while maintaining computational efficiency suitable for real-time and embedded deployment (Ali et al., 5 Dec 2025).

1. Architectural Foundations

AQUA-Net comprises a three-level encoder–decoder constructed from Residual Enhancement Modules (REMs). Each REM employs depthwise-separable convolutions, followed by a pointwise convolution and Leaky ReLU activation; a skip (residual) connection preserves low-level details. The encoder progressively subsamples the spatial resolution (×2, ×4, ×8), doubling feature channels at each level; the decoder symmetrically upsamples and reconstructs the image.
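As a concrete illustration, the REM described above can be sketched in PyTorch as follows; the kernel size, negative slope, and exact module layout are assumptions for exposition, not the authors' released implementation.

```python
# Minimal PyTorch sketch of a Residual Enhancement Module (REM):
# depthwise-separable convolution -> pointwise convolution -> LeakyReLU,
# with a residual skip that preserves low-level detail.
import torch
import torch.nn as nn

class REM(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Depthwise 3x3 (one filter per channel) followed by a 1x1 pointwise conv.
        self.depthwise = nn.Conv2d(channels, channels, kernel_size=3,
                                   padding=1, groups=channels)
        self.pointwise = nn.Conv2d(channels, channels, kernel_size=1)
        self.act = nn.LeakyReLU(0.2, inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Skip connection keeps the input features alongside the enhancement.
        return x + self.act(self.pointwise(self.depthwise(x)))
```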

The central network is augmented by two auxiliary branches operating directly on raw image input:

  • The frequency-fusion branch performs a 2D Fourier transform (FFT) on each RGB channel, normalizes and adaptively modulates spectral magnitudes, reconstructs an enhanced image by inverse FFT, and computes a high-frequency correction map fused into the encoder’s base feature map.
  • The illumination-aware branch estimates a spatially varying illumination map using principles from Retinex theory; this map is injected into each decoder stage to achieve adaptive exposure correction.

The following schematic expresses the data flow in AQUA-Net:

  • Input: the raw image $I$.
  • Auxiliary branches: frequency branch output $R_f$; illumination map $L(x)$.
  • Fusion: $X_0 = \phi_0(I) + \phi_p(R_f)$.
  • Encoder: REM₁ → ↓ → REM₂ → ↓ → REM₃.
  • Decoder: ↑ → REM₃ → skip $\odot$ $L$ → REM₂ → skip $\odot$ $L$ → REM₁ → reconstruct.

This design enables correction of color casts, haze suppression, fine texture recovery, and adaptive exposure control across a range of underwater scenarios.
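A minimal end-to-end sketch of this data flow, reusing the REM block above, might look as follows. The fusion convolutions standing in for $\phi_0$ and $\phi_p$, the pooling/upsampling operators, and the constant channel width are illustrative assumptions (the paper doubles channels per encoder level); the auxiliary outputs $R_f$ and $L$ are taken as precomputed inputs here and are detailed in Sections 2 and 3.

```python
# Hedged sketch of the overall AQUA-Net data flow from the schematic above.
# Reuses the REM class defined earlier; widths and samplers are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AquaNetSketch(nn.Module):
    def __init__(self, base: int = 32):
        super().__init__()
        self.phi0 = nn.Conv2d(3, base, 3, padding=1)   # stands in for phi_0(I)
        self.phip = nn.Conv2d(3, base, 3, padding=1)   # stands in for phi_p(R_f)
        self.enc = nn.ModuleList([REM(base) for _ in range(3)])
        self.dec = nn.ModuleList([REM(base) for _ in range(3)])
        self.recon = nn.Conv2d(base, 3, 3, padding=1)  # phi_r

    def forward(self, I, R_f, L):
        # L is the illumination map, expected as (B, 1, H, W) for broadcasting.
        x = self.phi0(I) + self.phip(R_f)              # fusion: X_0
        skips = []
        for rem in self.enc:                           # REM -> downsample (x2 each)
            x = rem(x)
            skips.append(x)
            x = F.avg_pool2d(x, 2)
        for rem, s in zip(self.dec, reversed(skips)):  # upsample -> skip (.) L -> REM
            x = F.interpolate(x, size=s.shape[-2:], mode="bilinear",
                              align_corners=False)
            L_k = F.interpolate(L, size=s.shape[-2:], mode="bilinear",
                                align_corners=False)
            x = rem(x + s * L_k)
        return torch.tanh(self.recon(x))               # I_hat = tanh(phi_r(D_1))
```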

2. Frequency Fusion Encoder

The frequency-fusion branch analyzes and enhances the frequency content of the input. Let $I \in \mathbb{R}^{B \times C \times H \times W}$ denote the input batch; for each channel $c$, the 2D Fourier transform is

$$\mathcal{X}_c(u,v) = \frac{1}{\sqrt{HW}} \sum_{h=0}^{H-1} \sum_{w=0}^{W-1} I_c(h,w)\, e^{-j 2\pi \left( \frac{uh}{H} + \frac{vw}{W} \right)}$$

Decompose $\mathcal{X}_c = M_c e^{j\Phi_c}$; the magnitude $M_c$ is normalized by its mean to give $\tilde M_c$, and a two-layer CNN predicts an adaptive modulation map $S$. The enhanced spectrum is $M^*_c = \tilde M_c \odot (1 + \alpha S)$, with $\alpha$ a learned scalar. The inverse FFT reconstructs the spatial-domain correction, and subtracting $I$ yields the high-frequency map:

$$R_f = \mathcal{F}^{-1}\!\left(M^*_c\, e^{j\Phi_c}\right) - I$$

$R_f$ is projected via a $3\times3$ convolution and added to the initial encoder feature. This branch facilitates recovery of high-frequency textures and structural cues attenuated by water scattering.
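The branch maps naturally onto torch.fft primitives. Below is a hedged sketch: the two-layer CNN and the learned scalar $\alpha$ follow the description above, while the hidden width and initialization are assumptions.

```python
# Illustrative frequency-fusion branch: per-channel FFT, mean-normalized
# magnitude, CNN-predicted modulation map S, spectrum rescaling by
# (1 + alpha * S), inverse FFT, and subtraction of the input to obtain R_f.
import torch
import torch.nn as nn

class FrequencyFusionBranch(nn.Module):
    def __init__(self, channels: int = 3, hidden: int = 16):
        super().__init__()
        self.mod = nn.Sequential(                      # two-layer CNN predicting S
            nn.Conv2d(channels, hidden, 3, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(hidden, channels, 3, padding=1),
        )
        self.alpha = nn.Parameter(torch.tensor(0.1))   # learned scalar

    def forward(self, I: torch.Tensor) -> torch.Tensor:
        # norm="ortho" gives the 1/sqrt(HW) scaling used in the equation above.
        X = torch.fft.fft2(I, norm="ortho")
        M, phi = torch.abs(X), torch.angle(X)          # magnitude / phase
        M_tilde = M / (M.mean(dim=(-2, -1), keepdim=True) + 1e-8)
        S = self.mod(M_tilde)                          # adaptive modulation map
        M_star = M_tilde * (1 + self.alpha * S)        # enhanced magnitude
        X_star = torch.polar(M_star, phi)              # M* e^{j phi}
        return torch.fft.ifft2(X_star, norm="ortho").real - I   # R_f
```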

3. Illumination-Aware Decoder

Illumination manipulation in AQUA-Net is grounded in Retinex theory, which models pixel intensity as $I(x) = R(x) \odot L(x)$, where $R(x)$ denotes reflectance and $L(x)$ denotes illumination. The illumination map $L(x)$ is learned by a CNN $\phi_l$ that outputs coefficient maps $[\alpha, \beta]$. The final map is

$$L = \sigma(\alpha) \cdot \left[1 + \tanh(\beta)\right]$$

where $\sigma(\cdot)$ bounds the scale to $[0,1]$ and $\tanh$ allows localized exposure control. During decoding, the skip-connection features at each decoder stage are modulated by a resized copy of $L$:

$$D_k = \psi_u(E_{k+1}) + S_k \odot L_k$$

where $\psi_u$ upsamples decoder features and $S_k$ is the corresponding encoder feature. This mechanism adaptively corrects the non-uniform, low-light exposures characteristic of underwater environments. The restored output is

$$\hat{I} = \tanh(\phi_r(D_1))$$
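A compact sketch of these illumination-aware pieces is given below; the width of $\phi_l$ and the exact placement of the modulation within a decoder stage are assumptions consistent with the equations above.

```python
# Sketch of the illumination-aware components: a small CNN phi_l predicts
# coefficient maps [alpha, beta], from which L = sigmoid(alpha)*(1+tanh(beta));
# each decoder stage then modulates its skip feature by a resized copy of L.
import torch
import torch.nn as nn
import torch.nn.functional as F

class IlluminationBranch(nn.Module):
    def __init__(self, hidden: int = 16):
        super().__init__()
        self.phi_l = nn.Sequential(
            nn.Conv2d(3, hidden, 3, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(hidden, 2, 3, padding=1),     # outputs [alpha, beta]
        )

    def forward(self, I: torch.Tensor) -> torch.Tensor:
        alpha, beta = self.phi_l(I).chunk(2, dim=1)
        # sigmoid bounds the scale to [0, 1]; tanh permits localized
        # exposure adjustment around that scale.
        return torch.sigmoid(alpha) * (1 + torch.tanh(beta))

def decoder_stage(E_next, S_k, L, upsample):
    """D_k = psi_u(E_{k+1}) + S_k (.) L_k, with L resized to S_k's resolution."""
    L_k = F.interpolate(L, size=S_k.shape[-2:], mode="bilinear",
                        align_corners=False)
    return upsample(E_next) + S_k * L_k
```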

4. Training Objective and Ablation Analysis

AQUA-Net is supervised end-to-end via an $L_1$ reconstruction loss:

$$\mathcal{L} = \| \hat{I} - I_\text{ref} \|_1$$

No explicit perceptual or frequency-consistency losses are used beyond the frequency-branch operations.
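Given the single-term objective, a training step reduces to plain $L_1$ regression against the reference image; the optimizer choice below is an assumption, not a reported setting.

```python
# Minimal supervised training step matching the stated objective.
import torch
import torch.nn as nn

def train_step(model: nn.Module, optimizer: torch.optim.Optimizer,
               I: torch.Tensor, I_ref: torch.Tensor) -> float:
    optimizer.zero_grad()
    I_hat = model(I)
    loss = nn.functional.l1_loss(I_hat, I_ref)   # L = || I_hat - I_ref ||_1
    loss.backward()
    optimizer.step()
    return loss.item()
```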

Ablation studies on standard benchmarks (UIEB-T90, UIEB-C60) demonstrate the incremental effect of each component:

  • Base encoder–decoder: PSNR 18.473 / SSIM 0.832 / UIQM 2.872 / UCIQE 0.377.
  • Base + frequency branch: substantial gains in PSNR, SSIM, UIQM, and UCIQE.
  • Base + illumination branch: comparably robust improvement.
  • Full model: best performance (PSNR 21.257, SSIM 0.884, UIQM 3.250, UCIQE 0.397).

Both frequency and illumination branches yield complementary improvements, enhancing detail/texture and color/exposure respectively (Ali et al., 5 Dec 2025).

5. The DeepSea Dataset and Real-World Evaluation

DeepSea is a high-resolution, real-world underwater dataset introduced to test deep-sea performance. It comprises 1,533 frames (downsampled from 6K video to 1920×1080) captured at three Mediterranean locations (Strait of Sicily: 138–760 m; off Bari: ≈470 m; off Oristano: 108–258 m) using a professional ROV camera system. From these, 80 images (DeepSea-T80) were selected for their challenging degradations: wavelength-dependent attenuation, artificial ROV lighting with back-scatter, extreme low light, turbidity, and authentic marine backgrounds.

AQUA-Net demonstrates strong generalization and competitive performance across multiple datasets, including UIEB-T90, UIEB-C60, EUVP-T515, RUIE-T78, and DeepSea-T80, outperforming or matching recent state-of-the-art methods such as NU2Net, TACL, CCL-Net, and OUNet-JL on metrics including UIQM and SSIM.

6. Model Complexity, Efficiency, and Deployment

AQUA-Net comprises 0.333M parameters and requires 20.86 GFLOPs per $128\times128$ input, making it the second most compact of eleven recent methods (only UWCNN is smaller, and it scores lower on quality metrics). The design leverages depthwise-separable convolutions and efficient FFT-based blocks, enabling real-time inference (<30 ms/frame) on RTX-class GPUs at moderate resolutions (128×128–512×512), as well as efficient deployment on embedded platforms (e.g., NVIDIA Jetson).
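These figures are straightforward to sanity-check with a generic harness like the one below (not the paper's measurement protocol); parameter counting is exact, while latency depends on hardware and input shape.

```python
# Count parameters and time a forward pass on a 128x128 input.
import time
import torch
import torch.nn as nn

def count_params(model: nn.Module) -> int:
    return sum(p.numel() for p in model.parameters())   # paper reports ~0.333M

@torch.no_grad()
def latency_ms(model: nn.Module, size: int = 128, runs: int = 50,
               device: str = "cuda") -> float:
    model = model.to(device).eval()
    x = torch.randn(1, 3, size, size, device=device)
    for _ in range(5):                                   # warm-up iterations
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(runs):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - t0) / runs * 1e3       # mean ms per frame
```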

Recommendations for real-world deployment include:

  • Use of precomputed FFT plans or cuFFT for optimal transform speed.
  • 8-bit quantization of frequency-branch convolution weights with minimal quality loss.
  • Pairing with a denoiser in extremely low-light regimes to mitigate sensor noise prior to frequency processing (a minimal sketch follows this list).
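For the last recommendation, a denoise-then-enhance wrapper could look like the sketch below; the fixed Gaussian blur is only a placeholder for whatever classical or learned denoiser is actually deployed.

```python
# Denoise-then-enhance pipeline sketch: channel-wise Gaussian smoothing
# before the enhancement network, for very low-light inputs.
import torch
import torch.nn.functional as F

def gaussian_kernel(ksize: int = 5, sigma: float = 1.0) -> torch.Tensor:
    ax = torch.arange(ksize, dtype=torch.float32) - (ksize - 1) / 2
    g = torch.exp(-(ax ** 2) / (2 * sigma ** 2))
    k = torch.outer(g, g)
    # Shape (3, 1, k, k): one identical kernel per RGB channel (groups=3).
    return (k / k.sum()).expand(3, 1, ksize, ksize).contiguous()

def denoise_then_enhance(model, I: torch.Tensor,
                         ksize: int = 5, sigma: float = 1.0) -> torch.Tensor:
    k = gaussian_kernel(ksize, sigma).to(I.device)
    I_dn = F.conv2d(I, k, padding=ksize // 2, groups=3)  # channel-wise blur
    return model(I_dn)
```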

A plausible implication is that such efficiency–accuracy trade-offs are critical for autonomous vehicles and real-time marine monitoring applications.

7. Qualitative Results and Comparative Benchmarks

Qualitative analyses show that:

  • A baseline encoder–decoder partially addresses illumination but fails to remove color cast or low-frequency haze.
  • Addition of the frequency branch sharpens boundaries but does not fully restore global color balance.
  • The full system achieves comprehensive enhancement, balancing color correction, contrast, and texture sharpness.

In direct comparison with methods such as CCL-Net and OUNet-JL, AQUA-Net effectively removes dominant green/blue hues, suppresses haze, and avoids over-saturation/artifact introduction, especially in challenging deep-sea conditions where competing methods often yield undesired color shifts.

Overall, the fusion of spatial, frequency, and illumination cues in AQUA-Net establishes it as a robust, generalizable, and computationally efficient approach to enhancing underwater imagery under both controlled and extreme real-world conditions (Ali et al., 5 Dec 2025).

References

  1. Ali et al., "AQUA-Net: Adaptive Fusion for Underwater Imaging," 5 Dec 2025.
