AQUA-Net: Adaptive Fusion for Underwater Imaging
- The paper introduces AQUA-Net, which fuses frequency and illumination cues to correct color casts, haze, and low contrast in underwater images.
- Its methodology employs a three-level residual encoder–decoder with auxiliary FFT and Retinex-based branches to achieve high-frequency texture recovery and adaptive exposure control.
- Experimental evaluations demonstrate significant boosts in metrics such as PSNR, SSIM, UIQM, and UCIQE, all while maintaining real-time efficiency and low computational complexity.
Adaptive Frequency Fusion and Illumination Aware Network (AQUA-Net) is a deep learning architecture for underwater image enhancement that integrates frequency- and illumination-domain processing with a lightweight residual encoder–decoder backbone. AQUA-Net is designed to address core challenges in underwater sensing: severe color distortion, low contrast, and haze induced by wavelength-dependent light absorption and scattering. By fusing spatial, frequency, and illumination cues, it restores perceptual realism and color balance while maintaining computational efficiency suitable for real-time and embedded deployment (Ali et al., 5 Dec 2025).
1. Architectural Foundations
AQUA-Net comprises a three-level encoder–decoder constructed from Residual Enhancement Modules (REMs). Each REM employs depthwise-separable convolutions, followed by a pointwise convolution and Leaky ReLU activation; a skip (residual) connection preserves low-level details. The encoder progressively subsamples the spatial resolution (×2, ×4, ×8), doubling feature channels at each level; the decoder symmetrically upsamples and reconstructs the image.
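The REM described above can be sketched in NumPy; the layer sizes, random weights, and zero-padding choice are illustrative assumptions, not values from the paper:

```python
import numpy as np

def leaky_relu(x, slope=0.01):
    return np.where(x > 0, x, slope * x)

def depthwise_conv3x3(x, kernels):
    """Per-channel 3x3 convolution with zero padding.
    x: (C, H, W); kernels: (C, 3, 3)."""
    C, H, W = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += kernels[:, i, j][:, None, None] * xp[:, i:i+H, j:j+W]
    return out

def pointwise_conv(x, weight):
    """1x1 convolution (channel mixing). weight: (C_out, C_in)."""
    return np.tensordot(weight, x, axes=([1], [0]))

def rem_block(x, dw_k, pw_w):
    """Residual Enhancement Module sketch: depthwise 3x3 ->
    pointwise 1x1 -> Leaky ReLU, plus a residual skip connection."""
    y = depthwise_conv3x3(x, dw_k)
    y = pointwise_conv(y, pw_w)
    y = leaky_relu(y)
    return x + y  # residual connection preserves low-level detail

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))        # (channels, H, W)
dw_k = rng.standard_normal((8, 3, 3)) * 0.1
pw_w = rng.standard_normal((8, 8)) * 0.1
y = rem_block(x, dw_k, pw_w)
print(y.shape)  # (8, 16, 16) -- shape preserved by the residual path
```

The residual path means a REM with zeroed weights reduces to the identity, which keeps early training stable.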
The central network is augmented by two auxiliary branches operating directly on raw image input:
- The frequency-fusion branch performs a 2D Fourier transform (FFT) on each RGB channel, normalizes and adaptively modulates spectral magnitudes, reconstructs an enhanced image by inverse FFT, and computes a high-frequency correction map fused into the encoder’s base feature map.
- The illumination-aware branch estimates a spatially varying illumination map using principles from Retinex theory; this map is injected into each decoder stage to achieve adaptive exposure correction.
The following schematic expresses the data flow in AQUA-Net:
| Input | Auxiliary Branches | Fusion Operation | Encoder–Decoder Path |
|---|---|---|---|
| I | Frequency: R_f <br> Illumination: L(x) | Fuse: X₀ = φ₀(I) + φ_p(R_f) | Encoder: REM₁→↓→REM₂→↓→REM₃ <br> Decoder: ↑→REM₃→skip⊙L→REM₂→skip⊙L→REM₁→Reconstruct |
This design enables correction of color casts, haze suppression, fine texture recovery, and adaptive exposure control across a range of underwater scenarios.
2. Frequency Fusion Encoder
The frequency-fusion branch analyses and enhances the input's frequency content. Let I ∈ ℝ^(B×3×H×W) denote the input batch; for each channel c ∈ {R, G, B}, the 2D Fourier transform is F_c = FFT2(I_c).
Decompose F_c = |F_c| · e^(iθ_c); the magnitude |F_c| is normalized by its mean, and a two-layer CNN predicts an adaptive modulation map M_c. The enhanced spectrum is F̂_c = (1 + α · M_c) ⊙ |F_c| · e^(iθ_c) (with α a learned scalar). Inverse FFT reconstructs the spatial-domain correction I_f = IFFT2(F̂), and subtracting the input yields the high-frequency map R_f = I_f − I.
R_f is projected via a 1×1 convolution φ_p and added to the initial encoder feature, X₀ = φ₀(I) + φ_p(R_f). This branch facilitates recovery of high-frequency textures and structural cues attenuated by water scattering.
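A minimal NumPy sketch of the frequency-fusion branch, assuming a multiplicative spectral modulation (1 + α·M) and substituting a fixed random map for the CNN-predicted M — this illustrates the transform pipeline only, not learned behaviour:

```python
import numpy as np

def frequency_correction(img, alpha=0.1, rng=None):
    """Illustrative frequency-fusion branch for one RGB image (3, H, W)."""
    rng = rng or np.random.default_rng(0)
    out = np.empty_like(img)
    for c in range(img.shape[0]):
        F = np.fft.fft2(img[c])                # 2D FFT per channel
        mag, phase = np.abs(F), np.angle(F)
        mag_norm = mag / (mag.mean() + 1e-8)   # mean-normalized magnitude (the CNN's input in the paper)
        M = rng.random(mag.shape)              # stand-in for the CNN-predicted modulation map
        mag_hat = (1.0 + alpha * M) * mag      # adaptive spectral modulation
        F_hat = mag_hat * np.exp(1j * phase)   # recombine with the original phase
        out[c] = np.fft.ifft2(F_hat).real     # back to the spatial domain
    return out - img                           # high-frequency correction map R_f

img = np.random.default_rng(1).random((3, 32, 32))
R_f = frequency_correction(img)
print(R_f.shape)  # (3, 32, 32)
```

With alpha set to zero the round trip through the FFT is lossless, so the correction map vanishes, a useful sanity check.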
3. Illumination-Aware Decoder
Illumination manipulation in AQUA-Net is grounded in Retinex theory, which models pixel intensity as I(x) = R(x) · L(x), where R(x) denotes reflectance and L(x) denotes illumination. The illumination map is learned by a CNN that outputs coefficient maps a(x) and b(x). The final map is L(x) = σ(a(x)) + b(x),
where the sigmoid σ(·) ensures the base scale lies in (0, 1), and b(x) allows localized exposure control. During decoding, skip-connection features are modulated by the resized illumination map L_k at each decoder stage: D_k = U(D_(k+1)) + E_k ⊙ L_k,
where U(·) upsamples decoder features and E_k is the corresponding encoder feature. This mechanism adaptively corrects the non-uniform and low-light exposure characteristic of underwater environments. The restored output Î is produced by a final convolutional reconstruction layer applied to the last decoder feature D₁.
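The illumination gating of skip connections can be sketched as follows; the sigmoid-plus-offset form of L(x), the nearest-neighbour resize, and all tensor shapes are assumptions for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def illumination_map(a, b):
    """Retinex-style illumination map: a sigmoid keeps the base scale
    in (0, 1); the additive offset b(x) gives localized exposure control."""
    return sigmoid(a) + b

def nearest_resize(L, H, W):
    """Nearest-neighbour resize to match a decoder stage's resolution
    (the resampling method is not specified in the text)."""
    h, w = L.shape
    rows = np.arange(H) * h // H
    cols = np.arange(W) * w // W
    return L[np.ix_(rows, cols)]

def modulate_skip(upsampled_dec, enc_feat, L):
    """Decoder-stage fusion D_k = U(D_{k+1}) + E_k * L_k: the encoder
    skip feature is gated elementwise by the resized illumination map."""
    Lk = nearest_resize(L, *enc_feat.shape[-2:])
    return upsampled_dec + enc_feat * Lk[None, :, :]

rng = np.random.default_rng(0)
L = illumination_map(rng.standard_normal((64, 64)),
                     0.05 * rng.standard_normal((64, 64)))
enc = rng.standard_normal((16, 32, 32))  # encoder skip feature at this stage
dec = rng.standard_normal((16, 32, 32))  # upsampled deeper decoder feature
fused = modulate_skip(dec, enc, L)
print(fused.shape)  # (16, 32, 32)
```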
4. Training Objective and Ablation Analysis
AQUA-Net is supervised end-to-end via an ℓ₁ reconstruction loss, L_rec = ‖Î − I_gt‖₁, computed between the enhanced output Î and the ground-truth reference I_gt. No explicit perceptual or frequency-consistency losses are used beyond the frequency-branch operations.
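The objective reduces to the mean absolute error between output and reference (assuming images scaled to [0, 1]):

```python
import numpy as np

def l1_loss(pred, target):
    """Mean absolute error over all channels and pixels, the sole
    training objective stated for AQUA-Net."""
    return np.abs(pred - target).mean()

gt = np.zeros((3, 4, 4))
pred = np.full((3, 4, 4), 0.5)  # uniform error of 0.5 everywhere
print(l1_loss(pred, gt))  # 0.5
```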
Ablation studies on standard benchmarks (UIEB-T90, UIEB-C60) demonstrate the incremental effect of each component:
- Base encoder–decoder: PSNR 18.473/SSIM 0.832/UIQM 2.872/UCIQE 0.377.
- Base + frequency branch: substantial gains in PSNR, SSIM, UIQM, and UCIQE.
- Base + illumination branch: comparably robust improvement.
- Full model: best performance (PSNR 21.257, SSIM 0.884, UIQM 3.250, UCIQE 0.397).
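For reference, the PSNR figures above follow the standard definition (this generic implementation assumes [0, 1]-scaled images):

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in decibels."""
    mse = np.mean((pred - target) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

gt = np.zeros((3, 8, 8))
noisy = np.full((3, 8, 8), 0.1)  # uniform error of 0.1 -> MSE = 0.01
print(round(psnr(noisy, gt), 1))  # 20.0
```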
Both frequency and illumination branches yield complementary improvements, enhancing detail/texture and color/exposure respectively (Ali et al., 5 Dec 2025).
5. The DeepSea Dataset and Real-World Evaluation
DeepSea is a high-resolution, real-world underwater dataset introduced to test deep-sea performance. It comprises 1,533 frames (downsampled from 6K video to 1920×1080) captured at three Mediterranean locations (Strait of Sicily: 138–760 m; Off Bari: ≈470 m; Off Oristano: 108–258 m) using a professional ROV camera system. 80 images (DeepSea-T80) were selected, characterized by challenging degradations—wavelength-dependent attenuation, artificial ROV lighting with back-scatter, extreme low-light, turbidity, and authentic marine backgrounds.
AQUA-Net demonstrates strong generalization and competitive performance across multiple datasets—including UIEB-T90, UIEB-C60, EUVP-T515, RUIE-T78, and DeepSea-T80—outperforming or matching recent state-of-the-art solutions such as NU2Net, TACL, CCL-Net, and OUNet-JL in metrics including UIQM and SSIM.
6. Model Complexity, Efficiency, and Deployment
AQUA-Net comprises 0.333M parameters and requires 20.86 GFLOPs per input, positioning it as the second most compact among eleven recent methods (UWCNN is smaller but with lower qualitative metrics). The design leverages depthwise-separable convolutions and efficient FFT-based blocks, enabling real-time inference (<30 ms/frame) on RTX-class GPUs at moderate resolutions (128×128–512×512), as well as efficient deployment on embedded platforms (e.g., NVIDIA Jetson).
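The parameter savings from depthwise-separable convolutions are easy to quantify; the 64-channel 3×3 layer below is an illustrative size, not one taken from the paper:

```python
def conv_params(c_in, c_out, k):
    """Parameter count of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def dsconv_params(c_in, c_out, k):
    """Depthwise-separable equivalent: depthwise k x k + pointwise 1 x 1."""
    return c_in * k * k + c_in * c_out

# Illustrative layer size: a 64 -> 64 channel 3x3 convolution.
std = conv_params(64, 64, 3)
ds = dsconv_params(64, 64, 3)
print(std, ds, round(std / ds, 1))  # 36864 4672 7.9
```

A roughly 8× reduction per layer is how the full model stays at 0.333M parameters.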
Recommendations for real-world deployment include:
- Use of precomputed FFT plans or cuFFT for optimal transform speed.
- 8-bit quantization of frequency-branch convolution weights with minimal quality loss.
- Pairing with a denoiser in extremely low-light regimes to mitigate sensor noise prior to frequency processing.
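The 8-bit quantization recommendation could follow a simple symmetric per-tensor scheme; the sketch below is generic, as no specific quantization method is prescribed:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: one float scale maps
    the weight range onto [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).standard_normal(1000).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# Rounding error is bounded by half a quantization step (0.5 * scale).
print(float(np.abs(w - w_hat).max()))
```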
A plausible implication is that such efficiency–accuracy trade-offs are critical for autonomous vehicles and real-time marine monitoring applications.
7. Qualitative Results and Comparative Benchmarks
Qualitative analyses show that:
- A baseline encoder–decoder partially addresses illumination but fails to remove color cast or low-frequency haze.
- Addition of the frequency branch sharpens boundaries but does not fully restore global color balance.
- The full system achieves comprehensive enhancement, balancing color correction, contrast, and texture sharpness.
In direct comparison with methods such as CCL-Net and OUNet-JL, AQUA-Net effectively removes dominant green/blue hues, suppresses haze, and avoids over-saturation/artifact introduction, especially in challenging deep-sea conditions where competing methods often yield undesired color shifts.
Overall, the fusion of spatial, frequency, and illumination cues in AQUA-Net establishes it as a robust, generalizable, and computationally efficient approach to enhancing underwater imagery under both controlled and extreme real-world conditions (Ali et al., 5 Dec 2025).