AQUA-Net: Adaptive Fusion for Underwater Imaging
- The paper introduces AQUA-Net, which fuses frequency and illumination cues to correct color casts, haze, and low contrast in underwater images.
- Its methodology employs a three-level residual encoder–decoder with auxiliary FFT and Retinex-based branches to achieve high-frequency texture recovery and adaptive exposure control.
- Experimental evaluations demonstrate significant boosts in metrics such as PSNR, SSIM, UIQM, and UCIQE, all while maintaining real-time efficiency and low computational complexity.
Adaptive Frequency Fusion and Illumination Aware Network (AQUA-Net) is a deep learning architecture for underwater image enhancement that integrates frequency- and illumination-domain processing with a lightweight residual encoder–decoder backbone. AQUA-Net is designed to address core challenges in underwater sensing: severe color distortion, low contrast, and haze induced by wavelength-dependent light absorption and scattering. By fusing spatial, frequency, and illumination cues, it restores perceptual realism and color balance while maintaining computational efficiency suitable for real-time and embedded deployment (Ali et al., 5 Dec 2025).
1. Architectural Foundations
AQUA-Net comprises a three-level encoder–decoder constructed from Residual Enhancement Modules (REMs). Each REM employs depthwise-separable convolutions, followed by a pointwise convolution and Leaky ReLU activation; a skip (residual) connection preserves low-level details. The encoder progressively subsamples the spatial resolution (×2, ×4, ×8), doubling feature channels at each level; the decoder symmetrically upsamples and reconstructs the image.
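The REM described above can be sketched in NumPy; the layer sizes, random weights, and zero-padding choice are illustrative assumptions, not values from the paper:

```python
import numpy as np

def leaky_relu(x, slope=0.01):
    return np.where(x > 0, x, slope * x)

def depthwise_conv3x3(x, kernels):
    """Per-channel 3x3 convolution with zero padding.
    x: (C, H, W); kernels: (C, 3, 3)."""
    C, H, W = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += kernels[:, i, j][:, None, None] * xp[:, i:i+H, j:j+W]
    return out

def pointwise_conv(x, weight):
    """1x1 convolution (channel mixing). weight: (C_out, C_in)."""
    return np.tensordot(weight, x, axes=([1], [0]))

def rem_block(x, dw_k, pw_w):
    """Residual Enhancement Module sketch: depthwise 3x3 ->
    pointwise 1x1 -> Leaky ReLU, plus a residual skip connection."""
    y = depthwise_conv3x3(x, dw_k)
    y = pointwise_conv(y, pw_w)
    y = leaky_relu(y)
    return x + y  # residual connection preserves low-level detail

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))        # (channels, H, W)
dw_k = rng.standard_normal((8, 3, 3)) * 0.1
pw_w = rng.standard_normal((8, 8)) * 0.1
y = rem_block(x, dw_k, pw_w)
print(y.shape)  # (8, 16, 16) -- shape preserved by the residual path
```

The residual path means a REM with zeroed weights reduces to the identity, which keeps early training stable.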
The central network is augmented by two auxiliary branches operating directly on raw image input:
- The frequency-fusion branch performs a 2D Fourier transform (FFT) on each RGB channel, normalizes and adaptively modulates spectral magnitudes, reconstructs an enhanced image by inverse FFT, and computes a high-frequency correction map fused into the encoder’s base feature map.
- The illumination-aware branch estimates a spatially varying illumination map using principles from Retinex theory; this map is injected into each decoder stage to achieve adaptive exposure correction.
The following schematic expresses the data flow in AQUA-Net:
| Input | Auxiliary Branches | Fusion Operation | Encoder–Decoder Path |
|---|---|---|---|
| I | Frequency: R_f <br> Illumination: L(x) | Fuse: X₀ = φ₀(I) + φ_p(R_f) | Encoder: REM₁→↓→REM₂→↓→REM₃ <br> Decoder: ↑→REM₃→skip⊙L→REM₂→skip⊙L→REM₁→Reconstruct |
This design enables correction of color casts, haze suppression, fine texture recovery, and adaptive exposure control across a range of underwater scenarios.
2. Frequency Fusion Encoder
The frequency-fusion branch analyses and enhances the input's frequency content. Let I ∈ ℝ^(B×3×H×W) denote the input batch; for each channel c ∈ {R, G, B}, the 2D Fourier transform is F_c = FFT2(I_c).
Decompose F_c = |F_c| · e^(iθ_c); the magnitude |F_c| is normalized by its mean, and a two-layer CNN predicts an adaptive modulation map M_c. The enhanced spectrum is F̂_c = (1 + α · M_c) ⊙ |F_c| · e^(iθ_c) (with α a learned scalar). Inverse FFT reconstructs the spatial-domain correction I_f = IFFT2(F̂), and subtracting the input yields the high-frequency map R_f = I_f − I.
R_f is projected via a 1×1 convolution φ_p and added to the initial encoder feature, X₀ = φ₀(I) + φ_p(R_f). This branch facilitates recovery of high-frequency textures and structural cues attenuated by water scattering.
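A minimal NumPy sketch of the frequency-fusion branch, assuming a multiplicative spectral modulation (1 + α·M) and substituting a fixed random map for the CNN-predicted M — this illustrates the transform pipeline only, not learned behaviour:

```python
import numpy as np

def frequency_correction(img, alpha=0.1, rng=None):
    """Illustrative frequency-fusion branch for one RGB image (3, H, W)."""
    rng = rng or np.random.default_rng(0)
    out = np.empty_like(img)
    for c in range(img.shape[0]):
        F = np.fft.fft2(img[c])                # 2D FFT per channel
        mag, phase = np.abs(F), np.angle(F)
        mag_norm = mag / (mag.mean() + 1e-8)   # mean-normalized magnitude (the CNN's input in the paper)
        M = rng.random(mag.shape)              # stand-in for the CNN-predicted modulation map
        mag_hat = (1.0 + alpha * M) * mag      # adaptive spectral modulation
        F_hat = mag_hat * np.exp(1j * phase)   # recombine with the original phase
        out[c] = np.fft.ifft2(F_hat).real     # back to the spatial domain
    return out - img                           # high-frequency correction map R_f

img = np.random.default_rng(1).random((3, 32, 32))
R_f = frequency_correction(img)
print(R_f.shape)  # (3, 32, 32)
```

With alpha set to zero the round trip through the FFT is lossless, so the correction map vanishes, a useful sanity check.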
3. Illumination-Aware Decoder
Illumination manipulation in AQUA-Net is grounded in Retinex theory, which models pixel intensity as I(x) = R(x) · L(x), where R(x) denotes reflectance and L(x) denotes illumination. The illumination map is learned by a CNN that outputs coefficient maps a(x) and b(x). The final map is L(x) = σ(a(x)) + b(x),
where the sigmoid σ(·) ensures the base scale lies in (0, 1), and b(x) allows localized exposure control. During decoding, skip-connection features are modulated by the resized illumination map L_k at each decoder stage: D_k = U(D_(k+1)) + E_k ⊙ L_k,
where U(·) upsamples decoder features and E_k is the corresponding encoder feature. This mechanism adaptively corrects the non-uniform and low-light exposure characteristic of underwater environments. The restored output Î is produced by a final convolutional reconstruction layer applied to the last decoder feature D₁.
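The illumination gating of skip connections can be sketched as follows; the sigmoid-plus-offset form of L(x), the nearest-neighbour resize, and all tensor shapes are assumptions for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def illumination_map(a, b):
    """Retinex-style illumination map: a sigmoid keeps the base scale
    in (0, 1); the additive offset b(x) gives localized exposure control."""
    return sigmoid(a) + b

def nearest_resize(L, H, W):
    """Nearest-neighbour resize to match a decoder stage's resolution
    (the resampling method is not specified in the text)."""
    h, w = L.shape
    rows = np.arange(H) * h // H
    cols = np.arange(W) * w // W
    return L[np.ix_(rows, cols)]

def modulate_skip(upsampled_dec, enc_feat, L):
    """Decoder-stage fusion D_k = U(D_{k+1}) + E_k * L_k: the encoder
    skip feature is gated elementwise by the resized illumination map."""
    Lk = nearest_resize(L, *enc_feat.shape[-2:])
    return upsampled_dec + enc_feat * Lk[None, :, :]

rng = np.random.default_rng(0)
L = illumination_map(rng.standard_normal((64, 64)),
                     0.05 * rng.standard_normal((64, 64)))
enc = rng.standard_normal((16, 32, 32))  # encoder skip feature at this stage
dec = rng.standard_normal((16, 32, 32))  # upsampled deeper decoder feature
fused = modulate_skip(dec, enc, L)
print(fused.shape)  # (16, 32, 32)
```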
4. Training Objective and Ablation Analysis
AQUA-Net is supervised end-to-end via an ℓ₁ reconstruction loss, L_rec = ‖Î − I_gt‖₁, computed between the enhanced output Î and the ground-truth reference I_gt. No explicit perceptual or frequency-consistency losses are used beyond the frequency-branch operations.
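The objective reduces to the mean absolute error between output and reference (assuming images scaled to [0, 1]):

```python
import numpy as np

def l1_loss(pred, target):
    """Mean absolute error over all channels and pixels, the sole
    training objective stated for AQUA-Net."""
    return np.abs(pred - target).mean()

gt = np.zeros((3, 4, 4))
pred = np.full((3, 4, 4), 0.5)  # uniform error of 0.5 everywhere
print(l1_loss(pred, gt))  # 0.5
```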
Ablation studies on standard benchmarks (UIEB-T90, UIEB-C60) demonstrate the incremental effect of each component:
- Base encoder–decoder: PSNR 18.473/SSIM 0.832/UIQM 2.872/UCIQE 0.377.
- Base + frequency branch: substantial gains in PSNR, SSIM, UIQM, and UCIQE.
- Base + illumination branch: comparably robust improvement.
- Full model: best performance (PSNR 21.257, SSIM 0.884, UIQM 3.250, UCIQE 0.397).
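For reference, the PSNR figures above follow the standard definition (this generic implementation assumes [0, 1]-scaled images):

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in decibels."""
    mse = np.mean((pred - target) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

gt = np.zeros((3, 8, 8))
noisy = np.full((3, 8, 8), 0.1)  # uniform error of 0.1 -> MSE = 0.01
print(round(psnr(noisy, gt), 1))  # 20.0
```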
Both frequency and illumination branches yield complementary improvements, enhancing detail/texture and color/exposure respectively (Ali et al., 5 Dec 2025).
5. The DeepSea Dataset and Real-World Evaluation
DeepSea is a high-resolution, real-world underwater dataset introduced to test deep-sea performance. It comprises 1,533 frames (downsampled from 6K video to 1920×1080) captured at three Mediterranean locations (Strait of Sicily: 138–760 m; Off Bari: ≈470 m; Off Oristano: 108–258 m) using a professional ROV camera system. 80 images (DeepSea-T80) were selected, characterized by challenging degradations—wavelength-dependent attenuation, artificial ROV lighting with back-scatter, extreme low-light, turbidity, and authentic marine backgrounds.
AQUA-Net demonstrates strong generalization and competitive performance across multiple datasets—including UIEB-T90, UIEB-C60, EUVP-T515, RUIE-T78, and DeepSea-T80—outperforming or matching recent state-of-the-art solutions such as NU2Net, TACL, CCL-Net, and OUNet-JL in metrics including UIQM and SSIM.
6. Model Complexity, Efficiency, and Deployment
AQUA-Net comprises 0.333M parameters and requires 20.86 GFLOPs per input, positioning it as the second most compact among eleven recent methods (UWCNN is smaller but with lower qualitative metrics). The design leverages depthwise-separable convolutions and efficient FFT-based blocks, enabling real-time inference (<30 ms/frame) on RTX-class GPUs at moderate resolutions (128×128–512×512), as well as efficient deployment on embedded platforms (e.g., NVIDIA Jetson).
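The parameter savings from depthwise-separable convolutions are easy to quantify; the 64-channel 3×3 layer below is an illustrative size, not one taken from the paper:

```python
def conv_params(c_in, c_out, k):
    """Parameter count of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def dsconv_params(c_in, c_out, k):
    """Depthwise-separable equivalent: depthwise k x k + pointwise 1 x 1."""
    return c_in * k * k + c_in * c_out

# Illustrative layer size: a 64 -> 64 channel 3x3 convolution.
std = conv_params(64, 64, 3)
ds = dsconv_params(64, 64, 3)
print(std, ds, round(std / ds, 1))  # 36864 4672 7.9
```

A roughly 8× reduction per layer is how the full model stays at 0.333M parameters.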
Recommendations for real-world deployment include:
- Use of precomputed FFT plans or cuFFT for optimal transform speed.
- 8-bit quantization of frequency-branch convolution weights with minimal quality loss.
- Pairing with a denoiser in extremely low-light regimes to mitigate sensor noise prior to frequency processing.
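The 8-bit quantization recommendation could follow a simple symmetric per-tensor scheme; the sketch below is generic, as no specific quantization method is prescribed:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: one float scale maps
    the weight range onto [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).standard_normal(1000).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# Rounding error is bounded by half a quantization step (0.5 * scale).
print(float(np.abs(w - w_hat).max()))
```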
A plausible implication is that such efficiency–accuracy trade-offs are critical for autonomous vehicles and real-time marine monitoring applications.
7. Qualitative Results and Comparative Benchmarks
Qualitative analyses show that:
- A baseline encoder–decoder partially addresses illumination but fails to remove color cast or low-frequency haze.
- Addition of the frequency branch sharpens boundaries but does not fully restore global color balance.
- The full system achieves comprehensive enhancement, balancing color correction, contrast, and texture sharpness.
In direct comparison with methods such as CCL-Net and OUNet-JL, AQUA-Net effectively removes dominant green/blue hues, suppresses haze, and avoids over-saturation/artifact introduction, especially in challenging deep-sea conditions where competing methods often yield undesired color shifts.
Overall, the fusion of spatial, frequency, and illumination cues in AQUA-Net establishes it as a robust, generalizable, and computationally efficient approach to enhancing underwater imagery under both controlled and extreme real-world conditions (Ali et al., 5 Dec 2025).