AquaDiff: Diffusion Models in Image & Flow Simulation
- AquaDiff is a dual-purpose framework that applies diffusion principles to both underwater image enhancement and multiphase flow simulation.
- It employs a conditional generative denoising diffusion process with an enhanced U-Net architecture to restore natural color and contrast in aquatic visuals.
- In fluid dynamics, AquaDiff uses a diffuse interface (CHNS-type) method with adaptive, energy-stable schemes to accurately model air–sea interfaces.
AquaDiff denotes two distinct but high-impact methodological frameworks in computational science: (i) a class of diffusion-based models for underwater image enhancement that address chromatic and perceptual degradations in aquatic visual data, and (ii) a family of diffuse interface (CHNS-type) algorithms for simulating air–water interfaces in geophysical and environmental fluid dynamics. Both share nomenclature but pertain to different domains—image restoration and multiphase flow modeling—though both leverage diffusion principles at their core.
1. Diffusion-Based Underwater Image Enhancement
Physical Degradation Model
AquaDiff for image enhancement is founded on the Jaffe–McGlamery degradation model, characterizing underwater image formation with per-channel exponential absorption:
$$I_c(x) = J_c(x)\,e^{-\beta_c d(x)} + B_c\left(1 - e^{-\beta_c d(x)}\right), \qquad c \in \{R, G, B\},$$
where
- $I_c(x)$: observed underwater intensity at pixel $x$ and channel $c$,
- $J_c(x)$: scene radiance,
- $B_c$: background light,
- $\beta_c$: wavelength-dependent attenuation coefficient,
- $d(x)$: scene distance per pixel.
The restoration objective is estimation of $J_c(x)$, reflecting natural color, contrast, and detail (Shaahid et al., 15 Dec 2025).
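As a worked illustration of this formation model, the sketch below synthesizes a degraded observation from a clean scene. The attenuation coefficients, background light, and distances are illustrative assumptions, not values from the paper:

```python
import numpy as np

def degrade(J, B, beta, d):
    """Synthesize an underwater observation I_c from scene radiance J_c.

    J    : clean image, shape (H, W, 3), values in [0, 1]
    B    : per-channel background light, shape (3,)
    beta : per-channel attenuation coefficients, shape (3,)
    d    : per-pixel scene distance in metres, shape (H, W)

    I_c(x) = J_c(x) * exp(-beta_c d(x)) + B_c * (1 - exp(-beta_c d(x)))
    """
    t = np.exp(-d[..., None] * beta)   # transmission map, shape (H, W, 3)
    return J * t + B * (1.0 - t)

# Red light attenuates fastest underwater, so beta_R > beta_G > beta_B here.
J = np.ones((4, 4, 3)) * 0.8           # uniform grey scene
B = np.array([0.1, 0.4, 0.6])          # bluish background light (assumed)
beta = np.array([0.8, 0.3, 0.1])       # illustrative coefficients (assumed)
d = np.full((4, 4), 5.0)               # 5 m range everywhere
I = degrade(J, B, beta, d)             # red channel collapses toward B_R
```

At 5 m the red channel is almost fully absorbed while blue survives, reproducing the familiar blue-green cast that the restoration stage must invert.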
Diffusion Process for Restoration
AquaDiff frames enhancement as a conditional generative denoising diffusion process. The forward process gradually corrupts the clean reference $x_0$ with Gaussian noise to generate a Markov chain $x_1, \dots, x_T$:
$$q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\; \sqrt{1 - \beta_t}\,x_{t-1},\; \beta_t \mathbf{I}\right).$$
In reverse, a neural network (a U-Net backbone) learns to iteratively denoise $x_T \to x_0$, conditioned on a pre-processed input $c$, through
$$p_\theta(x_{t-1} \mid x_t, c) = \mathcal{N}\!\left(x_{t-1};\; \mu_\theta(x_t, t, c),\; \sigma_t^2 \mathbf{I}\right),$$
where $\mu_\theta$ is parametrized to predict the mean from the noise-prediction network's output $\epsilon_\theta(x_t, t, c)$.
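The forward chain admits the standard closed-form sample $x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon$ with $\bar{\alpha}_t = \prod_{s \le t}(1-\beta_s)$. A minimal sketch using the paper's 2000-step horizon (the linear noise schedule is an assumption):

```python
import numpy as np

def forward_diffuse(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form.

    q(x_t | x_0) = N(sqrt(alpha_bar_t) x_0, (1 - alpha_bar_t) I),
    with alpha_bar_t = prod_{s<=t} (1 - beta_s).
    """
    alpha_bar = np.cumprod(1.0 - betas)[t]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps, eps

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 2000)   # linear schedule over T = 2000 (assumed)
x0 = rng.standard_normal((8, 8, 3))     # stand-in for a clean reference patch
x_t, eps = forward_diffuse(x0, 1999, betas, rng)
```

At the final step $\bar{\alpha}_T \approx 0$, so $x_T$ is essentially pure noise, which is the starting point for the learned reverse chain.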
2. Chromatic Prior–Guided Color Compensation
Color distortion in underwater environments is addressed via a physics-guided compensation prior in the CIE Lab color space. Chroma channels $a$, $b$ are adaptively de-biased as:
$$\tilde{a} = a - \alpha\,(G_\sigma * a) \odot (1 - M), \qquad \tilde{b} = b - \alpha\,(G_\sigma * b) \odot (1 - M),$$
with $M$ a mask thresholding high-luminance regions and $G_\sigma$ a Gaussian blur. Elementwise compensation is compactly written as:
$$\tilde{c} = c - \alpha\,(G_\sigma * c) \odot (1 - M), \qquad c \in \{a, b\},$$
where $\alpha$ sets the compensation strength, targeting compensation where needed and avoiding overcorrection in high-brightness regions (Shaahid et al., 15 Dec 2025).
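A minimal sketch of this style of luminance-masked chroma de-biasing, assuming a simple mean filter in place of the Gaussian blur and a hypothetical strength parameter `alpha`; the paper's exact operator and constants are not reproduced here:

```python
import numpy as np

def compensate_chroma(c, L, thresh=80.0, alpha=0.5, k=5):
    """De-bias one Lab chroma channel (a or b) toward neutral grey.

    c      : chroma channel, shape (H, W), CIE Lab range roughly [-128, 127]
    L      : luminance channel, shape (H, W), range [0, 100]
    thresh : luminance above which no compensation is applied (assumed)
    alpha  : compensation strength (hypothetical parameter)
    k      : mean-filter size standing in for the Gaussian blur G_sigma
    """
    # mean filter as a cheap stand-in for the Gaussian blur
    pad = k // 2
    cp = np.pad(c, pad, mode="edge")
    blur = np.zeros_like(c)
    for dy in range(k):
        for dx in range(k):
            blur += cp[dy:dy + c.shape[0], dx:dx + c.shape[1]]
    blur /= k * k
    M = (L > thresh).astype(c.dtype)    # protect high-luminance regions
    return c - alpha * blur * (1.0 - M)

# A uniformly blue-cast patch: b-channel biased negative, moderate luminance.
b = np.full((6, 6), -20.0)
L = np.full((6, 6), 50.0)
b_hat = compensate_chroma(b, L)         # bias pulled halfway toward zero
```

On this toy patch the smoothed cast estimate equals the cast itself, so the correction removes half of it, while a bright patch (`L > thresh`) would pass through unchanged.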
3. Conditional Diffusion Neural Architecture
The denoising backbone underlying AquaDiff is an enhanced U-Net, featuring:
- Channel multipliers on a 64-dim base.
- Residual dense blocks to optimize feature reuse.
- Dense skip connections (U-Net++ paradigm) for enriched multi-scale information flow.
- Multi-resolution self-attention at multiple intermediate feature resolutions to capture nonlocal chromatic relationships.
Key to the model is the cross-attention fusion mechanism, where at each time step the degraded input $c$ and current noisy state $x_t$ interact via:
$$\mathrm{Attn}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V, \qquad Q = x_t W_Q, \quad K = c\,W_K, \quad V = c\,W_V,$$
with $W_Q$, $W_K$, $W_V$ being trainable projections. This enables the model to dynamically guide denoising using color-compensated cues (Shaahid et al., 15 Dec 2025).
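A single-head version of this cross-attention can be sketched in NumPy; random projections stand in for the trained ones:

```python
import numpy as np

def cross_attention(x, c, Wq, Wk, Wv):
    """Single-head cross-attention: queries from the noisy state x_t,
    keys/values from the color-compensated conditioning input c.

    x  : (N, d) flattened noisy feature tokens
    c  : (M, d) flattened conditioning tokens
    Wq, Wk, Wv : (d, d) projections (trainable in the real model)
    """
    Q, K, V = x @ Wq, c @ Wk, c @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])        # (N, M) scaled dot products
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=-1, keepdims=True)             # softmax over conditioning tokens
    return A @ V                                   # (N, d) fused features

rng = np.random.default_rng(1)
d = 16
x = rng.standard_normal((64, d))                   # 8x8 noisy feature map, flattened
c = rng.standard_normal((64, d))                   # conditioning feature map
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
out = cross_attention(x, c, Wq, Wk, Wv)
```

Each noisy-state token attends over all conditioning tokens, which is what lets color-compensated cues steer the denoising nonlocally.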
4. Cross-Domain Consistency Loss
The training objective of AquaDiff is a composite loss, enforcing:
- Pixel-wise fidelity: $\ell_1$ norm on $(\hat{x}_0, x_0)$
- Multi-scale structure: sum of pixel losses at progressively downsampled resolutions
- Perceptual similarity: VGG-19 feature differences, weighted empirically (1:0.5:0.1) across layers 2, 7, 16
- Structural similarity: $\mathcal{L}_{\mathrm{SSIM}} = 1 - \mathrm{SSIM}(\hat{x}_0, x_0)$
- Frequency-domain consistency: $\ell_1$ distance between FFT magnitudes of output and target
Aggregated as:
$$\mathcal{L} = \mathcal{L}_{\mathrm{pix}} + \lambda_{\mathrm{ms}}\,\mathcal{L}_{\mathrm{ms}} + \lambda_{\mathrm{perc}}\,\mathcal{L}_{\mathrm{perc}} + \lambda_{\mathrm{ssim}}\,\mathcal{L}_{\mathrm{SSIM}} + \lambda_{\mathrm{freq}}\,\mathcal{L}_{\mathrm{freq}}.$$
This enforces alignment across pixel, perceptual, structural, and spectral domains, suppressing color artifacts and over-smoothing typical of diffusion models (Shaahid et al., 15 Dec 2025).
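A dependency-free sketch of how such a composite objective combines domains; the VGG-19 perceptual term is omitted, normalized cross-correlation stands in for SSIM, and the weights are assumptions:

```python
import numpy as np

def composite_loss(pred, target, weights=(1.0, 0.5, 0.1, 0.1)):
    """Sketch of a cross-domain consistency loss (weights are assumptions)."""
    w_pix, w_ms, w_ssim, w_freq = weights
    l_pix = np.abs(pred - target).mean()           # pixel-wise L1

    # multi-scale: average 2x2 blocks for a half-resolution comparison
    ds = lambda z: z.reshape(z.shape[0] // 2, 2, z.shape[1] // 2, 2).mean(axis=(1, 3))
    l_ms = np.abs(ds(pred) - ds(target)).mean()

    # structural term: 1 - normalized cross-correlation (stand-in for SSIM)
    p, t = pred - pred.mean(), target - target.mean()
    l_ssim = 1.0 - (p * t).sum() / (np.linalg.norm(p) * np.linalg.norm(t) + 1e-8)

    # frequency-domain consistency: L1 between FFT magnitudes
    l_freq = np.abs(np.abs(np.fft.fft2(pred)) - np.abs(np.fft.fft2(target))).mean()

    return w_pix * l_pix + w_ms * l_ms + w_ssim * l_ssim + w_freq * l_freq

rng = np.random.default_rng(2)
target = rng.random((8, 8))
loss_zero = composite_loss(target, target)         # identical images -> near-zero loss
```

Every term is non-negative and vanishes when prediction and target coincide, so the gradient pressure comes only from genuine pixel, structural, or spectral mismatch.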
5. Empirical Evaluation and Benchmarks
AquaDiff is trained with:
- LSUI (5004 pairs), UIEB (800 train/90 val)
- Patch-based training with 2000 diffusion steps, the Adam optimizer, and batch size 1.
Quantitative performance is reported on full- and no-reference benchmarks (PSNR, SSIM, UIQM, UCIQE) across the U45, S16, C60, and TEST-U90 datasets. Notably, AquaDiff attains the highest UCIQE (colorfulness and contrast) on all no-reference sets and competitive PSNR/SSIM. Ablation demonstrates cumulative gains from the chromatic prior, enhanced U-Net, and CDC loss:
| Method | U45 UIQM/UCIQE | S16 UIQM/UCIQE | C60 UIQM/UCIQE | TEST-U90 PSNR/SSIM |
|---|---|---|---|---|
| UDCP | 3.30/0.455 | 1.49/0.443 | 2.73/0.393 | 11.38/0.516 |
| Water-Net | 4.86/0.450 | 3.50/0.431 | 4.45/0.442 | 19.92/0.833 |
| Ucolor | 4.95/0.446 | 3.58/0.419 | 4.33/0.385 | 21.00/0.869 |
| DiffWater | 4.73/0.462 | 4.52/0.450 | 4.66/0.434 | 20.97/0.895 |
| AquaDiff | 4.61/0.539 | 4.44/0.524 | 4.32/0.518 | 20.25/0.883 |
Qualitative analysis confirms strong suppression of blue, green-yellow, and red casts and preservation of texture under varying illumination (Shaahid et al., 15 Dec 2025).
6. Diffuse-Interface Framework in Atmosphere–Ocean Systems
AquaDiff also denotes a diffuse interface methodology for air–sea interactions, modeled by variable-density Cahn–Hilliard–Navier–Stokes (CHNS) systems:
Governing Equations:
- Momentum (Navier–Stokes): $\rho(\varphi)\,\partial_t v + \big((\rho(\varphi)v + J)\cdot\nabla\big)v - \operatorname{div}\!\big(2\eta(\varphi)\,Dv\big) + \nabla p = \mu\nabla\varphi + \rho(\varphi)\,g$, with relative mass flux $J = -\tfrac{\rho_2 - \rho_1}{2}\,m(\varphi)\nabla\mu$
- Incompressibility: $\operatorname{div} v = 0$
- Phase-field advection–diffusion: $\partial_t\varphi + v\cdot\nabla\varphi = \operatorname{div}\!\big(m(\varphi)\nabla\mu\big)$
- Chemical potential: $\mu = -\sigma\varepsilon\,\Delta\varphi + \dfrac{\sigma}{\varepsilon}\,W'(\varphi)$
Adaptive, energy-stable schemes with a posteriori error estimation and dynamic mesh refinement are implemented to resolve sharp density and velocity gradients at wind-driven interfaces (Garcke et al., 2016).
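To make the phase-field mechanics concrete, the sketch below advances a 1-D Cahn–Hilliard equation (no flow coupling) with a naive explicit scheme. This is an illustration only, not the adaptive energy-stable discretization of Garcke et al.:

```python
import numpy as np

def cahn_hilliard_step(phi, dt, dx, eps):
    """One explicit step of the 1-D Cahn-Hilliard equation on a periodic grid.

    mu    = -eps^2 phi_xx + W'(phi),  W(phi) = (phi^2 - 1)^2 / 4
    phi_t = mu_xx                     (constant unit mobility)
    """
    lap = lambda z: (np.roll(z, -1) - 2 * z + np.roll(z, 1)) / dx**2
    mu = -eps**2 * lap(phi) + phi**3 - phi      # chemical potential
    return phi + dt * lap(mu)

n, L = 128, 1.0
dx = L / n
eps = 4 * dx                                    # interface width tied to the mesh
rng = np.random.default_rng(3)
phi = 0.05 * rng.standard_normal(n)             # small perturbation around phi = 0
mass0 = phi.sum()                               # conserved by the scheme
dt = 0.01 * dx**4 / eps**2                      # explicit stability is severe: dt ~ dx^4
for _ in range(200):
    phi = cahn_hilliard_step(phi, dt, dx, eps)
```

The punishing $\Delta t \sim \Delta x^4$ restriction of the explicit scheme is exactly why the paper's implicit, energy-stable time discretizations with adaptive meshing matter in practice.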
7. Algorithmic, Implementation, and Applications Context
For image enhancement, AquaDiff is implemented as a deep learning pipeline with a U-Net backbone, requiring no augmentation beyond cropping and flipping. For geophysical flows, the finite-element implementation leverages iFEM mesh management, semi-smooth Newton linearization, Krylov/preconditioned linear solvers, and adaptive mesh refinement. Example simulations include adaptive wind–wave generation on meshes of up to 25,000 degrees of freedom, capturing dynamic topological changes at air–sea boundaries (Garcke et al., 2016).
In both domains, "AquaDiff" frameworks leverage diffusion processes: as the generative mechanism for denoising and structural restoration (vision), or as a mesoscopic approximation to multiphase interface physics (fluid dynamics). The significance of the two approaches lies, respectively, in state-of-the-art restoration of underwater imagery and in efficient, accurate modeling of dynamic interfacial flows.