DichroGAN: Seafloor Color Restoration cGAN
- DichroGAN is a conditional generative adversarial network that restores true, dewatered seafloor colors by compensating for depth-dependent spectral distortions.
- The architecture integrates four specialized U-Net style encoder–decoder generators and a ViT-based discriminator to disentangle diffuse, specular, and transmission components.
- Quantitative evaluations demonstrate superior SSIM and PSNR compared to state-of-the-art methods, highlighting its potential for robust underwater image reconstruction.
DichroGAN is a conditional generative adversarial network (cGAN) designed for the restoration of true in-air seafloor colors from satellite imagery, effectively compensating for the severe, depth-dependent spectral distortions imposed by the water column. The methodology integrates a physically motivated underwater image formation equation with a four-generator architecture, explicitly disentangling diffuse and specular reflectance, transmission, and veiling light, to produce accurate "dewatered" (in-air) radiance estimates. DichroGAN is trained and validated on PRISMA satellite hyperspectral data and demonstrates quantitative and qualitative improvements over state-of-the-art underwater image restoration techniques (Gonzalez-Sabbagh et al., 1 Jan 2026).
1. Physical Modeling: Underwater Image Formation
DichroGAN is grounded in a physically explicit model of underwater radiative transfer, primarily Duntley’s underwater image formation model (UIFM). For each spectral band $\lambda$, the observed radiance is:

$$L_\lambda = L_{\mathrm{obj},\lambda}\, e^{-c(\lambda)\, r} + B_\lambda \bigl(1 - e^{-K_d(\lambda)\, r}\bigr)$$

where:
- $L_\lambda$: observed radiance at the sensor
- $L_{\mathrm{obj},\lambda}$: true object radiance (just above the seafloor)
- $B_\lambda$: veiling light from water column backscatter
- $c(\lambda)$: total attenuation (absorption + scattering)
- $K_d(\lambda)$: diffuse attenuation coefficient
- $r$: range from sensor to seafloor
- $(\theta, \phi)$: viewing angles (suppressed above for a fixed viewing geometry)

Under a nadir view, this equation simplifies (per pixel $x$) to:

$$I_\lambda(x) = J_\lambda(x)\, t_\lambda(x) + B_\lambda \bigl(1 - t_\lambda(x)\bigr)$$

with transmission $t_\lambda(x) = e^{-\beta_\lambda\, d(x)}$, where $d(x)$ is the per-pixel water depth and $\beta_\lambda$ the band-wise attenuation coefficient.

Recovery of the "in-air" color $J_\lambda(x)$ requires inverting this mapping:

$$J_\lambda(x) = \frac{I_\lambda(x) - B_\lambda \bigl(1 - t_\lambda(x)\bigr)}{t_\lambda(x)}$$

Thus, accurate estimation of $t_\lambda(x)$ and $B_\lambda$ is essential to reconstruct seafloor color.
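This per-pixel inversion is straightforward to sketch in NumPy. The sketch below assumes the standard simplified formation model with per-channel transmission and a small floor on the transmission to keep the division numerically stable at large depths; it is an illustration of the physics, not the paper's implementation.

```python
import numpy as np

def transmission(beta, d):
    """Per-channel transmission t_c(x) = exp(-beta_c * d(x)).

    beta : per-channel attenuation coefficients, shape (3,)
    d    : per-pixel water depth map, shape (H, W)
    """
    return np.exp(-beta[None, None, :] * d[..., None])

def dewater(I, t, B, eps=1e-3):
    """Invert the simplified per-pixel UIFM: I = J*t + B*(1 - t).

    I   : observed RGB image, shape (H, W, 3), values in [0, 1]
    t   : per-pixel, per-channel transmission, same shape as I
    B   : veiling-light color, shape (3,)
    eps : floor on t to avoid amplifying noise where attenuation is severe
    """
    t = np.clip(t, eps, 1.0)
    J = (I - B * (1.0 - t)) / t   # recovered "in-air" radiance
    return np.clip(J, 0.0, 1.0)
```

Round-tripping a synthetic scene through the forward model and `dewater` recovers the original radiance, which is exactly the consistency DichroGAN's learned estimates must satisfy.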
2. Network Architecture and Component Functions
DichroGAN’s architecture comprises four generators and a single discriminator, organized within a unified conditional GAN:
- $G_d$ (diffuse-reflectance generator): estimates the diffuse reflectance component from the RGB satellite input.
- $G_s$ (specular-reflectance generator): estimates the specular reflectance component.
- $G_t$ (transmission/depth generator): predicts the per-pixel transmission map, representing attenuation due to water depth.
- $G_r$ (radiance-restoration generator): synthesizes the final "dewatered" in-air RGB output by combining the preceding estimates.
All generators utilize a U-Net style encoder–decoder structure:
- Encoder: ResNet-50 pretrained on ImageNet
- Decoder: Five upsampling blocks (feature map sizes: [256, 128, 64, 32, 16])
- Skip connections between encoder and decoder layers
The discriminator is a Vision Transformer (ViT)-based patch-level classifier that evaluates (input, output) pairs for adversarial training.
The comprehensive workflow is as follows:
- $G_d$ and $G_s$ take the same RGB+mask input; they output the diffuse and specular reflectance estimates, respectively.
- $G_t$ outputs the transmission/depth map $t$.
- $G_r$ receives the sum of the diffuse and specular estimates and the transmission map (plus a Grey-World estimate for the veiling light) and outputs the dewatered RGB image $\hat{J}$.
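The workflow above can be sketched as a composition of image-to-image networks. The stand-in generators, channel layout, sigmoid on the transmission output, and the exact Grey-World formulation are assumptions; the sketch shows only how the four estimates feed the restoration generator.

```python
import torch
import torch.nn as nn

def grey_world_veiling(rgb, mask):
    """Grey-World style veiling-light estimate: per-channel mean over water pixels."""
    w = mask.expand_as(rgb)
    B = (rgb * w).sum(dim=(2, 3)) / w.sum(dim=(2, 3)).clamp(min=1.0)
    return B.view(rgb.size(0), 3, 1, 1)

class DichroGANForward(nn.Module):
    """Composes the four generators; each G_* is any image-to-image network."""
    def __init__(self, G_d, G_s, G_t, G_r):
        super().__init__()
        self.G_d, self.G_s, self.G_t, self.G_r = G_d, G_s, G_t, G_r

    def forward(self, rgb, water_mask):
        x = torch.cat([rgb, water_mask], dim=1)    # RGB + binary water mask
        R_d = self.G_d(x)                          # diffuse reflectance
        R_s = self.G_s(x)                          # specular reflectance
        t = torch.sigmoid(self.G_t(x))             # transmission in (0, 1)
        B = grey_world_veiling(rgb, water_mask)    # veiling-light estimate
        # G_r consumes the reflectance sum, transmission, and veiling light
        B_map = B.expand(-1, -1, *t.shape[2:])
        J_hat = self.G_r(torch.cat([R_d + R_s, t, B_map], dim=1))
        return J_hat, R_d, R_s, t, B
```

With single-conv stand-ins for the generators, one forward pass produces all intermediate estimates plus the dewatered output.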
3. Loss Functions and Joint Objective
DichroGAN’s training strategy combines adversarial and physically informed losses:
- Adversarial loss (cGAN): $\mathcal{L}_{\mathrm{adv}} = \mathbb{E}_{x,y}\bigl[\log D(x, y)\bigr] + \mathbb{E}_{x}\bigl[\log\bigl(1 - D(x, G(x))\bigr)\bigr]$
- Dichromatic model-based reflectance decomposition ($\ell_1$, water-masked):
- Diffuse: $\mathcal{L}_{d} = \lVert \hat{R}_d - R_d \rVert_1$
- Specular: $\mathcal{L}_{s} = \lVert \hat{R}_s - R_s \rVert_1$
- Full radiance reconstruction: $\mathcal{L}_{\mathrm{rec}} = \lVert (\hat{R}_d + \hat{R}_s) - I \rVert_1$ (water-masked)
- Transmission/depth regularization:
- Transmission: $\mathcal{L}_{t} = \lVert \hat{t} - t \rVert_1$
- Scale-invariant smoothness: $\mathcal{L}_{\mathrm{sm}}$, a smoothness penalty on the predicted transmission/depth map
- Radiance-restoration loss: $\ell_1$ penalty on the in-air RGB estimate $\hat{J}$, masked to water pixels: $\mathcal{L}_{r} = \lVert \hat{J} - J \rVert_1$
- UIFM consistency ("pseudo-reprojection"): the observed image is re-synthesized from the estimates, $\mathcal{L}_{\mathrm{uifm}} = \lVert \hat{J}\,\hat{t} + B\,(1 - \hat{t}) - I \rVert_1$
These loss terms are combined as a weighted sum:

$$\mathcal{L} = \lambda_{\mathrm{adv}}\mathcal{L}_{\mathrm{adv}} + \lambda_{d}\mathcal{L}_{d} + \lambda_{s}\mathcal{L}_{s} + \lambda_{\mathrm{rec}}\mathcal{L}_{\mathrm{rec}} + \lambda_{t}\mathcal{L}_{t} + \lambda_{\mathrm{sm}}\mathcal{L}_{\mathrm{sm}} + \lambda_{r}\mathcal{L}_{r} + \lambda_{\mathrm{uifm}}\mathcal{L}_{\mathrm{uifm}}$$
Losses involving physical or radiometric quantities are only applied within water-masked regions to avoid bias from land/cloud pixels.
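The water-masked loss terms can be sketched as follows. The weights are illustrative, and a plain first-order gradient penalty stands in for the paper's scale-invariant smoothness term; only the masking pattern and the UIFM pseudo-reprojection structure follow the text.

```python
import torch

def masked_l1(pred, target, mask):
    """L1 restricted to water pixels so land/cloud pixels carry no gradient."""
    m = mask.expand_as(pred)
    return (m * (pred - target).abs()).sum() / m.sum().clamp(min=1.0)

def smoothness(t):
    """First-order smoothness prior on the transmission/depth map
    (stand-in for the paper's scale-invariant term)."""
    dx = (t[..., :, 1:] - t[..., :, :-1]).abs().mean()
    dy = (t[..., 1:, :] - t[..., :-1, :]).abs().mean()
    return dx + dy

def uifm_consistency(I_obs, J_hat, t, B, mask):
    """Pseudo-reprojection: re-synthesize the observation via the UIFM."""
    I_resynth = J_hat * t + B * (1.0 - t)
    return masked_l1(I_resynth, I_obs, mask)

def total_loss(adv, I_obs, J_hat, R_d, R_d_gt, R_s, R_s_gt, t, B, J_gt, mask,
               w=(1.0, 10.0, 10.0, 10.0, 1.0, 1.0)):  # weights are illustrative
    return (w[0] * adv
            + w[1] * masked_l1(R_d, R_d_gt, mask)   # diffuse term
            + w[2] * masked_l1(R_s, R_s_gt, mask)   # specular term
            + w[3] * masked_l1(J_hat, J_gt, mask)   # radiance restoration
            + w[4] * smoothness(t)                  # transmission prior
            + w[5] * uifm_consistency(I_obs, J_hat, t, B, mask))
```

Note that `uifm_consistency` vanishes exactly when the restored radiance, transmission, and veiling light jointly reproduce the observed image, which is what ties the learned decomposition back to the physical model.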
4. Dataset Construction and Preprocessing
DichroGAN is trained on data derived from PRISMA Level-2 VNIR hyperspectral cubes, which provide:
- 63 bands spanning 400–1010 nm at 30 m GSD.
- RGB synthesis from bands 33 (R), 45 (G), and 56 (B).
- Binary water-vs.-nonwater masks via automatic thresholding on NIR bands.
The training corpus includes:
- 1,570 unique RGB scenes with all 63 spectral bands (98,000 image slices).
- All images (inputs and outputs) are normalized to a fixed value range and resized to a fixed square resolution.
- Histogram stretch applied to diffuse/specular outputs for improved dynamic range.
- No explicit geometric or photometric augmentation apart from random seed specification.
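The preprocessing steps above can be sketched in NumPy. The RGB band indices come from the text (converted to 0-based); the specific NIR band index, the threshold value, and the percentile choices for the stretch are assumptions for illustration.

```python
import numpy as np

# PRISMA VNIR bands used for RGB synthesis (0-based; the paper counts from 1)
RGB_BANDS = (32, 44, 55)   # bands 33 (R), 45 (G), 56 (B)
NIR_BAND = 60              # illustrative NIR band index
NIR_THRESHOLD = 0.05       # illustrative reflectance threshold

def rgb_from_cube(cube):
    """Stack the three VNIR bands into an RGB image normalized to [0, 1]."""
    rgb = np.stack([cube[..., b] for b in RGB_BANDS], axis=-1).astype(np.float32)
    lo, hi = rgb.min(), rgb.max()
    return (rgb - lo) / max(hi - lo, 1e-8)

def water_mask(cube):
    """Water absorbs strongly in the NIR, so low NIR reflectance marks water."""
    return (cube[..., NIR_BAND] < NIR_THRESHOLD).astype(np.uint8)

def histogram_stretch(img, p_lo=2, p_hi=98):
    """Percentile stretch to expand dynamic range (as applied to the
    diffuse/specular generator outputs)."""
    lo, hi = np.percentile(img, [p_lo, p_hi])
    return np.clip((img - lo) / max(hi - lo, 1e-8), 0.0, 1.0)
```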
5. Quantitative Results and Comparative Analysis
DichroGAN’s performance is assessed through detailed ablations and benchmarks, summarized as follows:
| Model | SSIM | PSNR (dB) |
|---|---|---|
| cGANs-baseline | 0.593 | 17.75 |
| cWGAN-2 | 0.524 | 17.96 |
| cGANs-G_t | 0.582 | 17.78 |
| cGAN-VGG | 0.489 | 16.34 |
| DichroGAN (proposed) | 0.672 | 18.01 |
On full-reference NASA EO data, DichroGAN achieves the highest SSIM and PSNR scores among tested architectures.
A comparable pattern holds in benchmark comparisons with classical and modern underwater restoration methods (UDCP, CWR, NU²Net, Phaseformer):
- On NASA EO: DichroGAN achieves SSIM 0.560, PSNR 14.39 dB (highest PSNR).
- On combined PRISMA + NASA EO (no-reference): CCF 18.84, UIQM 2.342, NIQE 5.422 (2nd on CCF/NIQE across methods).
- On HICRD & UIEB (underwater benchmarks): best NIQE, competitive CCF/UIQM.
Qualitative analysis shows DichroGAN restores terrain detail and color without over-enhancement or color cast, in contrast to existing methods that often introduce artifacts or fail to remove water silhouettes.
6. Implementation Details and Limitations
- Framework: PyTorch, running on AMD EPYC 7402P CPU with 60 GB RAM.
- Hyperparameters: batch size 6, 130 training epochs, Adam optimizer.
- Initialization: generator encoders use ResNet-50 pretrained on ImageNet, with the transmission/depth generator warm-started from a scene-depth model [González-Sabbagh et al., 2025].
- Input/output sizes: all networks operate at a common fixed square image resolution.
- Masking: All key losses are masked to water pixels.
- Limitations: Tendency for slight blurriness (loss of high-frequency detail), somewhat lower performance on metrics biased toward over-enhancement. Future work includes scaling dataset size, multi-term perceptual/texture losses, and evolving to higher-resolution architectures.
7. Significance, Outlook, and Generalization
DichroGAN enables explicit, physically grounded restoration of in-air radiance from satellite images of the seafloor, integrating the dichromatic reflection model and Duntley’s UIFM into a unified deep generative framework. This architecture achieves state-of-the-art restoration across reference and non-reference benchmarks. A plausible implication is that the explicit disentanglement of reflectance, transmission, and veiling light within a deep learning model is critical to robust underwater image reconstruction, and similar frameworks may be extensible to other remote sensing or atmospheric correction domains (Gonzalez-Sabbagh et al., 1 Jan 2026).