
DichroGAN: Seafloor Color Restoration cGAN

Updated 8 January 2026
  • DichroGAN is a conditional generative adversarial network that restores true, dewatered seafloor colors by compensating for depth-dependent spectral distortions.
  • The architecture integrates four specialized generators with a U-Net style encoder–decoder and a ViT-based discriminator to disentangle diffuse, specular, and transmission components.
  • Quantitative evaluations demonstrate superior SSIM and PSNR compared to state-of-the-art methods, highlighting its potential for robust underwater image reconstruction.

DichroGAN is a conditional generative adversarial network (cGAN) designed for the restoration of true in-air seafloor colors from satellite imagery, effectively compensating for the severe, depth-dependent spectral distortions imposed by the water column. The methodology integrates a physically motivated underwater image formation equation with a four-generator architecture, explicitly disentangling diffuse and specular reflectance, transmission, and veiling light, to produce accurate "dewatered" (in-air) radiance estimates. DichroGAN is trained and validated on PRISMA satellite hyperspectral data and demonstrates quantitative and qualitative improvements over state-of-the-art underwater image restoration techniques (Gonzalez-Sabbagh et al., 1 Jan 2026).

1. Physical Modeling: Underwater Image Formation

DichroGAN is grounded in a physically explicit model of underwater radiative transfer, primarily Duntley's underwater image formation model (UIFM). For each spectral band $\lambda_i$, the observed radiance $N$ is:

$$
\begin{aligned}
N(z,\theta,\phi,\lambda_i) &= J(z,\theta,\phi,\lambda_i)\,\exp[-\alpha(z,\lambda_i)\,r] \\
&\quad + V(z,\theta,\phi,\lambda_i)\,\exp[K(z,\theta,\phi,\lambda_i)\,r\cos\theta] \\
&\qquad \times \left\{1-\exp\left[-\alpha(z,\lambda_i)\,r + K(z,\theta,\phi,\lambda_i)\,r\cos\theta\right]\right\}
\end{aligned}
$$

where:

  • $N$: observed radiance at the sensor
  • $J$: true object radiance (just above the seafloor)
  • $V$: veiling light from water-column backscatter
  • $\alpha(z,\lambda) = a(z,\lambda) + b(z,\lambda)$: total attenuation (absorption + scattering)
  • $K$: diffuse attenuation coefficient
  • $r$: range from sensor to seafloor
  • $(\theta, \phi)$: viewing angles

Under a nadir view, this equation simplifies (per-pixel) to:

$$
\mathbf{N}(u,\lambda_i) = \mathbf{J}(u,\lambda_i)\,\mathbf{T}(u,\lambda_i) + \mathbf{V}(u,\lambda_i)\left[1-\mathbf{T}(u,\lambda_i)\right]
$$

with transmission

$$
\mathbf{T}(u,\lambda_i) = \exp[-r\,\alpha(z,\lambda_i)]
$$

Recovery of the "in-air" color requires inverting this mapping:

$$
\mathbf{J}(u,\lambda_i) = \frac{\mathbf{N}(u,\lambda_i)-\mathbf{V}(u,\lambda_i)}{\mathbf{T}(u,\lambda_i)} + \mathbf{V}(u,\lambda_i)
$$

Thus, accurate estimation of $\mathbf{V}(u,\lambda_i)$ and $\mathbf{T}(u,\lambda_i)$ is essential to reconstruct seafloor color.
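Since both the forward model and its inversion are per-pixel algebra once $\mathbf{V}$ and $\mathbf{T}$ are estimated, they reduce to a few array operations. The following is a minimal NumPy sketch of the simplified nadir-view model and its inversion; the array shapes and the epsilon guard against near-zero transmission are illustrative assumptions, not specified in the paper.

```python
import numpy as np

def reproject(J, V, T):
    """Forward UIFM (nadir view): N = J*T + V*(1 - T)."""
    return J * T + V * (1.0 - T)

def dewater(N, V, T, eps=1e-6):
    """Invert the simplified UIFM per pixel: J = (N - V)/T + V.

    N: observed radiance, e.g. shape (H, W, 3)
    V: veiling-light estimate, broadcastable against N
    T: transmission map T = exp(-r * alpha), broadcastable against N
    eps: guard against division by near-zero transmission (assumption)
    """
    return (N - V) / np.maximum(T, eps) + V
```

Note that `dewater(reproject(J, V, T), V, T)` recovers `J` exactly; the UIFM consistency loss in Section 3 exploits the same round trip.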

2. Network Architecture and Component Functions

DichroGAN’s architecture comprises four generators and a single discriminator, organized within a unified conditional GAN:

  • $G_d$ (diffuse-reflectance generator): Estimates the diffuse reflectance component from the RGB satellite input.
  • $G_s$ (specular-reflectance generator): Estimates the specular reflectance component.
  • $G_t$ (transmission/depth generator): Predicts the per-pixel transmission map, representing attenuation due to water depth.
  • $G_j$ (radiance-restoration generator): Synthesizes the final "dewatered" in-air RGB output, combining the preceding estimates.

All generators utilize a U-Net style encoder–decoder structure (a minimal sketch follows this list):

  • Encoder: ResNet-50 pretrained on ImageNet
  • Decoder: Five upsampling blocks (feature map sizes: [256, 128, 64, 32, 16])
  • Skip connections between encoder and decoder layers
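The text leaves block-level details unspecified; the following PyTorch sketch is one plausible realization of this skeleton, with the decoder widths from the list above. The skip wiring, normalization, output activation, and the `in_channels` widening for RGB+mask inputs are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50, ResNet50_Weights

class UNetGenerator(nn.Module):
    """U-Net style generator: ResNet-50 encoder, five upsampling blocks.

    Decoder widths [256, 128, 64, 32, 16] follow the text; the exact
    skip wiring, normalization, and output activation are assumptions.
    """
    def __init__(self, in_channels=3, out_channels=3):
        super().__init__()
        backbone = resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)
        if in_channels != 3:  # widen the stem for RGB+mask inputs
            backbone.conv1 = nn.Conv2d(in_channels, 64, 7, stride=2,
                                       padding=3, bias=False)
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1, backbone.relu)
        self.pool = backbone.maxpool
        self.enc = nn.ModuleList([backbone.layer1, backbone.layer2,
                                  backbone.layer3, backbone.layer4])
        # Decoder input channels = upsampled features + encoder skip.
        self.dec = nn.ModuleList([
            self._block(2048 + 1024, 256),
            self._block(256 + 512, 128),
            self._block(128 + 256, 64),
            self._block(64 + 64, 32),
            self._block(32, 16),
        ])
        self.head = nn.Conv2d(16, out_channels, kernel_size=1)

    @staticmethod
    def _block(cin, cout):
        return nn.Sequential(
            nn.Conv2d(cin, cout, kernel_size=3, padding=1),
            nn.BatchNorm2d(cout),
            nn.ReLU(inplace=True))

    def forward(self, x):
        up = lambda t: F.interpolate(t, scale_factor=2, mode="bilinear",
                                     align_corners=False)
        s0 = self.stem(x)                #   64 ch, H/2
        e1 = self.enc[0](self.pool(s0))  #  256 ch, H/4
        e2 = self.enc[1](e1)             #  512 ch, H/8
        e3 = self.enc[2](e2)             # 1024 ch, H/16
        e4 = self.enc[3](e3)             # 2048 ch, H/32
        d = self.dec[0](torch.cat([up(e4), e3], dim=1))
        d = self.dec[1](torch.cat([up(d), e2], dim=1))
        d = self.dec[2](torch.cat([up(d), e1], dim=1))
        d = self.dec[3](torch.cat([up(d), s0], dim=1))
        d = self.dec[4](up(d))
        return torch.sigmoid(self.head(d))  # outputs in [0, 1], per Sec. 4
```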

The discriminator is a Vision Transformer (ViT)-based patch-level classifier that evaluates (input, output) pairs for adversarial training.

The comprehensive workflow is as follows:

  1. $G_d$ and $G_s$ take the same RGB+mask input; each outputs its reflectance estimate, $\hat{n}_d$ (diffuse) and $\hat{n}_s$ (specular).
  2. $G_t$ outputs the transmission/depth map $\hat{t}$.
  3. $G_j$ receives the sum $\hat{r} = \hat{n}_d + \hat{n}_s$ and the transmission $\hat{t}$ (plus a Grey-World estimate of the veiling light) and outputs the dewatered RGB $\hat{y}$; see the wiring sketch after this list.
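To make this data flow concrete, here is an inference-time wiring sketch built on the `UNetGenerator` above. The 4-channel input layout, the 7-channel input to $G_j$, and the per-channel-mean form of the Grey-World veiling estimate are assumptions.

```python
import torch

def grey_world_veiling(x, mask):
    """Grey-World style veiling estimate: per-channel mean over water pixels.

    x: (B, 3, H, W) RGB in [0, 1]; mask: (B, 1, H, W) binary water mask.
    """
    s = (x * mask).sum(dim=(2, 3), keepdim=True)
    n = mask.sum(dim=(2, 3), keepdim=True).clamp_min(1.0)
    return (s / n).expand_as(x)

@torch.no_grad()
def dewater_forward(G_d, G_s, G_t, G_j, x, mask):
    inp = torch.cat([x, mask], dim=1)       # RGB + water mask (4 channels)
    n_d = G_d(inp)                          # diffuse reflectance estimate
    n_s = G_s(inp)                          # specular reflectance estimate
    t_hat = G_t(inp)                        # 1-channel transmission map
    r_hat = n_d + n_s                       # reconstructed radiance
    v_gw = grey_world_veiling(x, mask)      # veiling-light estimate
    y_hat = G_j(torch.cat([r_hat, t_hat, v_gw], dim=1))  # dewatered RGB
    return y_hat, t_hat
```

Under these assumptions, `G_d` and `G_s` would each be a `UNetGenerator(in_channels=4)`, `G_t` a `UNetGenerator(in_channels=4, out_channels=1)`, and `G_j` a `UNetGenerator(in_channels=7)`.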

3. Loss Functions and Joint Objective

DichroGAN’s training strategy combines adversarial and physically informed losses:

  • Adversarial loss (cGAN):

$$
\mathcal{L}_{cGAN} = \mathbb{E}_{x,y}[\log D(x,y)] + \mathbb{E}_{x}[\log(1-D(x,G_j(\hat{y}_r,\hat{t})))]
$$

  • Dichromatic model-based reflectance decomposition:
    • Diffuse: $\mathcal{L}_{gd} = \|L_{gw}(\lambda)\,g\,S(u,\lambda)-\hat{y}_d\|_1$
    • Specular: $\mathcal{L}_{gs} = \|k(u)\,L_{gw}(\lambda)-\hat{y}_s\|_1$
    • Full radiance reconstruction: $\mathcal{L}_r = \|I(u,\lambda)-(G_d(x)+G_s(x))\|_2$ (water-masked)
  • Transmission/depth regularization:
    • $L_1$ transmission: $\mathcal{L}_{t_1} = \|T(u,\lambda)-\hat{y}_t\|_1$
    • Scale-invariant smoothness: $\mathcal{L}_{t_2} = \frac{1}{n}\sum_i \left(|\nabla_x\log\hat{y}_{t,i}| + |\nabla_y\log\hat{y}_{t,i}|\right)$
  • Radiance-restoration loss $\mathcal{L}_{gj}$: $L_1$ penalty on the in-air RGB estimate, masked to water pixels
  • UIFM consistency ("pseudo-reprojection"):

$$
\mathcal{L}_N = \left\|N(u,\lambda)-\hat{N}(u,\lambda)\right\|_1, \qquad \hat{N}=\hat{y}_j\,\hat{t}+V_{gw}\,(1-\hat{t})
$$

These loss terms are combined as:

$$
\mathcal{L}_{obj} = \min_{G_d,G_s,G_t,G_j}\,\max_{D}\; \mathcal{L}_{cGAN} + \gamma\,(\mathcal{L}_{gd}+\mathcal{L}_{gs}) + \sigma\,\mathcal{L}_r + \iota\,\mathcal{L}_{gj} + \tau\,(\mathcal{L}_{t_1}+\mathcal{L}_{t_2}) + \nu\,\mathcal{L}_N
$$

with $\gamma=30$, $\sigma=90$, $\iota=100$, $\tau=50$, $\nu=10$.

Losses involving physical or radiometric quantities are only applied within water-masked regions to avoid bias from land/cloud pixels.
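Read as code, the generator-side update is a weighted sum of these terms. The sketch below assumes a logits-producing discriminator and precomputed per-term losses; the non-saturating BCE form stands in for the minimax expression above, and only the masking helper and smoothness term are spelled out.

```python
import torch
import torch.nn.functional as F

GAMMA, SIGMA, IOTA, TAU, NU = 30.0, 90.0, 100.0, 50.0, 10.0  # from the paper

def masked_l1(pred, target, mask):
    """L1 restricted to water pixels, as the text prescribes."""
    return (torch.abs(pred - target) * mask).sum() / mask.sum().clamp_min(1.0)

def log_gradient_smoothness(t_hat, eps=1e-6):
    """Scale-invariant smoothness term L_t2 on the transmission map."""
    lt = torch.log(t_hat.clamp_min(eps))
    gx = (lt[..., :, 1:] - lt[..., :, :-1]).abs().mean()
    gy = (lt[..., 1:, :] - lt[..., :-1, :]).abs().mean()
    return gx + gy

def generator_objective(d_fake_logits, l_gd, l_gs, l_r, l_gj, l_t1, l_t2, l_n):
    """Weighted generator-side objective; each l_* is a precomputed term."""
    adv = F.binary_cross_entropy_with_logits(
        d_fake_logits, torch.ones_like(d_fake_logits))  # non-saturating form
    return (adv + GAMMA * (l_gd + l_gs) + SIGMA * l_r + IOTA * l_gj
            + TAU * (l_t1 + l_t2) + NU * l_n)
```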

4. Dataset Construction and Preprocessing

DichroGAN is trained on data derived from PRISMA Level-2 VNIR hyperspectral cubes, which provide:

  • 63 bands spanning 400–1010 nm at 30 m GSD.
  • RGB synthesis: bands 33 (R), 45 (G), 56 (B).
  • Binary water-vs.-nonwater masks via automatic thresholding on NIR bands.

The training corpus includes:

  • 1,570 unique RGB scenes with all 63 spectral bands (≈98,000 image slices).
  • All images (input/output) normalized to $[0,1]$ and resized to $256\times256$ (see the preprocessing sketch below).
  • Histogram stretch applied to diffuse/specular outputs for improved dynamic range.
  • No explicit geometric or photometric augmentation, apart from random seed specification.
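A sketch of the band-selection, normalization, and water-masking steps described above, assuming the cube is a NumPy array with the 63 VNIR bands along the last axis. The NIR band index and threshold are illustrative assumptions (the paper specifies only automatic NIR thresholding), and whether band numbers are 0- or 1-indexed is not stated.

```python
import numpy as np

R_BAND, G_BAND, B_BAND = 33, 45, 56  # PRISMA VNIR band numbers from the text
NIR_BAND = 60                        # illustrative NIR band (assumption)
NIR_WATER_THRESHOLD = 0.05           # illustrative threshold (assumption)

def cube_to_rgb_and_mask(cube):
    """cube: (H, W, 63) PRISMA Level-2 VNIR reflectance cube.

    Returns an RGB composite normalized to [0, 1] and a binary water mask.
    """
    rgb = cube[..., [R_BAND, G_BAND, B_BAND]].astype(np.float32)
    lo, hi = rgb.min(), rgb.max()
    rgb = (rgb - lo) / max(hi - lo, 1e-8)  # normalize to [0, 1]
    # Water absorbs strongly in the NIR, so dark NIR pixels are water.
    water = (cube[..., NIR_BAND] < NIR_WATER_THRESHOLD).astype(np.uint8)
    return rgb, water
```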

5. Quantitative Results and Comparative Analysis

DichroGAN’s performance is assessed through detailed ablations and benchmarks, summarized as follows:

Model                   SSIM    PSNR (dB)
cGANs-baseline          0.593   17.75
cWGAN-2                 0.524   17.96
cGANs-G_t               0.582   17.78
cGAN-VGG                0.489   16.34
DichroGAN (proposed)    0.672   18.01

On full-reference NASA EO data, DichroGAN achieves the highest SSIM and PSNR scores among tested architectures.

A comparable pattern holds in benchmark comparisons with classical and modern underwater restoration methods (UDCP, CWR, NU²Net, Phaseformer):

  • On NASA EO: DichroGAN achieves SSIM 0.560, PSNR 14.39 dB (highest PSNR).
  • On combined PRISMA + NASA EO (no-reference): CCF 18.84, UIQM 2.342, NIQE 5.422 (2nd on CCF/NIQE across methods).
  • On HICRD & UIEB (underwater benchmarks): best NIQE, competitive CCF/UIQM.

Qualitative analysis shows DichroGAN restores terrain detail and color without over-enhancement or color cast, in contrast to existing methods that often introduce artifacts or fail to remove water silhouettes.

6. Implementation Details and Limitations

  • Framework: PyTorch, running on AMD EPYC 7402P CPU with 60 GB RAM.
  • Hyperparameters: Batch size 6, 130 epochs, learning rate $2 \times 10^{-4}$, Adam optimizer with $\beta_1=0.5$, $\beta_2=0.999$ (see the configuration sketch after this list).
  • Initialization: Generator encoders from ResNet-50 pretrained on ImageNet, with $G_j$ and $G_t$ warm-started from a scene-depth model [González-Sabbagh et al., 2025].
  • Input/output sizes: All networks operate on $3\times256\times256$ images, except $G_t$, whose output is $1\times256\times256$.
  • Masking: All key losses are masked to water pixels.
  • Limitations: Tendency for slight blurriness (loss of high-frequency detail), somewhat lower performance on metrics biased toward over-enhancement. Future work includes scaling dataset size, multi-term perceptual/texture losses, and evolving to higher-resolution architectures.
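The optimizer settings above translate directly into PyTorch. A minimal configuration sketch, assuming the generators are instantiated as in the Section 2 sketch (channel counts there are assumptions) and share one optimizer, with the discriminator optimized separately:

```python
import itertools
import torch

BATCH_SIZE, EPOCHS, LR = 6, 130, 2e-4  # from the text

# Generator instances; in_channels values follow the earlier sketch.
G_d = UNetGenerator(in_channels=4)                  # RGB + mask -> diffuse
G_s = UNetGenerator(in_channels=4)                  # RGB + mask -> specular
G_t = UNetGenerator(in_channels=4, out_channels=1)  # RGB + mask -> transmission
G_j = UNetGenerator(in_channels=7)                  # radiance + t + veiling -> RGB

gen_params = itertools.chain(*(g.parameters() for g in (G_d, G_s, G_t, G_j)))
opt_g = torch.optim.Adam(gen_params, lr=LR, betas=(0.5, 0.999))
# The ViT discriminator (not sketched here) gets the same optimizer settings.
```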

7. Significance, Outlook, and Generalization

DichroGAN enables explicit, physically grounded restoration of in-air radiance from satellite images of the seafloor, integrating the dichromatic reflection model and Duntley’s UIFM into a unified deep generative framework. This architecture achieves state-of-the-art restoration across reference and non-reference benchmarks. A plausible implication is that the explicit disentanglement of reflectance, transmission, and veiling light within a deep learning model is critical to robust underwater image reconstruction, and similar frameworks may be extensible to other remote sensing or atmospheric correction domains (Gonzalez-Sabbagh et al., 1 Jan 2026).
