
Sea-Undistort Synthetic Dataset

Updated 6 February 2026
  • Sea-Undistort synthetic datasets are based on physically-driven models that simulate underwater light absorption, scattering, and marine phenomena.
  • They are constructed using both physics-based simulation and data-driven domain translation to produce paired degraded and clean reference images.
  • These datasets enable robust benchmarking of restoration algorithms with metrics like PSNR, SSIM, UCIQE, and other underwater quality indices.

Sea-Undistort Synthetic Dataset refers to a class of datasets generated for supervised training and benchmarking of image restoration, enhancement, and undistortion algorithms in underwater imaging scenarios. These datasets are constructed by rigorously modeling underwater optical phenomena—such as wavelength-dependent absorption, scattering, backscatter, forward-scatter, and physically realistic degradations observed in aquatic settings—and pairing degraded views with ground-truth references. Sea-Undistort datasets address the fundamental limitation in underwater vision research: the infeasibility of acquiring authentic, high-fidelity reference (clean) images paired with real-world underwater scenes. As a result, these synthetic datasets underpin advancements in underwater image processing via authentic supervision signals and enable robust learning and evaluation across diverse degradation types, depth regimes, and aquatic environments (Tian et al., 18 Nov 2025, Kromer et al., 11 Aug 2025, Ismiroglou et al., 18 Sep 2025).

1. Physical and Computational Foundations

Sea-Undistort Synthetic Datasets employ physically-motivated underwater image formation models as the basis for image degradation synthesis. The canonical model is the Jaffe-McGlamery equation, capturing both direct transmission and additive components:

I_c(x) = J_c(x) \cdot t_c(x) + B_c^{\infty} \cdot (1 - t_c(x)) + F_c(x)

where I_c(x) is the observed underwater radiance at pixel x (channel c), J_c(x) is the clear scene radiance, t_c(x) = \exp(-\beta_c d(x)) is the transmission with beam attenuation coefficient \beta_c and scene depth d(x), B_c^{\infty} is the global backscatter term, and F_c(x) optionally models forward scattering via depth-adaptive blurring (Ismiroglou et al., 18 Sep 2025, Jain et al., 2022). Parameterizations are sampled according to Jerlov water types, depth distributions, and turbidity, effectively spanning the full range of natural aquatic optical conditions.
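As an illustrative sketch (not code from any of the cited releases), the direct-plus-backscatter portion of this model can be applied per pixel in a few lines; the forward-scatter term F_c(x) is omitted for brevity:

```python
import math

def degrade_pixel(J, beta, depth, B_inf):
    """Jaffe-McGlamery degradation for one pixel, channel-wise.

    J      -- clear scene radiance per channel, values in [0, 1]
    beta   -- beam attenuation coefficient per channel (1/m)
    depth  -- scene depth d(x) in metres
    B_inf  -- global backscatter (veiling light) per channel
    """
    out = []
    for J_c, beta_c, B_c in zip(J, beta, B_inf):
        t_c = math.exp(-beta_c * depth)            # transmission t_c(x)
        out.append(J_c * t_c + B_c * (1.0 - t_c))  # direct + backscatter
    return out
```

At zero depth the pixel is unchanged; as depth grows every channel converges to B_inf, reproducing the characteristic veiling-light color cast.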

Distortions such as marine snow, sun-glint, volumetric inhomogeneity, and dynamic waves are incorporated in specific datasets by overlay models (e.g., convolutional particle scatter in PHISWID (Kaneko et al., 2024)) or via path-traced rendering and procedural geometry (e.g., surface waves and sunglint in Blender, as in (Kromer et al., 11 Aug 2025)).

2. Dataset Construction and Variants

Dataset construction pipelines begin from high-color-fidelity in-air images or RGB-D pairs (sources: RAISE, ImageNet, iNaturalist-12K, NYU-DepthV2, TartanAir) (Tian et al., 18 Nov 2025, Kaneko et al., 2024). Underwater degradation is synthesized through two principal strategies:

  • Physics-based simulation: Direct application of analytic/empirical models with randomly sampled water parameters (attenuation coefficients, backscatter, ambient veiling light) and depth. Datasets such as PHISWID and UWStereo employ this approach for both color attenuation and additive artifacts (Kaneko et al., 2024, Lv et al., 2024).
  • Data-driven domain translation: Unpaired image-to-image translation networks (e.g., CycleGAN-Turbo with a diffusion-based generator and CLIP-based discriminator (Tian et al., 18 Nov 2025), or MUNIT architectures (Jain et al., 2022)) learn to convert in-air or synthetic underwater images into the target underwater domain by leveraging adversarial, cycle-consistency, and identity losses. This method enables the synthesis of visually indistinguishable underwater styles not easily described by analytic models.
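For the physics-based route, parameter sampling can be sketched as below. The attenuation coefficients here are illustrative placeholders loosely ordered from clear oceanic to turbid coastal water, not the published Jerlov measurements, and the sampling ranges are assumptions:

```python
import random

# Illustrative RGB beam-attenuation coefficients (1/m); placeholder values
# only, not the measured Jerlov water-type spectra.
JERLOV_BETA = {
    "I":  (0.30, 0.05, 0.03),
    "IA": (0.32, 0.07, 0.05),
    "II": (0.35, 0.12, 0.10),
    "3C": (0.40, 0.20, 0.22),
    "9C": (0.55, 0.40, 0.45),
}

def sample_degradation_params(rng=random):
    """Draw one random parameterization for physics-based synthesis."""
    water = rng.choice(sorted(JERLOV_BETA))
    beta = JERLOV_BETA[water]
    depth = rng.uniform(0.5, 20.0)                       # scene depth (m)
    B_inf = tuple(rng.uniform(0.05, 0.5) for _ in beta)  # veiling light
    return {"water_type": water, "beta": beta, "depth": depth, "B_inf": B_inf}
```

Randomizing over water type, depth, and veiling light in this way is what lets a single clean source image yield many distinct degraded views.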

Degraded images are always paired with clean references, and in some datasets, depth maps and scene metadata supplement the image pairs. Releases typically provide balanced coverage across degradation types, as in UWNature/UWImgNetSD (10,000 pairs, six styles) (Tian et al., 18 Nov 2025), PHISWID (2,264 pairs, five Jerlov water types, with/without marine snow) (Kaneko et al., 2024), or Sea-Undistort for airborne mapping (1,002 pairs, sun-glint/waves/scattering) (Kromer et al., 11 Aug 2025).

3. Optical Degradation Types and Physical Models

Representative degradation types span:

  • Color casts: Shallow-water blue, deep blue, deep green, mild green (tuned channel-specific attenuation).
  • Low-light/turbid: Increased extinction, pronounced backscatter.
  • Scattering-dominated blur (blurry): Depth-dependent forward scatter (modeled as convolution with a Gaussian kernel with scale proportional to local depth) (Ismiroglou et al., 18 Sep 2025).
  • Marine snow: Procedurally added particulate scatter, with H (Gaussian-like) and V (edge-enhanced) particle types (Kaneko et al., 2024).
  • Sun glint and dynamic waves: Realized via physics-based path tracing on ocean-surface geometry, Fresnel reflection, and anisotropic phase functions (Kromer et al., 11 Aug 2025).
  • Nonuniform inhomogeneity: Stochastic Gaussian Random Fields multiplied with depth and medium parameters, introducing spatial nonuniformity in turbidity (Ismiroglou et al., 18 Sep 2025).
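The depth-proportional Gaussian kernel behind scattering-dominated blur can be sketched as follows; the proportionality constant k and the truncation radius are illustrative choices, not values from the cited datasets:

```python
import math

def forward_scatter_kernel(depth, k=0.2, truncate=3.0):
    """1-D normalized Gaussian kernel with scale proportional to local depth.

    sigma = k * depth models forward scatter growing with the water column.
    A 2-D blur applies this kernel separably along rows and columns.
    """
    sigma = max(k * depth, 1e-6)
    radius = max(1, int(truncate * sigma))
    weights = [math.exp(-0.5 * (i / sigma) ** 2)
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]
```

Deeper pixels receive wider kernels, so blur strength varies spatially with the depth map rather than being uniform across the frame.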

Mathematically, all models ultimately reduce to channel-wise exponential transmission of clear radiance, plus one or more additive terms (backscatter, forward scatter, marine snow, sun glint).

4. Dataset Organization and Implementation

Sea-Undistort datasets are organized according to task requirements. A common structure is:

| Dataset | Pairs | Resolution | Degradations/Types | Extra Modalities |
|---|---|---|---|---|
| UWNature/UWImgNetSD (Tian et al., 18 Nov 2025) | 10,000 | 256×256 | Blue, Low-Light, Deep Blue, Deep Green, Green, Blurry | None |
| Sea-Undistort (Bathymetry) (Kromer et al., 11 Aug 2025) | 1,002 | 512×512 | Sun-glint, waves, scattering | Metadata (camera, sun, depth) |
| PHISWID (Kaneko et al., 2024) | 2,264 | 384×384 | Color attenuation, marine snow | Depth, water type |
| SUDS (Forward scatter) (Ismiroglou et al., 18 Sep 2025) | ~128 | 1080p | Forward/backscatter, inhomogeneity | Depth, water params |

Data are split into train, validation, and test sets. Metadata (camera parameters, sun position, water type, depth) are provided where applicable (Kromer et al., 11 Aug 2025, Kaneko et al., 2024).

Acquisition and use involve typical pipelines:

  • Download datasets from public repositories (Tian et al., 18 Nov 2025, Kromer et al., 11 Aug 2025, Kaneko et al., 2024).
  • Load and preprocess image pairs (with optional depth/metadata augmentation).
  • DataLoader code and scripts are typically provided, supporting standard transformations and batch operations.
  • On-the-fly augmentations: flipping, color jitter, geometric noise.
  • For early-fusion or mask-aware restoration, per-pixel auxiliary masks (e.g., sun-glint) can be included (Kromer et al., 11 Aug 2025).
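A loading pipeline ultimately reduces to matching each degraded image with its clean reference. The helper below assumes a hypothetical naming convention, '<stem>__<style>.png' for degraded files and '<stem>.png' for references; real releases may organize files differently:

```python
def pair_by_stem(degraded_files, clean_files):
    """Pair degraded images with clean references via a shared filename stem.

    Raises KeyError when a degraded file has no matching reference, so
    broken pairs surface at load time rather than during training.
    """
    refs = {name.rsplit(".", 1)[0]: name for name in clean_files}
    pairs = []
    for name in sorted(degraded_files):
        stem = name.rsplit(".", 1)[0].split("__", 1)[0]
        if stem not in refs:
            raise KeyError(f"no clean reference for {name!r}")
        pairs.append((name, refs[stem]))
    return pairs
```

Because several degradation styles share one reference, this mapping is many-to-one: every styled variant of a scene resolves to the same clean image.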

5. Benchmarking and Comparative Evaluation

Sea-Undistort datasets are used to benchmark a diverse set of restoration, enhancement, and mapping architectures. Representative approaches include:

  • Color restoration and enhancement: Supervised training with pixel-wise, perceptual, and adversarial losses to regress from degraded to reference images (Tian et al., 18 Nov 2025, Jain et al., 2022, Kaneko et al., 2024). Typical objectives include \mathcal{L}_{total} = \alpha \| \hat{J} - J \|_1 + \beta \mathcal{L}_{VGG}(\hat{J}, J) + \gamma \mathcal{L}_{GAN}(\hat{J}).
  • Paired and unpaired translation: MUNIT and CycleGAN-based frameworks minimize adversarial, cycle-consistency, and identity losses to achieve domain adaptation (Tian et al., 18 Nov 2025, Jain et al., 2022).
  • Physical model supervision: Training on synthetic data with ground-truth transmission/backscatter for loss components beyond pixel intensities (e.g., \| \hat{c} - c \|_1, \| \hat{B} - B \|_1) (Lv et al., 2024).
  • Bathymetric mapping and DSM retrieval: Diffusion-based restoration frameworks (e.g., ResShift+EF) evaluated on downstream digital surface model and real aerial data (Kromer et al., 11 Aug 2025).
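A minimal sketch of the weighted composite objective above, with only the L1 term computed in place; the perceptual and adversarial terms are passed in as scalars computed elsewhere, and the weights are illustrative defaults rather than values from the cited papers:

```python
def l1_loss(pred, target):
    """Mean absolute error over flattened pixel values."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def total_loss(pred, target, vgg_term, gan_term,
               alpha=1.0, beta=0.1, gamma=0.01):
    """Weighted composite objective: alpha*L1 + beta*VGG + gamma*GAN."""
    return alpha * l1_loss(pred, target) + beta * vgg_term + gamma * gan_term
```

In practice the weights balance pixel fidelity against perceptual quality and adversarial realism, and are tuned per dataset.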

Performance assessment employs both full-reference metrics (PSNR, SSIM) and no-reference metrics (UCIQE, UIQM). In comparative studies, models trained on Sea-Undistort/UWNature data achieved higher UCIQE (e.g., UWCNN on U45: 0.5612 when trained on UWImgNetSD vs. 0.5514 on UIEB) and more favorable artifact and color reconstruction than competing methods (Tian et al., 18 Nov 2025, Kaneko et al., 2024).
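Of these metrics, full-reference PSNR is simple enough to state directly; SSIM, UCIQE, and UIQM are more involved and typically taken from library implementations. A sketch over flattened pixel values:

```python
import math

def psnr(reference, test, max_val=1.0):
    """Peak signal-to-noise ratio between two flattened images, in dB."""
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0.0:
        return float("inf")          # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)
```

Paired synthetic data is what makes this metric usable: PSNR requires the clean reference that real underwater captures lack.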

6. Usage Guidelines and Practical Considerations

Practical integration into research pipelines follows established procedures:

  • Direct download of image/reference pairs and optional metadata.
  • Training with recommended supervised losses (L1, L2, perceptual, adversarial).
  • Augmentation with on-the-fly geometric, photometric, or mask-based transforms.
  • Evaluation on standard underwater test sets (U45, RUIE, LSUI/EUVP for color restoration; BUCKET for turbid environments; Agia Napa bathymetry for mapping).
  • For best generalization, employ a mixture of degradation types/water conditions, randomized parameterizations, and multi-domain adaptation steps.

Reproducibility is facilitated via public repositories, code for dataset synthesis and augmentation, and detailed configuration files specifying all model, water, and rendering parameters (Tian et al., 18 Nov 2025, Kaneko et al., 2024, Kromer et al., 11 Aug 2025).

7. Impact and Limitations

Sea-Undistort Synthetic Datasets have significantly advanced the ability to evaluate, train, and generalize underwater image restoration and enhancement models in the absence of real paired reference data (Tian et al., 18 Nov 2025). Their comprehensive coverage of degradation types, parameterizations spanning natural waters, and extensibility to complex scene-level phenomena (marine snow, dynamic surface, glint) provide a principled foundation for objective benchmarking. Quantitative studies demonstrate that models trained on these datasets attain comparable or superior performance versus those trained on classical or heuristic benchmarks.

Limitations persist in the stylization of select artifacts (e.g., oversimplified marine snow (Kaneko et al., 2024)), finite coverage of extreme or mixed turbidity, and the inherent domain gap introduced by synthetic generation, even when mitigated through domain adaptation. Physical models rely on accurate depth information, which in practice may require further refinement. Continued efforts focus on increased realism, expanded scene variation, and greater public release scale to match emerging demands in underwater robotics, mapping, and environmental monitoring.
