
GeoDiff-LF: Underwater 4-D LF Enhancement

Updated 5 February 2026
  • The paper introduces GeoDiff-LF, a diffusion-based framework that incorporates a global geometry-aware regularizer to enhance underwater 4-D light field imaging.
  • It leverages convolutional and EPIT-adapters to enforce both local spatial–angular patterns and global geometric consistency, achieving state-of-the-art PSNR and visual performance.
  • Efficient inference is realized through a noise-map predictor and few-step reverse diffusion, making high-resolution underwater LF enhancement feasible in real-world applications.

GeoDiff-LF is a diffusion-based framework designed for enhancing underwater 4-D light field (LF) imaging by integrating global geometry-aware regularization into a latent diffusion process. By leveraging 4-D spatial–angular LF structure and a frozen pre-trained SD-Turbo diffusion backbone, the system introduces learnable adapters and a global tensor regularizer, enabling robust color correction and geometric consistency in challenging underwater scenes. The network achieves state-of-the-art quantitative and visual results with efficient inference, advancing the application of deep generative models for high-fidelity underwater LF enhancement (Lin et al., 29 Jan 2026).

1. Architecture and Design Principles

GeoDiff-LF builds upon the latent U-Net architecture of SD-Turbo, retaining its pre-trained weights (denoted $\theta_0$) and augmenting both its convolutional and attention layers with lightweight, learnable adapters (parameters $\theta_1$). The backbone processes 4-D LF patches $\mathbf{X}_\tau \in \mathbb{R}^{1 \times 7 \times 7 \times h \times w \times c}$, representing batch size, angular dimensions, spatial resolution, and channel depth. All $u \cdot v$ angular views are reshaped into a pseudo-batch and encoded jointly.

Core adapter modules enable explicit 4-D modeling:

  • Convolutional Adapters: After each frozen ResNet block, a Conv-Adapter applies low-dimensional, multi-pattern 3-D convolutions over spatial ($h_d \times w_d$), angular ($u \times v$), and horizontal and vertical EPI slices. The resulting bottleneck captures fine-grained spatial–angular redundancies efficiently.
  • EPIT-Attention Adapters: Conventional self-attention is limited to intra-view correlations. The EPIT (Epipolar Plane Image Transformer) Adapter reshapes features to encode long-range angular dependencies. Horizontal and vertical EPI “bands” are tokenized, processed with small transformer blocks, and fused to the residual stream, propagating cross-view epipolar consistency throughout all U-Net layers.
  • Geometric Cue Injection: Conv- and EPIT-Adapters are interspersed at every encoder/decoder resolution, ensuring that both local spatial–angular patterns and global LF geometry inform all levels of representation. The frozen low-level prior from SD-Turbo is thus steered towards solutions that conform to 4-D LF constraints by the learned adapters.
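The reshaping conventions behind these adapters can be sketched as follows; this is a minimal NumPy illustration with hypothetical patch sizes, not the actual adapter implementation:

```python
import numpy as np

# Hypothetical 4-D LF patch: (batch, u, v, h, w, c) with 7x7 angular views.
u, v, h, w, c = 7, 7, 32, 32, 3
X = np.random.rand(1, u, v, h, w, c)

# Pseudo-batch for the frozen 2-D backbone: fold all u*v views into batch.
pseudo_batch = X.reshape(u * v, h, w, c)            # (49, h, w, c)

# Horizontal EPI "bands" for the EPIT-Adapter: fix (u, h), vary (v, w),
# so each band mixes one angular axis with one spatial axis.
epi_h = X.transpose(0, 1, 3, 2, 4, 5).reshape(u * h, v, w, c)  # (u*h, v, w, c)

# Vertical EPI bands: fix (v, w), vary (u, h).
epi_v = X.transpose(0, 2, 4, 1, 3, 5).reshape(v * w, u, h, c)  # (v*w, u, h, c)
```

Tokenizing along the band dimensions then lets small transformer blocks attend across views, which is how epipolar consistency can be propagated through an otherwise 2-D backbone.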

2. Geometry-Guided Loss and Tensor Regularization

To regularize geometric fidelity across the LF, GeoDiff-LF introduces a global tensor-based regularizer that operates on intermediate reconstructions $\mathbf{X}_{\tau-1}$. The process is as follows:

  • Block-Matching and Tucker Decomposition: The reconstructed and ground-truth LF patches are divided into non-overlapping 3-D blocks, which are grouped via block matching based on similarity. Each group is aggregated into a tensor and factorized by Tucker decomposition, yielding a core tensor $\mathbf{G}$ and associated factor matrices for each group.
  • Core-Tensor $L_1$ Penalty: The regularization term penalizes the $L_1$ norm of the difference between corresponding core tensors of predicted and reference blocks, enforcing structural coherence at a global level:

$$\ell_s(V_{\tau-1}^{n_j}, V_0^{n_j}) = \|\mathbf{G}_{\tau-1}^j - \mathbf{G}_0^j\|_1.$$

  • Progressive Weighting: The regularization is weighted by a schedule $\rho_t = \bar{\alpha}_t^2$, where $\bar{\alpha}_t$ is the cumulative signal coefficient of the diffusion noise schedule, so the geometric penalty is weaker at noisier timesteps (smaller $\bar{\alpha}_t$):

$$\mathcal{L}_s^t = \sum_{j=1}^k \rho_t \|\mathbf{G}_{t-1}^j - \mathbf{G}_0^j\|_1.$$

  • Combined Loss: The total training objective is the sum of the standard $L_1$ reconstruction loss and the tensor geometry term:

$$\text{Loss} = \|\mathbf{X}_0 - \hat{\mathbf{X}}\|_1 + \lambda \mathcal{L}_s^t,$$

where $\lambda = 1$ in practice.
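A minimal NumPy sketch of the tensor regularizer, assuming truncated-HOSVD Tucker factorization with illustrative ranks, and sharing the reference blocks' factor bases so the two core tensors are directly comparable (the paper's exact factorization details are not specified here):

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: move axis `mode` to front, flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_product(T, M, mode):
    """Multiply tensor T by matrix M along the given mode."""
    return np.moveaxis(np.tensordot(M, np.moveaxis(T, mode, 0), axes=1), 0, mode)

def tucker_factors(T, ranks):
    """Truncated HOSVD: leading left singular vectors of each unfolding."""
    return [np.linalg.svd(unfold(T, m), full_matrices=False)[0][:, :r]
            for m, r in enumerate(ranks)]

def core(T, Us):
    """Core tensor G = T x_1 U1^T x_2 U2^T x_3 U3^T."""
    G = T
    for m, U in enumerate(Us):
        G = mode_product(G, U.T, m)
    return G

def geometry_loss(pred_groups, ref_groups, alpha_bar_t, ranks=(4, 4, 4)):
    """rho_t * sum_j ||G_pred^j - G_ref^j||_1 with rho_t = alpha_bar_t^2.
    Shared reference bases are an assumption for this sketch."""
    rho = alpha_bar_t ** 2
    total = 0.0
    for P, R in zip(pred_groups, ref_groups):
        Us = tucker_factors(R, ranks)
        total += np.abs(core(P, Us) - core(R, Us)).sum()
    return rho * total
```

Because the core tensor summarizes each matched block group in a low-rank basis, penalizing core differences constrains global structure rather than individual pixels.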

Enforcing this global geometry constraint is empirically more effective than pixel-level or SSIM-based regularization for LF enhancement, as confirmed by ablation results.

3. Noise-Map Prediction and Fast Sampling

GeoDiff-LF accelerates the conventional, slow diffusion process using two mechanisms:

  • Noise-Map Predictor: Instead of always starting the reverse process from pure Gaussian noise at $\mathbf{X}_T$, a dedicated noise predictor $f_w$ is trained to estimate the noise distribution in the input underwater observation $\mathbf{Y}_0$. The initial latent at timestep $\tau$ is computed as:

$$\mathbf{X}_\tau = \sqrt{\bar{\alpha}_\tau}\,\mathbf{Y}_0 + \sqrt{1 - \bar{\alpha}_\tau}\,f_w(\mathbf{Y}_0, \tau).$$

  • Few-Step Reverse Chain: The denoising schedule is limited to a small set of reverse timesteps $\mathcal{S} = \{500, 400, 300, 200, 100\}$, meaning only five denoising steps are performed at inference. The model learns to denoise over large intervals, yielding efficient enhancement with minimal performance degradation:

$$\mathbf{X}_{\tau_{i-1}} = g_\theta(\mathbf{X}_{\tau_i}, \mathbf{Y}_0, \tau_i) + \sigma_{\tau_i}\mathbf{Z}_{\tau_i},$$

where $g_\theta$ is defined via the learned noise estimate and $\mathbf{Z}_{\tau_i}$ is standard Gaussian noise.

This design maintains fidelity while reducing inference time to ~2.8 seconds per LF on modern GPUs.
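The two mechanisms can be sketched together as below; `f_w` and `g_theta` are trivial stand-ins for the learned noise predictor and denoiser, and the cosine schedule for $\bar{\alpha}$ is an assumption (the paper's exact schedule is not reproduced here):

```python
import numpy as np

T = 1000
# Assumed cosine schedule for the cumulative signal coefficient alpha_bar.
alpha_bar = np.cos(0.5 * np.pi * np.arange(T + 1) / T) ** 2

def f_w(Y0, t):
    """Stand-in for the learned noise-map predictor."""
    return np.zeros_like(Y0)

def g_theta(X, Y0, t):
    """Stand-in for the learned large-interval denoiser."""
    return X

def enhance(Y0, steps=(500, 400, 300, 200, 100), sigma=0.0):
    tau = steps[0]
    # Initialise from the degraded observation and its predicted noise map,
    # instead of sampling pure Gaussian noise at X_T.
    X = np.sqrt(alpha_bar[tau]) * Y0 + np.sqrt(1 - alpha_bar[tau]) * f_w(Y0, tau)
    # Few-step reverse chain over the fixed timestep set S.
    for t in steps:
        X = g_theta(X, Y0, t) + sigma * np.random.randn(*X.shape)
    return X
```

The key point is that both the starting point (a partially noised version of the observation) and the short reverse chain avoid traversing the full 1000-step trajectory.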

4. Integration of LF Geometry in Diffusion Priors

By freezing the diffusion backbone (SD-Turbo) and inserting adapters solely at convolutional and attention junctures, GeoDiff-LF preserves the advantages of large-scale 2-D image priors. The adapters act as geometric “overlays,” enforcing spatial–angular and epipolar constraints at every U-Net level. The result is a generative prior specifically attuned to reconstructing underwater light fields that are both visually realistic and geometrically plausible.

This architecture addresses the unique challenges of underwater LF data: non-uniform color casts, scattering, and local texture degradation are mitigated by enforcing higher-order geometric structure not achievable with conventional 2-D or vanilla 4-D CNN approaches.

5. Training Protocol and Implementation

GeoDiff-LF is trained on the LFUB dataset, comprising 60 training and 15 test scenes at $7 \times 7$ angular and $640 \times 360$ spatial resolution. The training regime includes:

  • Data Augmentation: Random $128 \times 128$ spatial crops, normalization to $[-1, 1]$, random flips, and 90-degree rotations.
  • Optimizer and Learning Rate: Adam optimizer with $(\beta_1 = 0.9, \beta_2 = 0.999)$, learning rate $2 \times 10^{-4}$, over 2 million iterations.
  • Batching: Batch size 1 for high-dimensional LF patches.
  • Update Protocol: Only parameters of the adapters ($\theta_1$) and noise predictor ($w$) are updated; SD-Turbo weights ($\theta_0$) remain fixed.

During training, for each $(\mathbf{X}_0, \mathbf{Y}_0)$ pair, a random diffusion timestep $\tau \in \{200, 300, 400, 500\}$ is chosen, and the loss is backpropagated to update only the lightweight modules.
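This per-iteration timestep draw, together with the resulting geometry weight $\rho_\tau = \bar{\alpha}_\tau^2$ from Section 2, can be sketched as follows (the linear beta schedule is an assumption for illustration):

```python
import numpy as np

# Assumed linear beta schedule; alpha_bar[t] is the cumulative product
# of (1 - beta) up to step t.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

rng = np.random.default_rng(0)
tau = int(rng.choice([200, 300, 400, 500]))  # random training timestep
rho = alpha_bar[tau] ** 2                    # geometry-loss weight at tau
```

With any monotone schedule, $\rho$ stays in $(0, 1)$ and shrinks as $\tau$ grows, i.e. the geometric penalty is softened at heavily noised timesteps.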

6. Quantitative Performance and Ablation

GeoDiff-LF demonstrates strong results on both synthetic and real underwater light field datasets:

Table: LFUB (full-reference) and LFUID (no-reference) results; ↑ higher is better, ↓ lower is better.

| Metric | GeoDiff-LF (LFUB) | LFUB [Lin et al. 2025] | GeoDiff-LF (LFUID) |
|---|---|---|---|
| PSNR (dB) ↑ | 23.67 | 22.51 | – |
| SSIM ↑ | 0.8711 | 0.8680 | – |
| LPIPS ↓ | 0.2361 | 0.2535 | – |
| ΔE ↓ | 11.71 | 13.51 | – |
| UIQM ↑ | – | – | 6.038 |
| BRISQUE ↓ | – | – | 20.871 |
| NIMA ↑ | – | – | 4.692 |

Ablation studies show:

  • Both Conv- and EPIT-Adapters are required for optimal performance; removing either degrades PSNR by ~1.0 dB.
  • The tensor-based geometry loss outperforms simple pixel-level or SSIM-based objectives by a similar margin.
  • Applying the geometry regularizer on $\mathbf{X}_{\tau-1}$ gives superior results compared to applying it on $\mathbf{X}_\tau$.
  • Depth maps inferred from enhanced LFs yield improved occlusion boundaries and planar smoothness.

Inference speed is practical for real-world use, with ongoing efforts to further reduce the number of required sampling steps (Lin et al., 29 Jan 2026).

7. Significance, Generalization, and Future Directions

GeoDiff-LF advances the state-of-the-art for underwater LF enhancement by demonstrating that explicit, modular integration of LF geometry into a generative diffusion prior substantially improves both visual and geometric quality. The efficient few-step sampling and parameter-efficient adapter design make it suitable for high-resolution, high-dimensional LF data, as required in underwater robotics, marine biology, and computational photography.

Depth recovery, color correction, and texture preservation are achieved simultaneously, and evaluation on both synthetic and real datasets confirms strong generalization. The codebase will be made available for reproducibility and extension.

A plausible implication is that the adapter/regularizer paradigm could generalize to other 4-D LF applications, including LF synthesis, denoising, or editing, potentially in domains outside underwater imaging. Further acceleration of diffusion inference (toward 1–2 steps) is proposed without loss of quality, and additional geometric cues or learned priors could further improve performance (Lin et al., 29 Jan 2026).
