UD-SfPNet: End-to-End Underwater 3D Reconstruction

Updated 5 March 2026
  • The paper introduces UD-SfPNet, a unified framework that integrates descattering and shape-from-polarization to achieve state-of-the-art underwater 3D normal reconstruction.
  • It leverages a Polarization Parameter Network, Descattering Network, and Normal Estimation Network with Detail-Enhanced Convolutions to reduce angular error on the MuS-Polar3D dataset.
  • Experimental evaluations show enhanced PSNR, SSIM, and LPIPS metrics, demonstrating the practical capability to capture fine geometric details in turbid underwater environments.

UD-SfPNet is an end-to-end neural framework for underwater 3D surface normal reconstruction that jointly models descattering and shape-from-polarization (SfP) inference, leveraging polarization cues acquired via division-of-focal-plane (DoFP) polarimetric imaging. The architecture integrates polarization physics with advanced deep-learning modules to achieve state-of-the-art accuracy in challenging turbid underwater environments. UD-SfPNet was introduced to avoid error accumulation inherent in sequential (cascaded) descattering-then-reconstruction pipelines by enabling global optimization across both tasks, outperforming previous methods in terms of mean surface normal angular error on the MuS-Polar3D dataset (Wang et al., 1 Mar 2026).

1. Motivation and Background

Underwater optical imaging is fundamentally limited by particulate scattering (notably Mie scattering), leading to blur, contrast loss, and noise that traditional intensity- or spectrum-based dehazing methods cannot adequately resolve, particularly when backscatter and target radiance are comparable. Polarization imaging distinguishes itself by exploiting the differential polarization states of backscattered and target-reflected light—captured via a DoFP sensor at 0°, 45°, 90°, and 135°—which enables both descattering and direct extraction of geometric surface orientation cues via the degree and angle of polarization.

Typical cascaded approaches, in which descattering precedes SfP normal estimation, propagate irrecoverable errors from the first stage. By contrast, UD-SfPNet unifies both tasks in a single global optimization framework. Loss functions from both low-level (descattering) and high-level (normal estimation) objectives co-regulate the pipeline, ensuring the preservation of fine geometric information and substantially mitigating error accumulation.

2. Network Architecture

UD-SfPNet comprises three interacting modules: the Polarization Parameter Network (PPN), Descattering Network (DN), and Normal Estimation Network (NEN), with auxiliary components designed for geometric and color consistency.

  • Polarization Parameter Network (PPN):
    • Inputs: Degree of polarization ($\rho$), angle of polarization ($\phi$), specular ($I^S$) and diffuse ($I^D$) image components, extracted from processed Stokes parameters.
    • Outputs: A high-dimensional ‘normal feature’ (NF) and a 64-bin global normal-orientation histogram. The PPN regularizes local predictions with a global normal prior encoded in its output.
  • Descattering Network (DN):
    • Architecture: A U-Net variant (4-level encoding/decoding with skip connections) in which all convolutions are replaced by Detail-Enhanced Convolutions (DEConv). The DN processes raw scattered polarization images $I_{sc}$, outputting descattered images $I_{desc}$.
    • Losses: $L_1$ pixel loss, SSIM structural loss, TV regularization, and perceptual (LPIPS) loss, all masked to the target region.
  • Normal Estimation Network (NEN):
    • Inputs: NF from the PPN and $I_{desc}$ from the DN.
    • Architecture: Shared encoder, multi-head attention bottleneck, two decoder branches—one focused on polarization cues, the other embedding a Pyramid Color Embedding (PCE) module for channel–orientation consistency. All convolutions utilize DEConv for enhanced high-frequency detail preservation.
    • Output: Predicted normal map $N_{pre}$, supervised by a cosine similarity-based angular error loss.
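
The cosine similarity-based supervision of the normal map can be sketched as follows. The exact form of the loss is not given here, so this assumes the common choice of one minus the cosine similarity, averaged over the masked target pixels; the function names `cosine_normal_loss` and `angular_error_deg` are hypothetical, and NumPy stands in for the PyTorch tensors used in training.

```python
import numpy as np

def _unit(v, eps=1e-8):
    """Normalize vectors along the last axis (eps guards against zero length)."""
    return v / np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), eps)

def _cosine(n_pred, n_gt, mask=None):
    """Per-pixel cosine similarity between two normal maps of shape (..., 3)."""
    cos = np.clip(np.sum(_unit(n_pred) * _unit(n_gt), axis=-1), -1.0, 1.0)
    return cos if mask is None else cos[mask]

def cosine_normal_loss(n_pred, n_gt, mask=None):
    """Assumed loss form: mean of (1 - cosine similarity) over valid pixels."""
    return float(np.mean(1.0 - _cosine(n_pred, n_gt, mask)))

def angular_error_deg(n_pred, n_gt, mask=None):
    """Per-pixel angular error in degrees, as used for evaluation."""
    return np.degrees(np.arccos(_cosine(n_pred, n_gt, mask)))
```

A loss of 0 corresponds to perfectly aligned normals and a loss of 1 to a 90° error, so minimizing it also minimizes the angular error used for evaluation.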

Information Flow and Optimization

The NF acts as a global polarization prior, guiding NEN’s local normal predictions. Descattered images and polarization-derived features are jointly processed. All loss terms are summed into a unified objective; back-propagation updates all sub-networks simultaneously, enforcing cross-stage consistency.

3. Mathematical Modeling

3.1 Underwater Scattering and Descattering

For the Stokes channel $S_0$, underwater image formation is modeled as:

$$S_0(x, y) = T(x, y) + B(x, y)$$

with $T$ as the unattenuated target signal and $B$ as additive backscatter. The DN learns an implicit inversion of this relationship under supervision.

3.2 Polarization and Surface Geometry

Stokes parameters are computed as:

$$S_0 = I_{0^\circ} + I_{90^\circ}, \quad S_1 = I_{0^\circ} - I_{90^\circ}, \quad S_2 = I_{45^\circ} - I_{135^\circ}$$

Degree and angle of polarization:

$$\rho = \frac{\sqrt{S_1^2 + S_2^2}}{S_0}, \quad \phi = \frac{1}{2} \arctan \left(\frac{S_2}{S_1}\right)$$

For specular reflection with refractive index $\eta$:

$$\rho = \frac{2\sin^2\theta\,\cos\theta\,\sqrt{\eta^2-\sin^2\theta}}{\eta^2-\sin^2\theta - \eta^2\sin^2\theta + 2\sin^4\theta}$$

Intensity as a function of polarizer rotation $\varphi$:

$$I(\varphi) = \frac{I_{max} + I_{min}}{2} + \frac{I_{max} - I_{min}}{2}\cos \bigl(2\varphi - 2\phi \bigr)$$

Given $(\rho, \phi)$, two ambiguous solutions exist for zenith ($\theta$) and azimuth ($\alpha$) angles, yielding the local normal:

$$\mathbf{n} = [\sin \theta \cos \alpha, \sin \theta \sin \alpha, \cos \theta]^\mathsf{T}$$
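
The polarization relations in this subsection can be checked numerically with a minimal NumPy sketch. All function names are hypothetical, the default refractive index of 1.5 is only an illustrative value, and `np.arctan2` is used in place of the plain arctangent as a quadrant-aware variant; none of this is taken from the authors' implementation.

```python
import numpy as np

def stokes_from_dofp(i0, i45, i90, i135):
    """Linear Stokes parameters from the four DoFP polarizer channels."""
    s0 = i0 + i90
    s1 = i0 - i90
    s2 = i45 - i135
    return s0, s1, s2

def dop_aop(s0, s1, s2, eps=1e-8):
    """Degree (rho) and angle (phi, radians) of linear polarization."""
    rho = np.sqrt(s1 ** 2 + s2 ** 2) / np.maximum(s0, eps)
    phi = 0.5 * np.arctan2(s2, s1)  # quadrant-aware; phi lies in (-pi/2, pi/2]
    return rho, phi

def specular_dop(theta, eta=1.5):
    """Specular degree of polarization at zenith angle theta (radians)."""
    s = np.sin(theta) ** 2
    num = 2.0 * s * np.cos(theta) * np.sqrt(eta ** 2 - s)
    den = eta ** 2 - s - eta ** 2 * s + 2.0 * s ** 2
    return num / den

def normal_from_angles(theta, alpha):
    """Unit normal from zenith theta and azimuth alpha (radians)."""
    return np.stack([np.sin(theta) * np.cos(alpha),
                     np.sin(theta) * np.sin(alpha),
                     np.cos(theta)], axis=-1)
```

Sampling the sinusoid $I(\varphi)$ at the four polarizer angles gives $S_0 = I_{max} + I_{min}$ and $\rho = (I_{max} - I_{min})/(I_{max} + I_{min})$, so the four DoFP channels suffice to recover both $\rho$ and $\phi$. As a sanity check on the specular model, `specular_dop` evaluates to 1 at the Brewster angle $\theta = \arctan\eta$.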

3.3 Joint Loss Function

The total training loss is

$$\mathcal{L}_{total} = \lambda_1 \mathcal{L}_{hist} + \lambda_2 \mathcal{L}_{L1} + \lambda_3 \mathcal{L}_{SSIM} + \lambda_4 \mathcal{L}_{TV} + \lambda_5 \mathcal{L}_{LPIPS} + \lambda_6 \mathcal{L}_{normal}$$

with empirical weights $\lambda_1 = 1.0$, $\lambda_2 = 10.0$, $\lambda_3 = 1.0$, $\lambda_4 = 10.0$, $\lambda_5 = 2.0$, $\lambda_6 = 30.0$.
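
The weighted sum can be written out as a short sketch. The weights are the ones listed above; the dictionary keys and the helper names `total_loss` and `weight_shares` are hypothetical, and the individual loss values are assumed to be computed elsewhere.

```python
# Loss weights lambda_1 .. lambda_6 as listed in the text.
LOSS_WEIGHTS = {
    "hist": 1.0,     # PPN normal-orientation histogram loss
    "l1": 10.0,      # descattering pixel loss
    "ssim": 1.0,     # descattering structural loss
    "tv": 10.0,      # total-variation regularization
    "lpips": 2.0,    # perceptual loss
    "normal": 30.0,  # angular (cosine) normal loss
}

def total_loss(terms, weights=LOSS_WEIGHTS):
    """Weighted sum of individual loss values given as a name -> value mapping."""
    missing = set(weights) - set(terms)
    if missing:
        raise KeyError(f"missing loss terms: {sorted(missing)}")
    return sum(weights[name] * terms[name] for name in weights)

def weight_shares(weights=LOSS_WEIGHTS):
    """Fraction of the total weight assigned to each term."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}
```

If all six terms had equal magnitude, the normal loss would account for 30 of the 54 weight units, which suggests that normal accuracy is the dominant training signal while the descattering terms act as supporting constraints.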

4. Implementation and Ablation

  • Dataset: MuS-Polar3D (726 samples, 80%/10%/10% train/val/test split).
  • Infrastructure: PyTorch on 4×NVIDIA A100 GPUs; 1000 training epochs; Adam optimizer, initial LR=0.001.
  • Augmentation: Random $256\times256$ crops (with $\geq 50\%$ foreground), horizontal flipping.
  • Inference: Sliding-window tiling, overlap stitching.
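
The foreground-constrained cropping used for augmentation can be illustrated with a small rejection-sampling sketch. How the crops are actually drawn is not specified here, so the retry loop, the function name `random_foreground_crop`, and the `max_tries` parameter are assumptions made for illustration.

```python
import numpy as np

def random_foreground_crop(mask, size=256, min_fg=0.5, max_tries=100, rng=None):
    """Return (top, left) of a size x size window whose foreground fraction
    is at least min_fg, or None if no such window is found in max_tries draws.

    mask is a 2D boolean (or 0/1) array marking target pixels.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = mask.shape
    if h < size or w < size:
        raise ValueError("mask is smaller than the crop size")
    for _ in range(max_tries):
        top = int(rng.integers(0, h - size + 1))   # upper bound is exclusive
        left = int(rng.integers(0, w - size + 1))
        window = mask[top:top + size, left:left + size]
        if window.mean() >= min_fg:
            return top, left
    return None
```

The same `(top, left)` offset would then be applied to the polarization images and the ground-truth normal map so that inputs and labels stay aligned.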

Ablation on the MuS-Polar3D test set highlights the importance of each module:

| Configuration | Mean MAE (°) | Median MAE (°) |
|---|---|---|
| w/o PPN | 16.72 | 15.94 |
| w/o DN | 15.37 | 15.38 |
| w/o PPN & DN | 15.56 | 16.09 |
| w/o Color Embedding | 15.46 | 15.73 |
| w/o DEConv | 23.03 | 22.48 |
| Full UD-SfPNet (proposed) | 15.12 | 15.21 |

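The DEConv module has the largest effect: removing it raises the mean MAE from 15.12° to 23.03°, whereas every other ablation stays within about 1.6° of the full model.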

5. Experimental Evaluation

5.1 Quantitative Metrics

  • Descattering performance: PSNR improves from 30.80 dB (raw) to 36.87 dB and SSIM from 0.9569 to 0.9745, while LPIPS drops from 0.3830 to 0.0356 (lower is better).
  • Surface normal estimation (Mean Angular Error, MuS-Polar3D test set):
    • DeepSfP (2020): 19.64°
    • SfP-wild (2022): 21.64°
    • TransSfP (2023): 20.54°
    • AttentionU²-Net (2025): 15.72°
    • DSINE (2024): 16.94°
    • UD-SfPNet: 15.12° (lowest)
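
To put the descattering figures in this list into perspective, PSNR can be related back to mean squared error. The sketch below uses the standard PSNR definition; the helper names and the assumption that images are scaled to [0, 1] are illustrative and not taken from the paper.

```python
import numpy as np

def psnr(img, ref, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
    mse = np.mean((np.asarray(img, dtype=float) - np.asarray(ref, dtype=float)) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))

def mse_ratio_from_psnr_gain(gain_db):
    """Factor by which the MSE shrinks for a given PSNR gain in dB."""
    return 10.0 ** (gain_db / 10.0)
```

Under this definition, the reported gain of 36.87 - 30.80 = 6.07 dB corresponds to roughly a fourfold reduction in mean squared error relative to the raw scattered input.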

5.2 Qualitative Results

Error heatmaps reveal that UD-SfPNet distributes errors more evenly, with suppressed errors in high-curvature and fine-detail regions compared to oversmoothing in prior methods. Reconstruction of 3D surfaces via normal integration captures detailed textures and geometry even under varying levels of water turbidity.

6. Key Insights and Applications

UD-SfPNet demonstrates that end-to-end joint modeling of polarization-based descattering and shape inference improves 3D normal recovery. The gain is attributed to four factors:

  • Polarization uniquely enables the separation of backscatter from object signal and provides robust surface normal cues.
  • Global prior (from PPN) regularizes per-pixel predictions.
  • Color embedding enforces cross-channel (RGB/orientation) geometric consistency.
  • DEConv modules enhance high-frequency detail retention, essential under scattering.

Applications extend to underwater robotics (infrastructure inspection, maintenance), marine archaeology, biological imaging (e.g., coral morphology), and environmental monitoring (seabed mapping, coral health)—any scenario demanding high-resolution 3D recovery in turbid water.

UD-SfPNet is presented by its authors as the first framework to achieve end-to-end, physically grounded, and geometry-aware underwater 3D imaging via polarization (Wang et al., 1 Mar 2026).

References (1)
