IPCD-Net: Intrinsic Decomposition for 3D Point Clouds

Updated 17 November 2025
  • IPCD-Net is an end-to-end deep learning framework that separates 3D point clouds into per-point albedo and shading components, addressing challenges in unstructured data.
  • It employs Point Transformer v2 for permutation-invariant feature aggregation and a Projection-based Luminance Distribution module to capture global illumination cues.
  • The network enables precise texture editing, relighting, and point-cloud registration by significantly reducing shading errors and enhancing color accuracy.

Intrinsic Point-Cloud Decomposition Network (IPCD-Net) is an end-to-end deep learning architecture designed to separate albedo and shading components directly from colored 3D point clouds, enabling tasks such as relighting, texture editing, and robust registration under varying outdoor illumination. IPCD-Net addresses the fundamental challenges posed by the irregular nature of point-cloud data and the necessity to infer global illumination properties in the absence of explicit light direction or color, which prior image-based and point-based decomposition techniques fail to handle effectively.

1. Formulation of Intrinsic Decomposition for Point Clouds

The intrinsic decomposition task seeks, for each spatial location, to factor observed color into albedo and shading from a single observation, typically under a Lambertian assumption for reflectance. In the classical image setting, this is expressed as $I(p) = A(p) \cdot S(p)$, where $I$ is the pixel color, $A$ the per-pixel albedo, and $S$ the shading due to illumination. IPCD-Net extends this paradigm to unordered point sets: for a point cloud represented by positions $P \in \mathbb{R}^{N\times 3}$ and observed colors $I \in \mathbb{R}^{N\times 3}$, the aim is to learn functions predicting $\hat{A}, \hat{S} \in \mathbb{R}^{N\times 3}$ such that, at each point $i$:

$$I_i \approx \hat{A}_i \odot \hat{S}_i$$

where $\odot$ denotes elementwise multiplication. All predictions and supervisory signals reside natively in point-cloud space, obviating rasterization or grid imposition.
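For concreteness, the factorization and its reconstruction check reduce to a few tensor operations. The following PyTorch sketch uses illustrative shapes and placeholder predictions, not the paper's code:

```python
import torch

N = 4096                      # number of points (illustrative)
P = torch.rand(N, 3)          # point positions, R^{N x 3}
I = torch.rand(N, 3)          # observed per-point RGB colors

# Hypothetical per-point network outputs (placeholders for A_hat, S_hat)
A_hat = torch.rand(N, 3)
S_hat = torch.rand(N, 3)

# Lambertian reconstruction: I_i ~= A_hat_i (elementwise) S_hat_i
I_rec = A_hat * S_hat

# Reconstruction residual (Frobenius norm over all points) as a consistency check
residual = torch.linalg.norm(I - I_rec)
```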

2. Network Architecture and Pointwise Feature Aggregation

IPCD-Net processes input per-point features comprising 3D coordinates $P_i$ and RGB color $I_i$. For permutation-invariant feature learning, it employs Point Transformer v2 (PTv2) as a shared encoder, assembling k-nearest-neighbor graphs and applying grouped vector attention, producing latent features $F \in \mathbb{R}^{N \times C}$. Two "pre-estimate" heads, parameterized as small multi-layer perceptrons (MLPs), then predict initial albedo and shading estimates denoted $A' \in \mathbb{R}^{N\times 3}$ and $S' \in \mathbb{R}^{N\times 3}$.

Downstream, global-light context is introduced by the Projection-based Luminance Distribution (PLD) module, whose output is concatenated per point to the pre-estimates. Two refinement MLP heads subsequently yield the final predictions, $\hat{A}$ and $\hat{S}$. All operations (attention, MLP layers, neighbor search) are natively set-based and respect the unordered, non-uniform density of point clouds.
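A minimal PyTorch sketch of this two-stage head structure follows. The PTv2 encoder and PLD module are treated as black boxes, and the hidden-layer sizes are assumptions rather than the paper's reported configuration:

```python
import torch
import torch.nn as nn

class DecompositionHeads(nn.Module):
    """Sketch of the pre-estimate + refinement heads (sizes are assumed)."""
    def __init__(self, feat_dim=64, light_dim=3):
        super().__init__()
        # Pre-estimate heads: latent point features -> initial albedo / shading
        self.pre_albedo = nn.Sequential(nn.Linear(feat_dim, 32), nn.ReLU(), nn.Linear(32, 3))
        self.pre_shade  = nn.Sequential(nn.Linear(feat_dim, 32), nn.ReLU(), nn.Linear(32, 3))
        # Refinement heads take [A', S', tiled global-light vector] per point
        in_dim = 6 + light_dim
        self.ref_albedo = nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU(), nn.Linear(32, 3))
        self.ref_shade  = nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU(), nn.Linear(32, 3))

    def forward(self, F, light):                 # F: (N, feat_dim), light: (light_dim,)
        A_pre, S_pre = self.pre_albedo(F), self.pre_shade(F)
        light_tiled = light.expand(F.shape[0], -1)            # tile global light to all N points
        x = torch.cat([A_pre, S_pre, light_tiled], dim=-1)    # (N, 6 + light_dim)
        return A_pre, S_pre, self.ref_albedo(x), self.ref_shade(x)
```

Keeping the pre-estimates as explicit outputs is what allows them to be supervised separately by the auxiliary losses described in Section 4.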

3. Projection-based Luminance Distribution (PLD) and Global-Illumination Encoding

A principal difficulty in point-cloud decomposition is the absence of canonical image axes or global-light annotation. IPCD-Net's PLD module estimates the light field over the point cloud from within the data. It samples 324 uniform directions $(\theta, \phi)$ over the upper hemisphere and, for each, rotates the cloud, renders an orthographic luminance image $L(\theta, \phi; u, v) \in \mathbb{R}^{H\times W}$, and computes the mean luminance:

$$\mathrm{PLD}(\theta, \phi) = \frac{1}{N_P} \sum_{u,v} L(\theta, \phi; u, v)$$

where $N_P = H \cdot W$. This collection, interpreted as a hemispherical luminance map, is embedded by SphereNet (a spherical convolution network) into a global-light feature vector $\ell \in \mathbb{R}^{d_L}$ (with $d_L = 3$). The hierarchical refinement proceeds by tiling $\ell$ to all $N$ points and concatenating with $A'$, $S'$ to form the input $X \in \mathbb{R}^{N\times (6 + d_L)}$ for the final MLP heads. This mechanism instructs the network to leverage coarse-to-fine light cues, improving both the removal of cast shadows from albedo and the color accuracy of shading while preserving local geometric variation.
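The sketch below illustrates how one PLD value could be computed for a single view direction. It replaces the PyTorch3D orthographic renderer with a naive point splat (no z-buffering or occlusion handling) and assumes Rec. 709 luminance weights, so it demonstrates the statistic rather than reproducing the paper's renderer:

```python
import torch

def luminance(rgb):
    # Rec. 709 luminance weights (an assumption; the paper may use a different formula)
    w = torch.tensor([0.2126, 0.7152, 0.0722])
    return rgb @ w

def pld_value(points, colors, R, H=256, W=256):
    """Mean luminance of an orthographic projection of the rotated cloud
    (simplified point splatting stand-in for the actual renderer)."""
    p = points @ R.T                        # rotate cloud into the view frame
    xy = p[:, :2]
    lo, hi = xy.min(0).values, xy.max(0).values
    uv = ((xy - lo) / (hi - lo + 1e-8) * torch.tensor([W - 1, H - 1])).long()
    img = torch.zeros(H, W)
    img[uv[:, 1], uv[:, 0]] = luminance(colors)  # last-write splat; a real renderer z-buffers
    return img.mean()                             # PLD(theta, phi) = mean over H*W pixels
```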

4. Supervision, Loss Terms, and Learning

The training objective combines supervision on intermediate (pre-estimates) and final predictions using ground-truth decompositions available in synthetic data. For both albedo and shading, pointwise losses (Frobenius norm) are applied:

  • Pre-estimate losses: $L^{\mathrm{alb}}_{\mathrm{pre}} = \|A - A'\|_F$, $L^{\mathrm{shd}}_{\mathrm{pre}} = \|S - S'\|_F$, $L^{\mathrm{phy}}_{\mathrm{pre}} = \|I - A' \odot S'\|_F$
  • Final-estimate losses: $L^{\mathrm{alb}}_{\mathrm{pnt}} = \|A - \hat{A}\|_F$, $L^{\mathrm{shd}}_{\mathrm{pnt}} = \|S - \hat{S}\|_F$, $L^{\mathrm{phy}}_{\mathrm{pnt}} = \|I - \hat{A} \odot \hat{S}\|_F$

The total loss is

$$L_{\mathrm{tot}} = L^{\mathrm{alb}}_{\mathrm{pnt}} + L^{\mathrm{shd}}_{\mathrm{pnt}} + L^{\mathrm{phy}}_{\mathrm{pnt}} + \lambda \left( L^{\mathrm{alb}}_{\mathrm{pre}} + L^{\mathrm{shd}}_{\mathrm{pre}} + L^{\mathrm{phy}}_{\mathrm{pre}} \right)$$

with $\lambda = 0.1$ weighting the auxiliary supervision. This regime encourages both accurate decomposition and faithful reconstruction at multiple network stages.
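Assembled as code, the objective is a sum of Frobenius-norm terms. This sketch assumes all ground-truth and predicted tensors have shape (N, 3) and is not taken from the authors' implementation:

```python
import torch

def ipcd_loss(I, A, S, A_pre, S_pre, A_hat, S_hat, lam=0.1):
    """Total objective: final-estimate terms plus lambda-weighted pre-estimate terms,
    each combining albedo, shading, and physical reconstruction errors."""
    fro = lambda x: torch.linalg.norm(x)     # Frobenius norm of an (N, 3) tensor
    L_pre = fro(A - A_pre) + fro(S - S_pre) + fro(I - A_pre * S_pre)
    L_pnt = fro(A - A_hat) + fro(S - S_hat) + fro(I - A_hat * S_hat)
    return L_pnt + lam * L_pre
```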

5. Dataset Construction and Training Protocol

IPCD-Net is trained and validated on a synthetic outdoor-scene dataset tailored for intrinsic decomposition in point clouds. The dataset comprises 30 distinct "assets" (building models with controllable albedo), each rendered under three sun positions (morning, noon, evening) to create varied shading conditions. Pure-shade ground truth is computed by removing albedo and re-illuminating. For each condition, $10^6$ points are randomly sampled; ground-truth albedo, shading, and color are stored per point. The final set consists of 90 point-cloud scenarios, split by asset: 23 for training, 7 for test.

The pipeline is implemented in PyTorch and runs on NVIDIA H100 GPUs. Each training step samples $10^4$ points from the $10^6$-point clouds. The encoder uses PTv2; PLD projections are rendered with PyTorch3D; SphereNet processes the PLD feature. PLD's 324 views correspond to $10^\circ$ steps in elevation ($0^\circ$ to $80^\circ$) and azimuth ($0^\circ$ to $350^\circ$), with images of size $256 \times 256$. Optimization employs Adam with standard parameters.
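A brief sketch of the view-grid construction and per-step subsampling described above; variable names are illustrative and the data loading is assumed:

```python
import torch

# PLD view grid: 9 elevations (0..80 deg, 10-deg steps) x 36 azimuths (0..350 deg) = 324 views
elev = torch.arange(0, 90, 10)             # degrees
azim = torch.arange(0, 360, 10)            # degrees
views = torch.cartesian_prod(elev, azim)   # (324, 2) table of (theta, phi) pairs
assert views.shape[0] == 324

# Per training step: randomly subsample 10^4 points from a 10^6-point cloud
# (cloud, albedo_gt, shade_gt are assumed preloaded (10^6, 3) tensors)
def sample_step(cloud, albedo_gt, shade_gt, n=10_000):
    idx = torch.randperm(cloud.shape[0])[:n]
    return cloud[idx], albedo_gt[idx], shade_gt[idx]
```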

6. Benchmarks, Ablations, and Quantitative Results

Evaluation metrics include per-point MSE ($\times 10^{-2}$), MAE ($\times 10^{-1}$), and PSNR (dB) for both albedo and shading. Comparative baselines are standard intrinsic image techniques (e.g., Retinex, NIID-Net, CD-IID, IID-Anything), a rendering-then-IID-then-reprojection pipeline (GS-IR), and ablated versions of IPCD-Net (w/o PLD, w/o HFR+PLD, w/o shared encoder, "base model"). Quantitative test-set results are:

Model           MSE_alb   MSE_shd   MAE_alb   MAE_shd   PSNR_alb   PSNR_shd
Baseline-A      18.9      29.1      3.58      4.27      7.57       5.96
NIID-Net        15.2      12.1      2.93      2.46      8.97       9.99
IPCD-Net_base   4.02      5.11      1.58      1.62      14.0       13.5
IPCD-Net        3.03      3.25      1.31      1.37      15.6       15.1
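For reference, the per-point metrics can be computed as below. The reporting scales follow the units stated above, while the assumption that colors lie in [0, 1] for PSNR is ours:

```python
import torch

def metrics(pred, gt):
    """Per-point MSE, MAE, and PSNR between (N, 3) predictions and ground truth."""
    mse = torch.mean((pred - gt) ** 2)
    mae = torch.mean(torch.abs(pred - gt))
    psnr = 10.0 * torch.log10(1.0 / mse)        # assumes values in [0, 1]
    return {"MSE (x1e-2)": 100 * mse.item(),
            "MAE (x1e-1)": 10 * mae.item(),
            "PSNR (dB)": psnr.item()}
```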

Ablation indicates that PLD provides the most reduction in shading error, hierarchical refinement supports albedo recovery, and shared encoding stabilizes training. The full model demonstrates clear quantitative improvements across all metrics.

7. Applications, Generalization, and Limitations

Practical Applications

  • Texture editing: Separating $I$ into $\hat{A}$ and $\hat{S}$ allows selective editing of the albedo. Recombining the edited albedo with the original shading prevents the unnatural lighting artifacts that would result from modifying the observed colors directly.
  • Relighting: To transfer an object between lighting conditions, one computes $\hat{A}_1$ from input $I_1 = A_1 \odot S_1$, then synthesizes $\tilde{I}_{1\rightarrow 2} = \hat{A}_1 \odot S_2$ (see the sketch following this list). This operation mitigates residual shadows and achieves appearance consistent with ground truth under the novel illumination.
  • Point-cloud registration: Under changing light, ICP registration on the original colors $I$ degrades as overlap falls. Registration using the estimated albedo $\hat{A}$ recovers recall rates near those achievable with ground-truth albedo.
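The texture-editing and relighting recombinations above reduce to elementwise products of the decomposed components; a minimal illustrative sketch (function names and the example edit are hypothetical):

```python
import torch

def edit_texture(albedo, shade, edit_fn):
    """Texture editing: modify only the albedo, then recombine with the
    original shading so the edit keeps consistent lighting."""
    return edit_fn(albedo) * shade

def relight(albedo_src, shade_tgt):
    """Relighting: I~_{1->2} = A_hat_1 (elementwise) S_2, i.e. albedo estimated
    under illumination 1 recombined with shading from illumination 2."""
    return albedo_src * shade_tgt

# Illustrative usage: tint the albedo toward red while keeping the original shading
# I_edit = edit_texture(A_hat, S_hat, lambda a: (a * torch.tensor([1.2, 0.9, 0.9])).clamp(0, 1))
```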

Generalization and Real-World Evaluation

On SensatUrban (real urban LiDAR and imagery), IPCD-Net achieved the highest F1 score among the compared baselines, evaluated against annotations of over 900 reflectance-ordered point pairs. The method effectively reduces cast shadows and remains robust to noise-prone real scans.

Limitations and Prospects

PLD presumes Lambertian, diffuse-dominated scenes; severe specular or highly variable reflectance (e.g., black vs. white-mirror) can bias luminance statistics. Very sparse or occluded clouds degrade PLD reliability. Prospective advances include learned inpainting/completion to densify PLD projections and the adoption of newer point-cloud encoding backbones or integration with BRDF estimation for more general inverse rendering scenarios.

IPCD-Net constitutes the first end-to-end neural framework for directly decomposing arbitrary colored point clouds into albedo and shading, leveraging point-wise feature aggregation and global-light analysis via PLD, and demonstrating strong decomposition fidelity and practical downstream performance on both synthetic and real-world benchmarks.
