Intrinsic Point-Cloud Decomposition (IPCD)
- Intrinsic Point-Cloud Decomposition (IPCD) comprises computational frameworks that extract intrinsic scene properties such as albedo and shading from 3D point clouds using both geometric cues and learning-based methods.
- CID-based techniques partition point clouds into nearly convex subregions by quantifying concave intrusions, thereby enhancing segmentation for tasks like scene abstraction and robotic navigation.
- Learning frameworks like IPCD-Net and PoInt-Net employ point-based neural networks with physics-based loss functions to effectively overcome challenges of unstructured data and varying illumination.
Intrinsic Point-Cloud Decomposition (IPCD) refers to a body of computational methods and learning frameworks dedicated to extracting "intrinsic" scene properties—such as albedo and shading—from discrete 3D point cloud data, or to partitioning such data into semantically or geometrically meaningful components. As point clouds are increasingly adopted in graphics, vision, and robotics, IPCD methods aim to exploit their geometric and radiometric richness, overcoming the limitations imposed by traditional image, mesh, or voxel representations.
1. Mathematical Foundations of Intrinsic Decomposition
Let $P = \{(x_i, c_i)\}_{i=1}^{N}$ denote a colored 3D point cloud, where each point comprises a 3D position $x_i \in \mathbb{R}^3$ and an RGB color $c_i \in [0,1]^3$. The foundational radiometric model assumes Lambertian reflectance; thus, for each point:

$$c_i = a_i \odot s_i,$$

where $c_i$ is the measured color, $a_i$ is the surface albedo, $s_i$ is the shading factor, and "$\odot$" denotes the per-channel (Hadamard) product. In some contexts,

$$s_i = \max(0,\, n_i \cdot \ell),$$

with $n_i$ being the surface normal and $\ell$ the (unknown or estimated) light direction.
The intrinsic decomposition seeks to recover $\{a_i\}$ and $\{s_i\}$ from the observed point cloud $P$. This is ill-posed, especially given the unstructured nature of point clouds and occlusion, motivating both hand-crafted regularizers (e.g., gradient priors, chromatic ratios) and learning-based constraints.
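A minimal numpy sketch of this Lambertian forward model follows; the variable names and shapes are illustrative only, not drawn from any of the cited papers.

```python
import numpy as np

def lambertian_shading(normals: np.ndarray, light_dir: np.ndarray) -> np.ndarray:
    """Per-point scalar shading s_i = max(0, n_i . l)."""
    l = light_dir / np.linalg.norm(light_dir)
    return np.clip(normals @ l, 0.0, None)          # (N,)

def compose(albedo: np.ndarray, shading: np.ndarray) -> np.ndarray:
    """Observed color c_i = a_i ⊙ s_i (shading broadcast over RGB)."""
    return albedo * shading[:, None]                # (N, 3)

# Toy example: two points with unit normals lit from +z.
normals = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])
albedo = np.array([[0.8, 0.2, 0.2], [0.2, 0.8, 0.2]])
colors = compose(albedo, lambertian_shading(normals, np.array([0.0, 0.0, 1.0])))
```

Intrinsic decomposition inverts this map: given `colors` (and possibly geometry), recover `albedo` and `shading`.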
2. IPCD via Concavity-Induced Distance (CID)
A distinct geometry-only, mesh-free IPCD instance uses Concavity-Induced Distance (CID) (Wang et al., 2023) to partition an unoriented 3D point cloud $P$ into nearly convex regions. CID is formally defined for $p_1, p_2 \in P$ as:

$$\mathrm{CID}(p_1, p_2) = \max_{x \in \overline{p_1 p_2}} d(x, P),$$

where $\overline{p_1 p_2}$ is the line segment between $p_1$ and $p_2$, and $d(x, P) = \min_{p \in P} \|x - p\|_2$. Discretely,

$$\mathrm{CID}(p_1, p_2) \approx \max_{k = 1, \dots, m} d(x_k, P),$$

with $m$ samples $x_k$ along $\overline{p_1 p_2}$. For clusters $C_1, C_2 \subseteq P$,

$$\mathrm{CID}(C_1, C_2) = \min_{p_1 \in C_1,\ p_2 \in C_2} \mathrm{CID}(p_1, p_2).$$
CID quantifies the “depth” of the maximal concave intrusion along inter-point segments: large CID values across concavities promote separation into convex subregions, while locally convex patches yield low CID. The method’s properties include non-negativity, symmetry, reflexivity, and rigid-motion invariance, but not the triangle inequality—superadditivity enhances robust segmentation at convex boundaries.
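A minimal sketch of the discrete approximation, assuming the reconstruction above: sample $m$ points along the segment and take the maximal nearest-neighbor distance to the cloud (the sampling count `m` is an illustrative parameter).

```python
import numpy as np
from scipy.spatial import cKDTree

def cid(p1: np.ndarray, p2: np.ndarray, tree: cKDTree, m: int = 32) -> float:
    """Discrete Concavity-Induced Distance between two points."""
    t = np.linspace(0.0, 1.0, m)[:, None]          # m samples in [0, 1]
    samples = (1.0 - t) * p1 + t * p2              # points along the segment
    dists, _ = tree.query(samples)                 # distance of each sample to cloud
    return float(dists.max())                      # depth of the deepest intrusion

points = np.random.rand(1000, 3)                   # toy unoriented cloud
tree = cKDTree(points)
d = cid(points[0], points[1], tree)
```

A segment crossing a concavity leaves the surface, so some sample sits far from every cloud point and CID is large; a segment inside a convex patch stays near the surface and CID stays small.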
The pipeline comprises CID-based farthest point sampling (FPS) for seed selection, adjacency graph construction via CID, and efficient label propagation; CID enables grouping into convex primitives for robotics tasks (collision, planning). Empirical results on S3DIS and ScanNet show CID label propagation achieving Average Precision (AP) competitive with deep learning baselines: e.g., on S3DIS, CID attains AP 0.691 versus SGPN at 0.541 and PointGroup at 0.640. Quantitative grouping quality is measured by purity and compactness metrics, with CID yielding higher purity at equal compactness than mesh-based methods.
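The seed-selection step can be sketched as standard farthest point sampling with CID in place of the Euclidean metric. This naive version reuses the `cid` helper from the sketch above and omits the paper's acceleration details (it costs $O(kN)$ CID queries).

```python
import numpy as np
from scipy.spatial import cKDTree

def cid_fps(points: np.ndarray, k: int, m: int = 16) -> list:
    """Select k seeds by farthest point sampling under CID (naive sketch)."""
    tree = cKDTree(points)
    seeds = [0]                                     # arbitrary first seed
    min_d = np.array([cid(p, points[0], tree, m) for p in points])
    for _ in range(k - 1):
        nxt = int(min_d.argmax())                   # point most concavity-separated
        seeds.append(nxt)                           # from the current seed set
        d_new = np.array([cid(p, points[nxt], tree, m) for p in points])
        min_d = np.minimum(min_d, d_new)            # update min-CID to seed set
    return seeds
```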
3. Learning-Based Intrinsic Decomposition
Intrinsic decomposition in colored point clouds is addressed by learning frameworks such as IPCD-Net (Sato et al., 13 Nov 2025) and PoInt-Net (Xing et al., 2023), which leverage physics-based formulations and point-wise neural aggregators.
3.1 IPCD-Net
IPCD-Net extends image-based intrinsic image decomposition to direct point-cloud processing, solving two challenges: unstructured data and unknown global illumination direction. The network operates entirely in 3D point-cloud space, using Point Transformer V2 (PTv2) for feature extraction and MLP heads for albedo and/or shading.
The full model incorporates Projection-based Luminance Distribution (PLD), capturing global-light statistics by rendering the point cloud from directions distributed on the upper hemisphere, then aggregating projection-specific average luminance into a spherical map. SphereNet extracts a 3D global-light descriptor from the PLD map. This global cue is concatenated per point with per-point pre-estimates and refined via hierarchical MLPs.
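A hedged sketch of PLD construction follows: render the cloud from directions on the upper hemisphere and record each view's average luminance. The direction counts, image resolution, and crude z-buffer splatting here are assumptions standing in for the paper's renderer.

```python
import numpy as np

def pld_map(points, lum, n_az=16, n_el=4, res=32):
    """points: (N,3); lum: (N,) per-point luminance (e.g., mean RGB)."""
    pld = np.zeros((n_el, n_az))
    for i, el in enumerate(np.linspace(0.15, np.pi / 2 - 0.15, n_el)):
        for j, az in enumerate(np.linspace(0, 2 * np.pi, n_az, endpoint=False)):
            view = np.array([np.cos(el) * np.cos(az),
                             np.cos(el) * np.sin(az),
                             np.sin(el)])           # viewing direction
            right = np.cross([0.0, 0.0, 1.0], view)
            right /= np.linalg.norm(right)
            up = np.cross(view, right)
            uv = np.stack([points @ right, points @ up], axis=1)
            depth = points @ view                   # distance along the view axis
            ij = ((uv - uv.min(0)) / (np.ptp(uv, axis=0) + 1e-9)
                  * (res - 1)).astype(int)
            zbuf = np.full((res, res), -np.inf)     # nearest-point z-buffer
            lbuf = np.zeros((res, res))
            for (u, v), z, L in zip(ij, depth, lum):
                if z > zbuf[u, v]:
                    zbuf[u, v], lbuf[u, v] = z, L
            pld[i, j] = lbuf[zbuf > -np.inf].mean() # average rendered luminance
    return pld                                      # (n_el, n_az) spherical map
```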
Supervision uses Frobenius-norm losses:
| Loss | Definition |
|---|---|
| Albedo Loss | $\mathcal{L}_{\mathrm{alb}} = \lVert \hat{A} - A \rVert_F$ |
| Shade Loss | $\mathcal{L}_{\mathrm{shd}} = \lVert \hat{S} - S \rVert_F$ |
| Physical Consistency | $\mathcal{L}_{\mathrm{phys}} = \lVert \hat{A} \odot \hat{S} - C \rVert_F$ |
where $\hat{A}$ and $\hat{S}$ stack the predicted per-point albedo and shading, and $A$, $S$, $C$ denote the ground-truth albedo, ground-truth shading, and observed colors, respectively. Auxiliary losses on the pre-estimates ensure stage-wise learning. The final total loss aggregates all components, weighted by a balancing hyperparameter $\lambda$.
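A minimal PyTorch sketch of the combined objective, assuming the Frobenius-norm forms reconstructed above; placing a single weight `lam` on the consistency term, and the (N, 3) tensor shapes, are assumptions.

```python
import torch

def ipcd_loss(a_pred, s_pred, a_gt, s_gt, colors, lam=0.1):
    """All tensors (N, 3); colors are the observed point colors."""
    l_alb = torch.linalg.norm(a_pred - a_gt)              # albedo loss
    l_shd = torch.linalg.norm(s_pred - s_gt)              # shade loss
    l_phys = torch.linalg.norm(a_pred * s_pred - colors)  # physical consistency
    return l_alb + l_shd + lam * l_phys
```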
3.2 PoInt-Net
PoInt-Net uses three PointNet-style modules: Point Albedo-Net, Light Direction Estimation Net, and a Learnable Shader. The architecture processes per-point positions and colors $(x_i, c_i)$, encoding via multilayer perceptrons, global max-pooling, and feature concatenation. Key innovations include regularization via a gradient-difference prior and cross-color-ratio sparsity, which enforce smoothness and chromatic consistency in albedo.
Training proceeds in two stages. First, shading and light direction are fit; then, albedo reconstruction combines direct albedo loss, image reconstruction, and prior losses. PoInt-Net demonstrates strong zero-shot generalization, robustness to depth noise, and consistently lower error than 2D and NeRF-based methods, with only 1/10–1/100 their parameters.
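The two priors can be sketched over a k-NN graph as follows; the exact functional forms in (Xing et al., 2023) may differ, and `nbr_idx` is an assumed (N, k) tensor indexing each point's neighbors.

```python
import torch

def albedo_priors(albedo, colors, nbr_idx, eps=1e-4):
    """albedo, colors: (N, 3); nbr_idx: (N, k) long tensor of neighbors."""
    a_n, c_n = albedo[nbr_idx], colors[nbr_idx]     # (N, k, 3) neighbor values
    # gradient-difference prior: local albedo gradients track color gradients
    grad_diff = ((albedo[:, None] - a_n) - (colors[:, None] - c_n)).abs().mean()
    # cross-color-ratio sparsity: log(R_p G_q / R_q G_p) should be near zero
    log_a = torch.log(albedo.clamp_min(eps))        # (N, 3)
    log_an = torch.log(a_n.clamp_min(eps))          # (N, k, 3)
    ccr = (log_a[:, 0:1] + log_an[..., 1]
           - log_an[..., 0] - log_a[:, 1:2]).abs().mean()
    return grad_diff, ccr
```

The cross-color-ratio term is illumination-invariant under the Lambertian model, since a shared shading factor cancels in the channel ratios; penalizing its variation pushes shading effects out of the albedo estimate.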
4. Datasets, Metrics, and Comparative Performance
IPCD models require datasets pairing point clouds with ground-truth albedo and shading under known illumination. The synthetic outdoor-scene dataset for IPCD-Net comprises 30 scenes × 3 sun positions, yielding 90 scene instances, each sampled with up to 1 million points. Data splits use 23 scene assets (each under all 3 sun positions) for training and 7 for testing.
Evaluation metrics are computed in point-cloud space (per channel); a minimal sketch follows the list:
- Mean Squared Error (MSE), scaled ×10⁻²
- Mean Absolute Error (MAE), scaled ×10⁻¹
- Peak Signal-to-Noise Ratio (PSNR), in dB
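```python
import numpy as np

def point_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    """Per-channel metrics over (N, 3) arrays; assumes colors in [0, 1]."""
    mse = float(np.mean((pred - gt) ** 2))
    mae = float(np.mean(np.abs(pred - gt)))
    psnr = 10.0 * np.log10(1.0 / mse)               # dB, peak value 1.0
    return {"MSE x1e-2": mse * 1e2, "MAE x1e-1": mae * 1e1, "PSNR (dB)": psnr}
```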
Baselines include: trivial assignments, point-cloud-adapted Retinex, classic 2D IID methods adapted via rendering (NIID-Net, CD-IID, IID-Anything, GS-IR), and both IPCD-Net and PoInt-Net.
Representative quantitative results:
| Method | MSE(alb) | MSE(shd) | PSNR(alb) | PSNR(shd) |
|---|---|---|---|---|
| IPCD-Net (Base) | 4.02 | 5.11 | 14.0 | 13.5 |
| IPCD-Net (Full) | 3.03 | 3.25 | 15.6 | 15.1 |
| 2D-IID Baselines | >12 | >12 | <12 | <12 |
PoInt-Net achieves state-of-the-art results across ShapeNet-Intrinsic, MIT-Intrinsic, MPI-Sintel, and Inverender, with notably lower MSE and model size compared to 2D CNNs or NeRF-based approaches; for example, on ShapeNet-Intrinsic: PoInt-Net MSE = 0.46 (albedo), 0.38 (shading).
Ablation studies confirm contributions of individual modules (PLD, hierarchical refinement, shader), with performance degrading when any is removed.
5. Practical Applications
Intrinsic decomposition on point clouds enables a range of applications across graphics, AR, and robotics:
- Texture Editing: Decomposition into albedo and shading allows consistent color edits (e.g., wall recoloring), with recomposition preserving shadows (see the sketch after this list).
- Relighting: Utilizing multiple illumination conditions, IPCD-based relighting produces images consistent with new light directions, removing spurious residual shadows inherited from the original composite.
- Point-Cloud Registration: Registration algorithms using raw color suffer from misalignment under varying illumination; substituting estimated albedo as the input for colored ICP significantly improves recall, almost matching ground-truth albedo performance.
- Scene Abstraction and Robotics: Convex-hull envelopes computed from CID-induced clusters offer highly efficient, modular scene representations for collision checking and planning.
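As an illustration of the recomposition step in texture editing, a minimal sketch (illustrative helper, not from the cited papers): edit the estimated albedo on a selection mask, keep the estimated shading, and recompose.

```python
import numpy as np

def recolor(albedo, shading, mask, new_rgb):
    """albedo: (N, 3); shading: (N,); mask: (N,) bool; new_rgb: (3,)."""
    edited = albedo.copy()
    edited[mask] = new_rgb                  # e.g., repaint the wall points
    return edited * shading[:, None]        # recompose c = a ⊙ s, shadows kept
```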
6. Limitations and Prospects
IPCD methodologies, both geometry-driven and learning-based, exhibit several limitations:
- Computational Cost: CID-FPS and cluster merging scale with the number of points and require intensive nearest-neighbor queries; acceleration via GPU or approximate NN search is needed for large-scale data.
- Sampling Nonuniformity: Methods can undersegment in sparse or non-uniform scans, such as outdoor LiDAR, missing narrow gaps or over-penalizing sparsity.
- Thin Object Segmentation: Geometry-only CID tends to merge thin structures; integrating photometric cues or learned features may address such ambiguities.
- Triangle Inequality Violation: CID’s superadditivity enhances convex-part separation, but complicates construction of graph embeddings for spectral methods or manifold learning.
- Real-World Generalization: IPCD-Net, trained on synthetic data, generalizes robustly to real outdoor scenes as validated by F1 scores on annotated benchmarks, but performance in highly reflective, specular, or interreflective environments remains to be characterized.
Future work seeks to extend IPCD approaches to more diverse environments, accelerate computational bottlenecks, integrate photometric cues, and refine scene abstractions for high-level semantic tasks.