Universal Photometric Stereo (PS)
- Universal Photometric Stereo (PS) is a class of algorithms that recovers 3D surface normals from images under arbitrary, unknown lighting and material conditions, overcoming the limitations of classical PS.
- It relies on data-driven approaches and techniques like unified feature representations to decouple geometry from variable lighting, enabling robustness in uncalibrated, real-world scenarios.
- Applications include casual 3D scanning and industrial inspection, though challenges remain with extreme materials or minimal lighting variation.
Universal photometric stereo (PS) denotes a class of algorithms and frameworks that seek to recover accurate 3D surface normals—and, by extension, shape—of objects from images captured under arbitrary and unknown lighting, with minimal or no assumptions about reflectance, lighting model, or scene calibration. Unlike classical PS, which is typically constrained to directional lights, Lambertian surfaces, and controlled (darkroom) settings, universal PS methods aim for broad applicability across materials, illumination regimes, and acquisition scenarios, including real-world, uncontrolled environments.
1. Problem Definition and Principles
Universal PS extends the traditional photometric stereo problem, which involves estimating surface normals from multiple images taken with a fixed camera but under varying lighting conditions. Classic models rely on assumptions such as Lambertian reflectance, directional light (far-field), or calibrated illumination. Universal PS eliminates or substantially relaxes these constraints, seeking methods that:
- Function without knowledge or calibration of lighting directions or spectral composition,
- Are robust to arbitrary, complex (including spatially-varying, mixed-frequency, and dynamic) illumination,
- Apply to a wide variety of materials and reflectance behaviors, including non-Lambertian, specular, rough, and spatially-varying BRDFs,
- Maintain high geometric fidelity in regions of complex surface detail, sharp features, and significant shadowing or inter-reflections.
Universal PS methods emphasize data-driven design, often leveraging deep neural networks trained on synthetic or real datasets rendered or acquired under diverse material, shape, and lighting conditions (2206.02452, 2303.15724, 2506.18882).
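For contrast with the universal setting, the classical calibrated Lambertian model described above admits a closed-form per-pixel least-squares solution; a minimal sketch on synthetic data (no shadows or noise, which is exactly the restrictive assumption universal PS removes):

```python
import numpy as np

# Classical calibrated PS assumes Lambertian shading I = rho * (L @ n) with
# known directional lights L, so the albedo-scaled normal follows from a
# per-pixel least-squares solve.
def lambertian_ps(intensities, light_dirs):
    """intensities: (m,) pixel observations; light_dirs: (m, 3) unit light vectors."""
    g, *_ = np.linalg.lstsq(light_dirs, intensities, rcond=None)
    albedo = np.linalg.norm(g)                 # |g| is the albedo rho
    normal = g / albedo if albedo > 0 else np.array([0.0, 0.0, 1.0])
    return normal, albedo

# Synthetic sanity check: render a known normal under 4 lights, then recover it.
rng = np.random.default_rng(0)
n_true = np.array([0.3, -0.2, 0.933]); n_true /= np.linalg.norm(n_true)
L = rng.normal(size=(4, 3)) + np.array([0.0, 0.0, 2.0])  # lights biased toward +z
L /= np.linalg.norm(L, axis=1, keepdims=True)
I = 0.8 * (L @ n_true)                                   # albedo 0.8, no shadowing
n_est, rho = lambertian_ps(I, L)
```

Universal PS dispenses with every input this solver requires except the images themselves: `L` is unknown, and the shading model need not be Lambertian.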
2. Methodological Advances and Unified Feature Representation
A significant advancement in universal PS is the introduction of unified or lighting-invariant feature representations. Rather than encoding the observed intensities in ways that entangle lighting and geometric signals—which complicates normal inference under unknown, varying illumination—modern universal PS approaches are architected to explicitly decouple these sources of variation.
For example, LiNO-UniPS (2506.18882) uses:
- Learnable light register tokens: Special tokens prepended to image feature sequences in transformer models, designed to aggregate lighting information across all input images.
- Global cross-image attention: Attention operations interleaving frame, light-axis, and all-frame global attention, allowing the network to reason about lighting context and establish geometry-consistent features that are invariant to lighting.
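A conceptual sketch of the register-token idea, not the actual LiNO-UniPS implementation: a shared register token is prepended to each image's patch-token sequence, and one round of global attention over the tokens of all frames lets the register aggregate lighting context while patch tokens carry geometry. The token count `T`, dimension `d`, and the single-head, single-layer attention are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, k, v):
    """Single-head scaled dot-product attention over a flat token sequence."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

# Toy setup: F input images, each a sequence of T patch tokens of dimension d.
F, T, d = 3, 16, 8
rng = np.random.default_rng(1)
patch_tokens = rng.normal(size=(F, T, d))
light_register = rng.normal(size=(1, d))     # shared "learnable" register token

# Prepend the register to every image's sequence, then run "all-frame" global
# attention: every token attends to every token in every frame.
sequences = np.concatenate(
    [np.broadcast_to(light_register, (F, 1, d)), patch_tokens], axis=1)
tokens = sequences.reshape(F * (T + 1), d)
out = attend(tokens, tokens, tokens).reshape(F, T + 1, d)

light_context = out[:, 0]    # per-image lighting summaries from the registers
geom_features = out[:, 1:]   # cross-frame-contextualized patch features
```

In a trained network this attention is stacked and interleaved with per-frame and light-axis attention; the sketch only shows how register tokens give lighting information a dedicated place to accumulate, away from the geometry features.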
The decoupling of lighting from geometry in the learned feature space reduces ambiguity in normal estimation. Empirical evidence shows that such architectures achieve higher cosine similarity (CSIM) and structural similarity (SSIM) of feature representations under different illuminations, which correlates with improved normal recovery accuracy.
Other unified feature approaches, such as global lighting contexts (2206.02452) and scale-invariant split-and-merge encoders (2303.15724), address similar goals via dense latent lighting representations and non-local context sharing, respectively.
3. Handling of Arbitrary and Unknown Lighting
Universal PS differs from prior approaches primarily in its design to operate under arbitrary, uncalibrated illumination. Distinct strategies in the literature include:
- Global/contextual lighting representations: Networks learn to encode illumination as latent vectors or tokens (light register tokens, global contexts), effectively bypassing the need to estimate explicit lighting parameters.
- Order- and count-agnostic input handling: Universal PS models process a variable number and order of images per object, relying on pooling, attention, or transformer modules to aggregate features across arbitrary input image sets (PS-FCN, 1807.08696; SDM-UniPS, 2303.15724; PS-Transformer, 2211.11386). This design is crucial for robustness to real-world data acquisition constraints.
- Physics-free, spectral-ambiguity-embracing networks: Some recent architectures are agnostic to the physical image formation model and not tied to spectral or intensity calibration. For instance, SpectraM-PS (2410.20716) processes any number and combination of sensor channels (e.g., RGB, NIR) and leverages spectral ambiguity as a beneficial property for learning.
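The order- and count-agnostic aggregation above can be illustrated with PS-FCN-style max pooling over the image axis; a minimal numpy sketch:

```python
import numpy as np

# Permutation-invariant fusion over an arbitrary number of input images:
# the fused feature map is identical whether the stack holds 3 or 30 images
# and regardless of their order, which is what makes a network order- and
# count-agnostic.
def fuse(features):
    """features: (num_images, C, H, W) per-image feature maps."""
    return features.max(axis=0)

rng = np.random.default_rng(2)
feats = rng.normal(size=(5, 4, 8, 8))       # 5 images, 4 channels, 8x8 maps
fused = fuse(feats)
fused_shuffled = fuse(feats[rng.permutation(5)])   # same result, any order
```

Attention- or transformer-based aggregators achieve the same invariance with richer interactions, but max pooling is the simplest instance of the design.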
A unified outcome of these methods is the ability to recover high-fidelity surface normals from scenes featuring mixed and spatially-varying lighting, dynamic or real-time acquisition settings, and materials with complex reflectance (e.g., metallic, anisotropic, translucent).
4. Preservation of High-Frequency Geometric Detail
Protecting fine surface detail is a notable challenge for universal PS, given that downsampling and standard convolutional architectures often smooth or blur high-frequency geometry and edges.
Recent solutions include:
- Wavelet Down/Up Sampling: Instead of ordinary resampling, discrete wavelet transforms decompose features and images into low- and high-frequency bands, which are separately encoded and later recombined. This ensures that local surface detail is maintained through the deep feature hierarchy (2506.18882).
- Normal-Gradient Perception Loss: A loss function that dynamically weights each pixel’s loss by its normal gradient magnitude (emphasizing areas with rapid normal variation), explicitly focusing the learning process on challenging, detailed regions (2506.18882).
- Dual-branch and attention-based fusions: Architectures such as IGA-PSN (2412.11650) incorporate a dedicated branch for image gradients, employing channel and spatial attention to ensure high-frequency cues are preserved during fusion.
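A gradient-weighted loss in the spirit of the normal-gradient perception loss can be sketched as follows; the exact weighting used by LiNO-UniPS may differ, so the `1 + normalized gradient magnitude` form here is an illustrative assumption:

```python
import numpy as np

def normal_gradient_loss(pred, gt):
    """pred, gt: (H, W, 3) unit normal maps."""
    gy, gx = np.gradient(gt, axis=(0, 1))                 # spatial normal gradients
    grad_mag = np.sqrt((gx ** 2 + gy ** 2).sum(axis=-1))  # per-pixel magnitude
    weight = 1.0 + grad_mag / (grad_mag.mean() + 1e-8)    # emphasize detailed regions
    cos = np.clip((pred * gt).sum(axis=-1), -1.0, 1.0)
    return (weight * np.arccos(cos)).mean()               # weighted angular error

# A perfect prediction incurs (near-)zero loss; errors in high-gradient
# regions are penalized more than the same error on flat regions.
rng = np.random.default_rng(3)
gt = rng.normal(size=(8, 8, 3))
gt /= np.linalg.norm(gt, axis=-1, keepdims=True)
perfect = normal_gradient_loss(gt, gt)
```

The key design choice is that the weight comes from the ground-truth normal field, so training pressure concentrates exactly where the geometry varies fastest.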
As a result—and as confirmed by empirical benchmarks—such methods yield normal maps with sharper transitions, preserved edges, and more accurate microgeometry, notably improving performance in complex scenes with self-shadowing and inter-reflection.
5. Benchmarks, Datasets, and Empirical Results
Universal PS methods are evaluated on a growing suite of benchmarks and new datasets engineered for real-world variability:
- DiLiGenT and DiLiGenT10²: Standard photometric stereo benchmarks with calibrated lighting and ground-truth normals for a range of objects and materials. Leading universal PS methods achieve mean angular errors (MAE) below 6°, approaching the accuracy of classical calibrated methods (2211.14118, 2506.18882).
- LUCES(-MV): A recent dataset (2412.16737) for near-field, multi-view point-light photometric stereo, with challenging shapes, materials (including concave, specular, and untextured), high-resolution images, and ground-truth mesh. Universal PS methods demonstrate significantly improved robustness on such real-world data, but results also reveal current limitations, especially for metallic, concave, and highly detailed objects.
- PS-Verse: A synthetic dataset with graded complexity and graded normal maps, enabling evaluation of models’ generalization to intricate surface features (2506.18882).
- SpectraM14: The first benchmark for spectrally multiplexed PS under unknown spectrum/sensor conditions, supporting dynamic (video) and multispectral evaluation, with uncalibrated real-world scenarios (2410.20716).
Across these datasets, universal PS state-of-the-art methods notably outperform traditional or lighting-specific models, especially in uncalibrated and spatially-varying lighting, although performance gaps remain for extreme materials and geometries.
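The mean angular error (MAE) reported on these benchmarks is the average per-pixel angle between predicted and ground-truth normals, usually restricted to an object mask:

```python
import numpy as np

def mean_angular_error(pred, gt, mask=None):
    """pred, gt: (H, W, 3) unit normal maps; mask: optional (H, W) boolean object mask."""
    cos = np.clip((pred * gt).sum(axis=-1), -1.0, 1.0)
    err = np.degrees(np.arccos(cos))          # per-pixel angular error in degrees
    return err[mask].mean() if mask is not None else err.mean()

# Two constant normal maps 90 degrees apart give an MAE of exactly 90.
up = np.zeros((4, 4, 3)); up[..., 2] = 1.0
side = np.zeros((4, 4, 3)); side[..., 0] = 1.0
```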
| Method | DiLiGenT MAE (°) | LUCES-MV MAE (°) | Mask-free support | High-frequency detail |
|---|---|---|---|---|
| LiNO-UniPS (2506.18882) | 4.74 | 9.48 | Yes | Yes |
| SDM-UniPS (2303.15724) | 5.80 | — | Yes | Partial |
| MS-PS (2211.14118) | 5.84 | — | Yes | Partial |
| SpectraM-PS (2410.20716) | — | — | Yes | — |
See referenced papers for full tables and per-object breakdowns.
6. Practical Applications and Limitations
Applications for universal photometric stereo are expanding, including:
- Casual and mobile 3D scanning: Real-world digitization, AR/VR content creation, and heritage preservation no longer require calibrated, controlled lighting or precise object masks (2303.15724).
- Industrial inspection and metrology: High-fidelity geometry capture in uncontrolled environments, on moving/large/complex surfaces, or with minimal imaging setup (2412.16737).
- Robotics, autonomous navigation, and manipulation: Real-time surface normal estimation under unknown or dynamic illumination, crucial for grasping, localization, and scene understanding.
- Video-rate or dynamic surface monitoring: Physics-free, single-shot or per-frame multiplexed PS approaches (2410.20716) enable surface normal recovery for deforming or moving objects.
However, some limitations remain:
- Residual ambiguities for translucent materials, extreme specularities, and in scenes lacking lighting or reflectance variation.
- Dependence on robust segmentation or treatment of background—though recent methods are increasingly mask-free.
- Sensitivity to minimal lighting diversity in the input stack, which can degrade normal recovery in static or low-variation illumination conditions.
- For the most challenging shapes and materials on new benchmarks like LUCES-MV, even state-of-the-art methods deviate substantially from ground truth, emphasizing the need for further advances in universal PS algorithms.
7. Future Directions
Future research directions identified in the literature include:
- Developing more efficient and scalable global attention and feature decoupling techniques, to further improve inference speed and feature invariance (2506.18882).
- Addressing remaining ambiguities and failure modes for planar or highly intricate surfaces via additional priors or multi-view consistency.
- Expanding training datasets to include broader variations in lighting, geometry, and material, as well as including dynamic and multispectral scenarios (2410.20716, 2506.18882).
- Integrating universal PS into multi-modal and multi-view systems (e.g., neural rendering and volumetric approaches) for even richer 3D shape recovery in unconstrained environments.
- Further exploring single- or few-shot learning paradigms, physics-free self-supervised losses, and dynamic scene applications.
Universal photometric stereo stands as a critical bridge between theoretical computer vision and practical, real-world 3D perception, with ongoing development pointing towards robust, high-fidelity normal and shape recovery across all conditions encountered in natural and industrial settings.