
SDM-UniPS: Scalable Universal Photometric Stereo

Updated 1 July 2025
  • SDM-UniPS is a scalable, mask-free universal photometric stereo framework designed to estimate detailed surface normals from multiple images with unknown, arbitrary lighting.
  • It employs a novel architecture using scale-invariant encoding and non-local pixel sampling to handle high-resolution inputs and complex illumination without explicit calibration.
  • Achieving state-of-the-art accuracy on benchmarks, SDM-UniPS enables practical applications like high-fidelity 3D scanning and analysis in unconstrained real-world scenarios.

SDM-UniPS refers to the Scalable, Detailed, Mask-free Universal Photometric Stereo framework, a significant development within the domain of universal photometric stereo (PS). The approach robustly estimates fine-grained surface normals from multi-image observations captured under arbitrary, uncontrolled, and spatially-varying illumination, without requiring object masks or lighting calibration. SDM-UniPS advances the state of the art in PS by addressing scalability to high-resolution inputs, preservation of geometric detail, entanglement of lighting and normal features, and generalization to real-world image capture.

1. Architectural Principles and Algorithmic Approach

SDM-UniPS employs an encoder–decoder neural network architecture that processes multiple images of the same scene, each captured under different (and unknown) lighting. The core components and their functions are as follows:

  • Scale-invariant Spatial-light Feature Encoder: Input images (possibly of arbitrary resolution) are partitioned into non-overlapping sub-tiles. Each tile is processed by a ConvNeXt-T backbone to extract a hierarchical feature representation at multiple scales. To maintain precision with high-resolution inputs, the "split-and-merge" mechanism enables local feature computation and global aggregation, avoiding the loss of spatial or geometric detail that typically arises from aggressive downsampling.
  • Cross-light Feature Attention: At each spatial location, features corresponding to the same pixel across all illuminations are fused using a series of Transformer blocks. This "light-axis" Transformer models the interdependence of observations across lighting conditions, forming a spatial-light feature tensor that encodes local appearance variation due to normal and illumination changes without relying on explicit lighting parameters.
  • Pixel-sampling Transformer Decoder: Rather than reconstructing the entire normal map in a dense, feedforward fashion, SDM-UniPS samples groups of pixel locations (e.g., several thousand) and aggregates their spatial-light features for all lighting conditions. These features are pooled using multi-head attention and then subjected to a second-stage Transformer to model spatial (non-local) surface context, enforcing consistency and allowing non-local geometry priors to influence local prediction. Each sampled pixel's final feature is decoded to a normal vector via a learned MLP.
  • Mask-free Operation: The system is robust to the absence of object masks, a critical attribute for applications in unconstrained environments.
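The cross-light ("light-axis") attention described above can be illustrated in a few lines. The following is a minimal single-head sketch in numpy with toy dimensions and random weights, not the paper's implementation: for one pixel, the features observed under the K lighting conditions attend to one another, so each observation is refined by all the others.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def light_axis_attention(feats, Wq, Wk, Wv):
    """Self-attention across the K lighting observations of one pixel.

    feats: (K, d) -- one feature vector per light. Output has the same
    shape, so each light's feature is fused with all other observations.
    """
    Q, K_, V = feats @ Wq, feats @ Wk, feats @ Wv
    scores = Q @ K_.T / np.sqrt(Q.shape[-1])   # (K, K) light-to-light affinity
    return softmax(scores, axis=-1) @ V        # (K, d) fused features

rng = np.random.default_rng(0)
K, d = 6, 16                       # 6 lighting conditions, 16-dim features
feats = rng.normal(size=(K, d))
W = [rng.normal(size=(d, d)) * 0.1 for _ in range(3)]
out = light_axis_attention(feats, *W)
print(out.shape)                   # (6, 16): one refined feature per light
```

Note that attention over the light axis is permutation-equivariant in the lighting conditions, which matches the universal-PS setting: the images carry no canonical ordering.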

Mathematical Formulation

The overall inference may be described as follows:

Given images $\mathbf{I}_k \in \mathbb{R}^{H \times W \times 3}$ for $k = 1, \dots, K$:

  1. Extract per-image features: $\mathcal{F}_k = \text{Encoder}(\mathbf{I}_k)$.
  2. For each sampled pixel $x_i$ and light $k$, gather the spatial-light features and observed values $\{ \mathcal{F}_k(x_i), \mathbf{I}_k(x_i) \}_{k=1}^{K}$.
  3. Aggregate across all $K$ lights: $a(x_i) = \text{PMA}\big( \{ \mathcal{F}_k(x_i), \mathbf{I}_k(x_i) \}_{k=1}^{K} \big)$.
  4. Fuse across the $m$ sampled pixels with a Transformer: $[a'(x_1), \dots, a'(x_m)] = \text{Transformer}(a(x_1), \dots, a(x_m))$.
  5. Decode to normals: $\mathbf{n}(x_i) = \text{MLP}(a'(x_i))$.

This architectural strategy enables efficient, scalable, and highly detailed normal map recovery.
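The inference steps above can be sketched end-to-end with toy stand-ins. In the numpy sketch below, the encoder is replaced by random spatial-light features, PMA is a single-head pooling-by-attention with a learned seed query, and the pixel-axis Transformer is reduced to one attention layer; all weights and dimensions are illustrative assumptions, not the trained model.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 16                                          # feature dimension
Wq, Wk, Wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
Wm = rng.normal(size=(d, 3)) * 0.1              # toy MLP head (one linear layer)
seed = rng.normal(size=(1, d))                  # learned PMA seed query

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attend(Q, Kmat, V):
    return softmax(Q @ Kmat.T / np.sqrt(Q.shape[-1])) @ V

def pma(set_feats):
    """Pooling by Multi-head Attention (one head): a seed query attends
    to the K per-light features and pools them into a single vector."""
    return attend(seed @ Wq, set_feats @ Wk, set_feats @ Wv)[0]

K_lights, m = 8, 100                            # 8 images, 100 sampled pixels
F = rng.normal(size=(m, K_lights, d))           # step 1-2: encoder output (stub)
a = np.stack([pma(F[i]) for i in range(m)])     # step 3: (m, d) per-pixel pooling
a2 = attend(a @ Wq, a @ Wk, a @ Wv)             # step 4: non-local pixel attention
n = a2 @ Wm                                     # step 5: decode ...
n = n / np.linalg.norm(n, axis=-1, keepdims=True)  # ... and normalize
print(n.shape)                                  # (100, 3) unit normals
```

The key structural point survives even in this toy form: pooling over lights is order-invariant, and the second attention stage lets every sampled pixel see every other one before decoding.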

2. Treatment of Arbitrary and Spatially-Varying Lighting

Traditional calibrated PS methods assume known, spatially uniform or parametric lighting and often require explicit light direction estimation or calibration. SDM-UniPS removes these constraints:

  • Lighting as Feature, Not Parameter: No explicit modeling or estimation of physically parameterized light vectors is performed. Instead, the encoder learns to represent all lighting-induced variability as latent features, enabling robustness to both global and spatially-varying illumination.
  • Non-parametric Generalization: The spatial-light feature encoding supports generalization to various real-world illumination scenarios, including shadows, inter-reflections, complex occlusions, and glossy highlights.
  • Non-locality: The pixel-sampling Transformer's non-local interactions allow cues such as mutual shading and reflectance priors to inform the prediction at a given pixel, leveraging scene-wide context and reducing ambiguities that arise from lighting-normal coupling.

3. New Synthetic Dataset and Data Regimen

SDM-UniPS is trained on the PS-Mix synthetic dataset, designed to address the lack of diversity and realism in previous datasets:

  • Composition: Over 34,000 scenes, each containing multiple objects, random arrangements, varied assignments of real-world diffuse, specular, and metallic textures from AdobeStock, and shapes drawn from ModelNet, ABC, and SHREC repositories.
  • Illumination Diversity: Five distinct lighting setups are incorporated: natural environment, single directional, single point, and environment lighting mixed with either a directional or a point source. Lighting variation includes both low- and high-frequency spatial effects.
  • Resolution and Normalization: All renders are at $512 \times 512$ with auto-exposure and normalization, ensuring compatibility with subsequent inference at higher resolutions.
  • Generalization to Real-world Scenarios: The variety of materials, overlapped objects, and high-frequency spatial lighting in PS-Mix drives superior robustness and transfer to practical, unconstrained PS tasks.
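The auto-exposure normalization mentioned above can be approximated simply. The exact scheme used for PS-Mix is not specified here; the sketch below assumes one plausible variant, scaling each render so a robust high percentile of its intensities lands at a fixed target, which keeps a few saturated specular pixels from dominating the exposure.

```python
import numpy as np

def auto_expose(img, target=0.4, percentile=95, eps=1e-8):
    """Scale an image so its bright content sits near `target`.

    Using a high percentile rather than the max makes the exposure
    robust to isolated saturated highlights. Illustrative only; the
    actual PS-Mix normalization may differ.
    """
    scale = target / (np.percentile(img, percentile) + eps)
    return np.clip(img * scale, 0.0, 1.0)

rng = np.random.default_rng(2)
dark = rng.uniform(0.0, 0.05, size=(512, 512, 3))   # underexposed render
bright = auto_expose(dark)
print(round(float(np.percentile(bright, 95)), 2))   # close to 0.4
```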

4. Performance and Comparative Evaluation

SDM-UniPS demonstrates state-of-the-art performance on prevalent benchmarks:

  • On DiLiGenT Dataset: Achieves a mean angular error of 5.8°, outperforming prior universal and even many calibrated/non-universal methods (e.g., surpassing UniPS at 14.7° and Logothetis2021 at 6.2°). Notably, this is accomplished using as few as 8 input images under unknown lighting and without object masks.
  • Robustness to Number of Images: Maintains low error even with fewer images (e.g., 2–8 frames), outperforming lighting-calibrated two-shot methods.
  • Mask-free and Resolution-scalable: Normal estimation remains accurate without masks, and the split-merge architecture supports processing of images at resolutions significantly higher than those seen in training.
  • Ablation Results: The introduction of non-local interaction (sampling Transformer) and scale-invariant encoding yields significant accuracy improvements and supports high-fidelity detail recovery.
| Method         | Mask Required | Mean Angular Error (°) | Uncalibrated Lighting | High-Res Support | Non-local Interaction |
|----------------|---------------|------------------------|-----------------------|------------------|-----------------------|
| UniPS          | Yes           | 14.7                   | Yes                   | No               | No                    |
| Logothetis2021 | No            | 6.2                    | No                    | No               | No                    |
| SDM-UniPS      | No            | 5.8                    | Yes                   | Yes              | Yes                   |
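The mean angular error figures reported on DiLiGenT are the average per-pixel angle between estimated and ground-truth unit normals, typically over foreground pixels only. A minimal implementation:

```python
import numpy as np

def mean_angular_error_deg(n_est, n_gt, mask=None):
    """Mean angle (degrees) between estimated and ground-truth unit normals.

    n_est, n_gt: (..., 3) arrays of unit normals. `mask` optionally selects
    the evaluated pixels (SDM-UniPS itself needs no mask, but benchmarks
    conventionally score only the object foreground).
    """
    cos = np.clip(np.sum(n_est * n_gt, axis=-1), -1.0, 1.0)  # clip for arccos safety
    ang = np.degrees(np.arccos(cos))
    return float(ang[mask].mean() if mask is not None else ang.mean())

# A normal tilted 10 degrees from ground truth yields an error of ~10.
gt = np.array([[0.0, 0.0, 1.0]])
est = np.array([[0.0, np.sin(np.radians(10)), np.cos(np.radians(10))]])
print(mean_angular_error_deg(est, gt))
```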

5. Detail Preservation and Non-local Normal Recovery

SDM-UniPS addresses PS’s historical limitation in preserving high-frequency geometric details via:

  • Hierarchical Split-and-Merge Feature Processing: Sub-tiling allows for the retention of fine features throughout the encoding pipeline, avoiding blurring introduced by global downsampling.
  • Pixel-wise and Non-local Decoding: By operating on randomly sampled pixel groups with a Transformer, SDM-UniPS leverages both local intensity cues and global context, enabling restoration of subtle normal variations, shadow boundaries, and fine object edges, rivaling the quality of laser scanning.
  • Generalization to Real Scenes: Evaluations demonstrate robust detail capture on real-world images in uncontrolled lighting, surpassing other universal PS approaches.
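The non-overlapping sub-tiling at the heart of the split-and-merge encoder reduces, in its simplest form, to an invertible reshape. The numpy sketch below shows only the split/merge bookkeeping; in the actual framework each tile would be passed through the ConvNeXt-T backbone and exchange globally aggregated context before merging.

```python
import numpy as np

def split_tiles(img, t):
    """Split an (H, W, C) image into non-overlapping t x t tiles.
    Assumes H and W are divisible by t, as with 512-px training crops."""
    H, W, C = img.shape
    return (img.reshape(H // t, t, W // t, t, C)
               .swapaxes(1, 2)
               .reshape(-1, t, t, C))           # (num_tiles, t, t, C)

def merge_tiles(tiles, H, W):
    """Inverse of split_tiles: reassemble tiles into the full image."""
    n, t, _, C = tiles.shape
    return (tiles.reshape(H // t, W // t, t, t, C)
                 .swapaxes(1, 2)
                 .reshape(H, W, C))

rng = np.random.default_rng(3)
img = rng.normal(size=(1024, 1024, 3))          # higher-res than training
tiles = split_tiles(img, 512)                   # 4 tiles of 512 x 512
print(tiles.shape)                              # (4, 512, 512, 3)
```

Because splitting is lossless, any detail lost at inference must come from the per-tile network or the merge aggregation, not from the tiling itself, which is why this design preserves high-frequency structure better than global downsampling.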

6. Practical Applications and Limitations

SDM-UniPS extends the applicability of photometric stereo techniques to diverse real-world domains:

  • 3D Scanning and Modeling: Enables scanner-level normal maps from casual photographs, supporting heritage documentation, industrial inspection, and consumer-grade 3D capture.
  • Reflectance and SVBRDF Recovery: The spatial-light features learned for normal estimation are adaptable to broader inverse rendering tasks, including material understanding and relighting.
  • Robotics and AR/VR: Scene understanding in uncontrolled illumination, critical for autonomous manipulation and immersive content creation.
  • Industrial and Biomedical Imaging: Mask-free operation allows for the analysis of complex objects and tissues without specialized acquisition setups.

Limitations, as identified in more recent literature, include persistent coupling between lighting and normal features (making extremely complex scenes challenging), and some smoothing of the finest geometric detail due to upsampling operations. Successor frameworks such as LINO-UniPS address these constraints by employing learnable light register tokens, global attention, and wavelet-based feature operations.

7. Summary Table of SDM-UniPS Properties

| Property                   | SDM-UniPS                                                | Impact                                  |
|----------------------------|----------------------------------------------------------|-----------------------------------------|
| Scale-invariant encoding   | Yes (split-and-merge ConvNeXt-T, arbitrarily high-res)   | Accurate, detailed normals at any scale |
| Non-local normal inference | Pixel-sampling Transformer decoder                       | Context-aware, less ambiguous estimates |
| Mask-free operation        | Yes                                                      | Widens deployment spectrum              |
| Lighting model independence| Yes (implicit, not parametric)                           | Robust to uncontrolled conditions       |
| Training data              | PS-Mix (multi-object, multi-material, lighting variety)  | Enhances generalization                 |
| Benchmark performance      | State-of-the-art on DiLiGenT and PS-Wild-Test            | Validated superiority                   |
| Applications               | 3D scanning, relighting, inspection, AR/VR, robotics     | Versatility, democratization of PS      |

SDM-UniPS represents an inflection point in photometric stereo, rendering state-of-the-art surface normal estimation scalable, robust, and practical for a wide range of real-world imaging scenarios where classical constraints on illumination or object masking are infeasible. Subsequent research continues to refine its detail preservation and decoupling of physical factors within learned feature representations.