- The paper presents an inverse rendering method that decouples illumination and reflectance using active near-infrared flash imaging paired with passive RGB.
- It employs a three-stage optimization pipeline involving geometry initialization, NIR-based refinement, and RGB environment recovery for robust performance.
- Empirical results show significant improvements in geometry, albedo accuracy, and relighting, validated through both quantitative and qualitative evaluations.
Ambient-Robust Inverse Rendering via Active RGB-NIR Imaging
Inverse rendering, the reconstruction of object geometry and surface reflectance from images, is intrinsically ill-posed, especially under uncontrolled ambient illumination. Classical and contemporary pipelines that use only passive (ambient-lit) RGB images cannot reliably separate environment illumination from intrinsic reflectance. Recent active illumination methods try to decouple the two by adding artificial lighting, but they are limited by their reliance on the visible spectrum: strong visible flashes disturb human-occupied environments and restrict capture to controlled scenarios. The paper develops an ambient-robust inverse rendering framework that pairs active near-infrared (NIR) flash illumination with ambient-lit RGB imaging. Because NIR light is imperceptible to humans, the system obtains strong, non-disruptive point-light cues for reflectance recovery, while the RGB channels capture photorealistic color information for downstream applications.
System Architecture and Dataset Construction
The acquisition platform integrates a pixel-aligned RGB-NIR camera with a synchronized high-power NIR flash on a six-DoF robotic arm attached to a mobile base. This architecture (Figure 1) enables dense, programmable, object-centric multi-view sweeps that capture ambient RGB images together with paired flash/no-flash NIR images. The system automates every step: arm trajectory, exposure bracketing, flash synchronization, mask generation using SAM3 segmentation, and extrinsic calibration via COLMAP.
Figure 1: RGB–NIR mobile vision system and data processing pipeline combining NIR flash isolation, dense sampling, and automated masking/pose estimation.
The pipeline produces the first dense multi-view RGB–NIR inverse rendering dataset with paired flash/no-flash NIR and HDR RGB images across a range of real-world and synthetic environments (Figure 2). The system captures both indoor and challenging outdoor scenes, with environmental ground truth approximated via mirrored spheres to facilitate quantitative evaluation.
Figure 2: Multi-view RGB–NIR image dataset for four real objects and corresponding synthetic scenes, covering diverse ambient conditions.
Methodology: Three-Stage RGB-NIR Inverse Rendering
The core algorithm proceeds via a staged optimization strategy (Figure 3), leveraging complementary modalities to achieve ambient robustness:
Figure 3: Sequential pipeline: (1) geometry initialization from RGB; (2) NIR-based reflectance/geometry refinement; (3) joint RGB albedo and environment recovery.
Stage 1: Geometry Initialization
2D Gaussian splatting is applied to multi-view RGB images for an initial geometry estimate. Each surface Gaussian primitive is parameterized by spatial location, local frame, opacity, and spherical harmonics radiance, optimized under an alpha-blending rasterizer. Only geometry parameters propagate to subsequent stages.
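The alpha-blending step mentioned above can be illustrated with a minimal sketch. It assumes front-to-back compositing of depth-sorted splats along a single ray and a unit-variance Gaussian falloff in each splat's local frame; the helper names `splat_alpha` and `composite_front_to_back` and the sample values are hypothetical and are not taken from the paper's implementation.

```python
import math


def splat_alpha(opacity, u, v):
    """Alpha of one 2D Gaussian at local splat coordinates (u, v).

    Assumes a unit-variance Gaussian falloff in the splat's tangent frame.
    """
    return opacity * math.exp(-0.5 * (u * u + v * v))


def composite_front_to_back(samples):
    """Blend (alpha, value) pairs that are sorted near-to-far along one ray.

    Returns the blended value and the transmittance left behind the last splat.
    """
    transmittance = 1.0  # fraction of the ray not yet occluded
    blended = 0.0
    for alpha, value in samples:
        blended += transmittance * alpha * value
        transmittance *= 1.0 - alpha
    return blended, transmittance


# Hypothetical ray hitting three splats at their centers (u = v = 0),
# blending per-splat depth values 1.0, 2.0 and 4.0.
ray_samples = [
    (splat_alpha(0.5, 0.0, 0.0), 1.0),
    (splat_alpha(0.5, 0.0, 0.0), 2.0),
    (splat_alpha(1.0, 0.0, 0.0), 4.0),
]
depth, remaining = composite_front_to_back(ray_samples)
```

The same loop blends any per-splat quantity (color, depth, normal), which is why a rasterizer of this kind can expose geometry to later stages even though it was fitted against RGB radiance.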
Stage 2: NIR Flash Inverse Rendering
Critical for disentanglement, NIR flash-only images are formed by subtraction, i.e., $I_{\text{NIR}} = I_{\text{NIR-on}} - I_{\text{NIR-off}}$, which cancels the ambient NIR contribution. The resulting shading is governed predominantly by a known point-light source and is largely insensitive to variation in ambient illumination. Surface reflectance is expressed as a weighted sum of basis Disney BRDFs in NIR, and geometry is refined by jointly minimizing photometric, geometric, mask, and edge-aware smoothness losses. The optimization is fully differentiable (Figure 4), with roughness and metallic parameters shared across the RGB and NIR channels, a choice supported by empirical validation (Figure 5).
Figure 4: NIR flash-based estimation and geometry refinement; red icon = trainable parameters.
Figure 5: Cross-spectral (RGB–NIR) BRDF model validation against measured hyperspectral data.
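As a rough illustration of Stage 2, the sketch below forms a flash-only NIR image by subtraction and evaluates a weighted sum of basis BRDF values. It assumes linear intensities, identical exposure for the two frames, and a static scene; clamping negative differences to zero and normalizing the weights are assumptions of this sketch, and the function names `flash_only` and `blend_basis_brdfs` are hypothetical.

```python
def flash_only(nir_on, nir_off):
    """Per-pixel NIR-on minus NIR-off, clamping negative noise to zero.

    Both inputs are row-major lists of linear intensities with equal shape.
    """
    return [
        [max(on - off, 0.0) for on, off in zip(row_on, row_off)]
        for row_on, row_off in zip(nir_on, nir_off)
    ]


def blend_basis_brdfs(weights, basis_values):
    """Weighted sum of basis BRDF values, with weights normalized to sum to 1."""
    total = sum(weights)
    return sum(w / total * f for w, f in zip(weights, basis_values))


# Hypothetical 2x2 frames: the flash adds signal on the top row only.
nir_on = [[0.8, 0.3], [0.5, 0.2]]
nir_off = [[0.3, 0.1], [0.5, 0.25]]
nir_flash = flash_only(nir_on, nir_off)

# Hypothetical surface point mixing two basis BRDF values in a 1:3 ratio.
brdf_value = blend_basis_brdfs([1.0, 3.0], [0.2, 0.6])
```

In a differentiable pipeline the per-point weights would be trainable parameters and the basis values would come from evaluating each basis BRDF for the current view and light directions; here they are fixed numbers purely for illustration.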
Stage 3: RGB Environment Inverse Rendering
With geometry, roughness, and metallic parameters fixed from the previous stages, the diffuse RGB albedo and the RGB environment map are solved via analysis-by-synthesis, jointly regularized by photometric and NIR-edge-guided losses (Figure 6). RGB image formation is modeled by the rendering integral over the environment map and surface BRDF, approximated by Monte Carlo integration with multiple importance sampling for efficiency.
Figure 6: RGB albedo and environment optimization; red icon = trainable, blue icon = frozen.
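Multiple importance sampling can be illustrated on a one-dimensional toy integral. The sketch below is not the paper's renderer: the integrand and the two sampling densities are hypothetical stand-ins for the lighting-BRDF product and for environment and BRDF sampling, and the balance heuristic is assumed as the combination rule.

```python
import math
import random


def integrand(x):
    """Toy stand-in for the lighting-BRDF product; integrates to 1 on [0, 1]."""
    return 3.0 * x * x


def pdf_uniform(x):
    """Density of the first strategy: uniform on [0, 1]."""
    return 1.0


def pdf_linear(x):
    """Density of the second strategy: p(x) = 2x on [0, 1]."""
    return 2.0 * x


def mis_estimate(num_pairs, seed=0):
    """Balance-heuristic MIS with one sample per strategy in each iteration."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_pairs):
        x_uniform = rng.random()            # drawn from pdf_uniform
        x_linear = math.sqrt(rng.random())  # inverse-CDF sample of pdf_linear
        for x in (x_uniform, x_linear):
            # Balance heuristic: w_i * f / p_i simplifies to f / (p_1 + p_2).
            total += integrand(x) / (pdf_uniform(x) + pdf_linear(x))
    return total / num_pairs


estimate = mis_estimate(20000)
```

The estimate should approach the exact value of 1 as `num_pairs` grows. Combining strategies this way keeps variance bounded where either density alone would sample the integrand poorly, which is the usual motivation for using MIS with environment lighting.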
Empirical Evaluation
Quantitative Results vs. Prior Methods
Active NIR cues yield gains over both passive and RGB-flash-based active methods in geometry, albedo, and relighting accuracy (see, e.g., Table 1 in the source). The method reports higher PSNR/SSIM and lower perceptual error (LPIPS) on both synthetic and real data. It also supports relighting under entirely novel ambient environment maps, which indicates that geometry, intrinsic reflectance, and lighting are correctly separated under challenging illumination changes.
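For reference, PSNR, one of the metrics named above, can be computed as in the following minimal sketch. It assumes pixel values scaled to [0, 1] and flattened into equal-length lists; the sample values are hypothetical and do not come from the paper's results.

```python
import math


def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical relit pixels versus reference pixels, both scaled to [0, 1].
score = psnr([0.5, 0.5], [0.4, 0.6])
```

Higher PSNR means lower mean squared error against the reference, whereas LPIPS is a learned perceptual distance for which lower is better.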
Qualitative Analysis on Ambient Robustness
The method achieves stable reflectance and lighting recovery across widely varying indoor/outdoor lighting for both synthetic and real scenes (Figures 7–9). RGB albedo maps and BRDF parameters remain consistent even under pronounced variation in ambient illumination, validating the effectiveness of NIR point-light cues for unmixing global illumination effects.
Figure 7: Real-world object reconstructions maintain reflectance/lighting invariance across four environment maps.
Figure 8: Synthetic object RGB albedo is robustly consistent under varying ambient illumination.
Figure 9: Stable reflectance recovery under outdoor lighting with significant NIR ambient content.
Comparative and Ablation Studies
Against prominent baselines (R3DG, GS-IR, IRGS, MaterialFusion, WildLight), the method produces less shading contamination, fewer color/bias artifacts, and superior separation of albedo from illumination (Figures 12–14). Ablations confirm the necessity of the NIR flash stage, with its removal leading to significant accuracy degradation (Figure 10). The method generalizes to complex material and scene compositions (Figure 11), maintaining robust performance for diffuse, specular, and metallic surfaces and for scenes with multiple objects.
Figure 12: Outperforms passive RGB-based methods in both geometry and reflectance under varied lighting.
Figure 13: Improved reflectance estimation over RGB-flash-based WildLight.
Figure 14: More shading-free RGB albedo compared to diffusion-based inverse rendering.
Figure 10: NIR flash inverse rendering substantially boosts reflectance/geometry fidelity.
Figure 11: Method is robust across a spectrum of material types and scene complexities.
Limitations and Implications
Despite the demonstrated ambient robustness, limitations remain: performance degrades in meter-scale and extreme mirror configurations, and when strong ambient NIR (e.g., direct sunlight) overwhelms the dynamic range available to the NIR flash. Future directions include more powerful NIR sources, advanced environment modeling, and handling of materials whose reflectance is inconsistent across the RGB and NIR spectra (e.g., dyes, foliage).
Theoretical and Practical Relevance
This work introduces a physically motivated, data-driven paradigm for inverse rendering that relies on unobtrusive active imaging, extending applicability to unconstrained environments beyond darkroom or fixed-lab setups. By tightly integrating NIR and RGB cues, the method improves robustness and accuracy, enabling practical in-situ scene acquisition for graphics, robotics, and material digitization. The dataset and pipeline also provide a benchmark for future multispectral, active sensing, and differentiable rendering research.
Conclusion
By exploiting active NIR imaging in concert with multi-view RGB, this method provides an ambient-robust solution for inverse rendering of geometry and reflectance, overcoming the longstanding illumination-reflectance ambiguity inherent to passive RGB-only approaches. Systematic empirical comparisons confirm substantial accuracy and generalization benefits, establishing a foundation for future research in scalable, real-world scene acquisition and photometrically faithful rendering under uncontrolled lighting (2605.30250).