Ambient-Robust Inverse Rendering

Updated 4 July 2026

Ambient-robust inverse rendering recovers scene geometry, reflectance, and illumination in a way that stays consistent under uncontrolled ambient light, shadows, and inter-reflections.
It leverages neural scene representations, physical BRDF models, and active illumination strategies to resolve ambiguities in shading and material properties.
The approach enables consistent relighting, object insertion, and photogrammetry by accurately separating lighting effects from intrinsic scene characteristics.

Ambient-robust inverse rendering denotes inverse rendering methods that recover geometry, reflectance, and often illumination while remaining consistent across different ambient or environment lighting conditions, including unknown ambient illumination, strong local lights, cast shadows, indirect illumination, and inter-reflections. In its most general form, the image formation model is written as

$L_o(p,\omega_o) = L_e(p,\omega_o) + \int_{\Omega^+} L_i(p,\omega_i)\, f_r(p,\omega_i,\omega_o)\, (\omega_i \cdot n)_+ \, d\omega_i,$

so robustness depends on how a method constrains the decomposition of incident radiance, BRDF, geometry, and visibility when the observed image conflates dark albedo with shadow, or bright highlights with either bright light or strong specularity (Song et al., 2024). Recent work addresses that ambiguity through active illumination, explicit shadow and global-illumination modeling, spatially varying lighting fields, neural scene representations, and probabilistic or generative priors (Cheng et al., 2023, Chung et al., 28 May 2026, Le et al., 2024).
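The albedo/illumination ambiguity in the reflection integral can be seen numerically. The following is a minimal sketch, assuming a Lambertian BRDF $f_r = \rho/\pi$, no emission, and a user-supplied incident radiance function; the function names are illustrative and do not come from any of the cited papers.

```python
import math
import random

def sample_hemisphere_cosine(rng):
    """Cosine-weighted direction on the upper hemisphere (normal = +z)."""
    u1, u2 = rng.random(), rng.random()
    r = math.sqrt(u1)
    phi = 2.0 * math.pi * u2
    return (r * math.cos(phi), r * math.sin(phi), math.sqrt(max(0.0, 1.0 - u1)))

def outgoing_radiance(albedo, incident_radiance, n_samples=4096, seed=0):
    """Monte Carlo estimate of the reflection integral (emission omitted).

    Assumes a Lambertian BRDF f_r = albedo / pi. With cosine-weighted
    sampling the pdf is cos(theta) / pi.
    """
    rng = random.Random(seed)
    f_r = albedo / math.pi
    total = 0.0
    for _ in range(n_samples):
        w_i = sample_hemisphere_cosine(rng)
        cos_theta = w_i[2]
        if cos_theta <= 0.0:
            continue  # grazing sample contributes nothing
        pdf = cos_theta / math.pi
        total += incident_radiance(w_i) * f_r * cos_theta / pdf
    return total / n_samples

# Dark surface under bright light versus brighter surface under dim light.
dark_bright = outgoing_radiance(0.2, lambda w: 1.0)
light_dim = outgoing_radiance(0.4, lambda w: 0.5)
```

Both configurations produce the same outgoing radiance, which is why a single passive observation cannot separate reflectance from illumination without additional constraints.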

1. Problem formulation and sources of ambiguity

A recurring formulation is to recover geometry, spatially varying reflectance, and lighting from images captured under uncontrolled or partially controlled illumination. In outdoor aerial photogrammetry, the outgoing radiance is modeled with Lambertian reflectance, direct sunlight, and diffuse skylight; in indoor scene inverse rendering, the incident field may instead be represented as a spatially varying environment map or as direct plus indirect transport; in active capture, the observed image may be decomposed into a static ambient component plus a dynamic component induced by a moving flash (Song et al., 2024, Li et al., 2019, Cheng et al., 2023).

The core ambiguities are stated explicitly in several papers. With only passive RGB, observed intensities conflate unknown illumination and surface reflectance: a dark region may correspond to low albedo or a shadow, while a strong highlight may correspond to bright light or strong specularity (Chung et al., 28 May 2026). Under intense indoor lighting, shadows become entangled with reflectance, and indirect or ambient lighting causes non-trivial shading even in occluded regions; if the model does not explicitly account for shadows, their effect is baked into albedo or roughness (Wei et al., 2024). In outdoor photogrammetry, sun, sky, cast shadows, indirect bounces, atmospheric scattering, and specular reflection all contribute to the ill-posedness of albedo recovery (Song et al., 2024).

A concise active formulation is given by WildLight:

$\mathcal{I}(\mathbf{x}, \mathbf{v}, t, s) = \mathcal{A}(\mathbf{x}, \mathbf{v}) + s \gamma \mathcal{L}(\mathbf{x}, \mathbf{v}, t),$

where $\mathcal{A}$ is an unknown ambient component and $\mathcal{L}$ is a flashlight reflection component (Cheng et al., 2023). This separation illustrates a general principle: ambient-robust inverse rendering typically tries to isolate an illumination-invariant material description from a lighting-dependent radiometric term.
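As a rough illustration of this additive model, the sketch below composes an observation from an ambient term and a flash term. It assumes that $s$ acts as a flash on/off switch and that $\gamma$ is a scalar flash intensity; the text above does not define either symbol, so both readings are assumptions, as are the function and parameter names.

```python
def observed_intensity(ambient, flash_reflection, flash_on, gamma=1.0):
    """Compose I = A + s * gamma * L per colour channel.

    ambient          -- A(x, v), static ambient appearance (RGB tuple)
    flash_reflection -- L(x, v, t), reflection of the flashlight (RGB tuple)
    flash_on         -- assumed meaning of s: True when the flash fired
    gamma            -- assumed meaning of gamma: scalar flash intensity
    """
    s = 1.0 if flash_on else 0.0
    return tuple(a + s * gamma * l for a, l in zip(ambient, flash_reflection))

ambient = (0.10, 0.08, 0.05)
flash = (0.30, 0.25, 0.20)
image_off = observed_intensity(ambient, flash, flash_on=False)
image_on = observed_intensity(ambient, flash, flash_on=True, gamma=2.0)
```

With the flash off the observation reduces to the ambient term alone, so only flash-on frames constrain the lighting-dependent term.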

2. Scene, BRDF, and lighting representations

Recent methods employ several scene representations. SDF-based neural radiance fields appear in SIR, WildLight, and GLOW, where geometry is encoded as a neural signed distance field and normals are obtained from the SDF gradient (Wei et al., 2024, Cheng et al., 2023, Wu et al., 28 Nov 2025). Gaussian representations are used in GUS-IR, BRDFusion, and the active RGB-NIR method: GUS-IR studies 3D Gaussian Splatting with unified shading and uses the shortest axis as normal for each particle, BRDFusion uses 3D Gaussians with per-Gaussian albedo, roughness, and metallic, and the RGB-NIR system uses 2D Gaussian primitives whose positions, scales, normals, and opacities are optimized from multi-view data (Liang et al., 2024, Liu et al., 15 Jun 2026, Chung et al., 28 May 2026).

BRDF parameterizations are similarly standardized around physically based microfacet models. WildLight uses Disney’s principled BRDF with base_color, roughness, subsurface, metallic, dielectric, clearcoat, and clearcoat_glossiness (Cheng et al., 2023). SIR uses a microfacet BRDF with Lambertian diffuse albedo and a Cook–Torrance / GGX specular term (Wei et al., 2024). BRDFusion uses a Cook-Torrance BRDF with diffuse albedo, roughness, and metallic (Liu et al., 15 Jun 2026). The active RGB-NIR method couples RGB and NIR through Disney BRDFs, sharing roughness $\sigma$ and metallic $m$ across spectra while allowing diffuse albedo to vary between RGB and NIR (Chung et al., 28 May 2026).
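For orientation, a single-channel Cook-Torrance evaluation with a GGX distribution might look like the sketch below. The conventions used here (`alpha = roughness**2`, a dielectric base reflectance of 0.04, Schlick's Fresnel approximation, a separable Smith masking term, and a `(1 - metallic)` diffuse weight) are common real-time rendering choices assumed for illustration; the cited methods may use different variants.

```python
import math

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _normalize(v):
    norm = math.sqrt(_dot(v, v))
    return tuple(x / norm for x in v)

def ggx_ndf(n_dot_h, alpha):
    """GGX / Trowbridge-Reitz normal distribution function."""
    a2 = alpha * alpha
    d = n_dot_h * n_dot_h * (a2 - 1.0) + 1.0
    return a2 / (math.pi * d * d)

def smith_g1(n_dot_x, alpha):
    """Smith masking term for GGX in one direction."""
    a2 = alpha * alpha
    return 2.0 * n_dot_x / (n_dot_x + math.sqrt(a2 + (1.0 - a2) * n_dot_x * n_dot_x))

def schlick_fresnel(f0, v_dot_h):
    """Schlick's approximation to the Fresnel reflectance."""
    return f0 + (1.0 - f0) * (1.0 - v_dot_h) ** 5

def cook_torrance(n, l, v, albedo, roughness, metallic):
    """Single-channel BRDF value f_r(l, v) for normal n (assumed conventions)."""
    n, l, v = _normalize(n), _normalize(l), _normalize(v)
    n_dot_l, n_dot_v = _dot(n, l), _dot(n, v)
    if n_dot_l <= 0.0 or n_dot_v <= 0.0:
        return 0.0  # light or viewer below the surface
    h = _normalize(tuple(a + b for a, b in zip(l, v)))
    alpha = max(roughness, 1e-3) ** 2  # clamp to avoid a singular lobe
    f0 = 0.04 * (1.0 - metallic) + albedo * metallic
    fresnel = schlick_fresnel(f0, max(0.0, _dot(v, h)))
    d = ggx_ndf(max(0.0, _dot(n, h)), alpha)
    g = smith_g1(n_dot_l, alpha) * smith_g1(n_dot_v, alpha)
    specular = d * g * fresnel / (4.0 * n_dot_l * n_dot_v)
    diffuse = (1.0 - metallic) * albedo / math.pi
    return diffuse + specular
```

The three scalars `albedo`, `roughness`, and `metallic` are the quantities an inverse renderer has to recover per point or per primitive under this kind of parameterization.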

Lighting representations vary with scene scale and capture regime. “Inverse Rendering for Complex Indoor Scenes” predicts a spatially varying spherical Gaussian lighting representation, effectively a local environment map at each surface point, so that global illumination is baked into a local per-point lighting distribution (Li et al., 2019). SIR learns an irradiance field and a decomposable shadow field under posed HDR images (Wei et al., 2024). SIRe-IR decomposes scenes into environment map, albedo, and roughness by using non-linear mapping and regularized visibility estimation, while accurately modeling the indirect radiance field, normal, visibility, and direct light simultaneously (Yang et al., 2023). GLOW instead uses a neural radiance cache conditioned on surface point, outgoing direction, and flashlight position to approximate global illumination under a dynamic co-located point light (Wu et al., 28 Nov 2025).
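A spherical Gaussian lighting representation can be evaluated in a few lines. The sketch below assumes the common lobe form $\mu \exp(\lambda(\omega \cdot \xi - 1))$ with axis $\xi$, sharpness $\lambda$, and amplitude $\mu$, and stores one lobe list per surface point to mimic spatially varying lighting; the lobe count, the dictionary keys, and the function names are hypothetical.

```python
import math

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _normalize(v):
    norm = math.sqrt(_dot(v, v))
    return tuple(x / norm for x in v)

def sg_radiance(direction, lobes):
    """Incident radiance from a sum of spherical Gaussian lobes.

    Each lobe is (axis, sharpness, amplitude) and contributes
    amplitude * exp(sharpness * (dot(direction, axis) - 1)).
    """
    w = _normalize(direction)
    total = 0.0
    for axis, sharpness, amplitude in lobes:
        total += amplitude * math.exp(sharpness * (_dot(w, _normalize(axis)) - 1.0))
    return total

# Spatially varying lighting: each surface point owns its own lobes.
lighting_at = {
    "open_floor": [((0.0, 0.0, 1.0), 10.0, 2.0)],
    "under_table": [((1.0, 0.0, 0.2), 4.0, 0.3)],
}
up = (0.0, 0.0, 1.0)
radiance_floor = sg_radiance(up, lighting_at["open_floor"])
radiance_table = sg_radiance(up, lighting_at["under_table"])
```

Because occlusion and interreflection are absorbed into each point's lobes, the two points can receive very different radiance from the same direction.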

3. Active illumination as a robustness mechanism

A major branch of ambient-robust inverse rendering adds a minimally controlled light source to disambiguate reflectance from ambient appearance. WildLight uses a smartphone’s built-in flashlight as a minimally controlled light source and decomposes image intensities into a static appearance corresponding to ambient flux plus a dynamic reflection induced by the moving flashlight (Cheng et al., 2023). Its practical premise is that some images are taken with the flashlight on and some with it off, but no flash/non-flash pairs from the same viewpoint are required. The flashlight component supplies physically accurate photometric constraints through co-located BRDF shading with inverse-square falloff, while the ambient component is modeled as a neural light field (Cheng et al., 2023).
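The photometric constraint from a co-located flash can be sketched as follows, assuming an isotropic point light placed exactly at the camera centre and a Lambertian surface in place of the full BRDF. Both are simplifications made for illustration, and the names are hypothetical.

```python
import math

def colocated_flash_radiance(point, normal, camera_pos, albedo, flash_intensity=1.0):
    """Radiance reflected back to a camera that carries its own point light.

    Assumes a Lambertian surface and a light at the camera position, so the
    light direction equals the view direction and irradiance falls off with
    the squared camera-to-point distance.
    """
    to_light = tuple(c - p for c, p in zip(camera_pos, point))
    dist2 = sum(x * x for x in to_light)
    if dist2 == 0.0:
        raise ValueError("camera coincides with the surface point")
    dist = math.sqrt(dist2)
    w = tuple(x / dist for x in to_light)
    n_len = math.sqrt(sum(x * x for x in normal))
    cos_theta = max(0.0, sum(a * b for a, b in zip(normal, w)) / n_len)
    return flash_intensity * (albedo / math.pi) * cos_theta / dist2

near = colocated_flash_radiance((0, 0, 0), (0, 0, 1), (0, 0, 1.0), albedo=0.5)
far = colocated_flash_radiance((0, 0, 0), (0, 0, 1), (0, 0, 2.0), albedo=0.5)
```

Doubling the camera distance quarters the flash term while leaving the ambient term unchanged, which is one reason a moving flashlight carries information that static ambient light does not.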

The active RGB-NIR method extends the same logic cross-spectrally. It uses multi-view RGB images under ambient/environment illumination together with multi-view NIR images acquired with an active NIR flash, and computes flash-only NIR images by subtraction:

$I^{\text{NIR}} = I^{\text{NIR-on}} - I^{\text{NIR-off}}.$

The key claim is that NIR flash illumination is imperceptible to human observers and yields stable point-light shading that is largely invariant to ambient illumination (Chung et al., 28 May 2026). Geometry and NIR BRDF are first refined from flash-only NIR images, after which RGB diffuse albedo and RGB environment illumination are estimated with much less ambiguity (Chung et al., 28 May 2026).
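The subtraction step is straightforward to sketch. The version below assumes radiometrically linear NIR frames captured from the same viewpoint with identical exposure, and clamps negative differences caused by noise; the clamping and the nested-list image format are choices made for this example, not details taken from the paper.

```python
def flash_only_nir(nir_on, nir_off):
    """Per-pixel difference of flash-on and flash-off NIR frames.

    Both inputs are lists of rows of linear intensities with equal shape.
    Negative differences (sensor noise) are clamped to zero.
    """
    if len(nir_on) != len(nir_off):
        raise ValueError("frames must have the same number of rows")
    result = []
    for row_on, row_off in zip(nir_on, nir_off):
        if len(row_on) != len(row_off):
            raise ValueError("rows must have the same length")
        result.append([max(0.0, on - off) for on, off in zip(row_on, row_off)])
    return result

nir_on = [[0.50, 0.20], [0.90, 0.35]]
nir_off = [[0.10, 0.30], [0.40, 0.35]]
flash_only = flash_only_nir(nir_on, nir_off)
```

Any ambient NIR present in both frames cancels in the difference, leaving shading that depends only on the known flash position.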

GLOW adopts a different active regime: a dark room, a handheld smartphone, and a dynamic co-located light-camera pair whose position is calibrated by Structure-from-Motion (Wu et al., 28 Nov 2025). The method is not ambient-robust in the sense of tolerating uncontrolled room light; rather, it is robustness-oriented in that it explicitly models the strong inter-reflections, dynamic shadows, near-field lighting, and moving specular highlights introduced by a co-located point light. This suggests that active capture is valuable not only because it suppresses uncontrolled ambient flux, but also because it creates illumination whose geometry can be written explicitly into the inverse problem.

Method	Capture regime	Robustness mechanism
WildLight	Smartphone RGB, mixed flash-on and flash-off	Static ambient appearance plus dynamic flashlight reflection
Active RGB-NIR	RGB under ambient plus NIR-on/NIR-off	Flash-only NIR shading largely invariant to ambient illumination
GLOW	Dynamic co-located light and camera	Dynamic radiance cache and surface-angle-weighted radiometric loss

4. Shadows, indirect illumination, and global transport

Explicit shadow handling is central to ambient robustness. SIR introduces a decomposable, differentiable shadow field in which diffuse outgoing radiance is modeled as

$L_{o,d}(\hat{\mathbf{x}}) = \frac{\hat{A}(\hat{\mathbf{x}})}{\pi} \cdot I(\hat{\mathbf{x}}) \cdot S(\hat{\mathbf{x}}), \qquad S(\hat{\mathbf{x}})=S_\text{hard}(\hat{\mathbf{x}})\cdot S_\text{soft}(\hat{\mathbf{x}}),$

with hard and soft shadow components learned separately (Wei et al., 2024). SIRe-IR targets a related goal under high-illuminance scenes: by using non-linear mapping and regularized visibility estimation, it decomposes the scene into environment map, albedo, and roughness, and removes both shadows and indirect illumination in materials without imposing strict constraints on the scene (Yang et al., 2023). GUS-IR addresses indirect transport in Gaussian splatting by proposing a unified shading solution and by enhancing the probe-based baking scheme proposed by GS-IR to achieve more accurate ambient occlusion modeling (Liang et al., 2024).
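The effect of omitting the shadow factor can be shown with a small worked example. This sketch uses the diffuse model $L_{o,d} = (\hat{A}/\pi)\, I\, S$ with scalar values and hypothetical function names; it is not the learned shadow field itself.

```python
import math

def diffuse_radiance(albedo, irradiance, s_hard, s_soft):
    """Forward model: L = albedo / pi * irradiance * (s_hard * s_soft)."""
    return albedo / math.pi * irradiance * (s_hard * s_soft)

def albedo_ignoring_shadow(radiance, irradiance):
    """Inverts the model as if S were 1, so shadow is baked into albedo."""
    return math.pi * radiance / irradiance

def albedo_with_shadow(radiance, irradiance, s_hard, s_soft, eps=1e-6):
    """Inverts the model with the shadow factor divided out."""
    shadow = max(s_hard * s_soft, eps)  # guard against fully occluded points
    return math.pi * radiance / (irradiance * shadow)

observed = diffuse_radiance(albedo=0.6, irradiance=2.0, s_hard=1.0, s_soft=0.5)
baked = albedo_ignoring_shadow(observed, irradiance=2.0)
recovered = albedo_with_shadow(observed, irradiance=2.0, s_hard=1.0, s_soft=0.5)
```

Here the shadow-free inversion reports half the true albedo, while the inversion that includes the shadow factor recovers the value used in the forward model.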

Global illumination is handled in multiple ways. “Inverse Rendering for Complex Indoor Scenes” represents incident radiance at each surface point with a spatially varying spherical Gaussian lighting representation, so multiple bounces, interreflections, soft shadows, and occlusions are effectively baked into a local per-point lighting distribution rather than solved as explicit path transport during inference (Li et al., 2019). GLOW takes the opposite route and introduces a neural radiance cache constrained by a radiometric prior, together with a dynamic radiance cache split into direct and indirect components:

$L_\theta(\mathbf{x},\omega_o,\mathbf{x}_\ell) = \mathbb{V}(\mathbf{x}_\ell\leftrightarrow\mathbf{x})\,L_\theta^{\mathrm{direct}}(\mathbf{x},\omega_o,\mathbf{x}_\ell) + L_\theta^{\mathrm{indirect}}(\mathbf{x},\omega_o,\mathbf{x}_\ell),$

so that sharp visibility discontinuities are handled analytically while indirect illumination remains a continuous function of light pose (Wu et al., 28 Nov 2025).
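The structure of this split can be sketched with plain callables standing in for the learned direct and indirect terms. The sphere occluders, the segment test, and every name below are assumptions introduced for illustration; in GLOW the two radiance terms are neural and the visibility is computed against the reconstructed geometry.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def segment_blocked_by_sphere(p0, p1, center, radius):
    """True if the segment p0 -> p1 overlaps the sphere."""
    d = _sub(p1, p0)
    f = _sub(p0, center)
    a = _dot(d, d)
    if a == 0.0:
        return False
    b = 2.0 * _dot(f, d)
    c = _dot(f, f) - radius * radius
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return False
    root = math.sqrt(disc)
    t_near = (-b - root) / (2.0 * a)
    t_far = (-b + root) / (2.0 * a)
    return t_near < 1.0 and t_far > 0.0

def visibility(x_light, x, occluders):
    """Binary visibility between the light and the surface point."""
    blocked = any(segment_blocked_by_sphere(x_light, x, c, r) for c, r in occluders)
    return 0.0 if blocked else 1.0

def cached_radiance(x, w_o, x_light, direct, indirect, occluders):
    """V(x_light <-> x) * L_direct + L_indirect."""
    return (visibility(x_light, x, occluders) * direct(x, w_o, x_light)
            + indirect(x, w_o, x_light))

direct = lambda x, w_o, x_light: 1.0     # placeholder for the learned direct term
indirect = lambda x, w_o, x_light: 0.1   # placeholder for the learned indirect term
x, w_o, light = (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, 4.0)
lit = cached_radiance(x, w_o, light, direct, indirect, [((5.0, 0.0, 2.0), 0.5)])
shadowed = cached_radiance(x, w_o, light, direct, indirect, [((0.0, 0.0, 2.0), 0.5)])
```

The binary visibility carries the discontinuity, so the two radiance terms can stay smooth in the light position.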

Outdoor ambient robustness requires a different factorization. The aerial albedo-recovery method explicitly splits lighting into direct sunlight and diffuse skylight:

$L_i(p,\omega_i,t)=L_{\text{sun}}(p,t)+L_{\text{sky}}(p,\omega_i,t),$

with geometric visibility for both, a Gaussian skylight distribution around the sun, and a penumbra-aware refinement of sun visibility near shadow boundaries (Song et al., 2024). This formulation makes the sky term an explicit environment illumination model rather than an unspecified shading residual.
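Under a Lambertian assumption, the sun/sky split leads to a simple per-pixel albedo inversion. The sketch below collapses the skylight integral into a single sky irradiance scaled by a sky-visibility fraction, which is a simplification of the Gaussian skylight model described above; all names and numeric values are hypothetical.

```python
import math

def _irradiance(sun_irradiance, cos_sun, sun_visibility, sky_irradiance, sky_visibility):
    """Total irradiance: visible direct sun plus visible diffuse sky."""
    return (sun_visibility * sun_irradiance * max(0.0, cos_sun)
            + sky_visibility * sky_irradiance)

def outdoor_radiance(albedo, sun_irradiance, cos_sun, sun_visibility,
                     sky_irradiance, sky_visibility):
    """Lambertian outgoing radiance under sun and sky."""
    e = _irradiance(sun_irradiance, cos_sun, sun_visibility, sky_irradiance, sky_visibility)
    return albedo / math.pi * e

def recover_albedo(radiance, sun_irradiance, cos_sun, sun_visibility,
                   sky_irradiance, sky_visibility):
    """Divide observed radiance by the modelled irradiance."""
    e = _irradiance(sun_irradiance, cos_sun, sun_visibility, sky_irradiance, sky_visibility)
    if e <= 0.0:
        raise ValueError("no modelled light reaches this point")
    return math.pi * radiance / e

# Same material, once in direct sun and once in a cast shadow.
sunlit = outdoor_radiance(0.3, 1000.0, 0.8, 1.0, 150.0, 0.9)
shaded = outdoor_radiance(0.3, 1000.0, 0.8, 0.0, 150.0, 0.9)
albedo_sunlit = recover_albedo(sunlit, 1000.0, 0.8, 1.0, 150.0, 0.9)
albedo_shaded = recover_albedo(shaded, 1000.0, 0.8, 0.0, 150.0, 0.9)
```

The two pixels differ strongly in radiance but map to the same albedo once sun visibility is accounted for. A fractional `sun_visibility` between 0 and 1 is one simple way to represent penumbra pixels near shadow boundaries.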

5. Inference strategies, priors, and optimization schedules

A striking pattern across the literature is staged or structured optimization. SIR uses a three-stage material estimation procedure: first geometry and HDR radiance, then irradiance and hard shadow, and finally BRDF with soft shadow and instance-level regularization (Wei et al., 2024). The active RGB-NIR method uses three stages as well: geometry initialization from RGB via 2D Gaussian splatting, NIR flash inverse rendering for geometry refinement and NIR BRDF estimation, and RGB inverse rendering for RGB diffuse albedo and RGB environment map (Chung et al., 28 May 2026). GLOW begins with geometry initialization using NeuS-style rendering and an SfM-based geometry prior, then adds physically based rendering with a radiance cache, and finally performs material refinement on a fixed mesh (Wu et al., 28 Nov 2025).

Ambient robustness is also reinforced by strong priors. Robust Inverse Graphics formulates the problem as posterior inference over the scene and an explicit corruption latent, pairing a learned clean-scene prior with an uninformative uniform corruption prior (Le et al., 2024). The corruption field can absorb rain, snow, fog, floaters, unknown objects, or FOV noise, while the learned scene prior encourages clean geometry and appearance. The paper argues that full posterior inference is necessary because MAP with a flat corruption prior degenerates to a “billboard” solution in which the corruption explains the observation and the scene collapses to the highest-prior configuration (Le et al., 2024). Although framed around corruptions rather than illumination, the construction is directly relevant to ambient robustness: ambient effects can be treated as explicit nuisance variables instead of being forced into BRDF or geometry.

BRDFusion introduces a different kind of prior by combining a physical inverse/forward renderer with generative video diffusion models. Its physical model provides controllable rendering from the scene configuration, while the generative model denoises and fixes artifacts; during inverse rendering, diffusion-based G-buffer and environment-map priors regularize geometry, albedo, roughness, metallic, and lighting (Liu et al., 15 Jun 2026). This suggests that when complex ambient lighting makes purely physical optimization too ill-posed, generative priors can act as a regularizer without discarding explicit BRDF and lighting control.

6. Empirical evidence, applications, and limitations

Reported experiments show that robustness claims are not merely qualitative. On synthetic data, the active RGB-NIR method reports RGB diffuse albedo PSNR, SSIM, and LPIPS, roughness RMSE, normal MAE, and relighting PSNR, with consistency across multiple ambient lighting scenarios (Chung et al., 28 May 2026). WildLight reports mean/median surface distance and normal error on Bunny, and on Armadillo compares its novel-view and relighting scores against PhySG (Cheng et al., 2023). SIR reports albedo PSNR on its synthetic evaluation together with an ablation in which albedo PSNR drops, directly quantifying the cost of omitting explicit shadow decomposition (Wei et al., 2024). BRDFusion reports the best PSNR and SSIM for novel-view relighting on its synthetic urban benchmark, and the best RMSE for roughness and metallic among compared methods (Liu et al., 15 Jun 2026).
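For readers unfamiliar with the metrics named above, the sketch below gives standard definitions of PSNR, RMSE, and mean angular normal error on flat lists of values. Evaluation details such as masking, albedo scale alignment, or colour space are not specified here and may differ between papers; the function names are illustrative.

```python
import math

def mse(pred, target):
    """Mean squared error over two equally long sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def rmse(pred, target):
    """Root mean squared error, e.g. for roughness maps."""
    return math.sqrt(mse(pred, target))

def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max^2 / MSE)."""
    error = mse(pred, target)
    if error == 0.0:
        return math.inf
    return 10.0 * math.log10(max_value * max_value / error)

def normal_mae_degrees(pred_normals, target_normals):
    """Mean angular error in degrees between paired normal vectors."""
    total = 0.0
    for p, t in zip(pred_normals, target_normals):
        dot = sum(a * b for a, b in zip(p, t))
        norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in t))
        cos_angle = max(-1.0, min(1.0, dot / norm))  # clamp rounding error
        total += math.degrees(math.acos(cos_angle))
    return total / len(pred_normals)

albedo_psnr = psnr([0.1, 0.1], [0.0, 0.0])
normal_error = normal_mae_degrees([(0.0, 0.0, 1.0)], [(1.0, 0.0, 0.0)])
```

Higher PSNR indicates a closer match, whereas RMSE and angular error are better when lower.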

The downstream uses are correspondingly broad. Several systems emphasize relighting, object insertion, and material editing as primary applications: SIR highlights free-view relighting, object insertion, and material replacement (Wei et al., 2024); GUS-IR supports relighting and retouching in computer vision, graphics, and extended reality (Liang et al., 2024); BRDFusion supports novel-view relighting, night simulation, and dynamic object insertion/editing in urban scenes (Liu et al., 15 Jun 2026). In aerial photogrammetry, albedo recovery improves image matching, dense matching, edge and line extraction, and change detection, because illumination-dependent shading is removed from textures used by downstream pipelines (Song et al., 2024). This suggests that ambient-robust inverse rendering is not only a relighting problem but also a geometry, registration, and consistency problem.

Limitations are equally consistent across papers. The active RGB-NIR method notes failure modes for meter-scale or larger scenes, extreme specular or mirror materials, strong ambient NIR such as direct sunlight, and cross-spectral BRDF mismatch when roughness or metallic are not actually wavelength-independent (Chung et al., 28 May 2026). GLOW identifies the lack of emissive-material modeling, floaters in unseen regions, and reliance on dense co-located captures in dark rooms (Wu et al., 28 Nov 2025). BRDFusion inherits limitations from using a single global environment map and a Cook-Torrance material model in large urban scenes, while the aerial method explicitly does not model indirect illumination, volumetric scattering, or specular reflection, so such effects can remain in the recovered albedo (Liu et al., 15 Jun 2026, Song et al., 2024). A plausible implication is that ambient robustness is presently strongest when the capture regime itself contributes constraints—through active illumination, calibrated geometry, or dense multi-view data—rather than when the decomposition is inferred from passive RGB alone.