Single-Image Lighting Decomposition
- Single-Image Lighting Decomposition is the process of separating an image into layers such as reflectance, illumination, shadows, and specular highlights to enable applications like relighting and material editing.
- Contemporary methods blend physics-based models, deep learning priors, and optimization strategies to overcome the inherent ill-posed nature of the problem.
- Advanced techniques utilize spatially-varying lighting representations and self-supervised frameworks to enhance layer separation, driving improvements in photorealistic object compositing and appearance editing.
Single-Image Lighting Decomposition is the process of separating a single observed image into distinct layers or parameterizations describing the underlying scene reflectance, illumination, and—optionally—specific effects such as shadows, specularities, and occlusions. This highly ill-posed problem appears in computer vision, graphics, and computational photography, enabling applications in relighting, material editing, and photorealistic object compositing. Contemporary approaches combine physically motivated decomposition models, deep learning priors, and optimization-based regularization to achieve robust separation under complex, spatially varying lighting and richly textured or non-Lambertian scenes.
1. Physical and Mathematical Foundations
The canonical forward model for single-image lighting decomposition is the intrinsic image equation $I(p) = R(p)\,S(p)$, where $I(p)$ is the observed color at pixel $p$, $R(p)$ is the spatially-varying surface reflectance (albedo), and $S(p)$ is the shading image incorporating all effects of scene lighting, geometry, and possibly material-dependent interactions (Wang et al., 11 Sep 2025). Extensions account for additional phenomena, e.g., ambient and direct illumination separation, occlusion/shadow effects, and specular highlights: $I(p) = A(p)\,O(p)\,D(p) + S_{sp}(p)$, with $O$ an occlusion or shadow factor, $A$ the diffuse albedo, $D$ the colored diffuse irradiance, and $S_{sp}$ the specular shading component (Innamorati et al., 2017).
Advanced models utilize spatially-varying lighting functions expressed as mixtures of spherical Gaussians per pixel (Li et al., 2019), or parameterizations such as spherical harmonics (SH) to compactly encode environmental illumination (Lagunas et al., 2021). For indoor environments, localized area-lights with explicit geometric and photometric parameters have been employed (Gardner et al., 2019).
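As a minimal illustration of the two forward models above, the following NumPy sketch recomposes an image from its layers; function and array names are illustrative, not drawn from any cited implementation.

```python
import numpy as np

def recompose_intrinsic(R, S):
    """Basic intrinsic model: I(p) = R(p) * S(p), element-wise per pixel and channel."""
    return R * S  # H x W x 3 arrays

def recompose_layered(A, O, D, Sp):
    """Extended model: occlusion/shadow O modulates the diffuse product A * D,
    and the specular component Sp is added on top:
        I(p) = A(p) * O(p) * D(p) + Sp(p)
    """
    return A * O * D + Sp

# Toy usage: flat gray albedo under a horizontal shading gradient.
H, W = 4, 4
R = np.full((H, W, 3), 0.5)
S = np.linspace(0.2, 1.0, W).reshape(1, W, 1) * np.ones((H, 1, 3))
I = recompose_intrinsic(R, S)
assert I.shape == (H, W, 3)
```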
2. Decomposition Algorithms and Regularization
Solving for $R$ and $S$ from a single image is ill-posed. Classical, learning-based, and hybrid techniques employ specific priors and optimization routines:
- Model-based Priors: Sparsity of reflectance gradients (ℓ₀-norm gradient counting) encourages piecewise-constant albedo, suppressing color texture leakage into shading (Wang et al., 11 Sep 2025). Smoothness priors on shading, sometimes with chromaticity constraints, prevent spurious high-frequency structure in $S$ (Lettry et al., 2018).
- Texture-Guided Priors: Pretrained networks (e.g., CRefNet) supply a lighting-free reflectance prior $\tilde{R}$, yielding a candidate shading $\tilde{S} = I / \tilde{R}$. Enforcing low total variation on $\tilde{S}$ helps preserve geometric and bump-induced texture in the shading (Wang et al., 11 Sep 2025).
- Learning-based Priors: Fully-convolutional neural networks (hourglass or ResNet/U-Net architectures) are trained to predict entire layer sets ($R$, $S$, and optionally specular and occlusion components), using per-layer and recomposition losses (Innamorati et al., 2017, Yang et al., 2024).
- Optimization: The decomposition objective is formulated as a joint energy with terms for fidelity, texture, sparsity, and shading scale; solved by alternating-direction methods (ADMM), feature splitting, or closed-form updates in color and Fourier domains (Wang et al., 11 Sep 2025).
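As a schematic illustration of such a joint energy, the following PyTorch sketch minimizes a fidelity term plus reflectance gradient-sparsity and shading-smoothness priors by plain gradient descent; an ℓ₁ surrogate stands in for the ℓ₀ gradient count, ADMM-style splitting is omitted, and the weights `lam_r`, `lam_s` are illustrative rather than values from the cited work.

```python
import torch

def grads(x):
    """Forward differences along height and width of an (H, W, 3) tensor."""
    dx = x[:, 1:, :] - x[:, :-1, :]
    dy = x[1:, :, :] - x[:-1, :, :]
    return dx, dy

def decompose(I, steps=500, lam_r=0.1, lam_s=0.05, lr=0.05):
    """Jointly estimate reflectance R and shading S with I ~ R * S."""
    # Optimize in log space so R and S stay positive.
    log_R = torch.zeros_like(I, requires_grad=True)
    log_S = torch.log(I.clamp_min(1e-4)).clone().requires_grad_(True)
    opt = torch.optim.Adam([log_R, log_S], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        R, S = log_R.exp(), log_S.exp()
        fidelity = ((R * S - I) ** 2).mean()            # I = R * S data term
        rx, ry = grads(R)
        sparsity = rx.abs().mean() + ry.abs().mean()    # piecewise-constant albedo
        sx, sy = grads(S)
        smooth = (sx ** 2).mean() + (sy ** 2).mean()    # smooth shading
        (fidelity + lam_r * sparsity + lam_s * smooth).backward()
        opt.step()
    return log_R.exp().detach(), log_S.exp().detach()

I = torch.rand(32, 32, 3)      # toy input image in [0, 1]
R, S = decompose(I, steps=100)
```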
3. Deep Learning Frameworks and Architecture Variants
Modern single-image lighting decomposition exploits deep architectures for end-to-end prediction and/or physically guided intermediate representations. Key variants include:
- Cascaded and Multi-stage Networks: Cascade-refinement approaches sequentially improve albedo, normal, roughness, and lighting estimates via several network stages, integrating physically-based rendering layers and bilateral solvers for edge-aware refinement (Li et al., 2019).
- Latent Flow Matching: Parameter-efficient models utilize a VAE-guided shading manifold and latent flow matching, generating shading in a single inference pass with compact UNet backbones (Singla et al., 18 Jan 2026).
- Self-Supervised and Siamese Designs: Networks trained on pairs of images of the same scene with different lighting, or on augmentations that alter only lighting, use loss terms enforcing albedo invariance, reconstruction, and spherical harmonics constraints to disentangle content and illumination without ground-truth supervision (Liu et al., 2020, Lettry et al., 2018).
- Joint Geometry and Lighting Inference: Architectures explicitly predict per-pixel normals, parametric light sources, or local 3D geometry to support differentiable rendering-based loss functions (Song et al., 2019, Qiu et al., 2020).
- Two-Stage Intrinsic-Relighting Networks: Decomposition into $(R, S)$ proceeds via a first network; a second network predicts new shading under target illumination for relighting, supporting both supervised and physics-constrained unsupervised learning (Yang et al., 2024).
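A minimal sketch of the two-stage intrinsic-relighting design just described, assuming tiny convolutional stacks in place of the real encoder-decoder backbones and an SH-style vector as the target-lighting code; all module names and layer widths are illustrative.

```python
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU())

class DecompositionNet(nn.Module):
    """Stage one: predict reflectance R and shading S from the input image."""
    def __init__(self):
        super().__init__()
        self.body = conv_block(3, 32)
        self.head_R = nn.Conv2d(32, 3, 1)
        self.head_S = nn.Conv2d(32, 3, 1)
    def forward(self, img):
        f = self.body(img)
        return torch.sigmoid(self.head_R(f)), torch.relu(self.head_S(f))

class RelightNet(nn.Module):
    """Stage two: predict shading under a target illumination code."""
    def __init__(self, light_dim=27):  # e.g., 9 SH bands x 3 color channels
        super().__init__()
        self.body = conv_block(3 + light_dim, 32)
        self.head = nn.Conv2d(32, 3, 1)
    def forward(self, S, light):
        # Broadcast the lighting code over the spatial grid, then fuse with S.
        l = light[:, :, None, None].expand(-1, -1, *S.shape[-2:])
        return torch.relu(self.head(self.body(torch.cat([S, l], dim=1))))

img = torch.rand(1, 3, 64, 64)
R, S = DecompositionNet()(img)
S_new = RelightNet()(S, torch.rand(1, 27))
relit = R * S_new   # recomposition under the target lighting
```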
Representative architectures and their layer outputs:
| Approach | Decomposition Layers | Lighting Parametrization |
|---|---|---|
| Texture-aware Intrinsic Decomposition (Wang et al., 11 Sep 2025) | Albedo, shading | Implicit; all illumination in shading map |
| Neural Illumination (Song et al., 2019) | Geometry, partial panorama, lighting map | Spherical panorama, LDR→HDR mapping |
| Plausible Shading Decomposition (Innamorati et al., 2017) | Albedo, occlusion, diffuse, specular | Layered sum, optional directional variant |
| FlowIID (Singla et al., 18 Jan 2026) | Albedo, shading | VAE-guided latent, single-step generation |
| Single-image Full-body Relighting (Lagunas et al., 2021) | Albedo, shading, specular, residual | Spherical harmonics, PRT transport |
4. Lighting Representation and Parameterization
Physical fidelity and relighting capabilities strongly depend on the adopted light representation:
- Spherical Gaussians (SG): Each pixel’s lighting is an RGB-weighted mixture of spatially-varying SG lobes, capturing high-frequency effects and accommodating complex occlusion and interreflection (Li et al., 2019).
- Spherical Harmonics (SH): Environmental lighting is encoded as a band-limited SH expansion, providing a compact low-frequency approximation especially suited for diffuse or moderately glossy reflectors (Lagunas et al., 2021, Zehni et al., 2021, Liu et al., 2020); a minimal evaluation sketch follows this list.
- Explicit Parametric Lights: For indoor scenes, fitting a discrete set of area-lights (each parameterized by direction, distance, size, and RGB intensity) plus ambient terms allows compositing under spatially-varying source distributions (Gardner et al., 2019).
- Shading Maps Only: Many decomposition approaches store all illumination and geometry effects within a dense shading map $S$, supporting direct relighting via shadow and highlight editing, but generally lacking interpretable parameters for scene understanding (Wang et al., 11 Sep 2025, Yang et al., 2024).
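As a concrete instance of the SH parameterization above, the following sketch evaluates second-order (nine-coefficient) real SH shading from per-pixel normals; the coefficient values are illustrative placeholders, and any clamped-cosine convolution weights for irradiance are assumed folded into the coefficients.

```python
import numpy as np

def sh_basis(n):
    """Real SH basis (bands l <= 2) evaluated at unit normals n of shape (..., 3)."""
    x, y, z = n[..., 0], n[..., 1], n[..., 2]
    return np.stack([
        0.282095 * np.ones_like(x),                   # l = 0
        0.488603 * y, 0.488603 * z, 0.488603 * x,     # l = 1
        1.092548 * x * y, 1.092548 * y * z,
        0.315392 * (3 * z**2 - 1),
        1.092548 * x * z, 0.546274 * (x**2 - y**2),   # l = 2
    ], axis=-1)                                       # (..., 9)

def sh_shading(normals, coeffs):
    """Per-pixel diffuse shading as a dot product with SH lighting coefficients.

    normals: (H, W, 3) unit vectors; coeffs: (9, 3) RGB SH coefficients.
    """
    return sh_basis(normals) @ coeffs                 # (H, W, 3)

# Toy usage: upward-facing normals lit by an ambient term plus an overhead band.
H, W = 8, 8
normals = np.dstack([np.zeros((H, W)), np.zeros((H, W)), np.ones((H, W))])
coeffs = np.zeros((9, 3))
coeffs[0], coeffs[2] = 1.0, 0.5
S = sh_shading(normals, coeffs)
```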
5. Training Objectives, Datasets, and Evaluation
Loss functions combine reconstruction fidelity, physical consistency, and task-specific supervision:
- Fidelity and Regularization: Data terms penalize deviation of the recomposition from the input image (e.g., $\|I - R \cdot S\|_2^2$), with additional priors (e.g., sparsity, total variation, chromaticity smoothness) on decomposition layers (Wang et al., 11 Sep 2025, Lettry et al., 2018).
- Physical and Perceptual Constraints: When ground-truth intrinsics are absent, unsupervised constraints such as reflectance consistency, cross-relighting losses, and physical invariance under illumination swaps are leveraged (Yang et al., 2024).
- Self-Supervision via Relighting Consistency: Siamese or multi-lit training pairs enforce layer invariances and reconstruction under swapped or synthetic lighting (Liu et al., 2020, Zehni et al., 2021, Lettry et al., 2018); see the loss sketch after this list.
- Benchmarks and Metrics: IIW (Intrinsic Images in the Wild), MIT Intrinsic, ARAP, and custom real/synthetic datasets quantify performance under task-specific metrics: MSE, LPIPS, SSIM, and specialized measures such as local MSE (LMSE) and weighted human disagreement rate (WHDR) (Wang et al., 11 Sep 2025, Yang et al., 2024, Singla et al., 18 Jan 2026, Liu et al., 2018).
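A minimal sketch of the relighting-consistency self-supervision described above, assuming a generic decomposition network `net` that returns $(R, S)$; loss names and weights are illustrative, not drawn from any cited implementation.

```python
import torch
import torch.nn.functional as F

def multi_lit_loss(net, img_a, img_b, w_inv=1.0, w_rec=1.0, w_swap=0.5):
    """Siamese objective for two images of one scene under different lighting."""
    R_a, S_a = net(img_a)
    R_b, S_b = net(img_b)
    # Albedo invariance: reflectance should not depend on the lighting.
    l_inv = F.l1_loss(R_a, R_b)
    # Per-image reconstruction: I = R * S.
    l_rec = F.l1_loss(R_a * S_a, img_a) + F.l1_loss(R_b * S_b, img_b)
    # Cross-relighting: swapping shading maps should reproduce the other image.
    l_swap = F.l1_loss(R_a * S_b, img_b) + F.l1_loss(R_b * S_a, img_a)
    return w_inv * l_inv + w_rec * l_rec + w_swap * l_swap

# Toy usage with a stand-in decomposition "network".
net = lambda x: (torch.sigmoid(x), torch.relu(x))
loss = multi_lit_loss(net, torch.rand(1, 3, 32, 32), torch.rand(1, 3, 32, 32))
```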
6. Applications and Limitations
The decomposition of a single image into lighting and material layers enables a spectrum of downstream tasks: photorealistic relighting, appearance-edit simulation, AR object compositing, and shadow or gloss manipulation (Innamorati et al., 2017, Lagunas et al., 2021). High spatial and chromatic fidelity in the shading map is critical for relighting realism. State-of-the-art methods demonstrate parameter efficiency (single-step inference), robustness to spatially-varying illumination, and real-scene transferability (Singla et al., 18 Jan 2026). However, challenges remain in scenarios with multi-source colored lighting, extreme specularities, or non-Lambertian effects, where either the underlying physical model or the computational representation imposes intrinsic limitations (Liu et al., 2018, Lagunas et al., 2021).
7. Current Research Directions
Recent advances emphasize improved physical modeling, dataset scale and realism, and efficiency:
- Texture-aware Priors and Fine-grained Regularization: Novel priors guide decomposition to preserve high-frequency shading cues and decouple material and illumination-driven textures (Wang et al., 11 Sep 2025).
- Highly Efficient Single-step Models: Latent flow matching and VAE-guided latent shading strategies support mobile and real-time inference with quality comparable to much larger diffusion-based or autoregressive counterparts (Singla et al., 18 Jan 2026).
- Self-Supervised and Physics-driven Frameworks: Exploiting relighting and invariance constraints in unsupervised or cross-domain settings enables performance without dense ground-truth labeling (Liu et al., 2020, Yang et al., 2024).
- Holistic Inverse Rendering: Joint estimation of shape, spatially-varying lighting, and material BRDFs (SVBRDF) from a single image supports photorealistic manipulation for AR and digital content creation (Li et al., 2019).
Open research questions include the handling of severe non-Lambertian phenomena, developing interpretable high-frequency parametric lighting, and scaling to outdoor and highly unconstrained settings (Gardner et al., 2019, Song et al., 2019). The availability of large-scale, physically annotated synthetic plus real-world datasets (e.g., ISR, RSR (Yang et al., 2024)) continues to play a crucial role in advancing both method generalizability and benchmarking rigor.