Generative Image-Based Light Decomposition
- Generative image-based light decomposition is a computational technique that separates observed images into intrinsic illumination, reflectance, and other physical components using generative models.
- It integrates image formation physics with deep generative frameworks like GANs, diffusion models, and VAEs to enable accurate relighting, editing, and restoration in imaging and graphics.
- The approach relies on joint minimization of reconstruction error and constrained optimization over latent codes, and is validated on standard benchmarks.
Generative image-based light decomposition refers to the class of computational methods that split observed images into underlying illumination and material (and potentially additional) components using generative modeling frameworks. The decomposition exploits image formation physics, data-driven priors, and increasingly, deep implicit generative models (including GANs, diffusion models, and VAEs) to yield illumination, reflectance, and other intrinsic factors, enabling physically interpretable relighting, editing, and restoration in computational imaging, vision, and graphics.
1. Image Formation Models and Decomposition Approaches
The canonical model for light decomposition is the Retinex/“intrinsic image” equation $I = A \odot S$, where $I$ is the observed RGB image, $A$ is the spatially varying albedo or reflectance, $S$ is shading, representing light and geometric effects, and $\odot$ denotes the per-pixel product. Extensions incorporate specular components, illumination maps, additive shadow/lighting terms, or higher-dimensional light field or 3D radiance field variables.
Generative decomposition in this context entails learning or specifying priors for the reflectance and illumination maps, and optimizing or sampling their values so that their forward combination closely matches the observation—a paradigm enabled by deep generative models and advanced inverse imaging techniques.
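As a minimal sketch of the forward model, assuming the multiplicative form in which the image is the per-pixel product of an RGB albedo map and a single-channel shading map, the following NumPy snippet composes an image, checks the equivalent log-domain identity, and shows the scale ambiguity that makes priors necessary. The function names, array shapes, and the `eps` stabilizer are illustrative assumptions, not taken from any cited method.

```python
import numpy as np

def compose(albedo, shading):
    """Forward model: image = albedo * shading, applied per pixel."""
    # shading is (H, W); broadcast it over the RGB channels of albedo (H, W, 3).
    return albedo * shading[..., None]

def log_residual(image, albedo, shading, eps=1e-6):
    """Residual of the log-domain form: log I - (log A + log S)."""
    log_i = np.log(image + eps)
    log_a = np.log(albedo + eps)
    log_s = np.log(shading + eps)[..., None]
    return log_i - (log_a + log_s)

rng = np.random.default_rng(0)
albedo = rng.uniform(0.2, 0.9, size=(4, 4, 3))   # synthetic reflectance
shading = rng.uniform(0.1, 1.0, size=(4, 4))     # synthetic grayscale shading
image = compose(albedo, shading)

# Scale ambiguity: (k * albedo, shading / k) explains the same observation,
# so reconstruction error alone cannot pick out a unique decomposition.
k = 2.0
image_alt = compose(albedo * k, shading / k)
print(np.allclose(image, image_alt))
print(np.abs(log_residual(image, albedo, shading)).max())
```

Working in the log domain turns the product into a sum, which is why many reconstruction losses are written on log-intensities or their gradients.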
Recent instantiations include:
- Joint GAN inversion per intrinsic component (Shah et al., 2023)
- Diffusion-based conditional generative decomposition (Yi et al., 2023, Jiang et al., 2024, Zeng et al., 2024)
- Variational autoencoder architectures per-layer, optimized over latent codes (Rock et al., 2016)
- Transformer- and CNN-based methods employing reconstruction and auxiliary physics-supervised terms (Meinardus et al., 2023, Weligampola et al., 2021, Shi et al., 2019)
For multi-view or 3D scenes (e.g., Gaussian Splatting or NeRF), the models decompose radiance into space- and direction-dependent terms such as albedo, roughness, normal, irradiance, direct sun, sky, and indirect light, encoded and composed by neural or parametric forms (Choi et al., 2022, Du et al., 2024, Bai et al., 28 Jul 2025, Liang et al., 21 Jan 2026).
2. Generative Priors and Learning Mechanisms
Rigid physics-based priors alone—such as smoothness of illumination and invariance of reflectance—are often insufficient due to ill-posedness and noise amplification. Recent systems integrate:
- GANs trained on large datasets per intrinsic channel, enabling code inversion constrained to plausible spaces (Shah et al., 2023)
- Conditional diffusion priors trained for denoising and reconstruction fidelity in both image and latent spaces. Notable systems (e.g., Diff-Retinex, LightenDiffusion, RGBX) couple a physics-consistent decomposition stage with a generative latent diffusion model operating on reflectance and illumination maps, producing high-quality image restoration and synthesizing plausible missing content (Yi et al., 2023, Jiang et al., 2024, Zeng et al., 2024)
- VAE-like models that jointly encode and regularize albedo, shading, and shading-detail layers; each layer's prior is learned from a proxy dataset of “Platonic” ideals, and the decomposition is obtained by optimizing over latent codes (Rock et al., 2016)
These generative approaches address ambiguities in decomposition by leveraging the powerful image statistics captured in discriminative and probabilistic generative networks, learning mappings from images or features to their component “layers” even in the absence of paired ground-truth decompositions.
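To make the conditional diffusion prior more concrete, here is a hedged sketch of the generic DDPM-style noise-prediction objective, used as a stand-in for the training losses of the cited systems, whose actual networks, schedules, and conditioning are not reproduced here. The linear beta schedule, the `predict_eps(x_t, cond, t)` signature, and the placeholder `zero_denoiser` are all assumptions made for illustration.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention factors for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(x0, t, alpha_bar, rng):
    """Sample x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps

def noise_prediction_loss(predict_eps, x0, cond, t, alpha_bar, rng):
    """Mean squared error between the true and the predicted noise."""
    x_t, eps = add_noise(x0, t, alpha_bar, rng)
    eps_hat = predict_eps(x_t, cond, t)
    return float(np.mean((eps_hat - eps) ** 2))

def zero_denoiser(x_t, cond, t):
    """Hypothetical placeholder for a trained conditional denoising network."""
    return np.zeros_like(x_t)

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
reflectance = rng.uniform(0.0, 1.0, size=(8, 8, 3))  # clean target map (x0)
low_light = 0.1 * reflectance                        # conditioning signal
loss = noise_prediction_loss(zero_denoiser, reflectance, low_light,
                             500, alpha_bar, rng)
print(loss)
```

In a real system the placeholder would be a network that receives the noisy reflectance or illumination map together with the conditioning features, and the loss would be averaged over random timesteps and training samples.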
3. Algorithmic and Optimization Strategies
The optimization underlying generative light decomposition typically involves:
- Joint minimization of reconstruction error between recomposed maps (from reflectance/shading/lighting/specular, possibly in log or gradient space) and the observed image under the chosen generative image formation model (Shah et al., 2023, Garces et al., 2016)
- Constrained optimization over GAN or VAE latent codes, augmented with regularization to ensure outputs remain on the manifold (kNN penalties, code priors, etc.) (Shah et al., 2023, Rock et al., 2016)
- Multi-objective training for physics-model consistency (e.g., smoothness penalties, reflectance-edge preservation, structural consistency between decomposed maps, compositionality at multiple scales) (Weligampola et al., 2021, Meinardus et al., 2023, Bai et al., 28 Jul 2025)
- Alternating between global and local solver stages (e.g., 4D TV- flattening, per-view Retinex, ADMM for parallelism in light field decomposition) (Garces et al., 2016)
- Diffusion process training via denoising score matching in latent or image space, with conditioning on content or light features, and self-constrained consistency losses to prevent content leakage between illumination and reflectance (Yi et al., 2023, Jiang et al., 2024, Zeng et al., 2024)
- Multi-branch architectures (e.g., transformer + U-Net, separate heads for reflectance/shading/occlusion/light), sometimes staged via curriculum pretraining, pseudo-labels, or regularization (Meinardus et al., 2023)
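The first two items above, reconstruction in log space plus regularized optimization over latent codes, can be sketched with a deliberately simplified toy. Here two random linear decoders stand in for pretrained albedo and shading generators, and a Gaussian (L2) code prior stands in for the manifold regularizers mentioned above (kNN penalties, learned code priors). All names, sizes, and the step-size rule are assumptions for illustration, not any cited method's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pixels, code_dim = 64, 8

# Hypothetical stand-ins for pretrained generators: linear maps from a latent
# code to a flattened log-albedo or log-shading image.
dec_albedo = rng.standard_normal((n_pixels, code_dim))
dec_shading = rng.standard_normal((n_pixels, code_dim))

# Synthetic observation: log I = log A + log S.
z_a_true = rng.standard_normal(code_dim)
z_s_true = rng.standard_normal(code_dim)
log_image = dec_albedo @ z_a_true + dec_shading @ z_s_true

def objective(z_a, z_s, lam):
    """Log-space reconstruction error plus an L2 prior on both codes."""
    residual = dec_albedo @ z_a + dec_shading @ z_s - log_image
    recon = 0.5 * residual @ residual
    prior = 0.5 * lam * (z_a @ z_a + z_s @ z_s)
    return recon + prior, residual

def invert(lam=1e-2, steps=500):
    """Joint gradient descent over both latent codes."""
    z_a = np.zeros(code_dim)
    z_s = np.zeros(code_dim)
    # Step size 1/L, with L the Lipschitz constant of the quadratic's gradient.
    stacked = np.hstack([dec_albedo, dec_shading])
    lr = 1.0 / (np.linalg.norm(stacked, 2) ** 2 + lam)
    history = []
    for _ in range(steps):
        loss, residual = objective(z_a, z_s, lam)
        history.append(loss)
        # Both codes are updated from the same residual (a joint step).
        z_a = z_a - lr * (dec_albedo.T @ residual + lam * z_a)
        z_s = z_s - lr * (dec_shading.T @ residual + lam * z_s)
    return z_a, z_s, history

z_a, z_s, history = invert()
print(history[0], history[-1])
```

Unlike this toy, real generators are nonlinear and the problem is underdetermined, so the choice of prior and initialization matters far more than it does here; in practice the gradients would come from automatic differentiation through the generator networks.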
For 3D or multi-view, the optimization includes explicit per-light and ambient decomposition, ray-tracing or splatting for visibility and shadow simulation, and deferred physically-based rendering (PBR) with jointly optimized environment maps, Spherical Gaussians, and material parameters (Du et al., 2024, Bai et al., 28 Jul 2025, Liang et al., 21 Jan 2026).
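As an illustration of the Spherical Gaussian lighting representation mentioned above, the sketch below evaluates a mixture of lobes using the commonly used form a · exp(λ(μ·v − 1)), with lobe axis μ, sharpness λ, and RGB amplitude a. Whether the cited methods use exactly this parameterization is an assumption; the function and variable names are hypothetical.

```python
import numpy as np

def normalize(v):
    """Scale vectors along the last axis to unit length."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def eval_sg_mixture(directions, axes, sharpness, amplitudes):
    """Radiance from K Spherical Gaussian lobes in N query directions.

    directions: (N, 3) unit vectors, axes: (K, 3) unit lobe axes,
    sharpness: (K,) positive values, amplitudes: (K, 3) RGB. Returns (N, 3).
    """
    cos = directions @ axes.T                   # (N, K) cosines to each axis
    weights = np.exp(sharpness * (cos - 1.0))   # equals 1 on the axis, decays away
    return weights @ amplitudes

axes = normalize(np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]))
sharpness = np.array([20.0, 5.0])               # narrow key light, broad fill
amplitudes = np.array([[4.0, 3.8, 3.5], [0.3, 0.4, 0.6]])
query = normalize(np.array([[0.0, 0.0, 1.0], [0.0, 0.0, -1.0]]))
radiance = eval_sg_mixture(query, axes, sharpness, amplitudes)
print(radiance)
```

Because every operation is differentiable in the axes, sharpness, and amplitudes, such lobes can be optimized jointly with material parameters by gradient descent.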
4. Decomposition Extensions: Lighting, Geometry, and Material Channels
Advanced generative decomposition approaches move beyond reflectance/shading to include:
- Specular highlights and residuals via explicit generative or neural modeling, supporting relighting and material editing (Shah et al., 2023, Choi et al., 2022, Zeng et al., 2024)
- Multiple per-channel decomposition: normal, roughness, metallicity, irradiance (diffuse), environmental and direct lighting, and ambient occlusion (Choi et al., 2022, Zeng et al., 2024, Du et al., 2024)
- Decomposition into explicit light sources (OLAT/ambient layers, customizable per-light chromaticity and intensity) for relightable and editable 3D gaussian splatting models (Liang et al., 21 Jan 2026, Du et al., 2024, Bai et al., 28 Jul 2025)
- Detailed generative decoupling and transfer of complex light effects (e.g., flares, rainbows, lens artifacts) via control branches (ControlNet), multi-branch diffusion setups, and carefully curated reference-content-light triplet datasets (Li et al., 20 Aug 2025)
Many approaches introduce pseudo-labels or proxy datasets (e.g., “Platonic” Mondrian for albedo, rendered shapes for shading) for unsupervised, semi-supervised, or curriculum-based training when true decompositions are not available (Rock et al., 2016, Meinardus et al., 2023).
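Per-light editing of the kind described above rests on the linearity of light transport: in linear (HDR) radiance, an image is the sum of an ambient layer and the contributions of the individual lights, so each layer can be rescaled or tinted independently. The following is a minimal sketch under that assumption; the layer layout and the `relight` helper are hypothetical and not drawn from any cited system.

```python
import numpy as np

def relight(ambient, light_layers, intensities, tints):
    """Recombine per-light layers with new intensities and RGB tints.

    ambient: (H, W, 3), light_layers: (K, H, W, 3) in linear radiance,
    intensities: (K,), tints: (K, 3).
    """
    intensities = np.asarray(intensities, dtype=float)
    tints = np.asarray(tints, dtype=float)
    gains = intensities[:, None] * tints               # (K, 3) per-light RGB gain
    scaled = light_layers * gains[:, None, None, :]    # broadcast over pixels
    return ambient + scaled.sum(axis=0)

rng = np.random.default_rng(0)
ambient = rng.uniform(0.0, 0.2, size=(4, 4, 3))
light_layers = rng.uniform(0.0, 1.0, size=(2, 4, 4, 3))  # two hypothetical lights

# Unit gains reproduce the original capture.
original = relight(ambient, light_layers, np.ones(2), np.ones((2, 3)))
# Switch off light 0 and give light 1 a warm tint at double intensity.
edited = relight(ambient, light_layers, [0.0, 2.0],
                 [[1.0, 1.0, 1.0], [1.0, 0.8, 0.6]])
print(original.shape, edited.shape)
```

The same sum would not hold on tone-mapped or gamma-encoded images, which is one reason to keep the decomposed layers in HDR.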
5. Quantitative Evaluation and Empirical Results
Methodologies are validated via:
- Standard low-light and relighting benchmarks: LOL, VE-LOL-L, DICM, SICE, Hypersim, InteriorVerse (Yi et al., 2023, Zeng et al., 2024, Shi et al., 2019)
- Metrics: PSNR, SSIM, LPIPS (for perceptual similarity), FID/Light-FID for dataset-level statistics, NIQE and PI for “naturalness” and perceptual index in unsupervised enhancement (Yi et al., 2023, Li et al., 20 Aug 2025, Jiang et al., 2024)
- Ablation studies: comparison of decomposition network architectures, generative priors, additional loss terms, and channel encoding/fusion strategies (Yi et al., 2023, Meinardus et al., 2023, Li et al., 20 Aug 2025)
- User studies for subjective assessment of relighting, light transfer, content fidelity, and lighting plausibility (Li et al., 20 Aug 2025)
- Qualitative analysis: Lambertian assumption validity, specular artifact suppression, relighting and editing robustness, per-light control in multi-light and 3DGS relightable environments (Liang et al., 21 Jan 2026, Bai et al., 28 Jul 2025, Du et al., 2024)
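Of the metrics listed above, PSNR is the simplest to state exactly. The sketch below uses the standard definition, 10 · log10(MAX² / MSE), and assumes images scaled to the range [0, 1]; the `max_value` parameter is an illustrative choice. SSIM, LPIPS, FID, and NIQE need reference implementations and are not sketched here.

```python
import numpy as np

def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))

reference = np.zeros((4, 4, 3))
estimate = np.full((4, 4, 3), 0.1)   # constant error of 0.1 gives MSE = 0.01
print(psnr(reference, estimate))     # about 20 dB
```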
Empirical results demonstrate that generative approaches:
- Outperform classical and feedforward supervised methods in both fidelity and flexibility (e.g., lower FID and LPIPS, higher PSNR/SSIM)
- Enable user-interactive relighting, shadow manipulation, and spatial repositioning of light effects
- Provide real- or near-real-time inference for 3D relightable scenes, supporting scalable and user-interactive graphics pipelines
6. Limitations and Open Challenges
While generative image-based light decomposition offers notable advances, key limitations and open research directions include:
- Dependency on large, synthetic, or proxy datasets for training; bias toward common materials and scene types (Zeng et al., 2024)
- Failure modes in challenging out-of-distribution or non-Lambertian scenarios (e.g., glass, subsurface scattering), due to the lack of explicit BRDF or geometry modeling (Shah et al., 2023, Zeng et al., 2024)
- Ambiguity and instability in global versus local decomposition without adequate regularization or physics-aware constraints
- Performance degradation at high resolutions, with partial channel/missing data, or in scenes with dense source interactions (Zeng et al., 2024)
- Difficulty in generalizing diffusion/GAN priors to arbitrary or highly diverse datasets (e.g., real outdoor scenes, mixed illumination)
- Trade-off between computational cost (especially of iterative code inversion or joint optimization) and inference efficiency; diffusion and transformer-based systems may require seconds per view/image versus fast feed-forward networks (Shah et al., 2023, Yi et al., 2023)
Promising directions include learned priors for additional intrinsic channels (normals, depth, transparency), improved joint optimization for multi-modal imaging, per-pixel and 3D scene-level decomposition under complex illumination, tighter integration with physically based rendering, and finer control over the recombination and editing of decomposed light effects.
7. Representative Methods and Frameworks
| Approach/Framework | Generative Prior | Key Decomposition Channels |
|---|---|---|
| JoIN (Shah et al., 2023) | GAN bank + joint inversion | Albedo, shading, (specular) |
| RGBX (Zeng et al., 2024) | Latent diffusion | Albedo, normal, roughness, metallicity, irradiance |
| Diff-Retinex (Yi et al., 2023) | Physics + conditional diffusion | Reflectance, illumination |
| LightenDiffusion (Jiang et al., 2024) | Latent Retinex + diffusion | Reflectance, illumination |
| Conv-VAE (Rock et al., 2016) | Per-layer VAEs on Platonic datasets | Albedo, shading, shading-detail |
| GS-ID (Du et al., 2024) | Intrinsic-diffusion, SG lights | Albedo, material params, normal, occlusion, env-map, SG lights |
| GaRe (Bai et al., 28 Jul 2025) | MLPs on 3DGS, region losses | Reflectance, sun, sky, indirect, visibility |
| LuxRemix (Liang et al., 21 Jan 2026) | Diffusion transformers, multi-view U-Nets | Per-light and ambient HDR layers in 3DGS |
| TransLight (Li et al., 20 Aug 2025) | Two diffusion-based U-Nets | Content, light effect |
| IBL-NeRF (Choi et al., 2022) | Neural “images” via NeRF MLP | Albedo, normal, roughness, irradiance, spec radiance |
These exemplify the spectrum of recent generative light decomposition methodologies, spanning 2D, multi-view, and 3D scene contexts, and leveraging generative priors for flexible, interpretable, and physically-consistent separation of image-based illumination and content channels.