LightHarmony3D: Consistent 3D Object Insertion

Updated 3 July 2026

LightHarmony3D is a framework that harmonizes illumination by combining generative HDR environment map estimation with physically based rendering for realistic 3D object insertion.
It fuses multi-view 3D Gaussian Splatting reconstructions with geometry-aware shadow compositing to achieve seamless augmented reality and virtual staging applications.
The system achieves state-of-the-art performance on benchmarks, demonstrating improved PSNR, SSIM, and LPIPS metrics along with robust multi-view coherence and light transport accuracy.

LightHarmony3D is a framework for physically consistent object insertion into 3D Gaussian Splatting (3DGS) reconstructed scenes, capable of harmonizing illumination and shadows for augmented reality, virtual staging, and digital content creation. It innovates by combining generative HDR environment map estimation with physically based rendering (PBR) and geometry-aware shadow compositing, addressing long-standing challenges of achieving multi-view coherence and faithful light transport when integrating external mesh objects into reconstructed environments (Huang et al., 31 Mar 2026).

1. System Pipeline and Workflow

The LightHarmony3D pipeline comprises four primary stages that connect 3DGS scene reconstructions to physically plausible novel-view synthesis with inserted 3D objects and their cast shadows:

3DGS Reconstruction and Surface Extraction: The system operates on posed multi-view image datasets, utilizing the MILo algorithm to simultaneously optimize view-dependent Gaussian parameters and extract an explicit, textured triangle mesh. The output is a watertight mesh $M$ and dense 3DGS point clouds.
Panoramic Radiance Synthesis via Generative Underexposure: At the desired mesh insertion site, six mutually orthogonal LDR views are rendered and stitched into a base (EV$_0$) equirectangular panorama. This panorama is processed by a fine-tuned latent diffusion model (Flux.1 Kontext) to synthesize bracketed, underexposed panoramas (e.g., EV$_{-3}$, EV$_{-6}$) that capture high-intensity, localized illumination.
HDR Environment Map via Exposure Fusion: LDR panoramas at different exposure values are converted to linear radiance $E_i(p)$ , then fused iteratively (from darkest to base-exposure) for HDR recovery:

$\tilde{L}_i(p) = \mathbf{w}^\top I_i(p)^\gamma,\quad E_i(p)=\tilde{L}_i(p) \cdot 2^{-\mathrm{EV}_i},$

$E'_i(p)= \begin{cases} E'_{i-1}(p), & \tilde{L}_i(p) > 0.9, \\ E_i(p), & \text{otherwise}, \end{cases} \quad \mathrm{HDR}(p)=\frac{E'_N(p)}{\tilde{L}_N(p)}\,I_N(p)^\gamma.$
Physically Based Rendering and Compositing: Mesh-object illumination and shadowing are generated using ray-decoupled visibility—making the reconstructed mesh transparent to camera rays but opaque to shadow rays:

$\mathcal I(\omega)= \begin{cases} 1,&\tau(\omega)\in\{\mathrm{Shadow},\mathrm{Diffuse}\}, \\ 0,&\tau(\omega)\in\{\mathrm{Camera},\mathrm{Glossy},\ldots\}. \end{cases}$

Synthetic renders (background only $R_0$, object plus background $R_1$, and object only) yield a per-channel shadow ratio, which is then shaped before compositing.

The final composite is formed in linear space, followed by gamma correction to sRGB.
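
The exposure-fusion and shadow-compositing stages can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The fusion function follows the stage 3 equations, with two assumptions: the weight vector $\mathbf{w}$ is taken to be the Rec. 709 luminance weights, and the darkest bracket initialises $E'$. The compositing function is more speculative: it assumes the shadow ratio is $R_1/R_0$ clamped to $[0,1]$ and that the object-only render is alpha-blended over the shadowed background; the shaping step is omitted because its form is not given here. The names `fuse_exposures`, `composite_with_shadow`, `obj_rgb`, and `obj_alpha` are hypothetical.

```python
import numpy as np

# Rec. 709 luminance weights; the source only names a weight vector w,
# so this particular choice is an assumption.
LUMA_W = np.array([0.2126, 0.7152, 0.0722])


def fuse_exposures(ldr_stack, evs, gamma=2.2, sat=0.9, eps=1e-6):
    """Fuse bracketed LDR panoramas into one HDR map.

    ldr_stack: list of (H, W, 3) float arrays in [0, 1], darkest first, base last.
    evs: matching exposure values, e.g. [-6, -3, 0].
    """
    fused = None
    for img, ev in zip(ldr_stack, evs):
        lin = np.power(np.asarray(img, dtype=float), gamma)  # I_i^gamma
        lum = lin @ LUMA_W                                   # L~_i = w^T I_i^gamma
        rad = lum * 2.0 ** (-ev)                             # E_i = L~_i * 2^(-EV_i)
        if fused is None:
            fused = rad  # assumption: the darkest bracket initialises E'
        else:
            # Keep the darker bracket's estimate where this bracket is saturated.
            fused = np.where(lum > sat, fused, rad)
    # lin and lum now hold the base exposure; rescale its colour to the fused radiance.
    return (fused / np.maximum(lum, eps))[..., None] * lin


def composite_with_shadow(background, r0, r1, obj_rgb, obj_alpha, eps=1e-6):
    """Assumed compositing form, all inputs in linear RGB.

    background: 3DGS render (H, W, 3); r0 / r1: path-traced renders without / with
    the object; obj_rgb, obj_alpha: object-only render and its (H, W) coverage mask.
    """
    ratio = np.clip(r1 / np.maximum(r0, eps), 0.0, 1.0)  # per-channel shadow ratio
    a = obj_alpha[..., None]
    return a * obj_rgb + (1.0 - a) * background * ratio
```

Applying the ratio to the 3DGS render, rather than using the path-traced background directly, is what lets the composite keep the high-frequency detail of the splatted image while still darkening it where the object blocks light.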

2. Generative HDR Lighting Module (GenEnvLighting)

The cornerstone of illumination harmonization in LightHarmony3D is GenEnvLighting—a latent diffusion-based generative model designed to predict full 360° HDR environment maps at insertion sites using a single forward pass. Key characteristics:

Conditioning: Takes an encoded base LDR panorama (EV$_0$) and a fixed text prompt (“underexpose scene” in DreamBooth style).
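Architecture: Builds on a pre-trained Flux.1 Kontext backbone (a transformer-based latent diffusion model), augmented with Low-Rank Adaptation (LoRA) modules in cross-attention and MLP layers. In its standard form, LoRA keeps a pre-trained weight matrix $W_0$ frozen and learns a low-rank update:

$W = W_0 + \Delta W = W_0 + BA,\quad B\in\mathbb{R}^{d\times r},\ A\in\mathbb{R}^{r\times k},\ r\ll\min(d,k).$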

Generative Priors: The model is trained to radiometrically truncate diffuse radiance and enhance specular emitters, isolating illumination sources without explicit semantic segmentation.
Output: Delivers a cascade of underexposed latents, enabling accurate HDR fusion.
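
The low-rank adaptation idea can be sketched in a few lines of NumPy. This is a generic illustration of standard LoRA, not the GenEnvLighting training code: the class name `LoRALinear`, the `alpha / rank` scaling, and the zero initialisation of the up-projection are common conventions assumed here, not details taken from the paper.

```python
import numpy as np


class LoRALinear:
    """Linear layer with a frozen weight W0 and a trainable low-rank update B @ A."""

    def __init__(self, w0, rank, alpha=1.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = w0.shape
        self.w0 = w0                                  # frozen pre-trained weight
        self.a = rng.normal(0.0, 0.01, (rank, d_in))  # trainable down-projection A
        self.b = np.zeros((d_out, rank))              # trainable up-projection B, zero at start
        self.scale = alpha / rank                     # assumed scaling convention

    def effective_weight(self):
        # W = W0 + scale * B A
        return self.w0 + self.scale * (self.b @ self.a)

    def num_trainable(self):
        # Only A and B are trained: rank * (d_in + d_out) parameters.
        return self.a.size + self.b.size

    def __call__(self, x):
        # x: (batch, d_in). The low-rank path avoids forming B @ A explicitly.
        return x @ self.w0.T + self.scale * (x @ self.a.T) @ self.b.T
```

Because `b` starts at zero, the adapted layer initially reproduces the pre-trained layer exactly, and fine-tuning only has to learn the comparatively small matrices `a` and `b`.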

This framework contrasts with iterative, per-insertion-location inverse rendering schemes: illumination is inferred efficiently in a single pass, and because every view is rendered under the same environment map, lighting stays temporally and spatially consistent.

3. Multi-View Consistency, PBR, and Shadow Integration

LightHarmony3D enforces multi-view harmonization through both its generative lighting approach and explicit rendering pipeline:

Single-Pass 360° Illumination: Because GenEnvLighting produces one 360° environment map per insertion site, every view of an inserted mesh is shaded under the same lighting, which eliminates view-to-view appearance drift.
Ray-Decoupled PBR: Physically based shading leverages a modified path tracer (e.g., Blender Cycles) where the scene mesh is transparent to camera rays (preserving visual context) but fully participates in shadow occlusion, ensuring correct contact shadows for inserted geometry.
Linear-Space Shadow Ratio Compositing: Shadow modulation occurs explicitly per channel and per pixel, refining shadow edges and depth while compositing onto high-frequency 3DGS imagery to conserve photorealistic details.
Geometry-Aware Shadowing: The system maintains hybrid Gaussian–mesh scene representations, extracting detail from 3DGS while using mesh proxies for accurate light transport and occlusion effects.
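
The ray-decoupled visibility rule can be written as a small lookup. The sketch below is only an illustration of the indicator $\mathcal I(\omega)$: the enum lists the four ray types named in the text (any further types are left out because they are not specified), and the names `RayType` and `scene_mesh_indicator` are hypothetical.

```python
from enum import Enum


class RayType(Enum):
    CAMERA = "camera"
    GLOSSY = "glossy"
    DIFFUSE = "diffuse"
    SHADOW = "shadow"


# Ray types that are allowed to intersect the reconstructed scene mesh.
SCENE_MESH_VISIBLE_TO = frozenset({RayType.SHADOW, RayType.DIFFUSE})


def scene_mesh_indicator(ray_type: RayType) -> int:
    """Return 1 if the scene mesh participates for this ray type, else 0."""
    return 1 if ray_type in SCENE_MESH_VISIBLE_TO else 0
```

In a path tracer such as Blender Cycles, a rule of this kind would typically be expressed through per-object ray-visibility settings rather than custom code, so that the scene mesh receives and casts shadows without appearing in the camera image.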

4. Quantitative Benchmarks and Comparative Results

LightHarmony3D introduces and evaluates on new benchmarks tailored for mesh insertion in 3DGS, using standardized metrics (PSNR↑, SSIM↑, LPIPS↓, and reference-free VQA):

Dataset	Method	PSNR↑	SSIM↑	LPIPS↓	VQA Pos.↑	VQA Neg.↓	VQA Ratio↑
LH3D-Ku (synthetic)	GIGS	17.33	0.698	0.334	-	-	-
LH3D-Ku (synthetic)	GaussianEditor	21.56	0.812	0.215	-	-	-
LH3D-Ku (synthetic)	MV-CoLight	15.99	0.747	0.256	-	-	-
LH3D-Ku (synthetic)	GaSLight	20.41	0.812	0.224	-	-	-
LH3D-Ku (synthetic)	LightHarmony3D	24.03	0.832	0.200	-	-	-
LH3D-Blender (synthetic)	GIGS	20.11	0.690	0.406	-	-	-
LH3D-Blender (synthetic)	GaussianEditor	20.68	0.696	0.357	-	-	-
LH3D-Blender (synthetic)	MV-CoLight	19.26	0.642	0.321	-	-	-
LH3D-Blender (synthetic)	GaSLight	19.17	0.626	0.399	-	-	-
LH3D-Blender (synthetic)	LightHarmony3D	23.99	0.744	0.335	-	-	-
Mip-NeRF360 (real)	3DGS	-	-	-	0.351	0.400	0.501
Mip-NeRF360 (real)	GaSLight	-	-	-	0.472	0.541	0.457
Mip-NeRF360 (real)	GIGS	-	-	-	0.261	0.276	0.576
Mip-NeRF360 (real)	MV-CoLight	-	-	-	0.339	0.421	0.387
Mip-NeRF360 (real)	LightHarmony3D	-	-	-	0.528	0.208	0.751

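LightHarmony3D attains the best PSNR and SSIM on both synthetic benchmarks and the best LPIPS on LH3D-Ku; on LH3D-Blender its LPIPS (0.335) is second to MV-CoLight (0.321). On the real Mip-NeRF360 scenes it obtains the highest VQA positive score (0.528), the lowest VQA negative score (0.208), and the highest perceptual-alignment ratio (0.751) (Huang et al., 31 Mar 2026).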
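
For readers less familiar with the PSNR column, the following sketch shows the usual definition, $10\log_{10}(\mathrm{MAX}^2/\mathrm{MSE})$. It is a generic helper and makes no claim about the exact evaluation script, image range, or masking used for the benchmarks; the function name `psnr` and the `max_val` default are assumptions.

```python
import numpy as np


def psnr(reference, estimate, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images with values in [0, max_val]."""
    ref = np.asarray(reference, dtype=float)
    est = np.asarray(estimate, dtype=float)
    mse = np.mean((ref - est) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

Because the scale is logarithmic, the 2.47 dB gap between LightHarmony3D (24.03) and GaussianEditor (21.56) on LH3D-Ku would correspond to roughly 1.8 times lower mean squared error if both values are read as single-image quantities.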

5. Integration with Harmonics Virtual Lights and SH-based Rendering

The LightHarmony3D pipeline readily supports advanced SH-based illumination mechanisms such as Harmonics Virtual Lights (HVL) (Mézières et al., 2022), which facilitate analytic, band-limited, closed-form indirect lighting evaluation in dynamic 3D scenes. HVLs allow efficient modeling of one-bounce global illumination compatible with both analytic and measured BRDFs, with computational complexity tunable via SH band limits. For rapid, artifact-free rendering of medium-frequency light transport, LightHarmony3D users can trade off SH bands for rendering quality with minimal implementation friction, supporting interactive and real-time editing scenarios.
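
The cost side of the band-limit trade-off can be made concrete with a small sketch. It does not implement Harmonics Virtual Lights; it only shows how the number of spherical-harmonic coefficients grows with the band limit and evaluates the real SH basis for bands 0 and 1. The function names `sh_coeff_count` and `sh_basis_l1` are hypothetical.

```python
import numpy as np


def sh_coeff_count(l_max):
    """Number of SH coefficients (per colour channel) for bands 0..l_max."""
    return (l_max + 1) ** 2


def sh_basis_l1(directions):
    """Real SH basis for bands 0 and 1 at unit directions of shape (..., 3).

    Returns an array of shape (..., 4) ordered as (Y_0^0, Y_1^-1, Y_1^0, Y_1^1).
    """
    d = np.asarray(directions, dtype=float)
    x, y, z = d[..., 0], d[..., 1], d[..., 2]
    c0 = 0.5 * np.sqrt(1.0 / np.pi)  # Y_0^0 is constant, about 0.2821
    c1 = 0.5 * np.sqrt(3.0 / np.pi)  # band-1 normalisation, about 0.4886
    return np.stack([np.full_like(x, c0), c1 * y, c1 * z, c1 * x], axis=-1)
```

A band limit of 2 needs 9 coefficients per colour channel, while a band limit of 4 needs 25, so storage and evaluation cost grow quadratically with the band limit.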

6. Qualitative Evaluation, User Studies, and Ablations

Qualitative outcomes reported for LightHarmony3D include multi-view consistent object insertions, physically plausible light direction, colored shadow reproduction, and stable appearance across camera trajectories. Comparative visualizations show correct highlight orientation, no compositing halos, and preservation of scene details under shadowing. An ablation study confirms the necessity of each system component: HDR fusion, shadow modulation, and ray-decoupled shading. User studies employing VQA metrics on real-capture datasets consistently favor LightHarmony3D over all baselines for perceived realism and coherence.

7. Relation to Contemporary and Prior Work

LightHarmony3D advances beyond previous 3DGS-based harmonization methods—such as GIGS, GaussianEditor, and GaSLight—by explicitly addressing HDR environment inference, PBR-integrated shadow synthesis, and robust multi-view compositing. Compared to transformer-based harmonization pipelines like MV-CoLight (Ren et al., 27 May 2025), which rely on learned lighting and shadow priors without direct physical light transport modeling, LightHarmony3D leverages a generative approach for global illumination and explicit PBR evaluation for inserted mesh geometry. The unified pipeline supports emerging applications in AR/VR, virtual staging, and machine-augmented content design, providing a standardized benchmark and evaluation protocol for this domain (Huang et al., 31 Mar 2026).