LightHarmony3D: Harmonizing Illumination and Shadows for Object Insertion in 3D Gaussian Splatting
Abstract: 3D Gaussian Splatting (3DGS) enables high-fidelity reconstruction of scene geometry and appearance. Building on this capability, inserting external mesh objects into reconstructed 3DGS scenes enables interactive editing and content augmentation for immersive applications such as AR/VR, virtual staging, and digital content creation. However, achieving physically consistent lighting and shadows for mesh insertion remains challenging, as it requires accurate scene illumination estimation and multi-view consistent rendering. To address this challenge, we present LightHarmony3D, a novel framework for illumination-consistent mesh insertion in 3DGS scenes. Central to our approach is our proposed generative module that predicts a full 360° HDR environment map at the insertion location via a single forward pass. By leveraging generative priors instead of iterative optimization, our method efficiently captures dominant scene illumination and enables physically grounded shading and shadows for inserted meshes while maintaining multi-view coherence. Furthermore, we introduce the first dedicated benchmark for mesh insertion in 3DGS, providing a standardized evaluation framework for assessing lighting consistency and photorealism. Extensive experiments across multiple real-world reconstruction datasets demonstrate that LightHarmony3D achieves state-of-the-art realism and multi-view consistency.
Explain it Like I'm 14
What is this paper about?
This paper shows a way to place a new 3D object into a captured 3D scene so it looks like it truly belongs there—same lighting, same shadows, and consistent from every camera angle. The method is called LightHarmony3D. It works with a popular scene format called “3D Gaussian Splatting” (think of a scene made of millions of tiny glowing dots that together look like a real place) and standard 3D “meshes” (the solid 3D models used in games and movies).
What questions did the researchers ask?
In simple terms, they asked:
- How can we figure out the real lighting in a 3D scene so a new object looks naturally lit?
- How can we make the object’s shadows fall correctly on the scene, without breaking the original look?
- How can we keep the object looking consistent from different viewpoints, not changing brightness or direction of light as the camera moves?
- Can we do all this quickly, without slow, heavy optimization for each scene?
How did they do it?
They built a pipeline with three main parts, plus a benchmark (part 4 below) to test it. Here’s the idea using everyday analogies.
1) Rebuilding the scene so physics can work
- Starting from multiple photos of a place, they reconstruct the scene in 3D using “3D Gaussian Splatting” and also extract a clean triangle mesh (like turning a point cloud into a solid shell).
- Why both? The Gaussian points look great and render fast, but the mesh gives clear surfaces for physical light and shadow simulation, like in modern video games.
2) Figuring out the scene’s lighting with a 360° “light dome”
- Imagine standing where you want to put the new object and looking around with a 360° camera. That panoramic image becomes a “light dome” that tells you from which directions light comes.
- But normal images can’t capture very bright lights well. So they train an AI (a “diffusion model,” basically a very smart image editor) to “underexpose” the panorama—like dimming the photo so only the brightest light sources remain visible.
- They generate several increasingly dim versions and then combine them with the original panorama to build a High Dynamic Range (HDR) map. HDR maps store both very bright and very dark light correctly—perfect for realistic shading and reflections.
- This whole step runs in a single quick pass of the AI model, rather than long, complicated optimization.
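The bracket-and-merge idea in step 2 can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names, the standard sRGB transfer curve, the choice of testing saturation on the brightest channel of the encoded bracket, and the bracket spacing are all assumptions; only the 0.9 threshold and the EV−6 lower bracket are mentioned elsewhere on this page.

```python
import numpy as np

def srgb_to_linear(img):
    """Inverse gamma correction: sRGB-encoded values in [0, 1] -> linear values."""
    img = np.asarray(img, dtype=np.float64)
    return np.where(img <= 0.04045, img / 12.92, ((img + 0.055) / 1.055) ** 2.4)

def linear_to_srgb(lin):
    """Clip linear values to [0, 1] and apply the sRGB transfer curve."""
    lin = np.clip(np.asarray(lin, dtype=np.float64), 0.0, 1.0)
    return np.where(lin <= 0.0031308, lin * 12.92, 1.055 * lin ** (1 / 2.4) - 0.055)

def simulate_exposure(hdr, ev):
    """Stand-in for a captured or generated bracket: scale by 2**ev, clip, encode."""
    return linear_to_srgb(np.asarray(hdr, dtype=np.float64) * 2.0 ** ev)

def fuse_brackets(brackets, evs, sat_threshold=0.9):
    """Merge LDR brackets into one linear HDR image.

    Each pixel keeps the value from the brightest bracket in which it is not
    saturated (assumed rule). Pixels saturated in every bracket keep the
    darkest bracket's estimate, which underestimates their true radiance.
    """
    order = np.argsort(evs)[::-1]  # brightest bracket (largest EV) first
    hdr, saturated = None, None
    for i in order:
        encoded = np.asarray(brackets[i], dtype=np.float64)
        radiance = srgb_to_linear(encoded) / 2.0 ** evs[i]  # undo the exposure scale
        is_sat = encoded.max(axis=-1, keepdims=True) >= sat_threshold
        if hdr is None:
            hdr, saturated = radiance, is_sat
        else:
            hdr = np.where(saturated, radiance, hdr)  # fill in clipped pixels only
            saturated = saturated & is_sat
    return hdr

# Two-pixel "panorama": a dim wall and a lamp 200x brighter.
hdr_true = np.array([[[0.2, 0.2, 0.2], [40.0, 40.0, 40.0]]])
evs = [0, -2, -4, -6]
brackets = [simulate_exposure(hdr_true, ev) for ev in evs]
fused = fuse_brackets(brackets, evs)
```

In this toy case the lamp pixel is clipped in the first three brackets and only recovered from the EV−6 one, which is the behaviour the "increasingly dim versions" are meant to provide.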
3) Making the object fit the light and cast the right shadows
- They use a physically based renderer (PBR), the same kind of lighting math behind today’s realistic games and movies, to light the new object with the HDR map.
- Indoor scenes are often “closed shells” (walls, ceilings). That can block outside light. To fix this, they use a clever trick: to the camera, the walls can be treated as see-through so light can “enter,” but to shadow rays they are solid so shadows still look right. Think of it as the walls being invisible only for the parts of the calculation that let light in.
- Finally, they compute a “shadow ratio map” (a per-pixel dimming mask), which slightly darkens the original scene where the new object’s shadow should fall. This keeps the original scene’s details and colors intact and avoids double shadows or washed-out areas.
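The shadow ratio compositing in step 3 can be illustrated with a short NumPy sketch. It assumes the ratio is formed by dividing a render of the receiver with the object by a render without it, which is a common way to build such a map but is not confirmed by this summary; the function names and the `eps` guard are hypothetical, and the paper's shaping parameters (γ, s_min, λ) are left out because their formulas are not given here.

```python
import numpy as np

def srgb_to_linear(img):
    """Inverse gamma correction: sRGB-encoded values in [0, 1] -> linear values."""
    img = np.asarray(img, dtype=np.float64)
    return np.where(img <= 0.04045, img / 12.92, ((img + 0.055) / 1.055) ** 2.4)

def linear_to_srgb(lin):
    """Clip linear values to [0, 1] and apply the sRGB transfer curve."""
    lin = np.clip(np.asarray(lin, dtype=np.float64), 0.0, 1.0)
    return np.where(lin <= 0.0031308, lin * 12.92, 1.055 * lin ** (1 / 2.4) - 0.055)

def shadow_ratio(with_obj, without_obj, eps=1e-6):
    """Per-pixel, per-channel attenuation of the receiver caused by the object.

    Both inputs are linear PBR renders of the scene mesh (assumed setup).
    Clipping to [0, 1] means the map can only darken, never brighten.
    """
    with_obj = np.asarray(with_obj, dtype=np.float64)
    without_obj = np.asarray(without_obj, dtype=np.float64)
    return np.clip(with_obj / np.maximum(without_obj, eps), 0.0, 1.0)

def composite_shadow(background_srgb, ratio):
    """Darken an sRGB background render by the ratio, working in linear space."""
    return linear_to_srgb(srgb_to_linear(background_srgb) * ratio)

# 2x2 toy example: one pixel falls in shadow, one would have been brightened.
without = np.full((2, 2, 3), 0.4)
with_obj = without.copy()
with_obj[0, 1] = 0.2   # shadowed: ratio 0.5
with_obj[1, 1] = 0.6   # brightened by bounce light: ratio clipped to 1.0
ratio = shadow_ratio(with_obj, without)
background = np.full((2, 2, 3), 0.5)
result = composite_shadow(background, ratio)
```

Because the background is multiplied rather than replaced, unshadowed pixels keep their original values. The clipped pixel also shows why this formulation can only darken: any brightening from the inserted object is discarded.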
4) Building a fair test
- They also created a new benchmark dataset with ground-truth images (with and without the inserted object) so they can measure how realistic and consistent the results are. This is important because existing datasets didn’t have the exact pairs needed to test shadows and lighting accurately.
What did they find?
In tests on both synthetic and real scenes, LightHarmony3D:
- Produced more realistic lighting and shadows for inserted objects than other methods.
- Kept the object’s appearance consistent across many viewpoints (no flickering or changing light directions).
- Scored higher on common image quality metrics and on a vision-language “realism” score that checks if the result looks natural without needing a ground-truth reference.
They also ran ablation studies (turning off pieces of their system) and showed each part—HDR fusion, the special visibility trick, and the shadow ratio compositing—was needed for the best results.
Why does this matter?
- Better AR/VR and virtual staging: You can drop a virtual chair into a real room and it will look like it truly belongs there.
- Faster creative workflows: Because the lighting is estimated with a single AI pass, it’s more efficient than older methods that need heavy per-scene optimization.
- Consistent results: Multi-view coherence is crucial for 3D experiences. Users can walk around a scene and the object will keep looking right.
Final takeaway
LightHarmony3D combines three strengths—accurate scene rebuilding, smart AI-based lighting estimation, and physically based shadowing—to make inserted 3D objects look naturally lit and grounded. It’s a step toward simpler, faster, and more reliable 3D editing for games, films, AR/VR, and digital content creation. The authors note that results still depend on how good the scene reconstruction is, and some complex lighting situations remain challenging, but the approach already sets a strong new standard.
Knowledge Gaps
Knowledge gaps, limitations, and open questions
Below is a consolidated list of concrete gaps that remain unresolved and can guide future work:
- Absolute radiometric calibration: No mechanism to align the HDR environment map’s absolute intensity with the 3DGS scene’s (often non-linear) radiometry; effects of exposure/white-balance mismatch between 3DGS renders and Blender’s PBR pipeline are unquantified and uncontrolled.
- Additive light transport is missing: The multiplicative, per-channel shadow ratio is clipped to [0,1], preventing positive deltas such as indirect brightening, color bleeding from the inserted object, specular glints on receivers, or fill light—only darkening (attenuation) is injected.
- Specular/reflective and transparent interactions are not modeled: The receiver is treated effectively as diffuse; mirror/glossy reflections of the inserted object, transmission, and caustics on scene surfaces are not composited back into the 3DGS background.
- Volumetric and participating media are unsupported: Shadows and scattering in fog/smoke, subsurface transport, or translucent materials are not handled in either HDR estimation or compositing.
- Near-field and non-environment lighting: The environment map approximation cannot capture localized luminaires (e.g., lamps, LEDs) and near-field shadow penumbrae; no mechanism exists to recover or instantiate explicit area lights.
- Spatially varying illumination fields: A single environment map at the insertion point cannot represent strong spatial lighting variation across large scenes or for moving objects; no strategy for multi-probe estimation, interpolation, or consistency constraints across space.
- Temporal/dynamic lighting: Illumination is assumed static; there is no approach for handling time-varying lights or dynamic scenes, nor for updating HDR maps on-the-fly with stability guarantees.
- Geometry dependence and uncertainty: The approach relies on MILo mesh accuracy; there is no uncertainty-aware shadow compositing, geometry refinement during insertion, or robustness to missing/thin structures and misalignments at boundaries.
- Occlusion fidelity limits: While occlusion by the scene mesh is accounted for in rendering, thin/fine structures (foliage, wires) not captured by the proxy mesh can cause haloing or shadow leakage; no method is provided to reconcile fine 3DGS alpha/opacity with mesh-based PBR for shadows.
- Ray-decoupled shader generality: The custom shader’s physical validity, energy conservation, and portability across renderers are not analyzed; failure modes under diverse path types (e.g., glossy, transmission) remain unexplored.
- Generative illumination priors and data bias: The diffusion model is fine-tuned on ~800 PolyHaven HDRs with synthetic underexposure supervision; generalization to out-of-distribution conditions (e.g., neon, multi-colored LEDs, extreme HDR ranges, complex indoor emissives) is untested.
- Uncertainty and confidence in lighting predictions: The method lacks uncertainty estimates for HDR map predictions, confidence-aware fusion, or mechanisms to fall back to optimization when generative predictions are unreliable.
- Scale and color consistency across engines: There is no explicit color-management pipeline ensuring consistent tone curves and color spaces between 3DGS (often non-physically based sRGB) and Blender’s linear/PBR workflow.
- Evaluation limitations: The new benchmark is synthetic and relatively small; there are no real-world, measured ground-truth insertions (e.g., with calibrated HDR probes) and no task-specific metrics for shadow realism, shadow directionality, or interreflection accuracy.
- Runtime and interactivity: Path tracing introduces notable cost; real-time or near-real-time insertion for AR/VR use is not demonstrated, and scalability to high-resolution multi-view sequences is not characterized.
- Material realism and harmonization: Inserted meshes require known PBR materials; there is no method to estimate or adapt object BRDFs to match scene lighting and camera response, nor to decompose the receiver’s albedo from baked shading in 3DGS-derived textures.
- Handling extreme dynamic range: The exposure fusion uses a fixed bracket (e.g., down to EV−6) and fixed thresholds; behavior under ultra-bright light sources (sun, stage lights) and adaptive bracketing strategies are not explored.
- Depth-of-field and motion blur: Compositing does not account for scene camera DOF or motion blur mismatch between 3DGS renders and PBR inserts; temporal artifacts and edge consistency under DOF are not addressed.
- Multi-object interactions beyond shadows: While multiple objects can be inserted, mutual interreflections and complex light exchanges among inserted objects are not injected back into the 3DGS background (again limited by multiplicative ratio).
- Occluder visibility through semi-transparent media: Cases where an object sits behind glass or within refractive media (and should appear with correct distortions and Fresnel effects) are not supported by the current compositing strategy.
- Probe placement and selection: The method assumes a user-specified insertion location; there is no automatic strategy to choose optimal probe positions, density, or orientation to minimize lighting error for large or articulated objects.
- Robustness to sparse/limited reconstruction: Performance when 3DGS training views are sparse or biased (e.g., limited coverage, strong speculars, miscalibrated cameras) is not quantified; failure modes and mitigation (e.g., lighting regularizers) are open.
- Integration with 3DGS updates: Insertion does not update the Gaussian field; how to recondition Gaussians (colors, opacities) to remain consistent with new light transport or to avoid conflicts with view-dependent effects is left unexplored.
- Automatic parameter tuning: Shadow shaping parameters (γ, s_min, λ) and luminance thresholds (e.g., 0.9) are fixed; there is no learning-based or data-driven approach to adapt them per scene to avoid under/over-shadowing.
- Multi-view, multi-location consistency: When multiple insertion points are used, there is no joint optimization ensuring HDR maps are mutually consistent with a global lighting model; cross-probe coherence and constraints are absent.
- Failure diagnosis and editing tools: The system lacks diagnostic feedback (e.g., predicted light directions/intensities) and interactive controls to correct mispredicted light sources or fine-tune environment components with user guidance.
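The extreme-dynamic-range item in the list above can be made concrete with a little arithmetic. This is a sketch under an idealized model, assuming a bracket at exposure value $-k$ scales linear radiance $L$ by $2^{-k}$ and that a pixel counts as usable when the scaled value stays below a linear-domain threshold $\tau$; the symbols $L$, $k$, and $\tau$ are illustrative and not the paper's notation.

```latex
\[
  L \cdot 2^{-k} < \tau
  \quad\Longleftrightarrow\quad
  L < \tau \cdot 2^{k},
  \qquad
  k = 6,\ \tau = 0.9
  \ \Rightarrow\
  L < 0.9 \times 64 = 57.6 .
\]
```

Under these assumptions, a source brighter than roughly 57.6 times the reference white level would still be clipped in the darkest bracket, which is one way to read why adaptive bracketing is listed as an open question.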
Practical Applications
Overview
LightHarmony3D introduces a practical pipeline to insert explicit mesh objects into 3D Gaussian Splatting (3DGS) scenes with physically consistent shading and shadows. Key innovations include:
- GenEnvLighting: a diffusion-based module that predicts 360° HDR environment maps at the insertion site via exposure-bracketed underexposure synthesis and HDR fusion.
- A hybrid Gaussian–mesh reconstruction (via MILo) to provide explicit geometry for light transport and visibility.
- Ray-decoupled visibility shaders that let environment light penetrate enclosed reconstructions while preserving correct shadow reception.
- PBR-guided, linear-color “shadow ratio” compositing that injects physically plausible, colored cast shadows into 3DGS renderings.
- LH3D-Bench: the first dedicated benchmark for mesh insertion in 3DGS, with tools (e.g., Blender-to-COLMAP export) for reproducible evaluation.
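The ray-decoupled visibility idea in the list above can be written as a single blend, following the glossary's description of a linear interpolation between an opaque material and a transparent medium. The notation below ($f_{\mathrm{eff}}$, $\tau$, $r$) is assumed for illustration, since the paper's own symbols did not survive extraction, and the assignment of ray types follows the examples given on this page.

```latex
\[
  f_{\mathrm{eff}}(\mathbf{x}, r)
  = \bigl(1 - \tau(r)\bigr)\, f_{\mathrm{opaque}}(\mathbf{x})
  + \tau(r)\, f_{\mathrm{transparent}},
  \qquad
  \tau(r) =
  \begin{cases}
    1, & r \text{ is treated as see-through (e.g., camera rays)} \\
    0, & r \text{ sees solid geometry (e.g., shadow rays)}
  \end{cases}
\]
```

With a binary $\tau$ the scene mesh never needs to be cut open: the same surface point $\mathbf{x}$ simply answers differently depending on which kind of ray asks.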
Below are concrete, real-world applications grouped by immediacy, with sectors, outputs, and dependencies spelled out.
Immediate Applications
The following can be deployed now with current desktop/cloud pipelines and standard DCC tools (e.g., Blender Cycles), given sufficient multi-view captures and compute.
- Virtual staging for real estate and interior design (Real estate, Architecture, E‑commerce furniture)
- What it enables: Photorealistic placement of furniture/fixtures into room scans with view-consistent lighting and cast shadows; improved buyer visualization and conversion.
- Tools/workflow: Phone/camera sweep → 3DGS+mesh via MILo → GenEnvLighting HDR → PBR render object + shadow ratio compositing → export images/tours.
- Assumptions/dependencies: Adequate multi-view coverage; reasonably accurate mesh extraction; GPU for PBR renders; static scene.
- VFX previs and post for set extensions and props (Film/TV/VFX, Virtual production)
- What it enables: Fast, physically plausible CG prop previews inside photogrammetry/3DGS set scans; consistent shadows without full inverse rendering.
- Tools/workflow: On-set scan or photogrammetry → 3DGS reconstruction → HDR env estimation → PBR renders composited over 3DGS plates.
- Assumptions/dependencies: Good geometry in regions receiving shadows; manageable path-tracing times; static or slow-changing lighting.
- Product visualization in user environments (E‑commerce, Advertising)
- What it enables: Brand assets inserted into customer-captured spaces (e.g., appliances, decor) with realistic lighting for campaign creatives or configurators.
- Tools/workflow: Customer captures a short sweep → cloud 3DGS reconstruction → single-pass HDR → batch multi-view renders for galleries/AR previews.
- Assumptions/dependencies: Cloud GPUs; privacy-safe media handling; consistent capture quality across customers.
- Scalable synthetic data with realistic shadows for vision tasks (Robotics, CV/ML)
- What it enables: Generation of labeled multi-view data where inserted objects cast correct, colored shadows and interact with scene occlusions—beneficial for detection, segmentation, and shadow-aware perception.
- Tools/workflow: Curate scans → batch insertion of varied assets/materials → render multi-view frames and annotations (masks, depth, normals).
- Assumptions/dependencies: Static scenes; domain fit between synthetic lighting distributions and target tasks.
- Interactive AR/VR scene editing (XR content creation, Games/UGC)
- What it enables: Offline or near-real-time placement of assets in captured spaces for machinima, XR experiences, and UGC with consistent multi-view appearance.
- Tools/workflow: Desktop tool or cloud backend using the LightHarmony3D pipeline; export frames or precomputed textures for XR engines.
- Assumptions/dependencies: Latency acceptable for offline or near-real-time edits; PBR asset materials available.
- Museum/cultural heritage reconstructions (Cultural heritage, Education)
- What it enables: Inserting digitized artifacts into scanned rooms/galleries with lighting faithful to the venue for exhibits and scholarship.
- Tools/workflow: Scan venue → 3DGS+mesh → insert artifact meshes → HDR-guided compositing → educational renders/VR tours.
- Assumptions/dependencies: High-fidelity captures (controlled access); static exhibit lighting.
- Lighting/compositing education (Education, Training)
- What it enables: Teaching physically based lighting, HDR, and compositing using a controlled pipeline where HDR estimation and shadows are explainable and tunable.
- Tools/workflow: Course exercises leveraging LH3D-Bench scenes and the shadow-ratio parameters (γ, s_min, λ) for hands-on learning.
- Assumptions/dependencies: Access to GPUs; familiarity with Blender/Cycles or equivalent.
- R&D benchmarking and reproducibility (Academia, Industry research)
- What it enables: Objective comparison of insertion methods with pixel-accurate multi-view ground truth and a reference-free VQA protocol.
- Tools/workflow: Use LH3D-Bench, Blender-to-COLMAP export, and VQAScore prompts; integrate into CI pipelines for model evaluation.
- Assumptions/dependencies: Community adoption; consistent metric reporting; release of code/checkpoints.
- Asset pipelines and DCC integrations (Software tools)
- What it enables: Plugins for Blender or DCCs that wrap GenEnvLighting, ray-decoupled shaders, and shadow-ratio compositing into an “Insert Object into 3DGS” operator.
- Tools/workflow: Add-ons that import 3DGS reconstructions (via MILo/point cloud), run HDR generation, and produce composites in one panel.
- Assumptions/dependencies: Access to MILo-like hybrid recon; path tracer with ray-type hooks (Cycles, Mitsuba).
Long-Term Applications
These require additional research, engine support, or system integration (e.g., real-time constraints, dynamic scenes, sparse capture).
- Real-time mobile AR with lighting-consistent insertions (AR consumers, Retail)
- What it enables: On-device or edge-assisted 3DGS reconstruction and single-pass HDR prediction to render insertions with correct shadows live.
- Tools/workflow: 3DGS on-device + fast HDR estimation + real-time PBR (mobile RT cores) + lightweight ray-decoupled visibility in engine.
- Assumptions/dependencies: Mobile hardware acceleration; real-time ray-type control in AR engines; robust results from sparse/quick captures.
- Game engine integration for live neural scene editing (Games, XR engines)
- What it enables: Unity/Unreal plugins that import neural reconstructions and support asset insertion with ray-decoupled visibility and HDR lighting at runtime.
- Tools/workflow: Neural scene renderer interop (3DGS/NeRF proxies), engine ray/path tracing APIs, GPU-resident HDR predictors.
- Assumptions/dependencies: Mature neural rendering components in engines; real-time PBR with consistent ray semantics; memory budgets.
- On-set virtual production and LED volume control (Film/TV)
- What it enables: Instant HDR estimation from a scan location to drive both CG insertions and LED wall lighting for tighter CG–plate match.
- Tools/workflow: Fast scan/update → HDR feeds → bidirectional sync between CG renders and stage lighting.
- Assumptions/dependencies: Low-latency pipelines; calibration between estimated HDR and physical light rigs; dynamic scene adaptation.
- Autonomous robotics training-in-the-loop (Robotics, Simulation)
- What it enables: Capture, augment, and retrain cycles in which objects/tools are inserted into workplace scans with faithful shadows and occlusions; this improves robustness to real lighting.
- Tools/workflow: Automated capture stations, cloud training, domain randomization over asset materials and HDR cues.
- Assumptions/dependencies: Automated quality checks on geometry; efficient batching; alignment with robot sensor models.
- Scene relighting and editing beyond insertion (Software, Creative tools)
- What it enables: Using GenEnvLighting and HDR fusion as a building block for global scene relighting, light editing, and interactive what-if illumination design.
- Tools/workflow: Extend pipeline to modulate inferred HDR, re-render 3DGS scenes with proxy meshes as shadow catchers.
- Assumptions/dependencies: Stable coupling between implicit Gaussians and explicit PBR render for global edits; better handling of complex indirect light.
- Dynamic scenes and deformables (XR, VFX)
- What it enables: Insert moving/deforming assets (e.g., characters) with time-consistent shadows and lighting as the camera moves.
- Tools/workflow: Temporal HDR stabilization, per-frame ray-decoupled visibility, motion-aware compositing.
- Assumptions/dependencies: Temporal consistency of 3DGS and mesh proxies; efficient multi-frame rendering; motion blur support.
- Web-scale “insertion-as-a-service” platforms (Cloud services, Marketplaces)
- What it enables: APIs that accept a short video sweep plus a mesh and return multi-view images/videos with consistent lighting; marketplaces for staging.
- Tools/workflow: Managed 3DGS training, HDR estimation, and PBR rendering at scale; cost-optimized GPU orchestration.
- Assumptions/dependencies: Robust batching and QA; handling diverse capture quality; content rights management.
- Standards and policy for synthetic content disclosure (Policy, Trust & Safety)
- What it enables: Benchmarks and metrics (e.g., LH3D-Bench + VQAScore) inform standards for evaluating photorealistic insertions; guidelines for disclosure/watermarking where photorealism can affect consumer decisions (ads/real estate).
- Tools/workflow: Metric suites, dataset curation guidelines, model cards detailing failure cases (e.g., sparse capture, indirect lighting).
- Assumptions/dependencies: Multi-stakeholder adoption; integration with provenance (C2PA) and watermarking schemes.
Key Dependencies and Assumptions (Cross-cutting)
- Multi-view capture quality: Coverage and parallax heavily influence 3DGS quality and mesh accuracy; thin structures and reflective surfaces are challenging.
- Static scenes: Pipeline assumes static geometry/lighting during capture; dynamic content needs temporal extensions.
- PBR materials for assets: Inserted meshes must have reasonable BRDFs; mismatched materials limit realism.
- Compute budget: HDR generation is fast, but PBR rendering and high-res multi-view outputs require GPUs; real-time variants need hardware ray tracing and optimized shaders.
- Engine support for ray types: Ray-decoupled visibility relies on path tracer ray-type controls; not all engines expose equivalent APIs.
- Illumination priors and dataset fit: GenEnvLighting is fine-tuned on ~800 HDRs; rare lighting setups or extreme indirect lighting may reduce accuracy.
- Enclosed scene handling: Ray-decoupled visibility avoids topology edits but requires careful shader implementation to prevent light leaks or double counting.
By combining fast HDR estimation, hybrid geometry, and physically guided compositing, LightHarmony3D lowers the barrier for realistic object insertions in reconstructed spaces, enabling immediate offline workflows and charting a clear path toward real-time, engine-integrated solutions.
Glossary
- 3D Gaussian Splatting (3DGS): An explicit point-based scene representation using anisotropic Gaussians that enables fast, high-fidelity view synthesis and reconstruction. "3D Gaussian Splatting (3DGS) enables high-fidelity reconstruction of scene geometry and appearance."
- Alpha compositing: A technique for combining images using per-pixel opacity (alpha) to blend foreground over background. "Eliminating the linear-space shadow-ratio formulation and relying on direct alpha-compositing of the raw PBR shadow mask causes the resulting shadows to appear unphysically dark..."
- Bidirectional Scattering Distribution Function (BSDF): A function describing how light is reflected, transmitted, or absorbed at a surface for given incoming and outgoing directions. "For any surface point on the reconstructed scene mesh, the effective Bidirectional Scattering Distribution Function (BSDF), denoted as , is computed as a linear interpolation between a physically-based opaque material and a perfectly transparent medium, governed by the incoming ray type :"
- Delaunay triangulation: A meshing method that maximizes the minimum angle of triangles, often used to create well-shaped meshes from point sets. "Recent differentiable frameworks like MILo~\cite{guedon_milo_2025} integrate meshing directly into the 3DGS optimization loop using Delaunay triangulation."
- DreamBooth: A fine-tuning technique for diffusion models to learn subject- or concept-specific transformations. "We therefore adopt a DreamBooth-style training strategy~\cite{ruiz2023dreambooth} combined with LoRA (Low-Rank Adaptation)~\cite{hu2022lora}."
- Equirectangular panorama: A mapping of spherical imagery to a 2D rectangular image where latitude and longitude correspond to vertical and horizontal coordinates. "These views are subsequently stitched into a unified equirectangular panorama."
- EXR panorama: A high-dynamic-range image stored in the OpenEXR format capable of representing wide luminance ranges for lighting. "The resulting 32-bit floating-point EXR panorama robustly encodes both ambient illumination and extremely high-intensity light sources, yielding the wide radiometric range essential for accurate physically based shadow casting."
- HDR environment map: A high dynamic range spherical map capturing scene illumination from all directions for image-based lighting. "A hybrid Gaussian-mesh representation captures scene structure, diffusion-based panorama prediction recovers dominant lighting as an HDR environment map..."
- Image-based lighting (IBL): Rendering technique that uses captured or synthesized environment maps to light scenes. "A fundamental challenge in applying image-based lighting (IBL) to reconstructed scenes is the topological closure of the extracted geometry."
- Inverse gamma correction: The process of converting gamma-encoded images (e.g., sRGB) back to a linear color space. "Where performs inverse gamma correction to linearize the sRGB inputs,"
- Inverse rendering: Estimating scene properties (geometry, materials, illumination) from images by “inverting” the rendering process. "Existing approaches attempt to address this either through inverse rendering~\cite{gao_relightable_2024, chen_gi-gs_2025}..."
- Isosurfacing: Extracting a surface from a scalar field by finding points of equal value (isosurface), commonly used to derive meshes. "Other works attempt to extract meshes post-optimization~\cite{guedon2024sugar, yu2024gaussian}, yet naive isosurfacing frequently introduces structural artifacts in thin or high-frequency regions."
- Latent diffusion model: A diffusion model operating in a learned latent space rather than pixel space for efficiency and quality. "We fine-tune a latent diffusion model to map standard-exposure panoramas to a simulated reduced exposure regime (e.g., )."
- LoRA (Low-Rank Adaptation): A parameter-efficient fine-tuning method that adapts large models using low-rank updates. "We therefore adopt a DreamBooth-style training strategy~\cite{ruiz2023dreambooth} combined with LoRA (Low-Rank Adaptation)~\cite{hu2022lora}."
- NeRF (Neural Radiance Fields): A neural scene representation that models volumetric density and radiance to render novel views via volume rendering. "Neural Radiance Fields (NeRFs)~\cite{mildenhall2021nerf} pioneered continuous volumetric scene representations"
- Path tracer: A physically based renderer that simulates light transport by tracing stochastic light paths. "Given the recovered HDR environment map, we render both the reconstructed receiver mesh and the virtual object using a path tracer (Blender Cycles~\cite{blender})."
- Physically based rendering (PBR): Rendering that adheres to physical laws of light transport and material response for realism. "Leveraging a hybrid Gaussian–mesh representation~\cite{guedon_milo_2025}, the scene is rendered in a physically based rendering (PBR) engine~\cite{shirley2009fundamentals} to compute a shadow ratio map..."
- Radiant exitance: The total radiant power leaving a surface (emitted or reflected) per unit area. "Our objective is to generate an underexposed representation where surfaces with lower radiant exitance, such as diffuse background reflections, naturally attenuate below the visibility threshold."
- Radiometric truncation: A process of reducing exposure to suppress low-intensity radiance, isolating dominant light sources. "Rather than relying on heuristic semantic extraction, we formulate the identification of dominant light emitters as a physical radiometric truncation process."
- Ray-Decoupled Visibility Formulation: A rendering strategy that treats geometry as transparent to certain ray types (e.g., camera rays) while opaque to others (e.g., shadow rays) to enable interior IBL. "We introduce a Ray-Decoupled Visibility Formulation."
- Shadow ratio map: A per-pixel multiplicative map describing the attenuation of light due to shadows, applied during compositing. "To compute a shadow ratio map that captures mesh-scene interactions."
- Tone-mapping: The process of mapping HDR values to LDR for display or supervision while controlling exposure. "For training data synthesis, each HDR environment map is tone-mapped to multiple exposure levels."
- VAE encoder (Variational Autoencoder): The encoder component of a VAE that maps images to a latent space used by latent diffusion models. "The image is encoded into the latent space via the model’s VAE encoder, while a fixed text prompt guides the model to perform the exposure reduction."
- Vision-LLM: A model aligning visual and textual modalities to evaluate or generate content conditioned on language. "We pioneer a reference-free evaluation protocol for object insertion in the absence of ground truth by leveraging a Vision-LLM."
- VQAScore: A vision-language metric that scores how well an image matches a text description, computed as the probability that a visual-question-answering model answers “Yes” when asked whether the image shows that description; used here to assess realism without references. "Specifically, we employ VQAScore~\cite{lin2024evaluating} to formulate a contrastive metric that evaluates the realism and harmonization of the insertion."
- Watertight enclosure: A mesh with no holes that forms a closed surface, potentially blocking exterior lighting from entering interiors. "Explicit meshes derived from 3DGS or SDFs often form continuous, watertight enclosures without explicitly modeled architectural openings (like windows or doors)."
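The equirectangular mapping defined in the glossary above can be sketched as a small function that converts a 3D direction into panorama pixel coordinates. Axis conventions differ between tools, so the choices here (y up, longitude zero along +z, image centre looking down +z) are assumptions for illustration, and `direction_to_equirect` is a hypothetical name rather than part of any library.

```python
import math

def direction_to_equirect(direction, width, height):
    """Map a 3D direction to (u, v) pixel coordinates in an equirectangular image.

    Assumed convention: y is up, longitude is measured from +z towards +x,
    u grows with longitude and v grows downwards from the zenith.
    """
    x, y, z = direction
    norm = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / norm, y / norm, z / norm
    lon = math.atan2(x, z)                       # in (-pi, pi]
    lat = math.asin(max(-1.0, min(1.0, y)))      # in [-pi/2, pi/2]
    u = ((lon / (2.0 * math.pi) + 0.5) * width) % width  # wrap the seam
    v = (0.5 - lat / math.pi) * height
    return u, v

# A 2048x1024 panorama: forward, straight up, and to the right.
forward = direction_to_equirect((0.0, 0.0, 1.0), 2048, 1024)
up = direction_to_equirect((0.0, 1.0, 0.0), 2048, 1024)
right = direction_to_equirect((1.0, 0.0, 0.0), 2048, 1024)
```

Looking up the light arriving from a given direction in an HDR environment map amounts to sampling the image at the returned coordinates.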