Relightable Gaussian Model
- A relightable Gaussian model is a 3D scene representation that uses anisotropic Gaussians to encapsulate both geometric and photometric attributes for accurate, physically based relighting.
- It decouples material properties like albedo, roughness, and metallicity from lighting, enabling efficient rasterization and real-time, interactive photorealistic rendering.
- Generative and optimization pipelines integrate diffusion models and transformer-based architectures to produce detailed, editable, and dynamic scenes, advancing text-to-3D asset creation.
A relightable Gaussian model is an explicit, physically-based 3D scene representation in which point-based primitives—typically parameterized as anisotropic or elliptical 3D Gaussians—carry both geometric and photometric attributes sufficient for physically accurate relighting under arbitrary scene illumination. Unlike conventional color-only splatting or neural field methods, relightable Gaussians are endowed with decoupled material properties (such as albedo, roughness, and metallicity) and support efficient rasterization while enabling real-time or interactive photo-realistic rendering with physically based rendering (PBR) equations. The research frontier in this domain covers both generative models (e.g., text-to-3D, large-scale asset pipelines) and optimization-based pipelines for avatars, dynamic scenes, and general novel view synthesis.
1. Mathematical Representation of Relightable Gaussians
The core of relightable Gaussian models is the volumetric point primitive. Each Gaussian is parameterized by its world-space center $\boldsymbol{\mu} \in \mathbb{R}^3$, covariance $\Sigma = R S S^\top R^\top$ (factorized into a rotation $R$ and scale $S$), opacity or density $\alpha$, and a set of per-primitive attributes that may include (a minimal data-structure sketch follows this list):
- Albedo $\mathbf{a}$: diffuse color, ideally lighting-free.
- Roughness $r$: microfacet roughness parameter.
- Metallic $m$: metallicity (for PBR rendering).
- Normal $\mathbf{n}$: local surface orientation estimate.
- Radiance Coefficients: Spherical Harmonics (SH) or Spherical Gaussians (SG), for view-dependent color/luminance fields.
- BRDF Parameters: learned or analytic sets for physically-based shading.
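A minimal sketch of such a primitive, with illustrative field names rather than those of any particular paper:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class RelightableGaussian:
    """One point primitive; the field set varies across methods."""
    mu: np.ndarray        # (3,)  world-space center
    rotation: np.ndarray  # (4,)  unit quaternion, defines R in Sigma = R S S^T R^T
    scale: np.ndarray     # (3,)  per-axis scales, the diagonal of S
    opacity: float        # alpha in [0, 1]
    albedo: np.ndarray    # (3,)  diffuse color, ideally lighting-free
    roughness: float      # microfacet roughness in [0, 1]
    metallic: float       # metallicity in [0, 1]
    normal: np.ndarray    # (3,)  local surface orientation estimate
    sh_coeffs: np.ndarray # (K, 3) optional SH/SG radiance or transfer coefficients

    def covariance(self) -> np.ndarray:
        """Assemble Sigma = R S S^T R^T from the quaternion and scale."""
        w, x, y, z = self.rotation / np.linalg.norm(self.rotation)
        R = np.array([
            [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
            [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
            [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
        ])
        S = np.diag(self.scale)
        return R @ S @ S.T @ R.T
```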
A canonical Gaussian “splatting” operation projects each 3D Gaussian to a 2D ellipse on the image plane, splats its properties with $\alpha$-compositing, and blends overlapping splats front-to-back. In the relightable case, the rasterization may be deferred: attribute maps (albedo, roughness, normal, metallic) are composited into G-buffers, followed by pixel-wise PBR shading (Ye et al., 26 Sep 2025, Zhang et al., 11 Mar 2025, Zhang et al., 10 Aug 2024).
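The deferred variant can be sketched as follows, assuming the 2D projection and per-pixel depth sorting have already been performed; the names and data layout are illustrative, not taken from any cited implementation.

```python
import numpy as np

def composite_gbuffer(sorted_splats, H, W):
    """Alpha-composite per-Gaussian attributes into G-buffers for deferred PBR shading.

    sorted_splats: dict mapping pixel (i, j) to a front-to-back-sorted list of
                   (alpha, albedo(3,), roughness, metallic, normal(3,)) tuples,
                   standing in for the rasterizer output.
    """
    gbuf = {k: np.zeros((H, W, d)) for k, d in
            [("albedo", 3), ("roughness", 1), ("metallic", 1), ("normal", 3)]}
    for (i, j), splats in sorted_splats.items():
        T = 1.0                                   # accumulated transmittance
        for alpha, albedo, rough, metal, normal in splats:
            w = T * alpha                         # front-to-back blending weight
            gbuf["albedo"][i, j]    += w * albedo
            gbuf["roughness"][i, j] += w * rough
            gbuf["metallic"][i, j]  += w * metal
            gbuf["normal"][i, j]    += w * normal
            T *= (1.0 - alpha)
            if T < 1e-4:                          # early termination
                break
    # Renormalize the blended normals before per-pixel PBR shading.
    n = gbuf["normal"]
    gbuf["normal"] = n / np.maximum(np.linalg.norm(n, axis=-1, keepdims=True), 1e-8)
    return gbuf
```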
2. Material and Appearance Decomposition
Central to relightable Gaussian models is the explicit or learnable separation of geometry and material from baked lighting. State-of-the-art pipelines prevent uncontrolled light baking by decoupling the following:
- Albedo: In MGM, a latent diffusion pipeline is trained to output geometry-consistent, lighting-free albedo maps instead of shaded textures; roughness and metallic channels are likewise output by a multi-channel diffusion backbone (Ye et al., 26 Sep 2025).
- Physically-based BRDF Factors: Attributes are assembled per Gaussian, either as strictly decoupled fields (e.g., albedo $f_a$, roughness $f_r$, and metallic $f_m$) or as neural latent codes for non-Lambertian appearance.
- PBR Consistency: Higher-order SH components for view dependence are avoided in albedo to prevent light-baking; only diffuse (l=0) terms are retained for material channels in strict PBR settings (Ye et al., 26 Sep 2025, Zhang et al., 7 Aug 2025).
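As an illustration of this strict-PBR constraint, a minimal sketch that keeps only the degree-0 SH band when deriving albedo; the coefficient layout and band ordering are assumptions for illustration.

```python
import numpy as np

def restrict_albedo_to_dc(sh_coeffs: np.ndarray) -> np.ndarray:
    """Keep only the degree-0 (DC) SH band for the albedo channel.

    sh_coeffs: (N, K, 3) per-Gaussian SH coefficients, with band l=0 stored at index 0.
    Returns the view-independent albedo implied by the DC term alone.
    """
    Y00 = 0.5 * np.sqrt(1.0 / np.pi)       # constant SH basis function for l=0
    albedo = sh_coeffs[:, 0, :] * Y00       # evaluate the DC band only
    return np.clip(albedo, 0.0, None)       # albedo must be non-negative
```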
In avatar/face/body models, further specialization includes normal maps for mesostructure and separated attributes for UV-mapped editing or blendshape-driven dynamics (Baert et al., 9 Dec 2025, Zhang et al., 11 Mar 2025, Wang et al., 24 Jan 2025).
3. Reconstruction and Generation Pipelines
Techniques for constructing relightable Gaussian representations fall into two broad categories: optimization-based (inverse rendering, per-object fitting) and generative (diffusion or transformer-based large models from multiview or text cues).
For large-scale 3D content generation, the MGM pipeline is instructive:
- Material Diffusion: Multiview latent diffusion models (e.g., MVDream) are fine-tuned to synthesize five-channel PBR images (albedo, roughness, metallic) from a text prompt, conditioning on rendered depth and normal maps via ControlNet (Ye et al., 26 Sep 2025).
- Volume Transformer: Multiview PBR images and geometry cues are encoded, then passed through a “volume transformer” that holds 3D tokens and aggregates view-aligned features in a coarse-to-fine transformer architecture, culminating in a dense volume from which Gaussians are decoded.
- Coarse-to-Fine Decoding: Lightweight MLPs decode the Gaussians of each voxel, followed by learnable SH-based residual refinement. Losses balance image appearance, geometry, distortion, and normal consistency across multiple views. A two-stage training protocol that first freezes and then unfreezes the roughness/metallic channels stabilizes convergence (Ye et al., 26 Sep 2025).
Generalization to humans, avatars, and animated scenes involves articulated Gaussian deformation (dynamic skinning, blendshapes) and per-pose nonrigid offsets for view/pose-dependent shading (Choi et al., 10 Dec 2025, Wang et al., 24 Jan 2025). For editable avatars, UV-embedded splats allow material maps to be directly painted, regularized, and relit via deferred PBR shading (Baert et al., 9 Dec 2025).
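As a rough illustration of articulated Gaussian deformation, the following is a generic linear-blend-skinning sketch applied to Gaussian centers and orientation frames; it is not the deformation model of any specific cited pipeline.

```python
import numpy as np

def skin_gaussians(mu, R, weights, bone_transforms):
    """Deform Gaussian centers/orientations with linear blend skinning (illustrative only).

    mu:              (N, 3) canonical Gaussian centers
    R:               (N, 3, 3) canonical Gaussian rotations
    weights:         (N, B) skinning weights over B bones, rows sum to 1
    bone_transforms: (B, 4, 4) per-bone rigid transforms for the current pose
    """
    # Blend the bone transforms per Gaussian: T_i = sum_b w_ib * T_b
    T = np.einsum("nb,bij->nij", weights, bone_transforms)            # (N, 4, 4)
    mu_h = np.concatenate([mu, np.ones((mu.shape[0], 1))], axis=1)    # homogeneous coords
    mu_posed = np.einsum("nij,nj->ni", T, mu_h)[:, :3]
    # Rotate the covariance frames; the blended linear part is only approximately a rotation.
    R_posed = np.einsum("nij,njk->nik", T[:, :3, :3], R)
    return mu_posed, R_posed
```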
4. Physically Based Relighting Mechanisms
Core relighting is enabled via explicit rasterization of the composited material maps, followed by per-pixel evaluation of the rendering equation

$$L_o(\mathbf{x}, \omega_o) = \int_{\Omega} f(\mathbf{x}, \omega_i, \omega_o)\, L_i(\mathbf{x}, \omega_i)\, (\mathbf{n} \cdot \omega_i)\, \mathrm{d}\omega_i,$$

with $f$ as the PBR BRDF, typically the Cook-Torrance microfacet model

$$f = (1 - m)\, \frac{\mathbf{a}}{\pi} + \frac{D\, F\, G}{4\, (\mathbf{n} \cdot \omega_i)(\mathbf{n} \cdot \omega_o)},$$

where $D$ (microfacet distribution), $F$ (Fresnel), and $G$ (geometric shadowing) are parameterized by the roughness $r$, and $m$ is the metallicity. The normal $\mathbf{n}$ is composited from Gaussian attributes. Advanced pipelines substitute the environment map on-the-fly without retraining, supporting true physically correct relighting (Ye et al., 26 Sep 2025, Zhang et al., 11 Mar 2025, Zhang et al., 10 Aug 2024).
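A compact per-pixel shading sketch of this model is given below; the GGX distribution, Schlick Fresnel, and Smith geometry terms are common choices assumed here, not necessarily those used by the cited methods.

```python
import numpy as np

def cook_torrance(albedo, roughness, metallic, n, wi, wo, light_rgb):
    """Evaluate one light sample of a Cook-Torrance BRDF for a single pixel.

    albedo: (3,); roughness, metallic: scalars; n, wi, wo: (3,) unit vectors;
    light_rgb: (3,) incident radiance arriving along wi.
    """
    h = wi + wo
    h = h / np.linalg.norm(h)                        # half vector
    n_wi = max(float(n @ wi), 1e-4)
    n_wo = max(float(n @ wo), 1e-4)
    n_h  = max(float(n @ h), 0.0)
    h_wo = max(float(h @ wo), 0.0)

    a2 = max(roughness, 1e-3) ** 4                   # GGX alpha^2 with alpha = roughness^2
    D = a2 / (np.pi * (n_h * n_h * (a2 - 1.0) + 1.0) ** 2)      # GGX normal distribution
    F0 = 0.04 * (1.0 - metallic) + albedo * metallic             # base reflectance
    F = F0 + (1.0 - F0) * (1.0 - h_wo) ** 5                      # Schlick Fresnel
    k = (roughness + 1.0) ** 2 / 8.0
    G = (n_wi / (n_wi * (1 - k) + k)) * (n_wo / (n_wo * (1 - k) + k))  # Smith-Schlick

    specular = D * F * G / (4.0 * n_wi * n_wo)
    diffuse = (1.0 - metallic) * albedo / np.pi
    return (diffuse + specular) * light_rgb * n_wi   # outgoing radiance contribution
```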
Some methods utilize precomputed radiance transfer (PRT) in SH or SG basis for rapid evaluation of low-frequency (diffuse) and high-frequency (specular) components under changing illumination (Zhang et al., 10 Aug 2024, Guo et al., 7 Aug 2024, Saito et al., 2023). For avatars or articulated bodies, deformation-aware transfer functions are mapped in local frames (zonal harmonics), supporting pose-varying radiance and efficient rotation (Wang et al., 24 Jan 2025, Choi et al., 10 Dec 2025). Realistic shadows are produced via analytic visibility terms, fast ray tracing (BVH), or neural shadow networks for non-local occlusion (Zhang et al., 11 Mar 2025, Guo et al., 7 Aug 2024, Wang et al., 24 Jan 2025).
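For the diffuse component, a minimal sketch of the precomputed-transfer idea follows; the band count, storage layout, and function name are assumptions for illustration, and relighting reduces to one matrix product per environment swap.

```python
import numpy as np

def prt_diffuse_relight(transfer, env_sh, albedo):
    """Relight the diffuse term via precomputed radiance transfer.

    transfer: (N, K)  per-Gaussian transfer coefficients (cosine-weighted visibility in SH)
    env_sh:   (K, 3)  SH coefficients of the target environment map, K = (l_max + 1)^2
    albedo:   (N, 3)  per-Gaussian albedo
    """
    irradiance = transfer @ env_sh                   # (N, 3) incident irradiance per Gaussian
    return albedo * np.maximum(irradiance, 0.0)      # diffuse outgoing radiance
```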
5. Comparative Evaluation and Empirical Results
Quantitative evaluation of relightable Gaussian models centers on both geometric/photometric fidelity and efficiency.
| Model | Geometry-CLIP | Appearance-CLIP | FID | Inference Time |
|---|---|---|---|---|
| MGM (Ye et al., 26 Sep 2025) | 29.87 | 30.48 | 89.55 | 30 s |
| LGM | 27.84 | 29.31 | 101.6 | - |
| LaRa | 27.02 | 28.77 | 97.95 | - |
| Paint-it | - | - | - | 40 min |
| DreamMat | - | - | - | 75 min |
Relighting accuracy: On synthetic and real benchmarks, relightable Gaussians match or surpass mesh- or field-based baselines, achieving higher Geometry/Appearance-CLIP scores and lower FID (Ye et al., 26 Sep 2025). Ablating the geometry guidance (depth/normal conditioning) degrades both the metrics and perceptual detail.
Efficiency: Real-time relighting is attained by precomputing all material and transfer coefficients as per-Gaussian buffers, reducing test-time computation to dot products and image splatting, orders of magnitude faster than NeRF-based approaches or UV-map optimization (Zhang et al., 10 Aug 2024, Guo et al., 7 Aug 2024).
Generalization and Editing: Pipelines like GTAvatar (Baert et al., 9 Dec 2025) demonstrate UV-mapped Gaussians supporting real-time relighting, high-frequency editing, and attribute manipulation without retraining or optimization.
Limitations: Weaknesses remain in capturing fully transparent, very thin, or extremely high-frequency geometric features. The quality of indirect illumination depends on the per-Gaussian Monte Carlo sample density, and geometric consistency across material channels is sensitive to the coordinate supervision loss (Ye et al., 26 Sep 2025, Sun et al., 27 May 2025).
6. Representative Applications
- Generative 3D asset creation: MGM establishes a scalable approach for text-to-3D workflows with fully relightable, PBR-ready 3D Gaussians (Ye et al., 26 Sep 2025).
- Avatar modeling and animation: Pipelines such as RnD-Avatar, HRAvatar, GTAvatar, and Relightable Full-Body Gaussian Codec Avatars incorporate relightable Gaussians in combination with mesh-driven deformation, enabling relightable, editable, and animatable avatars from monocular video or multiview scans (Choi et al., 10 Dec 2025, Zhang et al., 11 Mar 2025, Wang et al., 24 Jan 2025, Baert et al., 9 Dec 2025).
- Dynamic relightable volumetric video: BEAM presents a pipeline for recovering full per-frame materialized Gaussian models with diffusion-generated roughness, supporting both real-time PBR rendering and offline path-tracing (Hong et al., 12 Feb 2025).
- Relightable outdoor scene reconstruction: Models targeting unconstrained or outdoor datasets employ hybrid lighting models (SG+SH, per-Gaussian transfer) and large-scale intrinsic decomposition for relightable visualization under variable daylight (Bai et al., 28 Jul 2025, Liao et al., 14 Sep 2025).
- Medical imaging: PR-ENDO adapts the relightable Gaussian paradigm to endoscopic data, learning scene-specific reflectance and microfacet parameters to handle view-aligned, high-intensity lighting and surface scattering (Kaleta et al., 19 Nov 2024).
7. Current Trends and Outlook
The field is progressing along lines of scaling, generalization, efficiency, and adaptability:
- Feed-forward and generative architectures: Integration of transformer or diffusion models conditioned on text or sparse multiview images obviates the need for slow per-scene optimization, enabling rapid asset generation at scale (Ye et al., 26 Sep 2025, Zhang et al., 8 Oct 2024).
- Data-driven isolation of materials: Fine-tuning multi-channel PBR diffusion backbones on depth/normal conditioning enforces strict separation of intrinsic material factors from external illumination (Ye et al., 26 Sep 2025).
- Efficient relighting: Precomputed radiance transfer (SH/SG) reduces relighting to a matrix-vector multiplication per Gaussian, yielding real-time frame rates even at 1080p (Zhang et al., 10 Aug 2024, Guo et al., 7 Aug 2024).
- UV and attribute editing: Embedding Gaussians into mesh UV atlases enables intuitive material editing and high-resolution relightable editability (Baert et al., 9 Dec 2025).
- Dynamic/dataset-wide models: Universal relightable Gaussian codecs trained on large-scale scans enable rapid adaptation to new identities or novel environments (Li et al., 31 Oct 2024).
- Robustness via auxiliary geometry: Bidirectional guidance between Gaussians and SDFs ensures accurate normal/depth supervisions and removes floaters, especially important for reflective or challenging geometries (Zhu et al., 22 May 2024).
Relightable Gaussian models now constitute a leading paradigm for scalable, physically grounded 3D reconstruction and content creation compatible with interactive rendering, high-quality relighting, and advanced material manipulations (Ye et al., 26 Sep 2025, Choi et al., 10 Dec 2025, Baert et al., 9 Dec 2025, Zhang et al., 11 Mar 2025, Zhang et al., 10 Aug 2024).