Relightable Holoported Characters
- Relightable Holoported Characters (RHC) are photorealistic, dynamic digital avatars that enable high-quality novel-view synthesis and physically accurate relighting.
- They integrate explicit 3D Gaussian splatting, transformer-driven neural architectures, and physics-informed UV canonicalization for precise geometric and appearance representation.
- RHC systems achieve robust performance, with high PSNR and SSIM and low LPIPS scores, while supporting interactive editing and real-time telepresence applications.
A Relightable Holoported Character (RHC) is a photorealistic, dynamic, and animatable digital avatar that supports high-quality novel-view synthesis and physically accurate relighting from sparse, real-time RGB or RGBD video streams. RHCs are central to telepresence, volumetric video, and holographic display systems, combining advances in neural rendering, physically-based reflectance modeling, and geometric representation. The current state-of-the-art leverages 3D Gaussian Splatting (3DGS), transformer-driven neural architectures, and efficient decompositions of geometry and appearance to realize real-time, editable, free-viewpoint holographic avatars that respond plausibly to arbitrary environment maps and changing lighting conditions, even in challenging full-body and dynamic contexts.
1. Geometric and Appearance Representation
Modern RHC pipelines employ explicit geometric representations anchored in learnable 3D Gaussian splats or volumetric primitives, parametrized by spatial means $\boldsymbol{\mu}$, orientations, covariances $\boldsymbol{\Sigma}$, and per-splat material attributes. In the RHC pipeline of (Singh et al., 29 Nov 2025), texel-aligned 3D Gaussians are attached to a dynamically deformed mesh proxy. Key appearance quantities—diffuse albedo, roughness, specular parameters, normals, and view-direction encodings—are estimated per Gaussian or per mesh texel. This approach enables photorealistic detail, including pose-dependent facial and cloth wrinkles, and sharp shadow boundaries under changing pose and illumination.
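A minimal sketch of such a per-splat parameter block is given below, assuming the standard 3DGS factorization of each covariance into a rotation and per-axis scales; the field names and shapes are illustrative, not the authors' actual data layout.

```python
# Illustrative container for texel-aligned 3D Gaussian parameters.
# Field names and shapes are assumptions for exposition only.
from dataclasses import dataclass
import numpy as np

@dataclass
class GaussianSplatParams:
    means: np.ndarray        # (N, 3)  splat centers (canonical or posed space)
    rotations: np.ndarray    # (N, 4)  unit quaternions (w, x, y, z)
    scales: np.ndarray       # (N, 3)  per-axis extents
    opacity: np.ndarray      # (N, 1)  alpha in [0, 1]
    albedo: np.ndarray       # (N, 3)  diffuse RGB albedo
    roughness: np.ndarray    # (N, 1)  microfacet roughness in [0, 1]
    specular: np.ndarray     # (N, 1)  specular reflectance
    normals: np.ndarray      # (N, 3)  unit surface normals

    def covariances(self) -> np.ndarray:
        """Per-splat 3x3 covariances, Sigma = R diag(s^2) R^T."""
        w, x, y, z = self.rotations.T
        R = np.stack([
            np.stack([1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)], -1),
            np.stack([2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)], -1),
            np.stack([2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)], -1),
        ], axis=-2)                                          # (N, 3, 3)
        S2 = np.einsum('ni,ij->nij', self.scales ** 2, np.eye(3))
        return R @ S2 @ np.transpose(R, (0, 2, 1))
```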
The geometry is frequently initialized and temporally tracked using canonical human meshes (e.g., SMPL or custom per-subject templates) with learned non-linear deformations for capturing high-frequency dynamics or wrinkles, as in (Singh et al., 29 Nov 2025) and (Sun et al., 27 May 2025). For efficiency and robustness, coarse-to-fine representations are used, with physics-informed feature extraction fusing normal fields, albedo, shading, and camera views into a UV-domain canonicalization ((Singh et al., 29 Nov 2025), Section 2).
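The sketch below illustrates how texel-aligned splats can be anchored to the deformed mesh proxy, assuming each valid UV texel stores a precomputed (triangle, barycentric) lookup from the template's UV atlas; the function and argument names are hypothetical.

```python
# Sketch of anchoring texel-aligned Gaussians to a deformed mesh proxy.
import numpy as np

def texel_anchors(verts, faces, texel_face_idx, texel_bary):
    """
    verts:          (V, 3) posed/deformed mesh vertices for this frame
    faces:          (F, 3) vertex indices per triangle
    texel_face_idx: (T,)   triangle index each valid texel lies in
    texel_bary:     (T, 3) barycentric coordinates inside that triangle
    returns:        (T, 3) anchor positions, (T, 3) anchor normals
    """
    tri = verts[faces[texel_face_idx]]                   # (T, 3, 3)
    pos = np.einsum('tb,tbc->tc', texel_bary, tri)       # barycentric blend
    n = np.cross(tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0])
    n /= np.linalg.norm(n, axis=-1, keepdims=True) + 1e-8
    return pos, n
```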
2. Physically Based Neural Relighting
At the core of relightable holographic synthesis is implicit or explicit estimation of the rendering equation,

$$L_o(\mathbf{x}, \omega_o) = \int_{\Omega} f_r(\mathbf{x}, \omega_i, \omega_o)\, L_i(\mathbf{x}, \omega_i)\, V(\mathbf{x}, \omega_i)\, (\mathbf{n} \cdot \omega_i)\, \mathrm{d}\omega_i,$$

where $f_r$ is a microfacet BRDF (Disney/PBR or GGX model), $L_i$ the incoming radiance from environment lighting, $V$ a learned visibility/shadowing field, and all terms are parameterized or estimated per-Gaussian, per-pixel, or per-voxel (Sun et al., 27 May 2025, Singh et al., 29 Nov 2025, Lin et al., 2023).
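As a concrete reference for the microfacet term $f_r$, a minimal single-sample GGX/Schlick evaluation is sketched below, using standard real-time PBR formulas with the common Disney roughness remapping; this is not the papers' exact parameterization.

```python
# Minimal single-sample GGX/Schlick microfacet BRDF for exposition.
import numpy as np

def ggx_brdf(n, v, l, albedo, roughness, f0=0.04):
    """Return RGB reflectance f_r(w_i, w_o) for one light direction l."""
    h = v + l
    h /= np.linalg.norm(h) + 1e-8
    n_v = max(np.dot(n, v), 1e-4)
    n_l = max(np.dot(n, l), 1e-4)
    n_h = max(np.dot(n, h), 0.0)
    v_h = max(np.dot(v, h), 0.0)

    a2 = roughness ** 4                                          # alpha = roughness^2
    d = a2 / (np.pi * ((n_h * n_h) * (a2 - 1.0) + 1.0) ** 2)     # GGX normal distribution
    f = f0 + (1.0 - f0) * (1.0 - v_h) ** 5                       # Schlick Fresnel
    k = (roughness + 1.0) ** 2 / 8.0
    g = (n_v / (n_v * (1 - k) + k)) * (n_l / (n_l * (1 - k) + k))  # Smith geometry term
    specular = d * f * g / (4.0 * n_v * n_l)
    return albedo / np.pi + specular                             # diffuse + specular lobes

# Outgoing radiance over sampled environment directions w_i:
#   L_o ~= sum_i f_r(w_i, w_o) * L_i(w_i) * V(w_i) * max(n . w_i, 0) * dw
```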
The physically-grounded neural rendering module (PGNR) infers material parameters and shading using a small MLP, leveraging a differentiable 2D-to-3D supervision pipeline. Shadows and indirect illumination are handled using learned or SH-projected visibility functions and ambient/indirect maps (Sun et al., 27 May 2025), supporting real-time evaluation:
- Direct/ambient lighting is convolved with learned visibility and BRDF coefficients (see the shading sketch after this list).
- Indirect terms are predicted by a shallow network or encoded in low-frequency SH (Sun et al., 27 May 2025, Singh et al., 29 Nov 2025).
- Final shading is composited in the image domain by alpha-blending the splatted Gaussians ("EWA splatting").
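A minimal sketch of the SH-based direct/ambient shading referenced above, assuming both the environment lighting and a visibility-weighted cosine transfer function are projected onto a low-order SH basis (a precomputed-radiance-transfer-style approximation; band counts and coefficient layouts are illustrative):

```python
# Low-frequency shading via spherical harmonics: once lighting and a learned
# visibility/transfer function share an SH basis, their integral reduces to a
# dot product of coefficient vectors.
import numpy as np

def sh_diffuse_shading(light_sh, transfer_sh, albedo):
    """
    light_sh:    (B, 3)  per-band RGB SH coefficients of the environment map
    transfer_sh: (N, B)  per-Gaussian SH coefficients of cosine-weighted,
                         visibility-attenuated transfer (learned or precomputed)
    albedo:      (N, 3)  diffuse albedo per Gaussian
    returns:     (N, 3)  diffuse RGB radiance per Gaussian
    """
    irradiance = transfer_sh @ light_sh            # (N, 3): the SH dot product
    return albedo * np.clip(irradiance, 0.0, None) / np.pi
```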
In transformer-driven approaches (Singh et al., 29 Nov 2025), texel-resolved UV space features are fused with tokens from the flattened HDR environment map via cross-attention layers. This allows the network to implicitly learn the integral of the rendering equation for arbitrary environments in a single feed-forward pass, enabling low-latency inference.
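A sketch of this texel-to-environment cross-attention, written with standard PyTorch attention primitives; the tokenization, dimensions, and module structure are assumptions for exposition rather than the architecture of (Singh et al., 29 Nov 2025).

```python
# Texel-aligned UV features attend to tokens from a flattened HDR env map.
import torch
import torch.nn as nn

class EnvCrossAttention(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                nn.Linear(4 * dim, dim))

    def forward(self, texel_tokens, env_tokens):
        # texel_tokens: (B, T, dim) features per UV texel
        # env_tokens:   (B, E, dim) tokens from the flattened HDR environment map
        q = self.norm_q(texel_tokens)
        kv = self.norm_kv(env_tokens)
        fused, _ = self.attn(q, kv, kv)          # texels query the lighting tokens
        x = texel_tokens + fused                 # residual connection
        return x + self.ff(x)

# Usage: lit_feats = EnvCrossAttention()(uv_feats, env_feats)
```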
3. End-to-End Training and Supervision
Modern RHC pipelines employ end-to-end supervised learning over large-scale, sparsely viewed or light-staged dynamic datasets. Training objectives combine the following terms (a combined loss sketch follows the list):
- Photometric reconstruction loss
- Deep perceptual (e.g., VGG) losses
- Material map smoothness and regularization (bilateral, total variation)
- Surface normal and depth consistency losses with lighting-invariance constraints
- Splat position/scale deviations and color deviation penalties from template priors (Singh et al., 29 Nov 2025)
- Ambient occlusion, direct, and indirect lighting losses between synthesized and ground-truth per-pixel maps
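A combined objective mixing the terms above might look like the following sketch; the loss weights, tensor layouts, and the perceptual_fn hook are illustrative assumptions rather than the published training configuration.

```python
# Sketch of a combined RHC training objective.
import torch
import torch.nn.functional as F

def total_variation(img):
    """Simple TV regularizer on a (B, C, H, W) material map."""
    return (img[..., :, 1:] - img[..., :, :-1]).abs().mean() + \
           (img[..., 1:, :] - img[..., :-1, :]).abs().mean()

def rhc_loss(pred_rgb, gt_rgb, pred_albedo, pred_normals, gt_normals,
             splat_offsets, perceptual_fn,
             w_pho=1.0, w_per=0.1, w_tv=0.01, w_nrm=0.1, w_reg=1e-3):
    l_pho = F.l1_loss(pred_rgb, gt_rgb)                        # photometric reconstruction
    l_per = perceptual_fn(pred_rgb, gt_rgb).mean()             # deep perceptual (e.g. VGG features)
    l_tv  = total_variation(pred_albedo)                       # material-map smoothness
    l_nrm = (1 - F.cosine_similarity(pred_normals, gt_normals, dim=1)).mean()
    l_reg = splat_offsets.pow(2).mean()                        # deviation from template prior
    return (w_pho * l_pho + w_per * l_per + w_tv * l_tv +
            w_nrm * l_nrm + w_reg * l_reg)
```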
Physics-informed supervision in the UV domain promotes disentanglement of geometry, albedo, normal, and shading, stabilizing training even under multiview sparsity and high-frequency texture artifacts. Point densification and pruning adapt the Gaussian population to surface complexity and motion (Sun et al., 27 May 2025, Zhang et al., 11 Mar 2025). In (Singh et al., 29 Nov 2025), the training regime alternates between uniformly lit tracking frames and relit frames under hundreds of randomly sampled HDR environment maps, enabling dense temporal alignment and appearance generalization.
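A sketch of such adaptive densification and pruning, in the spirit of standard 3DGS heuristics: splats with large view-space gradients are cloned near the surface, while splats whose opacity has collapsed are removed. Thresholds, noise scale, and the clone policy are illustrative.

```python
# Clone high-gradient splats and prune transparent ones.
import numpy as np

def densify_and_prune(params, grad_norm, grad_thresh=2e-4, min_opacity=0.005):
    keep = params['opacity'][:, 0] > min_opacity            # prune near-transparent splats
    split = keep & (grad_norm > grad_thresh)                 # candidates to densify

    kept = {k: v[keep] for k, v in params.items()}
    clones = {k: v[split] for k, v in params.items()}
    clones['means'] = clones['means'] + 0.01 * np.random.randn(*clones['means'].shape)
    clones['scales'] = clones['scales'] * 0.6                # children cover finer detail

    return {k: np.concatenate([kept[k], clones[k]]) for k in params}
```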
4. Inference, Editing, and Holographic Display
For real-time RHC deployment, the pipeline proceeds as follows (Sun et al., 27 May 2025, Singh et al., 29 Nov 2025):
- Sparse multi-view (e.g., 4–16 cameras) RGB or RGBD streams are processed for pose/shape estimation and mesh proxy deformation per frame.
- Physics-informed features are extracted and passed to the RelightNet, which, along with the desired lighting code (e.g., HDR environment map), predicts Gaussian splat parameters (position, scale, color, opacity).
- Gaussians are projected into frame-buffer coordinates and composited using front-to-back EWA alpha blending, yielding physically plausible novel-view renderings (see the compositing sketch after this list).
- Scene illumination can be edited interactively: users drop/move virtual lights, or swap HDR maps. The RelightNet implicitly recomputes relit imagery for the new environment on-the-fly.
- The resulting RGBA buffers are sent to holographic or light-field displays for volumetric telepresence or AR/VR integration.
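A simplified per-pixel illustration of the front-to-back alpha compositing in the projection step above; this is a one-dimensional sketch of the blending weights, not a full EWA rasterizer.

```python
# Front-to-back alpha compositing of depth-sorted splat fragments at one pixel.
import numpy as np

def composite_front_to_back(colors, alphas):
    """
    colors: (K, 3) RGB of splat fragments overlapping this pixel, sorted near-to-far
    alphas: (K,)   per-fragment opacity after evaluating the projected 2D Gaussian
    returns final RGBA for the pixel
    """
    out = np.zeros(3)
    transmittance = 1.0
    for c, a in zip(colors, alphas):
        out += transmittance * a * c          # accumulate weighted contribution
        transmittance *= (1.0 - a)            # light continuing past this splat
        if transmittance < 1e-4:              # early termination when effectively opaque
            break
    return np.append(out, 1.0 - transmittance)   # RGB + accumulated alpha
```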
Latency is reduced through feed-forward, parallelized computation (≤0.5 s per frame at the reported resolution in (Singh et al., 29 Nov 2025)), and the pipeline supports streaming, interactive preview, and direct relighting/editing.
5. Quantitative and Qualitative Evaluation
RHC systems are assessed on synthetic and real datasets using established metrics:
- PSNR (peak signal-to-noise ratio; see the sketch after this list)
- SSIM (structural similarity index)
- LPIPS (learned perceptual image patch similarity)
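For reference, PSNR reduces to a few lines; SSIM and LPIPS are usually taken from standard off-the-shelf implementations and are omitted here.

```python
# PSNR for renders and ground truth in [0, 1].
import numpy as np

def psnr(pred, gt, max_val=1.0):
    mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
    return np.inf if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)
```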
In (Singh et al., 29 Nov 2025), RHC reports higher PSNR and SSIM and lower LPIPS than prior methods such as Relighting4D, IntrinsicAvatar, and MeshAvatar under free-view and novel illumination. Qualitatively, RHC captures pose-dependent fine-scale geometry, accurate shadows, consistent relighting, and free-viewpoint rendering with only four input cameras.
6. Comparison to Related Architectures and Limitations
The RHC paradigm supersedes deferred neural rendering and mesh-based parametric BRDF methods, which lack either full relighting generalization or volumetric holographic output (Li et al., 31 Oct 2024, Zhang et al., 11 Mar 2025, Iqbal et al., 2022). Neural volumetric avatars anchored to mixture-of-primitives or SDF+deformation fields (Lin et al., 2023, Xu et al., 2023, Yang et al., 2023) have highlighted the importance of canonical-space disentanglement, explicit light visibility networks, and mesh-guided UV atlases for supporting both animation and relighting.
However, limitations remain. Generalization to accessories (e.g., hair, glasses), to near-field or global illumination, and to monocular or in-the-wild video remains challenging (Yang et al., 2023, Zhang et al., 11 Mar 2025). Some pipelines require light-stage hardware or dense annotations; others trade fine-scale accuracy for optimization speed. Intrinsic decomposition (separation of albedo from shading) is imperfect under harsh shadows or complex interreflections.
7. Outlook and Future Research
Active research directions include generalizing physics-informed UV features to unscripted captures, integrating multi-bounce or global illumination without explicit ray tracing, and expanding Gaussian Splatting to dynamic, arbitrary-scene volumetric holography. Methodologies from RHC are converging toward generalizable, real-time, editable avatars that can support multi-person interaction, commodity hardware, and interactive telepresence holography, with continued emphasis on efficiency, generalization, and physical plausibility (Singh et al., 29 Nov 2025, Sun et al., 27 May 2025, Zhang et al., 11 Mar 2025, Li et al., 31 Oct 2024).