Video HDR Gaussian Representation
- Video HDR Gaussian representation is a learnable encoding that models spatial and temporal radiance using Gaussian primitives to achieve high-fidelity dynamic scene rendering.
- It integrates hybrid tone mapping and exposure modeling to bridge HDR-to-LDR conversion, enabling efficient dual-dynamic-range synthesis and improved color fidelity.
- Parallel rendering and temporal regularization ensure real-time performance and robust HDR reconstruction, enabling photorealistic outputs in dynamic video sequences.
A video high dynamic range (HDR) Gaussian representation is an explicit, learnable encoding of dynamic scenes or sequences that models space, optionally time, and radiance using Gaussian primitives with support for high dynamic range. The representation is a recent development at the intersection of novel view synthesis, video compression, 3D scene modeling, and lighting estimation, offering efficiency and fidelity improvements over earlier neural implicit field methods (such as NeRF) and traditional low dynamic range (LDR) encodings. A video HDR Gaussian field captures temporally and spatially varying lighting, enables efficient synthesis of novel HDR frames, and supports photorealistic rendering at real-time rates.
1. Foundational Structure: Gaussian Splatting for HDR Video
Video HDR Gaussian representation builds on the Gaussian Splatting paradigm, in which a scene is represented as a set of Gaussian primitives, each with a spatial mean, anisotropic covariance, opacity, and color/radiance attributes. In classical 3DGS, this yields photorealistic rendering at real-time rates. For HDR content, the representation is augmented to efficiently encode the full range of scene radiance:
- In (Cai et al., 24 May 2024), HDR-GS introduces a Dual Dynamic Range (DDR) Gaussian model, where each Gaussian holds both an HDR color (modeled as view-dependent spherical harmonics) and a separately tone-mapped LDR color (mapped via an MLP-based per-channel tone-mapper that accounts for exposure).
- In (Wu et al., 13 Aug 2024), HDRGS reinterprets the “color” attribute as radiance in the log domain, and includes explicit luminance information, thereby increasing representational “color dimensionality.”
- The 3DGS extension to video (VeGaS (Smolak-Dyżewska et al., 17 Nov 2024)) and 2D/4D splatting (GSVR (Pang et al., 8 Jul 2025); GaussianVideo (Bond et al., 8 Jan 2025)) further parameterize each Gaussian primitive as a function of time, enabling temporal dynamics and frame interpolation.
The core HDR video rendering operation is the forward compositing (alpha blending) of Gaussian splats (projected onto the target frame's image plane), enabling exposure-adaptive synthetic views and real-time, high-fidelity generation.
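To make the compositing step concrete, the following is a minimal NumPy sketch of front-to-back alpha blending of depth-sorted, already-projected splats at a single pixel; the field names, data layout, and early-termination threshold are illustrative assumptions rather than the exact implementation of any cited method.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Splat2D:
    """A Gaussian primitive already projected to the image plane (illustrative fields)."""
    mean: np.ndarray      # 2D pixel-space mean
    cov: np.ndarray       # 2x2 projected covariance
    opacity: float        # base opacity o_i in [0, 1]
    color: np.ndarray     # per-splat radiance, e.g. HDR RGB (3,)
    depth: float          # view-space depth, used for front-to-back sorting

def composite_pixel(pixel_xy: np.ndarray, splats: list) -> np.ndarray:
    """Front-to-back alpha blending: C = sum_i c_i * alpha_i * prod_{j<i} (1 - alpha_j)."""
    color = np.zeros(3)
    transmittance = 1.0
    for s in sorted(splats, key=lambda s: s.depth):             # near to far
        d = pixel_xy - s.mean
        falloff = np.exp(-0.5 * d @ np.linalg.inv(s.cov) @ d)   # Gaussian footprint at the pixel
        alpha = min(s.opacity * falloff, 0.999)
        color += transmittance * alpha * s.color
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:                                 # early termination when nearly opaque
            break
    return color
```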
2. Tone Mapping, Dynamic Range, and Exposure Modeling
A critical innovation in video HDR Gaussian fields is a learnable or hybrid tone mapping module, which bridges the radiometry of HDR scenes and displayable LDR intensities:
- In (Cai et al., 24 May 2024), the DDR model applies MLP-based tone mappers that transform log-HDR radiance, together with a user-specified exposure time, into LDR channel values, learning to emulate unknown camera response functions (CRFs); a minimal sketch is given after this list.
- HDRGS (Wu et al., 13 Aug 2024) employs an asymmetric grid-based tone mapper, providing high resolution in value-dense regions and using "leaky" boundary extensions for stability outside the nominal exposure range.
- GaussHDR (Liu et al., 13 Mar 2025) unifies 3D and 2D local tone mapping via residual MLPs on log irradiance and spatial context features, combining dual LDR renderings (3D- and 2D-mapped) at the loss level with learned uncertainty modulation. This results in improved robustness to local CRF variations, scene-dependent tone-mapping, and spatially consistent outputs.
- Cinematic Gaussians (Wang et al., 11 Jun 2024) includes analytical tone mapping after simulating thin-lens DoF blur, parameterizing scene brightness so that exposure and aesthetic adjustments can be applied directly.
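As a concrete illustration of exposure-adapted, learnable tone mapping, the sketch below feeds log HDR radiance plus log exposure time through a tiny per-channel MLP with a sigmoid output; the network width, depth, and activation choices are assumptions for illustration, not the exact architecture of HDR-GS, HDRGS, or GaussHDR.

```python
import torch
import torch.nn as nn

class PerChannelToneMapper(nn.Module):
    """Maps log(HDR radiance * exposure) to an LDR value in [0, 1], one tiny MLP per channel."""

    def __init__(self, hidden: int = 32):
        super().__init__()
        self.mlps = nn.ModuleList([
            nn.Sequential(nn.Linear(1, hidden), nn.ReLU(),
                          nn.Linear(hidden, 1), nn.Sigmoid())
            for _ in range(3)                                   # R, G, B
        ])

    def forward(self, hdr: torch.Tensor, exposure: torch.Tensor) -> torch.Tensor:
        # hdr: (..., 3) linear HDR radiance; exposure: scalar or broadcastable exposure time
        x = torch.log(hdr.clamp_min(1e-6)) + torch.log(exposure)
        ldr = [self.mlps[c](x[..., c:c + 1]) for c in range(3)]
        return torch.cat(ldr, dim=-1)

# Example: ldr = PerChannelToneMapper()(torch.rand(1024, 3), torch.tensor(1.0 / 60.0))
```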
By explicitly parameterizing tone mapping and exposure dependence in the HDR Gaussian field, these methods overcome legacy limitations of NeRF-based HDR approaches—offering both adaptivity to new exposures and perceptually stable LDR/HDR rendering.
3. Temporal Consistency and Dynamic Scene Modeling
For video, maintaining spatiotemporal consistency and distinguishing dynamic objects from camera motion is central.
- HDR-GS (Cai et al., 24 May 2024) supports parallel rasterization branches for both LDR (tone-mapped) and HDR outputs, maintaining gradient flow and stable optimization even in long dynamic sequences.
- Mono4DGS-HDR (Liu et al., 21 Oct 2025) introduces a two-stage approach: dynamic video HDR Gaussian representations are first learned in an orthographic camera coordinate system (pose-free, aligned to tracked 2D points and depth priors), and later transformed into world space and refined jointly with camera pose estimates. This approach enables robust HDR reconstruction in the presence of unknown/alternating exposure and ambiguous pose information typical in monocular videos.
- Temporal luminance regularization (Liu et al., 21 Oct 2025) is used to align HDR appearance over time via a flow-guided photometric loss of the (schematic) form $\mathcal{L}_{\mathrm{temp}} = \big\| \hat{H}_t - \mathcal{W}_{t' \to t}(\hat{H}_{t'}) \big\|$, where $\hat{H}_t$ denotes the rendered HDR frame at time $t$ and the warp $\mathcal{W}_{t' \to t}$ is guided by rendered flows; a sketch of such a loss appears below.
- VeGaS (Smolak-Dyżewska et al., 17 Nov 2024) generalizes temporal dynamics using folded-Gaussian distributions, permitting nonlinear mean/variance shifts in the spatial domain as functions of time (see Section 6 for formulaic detail).
Such mechanisms ensure that the HDR Gaussian field not only maintains temporally stable radiance estimates, but can robustly recover HDR information in dynamic, non-static, or exposure-varying content.
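A minimal sketch of a flow-guided photometric (temporal luminance) loss follows; the backward-warping convention, bilinear sampling, and L1 penalty are assumptions chosen to match the description above, not the exact formulation of Mono4DGS-HDR. In practice an occlusion or uncertainty mask would typically modulate the per-pixel penalty.

```python
import torch
import torch.nn.functional as F

def warp_with_flow(frame: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Backward-warp a frame (B, C, H, W) using a pixel-space flow field (B, 2, H, W)."""
    B, _, H, W = frame.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(frame.device)   # (2, H, W) pixel grid
    coords = base.unsqueeze(0) + flow                               # sampling positions
    grid_x = 2.0 * coords[:, 0] / (W - 1) - 1.0                     # normalize to [-1, 1]
    grid_y = 2.0 * coords[:, 1] / (H - 1) - 1.0
    grid = torch.stack((grid_x, grid_y), dim=-1)                    # (B, H, W, 2)
    return F.grid_sample(frame, grid, mode="bilinear", align_corners=True)

def temporal_luminance_loss(hdr_t: torch.Tensor,
                            hdr_next: torch.Tensor,
                            flow_t_to_next: torch.Tensor) -> torch.Tensor:
    """L1 discrepancy between the HDR rendering at t and the next rendering warped back to t."""
    warped = warp_with_flow(hdr_next, flow_t_to_next)
    return (hdr_t - warped).abs().mean()
```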
4. Training, Optimization, and Computational Efficiency
One of the distinguishing features of Gaussian Splatting for HDR video is its order-of-magnitude increase in computational throughput and scalable optimization:
- HDR-GS (Cai et al., 24 May 2024) reports 1000× faster inference and only 6.3% of the training time relative to its NeRF-based baseline; real-time rates are achieved because explicit splatting replaces slow volumetric ray marching.
- HDRGS (Wu et al., 13 Aug 2024) employs a coarse-to-fine strategy: an initial fixed tone mapper allows the Gaussian field to converge rapidly, and the asymmetric grid tone mapper, whose parameters would otherwise be strongly coupled with the field, is introduced later. The method typically reconstructs a scene in 4–8 minutes.
- GSVR (Pang et al., 8 Jul 2025) and GaussianVideo (Bond et al., 8 Jan 2025) leverage explicit 2D/3D (and even 4D) Gaussian splatting with time-dependent deformation fields or Neural ODEs for camera trajectory modeling, enabling real-time decoding (800+ FPS), efficient model scaling, and frame interpolation; a minimal trajectory-integration sketch follows this list.
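To illustrate the idea of modeling a continuous camera trajectory as the solution of a learned ODE, here is a minimal sketch that integrates a small learned derivative network with an explicit Euler scheme; the 6-D state layout (position plus axis-angle rotation), network size, and fixed-step integrator are assumptions for illustration and do not reproduce GaussianVideo's actual formulation.

```python
import torch
import torch.nn as nn

class CameraODE(nn.Module):
    """Learned time derivative of a camera state (here: 3D position + axis-angle rotation)."""

    def __init__(self, state_dim: int = 6, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim + 1, hidden), nn.Tanh(),
                                 nn.Linear(hidden, state_dim))

    def forward(self, t: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        t_col = torch.full_like(state[..., :1], float(t))      # append time to the state
        return self.net(torch.cat([state, t_col], dim=-1))

def integrate_trajectory(ode: CameraODE, state0: torch.Tensor, ts: torch.Tensor) -> torch.Tensor:
    """Explicit Euler integration of the camera state over the timestamps in `ts`."""
    states = [state0]
    for t_prev, t_next in zip(ts[:-1], ts[1:]):
        dt = t_next - t_prev
        states.append(states[-1] + dt * ode(t_prev, states[-1]))
    return torch.stack(states)                                  # (len(ts), state_dim)

# Example: poses = integrate_trajectory(CameraODE(), torch.zeros(6), torch.linspace(0.0, 1.0, 30))
```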
Table: Representative Efficiency Metrics
| Method | Rendering FPS | Training Cost | Temporal Regularization |
|---|---|---|---|
| HDR-GS (Cai et al., 24 May 2024) | 126 | 6.3% of NeRF baseline | — |
| HDRGS (Wu et al., 13 Aug 2024) | Real-time | Minutes (4–8) | — |
| GSVR (Pang et al., 8 Jul 2025) | >800 | 2 sec/frame | Dynamic time slicing |
| Mono4DGS-HDR (Liu et al., 21 Oct 2025) | 161 | — | Luminance regularization |
*All figures are as stated in their respective sources; performance varies with dataset and hardware.*
5. Applications, Extensions, and Related Domains
Video HDR Gaussian fields have been rapidly adopted in multiple research and application domains:
- Real-time novel HDR view synthesis: film, VR/AR, dynamic scene editing (Cai et al., 24 May 2024, Wu et al., 13 Aug 2024)
- Dynamic lighting/environment map reconstruction for rendering, global illumination (Clausen et al., 9 Dec 2024, Bolduc et al., 15 Apr 2025)
- Robust HDR scene estimation and exposure editing from monocular, casually captured, or auto-exposure videos (Gong et al., 24 Apr 2025, Liu et al., 21 Oct 2025)
- Video compression and HDR content streaming with high-fidelity temporal interpolation (Lee et al., 6 Mar 2025, Bond et al., 8 Jan 2025, Pang et al., 8 Jul 2025)
Several frameworks also integrate advanced exposure fusion (SeHDR (Li et al., 23 Sep 2025)), uncertainty-based fusion (GaussHDR (Liu et al., 13 Mar 2025)), explicit luminance encodings, and continuous camera pose learning via Neural ODEs.
Emerging directions include 4D HDR splatting with diffusion priors and uncertainty distillation for fast-moving objects (Xiao et al., 4 Aug 2025), spatially-varying lighting (GaSLight (Bolduc et al., 15 Apr 2025)), and single-exposure HDR bracketing optimization (SeHDR (Li et al., 23 Sep 2025)).
6. Mathematical Representation and Key Formulations
A typical HDR Gaussian field for a video frame at time $t$ is defined by the following components:
- HDR color via spherical harmonics: $c_i^{\mathrm{HDR}}(\mathbf{d}) = \sum_{\ell=0}^{L}\sum_{m=-\ell}^{\ell} k_i^{\ell m}\, Y_{\ell m}(\mathbf{d})$, where $\mathbf{d}$ is the viewing direction and $k_i^{\ell m}$ are learnable SH coefficients.
- Tone mapping (HDR to LDR, exposure-adapted): $c_i^{\mathrm{LDR}} = \phi\!\left(\log\!\big(c_i^{\mathrm{HDR}}\,\Delta t\big)\right)$, where $\Delta t$ is the exposure time and $\phi$ is a learnable (per-channel) tone mapper.
- Parallel differentiable rasterization (PDR) for pixel $p$: $C(p) = \sum_{i=1}^{N} c_i\,\alpha_i \prod_{j=1}^{i-1}\big(1-\alpha_j\big)$, with opacity $\alpha_i = o_i\, G_i(p)$, where $G_i$ is the normalized 2D Gaussian pdf of the projected primitive.
- Video-specific, nonlinear temporal deformation (VeGaS (Smolak-Dyżewska et al., 17 Nov 2024)): the spatial distribution conditioned on time $t$ is a folded Gaussian $\mathcal{N}\big(\mu_i(t), \Sigma_i(t)\big)$, whose mean and covariance shift nonlinearly as functions of $t$.
These mathematical principles enable the representation, tone mapping, exposure simulation, and temporal articulation necessary for state-of-the-art HDR video reconstruction and rendering.
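To illustrate how a Gaussian primitive can be conditioned on time in the spirit of the folded-Gaussian dynamics above, the sketch below drives a 3D mean and per-axis scale with simple polynomials in $t$; the polynomial form and the diagonal covariance are illustrative simplifications, not the actual VeGaS parameterization.

```python
import numpy as np

class TimeConditionedGaussian:
    """A Gaussian whose mean and scale vary nonlinearly with time t (illustrative)."""

    def __init__(self, mean_coeffs: np.ndarray, log_scale_coeffs: np.ndarray):
        # mean_coeffs: (K, 3) polynomial coefficients for the 3D mean trajectory
        # log_scale_coeffs: (K, 3) polynomial coefficients for per-axis log scales
        self.mean_coeffs = mean_coeffs
        self.log_scale_coeffs = log_scale_coeffs

    def at(self, t: float):
        """Return the (mean, covariance) of the primitive evaluated at time t."""
        powers = np.array([t ** k for k in range(len(self.mean_coeffs))])
        mean_t = powers @ self.mean_coeffs                      # nonlinear mean shift
        scales_t = np.exp(powers @ self.log_scale_coeffs)       # nonlinear variance change
        return mean_t, np.diag(scales_t ** 2)                   # diagonal covariance for brevity

# Evaluating at successive timestamps yields the per-frame Gaussians that are then splatted.
```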
7. Limitations and Future Directions
Current video HDR Gaussian fields excel in efficiency, photorealism, and temporal consistency, but several open challenges and research frontiers remain:
- Handling of extreme exposure gaps, limited view sampling, or highly specular dynamic lighting (Wu et al., 13 Aug 2024, Clausen et al., 9 Dec 2024)
- Precise CRF calibration and tone mapping across arbitrary camera pipelines, especially in consumer-captured or raw video (Gong et al., 24 Apr 2025, Liu et al., 21 Oct 2025)
- Robust separation of camera and object motion in severely unstructured or non-synchronized scenes (Pang et al., 8 Jul 2025, Liu et al., 21 Oct 2025)
- Advanced uncertainty and temporal propagation schemes for fast-changing, occluded, or partially captured regions (Liu et al., 13 Mar 2025, Xiao et al., 4 Aug 2025)
- Scaling to real-time HDR video editing, semantic-aware processing, and cross-modal fusion (depth/LiDAR, event cameras).
A plausible implication is that future work will further integrate neural field advances, active exposure control, and downstream rendering pipelines to achieve even broader generalization and editing capabilities in real-world HDR video scenarios.