Dynamic Neural Radiance Fields

Updated 28 February 2026

Dynamic Neural Radiance Fields are an extension of static NeRFs that model time-varying, non-rigid 3D scenes using spatiotemporal functions.
Many employ deformation networks that warp points into a canonical space, and they render photorealistic images by integrating view-dependent color and density predictions along camera rays.
Tensor factorization and grid encodings accelerate training, while regularization and dedicated modules help handle occlusion and topological changes.

Dynamic Neural Radiance Fields (Dynamic NeRFs) generalize the static NeRF framework to model time-varying scenes, enabling photorealistic novel-view and novel-time synthesis of dynamic, non-rigid, and long-duration 3D environments. They represent a function mapping spatiotemporal coordinates and viewing direction to view-dependent emitted color and volume density, supporting advanced tasks such as dynamic reconstruction, free-viewpoint video, and editable 3D content.

1. Mathematical Formulation and Canonicalization

Dynamic NeRFs extend the static NeRF formulation $F: \mathbb{R}^3 \times S^2 \rightarrow \mathbb{R}^3 \times \mathbb{R}_+$ to a spatiotemporal function $F: \mathbb{R}^3 \times S^2 \times [0,1] \rightarrow \mathbb{R}^3 \times \mathbb{R}_+$. Given a 3D point $x$, view direction $d$, and continuous time $t$, the network predicts color $c$ and density $\sigma$ for volume rendering.

Many dynamic NeRFs, starting with D-NeRF (Pumarola et al., 2020), decompose this mapping into two stages:

Deformation Network $\Psi_t$ predicts a residual displacement $\Delta x = \Psi_t(x, t)$, which warps the query point back to a shared "canonical" space: $x' = x + \Delta x$.
Canonical NeRF $\Psi_x$ predicts density and radiance in canonical space: $(c, \sigma) = \Psi_x(x', d)$.

The volumetric color is rendered by sampling points $x_i$ along each camera ray at time $t$, warping them via $\Psi_t$, querying $\Psi_x$, and integrating with the standard transmittance-weighted quadrature $\hat{C} = \sum_i T_i \left(1 - e^{-\sigma_i \delta_i}\right) c_i$, where $T_i = \exp\left(-\sum_{j<i} \sigma_j \delta_j\right)$ and $\delta_i$ is the distance between adjacent samples.
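The two-stage pipeline can be sketched end to end for a single ray. This is a minimal NumPy illustration, not D-NeRF's implementation: `deformation` and `canonical_field` are hypothetical toy stand-ins for the learned networks $\Psi_t$ and $\Psi_x$ (a sphere that translates along $+x$ at unit speed), and points are sampled uniformly rather than with NeRF's stratified and hierarchical sampling.

```python
import numpy as np

def deformation(x, t):
    """Toy stand-in for the deformation network Psi_t (hypothetical).

    The object translates along +x at unit speed, so mapping an
    observation-space point back to canonical space subtracts t.
    Delta x is zero at t = 0, i.e. t = 0 is the canonical time.
    """
    return np.broadcast_to(np.array([-t, 0.0, 0.0]), x.shape)

def canonical_field(x_canon, d):
    """Toy stand-in for the canonical NeRF Psi_x (hypothetical).

    A solid sphere of radius 0.5 at the origin with constant color;
    the view direction d is ignored in this toy field.
    """
    inside = np.linalg.norm(x_canon, axis=-1) < 0.5
    sigma = np.where(inside, 50.0, 0.0)
    rgb = np.broadcast_to(np.array([1.0, 0.2, 0.2]), x_canon.shape)
    return rgb, sigma

def render_ray(origin, direction, t, near=0.0, far=4.0, n_samples=256):
    """Render one ray at time t; returns (color, accumulated opacity)."""
    # 1) Sample points along the ray (uniformly, for simplicity).
    z = np.linspace(near, far, n_samples)
    pts = origin + z[:, None] * direction
    # 2) Warp into canonical space: x' = x + Delta x.
    pts_canon = pts + deformation(pts, t)
    # 3) Query the canonical field.
    rgb, sigma = canonical_field(pts_canon, direction)
    # 4) Transmittance-weighted quadrature.
    delta = np.diff(z, append=1e10)          # last interval treated as unbounded
    alpha = 1.0 - np.exp(-sigma * delta)
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = trans * alpha
    color = (weights[:, None] * rgb).sum(axis=0)
    return color, weights.sum()

origin = np.array([0.0, 0.0, -2.0])
direction = np.array([0.0, 0.0, 1.0])
for t in (0.0, 1.0):
    color, opacity = render_ray(origin, direction, t)
    print(t, color.round(3), round(float(opacity), 3))
```

The same ray is opaque at $t = 0$, when it passes through the sphere's center, and empty at $t = 1$, because the warp has moved the sphere off the ray. Only the deformation depends on time; the canonical field is shared across all frames.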

This canonicalization approach accommodates non-rigid deformation and supports rendering that is continuous in both time and viewpoint. The motif recurs in later work such as NDVG (Guo et al., 2022), although some methods, e.g., D-TensoRF (Jang et al., 2022), instead treat time as an axis of a factorized 4D grid.

2. Representation Strategies and Acceleration

Dynamic NeRF research has diverged into several representational classes:

MLP-Based Deformations: D-NeRF and H-NeRF (Xu et al., 2021) parameterize both deformation and radiance fields purely with multi-layer perceptrons and often enforce constraints such as $\Psi_t(x, 0) = 0$, which fixes $t = 0$ as the canonical time.
Voxel and Grid-Based Encodings: To accelerate training/inference, methods such as Neural Deformable Voxel Grid (NDVG) (Guo et al., 2022) and D-TensoRF (Jang et al., 2022) replace most of the MLPs with explicit 3D or 4D tensor grids, utilizing trilinear/quadrilinear interpolation and lightweight decoders. D-TensoRF introduces tensor decomposition by Canonical Polyadic (CP) and Matrix–Matrix (MM) methods, representing the joint space (xyz, time) as a low-rank set of factors, offering 10–40x speedup and enabling real-time dynamic NeRFs.
Particle Encodings: Online and high-adaptation approaches, such as ParticleNeRF (Abou-Chakra et al., 2022), use a set of dynamic neural particles whose positions and features are optimized continuously via backpropagated photometric gradients interpreted as physics-like velocity updates.
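The low-rank idea behind D-TensoRF's CP decomposition can be illustrated on a single scalar density grid. This sketch assumes a rank-$R$ CP model with one hypothetical factor matrix per axis $(x, y, z, t)$; D-TensoRF also factorizes appearance features and offers the MM variant, both omitted here.

```python
import numpy as np

def lerp_factor(F, u):
    """Linearly interpolate every row of a (rank, res) factor matrix at u in [0, 1]."""
    res = F.shape[1]
    pos = float(np.clip(u, 0.0, 1.0)) * (res - 1)
    i0 = int(np.floor(pos))
    i1 = min(i0 + 1, res - 1)
    w = pos - i0
    return (1.0 - w) * F[:, i0] + w * F[:, i1]

def cp_density(factors, x, y, z, t):
    """Rank-R CP model: sigma(x, y, z, t) = sum_r X_r(x) Y_r(y) Z_r(z) T_r(t).

    Multiplying the linearly interpolated 1D factors is equivalent to
    quadrilinear interpolation of the full 4D grid, without storing it.
    """
    X, Y, Z, T = (lerp_factor(F, u) for F, u in zip(factors, (x, y, z, t)))
    return float(np.sum(X * Y * Z * T))

rng = np.random.default_rng(0)
rank, res = 8, 64
factors = [rng.random((rank, res)) for _ in range(4)]  # hypothetical learned factors
print(cp_density(factors, 0.3, 0.7, 0.5, 0.25))
print("dense 4D grid entries:", res ** 4)   # 16777216
print("CP parameters:", 4 * rank * res)     # 2048
```

The parameter count shows where the compactness comes from: a dense $64^4$ grid stores about 16.8 million values, while the rank-8 factorization stores 2,048, at the cost of restricting the field to a sum of separable terms.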

Method	Canonicalization	Representation	Acceleration
D-NeRF	MLP deformation	fully MLP	none
D-TensoRF	none (time as a tensor axis)	4D grid + CP/MM factorization	tensor factorization
NDVG	MLP + grid deformation	3D grids + small MLP	explicit trilinear interpolation
ParticleNeRF	none	dynamic particles	physics-based online updates

Efficient approaches such as InstantNGP-based hash grids and multi-resolution factorization are frequently employed for both speed and compactness (Quartey et al., 2022, Abou-Chakra et al., 2022).
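A hash-grid lookup in the style of Instant-NGP can be sketched as follows. The spatial-hash primes and the geometric growth of per-level resolution follow the Instant-NGP paper; everything else is a simplification assumed for illustration: every level is hashed (Instant-NGP indexes coarse levels densely when they fit in the table), there is no half-voxel offset, and the extension to a fourth, temporal coordinate that a dynamic method would need is not shown. `encode` and its parameters are hypothetical names.

```python
import numpy as np

PRIMES = (1, 2654435761, 805459861)  # spatial-hash primes from the Instant-NGP paper

def hash_index(ix, iy, iz, table_size):
    """XOR-of-products spatial hash; table_size is assumed to be a power of two."""
    return ((ix * PRIMES[0]) ^ (iy * PRIMES[1]) ^ (iz * PRIMES[2])) % table_size

def encode(p, tables, n_min=16, n_max=512):
    """Multiresolution hash encoding of a point p in [0, 1]^3.

    tables: list of (table_size, feat_dim) arrays, one per level (at least 2).
    Returns the concatenated, trilinearly interpolated features of all levels.
    """
    n_levels = len(tables)
    growth = np.exp((np.log(n_max) - np.log(n_min)) / (n_levels - 1))
    feats = []
    for level, table in enumerate(tables):
        res = int(np.floor(n_min * growth ** level))
        pos = np.asarray(p, dtype=float) * res
        base = np.floor(pos).astype(int)
        frac = pos - base
        acc = np.zeros(table.shape[1])
        # Trilinear interpolation over the 8 vertices of the enclosing cell.
        for corner in range(8):
            offs = np.array([(corner >> k) & 1 for k in range(3)])
            w = np.prod(np.where(offs == 1, frac, 1.0 - frac))
            ix, iy, iz = (int(v) for v in base + offs)
            acc += w * table[hash_index(ix, iy, iz, table.shape[0])]
        feats.append(acc)
    return np.concatenate(feats)

rng = np.random.default_rng(0)
tables = [rng.normal(scale=1e-4, size=(2 ** 14, 2)) for _ in range(8)]  # 8 levels, 2 features each
feat = encode([0.3, 0.6, 0.9], tables)
print(feat.shape)  # (16,)
```

The concatenated features would normally feed a small MLP decoder. Hash collisions at fine levels are tolerated rather than resolved; the training gradients of the colliding cells are averaged in the shared table entry.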

3. Handling Occlusion, Topology, and Scene Flow

Dynamic NeRFs incorporate specialized modules to account for occlusions, disocclusions, and topological changes:

Occlusion Modeling: NDVG (Guo et al., 2022) augments the deformation network with a per-sample occlusion weight $o \in [0, 1]$, which modulates the contribution of each point after deformation to suppress ghosting and background leaking into foreground regions.
Temporal Regularization: D-TensoRF applies smoothing regularization to time factors and matrix slices to promote continuity across frames, essential for temporal coherence and mitigating motion artifacts (Jang et al., 2022).
Scene Flow and Motion Fields: VDNeRF (Zou et al., 9 Nov 2025) explicitly integrates forward/backward scene flow, while H-NeRF (Xu et al., 2021) leverages a pre-fitted body model (imGHUM); both provide physically meaningful temporal correspondences, vital for disambiguating camera and object motion in dynamic real-world urban or articulated scenes.
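Temporal smoothing of time factors can be sketched as a finite-difference penalty along the time axis. This is an assumed form for illustration, not D-TensoRF's exact regularizer: the squared-difference penalty, the `order` switch, and the name `temporal_smoothness` are hypothetical.

```python
import numpy as np

def temporal_smoothness(time_factors, order=1):
    """Finite-difference penalty along the time axis of a (rank, n_frames) factor matrix.

    order=1 penalizes frame-to-frame jumps; order=2 penalizes changes in
    the rate of change. Which variant D-TensoRF uses is not assumed here.
    """
    diffs = np.diff(time_factors, n=order, axis=1)
    return float(np.mean(diffs ** 2))

frames = np.linspace(0.0, 1.0, 50)
smooth = np.stack([np.sin(2 * np.pi * frames)] * 4)        # rank 4, 50 frames
noisy = smooth + 0.1 * np.random.default_rng(0).standard_normal(smooth.shape)
print(temporal_smoothness(smooth), temporal_smoothness(noisy))
```

In training, such a term would be added to the photometric loss with a small weight, so that jitter in the time factors, which shows up as flicker between frames, is penalized while genuine motion that varies smoothly costs little.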

4. Training Objectives and Data Regimes

Dynamic NeRFs are trained end-to-end with photometric reconstruction losses over batches of rays drawn from multi-view (or monocular) sequences. Losses include:

Photometric L2/SSIM/LPIPS: core pixelwise MSE or perceptual losses between rendered and ground-truth colors (Pumarola et al., 2020, Guo et al., 2022).
Deformation/Regularization Losses: smoothness, total variation, or L1/L2 on the deformation field, or on grid elements, to constrain geometry and motion (Jang et al., 2022, Guo et al., 2022).
Scene Flow and Consistency Losses: cycle-consistency, temporal aggregation, or mask-guided loss to disentangle camera/object motion and encourage temporal smoothness (Zou et al., 9 Nov 2025, Zhang et al., 2024).
Replay Buffers/Continual Learning: For long video, NeVRF (Wu et al., 2023) employs continual learning with buffer-based rehearsal and experience mixing to avoid catastrophic forgetting and maintain per-frame fidelity.
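Two of the listed objectives, the photometric L2 loss and a forward-backward cycle-consistency loss on scene flow, can be sketched as below. The L1 form of the cycle residual, the weight `lam_cycle`, and the constant-translation flows are hypothetical choices for illustration; in the cited methods both flows are predicted by networks.

```python
import numpy as np

def photometric_mse(rendered, target):
    """Pixelwise L2 (MSE) loss between rendered and ground-truth colors."""
    return float(np.mean((np.asarray(rendered) - np.asarray(target)) ** 2))

def cycle_consistency(points, flow_fwd, flow_bwd):
    """Forward-backward scene-flow consistency (mean L1 residual).

    flow_fwd maps points at time t to displacements toward t+1, and flow_bwd
    maps points at t+1 back toward t. A consistent pair returns every point
    to where it started.
    """
    moved = points + flow_fwd(points)
    returned = moved + flow_bwd(moved)
    return float(np.mean(np.abs(returned - points)))

v = np.array([0.1, 0.0, -0.2])
pts = np.random.default_rng(0).random((100, 3))
consistent = cycle_consistency(pts, lambda p: p * 0 + v, lambda p: p * 0 - v)
inconsistent = cycle_consistency(pts, lambda p: p * 0 + v, lambda p: p * 0 - 0.5 * v)

lam_cycle = 0.1  # hypothetical loss weight
rendered, target = np.full((4, 3), 0.5), np.full((4, 3), 0.6)
total = photometric_mse(rendered, target) + lam_cycle * inconsistent
print(consistent, inconsistent, total)
```

The inconsistent backward flow undoes only half of the forward motion, so the residual equals half the displacement and the cycle term grows accordingly.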

Dynamic NeRFs operate under varying acquisition regimes, from sparse camera arrays (H-NeRF, VDNeRF) to monocular videos with inferred poses (D-NeRF) and continuously streamed inputs processed on the fly (Yan et al., 2023, Abou-Chakra et al., 2022).

5. Editing, Compression, and Streaming

Recent work expands dynamic NeRFs beyond reconstruction towards interactive applications:

Editing: SealD-NeRF (Huang et al., 2024) enables interactive, pixel-level editing of dynamic sequences, mapping single-frame user edits to temporally consistent canonical changes via a teacher-student scheme with the deformation network frozen, ensuring that edits propagate seamlessly along prescribed motion without distorting dynamics.
Compression: Methods such as D-TensoRF and VideoRF (Jang et al., 2022, Wang et al., 2023) serialize dynamic radiance fields into highly compressible grids or 2D video streams amenable to hardware codecs. Techniques include low-rank tensor decompositions (CP/MM), 3D-to-2D Morton packing, and spatial/temporal TV regularization, achieving model sizes down to 1–10 MB for hundreds of frames and enabling real-time mobile playback.
On-the-Fly/Online Adaptation: OD-NeRF (Yan et al., 2023) and ParticleNeRF (Abou-Chakra et al., 2022) support low-latency streaming and rapid reconstruction from sequential video, employing occupancy grid transitions, projected-color conditioning, and particle-based adaptation for framewise retraining at 6–200 ms per update.
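The 3D-to-2D Morton packing step can be illustrated with a generic Z-order scheme. This sketch assumes a simple layout (sort occupied voxels by 3D Morton code, then write their features row-major into an image) and does not reproduce VideoRF's exact packing or the subsequent video-codec stage; `morton3d` and `morton_pack` are hypothetical helpers.

```python
import numpy as np

def morton3d(x, y, z, bits=10):
    """Interleave the bits of (x, y, z) into one Morton (Z-order) code.

    Bit i of x, y, z lands at positions 3i, 3i+1, 3i+2, so voxels that are
    close in 3D tend to receive nearby codes.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code

def morton_pack(coords, feats, width):
    """Pack sparse voxel features into a 2D image in Morton order.

    coords: (N, 3) integer voxel indices; feats: (N, C) features.
    Returns (image of shape (H, width, C), permutation used).
    Unused trailing pixels are zero-filled.
    """
    codes = np.array([morton3d(int(cx), int(cy), int(cz)) for cx, cy, cz in coords])
    order = np.argsort(codes, kind="stable")
    n, c = feats.shape
    height = -(-n // width)                      # ceiling division
    flat = np.zeros((height * width, c), dtype=feats.dtype)
    flat[:n] = feats[order]
    return flat.reshape(height, width, c), order

coords = np.array([[x, y, z] for z in range(4) for y in range(4) for x in range(4)])
feats = np.random.default_rng(0).random((len(coords), 3)).astype(np.float32)
image, order = morton_pack(coords, feats, width=8)
print(image.shape)  # (8, 8, 3)
```

Keeping spatial neighbors adjacent in the 2D image preserves local correlation, which is what block-based 2D codecs exploit; a row-major voxel order would scatter neighbors in $z$ far apart.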

6. Specialized Extensions: HDR, Specularities, and Hybrid Rendering

HDR Dynamic NeRFs: HDR-HexPlane (Wu et al., 2024) extends HexPlane to accommodate dynamic scenes with variable exposure, learning a per-image exposure mapping and fixing a monotonic camera response function (CRF) for stable optimization. Volumetric HDR and LDR rendering is performed with a 4D grid factored into six planes, achieving high-quality, exposure-robust free-viewpoint renderings.
Specular/Dynamic Reflective Objects: NeRF-DS (Yan et al., 2023) addresses the domain gap for non-Lambertian, specular dynamic objects by conditioning the radiance branch on observation-space position and surface normal, combined with a mask-guided deformation field to handle correspondence in challenging reflective motion.
Hybrid Mesh-Volumetric Systems: Dynamic Mesh-Aware Radiance Fields (Qiao et al., 2023) proposes a two-way coupling of NeRF volumes and explicit meshes, developing a unified light-transport system by interleaving NeRF ray marching with mesh path tracing, HDR training, and GPU-accelerated physics, allowing physically consistent dynamic scene simulation and real-time hybrid rendering.
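The exposure and CRF stage of an HDR dynamic NeRF can be sketched on its own, separately from the HexPlane factorization. Following the description above, each image has its own learnable exposure and the camera response is fixed and monotonic; the specific CRF below (clip to $[0, 1]$, then a gamma of 2.2) and the log-space exposure parameterization are hypothetical choices, not HDR-HexPlane's published form.

```python
import numpy as np

GAMMA = 2.2  # hypothetical fixed CRF parameter

def crf(x, gamma=GAMMA):
    """Fixed monotonic camera response (hypothetical): clip to [0, 1], then gamma-encode."""
    return np.clip(x, 0.0, 1.0) ** (1.0 / gamma)

def render_ldr(hdr_radiance, log_exposure):
    """LDR pixel = CRF(HDR radiance * exposure) for one image.

    log_exposure is a per-image scalar learned alongside the radiance field;
    parameterizing it in log space keeps the exposure positive.
    """
    return crf(np.asarray(hdr_radiance) * np.exp(log_exposure))

radiance = np.array([0.05, 0.1, 0.4, 2.0])   # HDR values from the volume renderer
print(render_ldr(radiance, np.log(0.5)))    # short exposure
print(render_ldr(radiance, np.log(2.0)))    # long exposure: the brightest value saturates at 1.0
```

Because the radiance field itself stays in linear HDR units, the same field can be rendered at any exposure after training, and images with different exposures supervise one shared scene.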

7. Limitations, Quantitative Performance, and Future Directions

Dynamic NeRFs face challenges including:

Ambiguity from Sparse Data: Monocular or sparsely-viewed dynamic captures (especially with fast motion or occlusion) remain ill-posed without additional constraints or priors.
Computational Efficiency: While grid and tensor factorization methods (e.g., D-TensoRF, NDVG) offer significant speed and compactness improvements, high-fidelity online streaming with large spatiotemporal coverage remains demanding (Jang et al., 2022, Guo et al., 2022).
Topological Change and Long Duration: Methods such as NeVRF (Wu et al., 2023) and VideoRF (Wang et al., 2023) demonstrate scalability to long temporal sequences and complex topology, but with tradeoffs in reconstruction granularity and storage.

Empirically, state-of-the-art dynamic NeRFs reach PSNR values of up to about 33 dB, SSIM of up to about 0.98, and LPIPS no higher than about 0.08 on synthetic and real benchmarks, with training times reduced from tens of hours (D-NeRF) to minutes (D-TensoRF, NDVG, ParticleNeRF), and storage below 10 MB for 100–200 frame sequences in compressed representations (Jang et al., 2022, Abou-Chakra et al., 2022).
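To put the PSNR figure in perspective, the standard definition $\mathrm{PSNR} = 10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$ can be inverted to find the error level it implies. A minimal sketch, assuming images normalized to $[0, 1]$; the helper names are illustrative.

```python
import numpy as np

def psnr(rendered, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images with values in [0, max_val]."""
    diff = np.asarray(rendered, dtype=float) - np.asarray(target, dtype=float)
    mse = np.mean(diff ** 2)
    return float(10.0 * np.log10(max_val ** 2 / mse))

def mse_for_psnr(db, max_val=1.0):
    """Invert the PSNR formula: the MSE that produces a given PSNR."""
    return max_val ** 2 * 10.0 ** (-db / 10.0)

mse = mse_for_psnr(33.0)
print(f"MSE at 33 dB: {mse:.2e}")                              # about 5.0e-4
print(f"RMS error in 8-bit levels: {np.sqrt(mse) * 255:.1f}")  # about 5.7
```

So 33 dB corresponds to an RMS error of roughly 2% of the intensity range, or under 6 of 255 levels per pixel. By the same kind of arithmetic, 10 MB spread over 200 frames amounts to about 50 KB per frame.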

Anticipated future advances include deeper integration of physical priors (scene flow, SDFs, human models), real-time editable dynamic scenes, end-to-end joint compression and representation, and extension to unbounded environments and continual streaming video scenarios (Zheng et al., 2024, Zou et al., 9 Nov 2025, Yan et al., 2023).