
Static-Dynamic 4D Hash Encoding

Updated 13 November 2025
  • Static-dynamic decoupled 4D decomposed hash encoding is a technique that factorizes spatio-temporal features across four overlapping 3D hash grids to model static geometry and dynamic motion efficiently.
  • The method employs spatial and temporal MLPs with a multi-head decoder to predict deformation parameters, enhancing physical consistency and rendering quality.
  • Its integration with physics-informed losses and optical flow supervision results in improved PSNR, SSIM, and LPIPS metrics compared to traditional 4D grids.

Static-dynamic decoupled 4D decomposed hash encoding is a core computational construct designed for efficient and physically consistent representation of spatio-temporal geometry and motion in dynamic 3D Gaussian Splatting systems. It appears as a central technique in Physics-Informed Deformable Gaussian Splatting (PIDG) (Hong et al., 9 Nov 2025), which aims to unify explicit 3D representations with continuum-mechanics principles for the reconstruction and simulation of dynamic materials from monocular video.

1. Formal Definition and Conceptual Overview

In the PIDG framework, the objective is to represent both static and dynamic scene aspects over the 4D domain $(x, y, z, t)$. Unlike monolithic 4D feature grids or solely MLP-based encodings, static-dynamic decoupled 4D decomposed hash encoding factorizes this 4D space via four overlapping 3D hash grids:

$$G_{xyz}(x, y, z), \quad G_{xyt}(x, y, t), \quad G_{yzt}(y, z, t), \quad G_{xzt}(x, z, t)$$

This structure enables localized, high-frequency encoding of scene geometry and temporally varying motion fields, while separating static features (via $G_{xyz}$) from dynamically evolving ones (via the grids involving $t$).
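As a concrete illustration, the four-grid lookup can be sketched in NumPy. The table size, feature width, resolution, and the XOR-fold hash with Instant-NGP-style primes are all illustrative assumptions, not PIDG's actual configuration; a real implementation would also blend neighboring cells (e.g., trilinearly) rather than quantize to the nearest cell.

```python
import numpy as np

# Illustrative sizes; not PIDG's actual configuration.
PRIMES = (1, 2654435761, 805459861)  # Instant-NGP-style spatial-hash primes
TABLE_SIZE, FEAT_DIM = 2 ** 14, 4

rng = np.random.default_rng(0)
tables = {axes: rng.normal(size=(TABLE_SIZE, FEAT_DIM)).astype(np.float32)
          for axes in ("xyz", "xyt", "yzt", "xzt")}

def hash3(coords):
    """XOR-fold three integer cell coordinates into a table index."""
    h = 0
    for c, p in zip(coords, PRIMES):
        h ^= int(c) * p
    return h % TABLE_SIZE

def encode(x, y, z, t, resolution=64):
    """Look up features from the four overlapping 3D grids at (x, y, z, t)."""
    q = lambda v: int(v * resolution)  # nearest-cell quantization (no trilinear blend)
    cells = {"xyz": (q(x), q(y), q(z)), "xyt": (q(x), q(y), q(t)),
             "yzt": (q(y), q(z), q(t)), "xzt": (q(x), q(z), q(t))}
    return {axes: tables[axes][hash3(c)] for axes, c in cells.items()}

feats = encode(0.3, 0.7, 0.1, 0.5)
```

Note that the static features `feats["xyz"]` are by construction independent of $t$, which is precisely what lets the encoding decouple static geometry from motion.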

A spatial MLP $f_s$ operates on $G_{xyz}$ to compute an "attention" scalar $a$ modulating the influence of static geometry. For dynamic features, a temporal MLP $f_t$ processes the concatenated features from the three temporal grids. The output $h$ is a weighted combination:

$$h = a \odot f_t(G_{xyt}, G_{yzt}, G_{xzt})$$

A multi-head decoder $D(h)$ subsequently predicts deformation increments: rotation $\Delta r$ (quaternion), translation $T_x$, and scale update $\Delta s$, which update the canonical Gaussian parameters ($\mu$, $S$, $R$).
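A minimal, untrained sketch of this forward pass, assuming a 4-dimensional per-grid feature width and a sigmoid gate for the attention scalar (both are assumptions; the source does not pin down these details):

```python
import numpy as np

rng = np.random.default_rng(1)
relu = lambda x: np.maximum(x, 0.0)

def mlp(sizes):
    """Random-init weights for a ReLU MLP (illustrative, untrained)."""
    return [(rng.normal(scale=0.1, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(layers, x):
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:
            x = relu(x)
    return x

F = 4                                 # per-grid feature width (assumed)
f_s = mlp([F, 256, 1])                # spatial MLP -> attention scalar a
f_t = mlp([3 * F, 256, 256])          # temporal MLP on concatenated grid features
decoder = mlp([256, 256, 4 + 3 + 3])  # heads: delta_r (quat), T_x, delta_s

def deform(g_xyz, g_xyt, g_yzt, g_xzt):
    a = 1 / (1 + np.exp(-forward(f_s, g_xyz)))  # sigmoid gate (assumed)
    h = a * forward(f_t, np.concatenate([g_xyt, g_yzt, g_xzt]))
    out = forward(decoder, h)
    return out[:4], out[4:7], out[7:]           # delta_r, T_x, delta_s

dr, tx, ds = deform(*(rng.normal(size=F) for _ in range(4)))
```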

This encoding scheme facilitates the separation and differential optimization of static and dynamic regions during training, typically implemented in two stages:

  • Densification: all Gaussians are optimized for both static geometry and dynamic deformation.
  • Refinement: a dynamic mask freezes static Gaussians (parameter fine-tuning only) and allows dynamic Gaussians to update their deformation fields.
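The refinement-stage freezing can be sketched as follows; the 70th-percentile threshold on deformation magnitude is a hypothetical choice for computing the dynamic mask:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 1000
positions = rng.normal(size=(N, 3)).astype(np.float32)
deform_mag = np.abs(rng.normal(size=N))  # stand-in per-Gaussian deformation magnitudes

# Refinement stage: mask by deformation magnitude (threshold is an assumption).
dynamic_mask = deform_mag > np.quantile(deform_mag, 0.7)

def apply_deformation(pos, delta, mask):
    """Only dynamic Gaussians receive deformation updates; static ones stay frozen."""
    out = pos.copy()
    out[mask] += delta[mask]
    return out

delta = rng.normal(scale=0.01, size=(N, 3)).astype(np.float32)
new_pos = apply_deformation(positions, delta, dynamic_mask)
```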

2. Architectural Implementation

Feature extraction, modulation, and parameter update within the encoding paradigm use shallow neural networks:

  • Spatial MLP $f_s$ and temporal MLP $f_t$: each is a single hidden layer (256 units, ReLU).
  • Decoder $D(h)$: a lightweight two-layer MLP (256 units each, ReLU) outputs $\{\Delta r, T_x, \Delta s\}$ per Gaussian.

Hash grids are optimized with hierarchical learning rates (MLP rate $2 \times 10^{-3}$, hash grids up to 50× higher) via Adam ($\beta_1 = 0.9$, $\beta_2 = 0.999$).
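The hierarchical rates can be mimicked with two parameter groups sharing one Adam update rule; the 50× grid multiplier below follows the stated upper bound, and the toy parameters and gradients are illustrative:

```python
import numpy as np

MLP_LR, GRID_LR_MULT = 2e-3, 50   # grids up to 50x the MLP rate
B1, B2, EPS = 0.9, 0.999, 1e-8    # Adam hyperparameters from the text

def adam_step(p, g, m, v, lr, t):
    """One Adam update with bias correction."""
    m = B1 * m + (1 - B1) * g
    v = B2 * v + (1 - B2) * g * g
    m_hat, v_hat = m / (1 - B1 ** t), v / (1 - B2 ** t)
    return p - lr * m_hat / (np.sqrt(v_hat) + EPS), m, v

# Two parameter groups, same gradient, different learning rates.
mlp_w, grid_w = np.ones(3), np.ones(3)
grad = np.full(3, 0.5)
mlp_w, *_ = adam_step(mlp_w, grad, np.zeros(3), np.zeros(3), MLP_LR, 1)
grid_w, *_ = adam_step(grid_w, grad, np.zeros(3), np.zeros(3), GRID_LR_MULT * MLP_LR, 1)
```

The grid group takes a step 50× larger than the MLP group for the same gradient, which is the point of the hierarchy: hash-table entries are local and can move fast, while the shared MLPs change slowly.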

Memory scaling is determined by the number of hash grids, each being $O(n^3)$; compared to a full 4D grid ($O(n^4)$), this saves significant resources while still capturing high-frequency spatio-temporal detail.
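The saving is easy to quantify: at dense resolution $n$, four 3D grids hold $4n^3$ cells versus $n^4$ for one 4D grid, an $n/4$ reduction (hashing caps the actual table memory further still):

```python
# Memory comparison at dense resolution n (hash tables cap this further).
n, feat_dim = 128, 4
dense_4d = n ** 4 * feat_dim        # one monolithic 4D grid: O(n^4)
decomposed = 4 * n ** 3 * feat_dim  # four 3D grids: O(n^3) each
savings = dense_4d / decomposed     # n / 4 = 32x fewer cells at n = 128
```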

3. Separation of Static and Dynamic Scene Components

The key advantage of the decoupled encoding is rigorous separation of static and dynamic regions within the optimization process. Initially, all Gaussians participate in joint optimization of both static and dynamic fields. After a dynamic mask is computed (often based on magnitude or consistency of deformation vectors), the system "freezes" parameters of static Gaussians, whereas dynamic Gaussians continue updating deformation fields.

This enables more stable learning, better generalization, and artifact-free rendering in static regions, while permitting flexible motion modeling in dynamic zones.

4. Integration with Physics-Informed Constraints

Static-dynamic decoupled hash encoding is intertwined with PIDG’s physics-informed loss, specifically the Cauchy momentum residual:

$$\mathcal{L}_{\text{CMR}} = \frac{1}{M}\sum_{i=1}^{M} \|\mathbf{r}(x_i, t_i)\|_2^2$$

where the residual $\mathbf{r}(x, t)$ is computed pointwise from momentum and stress balance:

$$\mathbf{r}(x, t) = \rho\Bigl(\tfrac{\partial \mathbf{v}}{\partial t} + (\mathbf{v} \cdot \nabla)\mathbf{v}\Bigr) - \nabla \cdot \boldsymbol{\sigma}$$

Hash-encoded features feed into a multi-head MLP $f_\theta$, which predicts both $\mathbf{v}(x, t) \in \mathbb{R}^3$ and $\boldsymbol{\sigma}(x, t) \in \mathbb{R}^6$ for each Gaussian.
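The residual can be checked numerically with central finite differences; the stand-in velocity and stress fields below are analytic toys (in PIDG these are predicted from hash features by $f_\theta$), and the density and step size are assumed values:

```python
import numpy as np

RHO, H = 1.0, 1e-4  # density and finite-difference step (assumed values)

def v(x, t):
    """Stand-in velocity field; PIDG predicts v from hash features instead."""
    return np.array([np.sin(t) * x[0], 0.0, 0.0])

def sigma(x, t):
    """Stand-in symmetric stress field (3x3)."""
    return np.eye(3) * x[0]

def residual(x, t):
    """r(x, t) = rho * (dv/dt + (v . grad) v) - div(sigma), by central differences."""
    dv_dt = (v(x, t + H) - v(x, t - H)) / (2 * H)
    # Jacobian J[i, j] = dv_i / dx_j
    J = np.stack([(v(x + H * e, t) - v(x - H * e, t)) / (2 * H)
                  for e in np.eye(3)], axis=1)
    advect = J @ v(x, t)
    div_sigma = np.array([sum((sigma(x + H * e, t)[i, j] - sigma(x - H * e, t)[i, j]) / (2 * H)
                              for j, e in enumerate(np.eye(3)))
                          for i in range(3)])
    return RHO * (dv_dt + advect) - div_sigma
```

With $\mathbf{v} = (\sin t \cdot x_0, 0, 0)$ and $\boldsymbol{\sigma} = x_0 I$, the analytic residual at $x = (2, 0, 0)$, $t = 0$ is $(2\cos 0 - 1, 0, 0) = (1, 0, 0)$, which the finite-difference version reproduces.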

This approach supports unified learning of diverse constitutive laws (fluid-like or solid-like) across the scene without manual switching, controlled exclusively by the material field parameters inferred from hash grid embeddings.

5. Supervision via Optical Flow

Supervision is augmented with camera-compensated 2D optical flow and Lagrangian particle flow matching:

  • Pretrained 2D flow networks estimate pixelwise motion, used as pseudo-ground-truth.
  • For each pixel, the top-$K$ Gaussians (by splatting weight) are identified to compute Gaussian flow and velocity flow, both compared with the pseudo-ground-truth via a flow-matching loss:

$$\mathcal{L}_{\text{flow}} = \lambda_g \sum_{p_1}\|\mathrm{flow}_g(p_1) - \mathrm{flow}_{gt}(p_1)\|_1 + \lambda_v \sum_{p_1}\|\mathrm{flow}_v(p_1) - \mathrm{flow}_{gt}(p_1)\|_1$$

with typical $\lambda_g = \lambda_v = 0.5$.
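A minimal sketch of this loss, with hypothetical per-pixel flow arrays (the top-$K$ splatting-weight selection that produces the Gaussian and velocity flows is omitted here):

```python
import numpy as np

LAMBDA_G = LAMBDA_V = 0.5  # weights stated in the text

def flow_matching_loss(flow_g, flow_v, flow_gt):
    """L1 flow-matching loss against camera-compensated pseudo-GT flow.

    flow_g: Gaussian flow per pixel, shape (P, 2); flow_v: velocity flow, (P, 2).
    """
    l_g = np.abs(flow_g - flow_gt).sum()
    l_v = np.abs(flow_v - flow_gt).sum()
    return LAMBDA_G * l_g + LAMBDA_V * l_v

# Toy example: 4 pixels, both predicted flows off by one pixel everywhere.
gt = np.zeros((4, 2))
loss = flow_matching_loss(np.ones((4, 2)), np.ones((4, 2)), gt)
```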

6. Empirical Performance and Evaluation

Experimental results on the PIDG synthetic dataset, D-NeRF synthetic scenes, and real-world HyperNeRF videos demonstrate that static-dynamic decoupled 4D decomposed hash encoding yields higher PSNR and SSIM and lower LPIPS than prior art (Grid4D, MotionGS, GaussianPredict). PIDG attains visibly sharper static details, more coherent motion, and better generalization for future prediction in dynamic scenes.

On the real-world HyperNeRF scenes, the method achieves +0.5 dB PSNR and +0.01 MS-SSIM over the strongest baselines, while isolating dynamic processes with high physical consistency.

7. Limitations, Advantages, and Ongoing Research

Key advantages include:

  • Physical consistency and continuum mechanical realism via Cauchy-residual loss and flow matching
  • Efficient, scalable memory usage ($O(n^3)$ instead of $O(n^4)$)
  • Unified representation for multiple constitutive behaviors

Notable limitations are:

  • High computational and memory demands for joint optimization over millions of particles and physics-based losses
  • Limited expressiveness for complex material behaviors (nonlinear elastoplasticity, fracture)
  • Reliance on accurate optical flow and masks for supervision, which may be error-prone

Planned future work includes real-time inference, richer constitutive modeling (viscoelasticity, plasticity), multi-scale hierarchical renormalization, and multi-view/RGB-D extensions.

This encoding paradigm is foundational for PIDG’s advances in dynamic reconstruction quality and physical generalization from monocular video, formalizing a robust methodology for 4D scene decomposition in physics-informed explicit scene representations (Hong et al., 9 Nov 2025).

References

  • Hong et al., 9 Nov 2025: Physics-Informed Deformable Gaussian Splatting (PIDG).