Axes-Aligned Spatiotemporal Gaussian Models

Updated 21 November 2025

Axes-aligned spatiotemporal Gaussian representations are explicit, parameter-efficient models that encode high-dimensional dynamic fields using diagonal covariance matrices.
They leverage normalized Gaussian interpolation to ensure stability and universal approximation, delivering significant computational speed and memory efficiency.
Adaptive density control and scale-aware residual coding further enhance performance in physics-informed MRI super-resolution and real-time dynamic scene rendering.

Axes-aligned spatiotemporal Gaussian representations comprise a class of explicit, parameter-efficient models for encoding complex fields or dynamic phenomena in high-dimensional space–time. Their formulation restricts the parametric support of each Gaussian to have no cross-covariance between coordinate axes, resulting in diagonal covariance matrices. This simplification yields computational tractability while retaining universal approximation guarantees for smooth target functions. Recent advances leverage these representations to accelerate physics-informed super-resolution (PINGS-X) of 4D flow MRI (Jo et al., 14 Nov 2025) and real-time rendering of temporally complex dynamic scenes (SaRO-GS) (Yan et al., 9 Dec 2024). The axes-aligned parameterization, combined with density control strategies, offers substantial improvements in training speed, memory efficiency, and stability in both scientific and graphics applications.

1. Structure of Axes-Aligned Spatiotemporal Gaussian Elements

In axes-aligned spatiotemporal Gaussian models, the field (e.g., flow velocity) at a location $\v x\in\mathbb{R}^q$ is represented as a convex combination of the values carried by $N$ Gaussian primitives, each defined by a parameter tuple:

$\mathcal{G}_i = (\bmu_i, \m\Sigma_i, \v v_i), \quad i=1,\ldots,N,$

where $\bmu_i$ is the mean, $\m\Sigma_i$ is the covariance matrix, and $\v v_i$ encodes the field value carried by each Gaussian. The axes-alignment constraint forces

$\m\Sigma_i = \operatorname{diag}(h_{i1}^2, h_{i2}^2, \ldots, h_{iq}^2),$

so each dimension is scaled independently and no rotations are learned. In dynamic scene rendering, this extends to 4D with means $(\mu_i^{3D}, \tau_i)$ and covariance $(\m\Sigma_i, \sigma_i^2)$, where $\sigma_i$ parameterizes the temporal Gaussian "lifespan" (Yan et al., 9 Dec 2024). The temporal dimension is always axis-aligned, resulting in no cross-covariance between space and time.
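As a minimal sketch of what the diagonal constraint buys, the unnormalized response of each Gaussian can be evaluated with per-axis divisions only. The function name `gaussian_response`, the array layout, and the fully diagonal 4D example (time as the last axis) are assumptions made for illustration and are not taken from either paper.

```python
import numpy as np

def gaussian_response(x, mu, h):
    """Unnormalized response exp(-0.5 * r^T Sigma^{-1} r) of axes-aligned Gaussians.

    x  : (q,)   query point
    mu : (N, q) means
    h  : (N, q) per-axis standard deviations, so Sigma_i = diag(h_i**2)
    """
    # With a diagonal covariance the quadratic form is a sum of independent
    # per-axis terms, so no matrix inverse or rotation is needed.
    d = (x - mu) / h
    return np.exp(-0.5 * np.sum(d * d, axis=1))

# Hypothetical 4D (x, y, z, t) example: the last column of h plays the role
# of the temporal "lifespan" sigma_i.
mu = np.array([[0.0, 0.0, 0.0, 0.5],
               [1.0, 0.0, 0.0, 0.2]])
h = np.array([[0.5, 0.5, 0.5, 0.1],
              [0.3, 0.6, 0.9, 0.4]])
x = np.array([0.2, -0.1, 0.3, 0.45])

z = gaussian_response(x, mu, h)

# Cross-check against the general formula with an explicit covariance matrix.
for i in range(len(mu)):
    Sigma = np.diag(h[i] ** 2)
    r = x - mu[i]
    z_full = np.exp(-0.5 * r @ np.linalg.solve(Sigma, r))
    assert np.isclose(z[i], z_full)

# Without space-time cross-covariance, the response factorizes into a
# spatial part times a temporal part.
z_space = gaussian_response(x[:3], mu[:, :3], h[:, :3])
z_time = gaussian_response(x[3:], mu[:, 3:], h[:, 3:])
```

The factorization in the last two lines holds for any block-diagonal space-time covariance; here the spatial block is also taken to be diagonal.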

2. Normalized Gaussian Splatting and Convergence Properties

To ensure stability and avoid the collapse problem of unnormalized schemes, axes-aligned spatiotemporal Gaussians in both PINGS-X and SaRO-GS utilize normalized Gaussian interpolation. For PINGS-X:

$\widehat{\v v}(\v x) = \sum_{i=1}^N w_i(\v x)\v v_i,\quad w_i(\v x) = \frac{z_i(\v x)}{\sum_{j=1}^N z_j(\v x)},$

where $z_i(\v x) = \exp\left(-\frac12 (\v x-\bmu_i)^\top\m\Sigma_i^{-1} (\v x-\bmu_i)\right)$. This partition-of-unity construction guarantees that $\widehat{\v v}(\v x)$ lies within the convex hull of the field values and provides a formal universal-approximation rate:

$\widehat{\v v}(\v x) - \v v(\v x) = O_p\left(\frac{1}{N}\sum_{i=1}^N [\operatorname{tr}\m\Sigma_i]^{\beta/2} + \sqrt{\frac{1}{N^2}\sum_{i=1}^N \frac{1}{\det \m\Sigma_i^{1/2}}}\right)$

for $\beta$-smooth target functions (Jo et al., 14 Nov 2025). The convergence rate remains unchanged by the axis-aligned constraint, indicating that diagonal Gaussians retain full universal-approximation power for sufficiently smooth functions.
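The normalized interpolation above can be sketched in a few lines of NumPy. This is an illustrative implementation under assumed array shapes, not the PINGS-X code. The helper names `normalized_weights` and `interpolate` are hypothetical, and the log-domain shift is a standard numerical safeguard added here rather than something stated in the source.

```python
import numpy as np

def normalized_weights(X, mu, h):
    """Partition-of-unity weights w_i(x) for M query points; returns (M, N)."""
    d = (X[:, None, :] - mu[None, :, :]) / h[None, :, :]   # (M, N, q)
    log_z = -0.5 * np.sum(d * d, axis=2)                    # log z_i(x)
    # Shifting by the row maximum leaves the ratio z_i / sum_j z_j unchanged
    # but keeps at least one term equal to 1, so the sum never underflows to 0.
    log_z = log_z - log_z.max(axis=1, keepdims=True)
    z = np.exp(log_z)
    return z / z.sum(axis=1, keepdims=True)

def interpolate(X, mu, h, v):
    """Field estimate v_hat(x) = sum_i w_i(x) v_i for each row of X."""
    return normalized_weights(X, mu, h) @ v

rng = np.random.default_rng(0)
N, q = 50, 3
mu = rng.uniform(0.0, 1.0, size=(N, q))
h = rng.uniform(0.1, 0.3, size=(N, q))
v = rng.normal(size=(N, 2))

# Five points inside the domain plus one far outside it, where every
# unnormalized z_i would underflow without the shift above.
X = np.vstack([rng.uniform(0.0, 1.0, size=(5, q)), [[10.0, 10.0, 10.0]]])

W = normalized_weights(X, mu, h)
v_hat = interpolate(X, mu, h, v)
```

Because the weights are non-negative and sum to one at every query point, each component of `v_hat` stays between the smallest and largest value of that component among the `v_i`, including at the far-away query.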

3. Complexity Reduction and Computational Efficiency

Parameterizing $\m\Sigma_i$ as diagonal reduces the number of free covariance parameters per Gaussian from $q(q+1)/2$ to $q$ (e.g., from 10 to 4 for $q=4$), eliminating orthonormality constraints and gradient-based rotation learning. This yields several computational benefits:

Forward evaluation and backpropagation are greatly accelerated, as matrix inverses and multiplications become simple scalar operations for each axis.
Memory footprint is reduced, enabling explicit representation of large sets, e.g., tens of thousands of 4D Gaussians in SaRO-GS (Yan et al., 9 Dec 2024).
Explicit forms allow for analytic computation of all required spatial and temporal derivatives, eliminating the need for MLP-based feature extraction in the forward path (PINGS-X) (Jo et al., 14 Nov 2025).

A plausible implication is that axes alignment broadly facilitates scalable, high-dimensional modeling, but may underfit sharply anisotropic phenomena unless the constraint is relaxed.
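To illustrate the point about analytic derivatives, the sketch below differentiates the normalized interpolant in closed form and compares the result with central finite differences. It relies on the identity $\nabla w_i = w_i\,(g_i - \sum_j w_j g_j)$ with $g_i = \nabla \log z_i = -(x - \mu_i)/h_i^2$, which follows from the quotient rule. The function name and shapes are assumptions, and this is not the PINGS-X implementation.

```python
import numpy as np

def field_and_jacobian(x, mu, h, v):
    """Return v_hat(x) with shape (c,) and its Jacobian d v_hat / d x with shape (c, q)."""
    d = (x - mu) / h                          # (N, q)
    log_z = -0.5 * np.sum(d * d, axis=1)
    z = np.exp(log_z - log_z.max())           # shift cancels in the ratio
    w = z / z.sum()                           # (N,) normalized weights
    g = -(x - mu) / h**2                      # gradient of log z_i, (N, q)
    g_bar = w @ g                             # weighted mean gradient, (q,)
    dw = w[:, None] * (g - g_bar)             # gradient of each w_i, (N, q)
    return w @ v, v.T @ dw

rng = np.random.default_rng(1)
N, q, c = 20, 4, 3                            # 4D space-time, 3 field components
mu = rng.uniform(0.0, 1.0, size=(N, q))
h = rng.uniform(0.2, 0.5, size=(N, q))
v = rng.normal(size=(N, c))
x = rng.uniform(0.0, 1.0, size=q)

v_hat, J = field_and_jacobian(x, mu, h, v)

# Central finite differences as an independent check of the closed form.
eps = 1e-6
J_fd = np.zeros_like(J)
for k in range(q):
    e = np.zeros(q)
    e[k] = eps
    f_plus = field_and_jacobian(x + e, mu, h, v)[0]
    f_minus = field_and_jacobian(x - e, mu, h, v)[0]
    J_fd[:, k] = (f_plus - f_minus) / (2 * eps)

max_abs_diff = np.max(np.abs(J - J_fd))
```

Columns of `J` hold the derivatives with respect to each coordinate, so with time as the last axis the final column is the temporal derivative needed by a PDE residual.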

4. Density Adaptation: Gaussian Merging and Densification

Redundancy arises when multiple Gaussians cluster in regions of similar influence. PINGS-X implements a merging algorithm that computes cosine similarity between the influence vectors of Gaussians on training points, merges highly similar Gaussians, and replaces clusters with a single representative whose parameters are averaged over the cluster:

$\bmu_{\rm new} = \frac{1}{|C|}\sum_{i\in C} \bmu_i, \qquad \vh_{\rm new} = \frac{1}{|C|}\sum_{i\in C} \vh_i, \qquad \v v_{\rm new} = \widehat{\v v}(\bmu_{\rm new}),$

where $C$ denotes a connected component in the similarity graph (Jo et al., 14 Nov 2025). In dynamic rendering (SaRO-GS), adaptive optimization assigns per-Gaussian learning rates and densification thresholds based on the temporal "activity" integral, so fast-changing primitives are densified and adapted more aggressively (Yan et al., 9 Dec 2024). This actively controls computational complexity and enhances convergence.
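The merging step can be sketched as follows. Several details are assumptions made for illustration: the influence vector of a Gaussian is taken to be its column of normalized weights over the training points, the similarity threshold of 0.95 is arbitrary, and connected components are found with a simple depth-first search. The function names are hypothetical.

```python
import numpy as np

def weights(X, mu, h):
    """Normalized weights w_i(x) for each row of X; returns (M, N)."""
    d = (X[:, None, :] - mu[None, :, :]) / h[None, :, :]
    log_z = -0.5 * np.sum(d * d, axis=2)
    z = np.exp(log_z - log_z.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

def merge_similar(mu, h, v, X_train, threshold=0.95):
    """Merge Gaussians whose influence vectors are nearly parallel."""
    W = weights(X_train, mu, h)                       # column i = influence of Gaussian i
    U = W / np.maximum(np.linalg.norm(W, axis=0), 1e-12)
    adj = (U.T @ U) >= threshold                      # cosine-similarity graph

    # Label connected components of the similarity graph.
    n = len(mu)
    label = -np.ones(n, dtype=int)
    n_comp = 0
    for start in range(n):
        if label[start] >= 0:
            continue
        label[start] = n_comp
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(adj[i] & (label < 0)):
                label[j] = n_comp
                stack.append(j)
        n_comp += 1

    # Average means and scales over each component C.
    mu_new = np.stack([mu[label == k].mean(axis=0) for k in range(n_comp)])
    h_new = np.stack([h[label == k].mean(axis=0) for k in range(n_comp)])
    # Carry over the field by evaluating the pre-merge interpolant at the new means.
    v_new = weights(mu_new, mu, h) @ v
    return mu_new, h_new, v_new

# Two near-duplicate Gaussians and one distinct Gaussian in 2D.
mu = np.array([[0.0, 0.0], [0.01, 0.0], [3.0, 3.0]])
h = np.full((3, 2), 0.5)
v = np.array([[1.0], [1.2], [5.0]])
g = np.linspace(-1.0, 4.0, 21)
X_train = np.array([[a, b] for a in g for b in g])

mu_new, h_new, v_new = merge_similar(mu, h, v, X_train)
```

In this toy case the first two Gaussians collapse into one representative centred between them, while the third is left untouched.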

5. Residual Coding and Scale-Aware Feature Fields

SaRO-GS introduces a scale-aware residual field to encode spatial and spatiotemporal features sensitive to the size and position of each Gaussian. Six hex-plane feature grids are used:

Three spatial planes ($P_{xy}$, $P_{xz}$, $P_{yz}$), each constructed as MipMap pyramids over multiple scales.
Three spatiotemporal planes ($P_{xt}$, $P_{yt}$, $P_{zt}$), implemented as flat grids.

Given a Gaussian’s axis scales, features are interpolated from the corresponding planes at the appropriate scale, forming input codes for small attribute-modifying MLPs (Yan et al., 9 Dec 2024). This design aligns the field with the self-splitting and merging operations, ensuring parent–child consistency of features.
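The idea of reading a spatial plane at a scale matched to the Gaussian can be illustrated with a generic mipmap lookup. The sketch below is not the SaRO-GS sampling rule: the 2×2 average pooling, the rounding-based level selection from a footprint `scale` given in normalized plane units, and all function names are assumptions borrowed from standard texture mipmapping.

```python
import numpy as np

def build_mip_pyramid(plane, levels):
    """plane: (R, R, C) grid with R a power of two; each level halves the resolution."""
    pyramid = [plane]
    for _ in range(levels - 1):
        p = pyramid[-1]
        r = p.shape[0] // 2
        # 2x2 average pooling: element [i, a, j, b] of the reshaped array is p[2i+a, 2j+b].
        pyramid.append(p.reshape(r, 2, r, 2, -1).mean(axis=(1, 3)))
    return pyramid

def bilinear(grid, u, v):
    """Bilinear sample of grid (R, R, C) at (u, v) in [0, 1], texel centers at (i + 0.5) / R."""
    R = grid.shape[0]
    x = float(np.clip(u * R - 0.5, 0, R - 1))
    y = float(np.clip(v * R - 0.5, 0, R - 1))
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, R - 1), min(y0 + 1, R - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * grid[x0, y0] + fx * grid[x1, y0]
    bottom = (1 - fx) * grid[x0, y1] + fx * grid[x1, y1]
    return (1 - fy) * top + fy * bottom

def sample_scale_aware(pyramid, u, v, scale):
    """Choose the level whose texel size is closest to the footprint, then sample it."""
    base_res = pyramid[0].shape[0]
    footprint_texels = max(scale * base_res, 1.0)     # footprint measured in base texels
    level = int(np.clip(np.round(np.log2(footprint_texels)), 0, len(pyramid) - 1))
    return bilinear(pyramid[level], u, v), level

rng = np.random.default_rng(2)
plane_xy = rng.normal(size=(8, 8, 4))                 # hypothetical P_xy with 4 channels
pyramid = build_mip_pyramid(plane_xy, levels=4)       # resolutions 8, 4, 2, 1

feat_small, level_small = sample_scale_aware(pyramid, 0.3, 0.7, scale=0.01)
feat_large, level_large = sample_scale_aware(pyramid, 0.3, 0.7, scale=0.5)
```

A small Gaussian reads the finest level, while a large one reads a coarser, pre-averaged level. For a plane such as $P_{xy}$, one possible choice of `scale` is the larger of the two in-plane axis scales, though the source does not specify this.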

6. Implementation and Empirical Performance

PINGS-X eliminates MLPs from the forward path and directly trains the tuple $(\bmu_i, h_{ij}, \v v_i)$ using Adam. Each iteration performs explicit Gaussian interpolation, evaluates a loss combining a data term and a PDE residual, and updates the parameters; adaptive density manipulation is applied at fixed intervals. Hyperparameters typically involve hundreds to thousands of Gaussians and learning rates on the order of $10^{-2}$ (Jo et al., 14 Nov 2025). SaRO-GS implements its feature pyramids and rasterization via PyTorch and nvdiffrast, maintaining buffers of tens of thousands of Gaussians (Yan et al., 9 Dec 2024). Gaussians with negligible temporal survival weight are culled before rendering, halving execution costs.
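The temporal culling step can be sketched as a simple mask. The exact form of the survival weight in SaRO-GS is not given here, so this example assumes a Gaussian in time with center `tau` and lifespan `sigma`, and the cutoff `eps` is a hypothetical value.

```python
import numpy as np

def temporal_weight(t, tau, sigma):
    """Assumed survival weight: a 1D Gaussian in time centred at tau with lifespan sigma."""
    return np.exp(-0.5 * ((t - tau) / sigma) ** 2)

def cull_by_time(t, tau, sigma, eps=1e-3):
    """Return a boolean mask of primitives worth rendering at time t, plus their weights."""
    w = temporal_weight(t, tau, sigma)
    return w > eps, w

tau = np.array([0.1, 0.5, 0.9])       # temporal centers
sigma = np.array([0.05, 0.2, 0.05])   # temporal lifespans

keep, w = cull_by_time(0.5, tau, sigma)
# Only primitives with keep == True would be passed on to the rasterizer.
active_indices = np.flatnonzero(keep)
```

At `t = 0.5` the two short-lived primitives centred at 0.1 and 0.9 are eight lifespans away and are dropped, leaving only the middle one.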

Performance benchmarks demonstrate marked gains relative to neural or PINN-based baselines. In synthetic CFD (lid-driven cavity), PINGS-X achieves 1.13% $L^2$ error in 21.9 min, compared to PINN's 12.20% in 51.4 min. In 4D flow MRI, PINGS-X cuts training times by over 10x and lowers errors substantially (e.g., 8.98% vs PINN's 25.93% for $8 \times$ upsampling) (Jo et al., 14 Nov 2025). In dynamic scene rendering, SaRO-GS sustains high PSNR and SSIM at 40–182 FPS with rapid training (45 min) (Yan et al., 9 Dec 2024).

Framework	Task/Scene	Metric	Training Time / Rendering Speed
PINGS-X	MRI ×8 upsample	Rel. $L^2$ error: 8.98%	2.6 h
PINN (tanh)	MRI ×8 upsample	Rel. $L^2$ error: 25.93%	30.1 h
SaRO-GS	D-NeRF (Mono)	PSNR: 36.13 dB	182 FPS

7. Limitations and Prospective Extensions

Several limitations are recognized. PINGS-X depends on manual tuning of initial Gaussian count and split/merge thresholds; automated density management remains future work. Axis alignment may underfit highly anisotropic local structures, and possible relaxation to mixtures or low-rank models could address this (Jo et al., 14 Nov 2025). In the graphics setting, SaRO-GS achieves temporal complexity management via scale-aware residual coding and density-adaptive schedules, but further optimization for multi-phase or highly nonlinear spatiotemporal phenomena is an open direction (Yan et al., 9 Dec 2024). Both frameworks are extensible to turbulent flows, boundary-layer modeling, or radiance field representations in view synthesis.

In summary, axes-aligned spatiotemporal Gaussian representations constitute an efficient and theoretically robust class of explicit models. Their diagonal parameterization and adaptive control mechanisms enable scalable, high-fidelity modeling of complex spatiotemporal fields, with applications ranging from physics-informed MRI super-resolution to dynamic scene rendering, as substantiated in PINGS-X (Jo et al., 14 Nov 2025) and SaRO-GS (Yan et al., 9 Dec 2024).


References

Jo et al. (14 Nov 2025). PINGS-X: Physics-Informed Normalized Gaussian Splatting with Axes Alignment for Efficient Super-Resolution of 4D Flow MRI.

Yan et al. (9 Dec 2024). 4D Gaussian Splatting with Scale-aware Residual Field and Adaptive Optimization for Real-time Rendering of Temporally Complex Dynamic Scenes.
