Data-driven Spatiotemporal Encoding (DaSE)
- Data-driven Spatiotemporal Encoding (DaSE) is a machine learning framework that separates spatial content from temporal dynamics using structured latent codes.
- It employs methods like PDE-inspired encoders, linear dynamical systems, and autoencoder variants to improve forecasting, reconstruction, and interpretability.
- DaSE is applied in geospatial analytics, biomedicine, and physical simulation, where it improves robustness, predictive performance, and interpretability on complex dynamical tasks.
Data-driven Spatiotemporal Encoding (DaSE) refers to a class of machine learning frameworks and representations designed to extract, compress, and disentangle spatial and temporal components from high-dimensional, time-evolving data. DaSE architectures are widely adopted across physical sciences, geospatial analytics, biomedicine, and video modeling, with methodological foundations in classical separation of variables, statistical modeling, frequency and spectral decompositions, deep neural manifold learning, and nonlinear dynamics. By learning structured latent spaces that reflect the underlying spatial and temporal regularities, DaSE systems enable forecasting, classification, parameter estimation, and generative modeling of complex dynamical phenomena with high fidelity and interpretability.
1. Core Principles of Spatiotemporal Disentanglement
The fundamental objective of DaSE is to learn low-dimensional, interpretable codes that separate spatially-invariant “content” from temporally-varying “dynamics.” This draws direct inspiration from the classical PDE technique of variable separation, where solutions are represented as $u(x,t) = f(x)\,g(t)$ or, more generally, $u(x,t) = \sum_k f_k(x)\,g_k(t)$, splitting the original problem into spatial and temporal subproblems (Donà et al., 2020). In machine learning terms, DaSE postulates that for any trajectory $x_{1:T}$, one can infer a time-invariant spatial code $s$ (content) and a time-dependent dynamical code $z_t$ (motion):
$x_t \approx D(s, z_t),$
where $D$ is a nonlinear decoder (e.g., MLP, U-Net) (Donà et al., 2020, Xu et al., 2019). The dynamical evolution of $z_t$ may be governed by an ODE, a Linear Dynamical System (LDS), or a neural temporal model, and the spatial code $s$ is enforced or regularized to remain constant over time. This explicit disentanglement confers interpretability, facilitates long-term forecasting, and mitigates the “shortcut learning” often observed in black-box recurrent architectures.
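The following minimal sketch illustrates this factorization, assuming vectorized frames, MLP encoders and decoder, and an explicit-Euler latent ODE; all module names and sizes are illustrative rather than taken from the cited implementations:

```python
# Minimal sketch of the DaSE factorization x_t ≈ D(s, z_t): a content encoder,
# a motion encoder, a learnable latent ODE, and a decoder. All shapes and
# module names are illustrative assumptions, not a published configuration.
import torch
import torch.nn as nn

class DisentangledDaSE(nn.Module):
    def __init__(self, x_dim=64, s_dim=8, z_dim=4, hidden=32):
        super().__init__()
        self.enc_s = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, s_dim))   # content code s
        self.enc_z = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, z_dim))   # dynamic code z_t
        self.f = nn.Sequential(nn.Linear(z_dim, hidden), nn.Tanh(),
                               nn.Linear(hidden, z_dim))       # latent ODE field
        self.dec = nn.Sequential(nn.Linear(s_dim + z_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, x_dim))     # decoder D(s, z_t)

    def forward(self, x0, horizon, dt=0.1):
        s, z = self.enc_s(x0), self.enc_z(x0)   # s stays fixed; z_t evolves
        frames = []
        for _ in range(horizon):
            z = z + dt * self.f(z)              # explicit Euler step of the ODE
            frames.append(self.dec(torch.cat([s, z], dim=-1)))
        return torch.stack(frames, dim=1)       # (batch, horizon, x_dim)

x0 = torch.randn(2, 64)                         # two toy initial frames
print(DisentangledDaSE()(x0, horizon=5).shape)  # torch.Size([2, 5, 64])
```

Because $s$ is computed once and reused at every step, any change across the predicted frames must flow through $z_t$, which is what enforces the content/dynamics split.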
2. Representative DaSE Architectures and Learning Frameworks
DaSE encompasses a broad family of architectures, unified by their intent to factor spatiotemporal data into separate latent spaces and by their reliance on data-driven, rather than hand-crafted, representations.
- PDE-Inspired Encoders: A direct instantiation (Donà et al., 2020) employs two encoders, $E_s$ for the spatial code $s$ and $E_z$ for the temporal code $z_t$, a learnable ODE on $z_t$, and a decoder $D$. The training objective is a weighted combination of prediction loss, latent alignment, and regularization penalties that enforce invariance and disentanglement.
- Statistical Shape Modeling (LDS): In anatomical motion, a linear Gaussian state-space model is used to represent evolving shapes (Adams et al., 2022). Given observed geometric landmark vectors $x_t$, a latent code $z_t$ evolves as $z_{t+1} = A z_t + w_t$, $w_t \sim \mathcal{N}(0, Q)$, and generates observations $x_t = C z_t + v_t$, $v_t \sim \mathcal{N}(0, R)$; EM-based parameter learning alternates with landmark optimization.
- Autoencoder Variants: Multi-level convolutional autoencoders compress spatial fields via ConvNet encoders, use temporal convolutional autoencoders (TCAE) with dilated convolutions for sequence modeling, and couple to parameter networks for inference over governing parameters (Xu et al., 2019).
- Spectral and Frequency Domain Approaches: DaSE can be realized by encoding time series into the Fourier domain, compressing spectrograms via contractive autoencoders, and rasterizing embeddings for downstream tasks (Cao et al., 2023).
- Dynamic Mode Decomposition (DMD): Recent methods use DMD to derive spectral temporal embeddings from the data itself, constructing Vandermonde-based embeddings from the dominant modes and injecting these directly into forecasting architectures (Kong et al., 1 Jun 2025); a code sketch follows this list.
These architectures are often trained end-to-end using a combination of data-fidelity, reconstruction, regularization, and, where present, ODE- or LDS-alignment losses.
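As a concrete illustration of the DMD-based item above, the sketch below builds a Vandermonde temporal embedding from the eigenvalues of exact DMD; the function name, rank choice, and toy data are illustrative assumptions, not the cited method's exact pipeline:

```python
# Sketch of a DMD-derived temporal embedding (an assumed construction, not the
# cited method's exact pipeline). Rows of the returned array can be injected
# as time covariates into a downstream forecasting architecture.
import numpy as np

def dmd_time_embedding(X, r=4, horizon=None):
    """X: (n_space, n_time) snapshot matrix; r: number of retained modes."""
    X1, X2 = X[:, :-1], X[:, 1:]
    U, S, Vh = np.linalg.svd(X1, full_matrices=False)
    U, S, Vh = U[:, :r], S[:r], Vh[:r]              # rank-r truncation
    A_tilde = U.conj().T @ X2 @ Vh.conj().T @ np.diag(1.0 / S)
    eigvals, _ = np.linalg.eig(A_tilde)             # dominant temporal modes
    T = horizon or X.shape[1]
    t = np.arange(T)
    vander = eigvals[:, None] ** t[None, :]         # Vandermonde rows: lambda_k^t
    return np.concatenate([vander.real, vander.imag], axis=0)   # (2r, T)

# Toy field with two seasonalities and spatially varying phase (rank 4), so the
# four retained modes correspond to the two oscillation frequencies.
t = np.arange(200)
phase = np.linspace(0.0, np.pi, 16)[:, None]
X = np.sin(2*np.pi*t/24 + phase) + 0.5*np.sin(2*np.pi*t/168 + 2*phase)
print(dmd_time_embedding(X, r=4).shape)             # (8, 200)
```

Unlike fixed sinusoidal or calendar features, the frequencies here are estimated from the data, so unanticipated periodicities are captured automatically.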
3. Specialized Approaches: Physics Encoding and Phase Reduction
A defining characteristic of certain DaSE variants is the explicit incorporation of physical structure or control invariances:
- Hard Physics Encoding: The PeRCNN framework (Rao et al., 2021) hard-encodes known PDE operators as convolutional kernels (“highway convolutions”) and boundary/initial conditions through architectural constraints rather than loss penalties. Nonlinearity is realized by elementwise products of conv-filter outputs (“Π-block”; see the sketch after this list). This hard-constraint approach is distinct from penalty-based physics-informed learning and demonstrates marked robustness and generalizability in high-noise or low-data regimes.
- Phase Autoencoders for Synchronization: For reaction-diffusion and rhythmic systems, DaSE can adopt an autoencoder mapping from high-dimensional fields to low-dimensional phase–amplitude codes, with loss functions that enforce rigid rotation on phase variables and exponential decay on amplitudes. Phase-based latent spaces enable manipulation and synchronization by tangentially driving the system in the learned latent manifold (Yawata et al., 15 Jun 2025).
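As noted in the PeRCNN item above, the following sketch shows a Π-style block in which parallel convolution outputs are multiplied elementwise, realizing polynomial rather than pointwise nonlinearities; the channel counts and the forward-Euler update are illustrative assumptions, not the published configuration:

```python
# Sketch of a Pi-block-style layer in the spirit of PeRCNN: parallel
# convolutions whose outputs are multiplied elementwise, so the layer expresses
# polynomial (product) nonlinearities. Sizes are illustrative assumptions.
import torch
import torch.nn as nn

class PiBlock(nn.Module):
    def __init__(self, channels=2, hidden=8, n_branches=2):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, hidden, kernel_size=3, padding=1)
            for _ in range(n_branches))
        self.out = nn.Conv2d(hidden, channels, kernel_size=1)  # 1x1 aggregation

    def forward(self, u):
        prod = self.branches[0](u)
        for branch in self.branches[1:]:
            prod = prod * branch(u)     # elementwise product: the "Pi"
        return self.out(prod)           # du/dt contribution on the grid

u = torch.randn(1, 2, 32, 32)           # e.g. a two-species concentration field
u_next = u + 0.01 * PiBlock()(u)        # forward-Euler update of the field
print(u_next.shape)                     # torch.Size([1, 2, 32, 32])
```

The product of local convolution stencils lets the network represent polynomial reaction terms exactly, instead of approximating them with generic activations.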
4. Applications in Physical, Biological, and Geospatial Domains
DaSE frameworks have demonstrated efficacy across diverse domains:
- Physical Systems: PDE-driven DaSE models have been validated on synthetic wave equations, sea-surface temperature (SST) datasets, crowd flow analytics (TaxiBJ), and canonical video benchmarks (e.g., Moving MNIST, KTH) (Donà et al., 2020, Xu et al., 2019). Hard-encoded physics models provide robust long-range forecasting in fluid and reaction-diffusion systems even under severe data scarcity (Rao et al., 2021).
- Biomedical Imaging: In neuroimaging, DaSE with Approximate Rank Pooling (ARP) and progressive curriculum learning achieves state-of-the-art performance in Alzheimer’s classification and brain segmentation tasks, demonstrating both parameter efficiency and interpretability relative to large pretrained models (Zhou et al., 19 Nov 2025). In cardiac shape modeling, LDS-based DaSE outperforms image-based approaches by 10–15% in generalization/specificity, accurately capturing organ dynamics (Adams et al., 2022).
- Geospatial Analytics: Self-supervised temporal embeddings, obtained via contractive autoencoders of DFT spectrograms, deliver improved precision and recall for land-use segmentation across urban-to-rural settings. Fusion with multispectral, SAR, or road-graph modalities facilitates multimodal learning in semantic segmentation and classification (Cao et al., 2023); a sketch of the contractive penalty follows this list.
- Spatiotemporal Forecasting: DMD-based embeddings address the limitations of fixed calendar-based or sinusoidal time encodings, improving long-horizon forecasts for urban mobility and climate tasks by automatically extracting all dominant seasonalities from observed data (Kong et al., 1 Jun 2025).
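For the geospatial item above, the sketch below applies a contractive penalty to a single-layer sigmoid autoencoder over toy DFT-magnitude features, using the closed-form Jacobian norm $\sum_j (h_j(1-h_j))^2 \sum_i W_{ji}^2$; dimensions and the penalty weight are illustrative assumptions:

```python
# Sketch of a contractive autoencoder over DFT-magnitude features. For a
# sigmoid encoder h = sigmoid(W x + b), the squared Frobenius norm of the
# encoder Jacobian has the closed form sum_j (h_j(1-h_j))^2 * sum_i W_ji^2.
# All dimensions and the penalty weight are illustrative assumptions.
import torch
import torch.nn as nn

class ContractiveAE(nn.Module):
    def __init__(self, in_dim=128, code_dim=16):
        super().__init__()
        self.enc = nn.Linear(in_dim, code_dim)
        self.dec = nn.Linear(code_dim, in_dim)

    def forward(self, x):
        h = torch.sigmoid(self.enc(x))
        return self.dec(h), h

    def contractive_penalty(self, h):
        dh = (h * (1 - h)) ** 2                  # sigmoid derivative, squared
        w2 = (self.enc.weight ** 2).sum(dim=1)   # per-unit squared row norms
        return (dh * w2).sum(dim=1).mean()       # mean ||J_f(x)||_F^2

model = ContractiveAE()
x = torch.abs(torch.fft.rfft(torch.randn(4, 254), dim=1))   # toy DFT magnitudes
recon, h = model(x)
loss = nn.functional.mse_loss(recon, x) + 1e-3 * model.contractive_penalty(h)
loss.backward()
```

The penalty makes the code locally insensitive to input perturbations, which is the property credited with the improved precision/recall over raw DFT features.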
5. Quantitative Performance and Comparative Insights
Extensive benchmarks establish that DaSE methods, by enforcing explicit separation, yield superior forecasting, reconstruction, and generalization:
| Domain | Baseline | DaSE Variant | Metric | Result |
|---|---|---|---|---|
| WaveEq (PDE) | — | PDE-inspired DaSE | MSE | Orders of magnitude below baselines (Donà et al., 2020) |
| SST (Climate) | PKnl, PhyDNet | PDE-inspired DaSE | MSE/SSIM | Lower MSE and higher SSIM than baselines (Donà et al., 2020) |
| Moving MNIST | MIM | PDE-inspired DaSE | PSNR | $16.5$ (vs. $13$–$14$) (Donà et al., 2020) |
| Cardiac SSM | Image-based atlas | LDS-based DaSE | Gen./spec. error | $10$–$15$% improvement (Adams et al., 2022) |
| Urban Mobility Forecast | Graph WaveNet + DoW | DMD-based DaSE | RMSE (12-step) | $6.0$% gain (Kong et al., 1 Jun 2025) |
| MRI Classification | DAMNet, SwinT | DCL-SE DaSE | Accuracy/AUC | AUC $0.97+$ (Zhou et al., 19 Nov 2025) |
| Geospatial Segmentation | Raw DFT | Contractive-AE DaSE | F1 / AP | $10$–$15$ pp higher than raw counts (Cao et al., 2023) |
Ablations confirm the necessity of the separated code structure, the latent dynamics module, and, where present, unit-norm or contractive penalties for maintaining interpretability and performance.
6. Limitations, Extensions, and Open Directions
While DaSE frameworks demonstrate broad empirical success, certain challenges and areas for extension persist:
- Model Assumptions: Effective separation assumes phenomena are at least approximately decomposable; in highly entangled or stochastic settings, disentanglement may break down, necessitating additional mechanisms (e.g., VAEs) (Donà et al., 2020).
- Deterministic vs. Stochastic Dynamics: Natural video or medical trajectories may contain irreducible randomness, which current deterministic frameworks do not capture.
- Partial Observability and Data Scarcity: Many models assume dense or accurately synchronized measurements, whereas partial or irregular sampling remains a challenge.
- Physical Priors and Hybridization: Future work includes injecting explicit differential operators, adopting multi-scale separations, and combining hard and soft physics constraints (Rao et al., 2021).
- Generality and Modularity: Methods such as DMD-based DaSE are model- and domain-agnostic and can be incorporated into any architecture with time covariates (Kong et al., 1 Jun 2025).
- Computational Overhead: Some methods involve large SVDs or iterative EM steps, though most remain tractable for moderate-length sequences or grids.
7. Relationships to Related Paradigms and Theoretical Insights
DaSE lies at the intersection of reduced-order modeling, unsupervised manifold learning, and structured neural sequence modeling. Compared to adversarially-constrained or VAE-style disentanglement, PDE- and LDS-inspired DaSE leverages explicit dynamical structure, enabling transparent factors and consistent information flow over time (Donà et al., 2020, Adams et al., 2022). DMD-based time embeddings empirically capture the dominant data-driven periodicities, outperforming sinusoidal or calendar-based time features (Kong et al., 1 Jun 2025). The use of contractive penalties or ODE/LDS architectures imparts additional robustness and interpretability.
Principled inductive bias (mirroring separation of variables or enforcing phase-invariance) is central to the generalization ability of DaSE. Lipschitz continuous ODE flows preserve information-theoretic separation over time, while hard-encoding of physics or group convolutions guarantees stability, interpretability, and efficiency in high-dimensional and noisy domains (Rao et al., 2021, Zhou et al., 19 Nov 2025).
In summary, Data-driven Spatiotemporal Encoding provides a rigorous, flexible, and empirically validated paradigm for learning low-dimensional dynamic models from complex temporal-spatial data, with broad impact in physical simulation, biomedical analysis, geospatial classification, and long-horizon forecasting (Donà et al., 2020, Adams et al., 2022, Zhou et al., 19 Nov 2025, Cao et al., 2023, Rao et al., 2021, Kong et al., 1 Jun 2025, Xu et al., 2019).