DK-STN: Domain Knowledge Spatio-Temporal Models
- DK-STN is a modeling approach that explicitly integrates human-derived domain knowledge, such as epidemic spread laws and semantic priors, into deep spatio-temporal networks.
- It employs methods like physics-informed modules, structural side-constraints, and dynamic graph convolutions to fuse multi-modal signals and enforce domain-specific rules.
- Empirical results show that DK-STN architectures achieve significant improvements in accuracy, with 20–40% MAE reductions and enhanced interpretability over purely data-driven models.
Domain Knowledge Embedded Spatio-Temporal Networks (DK-STNs) are a class of models that explicitly incorporate structured, human-derived domain knowledge into the architecture, learning objective, or feature representations of deep spatio-temporal networks. DK-STNs have been successfully instantiated in diverse application areas including epidemic forecasting, brain activity decoding, traffic prediction, climate oscillation forecasting, and video understanding. By embedding domain-specific priors—such as laws of epidemic spread, semantic structure of tasks, or regional-geographic signals—these architectures achieve improved generalization, interpretability, and robustness over purely data-driven models.
1. Core Approaches to Domain Knowledge Embedding
DK-STN frameworks encompass several mechanisms that operationalize domain knowledge for spatio-temporal modeling:
- Physics-Informed Modules: Compartmental or dynamical models (e.g., single-patch and metapopulation SIR in epidemic modeling) are parameterized via neural networks and integrated into the deep architecture and loss function (Mao et al., 2023).
- Structural Side-Constraints: In neuroimaging, category priors or semantic similarity graphs regularize decomposition or latent representations (e.g., via Laplacian constraints in tensor decompositions) (Liu et al., 2022).
- Multi-Modality Fusion and Feature Ingestion: Regional static and dynamic features such as POIs, satellite images, and real-time sensor traces are embedded as explicit inputs, and network modules are designed to mediate knowledge transfer from region- to point-level predictions (Han et al., 2024).
- Domain-Driven Data Augmentation and Preprocessing: Model training utilizes augmented or pre-filtered data conforming to known domain structure (e.g., harmonic anomaly extraction for MJO, mixed NWP assimilation) (Li et al., 22 Dec 2025).
- Statistical Priors in Attention Mechanisms: Spatial co-occurrence and temporal transition matrices encode prior relational knowledge in video scene graph transformers, directly modulating cross-attention computations (Pu et al., 2023).
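Despite their diversity, most of these mechanisms reduce to a hybrid objective that penalizes disagreement with both the observed data and a domain model's prediction. The following is a minimal sketch of that pattern; the function names and the simple MSE consistency term are illustrative, not any single paper's exact formulation:

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two arrays."""
    return float(np.mean((a - b) ** 2))

def hybrid_loss(y_pred, y_true, y_knowledge, lam=0.5):
    """Data-fit term plus a knowledge-consistency term that pulls the
    network's output toward the domain model's forecast."""
    return mse(y_pred, y_true) + lam * mse(y_pred, y_knowledge)
```

The weight `lam` trades off fidelity to observations against fidelity to the domain prior; the concrete DK-STN instantiations below differ mainly in what produces `y_knowledge`.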
2. Representative DK-STN Architectures
Table: DK-STN Instantiations Across Domains
| Domain | Model Name (Paper) | Knowledge Embedded |
|---|---|---|
| Epidemics | MPSTAN (Mao et al., 2023) | Metapopulation SIR, interaction-adaptive parameters |
| Brain Decoding | STN (Liu et al., 2022) | Stimulus semantic kernel, tensor Laplacian constraint |
| Traffic | DK-STN (Han et al., 2024) | POIs, satellite, LTE traces, bipartite region mapping |
| Climate (MJO) | DK-STN (Li et al., 22 Dec 2025) | Fourier/ENSO anomaly filtering, NWP data assimilation |
| Video SGG | STKET (Pu et al., 2023) | Spatial co-occurrence, temporal transitions in attention |
These models illustrate the breadth of DK-STN: from neural ODE and graph models with physics priors to tensor factorization integrated with semantic constraints and transformers with statistical relational knowledge.
3. Detailed Workflow and Mathematical Formulation
Epidemic Forecasting via Metapopulation Dynamics
DK-STN as realized by MPSTAN (Mao et al., 2023) formulates epidemic spread over a graph of population patches. The SIR-based knowledge modules operate as:
- Single-patch SIR: dS/dt = −βSI/N, dI/dt = βSI/N − γI, dR/dt = γI, with transmission rate β, recovery rate γ, and population size N.
- Metapopulation SIR: adds mobility-driven coupling between patches, e.g. for patch i, dI_i/dt = β_i S_i I_i / N_i − γ_i I_i + Σ_{j≠i} (h_{ji} I_j − h_{ij} I_i), where h_{ij} is the flow rate from patch i to patch j (and analogously for S_i and R_i).
All intra- and inter-patch parameters (β_i, γ_i, h_{ij}) are generated by neural submodules without requiring exogenous mobility data.
DK-STN’s recurrent cell couples a spatio-temporal GRU and multi-head GAT, projecting outputs to learn both (a) purely deep predictions and (b) MP-SIR rollouts, combined in a hybrid loss of the form L = L_deep + λ·L_SIR that weights the data-driven and mechanism-consistent prediction errors. Embedding domain knowledge in both structure and loss yields strict improvements in MAE (20–40% over baselines), stability, and long-horizon performance.
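The SIR rollout supplying the knowledge-based forecast can be sketched with a forward-Euler integrator. This is a generic single-patch discretization under the standard SIR equations; MPSTAN's exact solver and neural parameterization may differ:

```python
import numpy as np

def sir_step(s, i, r, beta, gamma, dt=1.0):
    """One forward-Euler step of single-patch SIR dynamics.
    Compartments are population fractions, so N = 1."""
    new_inf = beta * s * i * dt   # new infections this step
    rec = gamma * i * dt          # recoveries this step
    return s - new_inf, i + new_inf - rec, r + rec

def sir_rollout(s0, i0, r0, beta, gamma, horizon):
    """Roll the dynamics forward to produce a mechanism-based forecast
    trajectory of shape (horizon + 1, 3)."""
    traj = [(s0, i0, r0)]
    for _ in range(horizon):
        traj.append(sir_step(*traj[-1], beta, gamma))
    return np.array(traj)
```

In a DK-STN-style model, `beta` and `gamma` would be emitted per step by a neural submodule rather than fixed, and the rollout would enter the hybrid loss alongside the purely deep prediction.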
Stimulus-Constrained Tensor Factorization for Brain Decoding
In (Liu et al., 2022), DK-STN factorizes third-order tensors (brain region × time × trial):
- Objective: min_{A,B,C} ‖X − ⟦A, B, C⟧‖²_F + λ tr(CᵀLC), subject to orthogonality constraints on the factors,
where L is the graph Laplacian of the semantic similarity kernel among stimuli and C is the side-constrained factor matrix.
- Domain knowledge: semantic pairwise similarity and supervised labels jointly regularize the temporal factors.
- Optimization: Alternating updates of the factor matrices A, B, and C, with the gradient step on C incorporating both the side information and the orthogonality constraints.
Significant accuracy gains (up to +21.9% over baselines) are reported on MEG and fMRI decoding tasks, especially at high tensor rank and with strong side-constraint regularization.
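The Laplacian side-constraint can be illustrated directly: tr(CᵀLC) is small exactly when rows of C belonging to semantically similar stimuli are close. A minimal generic sketch (not the paper's code):

```python
import numpy as np

def graph_laplacian(K):
    """Unnormalized graph Laplacian L = D - K of a similarity kernel K,
    where D is the diagonal degree matrix."""
    return np.diag(K.sum(axis=1)) - K

def laplacian_penalty(C, L):
    """tr(C^T L C), equal to 0.5 * sum_ij K_ij * ||c_i - c_j||^2:
    penalizes factor rows that differ across similar stimuli."""
    return float(np.trace(C.T @ L @ C))
```

Adding `laplacian_penalty` (scaled by λ) to the reconstruction error gives the side-constrained objective above.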
4. Knowledge Propagation and Spatio-Temporal Attention
In road traffic forecasting (Han et al., 2024), DK-STN explicitly propagates region-level knowledge to road-level prediction via:
- Dynamic Graph Convolution: Learns region adjacency by joint history correlation and spatial distance, yielding an adaptive convolution operator on region grids.
- Temporal Attention: Self-attention over region/road histories, capturing dynamic and seasonal effects.
- Bipartite Spatial Transform Attention: A masked attention mechanism maps region embeddings with static and dynamic domain features (POI, satellite, LTE) onto the road graph, enforcing spatial locality via a learnable Gaussian mask.
- Road-level Stack: Combines region-informed and road-intrinsic attention outputs to generate final predictions.
Ablation studies confirm each modality’s quantitative contribution; removal of LTE traces or POI/satellite features degrades MAE by up to 5%, while the bipartite masking is essential for enforcing correct spatial scope.
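The Gaussian-masked bipartite attention can be sketched as a single head whose logits are biased by a distance-based mask; shapes and names here are illustrative, and the paper's mask parameters are learnable rather than fixed:

```python
import numpy as np

def gaussian_mask(dist, sigma):
    """Distance-based attention mask: regions far from a road are
    exponentially down-weighted (spatial locality prior)."""
    return np.exp(-(dist ** 2) / (2.0 * sigma ** 2))

def masked_bipartite_attention(road_q, region_k, region_v, dist, sigma=1.0):
    """Single-head attention from road queries to region keys/values,
    with logits biased by the log of a Gaussian distance mask."""
    d = road_q.shape[-1]
    scores = road_q @ region_k.T / np.sqrt(d)          # (roads, regions)
    scores = scores + np.log(gaussian_mask(dist, sigma) + 1e-9)
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w = w / w.sum(axis=1, keepdims=True)               # row-wise softmax
    return w @ region_v
```

With equal content scores, a nearby region dominates the attention weights, which is the intended "correct spatial scope" behavior noted in the ablations.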
5. Knowledge-Guided Preprocessing and Data Selection
The MJO forecasting DK-STN (Li et al., 22 Dec 2025) incorporates domain knowledge both in data preparation and deep model supervision:
- Training Data Augmentation: Mixture of observational records and NWP hindcasts, increasing sample diversity.
- Physically-Motivated Preprocessing: Harmonic filtering (Fourier modes 0–3), 120-day running mean subtraction for non-stationarity/ENSO removal, followed by batch normalization.
- Deep Stack: Residual CNNs for local spatial feature extraction, followed by temporal attention in a sequence-to-sequence encoder-decoder with LSTM and self-attention to produce MJO index predictions.
- Losses: Multi-channel MSE on the two RMM indices; ablation shows additive gains from each knowledge module (+10 skill days when both are active).
DK-STN delivers accurate 28-day MJO forecast skill, matching ECMWF performance with orders of magnitude less compute and enhanced seasonal stability.
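The physically motivated preprocessing can be sketched for a single daily time series. This is a generic approximation: zeroing the lowest rfft modes stands in for removing the seasonal-cycle harmonics, and a simple running mean stands in for the 120-day filter:

```python
import numpy as np

def remove_low_harmonics(x, n_modes=4):
    """Zero out the lowest n_modes Fourier coefficients (mean plus the
    first three harmonics by default) and return the residual anomaly."""
    X = np.fft.rfft(x)
    X[:n_modes] = 0.0
    return np.fft.irfft(X, n=len(x))

def remove_running_mean(x, window=120):
    """Subtract a 120-day running mean to strip slow (e.g. ENSO-scale)
    variability; edge points use a partial window."""
    kernel = np.ones(window) / window
    return x - np.convolve(x, kernel, mode="same")
```

High-frequency intraseasonal signal passes through both filters essentially unchanged, while the mean, seasonal cycle, and slow variability are removed before normalization and model ingestion.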
6. Statistical Knowledge Injection in Attention (Video SGG Example)
The STKET architecture for video scene graphs (Pu et al., 2023) injects spatial and temporal statistical priors directly into transformer attention:
- Spatial Prior: Co-occurrence statistics for each object pair and predicate class, embedded via MLP.
- Temporal Prior: Transition matrices encoding the probabilities of relational state transitions for object pairs, also embedded.
- Multi-Head Cross-Attention: These knowledge embeddings are added to Q/K projections in SKEL and TKEL, modulating attention over both spatial and temporal axes.
- Aggregation: Fused spatial-temporal features are self-attended across recent frames to form final scene graph predictions.
Experimental benchmarks show +8.1% mR@50 improvement on rare predicate classes over standard transformers, with per-component ablation confirming the incremental utility of each knowledge-embedded layer.
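The knowledge injection into attention can be sketched as adding prior embeddings to the query and key projections before the scaled dot product. This is a single-head, numpy-only illustration; SKEL/TKEL use learned multi-head projections of the co-occurrence and transition statistics:

```python
import numpy as np

def knowledge_attention(q, k, v, prior_q, prior_k):
    """Cross-attention where embeddings of statistical priors are added
    to the query/key projections, biasing which keys each query attends
    to before the scaled dot product."""
    d = q.shape[-1]
    scores = (q + prior_q) @ (k + prior_k).T / np.sqrt(d)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)   # softmax over keys
    return w @ v
```

With zero priors this reduces to vanilla attention; nonzero priors shift attention mass toward statistically likely predicates or transitions.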
7. Empirical Efficacy and Practical Considerations
DK-STN designs consistently yield:
- Improved accuracy and stability: Documented 10–40% relative MAE reductions in forecasting, significant gains in classification/recall for indirect supervision (Mao et al., 2023, Liu et al., 2022).
- Superior long-horizon or rare-event performance: Long-term forecast skill, out-of-distribution robustness, and tail predicate recovery all improve with domain knowledge embedding (Mao et al., 2023, Pu et al., 2023, Li et al., 22 Dec 2025).
- Interpretability: Explicit domain-guided modules enable model diagnosis, parameter visualization, and post hoc understanding not available in black-box baselines (Mao et al., 2023).
- Efficiency: Knowledge-guided architectures can reduce necessary training data volume, accelerate convergence, and in climate applications, deliver 1–2 second inference for problems where NWP requires hours (Li et al., 22 Dec 2025).
A salient implication is that embedding domain knowledge at multiple architectural junctures—feature space, model structure, and loss—yields consistently superior results to single-point or absent knowledge approaches.
References
- (Mao et al., 2023): "MPSTAN: Metapopulation-based Spatio-Temporal Attention Network for Epidemic Forecasting"
- (Liu et al., 2022): "STN: a new tensor network method to identify stimulus category from brain activity pattern"
- (Han et al., 2024): "Spatio-Temporal Road Traffic Prediction using Real-time Regional Knowledge"
- (Li et al., 22 Dec 2025): "DK-STN: A Domain Knowledge Embedded Spatio-Temporal Network Model for MJO Forecast"
- (Pu et al., 2023): "Spatial-Temporal Knowledge-Embedded Transformer for Video Scene Graph Generation"