PAST: Primary-Auxiliary Spatio-Temporal Network

Updated 24 November 2025

The paper demonstrates that decomposing spatio-temporal data into primary internal dependencies and auxiliary contextual influences enables robust imputation under diverse missingness regimes.
It leverages a Graph-Integrated Module to learn dynamic node relations and a Cross-Gated Module to fuse external timestamps and node metadata effectively.
Empirical results reveal up to 26.2% RMSE and 31.6% MAE improvements over state-of-the-art baselines, highlighting its practical significance in traffic data reconstruction.

The Primary-Auxiliary Spatio-Temporal Network (PAST) is a neural framework designed for traffic time series imputation under severe and heterogeneous missing data conditions. Unlike conventional methods that separately or statically model spatial and temporal patterns, PAST explicitly decomposes dependencies into primary patterns—internal relationships between time series nodes—and auxiliary patterns—influences from external features such as timestamps or node metadata. Its architecture integrates a Graph-Integrated Module (@@@@3@@@@) for learning primary spatio-temporal dependencies and a Cross-Gated Module (CGM) for embedding and modulating auxiliary contextual information, enabling robust estimation of missing values across random, block, and fiber-type missingness regimes (Hu et al., 17 Nov 2025).

1. Conceptual Foundations

PAST formalizes the imputation challenge by distinguishing between primary and auxiliary spatio-temporal patterns. Primary patterns refer to intrinsic sensor or node relations inferred directly from available measurements, while auxiliary patterns encode exogenous covariates, including temporal (e.g., time-of-day, day-of-week) or static node characteristics. This dichotomy addresses limitations in prior imputation models that use static, hand-crafted graphs or simply concatenate auxiliary information, which impedes adaptability to dynamic and irregular missingness.

The model is structured into two interacting modules:

The Graph-Integrated Module (GIM) is responsible for dynamic inference of node-to-node relations and propagation of hidden representations capturing primary patterns.
The Cross-Gated Module (CGM) injects auxiliary features via learnable gating and bidirectional fusion with the shared representation.

Configuring the model in this manner allows PAST to adaptively reconstruct missing data even when intrinsic or extrinsic dependencies are inconsistently observable in the input.

2. Graph-Integrated Module (GIM): Dynamic Spatio-Temporal Modeling

GIM operationalizes primary pattern extraction by constructing a time-varying, fully-connected node adjacency matrix $A_t \in \mathbb{R}^{N \times N}$ at each time step, computed directly from the shared hidden state $H_{t-1}$ . This is achieved using a learnable projection $W_E \in \mathbb{R}^{d \times d'}$ : $Z_t = H_{t-1} W_E$

$e_{ij}^t = -\frac{ \| Z_t^i - Z_t^j \|_2^2 }{\tau }$

$A_t[i, j] = \text{softmax}_j( e_{ij}^t )$

where $\tau$ is a temperature parameter. This adjacency is interpreted as encoding the current, context-dependent affinities between all node pairs.

An alternative attention-based variant employs a LeakyReLU attention: $s_{ij}^t = \text{LeakyReLU}(a^T [W_1 h_{t-1}^i \, \| \, W_1 h_{t-1}^j])$

$A_t[i, j] = \text{softmax}_j(s_{ij}^t)$

Crucially, GIM does not assume prior spatial proximity but learns emergent, possibly non-local dependencies at each time step.

3. Interval-Aware Dropout: Regularization under Missingness

To prevent overfitting to direct node connections and to model the disruptions induced by missing data, GIM integrates an interval-aware dropout mechanism that masks edges in $A_t$ probabilistically based on the history of observed co-occurrence for each node pair. For each $(i, j)$ pair: $p_{ij}^t = p_0 \cdot (1 - \exp(-\Delta_{ij}^t / \lambda))$

$M_t[i, j] \sim \text{Bernoulli}(1 - p_{ij}^t)$

$\widetilde{A}_t = A_t \circ M_t$

where $\Delta_{ij}^t$ is the time since the last observed co-occurrence, $p_0$ is a base dropout rate, and $\lambda$ controls decay. This mechanism compels the model to utilize multi-hop, indirect interactions—especially when strong direct evidence is unavailable.

In ablation experiments, removing dynamic adjacency (switching to static/geographic $\mathcal{S}$ ) increased RMSE by 9.4% and MAE by 8.7% under 50% random missing; removing interval-aware dropout increased RMSE by 4.1% (Hu et al., 17 Nov 2025).

4. Multi-Order Graph Convolutions

The information propagation backbone of GIM leverages layered graph convolutions of up to $K$ orders: $A_t^{(0)} = I$

$A_t^{(k)} = D_t^{-1} \widetilde{A}_t \, A_t^{(k-1)}, \quad k \geq 1$

where $D_t$ is the diagonal node degree matrix. The $(l+1)$ -th layer update is: $H_t^{(l+1)} = \sigma\left(\sum_{k=0}^K A_t^{(k)} H_t^{(l)} W_k\right)$ Here, $K$ controls the receptive field, enabling non-local aggregation as $K$ increases. Ablation results demonstrate that reducing $K$ from 3 to 1 increased RMSE by 6.8%, emphasizing the importance of multi-hop message passing in settings with extensive missingness.

5. Interaction with the Cross-Gated Module (CGM) and Fusion Mechanism

At each time $t$ , GIM generates an updated hidden state $H_t^{(G)}$ encoding primary dependencies; CGM computes $H_t^{(C)}$ based on auxiliary inputs. These are fused via a gated network: $H_t = G_{\text{fusion}}( H_t^{(G)}, H_t^{(C)} )$ $G_{fusion}$ consists of sigmoid gates and linear projections, determining for each node and timepoint the mixing proportion of primary and auxiliary signals. This fusion enables the unified hidden state $H_t$ to simultaneously propagate internal and external information through backpropagation.

6. Self-Supervised Training and View Ensemble Strategy

PAST is trained via a masked reconstruction objective: at each time $t$ , a random subset of the traffic matrix $X_t$ is masked, and the model seeks to reconstruct these entries: $\mathcal{L}_{\text{GIM}} = \| (X_t - \hat{X}_t) \odot \text{Mask}_{rec} \|_F^2$ where $\text{Mask}_{rec}[i, t]$ indicates unmasked entries.

Each training minibatch ensembles $V$ different random masking patterns, forwarding $V$ versions of input and averaging reconstructions: $\bar{X}_t = \frac{1}{V} \sum_{v=1}^V \hat{X}_t^{(v)}$ Loss is computed relative to $\bar{X}_t$ to mitigate mask variance and promote robustness.

7. Hyperparameters, Computational Aspects, and Empirical Performance

Notable hyperparameters and implementation settings in PAST include:

Number of GIM layers $L = 2$
Max graph order $K = 2$ or 3
Hidden dimensionality $d = 64$
Interval-aware dropout base rate $p_0 = 0.2$ , decay $\lambda = 10$
Number of masked views per minibatch $V = 4$

Computational complexity per step is $O(N^2 d')$ for adjacency calculation and $O(K E d)$ for graph convolutions (with $E \approx N^2$ in dense graphs but reduced after sparsification). Sparsification is achieved by keeping only the top- $r$ edges per node, and high-order normalized adjacencies are precomputed by recursive sparse-matrix multiplication.

Empirically, PAST outperforms seven state-of-the-art baselines by up to 26.2% in RMSE and 31.6% in MAE across three datasets and 27 missing data conditions, especially excelling under large-scale and non-random missing regimes (Hu et al., 17 Nov 2025). Its dynamic graph construction, interval-aware dropout, and multi-order propagation enable it to adaptively recover missing values where static or shallow models fail.

8. Comparative Analysis and Practical Significance

The central innovation of PAST lies in the synergy between dynamic, order- $K$ graph modeling for primary dependencies and learnable cross-gating with auxiliary signals. Unlike fixed graph approaches, PAST’s GIM adapts to shifting spatial-temporal correlations and routes information via alternative paths in the presence of missingness. Interval-aware dropout further regularizes the propagation process, forcing the model to plan for multi-hop contingencies. GIM’s multi-order convolutions grant a tunable receptive field, crucial for long-range dependency modeling under extended block or fiber missing.

A plausible implication is that the primary-auxiliary decomposition in PAST may generalize to other temporal graph imputation settings where external contextual information is available and internal dependencies are non-stationary across time. This architectural separation and integration mechanism addresses the frequently observed brittleness of previous methods when confronted with complex missingness patterns in large spatio-temporal systems (Hu et al., 17 Nov 2025).

PDF Markdown Chat (Pro)

References (1)

PAST: A Primary-Auxiliary Spatio-Temporal Network for Traffic Time Series Imputation (2025)

Whiteboard

Generate a whiteboard explanation of this topic.

Topic to Video (Beta)

Generate a video overview of this topic.

Follow Topic

Get notified by email when new papers are published related to Primary-Auxiliary Spatio-Temporal Network (PAST).