
CMuST: Continuous Multi-task Spatio-Temporal Learning Framework

Updated 3 December 2025
  • CMuST is a continuous multi-task spatio-temporal learning framework that integrates new urban forecasting tasks without catastrophic forgetting.
  • It employs a Multi-dimensional Spatio-Temporal Interaction network using cross- and self-attention mechanisms to capture spatial, temporal, and contextual interactions.
  • The framework uses a rolling adaptation training scheme to accumulate stable weights and enable rapid cold-start adaptation in dynamic urban systems.

The Continuous Multi-task Spatio-Temporal (CMuST) Learning Framework is designed to address the challenges of urban intelligence by enabling robust and continuous multi-task forecasting on dynamic, multi-sourced urban spatiotemporal data. CMuST reformulates traditional single-domain spatiotemporal learning into a coordinated, multi-dimensional, and multi-task paradigm that explicitly models interdependencies across spatial, temporal, and contextual facets, while supporting continual assimilation of new tasks and domains.

1. Motivation and Objectives

Urban systems present high-dimensional data streams—such as traffic volume, speed, crowd flow, and risk indices—that inherently evolve due to environmental shifts, infrastructure changes, and emerging events. Conventional spatiotemporal models typically operate with the assumption of an i.i.d. data distribution between training and testing, which causes generalization failures when faced with distribution shifts, task proliferation, or data sparsity. CMuST’s key objectives are:

  • Continuous learning: Seamlessly integrate new tasks without catastrophic forgetting or retraining from scratch.
  • Multi-dimensional interaction modeling: Capture explicit cross-interactions among context and observations, and self-interactions within spatial and temporal domains, leading to discriminative representations.
  • Multi-task cooperation: Extract and leverage commonalities across tasks for joint performance enhancement while retaining task-specific personalization (Yi et al., 2024).

2. Multi-dimensional Spatio-Temporal Interaction Network (MSTI)

The MSTI network underpins CMuST’s multi-dimensional representation capacity by deploying a structured sequence of embedding, cross-attention, and self-attention blocks.

Data Representation

  • Observational features $X_{obs} \in \mathbb{R}^{T \times N \times C}$ are processed via an MLP into $E_{obs} \in \mathbb{R}^{T \times N \times d_{obs}}$.
  • Spatial indicators $X_s$ and temporal indicators $X_t$ are mapped into $E_s$ and $E_t$ via dedicated nonlinear layers.
  • Each task $T_k$ receives a prompt vector $P^{(T_k)} \in \mathbb{R}^{1 \times 1 \times d_p}$ constructed from summarized task history.
  • Overall input: $H^{(T_k)} = [E_{obs} \,\|\, E_s \,\|\, E_t \,\|\, P^{(T_k)}] \in \mathbb{R}^{T \times N \times d_h}$ (a minimal assembly sketch follows this list).
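Below is a minimal PyTorch-style sketch of how this concatenated input could be assembled. The module name `MSTIInput`, the hidden widths, and the MLP depths are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class MSTIInput(nn.Module):
    """Sketch of the MSTI input stage: embed observations, spatial and
    temporal indicators, broadcast the task prompt, and concatenate."""
    def __init__(self, c_obs, c_s, c_t, d_obs=64, d_s=16, d_t=16, d_p=16):
        super().__init__()
        self.obs_mlp = nn.Sequential(nn.Linear(c_obs, d_obs), nn.ReLU(),
                                     nn.Linear(d_obs, d_obs))
        self.s_proj = nn.Sequential(nn.Linear(c_s, d_s), nn.ReLU())
        self.t_proj = nn.Sequential(nn.Linear(c_t, d_t), nn.ReLU())
        self.d_h = d_obs + d_s + d_t + d_p

    def forward(self, x_obs, x_s, x_t, prompt):
        # x_obs: (T, N, c_obs), x_s: (T, N, c_s), x_t: (T, N, c_t)
        # prompt: (1, 1, d_p), broadcast over the time and node axes
        T, N, _ = x_obs.shape
        e_obs = self.obs_mlp(x_obs)                      # (T, N, d_obs)
        e_s = self.s_proj(x_s)                           # (T, N, d_s)
        e_t = self.t_proj(x_t)                           # (T, N, d_t)
        p = prompt.expand(T, N, -1)                      # (T, N, d_p)
        return torch.cat([e_obs, e_s, e_t, p], dim=-1)   # (T, N, d_h)
```

Broadcasting the $(1, 1, d_p)$ prompt over the time and node axes attaches a single task signature to every spatio-temporal position before the interaction blocks are applied.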

Cross-Attention and Self-Attention Mechanisms

  • Spatial–Context Cross-Interaction (SCCI): Two multi-head cross-attention modules are alternately applied—first querying spatial features with observational keys/values and vice versa. The outputs are fused via residual connections, feed-forward networks, and layer normalization.
  • Temporal–Context Cross-Interaction (TCCI): Similar cross-attention is executed across temporal dimensions, with sinusoidal positional encoding introduced.
  • Self-interactions: Temporal self-attention (TSI) attends over sequences of time, while spatial self-attention (SSI) operates over the nodes.
  • Fusion and Output: The resultant spatial, temporal, and observational tensors are aggregated using $1 \times 1$ convolutions and further processed with task prompts for the final multi-task predictions $\hat{Y}$, trained end-to-end with a Huber loss objective (Yi et al., 2024). A sketch of one cross-interaction block follows this list.
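The following is a hedged sketch of a single cross-interaction block of the kind used in SCCI and TCCI, written with PyTorch's stock `nn.MultiheadAttention`; the head count, hidden width, and post-norm ordering are assumptions for illustration.

```python
import torch.nn as nn

class CrossInteraction(nn.Module):
    """One cross-attention block: queries from one facet (e.g. spatial
    embeddings), keys/values from another (e.g. observational embeddings),
    followed by residual connections, layer norms, and a feed-forward net."""
    def __init__(self, d_model=96, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(),
                                 nn.Linear(4 * d_model, d_model))
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, query, context):
        # query, context: (batch, sequence, d_model)
        out, _ = self.attn(query, context, context)
        h = self.norm1(query + out)
        return self.norm2(h + self.ffn(h))
```

In the spatial-context case the sequence axis would be the N nodes; in the temporal-context case, the T time steps, with sinusoidal positional encodings added to the temporal inputs beforehand.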

3. Rolling Adaptation (RoAda) Training Scheme

RoAda is a continual learning algorithm within CMuST, engineered to address task uniqueness retention, stable commonality accumulation, and cold-start adaptation.

Task Summarization as Prompts

For each task $T_k$, periodic data summaries $X_{samp}^{(T_k)}$ are extracted and encoded via an autoencoder bottleneck; the output is projected to form the prompt $P^{(T_k)}$, representing the temporal signature of that task.
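A minimal sketch of this prompt construction, assuming the sampled summaries are flattened into feature vectors; the layer sizes, the mean pooling over samples, and the projection head are hypothetical choices rather than the published design.

```python
import torch
import torch.nn as nn

class PromptAutoencoder(nn.Module):
    """Compress a summary of task data through a bottleneck, then project
    the bottleneck code into the task prompt P^(T_k)."""
    def __init__(self, d_in, d_bottleneck=8, d_p=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(d_in, 64), nn.ReLU(),
                                     nn.Linear(64, d_bottleneck))
        self.decoder = nn.Sequential(nn.Linear(d_bottleneck, 64), nn.ReLU(),
                                     nn.Linear(64, d_in))
        self.to_prompt = nn.Linear(d_bottleneck, d_p)

    def forward(self, x_samp):
        # x_samp: (num_samples, d_in) summaries sampled from task T_k
        code = self.encoder(x_samp)
        recon = self.decoder(code)           # reconstruction loss trains the AE
        prompt = self.to_prompt(code.mean(dim=0, keepdim=True))   # (1, d_p)
        return prompt.unsqueeze(0), recon    # prompt shaped (1, 1, d_p)
```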

Stable Weight Accumulation

  • After the initial task $T_1$ has converged, the network parameters are adapted to subsequent tasks with their prompts, and the weight trajectories are tracked.
  • Weight elements with low variance across tasks are labeled $W_{stable}$ and frozen, while high-variance elements ($W_{dynamic}$) are re-initialized for each new task.
  • This process is iterated through all tasks, after which the accumulated stable weights form the multi-task knowledge core; a schematic of the partition follows this list.
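A schematic of the variance-based partition, assuming one weight snapshot (state_dict) is retained per task; the threshold value and per-element masking are illustrative, not the paper's exact criterion.

```python
import torch

def partition_weights(snapshots, var_threshold=1e-4):
    """Given state_dicts captured after adapting to each task, mark
    low-variance parameters as stable (to be frozen) and high-variance
    ones as dynamic (to be re-initialized per task)."""
    stable_mask = {}
    for name in snapshots[0].keys():
        stacked = torch.stack([sd[name].float() for sd in snapshots])  # (K, ...)
        variance = stacked.var(dim=0, unbiased=False)
        stable_mask[name] = variance < var_threshold   # True -> W_stable
    return stable_mask
```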

Task-specific Refinement

For the final optimization, only the $W_{dynamic}$ parameters are fine-tuned on each task with its prompt, preserving the shared backbone (Yi et al., 2024).
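One way to realize this refinement, assuming the stable/dynamic split from the previous sketch is available as boolean masks: gradient hooks zero the updates on $W_{stable}$ entries so that only $W_{dynamic}$ moves during task-specific fine-tuning.

```python
def freeze_stable(model, stable_mask):
    """Zero gradients on W_stable entries via hooks, so fine-tuning with
    the task prompt only updates the W_dynamic entries."""
    for name, param in model.named_parameters():
        if name in stable_mask:
            dyn = (~stable_mask[name]).to(param.device, param.dtype)  # 1 = dynamic
            param.register_hook(lambda grad, m=dyn: grad * m)
```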

4. Evaluation Protocols and Benchmark Datasets

CMuST was tested across urban spatiotemporal benchmarks from three cities, comprising multiple forecasting tasks with varied granularity and temporal resolution.

City      Nodes/Grids   Interval   Tasks
NYC       206           30 min     Crowd In, Crowd Out, Taxi Pick, Taxi Drop
SIP       108           5 min      Traffic Flow, Traffic Speed
Chicago   220           30 min     Taxi Pick, Taxi Drop, Risk

Protocols included standard multi-step forecasting with 7:1:2 train/validation/test splits, few-shot settings under spatial and temporal sparsity, and domain adaptation under task cold-start (Yi et al., 2024).
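For concreteness, a chronological 7:1:2 split along the time axis might look like the sketch below; the (T, N, C) array layout and the absence of shuffling are assumptions about the protocol rather than stated details.

```python
import numpy as np

def chronological_split(data, ratios=(0.7, 0.1, 0.2)):
    """Split a (T, N, C) spatio-temporal array along time into
    train/validation/test segments, preserving temporal order."""
    T = data.shape[0]
    t_train = int(T * ratios[0])
    t_val = t_train + int(T * ratios[1])
    return data[:t_train], data[t_train:t_val], data[t_val:]
```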

5. Quantitative Performance and Robustness

CMuST delivered high-fidelity forecasting on all benchmarks:

  • Main results: Achieved or closely matched the best MAE and MAPE scores across tasks, e.g., NYC Crowd Out (MAE = 12.91, MAPE = 0.4265) and SIP Speed (MAE = 0.6843, MAPE = 0.2585), outperforming or matching single-task baselines.
  • Data sparsity robustness: Maintained lower error under both spatial sparsity (25% of nodes, MAE = 12.16, MAPE = 0.4506) and temporal sparsity, compared to GWNET and PromptST.
  • Cold-start adaptation: Demonstrated faster convergence and reduced error when adding new tasks versus retraining from scratch (e.g., NYC Taxi Pick, MAE = 6.803, MAPE = 0.3225) (Yi et al., 2024).

6. Scope, Limitations, and Future Directions

The present deployment of CMuST covers urban transportation domains within single urban systems. Extension opportunities include:

  • Integration of additional urban modalities (energy, air quality, water, etc.).
  • Enhancement of scalability for high-task cardinality and large-scale graphs through efficient attention mechanisms.
  • Transfer learning across genuinely open or cross-city systems, an active research direction.

A plausible implication is that adopting CMuST for broader urban environments will require both architectural generalization and computational augmentation for heterogeneous, real-time urban data streams (Yi et al., 2024).

References (1)
